* [PATCH v3 1/7] riscv: ftrace: support fastcc in Clang for WITH_ARGS
2024-11-27 17:29 [PATCH v3 0/7] riscv: ftrace: atomic patching and preempt improvements Andy Chiu
@ 2024-11-27 17:29 ` Andy Chiu
2024-12-03 12:05 ` Björn Töpel
2024-11-27 17:29 ` [PATCH v3 2/7] riscv: ftrace: align patchable functions to 4 Byte boundary Andy Chiu
` (8 subsequent siblings)
9 siblings, 1 reply; 32+ messages in thread
From: Andy Chiu @ 2024-11-27 17:29 UTC (permalink / raw)
To: Steven Rostedt, Masami Hiramatsu, Mark Rutland, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Nathan Chancellor, Nick Desaulniers,
Bill Wendling, Justin Stitt
Cc: linux-kernel, linux-trace-kernel, linux-riscv, llvm, bjorn,
puranjay12, alexghiti, yongxuan.wang, greentime.hu, nick.hu,
nylon.chen, tommy.wu, eric.lin, viccent.chen, zong.li,
samuel.holland
From: Andy Chiu <andy.chiu@sifive.com>
Some caller-saved registers which are not defined as function arguments
in the ABI can still be passed as arguments when the kernel is compiled
with Clang. As a result, we must save and restore those registers to
prevent ftrace from clobbering them.
- [1]: https://reviews.llvm.org/D68559
Reported-by: Evgenii Shatokhin <e.shatokhin@yadro.com>
Closes: https://lore.kernel.org/linux-riscv/7e7c7914-445d-426d-89a0-59a9199c45b1@yadro.com/
Acked-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
---
arch/riscv/include/asm/ftrace.h | 7 +++++++
arch/riscv/kernel/asm-offsets.c | 7 +++++++
arch/riscv/kernel/mcount-dyn.S | 16 ++++++++++++++--
3 files changed, 28 insertions(+), 2 deletions(-)
diff --git a/arch/riscv/include/asm/ftrace.h b/arch/riscv/include/asm/ftrace.h
index 2cddd79ff21b..4ca7ce7f34d7 100644
--- a/arch/riscv/include/asm/ftrace.h
+++ b/arch/riscv/include/asm/ftrace.h
@@ -143,6 +143,13 @@ struct ftrace_regs {
unsigned long a5;
unsigned long a6;
unsigned long a7;
+#ifdef CONFIG_CC_IS_CLANG
+ unsigned long t2;
+ unsigned long t3;
+ unsigned long t4;
+ unsigned long t5;
+ unsigned long t6;
+#endif
};
};
};
diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
index e94180ba432f..59789dfb2d5d 100644
--- a/arch/riscv/kernel/asm-offsets.c
+++ b/arch/riscv/kernel/asm-offsets.c
@@ -504,6 +504,13 @@ void asm_offsets(void)
DEFINE(FREGS_SP, offsetof(struct ftrace_regs, sp));
DEFINE(FREGS_S0, offsetof(struct ftrace_regs, s0));
DEFINE(FREGS_T1, offsetof(struct ftrace_regs, t1));
+#ifdef CONFIG_CC_IS_CLANG
+ DEFINE(FREGS_T2, offsetof(struct ftrace_regs, t2));
+ DEFINE(FREGS_T3, offsetof(struct ftrace_regs, t3));
+ DEFINE(FREGS_T4, offsetof(struct ftrace_regs, t4));
+ DEFINE(FREGS_T5, offsetof(struct ftrace_regs, t5));
+ DEFINE(FREGS_T6, offsetof(struct ftrace_regs, t6));
+#endif
DEFINE(FREGS_A0, offsetof(struct ftrace_regs, a0));
DEFINE(FREGS_A1, offsetof(struct ftrace_regs, a1));
DEFINE(FREGS_A2, offsetof(struct ftrace_regs, a2));
diff --git a/arch/riscv/kernel/mcount-dyn.S b/arch/riscv/kernel/mcount-dyn.S
index 745dd4c4a69c..e988bd26b28b 100644
--- a/arch/riscv/kernel/mcount-dyn.S
+++ b/arch/riscv/kernel/mcount-dyn.S
@@ -96,7 +96,13 @@
REG_S x8, FREGS_S0(sp)
#endif
REG_S x6, FREGS_T1(sp)
-
+#ifdef CONFIG_CC_IS_CLANG
+ REG_S x7, FREGS_T2(sp)
+ REG_S x28, FREGS_T3(sp)
+ REG_S x29, FREGS_T4(sp)
+ REG_S x30, FREGS_T5(sp)
+ REG_S x31, FREGS_T6(sp)
+#endif
// save the arguments
REG_S x10, FREGS_A0(sp)
REG_S x11, FREGS_A1(sp)
@@ -115,7 +121,13 @@
REG_L x8, FREGS_S0(sp)
#endif
REG_L x6, FREGS_T1(sp)
-
+#ifdef CONFIG_CC_IS_CLANG
+ REG_L x7, FREGS_T2(sp)
+ REG_L x28, FREGS_T3(sp)
+ REG_L x29, FREGS_T4(sp)
+ REG_L x30, FREGS_T5(sp)
+ REG_L x31, FREGS_T6(sp)
+#endif
// restore the arguments
REG_L x10, FREGS_A0(sp)
REG_L x11, FREGS_A1(sp)
--
2.39.3 (Apple Git-145)
* Re: [PATCH v3 1/7] riscv: ftrace: support fastcc in Clang for WITH_ARGS
2024-11-27 17:29 ` [PATCH v3 1/7] riscv: ftrace: support fastcc in Clang for WITH_ARGS Andy Chiu
@ 2024-12-03 12:05 ` Björn Töpel
2024-12-03 14:44 ` Evgenii Shatokhin
0 siblings, 1 reply; 32+ messages in thread
From: Björn Töpel @ 2024-12-03 12:05 UTC (permalink / raw)
To: Andy Chiu, Steven Rostedt, Masami Hiramatsu, Mark Rutland,
Paul Walmsley, Palmer Dabbelt, Albert Ou, Nathan Chancellor,
Nick Desaulniers, Bill Wendling, Justin Stitt
Cc: linux-kernel, linux-trace-kernel, linux-riscv, llvm, bjorn,
puranjay12, alexghiti, yongxuan.wang, greentime.hu, nick.hu,
nylon.chen, tommy.wu, eric.lin, viccent.chen, zong.li,
samuel.holland
Andy Chiu <andybnac@gmail.com> writes:
> From: Andy Chiu <andy.chiu@sifive.com>
>
> Some caller-saved registers which are not defined as function arguments
> in the ABI can still be passed as arguments when the kernel is compiled
> with Clang. As a result, we must save and restore those registers to
> prevent ftrace from clobbering them.
>
> - [1]: https://reviews.llvm.org/D68559
>
> Reported-by: Evgenii Shatokhin <e.shatokhin@yadro.com>
> Closes: https://lore.kernel.org/linux-riscv/7e7c7914-445d-426d-89a0-59a9199c45b1@yadro.com/
> Acked-by: Nathan Chancellor <nathan@kernel.org>
> Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Fixes tag?
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
* Re: Re: [PATCH v3 1/7] riscv: ftrace: support fastcc in Clang for WITH_ARGS
2024-12-03 12:05 ` Björn Töpel
@ 2024-12-03 14:44 ` Evgenii Shatokhin
0 siblings, 0 replies; 32+ messages in thread
From: Evgenii Shatokhin @ 2024-12-03 14:44 UTC (permalink / raw)
To: Björn Töpel
Cc: Bill Wendling, Justin Stitt, Nick Desaulniers, Nathan Chancellor,
Albert Ou, Palmer Dabbelt, Paul Walmsley, Mark Rutland,
Masami Hiramatsu, Steven Rostedt, Andy Chiu, linux-kernel,
linux-trace-kernel, linux-riscv, llvm, bjorn, puranjay12,
alexghiti, yongxuan.wang, greentime.hu, nick.hu, nylon.chen,
tommy.wu, eric.lin, viccent.chen, zong.li, samuel.holland
Hi,
On 03.12.2024 15:05, Björn Töpel wrote:
>
> Andy Chiu <andybnac@gmail.com> writes:
>
>> From: Andy Chiu <andy.chiu@sifive.com>
>>
>> Some caller-saved registers which are not defined as function arguments
>> in the ABI can still be passed as arguments when the kernel is compiled
>> with Clang. As a result, we must save and restore those registers to
>> prevent ftrace from clobbering them.
>>
>> - [1]: https://reviews.llvm.org/D68559
>>
>> Reported-by: Evgenii Shatokhin <e.shatokhin@yadro.com>
>> Closes: https://lore.kernel.org/linux-riscv/7e7c7914-445d-426d-89a0-59a9199c45b1@yadro.com/
>> Acked-by: Nathan Chancellor <nathan@kernel.org>
>> Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
>
> Fixes tag?
As far as I understand it, Ftrace for RISC-V has had this problem since
support for FTRACE_WITH_REGS was added. FTRACE_WITH_ARGS inherited it.
So, it should probably be as follows:
Fixes: aea4c671fb98 ("riscv/ftrace: Add DYNAMIC_FTRACE_WITH_REGS support")
It is more of a workaround than a fix, though, because it is still
undecided whether the problem is in the kernel or in LLVM/clang. That
discussion went nowhere, unfortunately, so it is better to use a
workaround and move on, IMO.
>
> Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
>
Regards,
Evgenii
* [PATCH v3 2/7] riscv: ftrace: align patchable functions to 4 Byte boundary
2024-11-27 17:29 [PATCH v3 0/7] riscv: ftrace: atomic patching and preempt improvements Andy Chiu
2024-11-27 17:29 ` [PATCH v3 1/7] riscv: ftrace: support fastcc in Clang for WITH_ARGS Andy Chiu
@ 2024-11-27 17:29 ` Andy Chiu
2024-11-27 17:29 ` [PATCH v3 3/7] riscv: ftrace: prepare ftrace for atomic code patching Andy Chiu
` (7 subsequent siblings)
9 siblings, 0 replies; 32+ messages in thread
From: Andy Chiu @ 2024-11-27 17:29 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou
Cc: linux-riscv, linux-kernel, bjorn, puranjay12, alexghiti,
yongxuan.wang, greentime.hu, nick.hu, nylon.chen, tommy.wu,
eric.lin, viccent.chen, zong.li, samuel.holland
From: Andy Chiu <andy.chiu@sifive.com>
We are changing ftrace code patching in order to remove the dependency
on stop_machine() and enable kernel preemption. This requires function
entries to be aligned on a 4-byte boundary.
However, on older versions of GCC, -falign-functions alone is not strong
enough to align all functions. In fact, cold functions are not aligned
after optimizations are turned on. We consider this a bug in GCC and
turn off guess-branch-probability as a workaround to align all functions.
GCC bug id: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345
The option -fmin-function-alignment is able to align all functions
properly on newer versions of GCC. So, we add a cc-option to test if
the toolchain supports it.
Suggested-by: Evgenii Shatokhin <e.shatokhin@yadro.com>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
---
arch/riscv/Kconfig | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 5bdda86ada37..75a5ebde4427 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -142,6 +142,7 @@ config RISCV
select HAVE_DEBUG_KMEMLEAK
select HAVE_DMA_CONTIGUOUS if MMU
select HAVE_DYNAMIC_FTRACE if !XIP_KERNEL && MMU && (CLANG_SUPPORTS_DYNAMIC_FTRACE || GCC_SUPPORTS_DYNAMIC_FTRACE)
+ select FUNCTION_ALIGNMENT_4B if HAVE_DYNAMIC_FTRACE && RISCV_ISA_C
select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
select HAVE_DYNAMIC_FTRACE_WITH_ARGS if HAVE_DYNAMIC_FTRACE
select HAVE_FTRACE_MCOUNT_RECORD if !XIP_KERNEL
@@ -226,6 +227,7 @@ config CLANG_SUPPORTS_DYNAMIC_FTRACE
config GCC_SUPPORTS_DYNAMIC_FTRACE
def_bool CC_IS_GCC
depends on $(cc-option,-fpatchable-function-entry=8)
+ depends on CC_HAS_MIN_FUNCTION_ALIGNMENT || !RISCV_ISA_C
config HAVE_SHADOW_CALL_STACK
def_bool $(cc-option,-fsanitize=shadow-call-stack)
--
2.39.3 (Apple Git-145)
* [PATCH v3 3/7] riscv: ftrace: prepare ftrace for atomic code patching
2024-11-27 17:29 [PATCH v3 0/7] riscv: ftrace: atomic patching and preempt improvements Andy Chiu
2024-11-27 17:29 ` [PATCH v3 1/7] riscv: ftrace: support fastcc in Clang for WITH_ARGS Andy Chiu
2024-11-27 17:29 ` [PATCH v3 2/7] riscv: ftrace: align patchable functions to 4 Byte boundary Andy Chiu
@ 2024-11-27 17:29 ` Andy Chiu
2024-12-01 15:31 ` Evgenii Shatokhin
2024-12-06 10:02 ` Björn Töpel
2024-11-27 17:29 ` [PATCH v3 4/7] riscv: ftrace: do not use stop_machine to update code Andy Chiu
` (6 subsequent siblings)
9 siblings, 2 replies; 32+ messages in thread
From: Andy Chiu @ 2024-11-27 17:29 UTC (permalink / raw)
To: Steven Rostedt, Masami Hiramatsu, Mark Rutland, Paul Walmsley,
Palmer Dabbelt, Albert Ou
Cc: linux-kernel, linux-trace-kernel, linux-riscv, bjorn, puranjay12,
alexghiti, yongxuan.wang, greentime.hu, nick.hu, nylon.chen,
tommy.wu, eric.lin, viccent.chen, zong.li, samuel.holland
From: Andy Chiu <andy.chiu@sifive.com>
We use an AUIPC+JALR pair to jump into a ftrace trampoline. Since
instruction fetch can break down to 4 bytes at a time, it is impossible
to update two instructions without a race. In order to mitigate it, we
initialize the patchable entry to AUIPC + NOP4. Then, the run-time code
patching can change NOP4 to JALR to enable/disable ftrace from a
function. This limits the reach of each ftrace entry to +-2KB displacement
from ftrace_caller.
Starting from the trampoline, we add a level of indirection for it to
reach the ftrace caller target. Now, it loads the target address from a
memory location, then performs the jump. This enables the kernel to update
the target atomically.
The ordering of reading/updating the target address should be guarded
by generic ftrace code, where it sends an smp_rmb IPI.
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
---
arch/riscv/include/asm/ftrace.h | 4 ++
arch/riscv/kernel/ftrace.c | 80 +++++++++++++++++++++------------
arch/riscv/kernel/mcount-dyn.S | 9 ++--
3 files changed, 62 insertions(+), 31 deletions(-)
diff --git a/arch/riscv/include/asm/ftrace.h b/arch/riscv/include/asm/ftrace.h
index 4ca7ce7f34d7..36734d285aad 100644
--- a/arch/riscv/include/asm/ftrace.h
+++ b/arch/riscv/include/asm/ftrace.h
@@ -80,6 +80,7 @@ struct dyn_arch_ftrace {
#define JALR_T0 (0x000282e7)
#define AUIPC_T0 (0x00000297)
#define NOP4 (0x00000013)
+#define JALR_RANGE (JALR_SIGN_MASK - 1)
#define to_jalr_t0(offset) \
(((offset & JALR_OFFSET_MASK) << JALR_SHIFT) | JALR_T0)
@@ -117,6 +118,9 @@ do { \
* Let auipc+jalr be the basic *mcount unit*, so we make it 8 bytes here.
*/
#define MCOUNT_INSN_SIZE 8
+#define MCOUNT_AUIPC_SIZE 4
+#define MCOUNT_JALR_SIZE 4
+#define MCOUNT_NOP4_SIZE 4
#ifndef __ASSEMBLY__
struct dyn_ftrace;
diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
index 4b95c574fd04..5ebe412280ef 100644
--- a/arch/riscv/kernel/ftrace.c
+++ b/arch/riscv/kernel/ftrace.c
@@ -64,42 +64,64 @@ static int ftrace_check_current_call(unsigned long hook_pos,
return 0;
}
-static int __ftrace_modify_call(unsigned long hook_pos, unsigned long target,
- bool enable, bool ra)
+static int __ftrace_modify_call(unsigned long hook_pos, unsigned long target, bool validate)
{
unsigned int call[2];
- unsigned int nops[2] = {NOP4, NOP4};
+ unsigned int replaced[2];
+
+ make_call_t0(hook_pos, target, call);
- if (ra)
- make_call_ra(hook_pos, target, call);
- else
- make_call_t0(hook_pos, target, call);
+ if (validate) {
+ /*
+ * Read the text we want to modify;
+ * return must be -EFAULT on read error
+ */
+ if (copy_from_kernel_nofault(replaced, (void *)hook_pos,
+ MCOUNT_INSN_SIZE))
+ return -EFAULT;
+
+ if (replaced[0] != call[0]) {
+ pr_err("%p: expected (%08x) but got (%08x)\n",
+ (void *)hook_pos, call[0], replaced[0]);
+ return -EINVAL;
+ }
+ }
- /* Replace the auipc-jalr pair at once. Return -EPERM on write error. */
- if (patch_insn_write((void *)hook_pos, enable ? call : nops, MCOUNT_INSN_SIZE))
+ /* Replace the jalr at once. Return -EPERM on write error. */
+ if (patch_insn_write((void *)(hook_pos + MCOUNT_AUIPC_SIZE), call + 1, MCOUNT_JALR_SIZE))
return -EPERM;
return 0;
}
-int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
+static int __ftrace_modify_call_site(ftrace_func_t *hook_pos, ftrace_func_t target, bool enable)
{
- unsigned int call[2];
+ ftrace_func_t call = target;
+ ftrace_func_t nops = &ftrace_stub;
- make_call_t0(rec->ip, addr, call);
-
- if (patch_insn_write((void *)rec->ip, call, MCOUNT_INSN_SIZE))
- return -EPERM;
+ WRITE_ONCE(*hook_pos, enable ? call : nops);
return 0;
}
+int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
+{
+ unsigned long distance, orig_addr;
+
+ orig_addr = (unsigned long)&ftrace_caller;
+ distance = addr > orig_addr ? addr - orig_addr : orig_addr - addr;
+ if (distance > JALR_RANGE)
+ return -EINVAL;
+
+ return __ftrace_modify_call(rec->ip, addr, false);
+}
+
int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
unsigned long addr)
{
- unsigned int nops[2] = {NOP4, NOP4};
+ unsigned int nops[1] = {NOP4};
- if (patch_insn_write((void *)rec->ip, nops, MCOUNT_INSN_SIZE))
+ if (patch_insn_write((void *)(rec->ip + MCOUNT_AUIPC_SIZE), nops, MCOUNT_NOP4_SIZE))
return -EPERM;
return 0;
@@ -114,21 +136,23 @@ int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
*/
int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
{
+ unsigned int nops[2];
int out;
+ make_call_t0(rec->ip, &ftrace_caller, nops);
+ nops[1] = NOP4;
+
mutex_lock(&text_mutex);
- out = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
+ out = patch_insn_write((void *)rec->ip, nops, MCOUNT_INSN_SIZE);
mutex_unlock(&text_mutex);
return out;
}
+ftrace_func_t ftrace_call_dest = ftrace_stub;
int ftrace_update_ftrace_func(ftrace_func_t func)
{
- int ret = __ftrace_modify_call((unsigned long)&ftrace_call,
- (unsigned long)func, true, true);
-
- return ret;
+ return __ftrace_modify_call_site(&ftrace_call_dest, func, true);
}
struct ftrace_modify_param {
@@ -182,7 +206,7 @@ int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
if (ret)
return ret;
- return __ftrace_modify_call(caller, addr, true, false);
+ return __ftrace_modify_call(caller, addr, true);
}
#endif
@@ -217,17 +241,17 @@ void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
prepare_ftrace_return(&fregs->ra, ip, fregs->s0);
}
#else /* CONFIG_DYNAMIC_FTRACE_WITH_ARGS */
-extern void ftrace_graph_call(void);
+ftrace_func_t ftrace_graph_call_dest = ftrace_stub;
int ftrace_enable_ftrace_graph_caller(void)
{
- return __ftrace_modify_call((unsigned long)&ftrace_graph_call,
- (unsigned long)&prepare_ftrace_return, true, true);
+ return __ftrace_modify_call_site(&ftrace_graph_call_dest,
+ &prepare_ftrace_return, true);
}
int ftrace_disable_ftrace_graph_caller(void)
{
- return __ftrace_modify_call((unsigned long)&ftrace_graph_call,
- (unsigned long)&prepare_ftrace_return, false, true);
+ return __ftrace_modify_call_site(&ftrace_graph_call_dest,
+ &prepare_ftrace_return, false);
}
#endif /* CONFIG_DYNAMIC_FTRACE_WITH_ARGS */
#endif /* CONFIG_DYNAMIC_FTRACE */
diff --git a/arch/riscv/kernel/mcount-dyn.S b/arch/riscv/kernel/mcount-dyn.S
index e988bd26b28b..bc06e8ab81cf 100644
--- a/arch/riscv/kernel/mcount-dyn.S
+++ b/arch/riscv/kernel/mcount-dyn.S
@@ -162,7 +162,8 @@ SYM_FUNC_START(ftrace_caller)
mv a3, sp
SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
- call ftrace_stub
+ REG_L ra, ftrace_call_dest
+ jalr 0(ra)
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
addi a0, sp, ABI_RA
@@ -172,7 +173,8 @@ SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
mv a2, s0
#endif
SYM_INNER_LABEL(ftrace_graph_call, SYM_L_GLOBAL)
- call ftrace_stub
+ REG_L ra, ftrace_graph_call_dest
+ jalr 0(ra)
#endif
RESTORE_ABI
jr t0
@@ -185,7 +187,8 @@ SYM_FUNC_START(ftrace_caller)
PREPARE_ARGS
SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
- call ftrace_stub
+ REG_L ra, ftrace_call_dest
+ jalr 0(ra)
RESTORE_ABI_REGS
bnez t1, .Ldirect
--
2.39.3 (Apple Git-145)
* Re: [PATCH v3 3/7] riscv: ftrace: prepare ftrace for atomic code patching
2024-11-27 17:29 ` [PATCH v3 3/7] riscv: ftrace: prepare ftrace for atomic code patching Andy Chiu
@ 2024-12-01 15:31 ` Evgenii Shatokhin
2024-12-02 7:29 ` Evgenii Shatokhin
2024-12-06 10:02 ` Björn Töpel
1 sibling, 1 reply; 32+ messages in thread
From: Evgenii Shatokhin @ 2024-12-01 15:31 UTC (permalink / raw)
To: Andy Chiu
Cc: Albert Ou, Palmer Dabbelt, Paul Walmsley, Mark Rutland,
Masami Hiramatsu, Steven Rostedt, linux-kernel,
linux-trace-kernel, linux-riscv, bjorn, puranjay12, alexghiti,
yongxuan.wang, greentime.hu, nick.hu, nylon.chen, tommy.wu,
eric.lin, viccent.chen, zong.li, samuel.holland, linux
Hi Andy,
First of all, thank you for working on this series.
On 27.11.2024 20:29, Andy Chiu wrote:
> From: Andy Chiu <andy.chiu@sifive.com>
>
> We use an AUIPC+JALR pair to jump into a ftrace trampoline. Since
> instruction fetch can break down to 4 bytes at a time, it is impossible
> to update two instructions without a race. In order to mitigate it, we
> initialize the patchable entry to AUIPC + NOP4. Then, the run-time code
> patching can change NOP4 to JALR to enable/disable ftrace from a
> function. This limits the reach of each ftrace entry to +-2KB displacement
> from ftrace_caller.
>
> Starting from the trampoline, we add a level of indirection for it to
> reach the ftrace caller target. Now, it loads the target address from a
> memory location, then performs the jump. This enables the kernel to update
> the target atomically.
>
> The ordering of reading/updating the target address should be guarded
> by generic ftrace code, where it sends an smp_rmb IPI.
>
> Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
> ---
> arch/riscv/include/asm/ftrace.h | 4 ++
> arch/riscv/kernel/ftrace.c | 80 +++++++++++++++++++++------------
> arch/riscv/kernel/mcount-dyn.S | 9 ++--
> 3 files changed, 62 insertions(+), 31 deletions(-)
>
> diff --git a/arch/riscv/include/asm/ftrace.h b/arch/riscv/include/asm/ftrace.h
> index 4ca7ce7f34d7..36734d285aad 100644
> --- a/arch/riscv/include/asm/ftrace.h
> +++ b/arch/riscv/include/asm/ftrace.h
> @@ -80,6 +80,7 @@ struct dyn_arch_ftrace {
> #define JALR_T0 (0x000282e7)
> #define AUIPC_T0 (0x00000297)
> #define NOP4 (0x00000013)
> +#define JALR_RANGE (JALR_SIGN_MASK - 1)
>
> #define to_jalr_t0(offset) \
> (((offset & JALR_OFFSET_MASK) << JALR_SHIFT) | JALR_T0)
> @@ -117,6 +118,9 @@ do { \
> * Let auipc+jalr be the basic *mcount unit*, so we make it 8 bytes here.
> */
> #define MCOUNT_INSN_SIZE 8
> +#define MCOUNT_AUIPC_SIZE 4
> +#define MCOUNT_JALR_SIZE 4
> +#define MCOUNT_NOP4_SIZE 4
>
> #ifndef __ASSEMBLY__
> struct dyn_ftrace;
> diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
> index 4b95c574fd04..5ebe412280ef 100644
> --- a/arch/riscv/kernel/ftrace.c
> +++ b/arch/riscv/kernel/ftrace.c
> @@ -64,42 +64,64 @@ static int ftrace_check_current_call(unsigned long hook_pos,
> return 0;
> }
>
> -static int __ftrace_modify_call(unsigned long hook_pos, unsigned long target,
> - bool enable, bool ra)
> +static int __ftrace_modify_call(unsigned long hook_pos, unsigned long target, bool validate)
> {
> unsigned int call[2];
> - unsigned int nops[2] = {NOP4, NOP4};
> + unsigned int replaced[2];
> +
> + make_call_t0(hook_pos, target, call);
>
> - if (ra)
> - make_call_ra(hook_pos, target, call);
> - else
> - make_call_t0(hook_pos, target, call);
> + if (validate) {
> + /*
> + * Read the text we want to modify;
> + * return must be -EFAULT on read error
> + */
> + if (copy_from_kernel_nofault(replaced, (void *)hook_pos,
> + MCOUNT_INSN_SIZE))
> + return -EFAULT;
> +
> + if (replaced[0] != call[0]) {
> + pr_err("%p: expected (%08x) but got (%08x)\n",
> + (void *)hook_pos, call[0], replaced[0]);
> + return -EINVAL;
> + }
> + }
>
> - /* Replace the auipc-jalr pair at once. Return -EPERM on write error. */
> - if (patch_insn_write((void *)hook_pos, enable ? call : nops, MCOUNT_INSN_SIZE))
> + /* Replace the jalr at once. Return -EPERM on write error. */
> + if (patch_insn_write((void *)(hook_pos + MCOUNT_AUIPC_SIZE), call + 1, MCOUNT_JALR_SIZE))
> return -EPERM;
>
> return 0;
> }
>
> -int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
> +static int __ftrace_modify_call_site(ftrace_func_t *hook_pos, ftrace_func_t target, bool enable)
> {
> - unsigned int call[2];
> + ftrace_func_t call = target;
> + ftrace_func_t nops = &ftrace_stub;
>
> - make_call_t0(rec->ip, addr, call);
> -
> - if (patch_insn_write((void *)rec->ip, call, MCOUNT_INSN_SIZE))
> - return -EPERM;
> + WRITE_ONCE(*hook_pos, enable ? call : nops);
>
> return 0;
> }
>
> +int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
> +{
> + unsigned long distance, orig_addr;
> +
> + orig_addr = (unsigned long)&ftrace_caller;
> + distance = addr > orig_addr ? addr - orig_addr : orig_addr - addr;
> + if (distance > JALR_RANGE)
> + return -EINVAL;
If I understand it correctly, it is not the range itself that matters
here, but rather that the AUIPC instruction remains the same for the
address of ftrace_caller and for the new addr.
For the displacements like 0xfabcd000 and 0xfabccf00, for example, the
distance is 0x100, which is within JALR range. However, the higher 20
bits differ, so the AUIPC instructions will differ too.
__ftrace_modify_call() would catch this though ("if (replaced[0] !=
call[0]) ...").
I'd suggest checking the higher 20 bits explicitly instead, something
like this:
if ((orig_addr & AUIPC_OFFSET_MASK) != (addr & AUIPC_OFFSET_MASK))
return -EINVAL;
What do you think?
> +
> + return __ftrace_modify_call(rec->ip, addr, false);
> +}
> +
> int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
> unsigned long addr)
> {
> - unsigned int nops[2] = {NOP4, NOP4};
> + unsigned int nops[1] = {NOP4};
>
> - if (patch_insn_write((void *)rec->ip, nops, MCOUNT_INSN_SIZE))
> + if (patch_insn_write((void *)(rec->ip + MCOUNT_AUIPC_SIZE), nops, MCOUNT_NOP4_SIZE))
> return -EPERM;
>
> return 0;
> @@ -114,21 +136,23 @@ int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
> */
> int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
> {
> + unsigned int nops[2];
> int out;
>
> + make_call_t0(rec->ip, &ftrace_caller, nops);
> + nops[1] = NOP4;
> +
> mutex_lock(&text_mutex);
> - out = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
> + out = patch_insn_write((void *)rec->ip, nops, MCOUNT_INSN_SIZE);
> mutex_unlock(&text_mutex);
>
> return out;
> }
>
> +ftrace_func_t ftrace_call_dest = ftrace_stub;
> int ftrace_update_ftrace_func(ftrace_func_t func)
> {
> - int ret = __ftrace_modify_call((unsigned long)&ftrace_call,
> - (unsigned long)func, true, true);
> -
> - return ret;
> + return __ftrace_modify_call_site(&ftrace_call_dest, func, true);
> }
>
> struct ftrace_modify_param {
> @@ -182,7 +206,7 @@ int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
> if (ret)
> return ret;
>
> - return __ftrace_modify_call(caller, addr, true, false);
> + return __ftrace_modify_call(caller, addr, true);
> }
> #endif
>
> @@ -217,17 +241,17 @@ void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
> prepare_ftrace_return(&fregs->ra, ip, fregs->s0);
> }
> #else /* CONFIG_DYNAMIC_FTRACE_WITH_ARGS */
> -extern void ftrace_graph_call(void);
> +ftrace_func_t ftrace_graph_call_dest = ftrace_stub;
> int ftrace_enable_ftrace_graph_caller(void)
> {
> - return __ftrace_modify_call((unsigned long)&ftrace_graph_call,
> - (unsigned long)&prepare_ftrace_return, true, true);
> + return __ftrace_modify_call_site(&ftrace_graph_call_dest,
> + &prepare_ftrace_return, true);
> }
>
> int ftrace_disable_ftrace_graph_caller(void)
> {
> - return __ftrace_modify_call((unsigned long)&ftrace_graph_call,
> - (unsigned long)&prepare_ftrace_return, false, true);
> + return __ftrace_modify_call_site(&ftrace_graph_call_dest,
> + &prepare_ftrace_return, false);
> }
> #endif /* CONFIG_DYNAMIC_FTRACE_WITH_ARGS */
> #endif /* CONFIG_DYNAMIC_FTRACE */
> diff --git a/arch/riscv/kernel/mcount-dyn.S b/arch/riscv/kernel/mcount-dyn.S
> index e988bd26b28b..bc06e8ab81cf 100644
> --- a/arch/riscv/kernel/mcount-dyn.S
> +++ b/arch/riscv/kernel/mcount-dyn.S
> @@ -162,7 +162,8 @@ SYM_FUNC_START(ftrace_caller)
> mv a3, sp
>
> SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
> - call ftrace_stub
> + REG_L ra, ftrace_call_dest
> + jalr 0(ra)
>
> #ifdef CONFIG_FUNCTION_GRAPH_TRACER
> addi a0, sp, ABI_RA
> @@ -172,7 +173,8 @@ SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
> mv a2, s0
> #endif
> SYM_INNER_LABEL(ftrace_graph_call, SYM_L_GLOBAL)
> - call ftrace_stub
> + REG_L ra, ftrace_graph_call_dest
> + jalr 0(ra)
> #endif
> RESTORE_ABI
> jr t0
> @@ -185,7 +187,8 @@ SYM_FUNC_START(ftrace_caller)
> PREPARE_ARGS
>
> SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
> - call ftrace_stub
> + REG_L ra, ftrace_call_dest
> + jalr 0(ra)
>
> RESTORE_ABI_REGS
> bnez t1, .Ldirect
> --
> 2.39.3 (Apple Git-145)
>
>
* Re: [PATCH v3 3/7] riscv: ftrace: prepare ftrace for atomic code patching
2024-12-01 15:31 ` Evgenii Shatokhin
@ 2024-12-02 7:29 ` Evgenii Shatokhin
0 siblings, 0 replies; 32+ messages in thread
From: Evgenii Shatokhin @ 2024-12-02 7:29 UTC (permalink / raw)
To: Andy Chiu
Cc: Albert Ou, Palmer Dabbelt, Paul Walmsley, Mark Rutland,
Masami Hiramatsu, Steven Rostedt, linux-kernel,
linux-trace-kernel, linux-riscv, bjorn, puranjay12, alexghiti,
yongxuan.wang, greentime.hu, nick.hu, nylon.chen, tommy.wu,
eric.lin, viccent.chen, zong.li, samuel.holland, linux
On 01.12.2024 18:31, Evgenii Shatokhin wrote:
> Hi Andy,
>
> First of all, thank you for working on this series.
>
> On 27.11.2024 20:29, Andy Chiu wrote:
>> From: Andy Chiu <andy.chiu@sifive.com>
>>
>> We use an AUIPC+JALR pair to jump into a ftrace trampoline. Since
>> instruction fetch can break down to 4 bytes at a time, it is impossible
>> to update two instructions without a race. In order to mitigate it, we
>> initialize the patchable entry to AUIPC + NOP4. Then, the run-time code
>> patching can change NOP4 to JALR to enable/disable ftrace from a
>> function. This limits the reach of each ftrace entry to +-2KB displacement
>> from ftrace_caller.
>>
>> Starting from the trampoline, we add a level of indirection for it to
>> reach the ftrace caller target. Now, it loads the target address from a
>> memory location, then performs the jump. This enables the kernel to update
>> the target atomically.
>>
>> The ordering of reading/updating the target address should be guarded
>> by generic ftrace code, where it sends an smp_rmb IPI.
>>
>> Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
>> ---
>> arch/riscv/include/asm/ftrace.h | 4 ++
>> arch/riscv/kernel/ftrace.c | 80 +++++++++++++++++++++------------
>> arch/riscv/kernel/mcount-dyn.S | 9 ++--
>> 3 files changed, 62 insertions(+), 31 deletions(-)
>>
>> diff --git a/arch/riscv/include/asm/ftrace.h
>> b/arch/riscv/include/asm/ftrace.h
>> index 4ca7ce7f34d7..36734d285aad 100644
>> --- a/arch/riscv/include/asm/ftrace.h
>> +++ b/arch/riscv/include/asm/ftrace.h
>> @@ -80,6 +80,7 @@ struct dyn_arch_ftrace {
>> #define JALR_T0 (0x000282e7)
>> #define AUIPC_T0 (0x00000297)
>> #define NOP4 (0x00000013)
>> +#define JALR_RANGE (JALR_SIGN_MASK - 1)
>>
>> #define
>> to_jalr_t0(offset) \
>> (((offset & JALR_OFFSET_MASK) << JALR_SHIFT) | JALR_T0)
>> @@ -117,6 +118,9 @@ do
>> { \
>> * Let auipc+jalr be the basic *mcount unit*, so we make it 8 bytes
>> here.
>> */
>> #define MCOUNT_INSN_SIZE 8
>> +#define MCOUNT_AUIPC_SIZE 4
>> +#define MCOUNT_JALR_SIZE 4
>> +#define MCOUNT_NOP4_SIZE 4
>>
>> #ifndef __ASSEMBLY__
>> struct dyn_ftrace;
>> diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
>> index 4b95c574fd04..5ebe412280ef 100644
>> --- a/arch/riscv/kernel/ftrace.c
>> +++ b/arch/riscv/kernel/ftrace.c
>> @@ -64,42 +64,64 @@ static int ftrace_check_current_call(unsigned long
>> hook_pos,
>> return 0;
>> }
>>
>> -static int __ftrace_modify_call(unsigned long hook_pos, unsigned long
>> target,
>> - bool enable, bool ra)
>> +static int __ftrace_modify_call(unsigned long hook_pos, unsigned long
>> target, bool validate)
>> {
>> unsigned int call[2];
>> - unsigned int nops[2] = {NOP4, NOP4};
>> + unsigned int replaced[2];
>> +
>> + make_call_t0(hook_pos, target, call);
>>
>> - if (ra)
>> - make_call_ra(hook_pos, target, call);
>> - else
>> - make_call_t0(hook_pos, target, call);
>> + if (validate) {
>> + /*
>> + * Read the text we want to modify;
>> + * return must be -EFAULT on read error
>> + */
>> + if (copy_from_kernel_nofault(replaced, (void *)hook_pos,
>> + MCOUNT_INSN_SIZE))
>> + return -EFAULT;
>> +
>> + if (replaced[0] != call[0]) {
>> + pr_err("%p: expected (%08x) but got (%08x)\n",
>> + (void *)hook_pos, call[0], replaced[0]);
>> + return -EINVAL;
>> + }
>> + }
>>
>> - /* Replace the auipc-jalr pair at once. Return -EPERM on write
>> error. */
>> - if (patch_insn_write((void *)hook_pos, enable ? call : nops,
>> MCOUNT_INSN_SIZE))
>> + /* Replace the jalr at once. Return -EPERM on write error. */
>> + if (patch_insn_write((void *)(hook_pos + MCOUNT_AUIPC_SIZE),
>> call + 1, MCOUNT_JALR_SIZE))
>> return -EPERM;
>>
>> return 0;
>> }
>>
>> -int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
>> +static int __ftrace_modify_call_site(ftrace_func_t *hook_pos,
>> ftrace_func_t target, bool enable)
>> {
>> - unsigned int call[2];
>> + ftrace_func_t call = target;
>> + ftrace_func_t nops = &ftrace_stub;
>>
>> - make_call_t0(rec->ip, addr, call);
>> -
>> - if (patch_insn_write((void *)rec->ip, call, MCOUNT_INSN_SIZE))
>> - return -EPERM;
>> + WRITE_ONCE(*hook_pos, enable ? call : nops);
>>
>> return 0;
>> }
>>
>> +int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
>> +{
>> + unsigned long distance, orig_addr;
>> +
>> + orig_addr = (unsigned long)&ftrace_caller;
>> + distance = addr > orig_addr ? addr - orig_addr : orig_addr -
>> addr;
>> + if (distance > JALR_RANGE)
>> + return -EINVAL;
>
> If I understand it correctly, it is not the range itself that matters
> here, but rather that the AUIPC instruction remains the same for the
> address of ftrace_caller and for the new addr.
>
> For the displacements like 0xfabcd000 and 0xfabccf00, for example, the
> distance is 0x100, which is within JALR range. However, the higher 20
> bits differ, so the AUIPC instructions will differ too.
> __ftrace_modify_call() would catch this though ("if (replaced[0] !=
> call[0]) ...").
>
> I'd suggest checking the higher 20 bits explicitly instead, something
> like this:
>
> if ((orig_addr & AUIPC_OFFSET_MASK) != (addr & AUIPC_OFFSET_MASK))
> return -EINVAL;
>
> What do you think?
My bad, the offsets rather than the addresses should be checked.
Something like this:
-----------
diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
index 57a6558e212e..a619b8607738 100644
--- a/arch/riscv/kernel/ftrace.c
+++ b/arch/riscv/kernel/ftrace.c
@@ -96,11 +96,13 @@ static int __ftrace_modify_call_site(ftrace_func_t *hook_pos, ftrace_func_t targ
int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
{
- unsigned long distance, orig_addr;
+ unsigned long orig_addr, orig_offset_upper, new_offset_upper;
orig_addr = (unsigned long)&ftrace_caller;
- distance = addr > orig_addr ? addr - orig_addr : orig_addr - addr;
- if (distance > JALR_RANGE)
+ orig_offset_upper = (orig_addr - rec->ip) & AUIPC_OFFSET_MASK;
+ new_offset_upper = (addr - rec->ip) & AUIPC_OFFSET_MASK;
+
+ if (orig_offset_upper != new_offset_upper)
return -EINVAL;
return __ftrace_modify_call(rec->ip, addr, false);
-----------
>
>> +
>> + return __ftrace_modify_call(rec->ip, addr, false);
>> +}
>> +
>> int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
>> unsigned long addr)
>> {
>> - unsigned int nops[2] = {NOP4, NOP4};
>> + unsigned int nops[1] = {NOP4};
>>
>> - if (patch_insn_write((void *)rec->ip, nops, MCOUNT_INSN_SIZE))
>> +	if (patch_insn_write((void *)(rec->ip + MCOUNT_AUIPC_SIZE), nops, MCOUNT_NOP4_SIZE))
>> return -EPERM;
>>
>> return 0;
>> @@ -114,21 +136,23 @@ int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
>> */
>> int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
>> {
>> + unsigned int nops[2];
>> int out;
>>
>> + make_call_t0(rec->ip, &ftrace_caller, nops);
>> + nops[1] = NOP4;
>> +
>> mutex_lock(&text_mutex);
>> - out = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
>> + out = patch_insn_write((void *)rec->ip, nops, MCOUNT_INSN_SIZE);
>> mutex_unlock(&text_mutex);
>>
>> return out;
>> }
>>
>> +ftrace_func_t ftrace_call_dest = ftrace_stub;
>> int ftrace_update_ftrace_func(ftrace_func_t func)
>> {
>> - int ret = __ftrace_modify_call((unsigned long)&ftrace_call,
>> - (unsigned long)func, true, true);
>> -
>> - return ret;
>> + return __ftrace_modify_call_site(&ftrace_call_dest, func, true);
>> }
>>
>> struct ftrace_modify_param {
>> @@ -182,7 +206,7 @@ int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
>> if (ret)
>> return ret;
>>
>> - return __ftrace_modify_call(caller, addr, true, false);
>> + return __ftrace_modify_call(caller, addr, true);
>> }
>> #endif
>>
>> @@ -217,17 +241,17 @@ void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
>> prepare_ftrace_return(&fregs->ra, ip, fregs->s0);
>> }
>> #else /* CONFIG_DYNAMIC_FTRACE_WITH_ARGS */
>> -extern void ftrace_graph_call(void);
>> +ftrace_func_t ftrace_graph_call_dest = ftrace_stub;
>> int ftrace_enable_ftrace_graph_caller(void)
>> {
>> -	return __ftrace_modify_call((unsigned long)&ftrace_graph_call,
>> -				    (unsigned long)&prepare_ftrace_return, true, true);
>> + return __ftrace_modify_call_site(&ftrace_graph_call_dest,
>> + &prepare_ftrace_return, true);
>> }
>>
>> int ftrace_disable_ftrace_graph_caller(void)
>> {
>> -	return __ftrace_modify_call((unsigned long)&ftrace_graph_call,
>> -				    (unsigned long)&prepare_ftrace_return, false, true);
>> + return __ftrace_modify_call_site(&ftrace_graph_call_dest,
>> + &prepare_ftrace_return, false);
>> }
>> #endif /* CONFIG_DYNAMIC_FTRACE_WITH_ARGS */
>> #endif /* CONFIG_DYNAMIC_FTRACE */
>> diff --git a/arch/riscv/kernel/mcount-dyn.S b/arch/riscv/kernel/mcount-dyn.S
>> index e988bd26b28b..bc06e8ab81cf 100644
>> --- a/arch/riscv/kernel/mcount-dyn.S
>> +++ b/arch/riscv/kernel/mcount-dyn.S
>> @@ -162,7 +162,8 @@ SYM_FUNC_START(ftrace_caller)
>> mv a3, sp
>>
>> SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
>> - call ftrace_stub
>> + REG_L ra, ftrace_call_dest
>> + jalr 0(ra)
>>
>> #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>> addi a0, sp, ABI_RA
>> @@ -172,7 +173,8 @@ SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
>> mv a2, s0
>> #endif
>> SYM_INNER_LABEL(ftrace_graph_call, SYM_L_GLOBAL)
>> - call ftrace_stub
>> + REG_L ra, ftrace_graph_call_dest
>> + jalr 0(ra)
>> #endif
>> RESTORE_ABI
>> jr t0
>> @@ -185,7 +187,8 @@ SYM_FUNC_START(ftrace_caller)
>> PREPARE_ARGS
>>
>> SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
>> - call ftrace_stub
>> + REG_L ra, ftrace_call_dest
>> + jalr 0(ra)
>>
>> RESTORE_ABI_REGS
>> bnez t1, .Ldirect
>> --
>> 2.39.3 (Apple Git-145)
>>
>>
>
Regards,
Evgenii
^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH v3 3/7] riscv: ftrace: prepare ftrace for atomic code patching
2024-11-27 17:29 ` [PATCH v3 3/7] riscv: ftrace: prepare ftrace for atomic code patching Andy Chiu
2024-12-01 15:31 ` Evgenii Shatokhin
@ 2024-12-06 10:02 ` Björn Töpel
2024-12-06 23:35 ` Bagas Sanjaya
2024-12-09 14:57 ` Robbin Ehn
1 sibling, 2 replies; 32+ messages in thread
From: Björn Töpel @ 2024-12-06 10:02 UTC (permalink / raw)
To: Andy Chiu, Steven Rostedt, Masami Hiramatsu, Mark Rutland,
Paul Walmsley, Palmer Dabbelt, Albert Ou, Steven Rostedt,
Robbin Ehn
Cc: linux-kernel, linux-trace-kernel, linux-riscv, bjorn, puranjay12,
alexghiti, yongxuan.wang, greentime.hu, nick.hu, nylon.chen,
tommy.wu, eric.lin, viccent.chen, zong.li, samuel.holland
Adding Robbin for input, who's doing much more crazy text patching in
JVM, than what we do in the kernel. ;-)
Andy Chiu <andybnac@gmail.com> writes:
> From: Andy Chiu <andy.chiu@sifive.com>
>
> We use an AUIPC+JALR pair to jump into a ftrace trampoline. Since
> instruction fetch can break down to 4-byte accesses, it is impossible
> to update two instructions without a race. In order to mitigate it, we
> initialize the patchable entry to AUIPC + NOP4. Then, the run-time code
> patching can change NOP4 to JALR to enable/disable ftrace for a
> function. This limits the reach of each ftrace entry to within +-2KB
> of ftrace_caller.
>
> Starting from the trampoline, we add a level of indirection for it to
> reach the ftrace caller target. Now, it loads the target address from a
> memory location, then performs the jump. This enables the kernel to
> update the target atomically.
>
> The ordering of reading/updating the target address should be guarded
> by generic ftrace code, where it sends an smp_rmb IPI.
Let's say we're tracing "f". Previously w/ stop_machine() it was
something like:
f:
1: nop
nop
...
...
ftrace_caller:
...
auipc a2, function_trace_op
ld a2, function_trace_op(a2)
...
2: auipc ra, ftrace_stub
jalr ftrace_stub(ra)
The text was patched by ftrace in 1 and 2.
...and now:
f:
auipc t0, ftrace_caller
A: nop
...
...
ftrace_caller:
...
auipc a2, function_trace_op
ld a2, function_trace_op(a2)
...
auipc ra, ftrace_call_dest
ld ra, ftrace_call_dest(ra)
jalr ra
The text is only patched in A, and the tracer func is loaded via
ftrace_call_dest.
Today, when we enable trace "f" the following is done by ftrace:
Text patch 2: call ftrace_stub -> call arch_ftrace_ops_list_func
Text patch 1: nop,nop -> call ftrace_caller
store function_trace_op
smp_wmb()
IPI: smp_rmb()
Text patch 2: call arch_ftrace_ops_list_func -> call function_trace_call
Disable, would be something like:
Text patch 2: call function_trace_call -> call arch_ftrace_ops_list_func
Text patch 1: call ftrace_caller -> nop,nop
store function_trace_op
smp_wmb()
IPI: smp_rmb()
Text patch 2: call arch_ftrace_ops_list_func -> call ftrace_stub
Now with this change, enable would be:
store ftrace_call_dest (was: Text patch 2: call ftrace_stub -> call arch_ftrace_ops_list_func)
<<ORDERING>>
Text patch A: nop -> jalr ftrace_caller(t0)
store function_trace_op
smp_wmb()
IPI: smp_rmb()
store ftrace_call_dest (was: Text patch 2: call arch_ftrace_ops_list_func -> call function_trace_call)
Seems like we're missing some data ordering in "<<ORDERING>>" with
respect to the text patching in A (the arch-specific
ftrace_update_ftrace_func())? Or are we OK with reordering there? Maybe
add what's done for function_trace_op?
[...]
> Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
> ---
> arch/riscv/include/asm/ftrace.h | 4 ++
> arch/riscv/kernel/ftrace.c | 80 +++++++++++++++++++++------------
> arch/riscv/kernel/mcount-dyn.S | 9 ++--
> 3 files changed, 62 insertions(+), 31 deletions(-)
>
> diff --git a/arch/riscv/include/asm/ftrace.h b/arch/riscv/include/asm/ftrace.h
> index 4ca7ce7f34d7..36734d285aad 100644
> --- a/arch/riscv/include/asm/ftrace.h
> +++ b/arch/riscv/include/asm/ftrace.h
> @@ -80,6 +80,7 @@ struct dyn_arch_ftrace {
> #define JALR_T0 (0x000282e7)
> #define AUIPC_T0 (0x00000297)
> #define NOP4 (0x00000013)
> +#define JALR_RANGE (JALR_SIGN_MASK - 1)
>
> #define to_jalr_t0(offset) \
> (((offset & JALR_OFFSET_MASK) << JALR_SHIFT) | JALR_T0)
> @@ -117,6 +118,9 @@ do { \
> * Let auipc+jalr be the basic *mcount unit*, so we make it 8 bytes here.
> */
> #define MCOUNT_INSN_SIZE 8
> +#define MCOUNT_AUIPC_SIZE 4
> +#define MCOUNT_JALR_SIZE 4
> +#define MCOUNT_NOP4_SIZE 4
Align please.
>
> #ifndef __ASSEMBLY__
> struct dyn_ftrace;
> diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
> index 4b95c574fd04..5ebe412280ef 100644
> --- a/arch/riscv/kernel/ftrace.c
> +++ b/arch/riscv/kernel/ftrace.c
> @@ -64,42 +64,64 @@ static int ftrace_check_current_call(unsigned long hook_pos,
> return 0;
> }
>
> -static int __ftrace_modify_call(unsigned long hook_pos, unsigned long target,
> - bool enable, bool ra)
> +static int __ftrace_modify_call(unsigned long hook_pos, unsigned long target, bool validate)
While we're updating this function; Can we rename hook_pos to something
that makes sense from an ftrace perspective?
> {
> unsigned int call[2];
> - unsigned int nops[2] = {NOP4, NOP4};
> + unsigned int replaced[2];
> +
> + make_call_t0(hook_pos, target, call);
>
> - if (ra)
> - make_call_ra(hook_pos, target, call);
> - else
> - make_call_t0(hook_pos, target, call);
> + if (validate) {
> + /*
> + * Read the text we want to modify;
> + * return must be -EFAULT on read error
> + */
> + if (copy_from_kernel_nofault(replaced, (void *)hook_pos,
> + MCOUNT_INSN_SIZE))
Don't wrap this line.
> + return -EFAULT;
> +
> + if (replaced[0] != call[0]) {
> + pr_err("%p: expected (%08x) but got (%08x)\n",
> + (void *)hook_pos, call[0], replaced[0]);
> + return -EINVAL;
> + }
> + }
>
> - /* Replace the auipc-jalr pair at once. Return -EPERM on write error. */
> - if (patch_insn_write((void *)hook_pos, enable ? call : nops, MCOUNT_INSN_SIZE))
> + /* Replace the jalr at once. Return -EPERM on write error. */
> + if (patch_insn_write((void *)(hook_pos + MCOUNT_AUIPC_SIZE), call + 1, MCOUNT_JALR_SIZE))
> return -EPERM;
>
> return 0;
> }
>
> -int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
> +static int __ftrace_modify_call_site(ftrace_func_t *hook_pos, ftrace_func_t target, bool enable)
> {
> - unsigned int call[2];
> + ftrace_func_t call = target;
> + ftrace_func_t nops = &ftrace_stub;
Confusing to call nops. This is not nops. This is the ftrace_stub. Also
the __ftrace_modify_call_site is not super clear to me. Maybe just ditch
the enable flag, and have two functions? Or just inline it?
>
> - make_call_t0(rec->ip, addr, call);
> -
> - if (patch_insn_write((void *)rec->ip, call, MCOUNT_INSN_SIZE))
> - return -EPERM;
> + WRITE_ONCE(*hook_pos, enable ? call : nops);
>
> return 0;
> }
>
> +int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
> +{
> + unsigned long distance, orig_addr;
> +
> + orig_addr = (unsigned long)&ftrace_caller;
> + distance = addr > orig_addr ? addr - orig_addr : orig_addr - addr;
> + if (distance > JALR_RANGE)
> + return -EINVAL;
> +
> + return __ftrace_modify_call(rec->ip, addr, false);
> +}
> +
> int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
> unsigned long addr)
> {
> - unsigned int nops[2] = {NOP4, NOP4};
> + unsigned int nops[1] = {NOP4};
It's just one nop, not nops. No biggie, but why array?
>
> - if (patch_insn_write((void *)rec->ip, nops, MCOUNT_INSN_SIZE))
> + if (patch_insn_write((void *)(rec->ip + MCOUNT_AUIPC_SIZE), nops, MCOUNT_NOP4_SIZE))
> return -EPERM;
>
> return 0;
> @@ -114,21 +136,23 @@ int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
> */
> int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
> {
> + unsigned int nops[2];
> int out;
>
> + make_call_t0(rec->ip, &ftrace_caller, nops);
> + nops[1] = NOP4;
> +
> mutex_lock(&text_mutex);
> - out = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
> + out = patch_insn_write((void *)rec->ip, nops, MCOUNT_INSN_SIZE);
> mutex_unlock(&text_mutex);
>
> return out;
> }
>
> +ftrace_func_t ftrace_call_dest = ftrace_stub;
> int ftrace_update_ftrace_func(ftrace_func_t func)
> {
> - int ret = __ftrace_modify_call((unsigned long)&ftrace_call,
> - (unsigned long)func, true, true);
> -
> - return ret;
> + return __ftrace_modify_call_site(&ftrace_call_dest, func, true);
> }
>
> struct ftrace_modify_param {
> @@ -182,7 +206,7 @@ int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
> if (ret)
> return ret;
>
> - return __ftrace_modify_call(caller, addr, true, false);
> + return __ftrace_modify_call(caller, addr, true);
> }
> #endif
>
> @@ -217,17 +241,17 @@ void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
> prepare_ftrace_return(&fregs->ra, ip, fregs->s0);
> }
> #else /* CONFIG_DYNAMIC_FTRACE_WITH_ARGS */
> -extern void ftrace_graph_call(void);
> +ftrace_func_t ftrace_graph_call_dest = ftrace_stub;
> int ftrace_enable_ftrace_graph_caller(void)
> {
> - return __ftrace_modify_call((unsigned long)&ftrace_graph_call,
> - (unsigned long)&prepare_ftrace_return, true, true);
> + return __ftrace_modify_call_site(&ftrace_graph_call_dest,
> + &prepare_ftrace_return, true);
> }
>
> int ftrace_disable_ftrace_graph_caller(void)
> {
> - return __ftrace_modify_call((unsigned long)&ftrace_graph_call,
> - (unsigned long)&prepare_ftrace_return, false, true);
> + return __ftrace_modify_call_site(&ftrace_graph_call_dest,
> + &prepare_ftrace_return, false);
> }
> #endif /* CONFIG_DYNAMIC_FTRACE_WITH_ARGS */
> #endif /* CONFIG_DYNAMIC_FTRACE */
> diff --git a/arch/riscv/kernel/mcount-dyn.S b/arch/riscv/kernel/mcount-dyn.S
> index e988bd26b28b..bc06e8ab81cf 100644
> --- a/arch/riscv/kernel/mcount-dyn.S
> +++ b/arch/riscv/kernel/mcount-dyn.S
> @@ -162,7 +162,8 @@ SYM_FUNC_START(ftrace_caller)
> mv a3, sp
>
> SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
> - call ftrace_stub
> + REG_L ra, ftrace_call_dest
> + jalr 0(ra)
>
> #ifdef CONFIG_FUNCTION_GRAPH_TRACER
> addi a0, sp, ABI_RA
> @@ -172,7 +173,8 @@ SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
> mv a2, s0
> #endif
> SYM_INNER_LABEL(ftrace_graph_call, SYM_L_GLOBAL)
> - call ftrace_stub
> + REG_L ra, ftrace_graph_call_dest
> + jalr 0(ra)
> #endif
> RESTORE_ABI
> jr t0
> @@ -185,7 +187,8 @@ SYM_FUNC_START(ftrace_caller)
> PREPARE_ARGS
>
> SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
Not used, please remove.
> - call ftrace_stub
> + REG_L ra, ftrace_call_dest
> + jalr 0(ra)
>
> RESTORE_ABI_REGS
> bnez t1, .Ldirect
> --
> 2.39.3 (Apple Git-145)
Björn
^ permalink raw reply [flat|nested] 32+ messages in thread

* Re: [PATCH v3 3/7] riscv: ftrace: prepare ftrace for atomic code patching
2024-12-06 10:02 ` Björn Töpel
@ 2024-12-06 23:35 ` Bagas Sanjaya
2024-12-09 14:57 ` Robbin Ehn
1 sibling, 0 replies; 32+ messages in thread
From: Bagas Sanjaya @ 2024-12-06 23:35 UTC (permalink / raw)
To: Björn Töpel, Andy Chiu, Steven Rostedt,
Masami Hiramatsu, Mark Rutland, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Robbin Ehn
Cc: linux-kernel, linux-trace-kernel, linux-riscv, bjorn, puranjay12,
alexghiti, yongxuan.wang, greentime.hu, nick.hu, nylon.chen,
tommy.wu, eric.lin, viccent.chen, zong.li, samuel.holland
On Fri, Dec 06, 2024 at 11:02:29AM +0100, Björn Töpel wrote:
> Adding Robbin for input, who's doing much more crazy text patching in
> JVM, than what we do in the kernel. ;-)
>
> Let's say we're tracing "f". Previously w/ stop_machine() it was
> something like:
>
> f:
> 1: nop
> nop
> ...
> ...
>
> ftrace_caller:
> ...
> auipc a2, function_trace_op
> ld a2, function_trace_op(a2)
> ...
> 2: auipc ra, ftrace_stub
> jalr ftrace_stub(ra)
>
> The text was patched by ftrace in 1 and 2.
>
> ...and now:
> f:
> auipc t0, ftrace_caller
> A: nop
> ...
> ...
>
> ftrace_caller:
> ...
> auipc a2, function_trace_op
> ld a2, function_trace_op(a2)
> ...
> auipc ra, ftrace_call_dest
> ld ra, ftrace_call_dest(ra)
> jalr ra
>
> The text is only patched in A, and the tracer func is loaded via
> ftrace_call_dest.
Previously the operation was a no-op, right?
Confused...
--
An old man doll... just what I always wanted! - Clara
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v3 3/7] riscv: ftrace: prepare ftrace for atomic code patching
2024-12-06 10:02 ` Björn Töpel
2024-12-06 23:35 ` Bagas Sanjaya
@ 2024-12-09 14:57 ` Robbin Ehn
1 sibling, 0 replies; 32+ messages in thread
From: Robbin Ehn @ 2024-12-09 14:57 UTC (permalink / raw)
To: Björn Töpel
Cc: Andy Chiu, Steven Rostedt, Masami Hiramatsu, Mark Rutland,
Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-kernel,
linux-trace-kernel, linux-riscv, bjorn, puranjay12, alexghiti,
yongxuan.wang, greentime.hu, nick.hu, nylon.chen, tommy.wu,
eric.lin, viccent.chen, zong.li, samuel.holland
On Fri, Dec 6, 2024 at 11:02 AM Björn Töpel <bjorn@kernel.org> wrote:
>
> Adding Robbin for input, who's doing much more crazy text patching in
> JVM, than what we do in the kernel. ;-)
>
> Andy Chiu <andybnac@gmail.com> writes:
>
> > From: Andy Chiu <andy.chiu@sifive.com>
> >
> > We use an AUIPC+JALR pair to jump into a ftrace trampoline. Since
> > instruction fetch can break down to 4-byte accesses, it is impossible
> > to update two instructions without a race. In order to mitigate it, we
> > initialize the patchable entry to AUIPC + NOP4. Then, the run-time code
> > patching can change NOP4 to JALR to enable/disable ftrace for a
> > function. This limits the reach of each ftrace entry to within +-2KB
> > of ftrace_caller.
> >
> > Starting from the trampoline, we add a level of indirection for it to
> > reach the ftrace caller target. Now, it loads the target address from a
> > memory location, then performs the jump. This enables the kernel to
> > update the target atomically.
> >
> > The ordering of reading/updating the target address should be guarded
> > by generic ftrace code, where it sends an smp_rmb IPI.
>
> Let's say we're tracing "f". Previously w/ stop_machine() it was
> something like:
>
> f:
> 1: nop
> nop
> ...
> ...
>
> ftrace_caller:
> ...
> auipc a2, function_trace_op
> ld a2, function_trace_op(a2)
> ...
> 2: auipc ra, ftrace_stub
> jalr ftrace_stub(ra)
>
> The text was patched by ftrace in 1 and 2.
>
> ...and now:
> f:
> auipc t0, ftrace_caller
> A: nop
> ...
> ...
>
> ftrace_caller:
> ...
> auipc a2, function_trace_op
> ld a2, function_trace_op(a2)
> ...
> auipc ra, ftrace_call_dest
> ld ra, ftrace_call_dest(ra)
> jalr ra
>
> The text is only patched in A, and the tracer func is loaded via
> ftrace_call_dest.
>
> Today, when we enable trace "f" the following is done by ftrace:
> Text patch 2: call ftrace_stub -> call arch_ftrace_ops_list_func
> Text patch 1: nop,nop -> call ftrace_caller
> store function_trace_op
> smp_wmb()
> IPI: smp_rmb()
> Text patch 2: call arch_ftrace_ops_list_func -> call function_trace_call
>
> Disable, would be something like:
> Text patch 2: call function_trace_call -> call arch_ftrace_ops_list_func
> Text patch 1: call ftrace_caller -> nop,nop
> store function_trace_op
> smp_wmb()
> IPI: smp_rmb()
> Text patch 2: call arch_ftrace_ops_list_func -> call ftrace_stub
>
> Now with this change, enable would be:
> store ftrace_call_dest (was: Text patch 2: call ftrace_stub -> call arch_ftrace_ops_list_func)
> <<ORDERING>>
> Text patch A: nop -> jalr ftrace_caller(t0)
> store function_trace_op
> smp_wmb()
> IPI: smp_rmb()
> store ftrace_call_dest (was: Text patch 2: call arch_ftrace_ops_list_func -> call function_trace_call)
>
> Seems like we're missing some data ordering in "<<ORDERING>>" with
> respect to the text patching in A (the arch-specific
> ftrace_update_ftrace_func())? Or are we OK with reordering there? Maybe
> add what's done for function_trace_op?
>
> [...]
>
Hi, so we allow reordering of the following 3 stores (set via
ftrace_modify_all_code()):
ftrace_call_dest = ftrace_ops_list_func
Instruction patch NOP -> JALR
function_trace_op = set_function_trace_op
<data-ordering>
ftrace_call_dest = ftrace_trace_function
<ins-ordering>
From what I can tell, all combinations will be fine, as the trace OP is
not read and ftrace_call_dest should be ftrace_stub in such a reordering
case.
It looks like, as we do this under lock (there should be a lockdep
assert in ftrace_modify_all_code() for ftrace_lock), we only go from:
n tracers => 0 tracers
0 tracers => n tracers
Meaning we never add and remove tracers in the same update, so this
reordering seems fine.
Otherwise we could pass an OP for an old tracer into a new tracer.
(function_trace_op happens before ftrace_call_dest store)
As function_trace_op can be loaded concurrently via ftrace_caller, it
should be stored with WRITE_ONCE() for good measure.
> > Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
> > ---
> > arch/riscv/include/asm/ftrace.h | 4 ++
> > arch/riscv/kernel/ftrace.c | 80 +++++++++++++++++++++------------
> > arch/riscv/kernel/mcount-dyn.S | 9 ++--
> > 3 files changed, 62 insertions(+), 31 deletions(-)
> >
> > diff --git a/arch/riscv/include/asm/ftrace.h b/arch/riscv/include/asm/ftrace.h
> > index 4ca7ce7f34d7..36734d285aad 100644
> > --- a/arch/riscv/include/asm/ftrace.h
> > +++ b/arch/riscv/include/asm/ftrace.h
> > @@ -80,6 +80,7 @@ struct dyn_arch_ftrace {
> > #define JALR_T0 (0x000282e7)
> > #define AUIPC_T0 (0x00000297)
> > #define NOP4 (0x00000013)
> > +#define JALR_RANGE (JALR_SIGN_MASK - 1)
> >
> > #define to_jalr_t0(offset) \
> > (((offset & JALR_OFFSET_MASK) << JALR_SHIFT) | JALR_T0)
> > @@ -117,6 +118,9 @@ do { \
> > * Let auipc+jalr be the basic *mcount unit*, so we make it 8 bytes here.
> > */
> > #define MCOUNT_INSN_SIZE 8
> > +#define MCOUNT_AUIPC_SIZE 4
> > +#define MCOUNT_JALR_SIZE 4
> > +#define MCOUNT_NOP4_SIZE 4
>
> Align please.
>
> >
> > #ifndef __ASSEMBLY__
> > struct dyn_ftrace;
> > diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
> > index 4b95c574fd04..5ebe412280ef 100644
> > --- a/arch/riscv/kernel/ftrace.c
> > +++ b/arch/riscv/kernel/ftrace.c
> > @@ -64,42 +64,64 @@ static int ftrace_check_current_call(unsigned long hook_pos,
> > return 0;
> > }
> >
> > -static int __ftrace_modify_call(unsigned long hook_pos, unsigned long target,
> > - bool enable, bool ra)
> > +static int __ftrace_modify_call(unsigned long hook_pos, unsigned long target, bool validate)
>
> While we're updating this function; Can we rename hook_pos to something
> that makes sense from an ftrace perspective?
>
> > {
> > unsigned int call[2];
> > - unsigned int nops[2] = {NOP4, NOP4};
> > + unsigned int replaced[2];
> > +
> > + make_call_t0(hook_pos, target, call);
If you use to_jalr_t0 it's easier to read. (maybe remove make_call_t0).
> >
> > - if (ra)
> > - make_call_ra(hook_pos, target, call);
> > - else
> > - make_call_t0(hook_pos, target, call);
> > + if (validate) {
> > + /*
> > + * Read the text we want to modify;
> > + * return must be -EFAULT on read error
> > + */
Use to_auipc_t0 for validation.
> > + if (copy_from_kernel_nofault(replaced, (void *)hook_pos,
> > + MCOUNT_INSN_SIZE))
>
> Don't wrap this line.
>
> > + return -EFAULT;
> > +
> > + if (replaced[0] != call[0]) {
> > + pr_err("%p: expected (%08x) but got (%08x)\n",
> > + (void *)hook_pos, call[0], replaced[0]);
> > + return -EINVAL;
> > + }
> > + }
> >
> > - /* Replace the auipc-jalr pair at once. Return -EPERM on write error. */
> > - if (patch_insn_write((void *)hook_pos, enable ? call : nops, MCOUNT_INSN_SIZE))
> > + /* Replace the jalr at once. Return -EPERM on write error. */
> > + if (patch_insn_write((void *)(hook_pos + MCOUNT_AUIPC_SIZE), call + 1, MCOUNT_JALR_SIZE))
> > return -EPERM;
> >
> > return 0;
> > }
> >
> > -int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
> > +static int __ftrace_modify_call_site(ftrace_func_t *hook_pos, ftrace_func_t target, bool enable)
As the bool value enable is hardcoded to true/false I would just have
two functions.
IMHO the name ftrace_modify_call_site() makes little sense, especially
since there is a ftrace_modify_call().
> > {
> > - unsigned int call[2];
> > + ftrace_func_t call = target;
> > + ftrace_func_t nops = &ftrace_stub;
>
> Confusing to call nops. This is not nops. This is the ftrace_stub. Also
> the __ftrace_modify_call_site is not super clear to me. Maybe just ditch
> the enable flag, and have two functions? Or just inline it?
>
> >
> > - make_call_t0(rec->ip, addr, call);
> > -
> > - if (patch_insn_write((void *)rec->ip, call, MCOUNT_INSN_SIZE))
> > - return -EPERM;
> > + WRITE_ONCE(*hook_pos, enable ? call : nops);
> >
> > return 0;
> > }
> >
> > +int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
> > +{
> > + unsigned long distance, orig_addr;
> > +
> > + orig_addr = (unsigned long)&ftrace_caller;
> > + distance = addr > orig_addr ? addr - orig_addr : orig_addr - addr;
> > + if (distance > JALR_RANGE)
> > + return -EINVAL;
> > +
> > + return __ftrace_modify_call(rec->ip, addr, false);
> > +}
> > +
> > int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
> > unsigned long addr)
> > {
> > - unsigned int nops[2] = {NOP4, NOP4};
> > + unsigned int nops[1] = {NOP4};
>
> It's just one nop, not nops. No biggie, but why array?
>
> >
> > - if (patch_insn_write((void *)rec->ip, nops, MCOUNT_INSN_SIZE))
> > + if (patch_insn_write((void *)(rec->ip + MCOUNT_AUIPC_SIZE), nops, MCOUNT_NOP4_SIZE))
> > return -EPERM;
> >
> > return 0;
> > @@ -114,21 +136,23 @@ int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
> > */
> > int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
> > {
> > + unsigned int nops[2];
> > int out;
> >
> > + make_call_t0(rec->ip, &ftrace_caller, nops);
> > + nops[1] = NOP4;
Use to_auipc_t0.
> > +
> > mutex_lock(&text_mutex);
> > - out = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
> > + out = patch_insn_write((void *)rec->ip, nops, MCOUNT_INSN_SIZE);
> > mutex_unlock(&text_mutex);
> >
> > return out;
> > }
> >
> > +ftrace_func_t ftrace_call_dest = ftrace_stub;
> > int ftrace_update_ftrace_func(ftrace_func_t func)
> > {
> > - int ret = __ftrace_modify_call((unsigned long)&ftrace_call,
> > - (unsigned long)func, true, true);
> > -
> > - return ret;
> > + return __ftrace_modify_call_site(&ftrace_call_dest, func, true);
> > }
> >
> > struct ftrace_modify_param {
> > @@ -182,7 +206,7 @@ int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
> > if (ret)
> > return ret;
> >
> > - return __ftrace_modify_call(caller, addr, true, false);
> > + return __ftrace_modify_call(caller, addr, true);
> > }
> > #endif
> >
> > @@ -217,17 +241,17 @@ void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
> > prepare_ftrace_return(&fregs->ra, ip, fregs->s0);
> > }
> > #else /* CONFIG_DYNAMIC_FTRACE_WITH_ARGS */
> > -extern void ftrace_graph_call(void);
> > +ftrace_func_t ftrace_graph_call_dest = ftrace_stub;
> > int ftrace_enable_ftrace_graph_caller(void)
> > {
> > - return __ftrace_modify_call((unsigned long)&ftrace_graph_call,
> > - (unsigned long)&prepare_ftrace_return, true, true);
> > + return __ftrace_modify_call_site(&ftrace_graph_call_dest,
> > + &prepare_ftrace_return, true);
> > }
> >
> > int ftrace_disable_ftrace_graph_caller(void)
> > {
> > - return __ftrace_modify_call((unsigned long)&ftrace_graph_call,
> > - (unsigned long)&prepare_ftrace_return, false, true);
> > + return __ftrace_modify_call_site(&ftrace_graph_call_dest,
> > + &prepare_ftrace_return, false);
> > }
> > #endif /* CONFIG_DYNAMIC_FTRACE_WITH_ARGS */
> > #endif /* CONFIG_DYNAMIC_FTRACE */
> > diff --git a/arch/riscv/kernel/mcount-dyn.S b/arch/riscv/kernel/mcount-dyn.S
> > index e988bd26b28b..bc06e8ab81cf 100644
> > --- a/arch/riscv/kernel/mcount-dyn.S
> > +++ b/arch/riscv/kernel/mcount-dyn.S
> > @@ -162,7 +162,8 @@ SYM_FUNC_START(ftrace_caller)
> > mv a3, sp
> >
> > SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
> > - call ftrace_stub
> > + REG_L ra, ftrace_call_dest
> > + jalr 0(ra)
I would write these as "jalr ra,0(ra)", as it may not be obvious.
Nice improvement, thanks!
/Robbin
> >
> > #ifdef CONFIG_FUNCTION_GRAPH_TRACER
> > addi a0, sp, ABI_RA
> > @@ -172,7 +173,8 @@ SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
> > mv a2, s0
> > #endif
> > SYM_INNER_LABEL(ftrace_graph_call, SYM_L_GLOBAL)
> > - call ftrace_stub
> > + REG_L ra, ftrace_graph_call_dest
> > + jalr 0(ra)
> > #endif
> > RESTORE_ABI
> > jr t0
> > @@ -185,7 +187,8 @@ SYM_FUNC_START(ftrace_caller)
> > PREPARE_ARGS
> >
> > SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
>
> Not used, please remove.
>
> > - call ftrace_stub
> > + REG_L ra, ftrace_call_dest
> > + jalr 0(ra)
> >
> > RESTORE_ABI_REGS
> > bnez t1, .Ldirect
> > --
> > 2.39.3 (Apple Git-145)
>
>
>
> Björn
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH v3 4/7] riscv: ftrace: do not use stop_machine to update code
2024-11-27 17:29 [PATCH v3 0/7] riscv: ftrace: atomic patching and preempt improvements Andy Chiu
` (2 preceding siblings ...)
2024-11-27 17:29 ` [PATCH v3 3/7] riscv: ftrace: prepare ftrace for atomic code patching Andy Chiu
@ 2024-11-27 17:29 ` Andy Chiu
2024-11-27 17:29 ` [PATCH v3 5/7] riscv: vector: Support calling schedule() for preemptible Vector Andy Chiu
` (5 subsequent siblings)
9 siblings, 0 replies; 32+ messages in thread
From: Andy Chiu @ 2024-11-27 17:29 UTC (permalink / raw)
To: Steven Rostedt, Masami Hiramatsu, Mark Rutland, Paul Walmsley,
Palmer Dabbelt, Albert Ou
Cc: linux-kernel, linux-trace-kernel, linux-riscv, bjorn, puranjay12,
alexghiti, yongxuan.wang, greentime.hu, nick.hu, nylon.chen,
tommy.wu, eric.lin, viccent.chen, zong.li, samuel.holland
From: Andy Chiu <andy.chiu@sifive.com>
It is now safe to remove the dependency on stop_machine() for patching
code in ftrace.
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
---
arch/riscv/kernel/ftrace.c | 53 +++-----------------------------------
1 file changed, 4 insertions(+), 49 deletions(-)
diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
index 5ebe412280ef..57a6558e212e 100644
--- a/arch/riscv/kernel/ftrace.c
+++ b/arch/riscv/kernel/ftrace.c
@@ -13,23 +13,13 @@
#include <asm/patch.h>
#ifdef CONFIG_DYNAMIC_FTRACE
-void ftrace_arch_code_modify_prepare(void) __acquires(&text_mutex)
+void arch_ftrace_update_code(int command)
{
mutex_lock(&text_mutex);
-
- /*
- * The code sequences we use for ftrace can't be patched while the
- * kernel is running, so we need to use stop_machine() to modify them
- * for now. This doesn't play nice with text_mutex, we use this flag
- * to elide the check.
- */
- riscv_patch_in_stop_machine = true;
-}
-
-void ftrace_arch_code_modify_post_process(void) __releases(&text_mutex)
-{
- riscv_patch_in_stop_machine = false;
+ command |= FTRACE_MAY_SLEEP;
+ ftrace_modify_all_code(command);
mutex_unlock(&text_mutex);
+ flush_icache_all();
}
static int ftrace_check_current_call(unsigned long hook_pos,
@@ -155,41 +145,6 @@ int ftrace_update_ftrace_func(ftrace_func_t func)
return __ftrace_modify_call_site(&ftrace_call_dest, func, true);
}
-struct ftrace_modify_param {
- int command;
- atomic_t cpu_count;
-};
-
-static int __ftrace_modify_code(void *data)
-{
- struct ftrace_modify_param *param = data;
-
- if (atomic_inc_return(¶m->cpu_count) == num_online_cpus()) {
- ftrace_modify_all_code(param->command);
- /*
- * Make sure the patching store is effective *before* we
- * increment the counter which releases all waiting CPUs
- * by using the release variant of atomic increment. The
- * release pairs with the call to local_flush_icache_all()
- * on the waiting CPU.
- */
- atomic_inc_return_release(¶m->cpu_count);
- } else {
- while (atomic_read(¶m->cpu_count) <= num_online_cpus())
- cpu_relax();
-
- local_flush_icache_all();
- }
-
- return 0;
-}
-
-void arch_ftrace_update_code(int command)
-{
- struct ftrace_modify_param param = { command, ATOMIC_INIT(0) };
-
- stop_machine(__ftrace_modify_code, ¶m, cpu_online_mask);
-}
#endif
#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
--
2.39.3 (Apple Git-145)
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v3 5/7] riscv: vector: Support calling schedule() for preemptible Vector
2024-11-27 17:29 [PATCH v3 0/7] riscv: ftrace: atmoic patching and preempt improvements Andy Chiu
` (3 preceding siblings ...)
2024-11-27 17:29 ` [PATCH v3 4/7] riscv: ftrace: do not use stop_machine to update code Andy Chiu
@ 2024-11-27 17:29 ` Andy Chiu
2024-11-27 17:29 ` [PATCH v3 6/7] riscv: add a data fence for CMODX in the kernel mode Andy Chiu
` (4 subsequent siblings)
9 siblings, 0 replies; 32+ messages in thread
From: Andy Chiu @ 2024-11-27 17:29 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou
Cc: linux-riscv, linux-kernel, bjorn, puranjay12, alexghiti,
yongxuan.wang, greentime.hu, nick.hu, nylon.chen, tommy.wu,
eric.lin, viccent.chen, zong.li, samuel.holland
From: Andy Chiu <andy.chiu@sifive.com>
Each function entry implies a call into the ftrace infrastructure, which
may call into schedule() in some cases. So, it is possible for
preemptible kernel-mode Vector to implicitly call into schedule(). Since
all V-regs are caller-saved, it is safe to drop the entire V context
when a thread voluntarily calls schedule(). Besides, we currently don't
pass arguments through vector registers, so we don't have to
save/restore V-regs in the ftrace trampoline.
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
---
arch/riscv/include/asm/processor.h | 5 +++++
arch/riscv/include/asm/vector.h | 22 +++++++++++++++++++---
2 files changed, 24 insertions(+), 3 deletions(-)
diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
index 5f56eb9d114a..9c1cc716b891 100644
--- a/arch/riscv/include/asm/processor.h
+++ b/arch/riscv/include/asm/processor.h
@@ -79,6 +79,10 @@ struct pt_regs;
* Thus, the task does not own preempt_v. Any use of Vector will have to
* save preempt_v, if dirty, and fallback to non-preemptible kernel-mode
* Vector.
+ * - bit 29: The thread voluntarily calls schedule() while holding an active
+ * preempt_v. All preempt_v context should be dropped in such case because
+ * V-regs are caller-saved. Only sstatus.VS=ON is persisted across a
+ * schedule() call.
* - bit 30: The in-kernel preempt_v context is saved, and requries to be
* restored when returning to the context that owns the preempt_v.
* - bit 31: The in-kernel preempt_v context is dirty, as signaled by the
@@ -93,6 +97,7 @@ struct pt_regs;
#define RISCV_PREEMPT_V 0x00000100
#define RISCV_PREEMPT_V_DIRTY 0x80000000
#define RISCV_PREEMPT_V_NEED_RESTORE 0x40000000
+#define RISCV_PREEMPT_V_IN_SCHEDULE 0x20000000
/* CPU-specific state of a task */
struct thread_struct {
diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
index c7c023afbacd..c5b6070db99f 100644
--- a/arch/riscv/include/asm/vector.h
+++ b/arch/riscv/include/asm/vector.h
@@ -76,6 +76,11 @@ static __always_inline void riscv_v_disable(void)
csr_clear(CSR_SSTATUS, SR_VS);
}
+static __always_inline bool riscv_v_is_on(void)
+{
+ return !!(csr_read(CSR_SSTATUS) & SR_VS);
+}
+
static __always_inline void __vstate_csr_save(struct __riscv_v_ext_state *dest)
{
asm volatile (
@@ -244,6 +249,11 @@ static inline void __switch_to_vector(struct task_struct *prev,
struct pt_regs *regs;
if (riscv_preempt_v_started(prev)) {
+ if (riscv_v_is_on()) {
+ WARN_ON(prev->thread.riscv_v_flags & RISCV_V_CTX_DEPTH_MASK);
+ riscv_v_disable();
+ prev->thread.riscv_v_flags |= RISCV_PREEMPT_V_IN_SCHEDULE;
+ }
if (riscv_preempt_v_dirty(prev)) {
__riscv_v_vstate_save(&prev->thread.kernel_vstate,
prev->thread.kernel_vstate.datap);
@@ -254,10 +264,16 @@ static inline void __switch_to_vector(struct task_struct *prev,
riscv_v_vstate_save(&prev->thread.vstate, regs);
}
- if (riscv_preempt_v_started(next))
- riscv_preempt_v_set_restore(next);
- else
+ if (riscv_preempt_v_started(next)) {
+ if (next->thread.riscv_v_flags & RISCV_PREEMPT_V_IN_SCHEDULE) {
+ next->thread.riscv_v_flags &= ~RISCV_PREEMPT_V_IN_SCHEDULE;
+ riscv_v_enable();
+ } else {
+ riscv_preempt_v_set_restore(next);
+ }
+ } else {
riscv_v_vstate_set_restore(next, task_pt_regs(next));
+ }
}
void riscv_v_vstate_ctrl_init(struct task_struct *tsk);
--
2.39.3 (Apple Git-145)
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v3 6/7] riscv: add a data fence for CMODX in the kernel mode
2024-11-27 17:29 [PATCH v3 0/7] riscv: ftrace: atmoic patching and preempt improvements Andy Chiu
` (4 preceding siblings ...)
2024-11-27 17:29 ` [PATCH v3 5/7] riscv: vector: Support calling schedule() for preemptible Vector Andy Chiu
@ 2024-11-27 17:29 ` Andy Chiu
2025-03-10 19:08 ` Björn Töpel
2024-11-27 17:29 ` [PATCH v3 7/7] riscv: ftrace: support PREEMPT Andy Chiu
` (3 subsequent siblings)
9 siblings, 1 reply; 32+ messages in thread
From: Andy Chiu @ 2024-11-27 17:29 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou
Cc: Andy Chiu, linux-riscv, linux-kernel, bjorn, puranjay12,
alexghiti, yongxuan.wang, greentime.hu, nick.hu, nylon.chen,
tommy.wu, eric.lin, viccent.chen, zong.li, samuel.holland
The RISC-V spec explicitly calls out that a local fence.i is not enough
for a code modification to be visible to a remote hart. In fact, it
states:
To make a store to instruction memory visible to all RISC-V harts, the
writing hart also has to execute a data FENCE before requesting that all
remote RISC-V harts execute a FENCE.I.
Thus, add a fence here to order data writes before making the IPI.
Signed-off-by: Andy Chiu <andybnac@gmail.com>
---
arch/riscv/mm/cacheflush.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
index b81672729887..b2e4b81763f8 100644
--- a/arch/riscv/mm/cacheflush.c
+++ b/arch/riscv/mm/cacheflush.c
@@ -24,7 +24,20 @@ void flush_icache_all(void)
if (num_online_cpus() < 2)
return;
- else if (riscv_use_sbi_for_rfence())
+
+ /*
+ * Make sure all previous writes to the D$ are ordered before making
+ * the IPI. The RISC-V spec states that a hart must execute a data fence
+ * before triggering a remote fence.i in order to make the modification
+ * visible to remote harts.
+ *
+ * IPIs on RISC-V are triggered by MMIO writes to either CLINT or
+ * S-IMSIC, so the fence ensures previous data writes "happen before"
+ * the MMIO.
+ */
+ RISCV_FENCE(w, o);
+
+ if (riscv_use_sbi_for_rfence())
sbi_remote_fence_i(NULL);
else
on_each_cpu(ipi_remote_fence_i, NULL, 1);
--
2.39.3 (Apple Git-145)
^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH v3 6/7] riscv: add a data fence for CMODX in the kernel mode
2024-11-27 17:29 ` [PATCH v3 6/7] riscv: add a data fence for CMODX in the kernel mode Andy Chiu
@ 2025-03-10 19:08 ` Björn Töpel
2025-03-11 12:44 ` Andrea Parri
0 siblings, 1 reply; 32+ messages in thread
From: Björn Töpel @ 2025-03-10 19:08 UTC (permalink / raw)
To: Andy Chiu, Paul Walmsley, Palmer Dabbelt, Albert Ou
Cc: Andy Chiu, linux-riscv, linux-kernel, bjorn, puranjay12,
alexghiti, yongxuan.wang, greentime.hu, nick.hu, nylon.chen,
tommy.wu, eric.lin, viccent.chen, zong.li, samuel.holland
Andy Chiu <andybnac@gmail.com> writes:
> The RISC-V spec explicitly calls out that a local fence.i is not enough
> for a code modification to be visible to a remote hart. In fact, it
> states:
>
> To make a store to instruction memory visible to all RISC-V harts, the
> writing hart also has to execute a data FENCE before requesting that all
> remote RISC-V harts execute a FENCE.I.
>
> Thus, add a fence here to order data writes before making the IPI.
>
> Signed-off-by: Andy Chiu <andybnac@gmail.com>
> ---
> arch/riscv/mm/cacheflush.c | 15 ++++++++++++++-
> 1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
> index b81672729887..b2e4b81763f8 100644
> --- a/arch/riscv/mm/cacheflush.c
> +++ b/arch/riscv/mm/cacheflush.c
> @@ -24,7 +24,20 @@ void flush_icache_all(void)
>
> if (num_online_cpus() < 2)
> return;
> - else if (riscv_use_sbi_for_rfence())
> +
> + /*
> + * Make sure all previous writes to the D$ are ordered before making
> + * the IPI. The RISC-V spec states that a hart must execute a data fence
> + * before triggering a remote fence.i in order to make the modification
> + * visible to remote harts.
> + *
> + * IPIs on RISC-V are triggered by MMIO writes to either CLINT or
> + * S-IMSIC, so the fence ensures previous data writes "happen before"
> + * the MMIO.
> + */
> + RISCV_FENCE(w, o);
(I love the submit/review latency here! ;-))
FWIW, for the S-IMSIC the write is already a writel(), so we'll have the
text patching and the IPI ordered. Regardless, there's more than one flavor
of IPI on RISC-V!
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
> + if (riscv_use_sbi_for_rfence())
> sbi_remote_fence_i(NULL);
> else
> on_each_cpu(ipi_remote_fence_i, NULL, 1);
> --
> 2.39.3 (Apple Git-145)
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v3 6/7] riscv: add a data fence for CMODX in the kernel mode
2025-03-10 19:08 ` Björn Töpel
@ 2025-03-11 12:44 ` Andrea Parri
2025-03-11 14:53 ` Björn Töpel
0 siblings, 1 reply; 32+ messages in thread
From: Andrea Parri @ 2025-03-11 12:44 UTC (permalink / raw)
To: Björn Töpel
Cc: Andy Chiu, Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv,
linux-kernel, bjorn, puranjay12, alexghiti, yongxuan.wang,
greentime.hu, nick.hu, nylon.chen, tommy.wu, eric.lin,
viccent.chen, zong.li, samuel.holland
> FWIW, for the S-IMSIC the write is already a writel(), so we'll have the
> text patching and the IPI ordered. Regardless, there's more than one flavor
> of IPI on RISC-V!
AFAIU, this writel() is intended to order the insertion (and the initialization)
of the CSD object before the MMIO writes; so, the "right fix" seems to be to
turn the "other flavors" into using a writel() or providing a similar ordering
guarantee.
As a bonus, such a change should address/fix all current and future occurrences
of the message-passing scenario in question (the patch addressed the occurrence
in flush_icache_all(), but there appears to be a similar one in
flush_icache_mm()).
Or am I misunderstanding your previous comment?
Andrea
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v3 6/7] riscv: add a data fence for CMODX in the kernel mode
2025-03-11 12:44 ` Andrea Parri
@ 2025-03-11 14:53 ` Björn Töpel
2025-03-11 18:11 ` Andrea Parri
0 siblings, 1 reply; 32+ messages in thread
From: Björn Töpel @ 2025-03-11 14:53 UTC (permalink / raw)
To: Andrea Parri
Cc: Andy Chiu, Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv,
linux-kernel, bjorn, puranjay12, alexghiti, yongxuan.wang,
greentime.hu, nick.hu, nylon.chen, tommy.wu, eric.lin,
viccent.chen, zong.li, samuel.holland
Andrea Parri <parri.andrea@gmail.com> writes:
>> FWIW, for the S-IMSIC the write is already a writel(), so we'll have the
>> text patching and the IPI ordered. Regardless, there's more than one flavor
>> of IPI on RISC-V!
>
> AFAIU, this writel() is intended to order the insertion (and the initialization)
> of the CSD object before the MMIO writes; so, the "right fix" seems to turn the
> "other flavors" into using a writel() or providing a similar ordering guarantee.
Yes, that's probably the right approach, or maybe follow-up!
> As a bonus, such change should address/fix all current and future occurrences of
> the message-passing scenario in question (the patch addressed the occurrence in
> flush_icache_all(), but there appears to be a similar one in flush_icache_mm()).
Indeed. I wonder if the SBI remote fence.i needs it?
Cheers,
Björn
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v3 6/7] riscv: add a data fence for CMODX in the kernel mode
2025-03-11 14:53 ` Björn Töpel
@ 2025-03-11 18:11 ` Andrea Parri
2025-03-13 18:12 ` Andy Chiu
0 siblings, 1 reply; 32+ messages in thread
From: Andrea Parri @ 2025-03-11 18:11 UTC (permalink / raw)
To: Björn Töpel
Cc: Andy Chiu, Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv,
linux-kernel, bjorn, puranjay12, alexghiti, yongxuan.wang,
greentime.hu, nick.hu, nylon.chen, tommy.wu, eric.lin,
viccent.chen, zong.li, samuel.holland
On Tue, Mar 11, 2025 at 03:53:36PM +0100, Björn Töpel wrote:
> Andrea Parri <parri.andrea@gmail.com> writes:
>
> >> FWIW, for the S-IMSIC the write is already a writel(), so we'll have the
> >> text patching and the IPI ordered. Regardless, there's more than one flavor
> >> of IPI on RISC-V!
> >
> > AFAIU, this writel() is intended to order the insertion (and the initialization)
> > of the CSD object before the MMIO writes; so, the "right fix" seems to turn the
> > "other flavors" into using a writel() or providing a similar ordering guarantee.
>
> Yes, that's probably the right approach, or maybe follow-up!
>
> > As a bonus, such change should address/fix all current and future occurrences of
> > the message-passing scenario in question (the patch addressed the occurrence in
> > flush_icache_all(), but there appears to be a similar one in flush_icache_mm()).
>
> Indeed. I wonder if the SBI remote fence.i needs it?
Ah! So I am not alone: AFAICT, that remains a matter of discussion with your
SEE team/developers. :-|
Andrea
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v3 6/7] riscv: add a data fence for CMODX in the kernel mode
2025-03-11 18:11 ` Andrea Parri
@ 2025-03-13 18:12 ` Andy Chiu
2025-03-14 15:23 ` Andrea Parri
0 siblings, 1 reply; 32+ messages in thread
From: Andy Chiu @ 2025-03-13 18:12 UTC (permalink / raw)
To: Andrea Parri
Cc: Björn Töpel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
linux-riscv, linux-kernel, bjorn, puranjay12, alexghiti,
yongxuan.wang, greentime.hu, nick.hu, nylon.chen, tommy.wu,
eric.lin, viccent.chen, zong.li, samuel.holland
Andrea Parri <parri.andrea@gmail.com> 於 2025年3月12日 週三 上午2:11寫道:
>
> On Tue, Mar 11, 2025 at 03:53:36PM +0100, Björn Töpel wrote:
> > Andrea Parri <parri.andrea@gmail.com> writes:
> >
> > >> FWIW, for the S-IMSIC the write is already a writel(), so we'll have the
> > >> text patching and the IPI ordered. Regardless, there's more than one flavor
> > >> of IPI on RISC-V!
> > >
> > > AFAIU, this writel() is intended to order the insertion (and the initialization)
> > > of the CSD object before the MMIO writes; so, the "right fix" seems to turn the
> > > "other flavors" into using a writel() or providing a similar ordering guarantee.
I found that Apple's AIC irqchip uses writel_relaxed() for sending IPIs.
I am not sure whether using relaxed MMIO for IPIs is common practice in
drivers. I am more convinced that a driver should use the relaxed
version if there is no data/IO dependency in the driver itself. But
it is true that a fence in the driver makes programming easier.
> >
> > Yes, that's probably the right approach, or maybe follow-up!
> >
> > > As a bonus, such change should address/fix all current and future occurrences of
> > > the message-passing scenario in question (the patch addressed the occurrence in
> > > flush_icache_all(), but there appears to be a similar one in flush_icache_mm()).
> >
> > Indeed. I wonder if the SBI remote fence.i needs it?
>
> Ah! So I am not alone: AFAICT, that remains a matter of discussions with your SEE
> team/developers. :-|
As far as OpenSBI is concerned, there is a wmb(), which translates to a
fence ow, ow, in the generic code path. Regardless, there may be more
than one flavor of SBI; should we also consider that?
Thanks,
Andy
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v3 6/7] riscv: add a data fence for CMODX in the kernel mode
2025-03-13 18:12 ` Andy Chiu
@ 2025-03-14 15:23 ` Andrea Parri
0 siblings, 0 replies; 32+ messages in thread
From: Andrea Parri @ 2025-03-14 15:23 UTC (permalink / raw)
To: Andy Chiu
Cc: Björn Töpel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
linux-riscv, linux-kernel, bjorn, puranjay12, alexghiti,
yongxuan.wang, greentime.hu, nick.hu, nylon.chen, tommy.wu,
eric.lin, viccent.chen, zong.li, samuel.holland
> I found that Apple's aic irqchip uses writel_relaxed for sending IPIs.
> I am not sure if it is a practice using relaxed mmio in the driver to
> deal with IPIs. I am more convinced that driver should use the relaxed
> version if there is no data/io dependency for the driver itself. But
> it is true that a fence in the driver makes programming easier.
I empathize with this viewpoint.
Perhaps a first counterargument/remark is that lifting those fences (e.g.,
irq-gic-v3) out of the various drivers into core/more generic code would
mean having some generic primitives to "order" the stores vs the send_ipis;
however, we (kernel developers) don't have such an API today. Perhaps
unsurprisingly so, considering that (as already recalled in this thread) even
on the same architecture a send_ipi can mean things/operations as different
as "do an MMIO write", "write a system register", "execute an environment
call instruction" and more; as a consequence, such matters tend to
become quite tricky even within a given/single driver (e.g., 80e4e1f472889
("irqchip/gic-v3: Use dsb(ishst) to order writes with ICC_SGI1R_EL1
accesses")), more so at the "Linux level".
> As far as OpenSBI is concerned, there is a wmb(), which translated to
> fence ow, ow, in the generic code path. Regardless, there may be more
> than one flavor of SBIs, should we also consider that?
For the sake of argument, how would you proceed to do that?
Let me put it this way. If the answer to your question is "no, we should
not", then you have just shown that the fence w, o added by the patch is
redundant if riscv_use_sbi_for_rfence(). If the answer is "yes", then I
think the patch could use some words to describe why the newly added fence
suffices to order the explicit writes before the ecall at stake _for each_
of the relevant implementations. IIRC, the ISA wording for ecall (and
fences) do not appear to provide much help to that end. Hence, to iterate,
I really don't see other ways other than digging into the implementations
and asking the developers or the experts of interest to achieve that.
After all, what I'm saying seems to align with what you've done for (some
version of) OpenSBI.
Andrea
[1] https://lore.kernel.org/all/6432e7e97b828d887da8794c150161c4@kernel.org/T/#mc90f2a2eb423ce1ba579fc4f566ad49a16825041
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH v3 7/7] riscv: ftrace: support PREEMPT
2024-11-27 17:29 [PATCH v3 0/7] riscv: ftrace: atmoic patching and preempt improvements Andy Chiu
` (5 preceding siblings ...)
2024-11-27 17:29 ` [PATCH v3 6/7] riscv: add a data fence for CMODX in the kernel mode Andy Chiu
@ 2024-11-27 17:29 ` Andy Chiu
2025-03-10 19:09 ` Björn Töpel
2024-11-27 21:25 ` [PATCH v3 0/7] riscv: ftrace: atmoic patching and preempt improvements Björn Töpel
` (2 subsequent siblings)
9 siblings, 1 reply; 32+ messages in thread
From: Andy Chiu @ 2024-11-27 17:29 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou
Cc: linux-riscv, linux-kernel, bjorn, puranjay12, alexghiti,
yongxuan.wang, greentime.hu, nick.hu, nylon.chen, tommy.wu,
eric.lin, viccent.chen, zong.li, samuel.holland
From: Andy Chiu <andy.chiu@sifive.com>
Now, we can safely enable dynamic ftrace with kernel preemption.
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
---
arch/riscv/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 75a5ebde4427..554e4e363c54 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -148,7 +148,7 @@ config RISCV
select HAVE_FTRACE_MCOUNT_RECORD if !XIP_KERNEL
select HAVE_FUNCTION_GRAPH_TRACER
select HAVE_FUNCTION_GRAPH_RETVAL if HAVE_FUNCTION_GRAPH_TRACER
- select HAVE_FUNCTION_TRACER if !XIP_KERNEL && !PREEMPTION
+ select HAVE_FUNCTION_TRACER if !XIP_KERNEL
select HAVE_EBPF_JIT if MMU
select HAVE_GUP_FAST if MMU
select HAVE_FUNCTION_ARG_ACCESS_API
--
2.39.3 (Apple Git-145)
^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH v3 7/7] riscv: ftrace: support PREEMPT
2024-11-27 17:29 ` [PATCH v3 7/7] riscv: ftrace: support PREEMPT Andy Chiu
@ 2025-03-10 19:09 ` Björn Töpel
0 siblings, 0 replies; 32+ messages in thread
From: Björn Töpel @ 2025-03-10 19:09 UTC (permalink / raw)
To: Andy Chiu, Paul Walmsley, Palmer Dabbelt, Albert Ou
Cc: linux-riscv, linux-kernel, bjorn, puranjay12, alexghiti,
yongxuan.wang, greentime.hu, nick.hu, nylon.chen, tommy.wu,
eric.lin, viccent.chen, zong.li, samuel.holland
Andy Chiu <andybnac@gmail.com> writes:
> From: Andy Chiu <andy.chiu@sifive.com>
>
> Now, we can safely enable dynamic ftrace with kernel preemption.
>
> Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v3 0/7] riscv: ftrace: atmoic patching and preempt improvements
2024-11-27 17:29 [PATCH v3 0/7] riscv: ftrace: atmoic patching and preempt improvements Andy Chiu
` (6 preceding siblings ...)
2024-11-27 17:29 ` [PATCH v3 7/7] riscv: ftrace: support PREEMPT Andy Chiu
@ 2024-11-27 21:25 ` Björn Töpel
2024-12-24 3:15 ` Steven Rostedt
2024-12-02 7:58 ` Evgenii Shatokhin
2024-12-03 12:18 ` Björn Töpel
9 siblings, 1 reply; 32+ messages in thread
From: Björn Töpel @ 2024-11-27 21:25 UTC (permalink / raw)
To: Andy Chiu, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt,
Steven Rostedt
Cc: linux-kernel, linux-riscv, llvm, bjorn, puranjay12, alexghiti,
yongxuan.wang, greentime.hu, nick.hu, nylon.chen, tommy.wu,
eric.lin, viccent.chen, zong.li, samuel.holland
Adding Steven.
Andy Chiu <andybnac@gmail.com> writes:
> This series makes atomic code patching possible in riscv ftrace. A
> direct benefit of this is that we can get rid of stop_machine() when
> patching function entries. This also makes it possible to run ftrace
> with full kernel preemption. Before this series, the kernel initializes
> patchable function entries to NOP4 + NOP4. To start tracing, it updates
> entries to AUIPC + JALR while holding other cores in stop_machine.
> stop_machine() is required because it is impossible to update 2
> instructions and have them be seen atomically. And preemption must be
> prevented, as kernel preemption allows a process to be scheduled out while
> executing on one of these instruction pairs.
>
> This series addresses the problem by initializing the first NOP4 to
> AUIPC. So, atomic patching is possible because the kernel only has to
> update one instruction. As long as the instruction is naturally aligned,
> then it is expected to be updated atomically.
>
> However, the address range of the ftrace trampoline is limited to +-2K
> from ftrace_caller after applying this series. This issue is expected
> to be solved by Puranjay's CALL_OPS, which adds 8B of naturally aligned
> data in front of patchable functions and can use it to direct execution
> out to any custom trampolines.
>
> The series is composed by three parts. The first part cleans up the
> existing issues when the kernel is compiled with clang. The second part
> modifies the ftrace code patching mechanism (2-4) as mentioned above.
> Then prepare ftrace to be able to run with kernel preemption (5,6)
>
> An ongoing fix:
>
> Since there is no room for marking *kernel_text_address as notrace[1] at
> source code level, there is a significant performance regression when
> using function_graph with TRACE_IRQFLAGS enabled. There can be as many as
> 8 graph handlers being called at each function entry. The current
> workaround requires us to echo "*kernel_text_address" into
> set_ftrace_notrace before starting the trace. However, we observed that
> the kernel still enables the patch site in some cases even with
> *kernel_text_address properly added in the file. While the root cause is
> still under investigation, we consider that it should not be the reason
> for holding back the code patching, in order to unblock the call_ops
> part.
Maybe Steven knows this off the top of his head!
As Andy points out, "*kernel_text_address" is used in the stack
unwinding on RISC-V. So, if you do tracing without filtering *and*
TRACE_IRQFLAGS, one will drown in traces.
E.g. the ftrace selftest:
| $ ./ftracetest -vvv test.d/ftrace/fgraph-multi.tc
will generate a lot of traces.
Now, if we add:
--8<--
diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc b/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
index ff88f97e41fb..4f30a4d81d99 100644
--- a/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
+++ b/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
@@ -84,6 +84,7 @@ cd $INSTANCE2
do_test '*rcu*' 'rcu'
cd $WD
cd $INSTANCE3
+echo '*kernel_text_address' > set_ftrace_notrace
echo function_graph > current_tracer
sleep 1
-->8--
The graph tracer will not honor the "set_ftrace_notrace" in $INSTANCE3,
but still enable the *kernel_text_address traces. (Note that there are
no filters in the test, so *all* ftrace recs will be enabled.)
Are we holding the graph tracer wrong?
Happy thanksgiving!
Björn
^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH v3 0/7] riscv: ftrace: atmoic patching and preempt improvements
2024-11-27 21:25 ` [PATCH v3 0/7] riscv: ftrace: atmoic patching and preempt improvements Björn Töpel
@ 2024-12-24 3:15 ` Steven Rostedt
2024-12-29 19:08 ` Andy Chiu
0 siblings, 1 reply; 32+ messages in thread
From: Steven Rostedt @ 2024-12-24 3:15 UTC (permalink / raw)
To: Björn Töpel
Cc: Andy Chiu, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt,
linux-kernel, linux-riscv, llvm, bjorn, puranjay12, alexghiti,
yongxuan.wang, greentime.hu, nick.hu, nylon.chen, tommy.wu,
eric.lin, viccent.chen, zong.li, samuel.holland
On Wed, 27 Nov 2024 22:25:57 +0100
Björn Töpel <bjorn@kernel.org> wrote:
> Adding Steven.
And this has been in my draft folder for almost a month :-p
I kept coming to this email, but then got distracted by something else.
>
> Andy Chiu <andybnac@gmail.com> writes:
>
> > This series makes atomic code patching possible in riscv ftrace. A
> > direct benefit of this is that we can get rid of stop_machine() when
> > patching function entries. This also makes it possible to run ftrace
> > with full kernel preemption. Before this series, the kernel initializes
> > patchable function entries to NOP4 + NOP4. To start tracing, it updates
> > entries to AUIPC + JALR while holding other cores in stop_machine.
> > stop_machine() is required because it is impossible to update 2
> > instructions and have them be seen atomically. And preemption must be
> > prevented, as kernel preemption allows a process to be scheduled out while
> > executing on one of these instruction pairs.
> >
> > This series addresses the problem by initializing the first NOP4 to
> > AUIPC. So, atomic patching is possible because the kernel only has to
> > update one instruction. As long as the instruction is naturally aligned,
> > then it is expected to be updated atomically.
> >
> > However, the address range of the ftrace trampoline is limited to +-2K
> > from ftrace_caller after applying this series. This issue is expected
> > to be solved by Puranjay's CALL_OPS, which adds 8B of naturally aligned
> > data in front of patchable functions and can use it to direct execution
> > out to any custom trampolines.
> >
> > The series is composed by three parts. The first part cleans up the
> > existing issues when the kernel is compiled with clang. The second part
> > modifies the ftrace code patching mechanism (2-4) as mentioned above.
> > Then prepare ftrace to be able to run with kernel preemption (5,6)
> >
> > An ongoing fix:
> >
> > Since there is no room for marking *kernel_text_address as notrace[1] at
> > source code level, there is a significant performance regression when
> > using function_graph with TRACE_IRQFLAGS enabled. There can be as many as
> > 8 graph handlers being called at each function entry. The current
> > workaround requires us to echo "*kernel_text_address" into
> > set_ftrace_notrace before starting the trace. However, we observed that
> > the kernel still enables the patch site in some cases even with
> > *kernel_text_address properly added in the file. While the root cause is
> > still under investigation, we consider that it should not be the reason
> > for holding back the code patching, in order to unblock the call_ops
> > part.
>
> Maybe Steven knows this from the top of his head!
>
> As Andy points out, "*kernel_text_address" is used in the stack
> unwinding on RISC-V. So, if you do tracing without filtering *and*
> TRACE_IRQFLAGS, one will drown in traces.
I tested set_ftrace_notrace on x86 and the function graph tracer does honor
it. I wonder if there's a kernel
>
> E.g. the ftrace selftest:
> | $ ./ftracetest -vvv test.d/ftrace/fgraph-multi.tc
>
> will generate a lot of traces.
>
> Now, if we add:
> --8<--
> diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc b/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
> index ff88f97e41fb..4f30a4d81d99 100644
> --- a/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
> +++ b/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
> @@ -84,6 +84,7 @@ cd $INSTANCE2
> do_test '*rcu*' 'rcu'
> cd $WD
> cd $INSTANCE3
> +echo '*kernel_text_address' > set_ftrace_notrace
> echo function_graph > current_tracer
>
> sleep 1
> -->8--
>
> The graph tracer will not honor the "set_ftrace_notrace" in $INSTANCE3,
> but still enable the *kernel_text_address traces. (Note that there are
> no filters in the test, so *all* ftrace recs will be enabled.)
>
> Are we holding the graph tracer wrong?
What do you get when you do:
# grep kernel_text_address available_filter_functions
?
>
>
> Happy thanksgiving!
> Björn
And Merry Christmas!
-- Steve
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v3 0/7] riscv: ftrace: atmoic patching and preempt improvements
2024-12-24 3:15 ` Steven Rostedt
@ 2024-12-29 19:08 ` Andy Chiu
2025-01-06 15:22 ` Andy Chiu
0 siblings, 1 reply; 32+ messages in thread
From: Andy Chiu @ 2024-12-29 19:08 UTC (permalink / raw)
To: Steven Rostedt
Cc: Björn Töpel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt,
linux-kernel, linux-riscv, llvm, bjorn, puranjay12, alexghiti,
yongxuan.wang, greentime.hu, nick.hu, nylon.chen, tommy.wu,
eric.lin, viccent.chen, zong.li, samuel.holland
Steven Rostedt <rostedt@goodmis.org> 於 2024年12月24日 週二 上午11:15寫道:
>
> On Wed, 27 Nov 2024 22:25:57 +0100
> Björn Töpel <bjorn@kernel.org> wrote:
>
> > Adding Steven.
>
> And this has been in my draft folder for almost a month :-p
>
> I kept coming to this email, but then got distracted by something else.
>
> >
> > Andy Chiu <andybnac@gmail.com> writes:
> >
> > > This series makes atmoic code patching possible in riscv ftrace. A
> > > direct benefit of this is that we can get rid of stop_machine() when
> > > patching function entries. This also makes it possible to run ftrace
> > > with full kernel preemption. Before this series, the kernel initializes
> > > patchable function entries to NOP4 + NOP4. To start tracing, it updates
> > > entries to AUIPC + JALR while holding other cores in stop_machine.
> > > stop_machine() is required because it is impossible to update 2
> > > instructions, and be seen atomically. And preemption must have to be
> > > prevented, as kernel preemption allows process to be scheduled out while
> > > executing on one of these instruction pairs.
> > >
> > > This series addresses the problem by initializing the first NOP4 to
> > > AUIPC. So, atmoic patching is possible because the kernel only has to
> > > update one instruction. As long as the instruction is naturally aligned,
> > > then it is expected to be updated atomically.
> > >
> > > However, the address range of the ftrace trampoline is limited to +-2K
> > > from ftrace_caller after appplying this series. This issue is expected
> > > to be solved by Puranjay's CALL_OPS, where it adds 8B naturally align
> > > data in front of pacthable functions and can use it to direct execution
> > > out to any custom trampolines.
> > >
> > > The series is composed by three parts. The first part cleans up the
> > > existing issues when the kernel is compiled with clang.The second part
> > > modifies the ftrace code patching mechanism (2-4) as mentioned above.
> > > Then prepare ftrace to be able to run with kernel preemption (5,6)
> > >
> > > An ongoing fix:
> > >
> > > Since there is no room for marking *kernel_text_address as notrace[1] at
> > > source code level, there is a significant performance regression when
> > > using function_graph with TRACE_IRQFLAGS enabled. There can be as much as
> > > 8 graph handler being called in each function-entry. The current
> > > workaround requires us echo "*kernel_text_address" into
> > > set_ftrace_notrace before starting the trace. However, we observed that
> > > the kernel still enables the patch site in some cases even with
> > > *kernel_text_address properly added in the file While the root cause is
> > > still under investagtion, we consider that it should not be the reason
> > > for holding back the code patching, in order to unblock the call_ops
> > > part.
> >
> > Maybe Steven knows this from the top of his head!
> >
> > As Andy points out, "*kernel_text_address" is used in the stack
> > unwinding on RISC-V. So, if you do a tracing without filtering *and*
> > TRACE_IRQFLAGS, one will drown in traces.
>
> I tested set_ftrace_notrace on x86 and the function graph tracer does honor
> it. I wonder if there's a kernel
After checking the log buffer, I can confirm that riscv also honors
set_ftrace_notrace: "*kernel_text_address" does not show up in the
trace buffer. Sorry for not making this clear.
However, the problem is that the patch sites for "*kernel_text_address"
were still enabled, allowing the code to flow into ftrace and parts of
fgraph. In particular, with TRACE_IRQFLAGS enabled, these functions are
called extensively, causing a significant slowdown.
IIUC, it is reasonable to enable a patch site if there is at least one
tracing instance that requests the function. However, in another
experiment, the patch sites for "*kernel_text_address" were enabled
even when all instances had them disabled. Here is a way to reproduce
it:
cd /sys/kernel/
mount -t tracefs t tracing/
cd tracing/
mkdir instances/1
mkdir instances/2
cd instances/1/
echo *kernel_text_address > set_ftrace_notrace
echo *sched* > set_ftrace_filter # this is just to make the kernel more responsive
cat set_ftrace_notrace
echo function_graph > current_tracer
cd ../2/
echo *kernel_text_address > set_ftrace_notrace
cat set_ftrace_notrace
echo function_graph > current_tracer
To figure out this problem, it is probably worth trying to understand
why the patch site is enabled after the above sequence:
It seems that the ftrace_ops passed into __ftrace_hash_rec_update() at
the start of the second tracer has an empty notrace hash. This leads
to the eventual activation of the patch sites. However, graph_ops does
not have an empty notrace hash when the user starts the first graph
tracer. I am trying to understand this part of the code, and the
relationship between graph_ops and the ops in the fgraph_array.
>
> >
> > E.g. the ftrace selftest:
> > | $ ./ftracetest -vvv test.d/ftrace/fgraph-multi.tc
> >
> > will generate a lot of traces.
> >
> > Now, if we add:
> > --8<--
> > diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc b/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
> > index ff88f97e41fb..4f30a4d81d99 100644
> > --- a/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
> > +++ b/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
> > @@ -84,6 +84,7 @@ cd $INSTANCE2
> > do_test '*rcu*' 'rcu'
> > cd $WD
> > cd $INSTANCE3
> > +echo '*kernel_text_address' > set_ftrace_notrace
> > echo function_graph > current_tracer
> >
> > sleep 1
> > -->8--
> >
> > The graph tracer will not honor the "set_ftrace_notrace" in $INSTANCE3,
> > but still enable the *kernel_text_address traces. (Note that there are
> > no filters in the test, so *all* ftrace recs will be enabled.)
> >
> > Are we holding the graph tracer wrong?
>
> What do you get when you do:
>
> # grep kernel_text_address available_filter_functions
>
> ?
We get:
kernel_text_address
__kernel_text_address
>
> >
> >
> > Happy thanksgiving!
> > Björn
>
> And Merry Christmas!
>
> -- Steve
>
And Happy New Year!
Andy
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v3 0/7] riscv: ftrace: atmoic patching and preempt improvements
2024-12-29 19:08 ` Andy Chiu
@ 2025-01-06 15:22 ` Andy Chiu
0 siblings, 0 replies; 32+ messages in thread
From: Andy Chiu @ 2025-01-06 15:22 UTC (permalink / raw)
To: Steven Rostedt
Cc: Björn Töpel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt,
linux-kernel, linux-riscv, llvm, bjorn, puranjay12, alexghiti,
yongxuan.wang, greentime.hu, nick.hu, nylon.chen, tommy.wu,
eric.lin, viccent.chen, zong.li, samuel.holland
Andy Chiu <andybnac@gmail.com> 於 2024年12月30日 週一 上午3:08寫道:
>
> Steven Rostedt <rostedt@goodmis.org> 於 2024年12月24日 週二 上午11:15寫道:
> >
> > On Wed, 27 Nov 2024 22:25:57 +0100
> > Björn Töpel <bjorn@kernel.org> wrote:
> >
> > > Adding Steven.
> >
> > And this has been in my draft folder for almost a month :-p
> >
> > I kept coming to this email, but then got distracted by something else.
> >
> > >
> > > Andy Chiu <andybnac@gmail.com> writes:
> > >
> > > > This series makes atmoic code patching possible in riscv ftrace. A
> > > > direct benefit of this is that we can get rid of stop_machine() when
> > > > patching function entries. This also makes it possible to run ftrace
> > > > with full kernel preemption. Before this series, the kernel initializes
> > > > patchable function entries to NOP4 + NOP4. To start tracing, it updates
> > > > entries to AUIPC + JALR while holding other cores in stop_machine.
> > > > stop_machine() is required because it is impossible to update 2
> > > > instructions, and be seen atomically. And preemption must have to be
> > > > prevented, as kernel preemption allows process to be scheduled out while
> > > > executing on one of these instruction pairs.
> > > >
> > > > This series addresses the problem by initializing the first NOP4 to
> > > > AUIPC. So, atmoic patching is possible because the kernel only has to
> > > > update one instruction. As long as the instruction is naturally aligned,
> > > > then it is expected to be updated atomically.
> > > >
> > > > However, the address range of the ftrace trampoline is limited to +-2K
> > > > from ftrace_caller after appplying this series. This issue is expected
> > > > to be solved by Puranjay's CALL_OPS, where it adds 8B naturally align
> > > > data in front of pacthable functions and can use it to direct execution
> > > > out to any custom trampolines.
> > > >
> > > > The series is composed by three parts. The first part cleans up the
> > > > existing issues when the kernel is compiled with clang.The second part
> > > > modifies the ftrace code patching mechanism (2-4) as mentioned above.
> > > > Then prepare ftrace to be able to run with kernel preemption (5,6)
> > > >
> > > > An ongoing fix:
> > > >
> > > > Since there is no room for marking *kernel_text_address as notrace[1] at
> > > > source code level, there is a significant performance regression when
> > > > using function_graph with TRACE_IRQFLAGS enabled. There can be as much as
> > > > 8 graph handler being called in each function-entry. The current
> > > > workaround requires us echo "*kernel_text_address" into
> > > > set_ftrace_notrace before starting the trace. However, we observed that
> > > > the kernel still enables the patch site in some cases even with
> > > > *kernel_text_address properly added in the file While the root cause is
> > > > still under investagtion, we consider that it should not be the reason
> > > > for holding back the code patching, in order to unblock the call_ops
> > > > part.
> > >
> > > Maybe Steven knows this from the top of his head!
> > >
> > > As Andy points out, "*kernel_text_address" is used in the stack
> > > unwinding on RISC-V. So, if you do a tracing without filtering *and*
> > > TRACE_IRQFLAGS, one will drown in traces.
> >
> > I tested set_ftrace_notrace on x86 and the function graph tracer does honor
> > it. I wonder if there's a kernel
>
> After checking the log buffer, I can confirm that riscv does also
> honor set_ftrace_notrace. It does not print out "*kernel_text_address"
> in the trace buffer. Sorry for not making this clear.
>
> However, the problem is that the patch sites "*kernel_text_address"
> were enabled, allowing the code flow into ftrace and parts of fgraph.
> In particular, with TRACE_IRQFLAGS enabled, the functions are called
> extensively, causing the significant slow down.
>
> IIUC, it is reasonable to enable a patch site if there is at least one
> tracing instance that request the function. However, in another
> experiment, the patch sites of "*kernel_text_address" are enabled
> even when all instances have them disabled. Here is a way to reproduce
> it:
>
> cd /sys/kernel/
> mount -t tracefs t tracing/
> cd tracing/
> mkdir instances/1
> mkdir instances/2
> cd instances/1/
> echo *kernel_text_address > set_ftrace_notrace
> echo *sched* > set_ftrace_filter # this is just to make the kernel
> more responsive
> cat set_ftrace_notrace
> echo function_graph > current_tracer
> cd ../2/
> echo *kernel_text_address > set_ftrace_notrace
> cat set_ftrace_notrace
> echo function_graph > current_tracer
>
> To figure out this problem, it is probably worth trying to understand
> why the patch site is enabled after the above sequence:
>
> It seems like the ftrace_ops being passed into
> __ftrace_hash_rec_update() at the start of the second tracer has an
> empty notrace hash. This leads to the eventual activation of the patch
> sites. However, graph_ops does not have the empty notrace hash when
> the user starts the first graph tracer. I am trying to understand this
> part of the code, and the relationship between graph_ops and ops in
> the fgraph_array.
Here might be the possible reason: the updated notrace hash for the
manager's ops does not reflect the intersection of all users' notrace
hashes. According to the comment section at ftrace_startup_subops():
 * o If either notrace_hash is empty then the final stays empty
 * o Otherwise, the final is an intersection between the hashes
It seems the wrong arguments were passed into intersect_hash():
the filter hashes were passed instead of the users' notrace hashes.
As a result, even if a function is marked as notrace in all ftrace
instances, the notrace hash in the manager's ops would not contain
that function, leading to an enabled call site at the notrace'd
function. If I understand correctly, we should pass the notrace hash
instead of the filter hash, as shown in the following modification.
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 4c28dd177ca6..3a194559d220 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3515,18 +3515,17 @@ int ftrace_startup_subops(struct ftrace_ops *ops, struct ftrace_ops *subops, int
 		   ftrace_hash_empty(subops->func_hash->notrace_hash)) {
 		notrace_hash = EMPTY_HASH;
 	} else {
-		size_bits = max(ops->func_hash->filter_hash->size_bits,
-				subops->func_hash->filter_hash->size_bits);
+		size_bits = max(ops->func_hash->notrace_hash->size_bits,
+				subops->func_hash->notrace_hash->size_bits);
 		notrace_hash = alloc_ftrace_hash(size_bits);
 		if (!notrace_hash) {
-			free_ftrace_hash(filter_hash);
+			free_ftrace_hash(notrace_hash);
 			return -ENOMEM;
 		}
-		ret = intersect_hash(&notrace_hash, ops->func_hash->filter_hash,
-				     subops->func_hash->filter_hash);
+		ret = intersect_hash(&notrace_hash, ops->func_hash->notrace_hash,
+				     subops->func_hash->notrace_hash);
 		if (ret < 0) {
-			free_ftrace_hash(filter_hash);
 			free_ftrace_hash(notrace_hash);
 			return ret;
 		}
I can confirm that call sites for *kernel_text_address are no longer
patched after applying the above change, as long as they are set as
notrace in each ftrace instance. I can send out a fix patch if this
sounds like a proper fix.
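As a sanity check of that reasoning, the combination rule from the
comment can be modeled with plain Python sets (a toy simplification,
not the kernel's ftrace_hash code; in ftrace an empty notrace hash
means nothing is excluded):

```python
EMPTY_HASH = frozenset()

def combine_notrace(a, b):
    # o If either notrace_hash is empty then the final stays empty
    if not a or not b:
        return EMPTY_HASH
    # o Otherwise, the final is an intersection between the hashes
    return frozenset(a) & frozenset(b)

# Both instances exclude *kernel_text_address, so the manager ops'
# combined notrace hash must keep excluding those functions.
inst1_notrace = {"kernel_text_address", "__kernel_text_address"}
inst2_notrace = {"kernel_text_address", "__kernel_text_address"}
assert "kernel_text_address" in combine_notrace(inst1_notrace, inst2_notrace)

# Passing the *filter* hashes instead (instance 2 has no filter, i.e.
# an empty set) collapses the result to the empty hash: nothing is
# excluded, and the notrace'd patch sites get enabled.
inst1_filter = {"schedule"}
inst2_filter = set()
assert combine_notrace(inst1_filter, inst2_filter) == EMPTY_HASH
```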
>
> >
> > >
> > > E.g. the ftrace selftest:
> > > | $ ./ftracetest -vvv test.d/ftrace/fgraph-multi.tc
> > >
> > > will generate a lot of traces.
> > >
> > > Now, if we add:
> > > --8<--
> > > diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc b/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
> > > index ff88f97e41fb..4f30a4d81d99 100644
> > > --- a/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
> > > +++ b/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
> > > @@ -84,6 +84,7 @@ cd $INSTANCE2
> > > do_test '*rcu*' 'rcu'
> > > cd $WD
> > > cd $INSTANCE3
> > > +echo '*kernel_text_address' > set_ftrace_notrace
> > > echo function_graph > current_tracer
> > >
> > > sleep 1
> > > -->8--
> > >
> > > The graph tracer will not honor the "set_ftrace_notrace" in $INSTANCE3,
> > > but still enable the *kernel_text_address traces. (Note that there are
> > > no filters in the test, so *all* ftrace recs will be enabled.)
> > >
> > > Are we holding the graph tracer wrong?
> >
> > What do you get when you do:
> >
> > # grep kernel_text_address available_filter_functions
> >
> > ?
>
> We get:
> kernel_text_address
> __kernel_text_address
>
> >
> > >
> > >
> > > Happy thanksgiving!
> > > Björn
> >
> > And Merry Christmas!
> >
> > -- Steve
> >
>
> And Happy New Year!
>
>
> Andy
Thanks,
Andy
^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH v3 0/7] riscv: ftrace: atmoic patching and preempt improvements
2024-11-27 17:29 [PATCH v3 0/7] riscv: ftrace: atmoic patching and preempt improvements Andy Chiu
` (7 preceding siblings ...)
2024-11-27 21:25 ` [PATCH v3 0/7] riscv: ftrace: atmoic patching and preempt improvements Björn Töpel
@ 2024-12-02 7:58 ` Evgenii Shatokhin
2024-12-11 15:38 ` Andy Chiu
2024-12-03 12:18 ` Björn Töpel
9 siblings, 1 reply; 32+ messages in thread
From: Evgenii Shatokhin @ 2024-12-02 7:58 UTC (permalink / raw)
To: Andy Chiu
Cc: Justin Stitt, Bill Wendling, Nick Desaulniers, Nathan Chancellor,
Albert Ou, Palmer Dabbelt, Paul Walmsley, linux-kernel,
linux-riscv, llvm, bjorn, puranjay12, alexghiti, yongxuan.wang,
greentime.hu, nick.hu, nylon.chen, tommy.wu, eric.lin,
viccent.chen, zong.li, samuel.holland, linux
Hi,
On 27.11.2024 20:29, Andy Chiu wrote:
> This series makes atmoic code patching possible in riscv ftrace. A
> direct benefit of this is that we can get rid of stop_machine() when
> patching function entries. This also makes it possible to run ftrace
> with full kernel preemption. Before this series, the kernel initializes
> patchable function entries to NOP4 + NOP4. To start tracing, it updates
> entries to AUIPC + JALR while holding other cores in stop_machine.
> stop_machine() is required because it is impossible to update 2
> instructions, and be seen atomically. And preemption must have to be
> prevented, as kernel preemption allows process to be scheduled out while
> executing on one of these instruction pairs.
>
> This series addresses the problem by initializing the first NOP4 to
> AUIPC. So, atmoic patching is possible because the kernel only has to
> update one instruction. As long as the instruction is naturally aligned,
> then it is expected to be updated atomically.
>
> However, the address range of the ftrace trampoline is limited to +-2K
> from ftrace_caller after appplying this series. This issue is expected
> to be solved by Puranjay's CALL_OPS, where it adds 8B naturally align
> data in front of pacthable functions and can use it to direct execution
> out to any custom trampolines.
>
> The series is composed by three parts. The first part cleans up the
> existing issues when the kernel is compiled with clang.The second part
> modifies the ftrace code patching mechanism (2-4) as mentioned above.
> Then prepare ftrace to be able to run with kernel preemption (5,6)
>
> An ongoing fix:
>
> Since there is no room for marking *kernel_text_address as notrace[1] at
> source code level, there is a significant performance regression when
> using function_graph with TRACE_IRQFLAGS enabled. There can be as much as
> 8 graph handler being called in each function-entry. The current
> workaround requires us echo "*kernel_text_address" into
> set_ftrace_notrace before starting the trace. However, we observed that
> the kernel still enables the patch site in some cases even with
> *kernel_text_address properly added in the file While the root cause is
> still under investagtion, we consider that it should not be the reason
> for holding back the code patching, in order to unblock the call_ops
> part.
>
> [1]: https://lore.kernel.org/linux-riscv/20240613093233.0b349ed0@rorschach.local.home/
>
> Changes in v3:
> - Add a fix tag for patch 1
> - Add a data fence before sending out remote fence.i (6)
> - Link to v2: https://lore.kernel.org/all/20240628-dev-andyc-dyn-ftrace-v4-v2-0-1e5f4cb1f049@sifive.com/
>
> Changes in v2:
> - Drop patch 1 as it is merged through fixes.
> - Drop patch 2, which converts kernel_text_address into notrace. As
> users can prevent tracing it by configuring the tracefs.
> - Use a more generic way in kconfig to align functions.
> - Link to v1: https://lore.kernel.org/r/20240613-dev-andyc-dyn-ftrace-v4-v1-0-1a538e12c01e@sifive.com
>
>
> Andy Chiu (7):
> riscv: ftrace: support fastcc in Clang for WITH_ARGS
> riscv: ftrace: align patchable functions to 4 Byte boundary
> riscv: ftrace: prepare ftrace for atomic code patching
> riscv: ftrace: do not use stop_machine to update code
> riscv: vector: Support calling schedule() for preemptible Vector
> riscv: add a data fence for CMODX in the kernel mode
> riscv: ftrace: support PREEMPT
>
> arch/riscv/Kconfig | 4 +-
> arch/riscv/include/asm/ftrace.h | 11 +++
> arch/riscv/include/asm/processor.h | 5 ++
> arch/riscv/include/asm/vector.h | 22 ++++-
> arch/riscv/kernel/asm-offsets.c | 7 ++
> arch/riscv/kernel/ftrace.c | 133 ++++++++++++-----------------
> arch/riscv/kernel/mcount-dyn.S | 25 ++++--
> arch/riscv/mm/cacheflush.c | 15 +++-
> 8 files changed, 135 insertions(+), 87 deletions(-)
> ---
> base-commit: 0eb512779d642b21ced83778287a0f7a3ca8f2a1
> --
> 2.39.3 (Apple Git-145)
I have tested this series in a QEMU VM (-machine virt) with the
preemptible kernels, CONFIG_PREEMPT=y.
No issues have been revealed so far.
One of the kernels was built by GCC 13.2.0 (with the patch for minimum
alignment added on top of it), the other - with LLVM 17.0.6.
In both cases, the basic boottime tests for Ftrace passed.
Switching tracers between nop, function, function_graph and blk in a
loop in parallel with stress-ng and with network load for several hours
did not reveal any problems either. Kernel crashes happened in this
scenario a year ago, but now it runs clean, good!
Function redirection via Ftrace seems to work OK too.
The size of .text section increased slightly after this series, by 0.35%
- 0.38%, probably because of function alignment:
* 12075 KB => 12121 KB (GCC)
* 12167 KB => 12209 KB (LLVM/clang)
Not sure how to test the vector-related part, though: "[PATCH v3 5/7]
riscv: vector: Support calling schedule() for preemptible Vector".
For all other patches in the series:
Tested-by: Evgenii Shatokhin <e.shatokhin@yadro.com>
Regards,
Evgenii
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v3 0/7] riscv: ftrace: atmoic patching and preempt improvements
2024-12-02 7:58 ` Evgenii Shatokhin
@ 2024-12-11 15:38 ` Andy Chiu
0 siblings, 0 replies; 32+ messages in thread
From: Andy Chiu @ 2024-12-11 15:38 UTC (permalink / raw)
To: Evgenii Shatokhin
Cc: Justin Stitt, Bill Wendling, Nick Desaulniers, Nathan Chancellor,
Albert Ou, Palmer Dabbelt, Paul Walmsley, linux-kernel,
linux-riscv, llvm, bjorn, puranjay12, alexghiti, yongxuan.wang,
greentime.hu, nick.hu, nylon.chen, tommy.wu, eric.lin,
viccent.chen, zong.li, samuel.holland, linux
Hi Evgenii,
Evgenii Shatokhin <e.shatokhin@yadro.com> 於 2024年12月2日 週一 下午3:58寫道:
>
> Hi,
>
> On 27.11.2024 20:29, Andy Chiu wrote:
> > This series makes atmoic code patching possible in riscv ftrace. A
> > direct benefit of this is that we can get rid of stop_machine() when
> > patching function entries. This also makes it possible to run ftrace
> > with full kernel preemption. Before this series, the kernel initializes
> > patchable function entries to NOP4 + NOP4. To start tracing, it updates
> > entries to AUIPC + JALR while holding other cores in stop_machine.
> > stop_machine() is required because it is impossible to update 2
> > instructions, and be seen atomically. And preemption must have to be
> > prevented, as kernel preemption allows process to be scheduled out while
> > executing on one of these instruction pairs.
> >
> > This series addresses the problem by initializing the first NOP4 to
> > AUIPC. So, atmoic patching is possible because the kernel only has to
> > update one instruction. As long as the instruction is naturally aligned,
> > then it is expected to be updated atomically.
> >
> > However, the address range of the ftrace trampoline is limited to +-2K
> > from ftrace_caller after appplying this series. This issue is expected
> > to be solved by Puranjay's CALL_OPS, where it adds 8B naturally align
> > data in front of pacthable functions and can use it to direct execution
> > out to any custom trampolines.
> >
> > The series is composed by three parts. The first part cleans up the
> > existing issues when the kernel is compiled with clang.The second part
> > modifies the ftrace code patching mechanism (2-4) as mentioned above.
> > Then prepare ftrace to be able to run with kernel preemption (5,6)
> >
> > An ongoing fix:
> >
> > Since there is no room for marking *kernel_text_address as notrace[1] at
> > source code level, there is a significant performance regression when
> > using function_graph with TRACE_IRQFLAGS enabled. There can be as much as
> > 8 graph handler being called in each function-entry. The current
> > workaround requires us echo "*kernel_text_address" into
> > set_ftrace_notrace before starting the trace. However, we observed that
> > the kernel still enables the patch site in some cases even with
> > *kernel_text_address properly added in the file While the root cause is
> > still under investagtion, we consider that it should not be the reason
> > for holding back the code patching, in order to unblock the call_ops
> > part.
> >
> > [1]: https://lore.kernel.org/linux-riscv/20240613093233.0b349ed0@rorschach.local.home/
> >
> > Changes in v3:
> > - Add a fix tag for patch 1
> > - Add a data fence before sending out remote fence.i (6)
> > - Link to v2: https://lore.kernel.org/all/20240628-dev-andyc-dyn-ftrace-v4-v2-0-1e5f4cb1f049@sifive.com/
> >
> > Changes in v2:
> > - Drop patch 1 as it is merged through fixes.
> > - Drop patch 2, which converts kernel_text_address into notrace. As
> > users can prevent tracing it by configuring the tracefs.
> > - Use a more generic way in kconfig to align functions.
> > - Link to v1: https://lore.kernel.org/r/20240613-dev-andyc-dyn-ftrace-v4-v1-0-1a538e12c01e@sifive.com
> >
> >
> > Andy Chiu (7):
> > riscv: ftrace: support fastcc in Clang for WITH_ARGS
> > riscv: ftrace: align patchable functions to 4 Byte boundary
> > riscv: ftrace: prepare ftrace for atomic code patching
> > riscv: ftrace: do not use stop_machine to update code
> > riscv: vector: Support calling schedule() for preemptible Vector
> > riscv: add a data fence for CMODX in the kernel mode
> > riscv: ftrace: support PREEMPT
> >
> > arch/riscv/Kconfig | 4 +-
> > arch/riscv/include/asm/ftrace.h | 11 +++
> > arch/riscv/include/asm/processor.h | 5 ++
> > arch/riscv/include/asm/vector.h | 22 ++++-
> > arch/riscv/kernel/asm-offsets.c | 7 ++
> > arch/riscv/kernel/ftrace.c | 133 ++++++++++++-----------------
> > arch/riscv/kernel/mcount-dyn.S | 25 ++++--
> > arch/riscv/mm/cacheflush.c | 15 +++-
> > 8 files changed, 135 insertions(+), 87 deletions(-)
> > ---
> > base-commit: 0eb512779d642b21ced83778287a0f7a3ca8f2a1
> > --
> > 2.39.3 (Apple Git-145)
>
> I have tested this series in a QEMU VM (-machine virt) with the
> preemptible kernels, CONFIG_PREEMPT=y.
>
> No issues have been revealed so far.
>
> One of the kernels was built by GCC 13.2.0 (with the patch for minimum
> alignment added on top of it), the other - with LLVM 17.0.6.
>
> In both cases, the basic boottime tests for Ftrace passed.
>
> Switching tracers between nop, function, function_graph and blk in a
> loop in parallel with stress-ng and with network load for several hours
> did not reveal any problems either. Kernel crashes happened in this
> scenario a year ago, but now it runs clean, good!
>
> Function redirection via Ftrace seems to work OK too.
>
> The size of .text section increased slightly after this series, by 0.35%
> - 0.38%, probably because of function alignment:
> * 12075 KB => 12121 KB (GCC)
> * 12167 KB => 12209 KB (LLVM/clang)
>
> Not sure, how to test the vector-related part though, "[PATCH v3 5/7]
> riscv: vector: Support calling schedule() for preemptible Vector"
It should get tested as long as the kernel is compiled with both
RISCV_ISA_V and PREEMPT, and the hardware supports V.
And thanks for the extensive testing!
>
> For all other patches in the series:
> Tested-by: Evgenii Shatokhin <e.shatokhin@yadro.com>
>
> Regards,
> Evgenii
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v3 0/7] riscv: ftrace: atmoic patching and preempt improvements
2024-11-27 17:29 [PATCH v3 0/7] riscv: ftrace: atmoic patching and preempt improvements Andy Chiu
` (8 preceding siblings ...)
2024-12-02 7:58 ` Evgenii Shatokhin
@ 2024-12-03 12:18 ` Björn Töpel
2024-12-03 15:09 ` Evgenii Shatokhin
2024-12-11 15:48 ` Andy Chiu
9 siblings, 2 replies; 32+ messages in thread
From: Björn Töpel @ 2024-12-03 12:18 UTC (permalink / raw)
To: Andy Chiu, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt
Cc: linux-kernel, linux-riscv, llvm, bjorn, puranjay12, alexghiti,
yongxuan.wang, greentime.hu, nick.hu, nylon.chen, tommy.wu,
eric.lin, viccent.chen, zong.li, samuel.holland
Andy!
"atomic" spelling in the Subject line.
Andy Chiu <andybnac@gmail.com> writes:
> Changes in v3:
> - Add a fix tag for patch 1
> - Add a data fence before sending out remote fence.i (6)
> - Link to v2: https://lore.kernel.org/all/20240628-dev-andyc-dyn-ftrace-v4-v2-0-1e5f4cb1f049@sifive.com/
Hmm, the fixes tag was not included.
Also, there was a lot of comments from v2 that was not addressed:
* Minor spelling nits
* Breaking DIRECT_CALL, and include Puranjay's CALL_OPS work in the
same series for DIRECT_CALL, to avoid breakage.
I'll have a look at the barriers (which came up at plumbers)!
Cheers,
Björn
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v3 0/7] riscv: ftrace: atmoic patching and preempt improvements
2024-12-03 12:18 ` Björn Töpel
@ 2024-12-03 15:09 ` Evgenii Shatokhin
2024-12-06 8:39 ` Björn Töpel
2024-12-11 15:48 ` Andy Chiu
1 sibling, 1 reply; 32+ messages in thread
From: Evgenii Shatokhin @ 2024-12-03 15:09 UTC (permalink / raw)
To: Björn Töpel, Andy Chiu
Cc: Palmer Dabbelt, Nathan Chancellor, Justin Stitt, Bill Wendling,
Nick Desaulniers, Albert Ou, Paul Walmsley, linux-kernel,
linux-riscv, llvm, bjorn, puranjay12, alexghiti, yongxuan.wang,
greentime.hu, nick.hu, nylon.chen, tommy.wu, eric.lin,
viccent.chen, zong.li, samuel.holland
On 03.12.2024 15:18, Björn Töpel wrote:
> Andy!
>
> "atomic" spelling in the Subject line.
>
> Andy Chiu <andybnac@gmail.com> writes:
>
>> Changes in v3:
>> - Add a fix tag for patch 1
>> - Add a data fence before sending out remote fence.i (6)
>> - Link to v2: https://lore.kernel.org/all/20240628-dev-andyc-dyn-ftrace-v4-v2-0-1e5f4cb1f049@sifive.com/
>
> Hmm, the fixes tag was not included.
>
> Also, there was a lot of comments from v2 that was not addressed:
>
> * Minor spelling nits
> * Breaking DIRECT_CALL, and include Puranjay's CALL_OPS work in the
> same series for DIRECT_CALL, to avoid breakage.
Yes, FTRACE_WITH_DIRECT_CALLS is currently broken. If I try to insmod
samples/ftrace/ftrace-direct.ko, it reports a failure:
[ 179.531472] ------------[ ftrace bug ]------------
[ 179.531761] ftrace failed to modify
[ 179.531786] [<ffffffff8005f9ac>] wake_up_process+0x0/0x24
[ 179.532577] actual: 97:e2:fa:ff:13:00:00:00
[ 179.533211] Setting ftrace call site to call ftrace function
[ 179.534409] ftrace record flags: 99980001
[ 179.534692] (1) tramp: ftrace_caller+0x0/0x34
(call_direct_funcs+0x0/0x14)
[ 179.534692] expected tramp: ffffffff01b0d000
...
>
> I'll have a look at the barriers (which came up at plumbers)!
Thank you!
After this series and the CALL_OPS work are done, dynamic Ftrace for
RISC-V will be even more valuable in production use cases.
>
>
> Cheers,
> Björn
>
Regards,
Evgenii
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v3 0/7] riscv: ftrace: atmoic patching and preempt improvements
2024-12-03 15:09 ` Evgenii Shatokhin
@ 2024-12-06 8:39 ` Björn Töpel
0 siblings, 0 replies; 32+ messages in thread
From: Björn Töpel @ 2024-12-06 8:39 UTC (permalink / raw)
To: Evgenii Shatokhin, Andy Chiu
Cc: Palmer Dabbelt, Nathan Chancellor, Justin Stitt, Bill Wendling,
Nick Desaulniers, Albert Ou, Paul Walmsley, linux-kernel,
linux-riscv, llvm, bjorn, puranjay12, alexghiti, yongxuan.wang,
greentime.hu, nick.hu, nylon.chen, tommy.wu, eric.lin,
viccent.chen, zong.li, samuel.holland
Evgenii Shatokhin <e.shatokhin@yadro.com> writes:
> On 03.12.2024 15:18, Björn Töpel wrote:
>> Andy!
>>
>> "atomic" spelling in the Subject line.
>>
>> Andy Chiu <andybnac@gmail.com> writes:
>>
>>> Changes in v3:
>>> - Add a fix tag for patch 1
>>> - Add a data fence before sending out remote fence.i (6)
>>> - Link to v2: https://lore.kernel.org/all/20240628-dev-andyc-dyn-ftrace-v4-v2-0-1e5f4cb1f049@sifive.com/
>>
>> Hmm, the fixes tag was not included.
>>
>> Also, there was a lot of comments from v2 that was not addressed:
>>
>> * Minor spelling nits
>> * Breaking DIRECT_CALL, and include Puranjay's CALL_OPS work in the
>> same series for DIRECT_CALL, to avoid breakage.
>
> Yes, FTRACE_WITH_DIRECT_CALLS is currently broken. If I try to insmod
> samples/ftrace/ftrace-direct.ko, it reports a failure:
>
>
> [ 179.531472] ------------[ ftrace bug ]------------
> [ 179.531761] ftrace failed to modify
> [ 179.531786] [<ffffffff8005f9ac>] wake_up_process+0x0/0x24
> [ 179.532577] actual: 97:e2:fa:ff:13:00:00:00
> [ 179.533211] Setting ftrace call site to call ftrace function
> [ 179.534409] ftrace record flags: 99980001
> [ 179.534692] (1) tramp: ftrace_caller+0x0/0x34
> (call_direct_funcs+0x0/0x14)
> [ 179.534692] expected tramp: ffffffff01b0d000
> ...
And even a regular Ubuntu 24.10 will fail for all subsystems using a BPF
trampoline, e.g.
------------[ ftrace bug ]------------
ftrace failed to modify
[<ffffffff80250d98>] bpf_lsm_file_open+0x0/0x1c
CALL_OPS is definitely a must for this series.
Björn
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v3 0/7] riscv: ftrace: atmoic patching and preempt improvements
2024-12-03 12:18 ` Björn Töpel
2024-12-03 15:09 ` Evgenii Shatokhin
@ 2024-12-11 15:48 ` Andy Chiu
1 sibling, 0 replies; 32+ messages in thread
From: Andy Chiu @ 2024-12-11 15:48 UTC (permalink / raw)
To: Björn Töpel
Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Nathan Chancellor,
Nick Desaulniers, Bill Wendling, Justin Stitt, linux-kernel,
linux-riscv, llvm, bjorn, puranjay12, alexghiti, yongxuan.wang,
greentime.hu, nick.hu, nylon.chen, tommy.wu, eric.lin,
viccent.chen, zong.li, samuel.holland
Björn Töpel <bjorn@kernel.org> 於 2024年12月3日 週二 下午8:18寫道:
>
> Andy!
>
> "atomic" spelling in the Subject line.
Sorry, I will fix it
>
> Andy Chiu <andybnac@gmail.com> writes:
>
> > Changes in v3:
> > - Add a fix tag for patch 1
> > - Add a data fence before sending out remote fence.i (6)
> > - Link to v2: https://lore.kernel.org/all/20240628-dev-andyc-dyn-ftrace-v4-v2-0-1e5f4cb1f049@sifive.com/
>
> Hmm, the fixes tag was not included.
Do you suggest adding a fix tag to the entire series? Or are there any
patches that are missing the fix tag? I am not sure if this is a fix,
since we defeatured PREEMPT 2 years ago.
>
> Also, there was a lot of comments from v2 that was not addressed:
>
> * Minor spelling nits
> * Breaking DIRECT_CALL, and include Puranjay's CALL_OPS work in the
> same series for DIRECT_CALL, to avoid breakage.
Sorry I didn't get it at Plumbers. Yes, I can test and merge
Puranjay's series and send a v4.
>
> I'll have a look at the barriers (which came up at plumbers)!
>
>
> Cheers,
> Björn
Thanks,
Andy
^ permalink raw reply [flat|nested] 32+ messages in thread