* [PATCH resend 0/8] tracing: Allow system call tracepoints to handle page faults
@ 2024-09-30 19:23 Mathieu Desnoyers
2024-09-30 19:23 ` [PATCH resend 1/8] tracing: Declare system call tracepoints with TRACE_EVENT_SYSCALL Mathieu Desnoyers
` (7 more replies)
0 siblings, 8 replies; 13+ messages in thread
From: Mathieu Desnoyers @ 2024-09-30 19:23 UTC
To: Steven Rostedt, Masami Hiramatsu
Cc: linux-kernel, Mathieu Desnoyers, Peter Zijlstra,
Alexei Starovoitov, Yonghong Song, Paul E . McKenney, Ingo Molnar,
Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
Namhyung Kim, Andrii Nakryiko, bpf, Joel Fernandes,
linux-trace-kernel
Wire up the system call tracepoints with Tasks Trace RCU to allow
the ftrace, perf, and eBPF tracers to handle page faults.
This series does the initial wire-up allowing tracers to handle page
faults, but leaves out the actual handling of said page faults as future
work.
This series was compile- and runtime-tested with ftrace and perf syscall
tracing and raw syscall tracing, with a WARN_ON_ONCE() added to the
generated code to validate that the intended probes are used for raw
syscall tracing. The might_fault() added within those probes validates
that they are called from a context where handling a page fault is OK.
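As an illustration of what the might_fault() check buys, here is a
minimal userspace sketch. It assumes the check runs before the probe
disables preemption on its own, and every name in it (sketch_*) is a
hypothetical stand-in rather than kernel API:

```c
#include <assert.h>

/* Hypothetical stand-ins for kernel state; none of these names exist upstream. */
static int sketch_preempt_count;	/* > 0 means "atomic context" */
static int sketch_warnings;		/* how many times the check fired */

/* Models the debug side of might_fault(): complain if faulting is unsafe. */
static void sketch_might_fault(void)
{
	if (sketch_preempt_count > 0)
		sketch_warnings++;
}

/* Models a syscall probe: validate the calling context, then go atomic. */
static void sketch_syscall_probe(void)
{
	sketch_might_fault();		/* caller must be preemptible here */

	sketch_preempt_count++;		/* stands in for guard(preempt_notrace)() */
	/* ... ring buffer work would happen here ... */
	sketch_preempt_count--;
	assert(sketch_preempt_count >= 0);
}
```

Calling sketch_syscall_probe() from a preemptible context leaves
sketch_warnings untouched, while calling it with sketch_preempt_count
already raised bumps the counter, which is the situation the real check
is meant to flag.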
This series replaces the "Faultable Tracepoints v6" series found at [1].
This has been rebased on v6.11.1, without any conflicts. I've added the
Acked-by and Tested-by tags to relevant commits.
Steven, can you merge it through the tracing tree?
Thanks,
Mathieu
Link: https://lore.kernel.org/lkml/20240828144153.829582-1-mathieu.desnoyers@efficios.com/ # [1]
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf@vger.kernel.org
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: linux-trace-kernel@vger.kernel.org
Mathieu Desnoyers (8):
tracing: Declare system call tracepoints with TRACE_EVENT_SYSCALL
tracing/ftrace: guard syscall probe with preempt_notrace
tracing/perf: guard syscall probe with preempt_notrace
tracing/bpf: guard syscall probe with preempt_notrace
tracing: Allow system call tracepoints to handle page faults
tracing/ftrace: Add might_fault check to syscall probes
tracing/perf: Add might_fault check to syscall probes
tracing/bpf: Add might_fault check to syscall probes
include/linux/tracepoint.h | 87 +++++++++++++++++++++++++--------
include/trace/bpf_probe.h | 13 +++++
include/trace/define_trace.h | 5 ++
include/trace/events/syscalls.h | 4 +-
include/trace/perf.h | 43 ++++++++++++++--
include/trace/trace_events.h | 61 +++++++++++++++++++++--
init/Kconfig | 1 +
kernel/entry/common.c | 4 +-
kernel/trace/trace_syscalls.c | 36 ++++++++++++--
9 files changed, 218 insertions(+), 36 deletions(-)
--
2.39.2
* [PATCH resend 1/8] tracing: Declare system call tracepoints with TRACE_EVENT_SYSCALL
2024-09-30 19:23 [PATCH resend 0/8] tracing: Allow system call tracepoints to handle page faults Mathieu Desnoyers
@ 2024-09-30 19:23 ` Mathieu Desnoyers
2024-10-03 9:51 ` kernel test robot
2024-10-03 9:51 ` kernel test robot
2024-09-30 19:23 ` [PATCH resend 2/8] tracing/ftrace: guard syscall probe with preempt_notrace Mathieu Desnoyers
` (6 subsequent siblings)
7 siblings, 2 replies; 13+ messages in thread
From: Mathieu Desnoyers @ 2024-09-30 19:23 UTC
To: Steven Rostedt, Masami Hiramatsu
Cc: linux-kernel, Mathieu Desnoyers, Peter Zijlstra,
Alexei Starovoitov, Yonghong Song, Paul E . McKenney, Ingo Molnar,
Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
Namhyung Kim, Andrii Nakryiko, bpf, Joel Fernandes,
linux-trace-kernel, Michael Jeanson
In preparation for allowing system call tracepoints to handle page
faults, introduce TRACE_EVENT_SYSCALL to declare the sys_enter/sys_exit
tracepoints.
Emit the static inlines register_trace_syscall_##name for events
declared with TRACE_EVENT_SYSCALL, allowing source-level validation
that only probes meant to handle system call entry/exit events are
registered to them.
Move the common code between __DECLARE_TRACE and __DECLARE_TRACE_SYSCALL
into __DECLARE_TRACE_COMMON.
This change is not meant to alter the generated code; it only prepares
for the following modifications.
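The source-level validation can be pictured with a reduced userspace
sketch. It is not the kernel macros: SKETCH_PARAMS, SKETCH_DECLARE_COMMON,
SKETCH_DECLARE_SYSCALL and the single-probe storage are hypothetical
simplifications that only show how token pasting yields a distinct
"_syscall_" call site and registration helper:

```c
#include <assert.h>
#include <stddef.h>

/* Lets a comma-separated list travel through macros as one argument. */
#define SKETCH_PARAMS(...) __VA_ARGS__

/* Pieces shared by every flavour, in the spirit of __DECLARE_TRACE_COMMON. */
#define SKETCH_DECLARE_COMMON(name, proto)				\
	static void (*sketch_probe_##name)(void *priv, proto);		\
	static void *sketch_priv_##name;				\
	static inline int sketch_trace_##name##_enabled(void)		\
	{								\
		return sketch_probe_##name != NULL;			\
	}

/* Syscall flavour: only "_syscall_" call and register helpers are emitted. */
#define SKETCH_DECLARE_SYSCALL(name, proto, args)			\
	SKETCH_DECLARE_COMMON(name, SKETCH_PARAMS(proto))		\
	static inline void sketch_trace_syscall_##name(proto)		\
	{								\
		if (sketch_trace_##name##_enabled())			\
			sketch_probe_##name(sketch_priv_##name, args);	\
	}								\
	static inline int sketch_register_trace_syscall_##name(	\
		void (*probe)(void *priv, proto), void *priv)		\
	{								\
		assert(probe != NULL);					\
		sketch_probe_##name = probe;				\
		sketch_priv_##name = priv;				\
		return 0;						\
	}

SKETCH_DECLARE_SYSCALL(sys_enter,
		       SKETCH_PARAMS(long nr, long arg),
		       SKETCH_PARAMS(nr, arg))

/* Example probe: stores the last syscall number into its private data. */
static void sketch_record_nr(void *priv, long nr, long arg)
{
	(void)arg;
	*(long *)priv = nr;
}
```

Because the sketch never emits a plain sketch_register_trace_sys_enter(),
a caller that reaches for the generic name fails to compile, which is
the kind of build-time check the commit message describes.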
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michael Jeanson <mjeanson@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf@vger.kernel.org
Cc: Joel Fernandes <joel@joelfernandes.org>
---
include/linux/tracepoint.h | 66 +++++++++++++++++++++++++--------
include/trace/bpf_probe.h | 3 ++
include/trace/define_trace.h | 5 +++
include/trace/events/syscalls.h | 4 +-
include/trace/perf.h | 3 ++
include/trace/trace_events.h | 28 ++++++++++++++
kernel/entry/common.c | 4 +-
kernel/trace/trace_syscalls.c | 8 ++--
8 files changed, 98 insertions(+), 23 deletions(-)
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 6be396bb4297..2e4b4952bba2 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -248,10 +248,28 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
* site if it is not watching, as it will need to be active when the
* tracepoint is enabled.
*/
-#define __DECLARE_TRACE(name, proto, args, cond, data_proto) \
+#define __DECLARE_TRACE_COMMON(name, proto, args, cond, data_proto) \
extern int __traceiter_##name(data_proto); \
DECLARE_STATIC_CALL(tp_func_##name, __traceiter_##name); \
extern struct tracepoint __tracepoint_##name; \
+ static inline int \
+ unregister_trace_##name(void (*probe)(data_proto), void *data) \
+ { \
+ return tracepoint_probe_unregister(&__tracepoint_##name,\
+ (void *)probe, data); \
+ } \
+ static inline void \
+ check_trace_callback_type_##name(void (*cb)(data_proto)) \
+ { \
+ } \
+ static inline bool \
+ trace_##name##_enabled(void) \
+ { \
+ return static_key_false(&__tracepoint_##name.key); \
+ }
+
+#define __DECLARE_TRACE(name, proto, args, cond, data_proto) \
+ __DECLARE_TRACE_COMMON(name, PARAMS(proto), PARAMS(args), cond, PARAMS(data_proto)) \
static inline void trace_##name(proto) \
{ \
if (static_key_false(&__tracepoint_##name.key)) \
@@ -263,8 +281,13 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
"RCU not watching for tracepoint"); \
} \
} \
- __DECLARE_TRACE_RCU(name, PARAMS(proto), PARAMS(args), \
- PARAMS(cond)) \
+ static inline void trace_##name##_rcuidle(proto) \
+ { \
+ if (static_key_false(&__tracepoint_##name.key)) \
+ __DO_TRACE(name, \
+ TP_ARGS(args), \
+ TP_CONDITION(cond), 1); \
+ } \
static inline int \
register_trace_##name(void (*probe)(data_proto), void *data) \
{ \
@@ -277,21 +300,26 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
{ \
return tracepoint_probe_register_prio(&__tracepoint_##name, \
(void *)probe, data, prio); \
- } \
- static inline int \
- unregister_trace_##name(void (*probe)(data_proto), void *data) \
- { \
- return tracepoint_probe_unregister(&__tracepoint_##name,\
- (void *)probe, data); \
- } \
- static inline void \
- check_trace_callback_type_##name(void (*cb)(data_proto)) \
+ }
+
+#define __DECLARE_TRACE_SYSCALL(name, proto, args, cond, data_proto) \
+ __DECLARE_TRACE_COMMON(name, PARAMS(proto), PARAMS(args), cond, PARAMS(data_proto)) \
+ static inline void trace_syscall_##name(proto) \
{ \
+ if (static_key_false(&__tracepoint_##name.key)) \
+ __DO_TRACE(name, \
+ TP_ARGS(args), \
+ TP_CONDITION(cond), 0); \
+ if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) { \
+ WARN_ONCE(!rcu_is_watching(), \
+ "RCU not watching for tracepoint"); \
+ } \
} \
- static inline bool \
- trace_##name##_enabled(void) \
+ static inline int \
+ register_trace_syscall_##name(void (*probe)(data_proto), void *data) \
{ \
- return static_key_false(&__tracepoint_##name.key); \
+ return tracepoint_probe_register(&__tracepoint_##name, \
+ (void *)probe, data); \
}
/*
@@ -439,6 +467,11 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
cpu_online(raw_smp_processor_id()) && (PARAMS(cond)), \
PARAMS(void *__data, proto))
+#define DECLARE_TRACE_SYSCALL(name, proto, args) \
+ __DECLARE_TRACE_SYSCALL(name, PARAMS(proto), PARAMS(args), \
+ cpu_online(raw_smp_processor_id()), \
+ PARAMS(void *__data, proto))
+
#define TRACE_EVENT_FLAGS(event, flag)
#define TRACE_EVENT_PERF_PERM(event, expr...)
@@ -576,6 +609,9 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
struct, assign, print) \
DECLARE_TRACE_CONDITION(name, PARAMS(proto), \
PARAMS(args), PARAMS(cond))
+#define TRACE_EVENT_SYSCALL(name, proto, args, struct, assign, \
+ print, reg, unreg) \
+ DECLARE_TRACE_SYSCALL(name, PARAMS(proto), PARAMS(args))
#define TRACE_EVENT_FLAGS(event, flag)
diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
index a2ea11cc912e..c85bbce5aaa5 100644
--- a/include/trace/bpf_probe.h
+++ b/include/trace/bpf_probe.h
@@ -53,6 +53,9 @@ __bpf_trace_##call(void *__data, proto) \
#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
__BPF_DECLARE_TRACE(call, PARAMS(proto), PARAMS(args))
+#undef DECLARE_EVENT_SYSCALL_CLASS
+#define DECLARE_EVENT_SYSCALL_CLASS DECLARE_EVENT_CLASS
+
/*
* This part is compiled out, it is only here as a build time check
* to make sure that if the tracepoint handling changes, the
diff --git a/include/trace/define_trace.h b/include/trace/define_trace.h
index 00723935dcc7..ff5fa17a6259 100644
--- a/include/trace/define_trace.h
+++ b/include/trace/define_trace.h
@@ -46,6 +46,10 @@
assign, print, reg, unreg) \
DEFINE_TRACE_FN(name, reg, unreg, PARAMS(proto), PARAMS(args))
+#undef TRACE_EVENT_SYSCALL
+#define TRACE_EVENT_SYSCALL(name, proto, args, struct, assign, print, reg, unreg) \
+ DEFINE_TRACE_FN(name, reg, unreg, PARAMS(proto), PARAMS(args))
+
#undef TRACE_EVENT_NOP
#define TRACE_EVENT_NOP(name, proto, args, struct, assign, print)
@@ -107,6 +111,7 @@
#undef TRACE_EVENT
#undef TRACE_EVENT_FN
#undef TRACE_EVENT_FN_COND
+#undef TRACE_EVENT_SYSCALL
#undef TRACE_EVENT_CONDITION
#undef TRACE_EVENT_NOP
#undef DEFINE_EVENT_NOP
diff --git a/include/trace/events/syscalls.h b/include/trace/events/syscalls.h
index b6e0cbc2c71f..f31ff446b468 100644
--- a/include/trace/events/syscalls.h
+++ b/include/trace/events/syscalls.h
@@ -15,7 +15,7 @@
#ifdef CONFIG_HAVE_SYSCALL_TRACEPOINTS
-TRACE_EVENT_FN(sys_enter,
+TRACE_EVENT_SYSCALL(sys_enter,
TP_PROTO(struct pt_regs *regs, long id),
@@ -41,7 +41,7 @@ TRACE_EVENT_FN(sys_enter,
TRACE_EVENT_FLAGS(sys_enter, TRACE_EVENT_FL_CAP_ANY)
-TRACE_EVENT_FN(sys_exit,
+TRACE_EVENT_SYSCALL(sys_exit,
TP_PROTO(struct pt_regs *regs, long ret),
diff --git a/include/trace/perf.h b/include/trace/perf.h
index 2c11181c82e0..ded997af481e 100644
--- a/include/trace/perf.h
+++ b/include/trace/perf.h
@@ -55,6 +55,9 @@ perf_trace_##call(void *__data, proto) \
head, __task); \
}
+#undef DECLARE_EVENT_SYSCALL_CLASS
+#define DECLARE_EVENT_SYSCALL_CLASS DECLARE_EVENT_CLASS
+
/*
* This part is compiled out, it is only here as a build time check
* to make sure that if the tracepoint handling changes, the
diff --git a/include/trace/trace_events.h b/include/trace/trace_events.h
index c2f9cabf154d..8bcbb9ee44de 100644
--- a/include/trace/trace_events.h
+++ b/include/trace/trace_events.h
@@ -45,6 +45,16 @@
PARAMS(print)); \
DEFINE_EVENT(name, name, PARAMS(proto), PARAMS(args));
+#undef TRACE_EVENT_SYSCALL
+#define TRACE_EVENT_SYSCALL(name, proto, args, tstruct, assign, print, reg, unreg) \
+ DECLARE_EVENT_SYSCALL_CLASS(name, \
+ PARAMS(proto), \
+ PARAMS(args), \
+ PARAMS(tstruct), \
+ PARAMS(assign), \
+ PARAMS(print)); \
+ DEFINE_EVENT(name, name, PARAMS(proto), PARAMS(args));
+
#include "stages/stage1_struct_define.h"
#undef DECLARE_EVENT_CLASS
@@ -57,6 +67,9 @@
\
static struct trace_event_class event_class_##name;
+#undef DECLARE_EVENT_SYSCALL_CLASS
+#define DECLARE_EVENT_SYSCALL_CLASS DECLARE_EVENT_CLASS
+
#undef DEFINE_EVENT
#define DEFINE_EVENT(template, name, proto, args) \
static struct trace_event_call __used \
@@ -117,6 +130,9 @@
tstruct; \
};
+#undef DECLARE_EVENT_SYSCALL_CLASS
+#define DECLARE_EVENT_SYSCALL_CLASS DECLARE_EVENT_CLASS
+
#undef DEFINE_EVENT
#define DEFINE_EVENT(template, name, proto, args)
@@ -208,6 +224,9 @@ static struct trace_event_functions trace_event_type_funcs_##call = { \
.trace = trace_raw_output_##call, \
};
+#undef DECLARE_EVENT_SYSCALL_CLASS
+#define DECLARE_EVENT_SYSCALL_CLASS DECLARE_EVENT_CLASS
+
#undef DEFINE_EVENT_PRINT
#define DEFINE_EVENT_PRINT(template, call, proto, args, print) \
static notrace enum print_line_t \
@@ -265,6 +284,9 @@ static inline notrace int trace_event_get_offsets_##call( \
return __data_size; \
}
+#undef DECLARE_EVENT_SYSCALL_CLASS
+#define DECLARE_EVENT_SYSCALL_CLASS DECLARE_EVENT_CLASS
+
#include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
/*
@@ -409,6 +431,9 @@ trace_event_raw_event_##call(void *__data, proto) \
* fail to compile unless it too is updated.
*/
+#undef DECLARE_EVENT_SYSCALL_CLASS
+#define DECLARE_EVENT_SYSCALL_CLASS DECLARE_EVENT_CLASS
+
#undef DEFINE_EVENT
#define DEFINE_EVENT(template, call, proto, args) \
static inline void ftrace_test_probe_##call(void) \
@@ -434,6 +459,9 @@ static struct trace_event_class __used __refdata event_class_##call = { \
_TRACE_PERF_INIT(call) \
};
+#undef DECLARE_EVENT_SYSCALL_CLASS
+#define DECLARE_EVENT_SYSCALL_CLASS DECLARE_EVENT_CLASS
+
#undef DEFINE_EVENT
#define DEFINE_EVENT(template, call, proto, args) \
\
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index 90843cc38588..d08472421d0e 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -58,7 +58,7 @@ long syscall_trace_enter(struct pt_regs *regs, long syscall,
syscall = syscall_get_nr(current, regs);
if (unlikely(work & SYSCALL_WORK_SYSCALL_TRACEPOINT)) {
- trace_sys_enter(regs, syscall);
+ trace_syscall_sys_enter(regs, syscall);
/*
* Probes or BPF hooks in the tracepoint may have changed the
* system call number as well.
@@ -166,7 +166,7 @@ static void syscall_exit_work(struct pt_regs *regs, unsigned long work)
audit_syscall_exit(regs);
if (work & SYSCALL_WORK_SYSCALL_TRACEPOINT)
- trace_sys_exit(regs, syscall_get_return_value(current, regs));
+ trace_syscall_sys_exit(regs, syscall_get_return_value(current, regs));
step = report_single_step(work);
if (step || work & SYSCALL_WORK_SYSCALL_TRACE)
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 9c581d6da843..067f8e2b930f 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -377,7 +377,7 @@ static int reg_event_syscall_enter(struct trace_event_file *file,
return -ENOSYS;
mutex_lock(&syscall_trace_lock);
if (!tr->sys_refcount_enter)
- ret = register_trace_sys_enter(ftrace_syscall_enter, tr);
+ ret = register_trace_syscall_sys_enter(ftrace_syscall_enter, tr);
if (!ret) {
rcu_assign_pointer(tr->enter_syscall_files[num], file);
tr->sys_refcount_enter++;
@@ -415,7 +415,7 @@ static int reg_event_syscall_exit(struct trace_event_file *file,
return -ENOSYS;
mutex_lock(&syscall_trace_lock);
if (!tr->sys_refcount_exit)
- ret = register_trace_sys_exit(ftrace_syscall_exit, tr);
+ ret = register_trace_syscall_sys_exit(ftrace_syscall_exit, tr);
if (!ret) {
rcu_assign_pointer(tr->exit_syscall_files[num], file);
tr->sys_refcount_exit++;
@@ -631,7 +631,7 @@ static int perf_sysenter_enable(struct trace_event_call *call)
mutex_lock(&syscall_trace_lock);
if (!sys_perf_refcount_enter)
- ret = register_trace_sys_enter(perf_syscall_enter, NULL);
+ ret = register_trace_syscall_sys_enter(perf_syscall_enter, NULL);
if (ret) {
pr_info("event trace: Could not activate syscall entry trace point");
} else {
@@ -728,7 +728,7 @@ static int perf_sysexit_enable(struct trace_event_call *call)
mutex_lock(&syscall_trace_lock);
if (!sys_perf_refcount_exit)
- ret = register_trace_sys_exit(perf_syscall_exit, NULL);
+ ret = register_trace_syscall_sys_exit(perf_syscall_exit, NULL);
if (ret) {
pr_info("event trace: Could not activate syscall exit trace point");
} else {
--
2.39.2
* [PATCH resend 2/8] tracing/ftrace: guard syscall probe with preempt_notrace
From: Mathieu Desnoyers @ 2024-09-30 19:23 UTC
To: Steven Rostedt, Masami Hiramatsu
Cc: linux-kernel, Mathieu Desnoyers, Peter Zijlstra,
Alexei Starovoitov, Yonghong Song, Paul E . McKenney, Ingo Molnar,
Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
Namhyung Kim, Andrii Nakryiko, bpf, Joel Fernandes,
linux-trace-kernel, Michael Jeanson
In preparation for allowing system call enter/exit instrumentation to
handle page faults, make sure that ftrace can handle this change by
explicitly disabling preemption within the ftrace system call tracepoint
probes to respect the current expectations within ftrace ring buffer
code.
This change does not yet allow ftrace to take page faults per se within
its probe, but allows its existing probes to adapt to the upcoming
change.
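For readers unfamiliar with guard(): the kernel's scope-based guards are
built on the compiler's cleanup variable attribute, so the matching
"enable" runs on every return path. The following is a userspace sketch
of that idea only, assuming GCC or Clang; the counter and the sketch_*
names are hypothetical and do not mirror the kernel's cleanup.h:

```c
#include <assert.h>

/* Hypothetical stand-in for the preempt count. */
static int sketch_preempt_count;

/* Dummy guard object; its cleanup handler runs when it leaves scope. */
struct sketch_guard { int unused; };

static inline struct sketch_guard sketch_guard_enter(void)
{
	sketch_preempt_count++;			/* "preempt_disable_notrace()" */
	return (struct sketch_guard){ 0 };
}

static inline void sketch_guard_exit(struct sketch_guard *g)
{
	(void)g;
	sketch_preempt_count--;			/* "preempt_enable_notrace()" */
	assert(sketch_preempt_count >= 0);
}

/* Declares a guard variable that holds preemption off until scope exit. */
#define SKETCH_GUARD_PREEMPT()						\
	struct sketch_guard sketch_scope_guard				\
		__attribute__((cleanup(sketch_guard_exit), unused)) =	\
		sketch_guard_enter()

/* Models a probe with an early return, like the syscall probes below. */
static int sketch_probe(int syscall_nr, int nr_syscalls)
{
	SKETCH_GUARD_PREEMPT();

	if (syscall_nr < 0 || syscall_nr >= nr_syscalls)
		return -1;		/* the guard still drops the count */

	return sketch_preempt_count;	/* read while the guard is held */
}
```

The early-return path is the reason a guard is convenient here: the
probes bail out in several places, and none of them needs an explicit
re-enable.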
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Michael Jeanson <mjeanson@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf@vger.kernel.org
Cc: Joel Fernandes <joel@joelfernandes.org>
---
include/trace/trace_events.h | 38 ++++++++++++++++++++++++++++-------
kernel/trace/trace_syscalls.c | 12 +++++++++++
2 files changed, 43 insertions(+), 7 deletions(-)
diff --git a/include/trace/trace_events.h b/include/trace/trace_events.h
index 8bcbb9ee44de..0228d9ed94a3 100644
--- a/include/trace/trace_events.h
+++ b/include/trace/trace_events.h
@@ -263,6 +263,9 @@ static struct trace_event_fields trace_event_fields_##call[] = { \
tstruct \
{} };
+#undef DECLARE_EVENT_SYSCALL_CLASS
+#define DECLARE_EVENT_SYSCALL_CLASS DECLARE_EVENT_CLASS
+
#undef DEFINE_EVENT_PRINT
#define DEFINE_EVENT_PRINT(template, name, proto, args, print)
@@ -396,11 +399,11 @@ static inline notrace int trace_event_get_offsets_##call( \
#include "stages/stage6_event_callback.h"
-#undef DECLARE_EVENT_CLASS
-#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
- \
+
+#undef __DECLARE_EVENT_CLASS
+#define __DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
static notrace void \
-trace_event_raw_event_##call(void *__data, proto) \
+do_trace_event_raw_event_##call(void *__data, proto) \
{ \
struct trace_event_file *trace_file = __data; \
struct trace_event_data_offsets_##call __maybe_unused __data_offsets;\
@@ -425,15 +428,34 @@ trace_event_raw_event_##call(void *__data, proto) \
\
trace_event_buffer_commit(&fbuffer); \
}
+
+#undef DECLARE_EVENT_CLASS
+#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
+__DECLARE_EVENT_CLASS(call, PARAMS(proto), PARAMS(args), PARAMS(tstruct), \
+ PARAMS(assign), PARAMS(print)) \
+static notrace void \
+trace_event_raw_event_##call(void *__data, proto) \
+{ \
+ do_trace_event_raw_event_##call(__data, args); \
+}
+
+#undef DECLARE_EVENT_SYSCALL_CLASS
+#define DECLARE_EVENT_SYSCALL_CLASS(call, proto, args, tstruct, assign, print) \
+__DECLARE_EVENT_CLASS(call, PARAMS(proto), PARAMS(args), PARAMS(tstruct), \
+ PARAMS(assign), PARAMS(print)) \
+static notrace void \
+trace_event_raw_event_##call(void *__data, proto) \
+{ \
+ guard(preempt_notrace)(); \
+ do_trace_event_raw_event_##call(__data, args); \
+}
+
/*
* The ftrace_test_probe is compiled out, it is only here as a build time check
* to make sure that if the tracepoint handling changes, the ftrace probe will
* fail to compile unless it too is updated.
*/
-#undef DECLARE_EVENT_SYSCALL_CLASS
-#define DECLARE_EVENT_SYSCALL_CLASS DECLARE_EVENT_CLASS
-
#undef DEFINE_EVENT
#define DEFINE_EVENT(template, call, proto, args) \
static inline void ftrace_test_probe_##call(void) \
@@ -443,6 +465,8 @@ static inline void ftrace_test_probe_##call(void) \
#include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
+#undef __DECLARE_EVENT_CLASS
+
#include "stages/stage7_class_define.h"
#undef DECLARE_EVENT_CLASS
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 067f8e2b930f..abf0e0b7cd0b 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -299,6 +299,12 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
int syscall_nr;
int size;
+ /*
+ * Syscall probe called with preemption enabled, but the ring
+ * buffer and per-cpu data require preemption to be disabled.
+ */
+ guard(preempt_notrace)();
+
syscall_nr = trace_get_syscall_nr(current, regs);
if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
return;
@@ -338,6 +344,12 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
struct trace_event_buffer fbuffer;
int syscall_nr;
+ /*
+ * Syscall probe called with preemption enabled, but the ring
+ * buffer and per-cpu data require preemption to be disabled.
+ */
+ guard(preempt_notrace)();
+
syscall_nr = trace_get_syscall_nr(current, regs);
if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
return;
--
2.39.2
* [PATCH resend 3/8] tracing/perf: guard syscall probe with preempt_notrace
From: Mathieu Desnoyers @ 2024-09-30 19:23 UTC
To: Steven Rostedt, Masami Hiramatsu
Cc: linux-kernel, Mathieu Desnoyers, Peter Zijlstra,
Alexei Starovoitov, Yonghong Song, Paul E . McKenney, Ingo Molnar,
Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
Namhyung Kim, Andrii Nakryiko, bpf, Joel Fernandes,
linux-trace-kernel, Michael Jeanson
In preparation for allowing system call enter/exit instrumentation to
handle page faults, make sure that perf can handle this change by
explicitly disabling preemption within the perf system call tracepoint
probes to respect the current expectations within perf ring buffer code.
This change does not yet allow perf to take page faults per se within
its probe, but allows its existing probes to adapt to the upcoming
change.
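The wrappers in the diff below declare __count and __task locals that
are never read. A userspace sketch of why forwarding the argument list
needs them, assuming an argument macro with the same shape as
__perf_count(); all sketch_* names and SKETCH_* macros are hypothetical:

```c
#include <assert.h>

typedef unsigned long long sketch_u64;

/* Same shape as __perf_count(): yields c, and also stores it in a local. */
#define SKETCH_PERF_COUNT(c)	(sketch_count = (c))

/* What an argument list using that macro would expand to. */
#define SKETCH_ARGS		cpu, SKETCH_PERF_COUNT(runtime)

/* Stand-in for the code that consumes the arguments inside the worker. */
static sketch_u64 sketch_consume(int cpu, sketch_u64 runtime)
{
	(void)cpu;
	return runtime;
}

/* Worker: here the side effect is wanted, it overrides the default weight. */
static sketch_u64 sketch_do_perf_trace(int cpu, sketch_u64 runtime)
{
	sketch_u64 sketch_count = 1;

	sketch_consume(SKETCH_ARGS);	/* also sets sketch_count = runtime */
	assert(sketch_count == runtime);
	return sketch_count;
}

/* Wrapper: forwarding SKETCH_ARGS only compiles if sketch_count exists. */
static sketch_u64 sketch_perf_trace(int cpu, sketch_u64 runtime)
{
	sketch_u64 sketch_count __attribute__((unused));

	return sketch_do_perf_trace(SKETCH_ARGS);
}
```

Removing the local from sketch_perf_trace() makes the expansion of
SKETCH_ARGS refer to an undeclared identifier, which is the build
failure the unused variables avoid.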
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michael Jeanson <mjeanson@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf@vger.kernel.org
Cc: Joel Fernandes <joel@joelfernandes.org>
---
include/trace/perf.h | 41 +++++++++++++++++++++++++++++++----
kernel/trace/trace_syscalls.c | 12 ++++++++++
2 files changed, 49 insertions(+), 4 deletions(-)
diff --git a/include/trace/perf.h b/include/trace/perf.h
index ded997af481e..5650c1bad088 100644
--- a/include/trace/perf.h
+++ b/include/trace/perf.h
@@ -12,10 +12,10 @@
#undef __perf_task
#define __perf_task(t) (__task = (t))
-#undef DECLARE_EVENT_CLASS
-#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
+#undef __DECLARE_EVENT_CLASS
+#define __DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
static notrace void \
-perf_trace_##call(void *__data, proto) \
+do_perf_trace_##call(void *__data, proto) \
{ \
struct trace_event_call *event_call = __data; \
struct trace_event_data_offsets_##call __maybe_unused __data_offsets;\
@@ -55,8 +55,38 @@ perf_trace_##call(void *__data, proto) \
head, __task); \
}
+/*
+ * Define unused __count and __task variables to use @args to pass
+ * arguments to do_perf_trace_##call. This is needed because the
+ * macros __perf_count and __perf_task introduce the side-effect to
+ * store copies into those local variables.
+ */
+#undef DECLARE_EVENT_CLASS
+#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
+__DECLARE_EVENT_CLASS(call, PARAMS(proto), PARAMS(args), PARAMS(tstruct), \
+ PARAMS(assign), PARAMS(print)) \
+static notrace void \
+perf_trace_##call(void *__data, proto) \
+{ \
+ u64 __count __attribute__((unused)); \
+ struct task_struct *__task __attribute__((unused)); \
+ \
+ do_perf_trace_##call(__data, args); \
+}
+
#undef DECLARE_EVENT_SYSCALL_CLASS
-#define DECLARE_EVENT_SYSCALL_CLASS DECLARE_EVENT_CLASS
+#define DECLARE_EVENT_SYSCALL_CLASS(call, proto, args, tstruct, assign, print) \
+__DECLARE_EVENT_CLASS(call, PARAMS(proto), PARAMS(args), PARAMS(tstruct), \
+ PARAMS(assign), PARAMS(print)) \
+static notrace void \
+perf_trace_##call(void *__data, proto) \
+{ \
+ u64 __count __attribute__((unused)); \
+ struct task_struct *__task __attribute__((unused)); \
+ \
+ guard(preempt_notrace)(); \
+ do_perf_trace_##call(__data, args); \
+}
/*
* This part is compiled out, it is only here as a build time check
@@ -76,4 +106,7 @@ static inline void perf_test_probe_##call(void) \
DEFINE_EVENT(template, name, PARAMS(proto), PARAMS(args))
#include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
+
+#undef __DECLARE_EVENT_CLASS
+
#endif /* CONFIG_PERF_EVENTS */
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index abf0e0b7cd0b..a3d8ac00793e 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -594,6 +594,12 @@ static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id)
int rctx;
int size;
+ /*
+ * Syscall probe called with preemption enabled, but the ring
+ * buffer and per-cpu data require preemption to be disabled.
+ */
+ guard(preempt_notrace)();
+
syscall_nr = trace_get_syscall_nr(current, regs);
if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
return;
@@ -694,6 +700,12 @@ static void perf_syscall_exit(void *ignore, struct pt_regs *regs, long ret)
int rctx;
int size;
+ /*
+ * Syscall probe called with preemption enabled, but the ring
+ * buffer and per-cpu data require preemption to be disabled.
+ */
+ guard(preempt_notrace)();
+
syscall_nr = trace_get_syscall_nr(current, regs);
if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
return;
--
2.39.2
* [PATCH resend 4/8] tracing/bpf: guard syscall probe with preempt_notrace
From: Mathieu Desnoyers @ 2024-09-30 19:23 UTC
To: Steven Rostedt, Masami Hiramatsu
Cc: linux-kernel, Mathieu Desnoyers, Peter Zijlstra,
Alexei Starovoitov, Yonghong Song, Paul E . McKenney, Ingo Molnar,
Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
Namhyung Kim, Andrii Nakryiko, bpf, Joel Fernandes,
linux-trace-kernel, Andrii Nakryiko, Michael Jeanson
In preparation for allowing system call enter/exit instrumentation to
handle page faults, make sure that bpf can handle this change by
explicitly disabling preemption within the bpf system call tracepoint
probes to respect the current expectations within bpf tracing code.
This change does not yet allow bpf to take page faults per se within its
probe, but allows its existing probes to adapt to the upcoming change.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Andrii Nakryiko <andrii@kernel.org> # BPF parts
Cc: Michael Jeanson <mjeanson@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf@vger.kernel.org
Cc: Joel Fernandes <joel@joelfernandes.org>
---
include/trace/bpf_probe.h | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
index c85bbce5aaa5..211b98d45fc6 100644
--- a/include/trace/bpf_probe.h
+++ b/include/trace/bpf_probe.h
@@ -53,8 +53,17 @@ __bpf_trace_##call(void *__data, proto) \
#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
__BPF_DECLARE_TRACE(call, PARAMS(proto), PARAMS(args))
+#define __BPF_DECLARE_TRACE_SYSCALL(call, proto, args) \
+static notrace void \
+__bpf_trace_##call(void *__data, proto) \
+{ \
+ guard(preempt_notrace)(); \
+ CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(__data, CAST_TO_U64(args)); \
+}
+
#undef DECLARE_EVENT_SYSCALL_CLASS
-#define DECLARE_EVENT_SYSCALL_CLASS DECLARE_EVENT_CLASS
+#define DECLARE_EVENT_SYSCALL_CLASS(call, proto, args, tstruct, assign, print) \
+ __BPF_DECLARE_TRACE_SYSCALL(call, PARAMS(proto), PARAMS(args))
/*
* This part is compiled out, it is only here as a build time check
--
2.39.2
* [PATCH resend 5/8] tracing: Allow system call tracepoints to handle page faults
From: Mathieu Desnoyers @ 2024-09-30 19:23 UTC
To: Steven Rostedt, Masami Hiramatsu
Cc: linux-kernel, Mathieu Desnoyers, Peter Zijlstra,
Alexei Starovoitov, Yonghong Song, Paul E . McKenney, Ingo Molnar,
Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
Namhyung Kim, Andrii Nakryiko, bpf, Joel Fernandes,
linux-trace-kernel, Michael Jeanson
Use Tasks Trace RCU to protect iteration of system call enter/exit
tracepoint probes to allow those probes to handle page faults.
In preparation for this change, all tracers registering to system call
enter/exit tracepoints should expect those to be called with preemption
enabled.
This allows tracers to fault-in userspace system call arguments such as
path strings within their probe callbacks.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michael Jeanson <mjeanson@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf@vger.kernel.org
Cc: Joel Fernandes <joel@joelfernandes.org>
---
include/linux/tracepoint.h | 25 +++++++++++++++++--------
init/Kconfig | 1 +
2 files changed, 18 insertions(+), 8 deletions(-)
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 2e4b4952bba2..106e951896c2 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -17,6 +17,7 @@
#include <linux/errno.h>
#include <linux/types.h>
#include <linux/rcupdate.h>
+#include <linux/rcupdate_trace.h>
#include <linux/tracepoint-defs.h>
#include <linux/static_call.h>
@@ -89,6 +90,7 @@ int unregister_tracepoint_module_notifier(struct notifier_block *nb)
#ifdef CONFIG_TRACEPOINTS
static inline void tracepoint_synchronize_unregister(void)
{
+ synchronize_rcu_tasks_trace();
synchronize_srcu(&tracepoint_srcu);
synchronize_rcu();
}
@@ -191,7 +193,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
* it_func[0] is never NULL because there is at least one element in the array
* when the array itself is non NULL.
*/
-#define __DO_TRACE(name, args, cond, rcuidle) \
+#define __DO_TRACE(name, args, cond, rcuidle, syscall) \
do { \
int __maybe_unused __idx = 0; \
\
@@ -202,8 +204,12 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
"Bad RCU usage for tracepoint")) \
return; \
\
- /* keep srcu and sched-rcu usage consistent */ \
- preempt_disable_notrace(); \
+ if (syscall) { \
+ rcu_read_lock_trace(); \
+ } else { \
+ /* keep srcu and sched-rcu usage consistent */ \
+ preempt_disable_notrace(); \
+ } \
\
/* \
* For rcuidle callers, use srcu since sched-rcu \
@@ -221,7 +227,10 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
srcu_read_unlock_notrace(&tracepoint_srcu, __idx);\
} \
\
- preempt_enable_notrace(); \
+ if (syscall) \
+ rcu_read_unlock_trace(); \
+ else \
+ preempt_enable_notrace(); \
} while (0)
#ifndef MODULE
@@ -231,7 +240,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
if (static_key_false(&__tracepoint_##name.key)) \
__DO_TRACE(name, \
TP_ARGS(args), \
- TP_CONDITION(cond), 1); \
+ TP_CONDITION(cond), 1, 0); \
}
#else
#define __DECLARE_TRACE_RCU(name, proto, args, cond)
@@ -275,7 +284,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
if (static_key_false(&__tracepoint_##name.key)) \
__DO_TRACE(name, \
TP_ARGS(args), \
- TP_CONDITION(cond), 0); \
+ TP_CONDITION(cond), 0, 0); \
if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) { \
WARN_ONCE(!rcu_is_watching(), \
"RCU not watching for tracepoint"); \
@@ -286,7 +295,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
if (static_key_false(&__tracepoint_##name.key)) \
__DO_TRACE(name, \
TP_ARGS(args), \
- TP_CONDITION(cond), 1); \
+ TP_CONDITION(cond), 1, 0); \
} \
static inline int \
register_trace_##name(void (*probe)(data_proto), void *data) \
@@ -309,7 +318,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
if (static_key_false(&__tracepoint_##name.key)) \
__DO_TRACE(name, \
TP_ARGS(args), \
- TP_CONDITION(cond), 0); \
+ TP_CONDITION(cond), 0, 1); \
if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) { \
WARN_ONCE(!rcu_is_watching(), \
"RCU not watching for tracepoint"); \
diff --git a/init/Kconfig b/init/Kconfig
index 5783a0b87517..72e13ee73c43 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1936,6 +1936,7 @@ config BINDGEN_VERSION_TEXT
#
config TRACEPOINTS
bool
+ select TASKS_TRACE_RCU
source "kernel/Kconfig.kexec"
--
2.39.2
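As a rough illustration of the control flow this patch adds to __DO_TRACE(), here is a userspace C model, not kernel code. Only the syscall flag and the choice between rcu_read_lock_trace() and preempt_disable_notrace() come from the patch; the model_* names, the two counters, and the observation struct are assumptions made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-ins for kernel state; every name here is hypothetical. */
static int model_preempt_depth;		/* > 0 models "preemption disabled" */
static int model_trace_rcu_nesting;	/* > 0 models a Tasks Trace RCU read-side section */

typedef void (*model_probe_t)(void *data);

/*
 * Mirrors the shape of the patched __DO_TRACE(): the read-side protection
 * is selected by the syscall flag, and the matching release is used on exit.
 */
static void model_do_trace(model_probe_t probe, void *data, bool syscall)
{
	if (syscall)
		model_trace_rcu_nesting++;	/* models rcu_read_lock_trace() */
	else
		model_preempt_depth++;		/* models preempt_disable_notrace() */

	probe(data);

	if (syscall)
		model_trace_rcu_nesting--;	/* models rcu_read_unlock_trace() */
	else
		model_preempt_depth--;		/* models preempt_enable_notrace() */
}

/* What a probe can observe about the context it was called from. */
struct model_observation {
	bool may_fault;		/* true when preemption is not disabled */
	bool in_trace_rcu;	/* true inside the modeled Tasks Trace RCU section */
};

static void model_probe(void *data)
{
	struct model_observation *obs = data;

	obs->may_fault = (model_preempt_depth == 0);
	obs->in_trace_rcu = (model_trace_rcu_nesting > 0);
}

/* Run the probe once through the modeled tracepoint and report what it saw. */
static struct model_observation model_run(bool syscall)
{
	struct model_observation obs = { false, false };

	model_do_trace(model_probe, &obs, syscall);
	return obs;
}
```

In this model, model_run(true) reports that the probe ran with preemption enabled and inside the modeled Tasks Trace RCU section, while model_run(false) reports preemption disabled. That difference is why only the system call path can tolerate a page fault in its probes.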
* [PATCH resend 6/8] tracing/ftrace: Add might_fault check to syscall probes
2024-09-30 19:23 [PATCH resend 0/8] tracing: Allow system call tracepoints to handle page faults Mathieu Desnoyers
` (4 preceding siblings ...)
2024-09-30 19:23 ` [PATCH resend 5/8] tracing: Allow system call tracepoints to handle page faults Mathieu Desnoyers
@ 2024-09-30 19:23 ` Mathieu Desnoyers
2024-10-28 17:42 ` Thomas Gleixner
2024-09-30 19:23 ` [PATCH resend 7/8] tracing/perf: " Mathieu Desnoyers
2024-09-30 19:23 ` [PATCH resend 8/8] tracing/bpf: " Mathieu Desnoyers
7 siblings, 1 reply; 13+ messages in thread
From: Mathieu Desnoyers @ 2024-09-30 19:23 UTC (permalink / raw)
To: Steven Rostedt, Masami Hiramatsu
Cc: linux-kernel, Mathieu Desnoyers, Peter Zijlstra,
Alexei Starovoitov, Yonghong Song, Paul E . McKenney, Ingo Molnar,
Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
Namhyung Kim, Andrii Nakryiko, bpf, Joel Fernandes,
linux-trace-kernel, Michael Jeanson
Add a might_fault() check to validate that the ftrace sys_enter/sys_exit
probe callbacks are indeed called from a context where page faults can
be handled.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Michael Jeanson <mjeanson@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf@vger.kernel.org
Cc: Joel Fernandes <joel@joelfernandes.org>
---
include/trace/trace_events.h | 1 +
kernel/trace/trace_syscalls.c | 2 ++
2 files changed, 3 insertions(+)
diff --git a/include/trace/trace_events.h b/include/trace/trace_events.h
index 0228d9ed94a3..e0d4850b0d77 100644
--- a/include/trace/trace_events.h
+++ b/include/trace/trace_events.h
@@ -446,6 +446,7 @@ __DECLARE_EVENT_CLASS(call, PARAMS(proto), PARAMS(args), PARAMS(tstruct), \
static notrace void \
trace_event_raw_event_##call(void *__data, proto) \
{ \
+ might_fault(); \
guard(preempt_notrace)(); \
do_trace_event_raw_event_##call(__data, args); \
}
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index a3d8ac00793e..0430890cbb42 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -303,6 +303,7 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
* Syscall probe called with preemption enabled, but the ring
* buffer and per-cpu data require preemption to be disabled.
*/
+ might_fault();
guard(preempt_notrace)();
syscall_nr = trace_get_syscall_nr(current, regs);
@@ -348,6 +349,7 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
* Syscall probe called with preemption enabled, but the ring
* buffer and per-cpu data require preemption to be disabled.
*/
+ might_fault();
guard(preempt_notrace)();
syscall_nr = trace_get_syscall_nr(current, regs);
--
2.39.2
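The placement of might_fault() before guard(preempt_notrace)() in the hunks above matters, because the check has to run while preemption is still enabled. The sketch below is a userspace C model of that ordering, not the kernel implementation. It relies on the GCC/Clang cleanup attribute, which is the mechanism the kernel's guard() helpers are built on; the model_* names, the counters, and the simplified warning rule are assumptions for illustration.

```c
#include <assert.h>

/* Userspace stand-ins; every name here is hypothetical. */
static int model_preempt_depth;		/* > 0 models "preemption disabled" */
static int model_fault_warnings;	/* counts modeled might_fault() complaints */

/*
 * Simplified model of might_fault(): complain when called from a context
 * that could not handle a page fault. The real check covers more cases.
 */
static void model_might_fault(void)
{
	if (model_preempt_depth > 0)
		model_fault_warnings++;
}

/* Cleanup handler: runs automatically when the guard variable leaves scope. */
static void model_preempt_guard_exit(int *unused)
{
	(void)unused;
	model_preempt_depth--;
}

/* Scope guard in the spirit of guard(preempt_notrace)(). */
#define MODEL_GUARD_PREEMPT() \
	int model_guard __attribute__((cleanup(model_preempt_guard_exit), unused)) = \
		(model_preempt_depth++, 0)

/* Ordering used by the patch: check first, then disable preemption. */
static int model_probe_patched_order(void)
{
	model_might_fault();
	MODEL_GUARD_PREEMPT();
	return model_preempt_depth;	/* evaluated while the guard is still held */
}

/* Reversed ordering: the check runs with preemption already disabled. */
static int model_probe_wrong_order(void)
{
	MODEL_GUARD_PREEMPT();
	model_might_fault();
	return model_preempt_depth;
}
```

In this model the patched ordering leaves model_fault_warnings untouched, while the reversed ordering increments it once. Both functions return with model_preempt_depth back at zero, because the cleanup handler runs on scope exit.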
* [PATCH resend 7/8] tracing/perf: Add might_fault check to syscall probes
2024-09-30 19:23 [PATCH resend 0/8] tracing: Allow system call tracepoints to handle page faults Mathieu Desnoyers
` (5 preceding siblings ...)
2024-09-30 19:23 ` [PATCH resend 6/8] tracing/ftrace: Add might_fault check to syscall probes Mathieu Desnoyers
@ 2024-09-30 19:23 ` Mathieu Desnoyers
2024-09-30 19:23 ` [PATCH resend 8/8] tracing/bpf: " Mathieu Desnoyers
7 siblings, 0 replies; 13+ messages in thread
From: Mathieu Desnoyers @ 2024-09-30 19:23 UTC (permalink / raw)
To: Steven Rostedt, Masami Hiramatsu
Cc: linux-kernel, Mathieu Desnoyers, Peter Zijlstra,
Alexei Starovoitov, Yonghong Song, Paul E . McKenney, Ingo Molnar,
Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
Namhyung Kim, Andrii Nakryiko, bpf, Joel Fernandes,
linux-trace-kernel, Michael Jeanson
Add a might_fault() check to validate that the perf sys_enter/sys_exit
probe callbacks are indeed called from a context where page faults can
be handled.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michael Jeanson <mjeanson@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf@vger.kernel.org
Cc: Joel Fernandes <joel@joelfernandes.org>
---
include/trace/perf.h | 1 +
kernel/trace/trace_syscalls.c | 2 ++
2 files changed, 3 insertions(+)
diff --git a/include/trace/perf.h b/include/trace/perf.h
index 5650c1bad088..321bfd7919f6 100644
--- a/include/trace/perf.h
+++ b/include/trace/perf.h
@@ -84,6 +84,7 @@ perf_trace_##call(void *__data, proto) \
u64 __count __attribute__((unused)); \
struct task_struct *__task __attribute__((unused)); \
\
+ might_fault(); \
guard(preempt_notrace)(); \
do_perf_trace_##call(__data, args); \
}
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 0430890cbb42..53faa791c735 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -600,6 +600,7 @@ static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id)
* Syscall probe called with preemption enabled, but the ring
* buffer and per-cpu data require preemption to be disabled.
*/
+ might_fault();
guard(preempt_notrace)();
syscall_nr = trace_get_syscall_nr(current, regs);
@@ -706,6 +707,7 @@ static void perf_syscall_exit(void *ignore, struct pt_regs *regs, long ret)
* Syscall probe called with preemption enabled, but the ring
* buffer and per-cpu data require preemption to be disabled.
*/
+ might_fault();
guard(preempt_notrace)();
syscall_nr = trace_get_syscall_nr(current, regs);
--
2.39.2
* [PATCH resend 8/8] tracing/bpf: Add might_fault check to syscall probes
2024-09-30 19:23 [PATCH resend 0/8] tracing: Allow system call tracepoints to handle page faults Mathieu Desnoyers
` (6 preceding siblings ...)
2024-09-30 19:23 ` [PATCH resend 7/8] tracing/perf: " Mathieu Desnoyers
@ 2024-09-30 19:23 ` Mathieu Desnoyers
7 siblings, 0 replies; 13+ messages in thread
From: Mathieu Desnoyers @ 2024-09-30 19:23 UTC (permalink / raw)
To: Steven Rostedt, Masami Hiramatsu
Cc: linux-kernel, Mathieu Desnoyers, Peter Zijlstra,
Alexei Starovoitov, Yonghong Song, Paul E . McKenney, Ingo Molnar,
Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
Namhyung Kim, Andrii Nakryiko, bpf, Joel Fernandes,
linux-trace-kernel, Andrii Nakryiko, Michael Jeanson
Add a might_fault() check to validate that the bpf sys_enter/sys_exit
probe callbacks are indeed called from a context where page faults can
be handled.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Andrii Nakryiko <andrii@kernel.org> # BPF parts
Cc: Michael Jeanson <mjeanson@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf@vger.kernel.org
Cc: Joel Fernandes <joel@joelfernandes.org>
---
include/trace/bpf_probe.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
index 211b98d45fc6..099df5c3e38a 100644
--- a/include/trace/bpf_probe.h
+++ b/include/trace/bpf_probe.h
@@ -57,6 +57,7 @@ __bpf_trace_##call(void *__data, proto) \
static notrace void \
__bpf_trace_##call(void *__data, proto) \
{ \
+ might_fault(); \
guard(preempt_notrace)(); \
CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(__data, CAST_TO_U64(args)); \
}
--
2.39.2
* Re: [PATCH resend 1/8] tracing: Declare system call tracepoints with TRACE_EVENT_SYSCALL
2024-09-30 19:23 ` [PATCH resend 1/8] tracing: Declare system call tracepoints with TRACE_EVENT_SYSCALL Mathieu Desnoyers
@ 2024-10-03 9:51 ` kernel test robot
2024-10-03 9:51 ` kernel test robot
1 sibling, 0 replies; 13+ messages in thread
From: kernel test robot @ 2024-10-03 9:51 UTC (permalink / raw)
To: Mathieu Desnoyers, Steven Rostedt, Masami Hiramatsu
Cc: oe-kbuild-all, linux-kernel, Mathieu Desnoyers, Peter Zijlstra,
Alexei Starovoitov, Yonghong Song, Paul E . McKenney, Ingo Molnar,
Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
Namhyung Kim, Andrii Nakryiko, bpf, Joel Fernandes,
linux-trace-kernel, Michael Jeanson
Hi Mathieu,
kernel test robot noticed the following build errors:
[auto build test ERROR on peterz-queue/sched/core]
[also build test ERROR on linus/master v6.12-rc1 next-20241003]
[cannot apply to rostedt-trace/for-next rostedt-trace/for-next-urgent tip/core/entry]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Mathieu-Desnoyers/tracing-Declare-system-call-tracepoints-with-TRACE_EVENT_SYSCALL/20241001-032827
base: https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/core
patch link: https://lore.kernel.org/r/20240930192357.1154417-2-mathieu.desnoyers%40efficios.com
patch subject: [PATCH resend 1/8] tracing: Declare system call tracepoints with TRACE_EVENT_SYSCALL
config: riscv-allnoconfig (https://download.01.org/0day-ci/archive/20241003/202410031716.sTBC2OLt-lkp@intel.com/config)
compiler: riscv64-linux-gcc (GCC) 14.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241003/202410031716.sTBC2OLt-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202410031716.sTBC2OLt-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from include/trace/syscall.h:5,
from include/linux/syscalls.h:93,
from include/linux/entry-common.h:7,
from kernel/entry/common.c:4:
include/trace/events/syscalls.h:20:18: error: expected ')' before 'struct'
20 | TP_PROTO(struct pt_regs *regs, long id),
| ^~~~~~
include/linux/tracepoint.h:106:25: note: in definition of macro 'PARAMS'
106 | #define PARAMS(args...) args
| ^~~~
include/linux/tracepoint.h:614:9: note: in expansion of macro 'DECLARE_TRACE_SYSCALL'
614 | DECLARE_TRACE_SYSCALL(name, PARAMS(proto), PARAMS(args))
| ^~~~~~~~~~~~~~~~~~~~~
include/linux/tracepoint.h:614:37: note: in expansion of macro 'PARAMS'
614 | DECLARE_TRACE_SYSCALL(name, PARAMS(proto), PARAMS(args))
| ^~~~~~
include/trace/events/syscalls.h:18:1: note: in expansion of macro 'TRACE_EVENT_SYSCALL'
18 | TRACE_EVENT_SYSCALL(sys_enter,
| ^~~~~~~~~~~~~~~~~~~
include/trace/events/syscalls.h:20:9: note: in expansion of macro 'TP_PROTO'
20 | TP_PROTO(struct pt_regs *regs, long id),
| ^~~~~~~~
include/trace/events/syscalls.h:46:18: error: expected ')' before 'struct'
46 | TP_PROTO(struct pt_regs *regs, long ret),
| ^~~~~~
include/linux/tracepoint.h:106:25: note: in definition of macro 'PARAMS'
106 | #define PARAMS(args...) args
| ^~~~
include/linux/tracepoint.h:614:9: note: in expansion of macro 'DECLARE_TRACE_SYSCALL'
614 | DECLARE_TRACE_SYSCALL(name, PARAMS(proto), PARAMS(args))
| ^~~~~~~~~~~~~~~~~~~~~
include/linux/tracepoint.h:614:37: note: in expansion of macro 'PARAMS'
614 | DECLARE_TRACE_SYSCALL(name, PARAMS(proto), PARAMS(args))
| ^~~~~~
include/trace/events/syscalls.h:44:1: note: in expansion of macro 'TRACE_EVENT_SYSCALL'
44 | TRACE_EVENT_SYSCALL(sys_exit,
| ^~~~~~~~~~~~~~~~~~~
include/trace/events/syscalls.h:46:9: note: in expansion of macro 'TP_PROTO'
46 | TP_PROTO(struct pt_regs *regs, long ret),
| ^~~~~~~~
kernel/entry/common.c: In function 'syscall_trace_enter':
>> kernel/entry/common.c:61:17: error: implicit declaration of function 'trace_syscall_sys_enter' [-Wimplicit-function-declaration]
61 | trace_syscall_sys_enter(regs, syscall);
| ^~~~~~~~~~~~~~~~~~~~~~~
kernel/entry/common.c: In function 'syscall_exit_work':
>> kernel/entry/common.c:169:17: error: implicit declaration of function 'trace_syscall_sys_exit' [-Wimplicit-function-declaration]
169 | trace_syscall_sys_exit(regs, syscall_get_return_value(current, regs));
| ^~~~~~~~~~~~~~~~~~~~~~
vim +/trace_syscall_sys_enter +61 kernel/entry/common.c
27
28 long syscall_trace_enter(struct pt_regs *regs, long syscall,
29 unsigned long work)
30 {
31 long ret = 0;
32
33 /*
34 * Handle Syscall User Dispatch. This must comes first, since
35 * the ABI here can be something that doesn't make sense for
36 * other syscall_work features.
37 */
38 if (work & SYSCALL_WORK_SYSCALL_USER_DISPATCH) {
39 if (syscall_user_dispatch(regs))
40 return -1L;
41 }
42
43 /* Handle ptrace */
44 if (work & (SYSCALL_WORK_SYSCALL_TRACE | SYSCALL_WORK_SYSCALL_EMU)) {
45 ret = ptrace_report_syscall_entry(regs);
46 if (ret || (work & SYSCALL_WORK_SYSCALL_EMU))
47 return -1L;
48 }
49
50 /* Do seccomp after ptrace, to catch any tracer changes. */
51 if (work & SYSCALL_WORK_SECCOMP) {
52 ret = __secure_computing(NULL);
53 if (ret == -1L)
54 return ret;
55 }
56
57 /* Either of the above might have changed the syscall number */
58 syscall = syscall_get_nr(current, regs);
59
60 if (unlikely(work & SYSCALL_WORK_SYSCALL_TRACEPOINT)) {
> 61 trace_syscall_sys_enter(regs, syscall);
62 /*
63 * Probes or BPF hooks in the tracepoint may have changed the
64 * system call number as well.
65 */
66 syscall = syscall_get_nr(current, regs);
67 }
68
69 syscall_enter_audit(regs, syscall);
70
71 return ret ? : syscall;
72 }
73
74 noinstr void syscall_enter_from_user_mode_prepare(struct pt_regs *regs)
75 {
76 enter_from_user_mode(regs);
77 instrumentation_begin();
78 local_irq_enable();
79 instrumentation_end();
80 }
81
82 /* Workaround to allow gradual conversion of architecture code */
83 void __weak arch_do_signal_or_restart(struct pt_regs *regs) { }
84
85 /**
86 * exit_to_user_mode_loop - do any pending work before leaving to user space
87 * @regs: Pointer to pt_regs on entry stack
88 * @ti_work: TIF work flags as read by the caller
89 */
90 __always_inline unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
91 unsigned long ti_work)
92 {
93 /*
94 * Before returning to user space ensure that all pending work
95 * items have been completed.
96 */
97 while (ti_work & EXIT_TO_USER_MODE_WORK) {
98
99 local_irq_enable_exit_to_user(ti_work);
100
101 if (ti_work & _TIF_NEED_RESCHED)
102 schedule();
103
104 if (ti_work & _TIF_UPROBE)
105 uprobe_notify_resume(regs);
106
107 if (ti_work & _TIF_PATCH_PENDING)
108 klp_update_patch_state(current);
109
110 if (ti_work & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL))
111 arch_do_signal_or_restart(regs);
112
113 if (ti_work & _TIF_NOTIFY_RESUME)
114 resume_user_mode_work(regs);
115
116 /* Architecture specific TIF work */
117 arch_exit_to_user_mode_work(regs, ti_work);
118
119 /*
120 * Disable interrupts and reevaluate the work flags as they
121 * might have changed while interrupts and preemption was
122 * enabled above.
123 */
124 local_irq_disable_exit_to_user();
125
126 /* Check if any of the above work has queued a deferred wakeup */
127 tick_nohz_user_enter_prepare();
128
129 ti_work = read_thread_flags();
130 }
131
132 /* Return the latest work state for arch_exit_to_user_mode() */
133 return ti_work;
134 }
135
136 /*
137 * If SYSCALL_EMU is set, then the only reason to report is when
138 * SINGLESTEP is set (i.e. PTRACE_SYSEMU_SINGLESTEP). This syscall
139 * instruction has been already reported in syscall_enter_from_user_mode().
140 */
141 static inline bool report_single_step(unsigned long work)
142 {
143 if (work & SYSCALL_WORK_SYSCALL_EMU)
144 return false;
145
146 return work & SYSCALL_WORK_SYSCALL_EXIT_TRAP;
147 }
148
149 static void syscall_exit_work(struct pt_regs *regs, unsigned long work)
150 {
151 bool step;
152
153 /*
154 * If the syscall was rolled back due to syscall user dispatching,
155 * then the tracers below are not invoked for the same reason as
156 * the entry side was not invoked in syscall_trace_enter(): The ABI
157 * of these syscalls is unknown.
158 */
159 if (work & SYSCALL_WORK_SYSCALL_USER_DISPATCH) {
160 if (unlikely(current->syscall_dispatch.on_dispatch)) {
161 current->syscall_dispatch.on_dispatch = false;
162 return;
163 }
164 }
165
166 audit_syscall_exit(regs);
167
168 if (work & SYSCALL_WORK_SYSCALL_TRACEPOINT)
> 169 trace_syscall_sys_exit(regs, syscall_get_return_value(current, regs));
170
171 step = report_single_step(work);
172 if (step || work & SYSCALL_WORK_SYSCALL_TRACE)
173 ptrace_report_syscall_exit(regs, step);
174 }
175
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
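One plausible reading of the errors above, offered as an assumption rather than a diagnosis, is that DECLARE_TRACE_SYSCALL has no definition in the configuration being built (allnoconfig). The compiler would then parse the invocation as a declaration and stop at 'struct'. The sketch below shows the general pattern of providing a no-op stub in the disabled branch so that callers still compile. The MODEL_* names and the MODEL_TRACEPOINTS switch are hypothetical and are not taken from include/linux/tracepoint.h.

```c
#include <assert.h>

/* Hypothetical feature switch standing in for a Kconfig option. */
#ifndef MODEL_TRACEPOINTS
#define MODEL_TRACEPOINTS 0
#endif

/* Counts how many times an enabled tracepoint fired. */
static int model_hits;

/* Pass a comma-separated list through as a single macro argument. */
#define MODEL_PARAMS(...) __VA_ARGS__

#if MODEL_TRACEPOINTS
/* Enabled build: the generated trace function records the hit. */
#define MODEL_DECLARE_TRACE_SYSCALL(name, proto, args)		\
	static inline void model_trace_##name(proto)		\
	{							\
		model_hits++;					\
	}
#else
/*
 * Disabled build: still define the macro, expanding to an empty inline
 * function, so call sites compile and the call costs nothing.
 */
#define MODEL_DECLARE_TRACE_SYSCALL(name, proto, args)		\
	static inline void model_trace_##name(proto)		\
	{							\
	}
#endif

/* Declares model_trace_sys_enter(long id) in either configuration. */
MODEL_DECLARE_TRACE_SYSCALL(sys_enter, MODEL_PARAMS(long id), MODEL_PARAMS(id))
```

With MODEL_TRACEPOINTS left at 0, a call to model_trace_sys_enter() compiles and does nothing. If the #else branch were missing, the same declaration line would fail in much the same way as the report above.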
* Re: [PATCH resend 1/8] tracing: Declare system call tracepoints with TRACE_EVENT_SYSCALL
2024-09-30 19:23 ` [PATCH resend 1/8] tracing: Declare system call tracepoints with TRACE_EVENT_SYSCALL Mathieu Desnoyers
2024-10-03 9:51 ` kernel test robot
@ 2024-10-03 9:51 ` kernel test robot
1 sibling, 0 replies; 13+ messages in thread
From: kernel test robot @ 2024-10-03 9:51 UTC (permalink / raw)
To: Mathieu Desnoyers, Steven Rostedt, Masami Hiramatsu
Cc: oe-kbuild-all, linux-kernel, Mathieu Desnoyers, Peter Zijlstra,
Alexei Starovoitov, Yonghong Song, Paul E . McKenney, Ingo Molnar,
Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
Namhyung Kim, Andrii Nakryiko, bpf, Joel Fernandes,
linux-trace-kernel, Michael Jeanson
Hi Mathieu,
kernel test robot noticed the following build errors:
[auto build test ERROR on peterz-queue/sched/core]
[also build test ERROR on linus/master v6.12-rc1 next-20241003]
[cannot apply to rostedt-trace/for-next rostedt-trace/for-next-urgent tip/core/entry]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Mathieu-Desnoyers/tracing-Declare-system-call-tracepoints-with-TRACE_EVENT_SYSCALL/20241001-032827
base: https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/core
patch link: https://lore.kernel.org/r/20240930192357.1154417-2-mathieu.desnoyers%40efficios.com
patch subject: [PATCH resend 1/8] tracing: Declare system call tracepoints with TRACE_EVENT_SYSCALL
config: powerpc-allnoconfig (https://download.01.org/0day-ci/archive/20241003/202410031750.cFIt2Rmx-lkp@intel.com/config)
compiler: powerpc-linux-gcc (GCC) 14.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241003/202410031750.cFIt2Rmx-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202410031750.cFIt2Rmx-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from include/trace/syscall.h:5,
from include/linux/syscalls.h:93,
from arch/powerpc/kernel/ptrace/ptrace.c:19:
include/trace/events/syscalls.h:20:18: error: expected ')' before 'struct'
20 | TP_PROTO(struct pt_regs *regs, long id),
| ^~~~~~
include/linux/tracepoint.h:106:25: note: in definition of macro 'PARAMS'
106 | #define PARAMS(args...) args
| ^~~~
include/linux/tracepoint.h:614:9: note: in expansion of macro 'DECLARE_TRACE_SYSCALL'
614 | DECLARE_TRACE_SYSCALL(name, PARAMS(proto), PARAMS(args))
| ^~~~~~~~~~~~~~~~~~~~~
include/linux/tracepoint.h:614:37: note: in expansion of macro 'PARAMS'
614 | DECLARE_TRACE_SYSCALL(name, PARAMS(proto), PARAMS(args))
| ^~~~~~
include/trace/events/syscalls.h:18:1: note: in expansion of macro 'TRACE_EVENT_SYSCALL'
18 | TRACE_EVENT_SYSCALL(sys_enter,
| ^~~~~~~~~~~~~~~~~~~
include/trace/events/syscalls.h:20:9: note: in expansion of macro 'TP_PROTO'
20 | TP_PROTO(struct pt_regs *regs, long id),
| ^~~~~~~~
include/trace/events/syscalls.h:46:18: error: expected ')' before 'struct'
46 | TP_PROTO(struct pt_regs *regs, long ret),
| ^~~~~~
include/linux/tracepoint.h:106:25: note: in definition of macro 'PARAMS'
106 | #define PARAMS(args...) args
| ^~~~
include/linux/tracepoint.h:614:9: note: in expansion of macro 'DECLARE_TRACE_SYSCALL'
614 | DECLARE_TRACE_SYSCALL(name, PARAMS(proto), PARAMS(args))
| ^~~~~~~~~~~~~~~~~~~~~
include/linux/tracepoint.h:614:37: note: in expansion of macro 'PARAMS'
614 | DECLARE_TRACE_SYSCALL(name, PARAMS(proto), PARAMS(args))
| ^~~~~~
include/trace/events/syscalls.h:44:1: note: in expansion of macro 'TRACE_EVENT_SYSCALL'
44 | TRACE_EVENT_SYSCALL(sys_exit,
| ^~~~~~~~~~~~~~~~~~~
include/trace/events/syscalls.h:46:9: note: in expansion of macro 'TP_PROTO'
46 | TP_PROTO(struct pt_regs *regs, long ret),
| ^~~~~~~~
arch/powerpc/kernel/ptrace/ptrace.c: In function 'do_syscall_trace_enter':
>> arch/powerpc/kernel/ptrace/ptrace.c:298:17: error: implicit declaration of function 'trace_sys_enter'; did you mean 'ftrace_nmi_enter'? [-Wimplicit-function-declaration]
298 | trace_sys_enter(regs, regs->gpr[0]);
| ^~~~~~~~~~~~~~~
| ftrace_nmi_enter
arch/powerpc/kernel/ptrace/ptrace.c: In function 'do_syscall_trace_leave':
>> arch/powerpc/kernel/ptrace/ptrace.c:329:17: error: implicit declaration of function 'trace_sys_exit'; did you mean 'ftrace_nmi_exit'? [-Wimplicit-function-declaration]
329 | trace_sys_exit(regs, regs->result);
| ^~~~~~~~~~~~~~
| ftrace_nmi_exit
vim +298 arch/powerpc/kernel/ptrace/ptrace.c
2449acc5348b94 arch/powerpc/kernel/ptrace.c Michael Ellerman 2015-07-23 235
d38374142b2560 arch/powerpc/kernel/ptrace.c Michael Ellerman 2015-07-23 236 /**
d38374142b2560 arch/powerpc/kernel/ptrace.c Michael Ellerman 2015-07-23 237 * do_syscall_trace_enter() - Do syscall tracing on kernel entry.
d38374142b2560 arch/powerpc/kernel/ptrace.c Michael Ellerman 2015-07-23 238 * @regs: the pt_regs of the task to trace (current)
d38374142b2560 arch/powerpc/kernel/ptrace.c Michael Ellerman 2015-07-23 239 *
d38374142b2560 arch/powerpc/kernel/ptrace.c Michael Ellerman 2015-07-23 240 * Performs various types of tracing on syscall entry. This includes seccomp,
d38374142b2560 arch/powerpc/kernel/ptrace.c Michael Ellerman 2015-07-23 241 * ptrace, syscall tracepoints and audit.
d38374142b2560 arch/powerpc/kernel/ptrace.c Michael Ellerman 2015-07-23 242 *
d38374142b2560 arch/powerpc/kernel/ptrace.c Michael Ellerman 2015-07-23 243 * The pt_regs are potentially visible to userspace via ptrace, so their
d38374142b2560 arch/powerpc/kernel/ptrace.c Michael Ellerman 2015-07-23 244 * contents is ABI.
d38374142b2560 arch/powerpc/kernel/ptrace.c Michael Ellerman 2015-07-23 245 *
d38374142b2560 arch/powerpc/kernel/ptrace.c Michael Ellerman 2015-07-23 246 * One or more of the tracers may modify the contents of pt_regs, in particular
d38374142b2560 arch/powerpc/kernel/ptrace.c Michael Ellerman 2015-07-23 247 * to modify arguments or even the syscall number itself.
d38374142b2560 arch/powerpc/kernel/ptrace.c Michael Ellerman 2015-07-23 248 *
d38374142b2560 arch/powerpc/kernel/ptrace.c Michael Ellerman 2015-07-23 249 * It's also possible that a tracer can choose to reject the system call. In
d38374142b2560 arch/powerpc/kernel/ptrace.c Michael Ellerman 2015-07-23 250 * that case this function will return an illegal syscall number, and will put
d38374142b2560 arch/powerpc/kernel/ptrace.c Michael Ellerman 2015-07-23 251 * an appropriate return value in regs->r3.
d38374142b2560 arch/powerpc/kernel/ptrace.c Michael Ellerman 2015-07-23 252 *
d38374142b2560 arch/powerpc/kernel/ptrace.c Michael Ellerman 2015-07-23 253 * Return: the (possibly changed) syscall number.
^1da177e4c3f41 arch/ppc/kernel/ptrace.c Linus Torvalds 2005-04-16 254 */
4f72c4279eab1e arch/powerpc/kernel/ptrace.c Roland McGrath 2008-07-27 255 long do_syscall_trace_enter(struct pt_regs *regs)
ea9c102cb0a796 arch/ppc/kernel/ptrace.c David Woodhouse 2005-05-08 256 {
8dbdec0bcb416d arch/powerpc/kernel/ptrace.c Dmitry V. Levin 2018-12-16 257 u32 flags;
8dbdec0bcb416d arch/powerpc/kernel/ptrace.c Dmitry V. Levin 2018-12-16 258
985faa78687de6 arch/powerpc/kernel/ptrace/ptrace.c Mark Rutland 2021-11-29 259 flags = read_thread_flags() & (_TIF_SYSCALL_EMU | _TIF_SYSCALL_TRACE);
8dbdec0bcb416d arch/powerpc/kernel/ptrace.c Dmitry V. Levin 2018-12-16 260
8dbdec0bcb416d arch/powerpc/kernel/ptrace.c Dmitry V. Levin 2018-12-16 261 if (flags) {
153474ba1a4aed arch/powerpc/kernel/ptrace/ptrace.c Eric W. Biederman 2022-01-27 262 int rc = ptrace_report_syscall_entry(regs);
8dbdec0bcb416d arch/powerpc/kernel/ptrace.c Dmitry V. Levin 2018-12-16 263
8dbdec0bcb416d arch/powerpc/kernel/ptrace.c Dmitry V. Levin 2018-12-16 264 if (unlikely(flags & _TIF_SYSCALL_EMU)) {
5521eb4bca2db7 arch/powerpc/kernel/ptrace.c Breno Leitao 2018-09-20 265 /*
8dbdec0bcb416d arch/powerpc/kernel/ptrace.c Dmitry V. Levin 2018-12-16 266 * A nonzero return code from
153474ba1a4aed arch/powerpc/kernel/ptrace/ptrace.c Eric W. Biederman 2022-01-27 267 * ptrace_report_syscall_entry() tells us to prevent
8dbdec0bcb416d arch/powerpc/kernel/ptrace.c Dmitry V. Levin 2018-12-16 268 * the syscall execution, but we are not going to
8dbdec0bcb416d arch/powerpc/kernel/ptrace.c Dmitry V. Levin 2018-12-16 269 * execute it anyway.
a225f156740555 arch/powerpc/kernel/ptrace.c Elvira Khabirova 2018-12-07 270 *
8dbdec0bcb416d arch/powerpc/kernel/ptrace.c Dmitry V. Levin 2018-12-16 271 * Returning -1 will skip the syscall execution. We want
8dbdec0bcb416d arch/powerpc/kernel/ptrace.c Dmitry V. Levin 2018-12-16 272 * to avoid clobbering any registers, so we don't goto
8dbdec0bcb416d arch/powerpc/kernel/ptrace.c Dmitry V. Levin 2018-12-16 273 * the skip label below.
5521eb4bca2db7 arch/powerpc/kernel/ptrace.c Breno Leitao 2018-09-20 274 */
5521eb4bca2db7 arch/powerpc/kernel/ptrace.c Breno Leitao 2018-09-20 275 return -1;
5521eb4bca2db7 arch/powerpc/kernel/ptrace.c Breno Leitao 2018-09-20 276 }
5521eb4bca2db7 arch/powerpc/kernel/ptrace.c Breno Leitao 2018-09-20 277
8dbdec0bcb416d arch/powerpc/kernel/ptrace.c Dmitry V. Levin 2018-12-16 278 if (rc) {
4f72c4279eab1e arch/powerpc/kernel/ptrace.c Roland McGrath 2008-07-27 279 /*
8dbdec0bcb416d arch/powerpc/kernel/ptrace.c Dmitry V. Levin 2018-12-16 280 * The tracer decided to abort the syscall. Note that
8dbdec0bcb416d arch/powerpc/kernel/ptrace.c Dmitry V. Levin 2018-12-16 281 * the tracer may also just change regs->gpr[0] to an
8dbdec0bcb416d arch/powerpc/kernel/ptrace.c Dmitry V. Levin 2018-12-16 282 * invalid syscall number, that is handled below on the
8dbdec0bcb416d arch/powerpc/kernel/ptrace.c Dmitry V. Levin 2018-12-16 283 * exit path.
4f72c4279eab1e arch/powerpc/kernel/ptrace.c Roland McGrath 2008-07-27 284 */
1addc57e111b92 arch/powerpc/kernel/ptrace.c Kees Cook 2016-06-02 285 goto skip;
8dbdec0bcb416d arch/powerpc/kernel/ptrace.c Dmitry V. Levin 2018-12-16 286 }
8dbdec0bcb416d arch/powerpc/kernel/ptrace.c Dmitry V. Levin 2018-12-16 287 }
1addc57e111b92 arch/powerpc/kernel/ptrace.c Kees Cook 2016-06-02 288
1addc57e111b92 arch/powerpc/kernel/ptrace.c Kees Cook 2016-06-02 289 /* Run seccomp after ptrace; allow it to set gpr[3]. */
1addc57e111b92 arch/powerpc/kernel/ptrace.c Kees Cook 2016-06-02 290 if (do_seccomp(regs))
1addc57e111b92 arch/powerpc/kernel/ptrace.c Kees Cook 2016-06-02 291 return -1;
1addc57e111b92 arch/powerpc/kernel/ptrace.c Kees Cook 2016-06-02 292
1addc57e111b92 arch/powerpc/kernel/ptrace.c Kees Cook 2016-06-02 293 /* Avoid trace and audit when syscall is invalid. */
1addc57e111b92 arch/powerpc/kernel/ptrace.c Kees Cook 2016-06-02 294 if (regs->gpr[0] >= NR_syscalls)
1addc57e111b92 arch/powerpc/kernel/ptrace.c Kees Cook 2016-06-02 295 goto skip;
ea9c102cb0a796 arch/ppc/kernel/ptrace.c David Woodhouse 2005-05-08 296
02424d8966d803 arch/powerpc/kernel/ptrace.c Ian Munsie 2011-02-02 297 if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
02424d8966d803 arch/powerpc/kernel/ptrace.c Ian Munsie 2011-02-02 @298 trace_sys_enter(regs, regs->gpr[0]);
02424d8966d803 arch/powerpc/kernel/ptrace.c Ian Munsie 2011-02-02 299
cab175f9fa2973 arch/powerpc/kernel/ptrace.c Denis Kirjanov 2010-08-27 300 if (!is_32bit_task())
91397401bb5072 arch/powerpc/kernel/ptrace.c Eric Paris 2014-03-11 301 audit_syscall_entry(regs->gpr[0], regs->gpr[3], regs->gpr[4],
ea9c102cb0a796 arch/ppc/kernel/ptrace.c David Woodhouse 2005-05-08 302 regs->gpr[5], regs->gpr[6]);
cfcd1705b61ecc arch/powerpc/kernel/ptrace.c David Woodhouse 2007-01-14 303 else
91397401bb5072 arch/powerpc/kernel/ptrace.c Eric Paris 2014-03-11 304 audit_syscall_entry(regs->gpr[0],
cfcd1705b61ecc arch/powerpc/kernel/ptrace.c David Woodhouse 2007-01-14 305 regs->gpr[3] & 0xffffffff,
cfcd1705b61ecc arch/powerpc/kernel/ptrace.c David Woodhouse 2007-01-14 306 regs->gpr[4] & 0xffffffff,
cfcd1705b61ecc arch/powerpc/kernel/ptrace.c David Woodhouse 2007-01-14 307 regs->gpr[5] & 0xffffffff,
cfcd1705b61ecc arch/powerpc/kernel/ptrace.c David Woodhouse 2007-01-14 308 regs->gpr[6] & 0xffffffff);
4f72c4279eab1e arch/powerpc/kernel/ptrace.c Roland McGrath 2008-07-27 309
1addc57e111b92 arch/powerpc/kernel/ptrace.c Kees Cook 2016-06-02 310 /* Return the possibly modified but valid syscall number */
1addc57e111b92 arch/powerpc/kernel/ptrace.c Kees Cook 2016-06-02 311 return regs->gpr[0];
1addc57e111b92 arch/powerpc/kernel/ptrace.c Kees Cook 2016-06-02 312
1addc57e111b92 arch/powerpc/kernel/ptrace.c Kees Cook 2016-06-02 313 skip:
d38374142b2560 arch/powerpc/kernel/ptrace.c Michael Ellerman 2015-07-23 314 /*
d38374142b2560 arch/powerpc/kernel/ptrace.c Michael Ellerman 2015-07-23 315 * If we are aborting explicitly, or if the syscall number is
d38374142b2560 arch/powerpc/kernel/ptrace.c Michael Ellerman 2015-07-23 316 * now invalid, set the return value to -ENOSYS.
d38374142b2560 arch/powerpc/kernel/ptrace.c Michael Ellerman 2015-07-23 317 */
d38374142b2560 arch/powerpc/kernel/ptrace.c Michael Ellerman 2015-07-23 318 regs->gpr[3] = -ENOSYS;
d38374142b2560 arch/powerpc/kernel/ptrace.c Michael Ellerman 2015-07-23 319 return -1;
d38374142b2560 arch/powerpc/kernel/ptrace.c Michael Ellerman 2015-07-23 320 }
d38374142b2560 arch/powerpc/kernel/ptrace.c Michael Ellerman 2015-07-23 321
ea9c102cb0a796 arch/ppc/kernel/ptrace.c David Woodhouse 2005-05-08 322 void do_syscall_trace_leave(struct pt_regs *regs)
ea9c102cb0a796 arch/ppc/kernel/ptrace.c David Woodhouse 2005-05-08 323 {
4f72c4279eab1e arch/powerpc/kernel/ptrace.c Roland McGrath 2008-07-27 324 int step;
4f72c4279eab1e arch/powerpc/kernel/ptrace.c Roland McGrath 2008-07-27 325
d7e7528bcd456f arch/powerpc/kernel/ptrace.c Eric Paris 2012-01-03 326 audit_syscall_exit(regs);
ea9c102cb0a796 arch/ppc/kernel/ptrace.c David Woodhouse 2005-05-08 327
02424d8966d803 arch/powerpc/kernel/ptrace.c Ian Munsie 2011-02-02 328 if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
02424d8966d803 arch/powerpc/kernel/ptrace.c Ian Munsie 2011-02-02 @329 trace_sys_exit(regs, regs->result);
02424d8966d803 arch/powerpc/kernel/ptrace.c Ian Munsie 2011-02-02 330
4f72c4279eab1e arch/powerpc/kernel/ptrace.c Roland McGrath 2008-07-27 331 step = test_thread_flag(TIF_SINGLESTEP);
4f72c4279eab1e arch/powerpc/kernel/ptrace.c Roland McGrath 2008-07-27 332 if (step || test_thread_flag(TIF_SYSCALL_TRACE))
153474ba1a4aed arch/powerpc/kernel/ptrace/ptrace.c Eric W. Biederman 2022-01-27 333 ptrace_report_syscall_exit(regs, step);
ea9c102cb0a796 arch/ppc/kernel/ptrace.c David Woodhouse 2005-05-08 334 }
002af9391bfbe8 arch/powerpc/kernel/ptrace.c Michael Ellerman 2018-10-12 335
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH resend 6/8] tracing/ftrace: Add might_fault check to syscall probes
2024-09-30 19:23 ` [PATCH resend 6/8] tracing/ftrace: Add might_fault check to syscall probes Mathieu Desnoyers
@ 2024-10-28 17:42 ` Thomas Gleixner
2024-10-28 19:02 ` Mathieu Desnoyers
0 siblings, 1 reply; 13+ messages in thread
From: Thomas Gleixner @ 2024-10-28 17:42 UTC (permalink / raw)
To: Mathieu Desnoyers, Steven Rostedt, Masami Hiramatsu
Cc: linux-kernel, Mathieu Desnoyers, Peter Zijlstra,
Alexei Starovoitov, Yonghong Song, Paul E . McKenney, Ingo Molnar,
Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
Namhyung Kim, Andrii Nakryiko, bpf, Joel Fernandes,
linux-trace-kernel, Michael Jeanson
On Mon, Sep 30 2024 at 15:23, Mathieu Desnoyers wrote:
> diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
> index a3d8ac00793e..0430890cbb42 100644
> --- a/kernel/trace/trace_syscalls.c
> +++ b/kernel/trace/trace_syscalls.c
> @@ -303,6 +303,7 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
> * Syscall probe called with preemption enabled, but the ring
> * buffer and per-cpu data require preemption to be disabled.
> */
> + might_fault();
> guard(preempt_notrace)();
I find it odd that the might_fault() check is in all the implementations
and not in the tracepoint itself:
if (syscall) {
might_fault();
rcu_read_unlock_trace();
} else ...
That's where I would have expected it to be.
Thanks,
tglx
* Re: [PATCH resend 6/8] tracing/ftrace: Add might_fault check to syscall probes
2024-10-28 17:42 ` Thomas Gleixner
@ 2024-10-28 19:02 ` Mathieu Desnoyers
0 siblings, 0 replies; 13+ messages in thread
From: Mathieu Desnoyers @ 2024-10-28 19:02 UTC (permalink / raw)
To: Thomas Gleixner, Steven Rostedt, Masami Hiramatsu
Cc: linux-kernel, Peter Zijlstra, Alexei Starovoitov, Yonghong Song,
Paul E . McKenney, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Namhyung Kim, Andrii Nakryiko,
bpf, Joel Fernandes, linux-trace-kernel, Michael Jeanson
On 2024-10-28 13:42, Thomas Gleixner wrote:
> On Mon, Sep 30 2024 at 15:23, Mathieu Desnoyers wrote:
>> diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
>> index a3d8ac00793e..0430890cbb42 100644
>> --- a/kernel/trace/trace_syscalls.c
>> +++ b/kernel/trace/trace_syscalls.c
>> @@ -303,6 +303,7 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
>> * Syscall probe called with preemption enabled, but the ring
>> * buffer and per-cpu data require preemption to be disabled.
>> */
>> + might_fault();
>> guard(preempt_notrace)();
>
> I find it odd that the might_fault() check is in all the implementations
> and not in the tracepoint itself:
>
> if (syscall) {
> might_fault();
> rcu_read_unlock_trace();
> } else ...
>
> That's where I would have expected it to be.
You raise a good point: we should also add a might_fault() check in
__DO_TRACE() in the syscall case, so we can catch incorrect use of the
syscall tracepoint even if no probes are registered to it.
I've added the might_fault() in each tracer syscall probe to make sure
a tracer doesn't end up registering a faultable probe on a tracepoint
protected with preempt_disable by mistake. It validates that the tracers
are using the tracepoint registration as expected.
I'll prepare a separate patch adding this and will add it to this
series.
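To illustrate why the two check locations are complementary, here is a
minimal userspace model. It is only a sketch, not kernel code: every
name in it (model_might_fault, model_tracepoint, model_do_trace,
model_faultable_probe, preempt_depth, fault_warnings) is hypothetical,
and preempt_depth merely stands in for the kernel's preemption state.

```c
/* Userspace model only; none of these names exist in the kernel. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_PROBES 4

static int preempt_depth;   /* > 0 models "preemption disabled" */
static int fault_warnings;  /* counts calls made where a fault would be a bug */

/* Model of might_fault(): warn when called from a non-faultable context. */
static void model_might_fault(void)
{
	if (preempt_depth > 0)
		fault_warnings++;
}

typedef void (*probe_fn)(long id);

struct model_tracepoint {
	bool syscall;                      /* true: probes are allowed to fault */
	probe_fn probes[MODEL_MAX_PROBES];
	size_t nr_probes;
};

static void model_register(struct model_tracepoint *tp, probe_fn fn)
{
	assert(tp->nr_probes < MODEL_MAX_PROBES);
	tp->probes[tp->nr_probes++] = fn;
}

/*
 * Call-site check: validates the caller's context even when no probe
 * is registered on the tracepoint.
 */
static void model_do_trace(struct model_tracepoint *tp, long id)
{
	if (tp->syscall)
		model_might_fault();
	else
		preempt_depth++;           /* models preempt_disable_notrace() */

	for (size_t i = 0; i < tp->nr_probes; i++)
		tp->probes[i](id);

	if (!tp->syscall)
		preempt_depth--;           /* models preempt_enable_notrace() */
}

/*
 * Probe-side check: catches a faultable probe that was registered on a
 * tracepoint whose probes run with preemption disabled.
 */
static void model_faultable_probe(long id)
{
	(void)id;
	model_might_fault();
}
```

In this model the call-site check fires when a syscall tracepoint is
hit from a non-faultable context, with or without probes attached,
while the probe-side check fires when a faultable probe ends up on a
preempt-disabled tracepoint. Neither check covers the other's case.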
Thanks,
Mathieu
>
> Thanks,
>
> tglx
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
Thread overview: 13+ messages
2024-09-30 19:23 [PATCH resend 0/8] tracing: Allow system call tracepoints to handle page faults Mathieu Desnoyers
2024-09-30 19:23 ` [PATCH resend 1/8] tracing: Declare system call tracepoints with TRACE_EVENT_SYSCALL Mathieu Desnoyers
2024-10-03 9:51 ` kernel test robot
2024-10-03 9:51 ` kernel test robot
2024-09-30 19:23 ` [PATCH resend 2/8] tracing/ftrace: guard syscall probe with preempt_notrace Mathieu Desnoyers
2024-09-30 19:23 ` [PATCH resend 3/8] tracing/perf: " Mathieu Desnoyers
2024-09-30 19:23 ` [PATCH resend 4/8] tracing/bpf: " Mathieu Desnoyers
2024-09-30 19:23 ` [PATCH resend 5/8] tracing: Allow system call tracepoints to handle page faults Mathieu Desnoyers
2024-09-30 19:23 ` [PATCH resend 6/8] tracing/ftrace: Add might_fault check to syscall probes Mathieu Desnoyers
2024-10-28 17:42 ` Thomas Gleixner
2024-10-28 19:02 ` Mathieu Desnoyers
2024-09-30 19:23 ` [PATCH resend 7/8] tracing/perf: " Mathieu Desnoyers
2024-09-30 19:23 ` [PATCH resend 8/8] tracing/bpf: " Mathieu Desnoyers