* [PATCH v2 1/7] tracing/ftrace: guard syscall probe with preempt_notrace
2024-10-04 1:11 [PATCH v2 0/7] tracing: Allow system call tracepoints to handle page faults Mathieu Desnoyers
@ 2024-10-04 1:11 ` Mathieu Desnoyers
2024-10-04 1:11 ` [PATCH v2 2/7] tracing/perf: " Mathieu Desnoyers
` (6 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Mathieu Desnoyers @ 2024-10-04 1:11 UTC
To: Steven Rostedt, Masami Hiramatsu
Cc: linux-kernel, Mathieu Desnoyers, Peter Zijlstra,
Alexei Starovoitov, Yonghong Song, Paul E . McKenney, Ingo Molnar,
Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
Namhyung Kim, Andrii Nakryiko, bpf, Joel Fernandes,
linux-trace-kernel, Michael Jeanson
In preparation for allowing system call enter/exit instrumentation to
handle page faults, make sure that ftrace can handle this change by
explicitly disabling preemption within the ftrace system call tracepoint
probes to respect the current expectations within ftrace ring buffer
code.
This change does not yet allow ftrace to take page faults per se within
its probe, but allows its existing probes to adapt to the upcoming
change.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Michael Jeanson <mjeanson@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf@vger.kernel.org
Cc: Joel Fernandes <joel@joelfernandes.org>
---
include/trace/trace_events.h | 38 ++++++++++++++++++++++++++++-------
kernel/trace/trace_syscalls.c | 12 +++++++++++
2 files changed, 43 insertions(+), 7 deletions(-)
diff --git a/include/trace/trace_events.h b/include/trace/trace_events.h
index 8bcbb9ee44de..0228d9ed94a3 100644
--- a/include/trace/trace_events.h
+++ b/include/trace/trace_events.h
@@ -263,6 +263,9 @@ static struct trace_event_fields trace_event_fields_##call[] = { \
tstruct \
{} };
+#undef DECLARE_EVENT_SYSCALL_CLASS
+#define DECLARE_EVENT_SYSCALL_CLASS DECLARE_EVENT_CLASS
+
#undef DEFINE_EVENT_PRINT
#define DEFINE_EVENT_PRINT(template, name, proto, args, print)
@@ -396,11 +399,11 @@ static inline notrace int trace_event_get_offsets_##call( \
#include "stages/stage6_event_callback.h"
-#undef DECLARE_EVENT_CLASS
-#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
- \
+
+#undef __DECLARE_EVENT_CLASS
+#define __DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
static notrace void \
-trace_event_raw_event_##call(void *__data, proto) \
+do_trace_event_raw_event_##call(void *__data, proto) \
{ \
struct trace_event_file *trace_file = __data; \
struct trace_event_data_offsets_##call __maybe_unused __data_offsets;\
@@ -425,15 +428,34 @@ trace_event_raw_event_##call(void *__data, proto) \
\
trace_event_buffer_commit(&fbuffer); \
}
+
+#undef DECLARE_EVENT_CLASS
+#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
+__DECLARE_EVENT_CLASS(call, PARAMS(proto), PARAMS(args), PARAMS(tstruct), \
+ PARAMS(assign), PARAMS(print)) \
+static notrace void \
+trace_event_raw_event_##call(void *__data, proto) \
+{ \
+ do_trace_event_raw_event_##call(__data, args); \
+}
+
+#undef DECLARE_EVENT_SYSCALL_CLASS
+#define DECLARE_EVENT_SYSCALL_CLASS(call, proto, args, tstruct, assign, print) \
+__DECLARE_EVENT_CLASS(call, PARAMS(proto), PARAMS(args), PARAMS(tstruct), \
+ PARAMS(assign), PARAMS(print)) \
+static notrace void \
+trace_event_raw_event_##call(void *__data, proto) \
+{ \
+ guard(preempt_notrace)(); \
+ do_trace_event_raw_event_##call(__data, args); \
+}
+
/*
* The ftrace_test_probe is compiled out, it is only here as a build time check
* to make sure that if the tracepoint handling changes, the ftrace probe will
* fail to compile unless it too is updated.
*/
-#undef DECLARE_EVENT_SYSCALL_CLASS
-#define DECLARE_EVENT_SYSCALL_CLASS DECLARE_EVENT_CLASS
-
#undef DEFINE_EVENT
#define DEFINE_EVENT(template, call, proto, args) \
static inline void ftrace_test_probe_##call(void) \
@@ -443,6 +465,8 @@ static inline void ftrace_test_probe_##call(void) \
#include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
+#undef __DECLARE_EVENT_CLASS
+
#include "stages/stage7_class_define.h"
#undef DECLARE_EVENT_CLASS
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 67ac5366f724..ab4db8c23f36 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -299,6 +299,12 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
int syscall_nr;
int size;
+ /*
+ * Syscall probe called with preemption enabled, but the ring
+ * buffer and per-cpu data require preemption to be disabled.
+ */
+ guard(preempt_notrace)();
+
syscall_nr = trace_get_syscall_nr(current, regs);
if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
return;
@@ -338,6 +344,12 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
struct trace_event_buffer fbuffer;
int syscall_nr;
+ /*
+ * Syscall probe called with preemption enabled, but the ring
+ * buffer and per-cpu data require preemption to be disabled.
+ */
+ guard(preempt_notrace)();
+
syscall_nr = trace_get_syscall_nr(current, regs);
if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
return;
--
2.39.2
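The guard(preempt_notrace)() pattern used in this patch relies on scope-based cleanup: preemption is disabled when the guard is declared and re-enabled automatically on every return path. A minimal userspace sketch of that idea follows, assuming a compiler that supports the GCC/Clang `cleanup` variable attribute. All names in it (fake_preempt_count, FAKE_GUARD_PREEMPT, fake_probe) are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's preempt count. */
static int fake_preempt_count;

static void fake_preempt_disable(void) { fake_preempt_count++; }

/* Cleanup handler: runs when the guard variable goes out of scope. */
static void fake_guard_exit(int *unused)
{
	(void)unused;
	assert(fake_preempt_count > 0);
	fake_preempt_count--;
}

/* Simplified guard: disable now, re-enable automatically at scope exit. */
#define FAKE_GUARD_PREEMPT() \
	int guard_var __attribute__((cleanup(fake_guard_exit), unused)) = \
		(fake_preempt_disable(), 0)

/* Shaped like ftrace_syscall_enter(): one guard, several return paths. */
static int fake_probe(int syscall_nr)
{
	FAKE_GUARD_PREEMPT();

	if (syscall_nr < 0)
		return -1;	/* early return still drops the guard */
	return fake_preempt_count;	/* evaluated while the guard is held */
}
```

Because the cleanup handler runs on both return paths, the probe needs no explicit re-enable before each `return`, which is likely why the guard form suits probes with several early exits.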
* [PATCH v2 2/7] tracing/perf: guard syscall probe with preempt_notrace
From: Mathieu Desnoyers @ 2024-10-04 1:11 UTC
To: Steven Rostedt, Masami Hiramatsu
Cc: linux-kernel, Mathieu Desnoyers, Peter Zijlstra,
Alexei Starovoitov, Yonghong Song, Paul E . McKenney, Ingo Molnar,
Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
Namhyung Kim, Andrii Nakryiko, bpf, Joel Fernandes,
linux-trace-kernel, Michael Jeanson
In preparation for allowing system call enter/exit instrumentation to
handle page faults, make sure that perf can handle this change by
explicitly disabling preemption within the perf system call tracepoint
probes to respect the current expectations within perf ring buffer code.
This change does not yet allow perf to take page faults per se within
its probe, but allows its existing probes to adapt to the upcoming
change.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michael Jeanson <mjeanson@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf@vger.kernel.org
Cc: Joel Fernandes <joel@joelfernandes.org>
---
include/trace/perf.h | 41 +++++++++++++++++++++++++++++++----
kernel/trace/trace_syscalls.c | 12 ++++++++++
2 files changed, 49 insertions(+), 4 deletions(-)
diff --git a/include/trace/perf.h b/include/trace/perf.h
index ded997af481e..5650c1bad088 100644
--- a/include/trace/perf.h
+++ b/include/trace/perf.h
@@ -12,10 +12,10 @@
#undef __perf_task
#define __perf_task(t) (__task = (t))
-#undef DECLARE_EVENT_CLASS
-#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
+#undef __DECLARE_EVENT_CLASS
+#define __DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
static notrace void \
-perf_trace_##call(void *__data, proto) \
+do_perf_trace_##call(void *__data, proto) \
{ \
struct trace_event_call *event_call = __data; \
struct trace_event_data_offsets_##call __maybe_unused __data_offsets;\
@@ -55,8 +55,38 @@ perf_trace_##call(void *__data, proto) \
head, __task); \
}
+/*
+ * Define unused __count and __task variables to use @args to pass
+ * arguments to do_perf_trace_##call. This is needed because the
+ * macros __perf_count and __perf_task introduce the side-effect to
+ * store copies into those local variables.
+ */
+#undef DECLARE_EVENT_CLASS
+#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
+__DECLARE_EVENT_CLASS(call, PARAMS(proto), PARAMS(args), PARAMS(tstruct), \
+ PARAMS(assign), PARAMS(print)) \
+static notrace void \
+perf_trace_##call(void *__data, proto) \
+{ \
+ u64 __count __attribute__((unused)); \
+ struct task_struct *__task __attribute__((unused)); \
+ \
+ do_perf_trace_##call(__data, args); \
+}
+
#undef DECLARE_EVENT_SYSCALL_CLASS
-#define DECLARE_EVENT_SYSCALL_CLASS DECLARE_EVENT_CLASS
+#define DECLARE_EVENT_SYSCALL_CLASS(call, proto, args, tstruct, assign, print) \
+__DECLARE_EVENT_CLASS(call, PARAMS(proto), PARAMS(args), PARAMS(tstruct), \
+ PARAMS(assign), PARAMS(print)) \
+static notrace void \
+perf_trace_##call(void *__data, proto) \
+{ \
+ u64 __count __attribute__((unused)); \
+ struct task_struct *__task __attribute__((unused)); \
+ \
+ guard(preempt_notrace)(); \
+ do_perf_trace_##call(__data, args); \
+}
/*
* This part is compiled out, it is only here as a build time check
@@ -76,4 +106,7 @@ static inline void perf_test_probe_##call(void) \
DEFINE_EVENT(template, name, PARAMS(proto), PARAMS(args))
#include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
+
+#undef __DECLARE_EVENT_CLASS
+
#endif /* CONFIG_PERF_EVENTS */
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index ab4db8c23f36..edcfa47446c7 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -596,6 +596,12 @@ static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id)
int rctx;
int size;
+ /*
+ * Syscall probe called with preemption enabled, but the ring
+ * buffer and per-cpu data require preemption to be disabled.
+ */
+ guard(preempt_notrace)();
+
syscall_nr = trace_get_syscall_nr(current, regs);
if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
return;
@@ -698,6 +704,12 @@ static void perf_syscall_exit(void *ignore, struct pt_regs *regs, long ret)
int rctx;
int size;
+ /*
+ * Syscall probe called with preemption enabled, but the ring
+ * buffer and per-cpu data require preemption to be disabled.
+ */
+ guard(preempt_notrace)();
+
syscall_nr = trace_get_syscall_nr(current, regs);
if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
return;
--
2.39.2
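The unused __count and __task locals in the perf wrappers can look odd at first. The sketch below illustrates the reason under simplified assumptions: when an event's arguments are written with a macro that assigns into a local as a side effect, any function that expands those arguments must declare that local, even if it never reads it. The names PERF_COUNT_SKETCH, EVENT_ARGS, count_local, get_offsets, do_probe and probe are hypothetical, and the inner function is assumed to expand the arguments once more and read its own local afterwards.

```c
#include <assert.h>

typedef unsigned long long u64;

/* Simplified form of __perf_count(): stores a copy as a side effect. */
#define PERF_COUNT_SKETCH(c)	(count_local = (c))

/* What TP_ARGS() would expand to for a hypothetical one-argument event. */
#define EVENT_ARGS		PERF_COUNT_SKETCH(delay)

static u64 submitted_count;

static int get_offsets(u64 delay)
{
	(void)delay;
	return 0;
}

/* Stand-in for do_perf_trace_##call: this copy of count_local is the real one. */
static void do_probe(u64 delay)
{
	u64 count_local = 1;

	get_offsets(EVENT_ARGS);	/* side effect: count_local = delay */
	assert(count_local == delay);
	submitted_count = count_local;
}

/* Stand-in for the perf_trace_##call wrapper. */
static void probe(u64 delay)
{
	/* Unused sink, present only so EVENT_ARGS compiles in this scope. */
	u64 count_local __attribute__((unused));

	do_probe(EVENT_ARGS);
}
```

The wrapper's local only absorbs the assignment; the value that matters is the one stored inside the inner function.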
* [PATCH v2 3/7] tracing/bpf: guard syscall probe with preempt_notrace
From: Mathieu Desnoyers @ 2024-10-04 1:11 UTC
To: Steven Rostedt, Masami Hiramatsu
Cc: linux-kernel, Mathieu Desnoyers, Peter Zijlstra,
Alexei Starovoitov, Yonghong Song, Paul E . McKenney, Ingo Molnar,
Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
Namhyung Kim, Andrii Nakryiko, bpf, Joel Fernandes,
linux-trace-kernel, Andrii Nakryiko, Michael Jeanson
In preparation for allowing system call enter/exit instrumentation to
handle page faults, make sure that bpf can handle this change by
explicitly disabling preemption within the bpf system call tracepoint
probes to respect the current expectations within bpf tracing code.
This change does not yet allow bpf to take page faults per se within its
probe, but allows its existing probes to adapt to the upcoming change.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Andrii Nakryiko <andrii@kernel.org> # BPF parts
Cc: Michael Jeanson <mjeanson@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf@vger.kernel.org
Cc: Joel Fernandes <joel@joelfernandes.org>
---
include/trace/bpf_probe.h | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
index c85bbce5aaa5..211b98d45fc6 100644
--- a/include/trace/bpf_probe.h
+++ b/include/trace/bpf_probe.h
@@ -53,8 +53,17 @@ __bpf_trace_##call(void *__data, proto) \
#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
__BPF_DECLARE_TRACE(call, PARAMS(proto), PARAMS(args))
+#define __BPF_DECLARE_TRACE_SYSCALL(call, proto, args) \
+static notrace void \
+__bpf_trace_##call(void *__data, proto) \
+{ \
+ guard(preempt_notrace)(); \
+ CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(__data, CAST_TO_U64(args)); \
+}
+
#undef DECLARE_EVENT_SYSCALL_CLASS
-#define DECLARE_EVENT_SYSCALL_CLASS DECLARE_EVENT_CLASS
+#define DECLARE_EVENT_SYSCALL_CLASS(call, proto, args, tstruct, assign, print) \
+ __BPF_DECLARE_TRACE_SYSCALL(call, PARAMS(proto), PARAMS(args))
/*
* This part is compiled out, it is only here as a build time check
--
2.39.2
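The CONCATENATE(bpf_trace_run, COUNT_ARGS(args)) expression in the macro above selects a function by argument count at preprocessing time. A reduced userspace sketch of that dispatch technique follows. It supports one to three arguments only, omits the CAST_TO_U64 step, and every name in it (PICK_4TH, COUNT_ARGS_SKETCH, CONCAT_SKETCH, run1..run3, TRACE_RUN) is a hypothetical stand-in rather than the kernel's definition.

```c
#include <assert.h>

typedef unsigned long long u64;

/* Count 1..3 arguments by shifting a descending list past them. */
#define PICK_4TH(a, b, c, n, ...)	n
#define COUNT_ARGS_SKETCH(...)		PICK_4TH(__VA_ARGS__, 3, 2, 1, 0)

/* Two levels so the count is expanded before token pasting. */
#define CONCAT_SKETCH_(a, b)		a##b
#define CONCAT_SKETCH(a, b)		CONCAT_SKETCH_(a, b)

static int last_arity;
static u64 last_sum;

static void run1(void *data, u64 a)
{
	(void)data;
	last_arity = 1;
	last_sum = a;
}

static void run2(void *data, u64 a, u64 b)
{
	(void)data;
	last_arity = 2;
	last_sum = a + b;
}

static void run3(void *data, u64 a, u64 b, u64 c)
{
	(void)data;
	last_arity = 3;
	last_sum = a + b + c;
}

/* Expands to run1(), run2() or run3() depending on the argument count. */
#define TRACE_RUN(data, ...) \
	CONCAT_SKETCH(run, COUNT_ARGS_SKETCH(__VA_ARGS__))(data, __VA_ARGS__)
```

For example, TRACE_RUN((void *)0, 5, 6) expands to run2((void *)0, 5, 6), so the per-arity function is chosen without any runtime branching.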
* [PATCH v2 4/7] tracing: Allow system call tracepoints to handle page faults
From: Mathieu Desnoyers @ 2024-10-04 1:11 UTC
To: Steven Rostedt, Masami Hiramatsu
Cc: linux-kernel, Mathieu Desnoyers, Peter Zijlstra,
Alexei Starovoitov, Yonghong Song, Paul E . McKenney, Ingo Molnar,
Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
Namhyung Kim, Andrii Nakryiko, bpf, Joel Fernandes,
linux-trace-kernel, Michael Jeanson
Use Tasks Trace RCU to protect iteration of system call enter/exit
tracepoint probes to allow those probes to handle page faults.
In preparation for this change, all tracers registering to system call
enter/exit tracepoints should expect those to be called with preemption
enabled.
This allows tracers to fault-in userspace system call arguments such as
path strings within their probe callbacks.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michael Jeanson <mjeanson@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf@vger.kernel.org
Cc: Joel Fernandes <joel@joelfernandes.org>
---
include/linux/tracepoint.h | 18 +++++++++++++-----
init/Kconfig | 1 +
2 files changed, 14 insertions(+), 5 deletions(-)
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 1a78c9bbece8..a09a97480f5a 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -17,6 +17,7 @@
#include <linux/errno.h>
#include <linux/types.h>
#include <linux/rcupdate.h>
+#include <linux/rcupdate_trace.h>
#include <linux/tracepoint-defs.h>
#include <linux/static_call.h>
@@ -107,6 +108,7 @@ void for_each_tracepoint_in_module(struct module *mod,
#ifdef CONFIG_TRACEPOINTS
static inline void tracepoint_synchronize_unregister(void)
{
+ synchronize_rcu_tasks_trace();
synchronize_rcu();
}
#else
@@ -197,18 +199,24 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
* it_func[0] is never NULL because there is at least one element in the array
* when the array itself is non NULL.
*/
-#define __DO_TRACE(name, args, cond) \
+#define __DO_TRACE(name, args, cond, syscall) \
do { \
int __maybe_unused __idx = 0; \
\
if (!(cond)) \
return; \
\
- preempt_disable_notrace(); \
+ if (syscall) \
+ rcu_read_lock_trace(); \
+ else \
+ preempt_disable_notrace(); \
\
__DO_TRACE_CALL(name, TP_ARGS(args)); \
\
- preempt_enable_notrace(); \
+ if (syscall) \
+ rcu_read_unlock_trace(); \
+ else \
+ preempt_enable_notrace(); \
} while (0)
/*
@@ -238,7 +246,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
if (static_key_false(&__tracepoint_##name.key)) \
__DO_TRACE(name, \
TP_ARGS(args), \
- TP_CONDITION(cond)); \
+ TP_CONDITION(cond), 0); \
if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) { \
WARN_ONCE(!rcu_is_watching(), \
"RCU not watching for tracepoint"); \
@@ -276,7 +284,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
if (static_key_false(&__tracepoint_##name.key)) \
__DO_TRACE(name, \
TP_ARGS(args), \
- TP_CONDITION(cond)); \
+ TP_CONDITION(cond), 1); \
if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) { \
WARN_ONCE(!rcu_is_watching(), \
"RCU not watching for tracepoint"); \
diff --git a/init/Kconfig b/init/Kconfig
index fbd0cb06a50a..eedd0064fb36 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1984,6 +1984,7 @@ config BINDGEN_VERSION_TEXT
#
config TRACEPOINTS
bool
+ select TASKS_TRACE_RCU
source "kernel/Kconfig.kexec"
--
2.39.2
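The effect of the new `syscall` argument to __DO_TRACE can be pictured with a small userspace model: regular tracepoints still run their probes with preemption disabled, while syscall tracepoints run them under a Tasks Trace RCU read-side section with preemption left enabled. The sketch below uses plain counters as hypothetical stand-ins for the preempt count and the RCU nesting depth. It models only the branch structure, not the real locking primitives.

```c
#include <assert.h>

static int preempt_depth;	/* stand-in for the preempt count */
static int trace_rcu_depth;	/* stand-in for Tasks Trace RCU nesting */
static int probe_saw_preempt_off;

/* Records the context the probe observes when it is called. */
static void probe(void)
{
	/* A probe always runs under one of the two protections. */
	assert(preempt_depth > 0 || trace_rcu_depth > 0);
	probe_saw_preempt_off = preempt_depth > 0;
}

/* Same shape as the patched __DO_TRACE(): protection depends on 'syscall'. */
#define DO_TRACE_SKETCH(syscall) \
	do { \
		if (syscall) \
			trace_rcu_depth++; \
		else \
			preempt_depth++; \
		probe(); \
		if (syscall) \
			trace_rcu_depth--; \
		else \
			preempt_depth--; \
	} while (0)

static int trace_regular(void)
{
	DO_TRACE_SKETCH(0);
	return probe_saw_preempt_off;
}

static int trace_syscall(void)
{
	DO_TRACE_SKETCH(1);
	return probe_saw_preempt_off;
}
```

In this model trace_regular() reports that the probe saw preemption disabled and trace_syscall() reports that it did not, which is the behavioral change the earlier "guard syscall probe" patches prepare the tracers for.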
* [PATCH v2 5/7] tracing/ftrace: Add might_fault check to syscall probes
From: Mathieu Desnoyers @ 2024-10-04 1:11 UTC
To: Steven Rostedt, Masami Hiramatsu
Cc: linux-kernel, Mathieu Desnoyers, Peter Zijlstra,
Alexei Starovoitov, Yonghong Song, Paul E . McKenney, Ingo Molnar,
Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
Namhyung Kim, Andrii Nakryiko, bpf, Joel Fernandes,
linux-trace-kernel, Michael Jeanson
Add a might_fault() check to validate that the ftrace sys_enter/sys_exit
probe callbacks are indeed called from a context where page faults can
be handled.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Michael Jeanson <mjeanson@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf@vger.kernel.org
Cc: Joel Fernandes <joel@joelfernandes.org>
---
include/trace/trace_events.h | 1 +
kernel/trace/trace_syscalls.c | 2 ++
2 files changed, 3 insertions(+)
diff --git a/include/trace/trace_events.h b/include/trace/trace_events.h
index 0228d9ed94a3..e0d4850b0d77 100644
--- a/include/trace/trace_events.h
+++ b/include/trace/trace_events.h
@@ -446,6 +446,7 @@ __DECLARE_EVENT_CLASS(call, PARAMS(proto), PARAMS(args), PARAMS(tstruct), \
static notrace void \
trace_event_raw_event_##call(void *__data, proto) \
{ \
+ might_fault(); \
guard(preempt_notrace)(); \
do_trace_event_raw_event_##call(__data, args); \
}
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index edcfa47446c7..89d7e4c57b5b 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -303,6 +303,7 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
* Syscall probe called with preemption enabled, but the ring
* buffer and per-cpu data require preemption to be disabled.
*/
+ might_fault();
guard(preempt_notrace)();
syscall_nr = trace_get_syscall_nr(current, regs);
@@ -348,6 +349,7 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
* Syscall probe called with preemption enabled, but the ring
* buffer and per-cpu data require preemption to be disabled.
*/
+ might_fault();
guard(preempt_notrace)();
syscall_nr = trace_get_syscall_nr(current, regs);
--
2.39.2
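The placement of might_fault() before guard(preempt_notrace)() matters: the check is meant to validate the context the probe was called from, so it has to run before the probe itself disables preemption. The following userspace sketch illustrates that ordering. It assumes, as a simplification, that the check only complains when a preempt counter is nonzero. might_fault_sketch, preempt_depth and violations are hypothetical names, and the real might_fault() is a debug-time check whose exact behavior depends on kernel configuration.

```c
#include <assert.h>

static int preempt_depth;	/* stand-in for the preempt count */
static int violations;		/* how many times the check complained */

/* Hypothetical stand-in for might_fault(): flag non-preemptible callers. */
static void might_fault_sketch(void)
{
	if (preempt_depth > 0)
		violations++;
}

/* Order used by the patch: validate the calling context, then guard. */
static void probe_check_first(void)
{
	might_fault_sketch();	/* still preemptible here: no complaint */
	preempt_depth++;	/* plays the role of guard(preempt_notrace)() */
	assert(preempt_depth == 1);
	/* ... the event would be written to the ring buffer here ... */
	preempt_depth--;
}

/* Reversed order, for contrast: the check sees the probe's own guard. */
static void probe_check_late(void)
{
	preempt_depth++;
	might_fault_sketch();	/* complains about the probe's own guard */
	preempt_depth--;
}
```

With the check placed first, a complaint would point at the tracepoint call site rather than at the probe's own preemption guard.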
* [PATCH v2 6/7] tracing/perf: Add might_fault check to syscall probes
From: Mathieu Desnoyers @ 2024-10-04 1:12 UTC
To: Steven Rostedt, Masami Hiramatsu
Cc: linux-kernel, Mathieu Desnoyers, Peter Zijlstra,
Alexei Starovoitov, Yonghong Song, Paul E . McKenney, Ingo Molnar,
Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
Namhyung Kim, Andrii Nakryiko, bpf, Joel Fernandes,
linux-trace-kernel, Michael Jeanson
Add a might_fault() check to validate that the perf sys_enter/sys_exit
probe callbacks are indeed called from a context where page faults can
be handled.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michael Jeanson <mjeanson@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf@vger.kernel.org
Cc: Joel Fernandes <joel@joelfernandes.org>
---
include/trace/perf.h | 1 +
kernel/trace/trace_syscalls.c | 2 ++
2 files changed, 3 insertions(+)
diff --git a/include/trace/perf.h b/include/trace/perf.h
index 5650c1bad088..321bfd7919f6 100644
--- a/include/trace/perf.h
+++ b/include/trace/perf.h
@@ -84,6 +84,7 @@ perf_trace_##call(void *__data, proto) \
u64 __count __attribute__((unused)); \
struct task_struct *__task __attribute__((unused)); \
\
+ might_fault(); \
guard(preempt_notrace)(); \
do_perf_trace_##call(__data, args); \
}
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 89d7e4c57b5b..0d42d6f293d6 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -602,6 +602,7 @@ static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id)
* Syscall probe called with preemption enabled, but the ring
* buffer and per-cpu data require preemption to be disabled.
*/
+ might_fault();
guard(preempt_notrace)();
syscall_nr = trace_get_syscall_nr(current, regs);
@@ -710,6 +711,7 @@ static void perf_syscall_exit(void *ignore, struct pt_regs *regs, long ret)
* Syscall probe called with preemption enabled, but the ring
* buffer and per-cpu data require preemption to be disabled.
*/
+ might_fault();
guard(preempt_notrace)();
syscall_nr = trace_get_syscall_nr(current, regs);
--
2.39.2
* [PATCH v2 7/7] tracing/bpf: Add might_fault check to syscall probes
From: Mathieu Desnoyers @ 2024-10-04 1:12 UTC
To: Steven Rostedt, Masami Hiramatsu
Cc: linux-kernel, Mathieu Desnoyers, Peter Zijlstra,
Alexei Starovoitov, Yonghong Song, Paul E . McKenney, Ingo Molnar,
Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
Namhyung Kim, Andrii Nakryiko, bpf, Joel Fernandes,
linux-trace-kernel, Andrii Nakryiko, Michael Jeanson
Add a might_fault() check to validate that the bpf sys_enter/sys_exit
probe callbacks are indeed called from a context where page faults can
be handled.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Andrii Nakryiko <andrii@kernel.org> # BPF parts
Cc: Michael Jeanson <mjeanson@efficios.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf@vger.kernel.org
Cc: Joel Fernandes <joel@joelfernandes.org>
---
include/trace/bpf_probe.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
index 211b98d45fc6..099df5c3e38a 100644
--- a/include/trace/bpf_probe.h
+++ b/include/trace/bpf_probe.h
@@ -57,6 +57,7 @@ __bpf_trace_##call(void *__data, proto) \
static notrace void \
__bpf_trace_##call(void *__data, proto) \
{ \
+ might_fault(); \
guard(preempt_notrace)(); \
CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(__data, CAST_TO_U64(args)); \
}
--
2.39.2
* Re: [PATCH v2 0/7] tracing: Allow system call tracepoints to handle page faults
From: Mathieu Desnoyers @ 2024-10-04 14:49 UTC
To: Steven Rostedt, Masami Hiramatsu
Cc: linux-kernel, Peter Zijlstra, Alexei Starovoitov, Yonghong Song,
Paul E . McKenney, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Namhyung Kim, Andrii Nakryiko,
bpf, Joel Fernandes, linux-trace-kernel
On 2024-10-04 03:11, Mathieu Desnoyers wrote:
> Wire up the system call tracepoints with Tasks Trace RCU to allow
> the ftrace, perf, and eBPF tracers to handle page faults.
>
> This series does the initial wire-up allowing tracers to handle page
> faults, but leaves out the actual handling of said page faults as future
> work.
>
> This series was compile and runtime tested with ftrace and perf syscall
> tracing and raw syscall tracing, adding a WARN_ON_ONCE() in the
> generated code to validate that the intended probes are used for raw
> syscall tracing. The might_fault() added within those probes validates
> that they are called from a context where handling a page fault is OK.
>
> This series replaces the "Faultable Tracepoints v6" series found at [1].
>
> This has been rebased on v6.12-rc1 on top of two patches from Steven:
>
> tracing: Remove definition of trace_*_rcuidle()
> tracepoint: Remove SRCU protection
I'll send an updated series which includes
"tracing: Declare system call tracepoints with TRACE_EVENT_SYSCALL"
(missing here), and which reworks that patch to remove the mapping from
trace_sys_enter/exit to trace_syscall_sys_enter/exit, which requires
modifying architecture code: a lot of churn for little added value.
Thanks,
Mathieu
>
> Thanks,
>
> Mathieu
>
> Link: https://lore.kernel.org/lkml/20240828144153.829582-1-mathieu.desnoyers@efficios.com/ # [1]
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Yonghong Song <yhs@fb.com>
> Cc: Paul E. McKenney <paulmck@kernel.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
> Cc: bpf@vger.kernel.org
> Cc: Joel Fernandes <joel@joelfernandes.org>
> Cc: linux-trace-kernel@vger.kernel.org
>
> Mathieu Desnoyers (7):
> tracing/ftrace: guard syscall probe with preempt_notrace
> tracing/perf: guard syscall probe with preempt_notrace
> tracing/bpf: guard syscall probe with preempt_notrace
> tracing: Allow system call tracepoints to handle page faults
> tracing/ftrace: Add might_fault check to syscall probes
> tracing/perf: Add might_fault check to syscall probes
> tracing/bpf: Add might_fault check to syscall probes
>
> include/linux/tracepoint.h | 18 ++++++++++-----
> include/trace/bpf_probe.h | 12 +++++++++-
> include/trace/perf.h | 42 +++++++++++++++++++++++++++++++----
> include/trace/trace_events.h | 39 ++++++++++++++++++++++++++------
> init/Kconfig | 1 +
> kernel/trace/trace_syscalls.c | 28 +++++++++++++++++++++++
> 6 files changed, 123 insertions(+), 17 deletions(-)
>
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com