From: Steven Rostedt <rostedt@goodmis.org>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>,
linux-kernel@vger.kernel.org,
Peter Zijlstra <peterz@infradead.org>,
Alexei Starovoitov <ast@kernel.org>, Yonghong Song <yhs@fb.com>,
"Paul E . McKenney" <paulmck@kernel.org>,
Ingo Molnar <mingo@redhat.com>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Namhyung Kim <namhyung@kernel.org>,
Andrii Nakryiko <andrii.nakryiko@gmail.com>,
bpf@vger.kernel.org, Joel Fernandes <joel@joelfernandes.org>,
linux-trace-kernel@vger.kernel.org,
Michael Jeanson <mjeanson@efficios.com>
Subject: Re: [PATCH v1 2/8] tracing/ftrace: guard syscall probe with preempt_notrace
Date: Fri, 4 Oct 2024 09:26:19 -0400 [thread overview]
Message-ID: <20241004092619.0be53f90@gandalf.local.home> (raw)
In-Reply-To: <90ca2fee-cdfb-4d48-ab9e-57d8d2b8b8d8@efficios.com>
On Thu, 3 Oct 2024 21:33:16 -0400
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> On 2024-10-04 03:04, Steven Rostedt wrote:
> > On Thu, 3 Oct 2024 20:26:29 -0400
> > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> >
> >
> >> static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
> >> {
> >> struct trace_array *tr = data;
> >> struct trace_event_file *trace_file;
> >> struct syscall_trace_enter *entry;
> >> struct syscall_metadata *sys_data;
> >> struct trace_event_buffer fbuffer;
> >> unsigned long args[6];
> >> int syscall_nr;
> >> int size;
> >>
> >> syscall_nr = trace_get_syscall_nr(current, regs);
> >> if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
> >> return;
> >>
> >> /* Here we're inside tp handler's rcu_read_lock_sched (__DO_TRACE) */
> >> trace_file = rcu_dereference_sched(tr->enter_syscall_files[syscall_nr]);
> >>
> >> ^^^^ this function explicitly states that preempt needs to be disabled by
> >> tracepoints.
> >
> > Ah, I should have known it was the syscall portion. I don't care for this
> > hidden dependency. I rather add a preempt disable here and not expect it to
> > be disabled when called.
>
> Which is exactly what this patch is doing.
I was thinking of putting the protection in the function and not the macro.
>
> >
> >>
> >> if (!trace_file)
> >> return;
> >>
> >> if (trace_trigger_soft_disabled(trace_file))
> >> return;
> >>
> >> sys_data = syscall_nr_to_meta(syscall_nr);
> >> if (!sys_data)
> >> return;
> >>
> >> size = sizeof(*entry) + sizeof(unsigned long) * sys_data->nb_args;
> >>
> >> entry = trace_event_buffer_reserve(&fbuffer, trace_file, size);
> >> ^^^^ it reserves space in the ring buffer without disabling preemption explicitly.
> >>
> >> And also:
> >>
> >> void *trace_event_buffer_reserve(struct trace_event_buffer *fbuffer,
> >> struct trace_event_file *trace_file,
> >> unsigned long len)
> >> {
> >> struct trace_event_call *event_call = trace_file->event_call;
> >>
> >> if ((trace_file->flags & EVENT_FILE_FL_PID_FILTER) &&
> >> trace_event_ignore_this_pid(trace_file))
> >> return NULL;
> >>
> >> /*
> >> * If CONFIG_PREEMPTION is enabled, then the tracepoint itself disables
> >> * preemption (adding one to the preempt_count). Since we are
> >> * interested in the preempt_count at the time the tracepoint was
> >> * hit, we need to subtract one to offset the increment.
> >> */
> >> ^^^ This function also explicitly expects preemption to be disabled.
> >>
> >> So I rest my case. The change I'm introducing for tracepoints
> >> don't make any assumptions about whether or not each tracer require
> >> preempt off or not: it keeps the behavior the _same_ as it was before.
> >>
> >> Then it's up to each tracer's developer to change the behavior of their
> >> own callbacks as they see fit. But I'm not introducing regressions in
> >> tracers with the "big switch" change of making syscall tracepoints
> >> faultable. This will belong to changes that are specific to each tracer.
> >
> >
> > I rather remove these dependencies at the source. So, IMHO, these places
> > should be "fixed" first.
> >
> > At least for the ftrace users. But I think the same can be done for the
> > other users as well. BPF already stated it just needs "migrate_disable()".
> > Let's see what perf has.
> >
> > We can then audit all the tracepoint users to make sure they do not need
> > preemption disabled.
>
> Why does it need to be a broad refactoring of the entire world ? What is
> wrong with the simple approach of introducing this tracepoint faultable
> syscall support as a no-op from the tracer's perspective ?
Because we want in-tree users too ;-)
>
> Then we can build on top and figure out if we want to relax things
> on a tracer-per-tracer basis.
Looking deeper into how ftrace can implement this, it may require some more
work. Doing it your way may be fine for now, but we need this working for
something in-tree instead of having it only work for LTTng.
Note, it doesn't have to be ftrace either. It could be perf or BPF. Or
simply the sframe code (doing stack traces at the entry of system calls).
-- Steve
next prev parent reply other threads:[~2024-10-04 13:25 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-03 15:16 [PATCH v1 0/8] tracing: Allow system call tracepoints to handle page faults Mathieu Desnoyers
2024-10-03 15:16 ` [PATCH v1 1/8] tracing: Declare system call tracepoints with TRACE_EVENT_SYSCALL Mathieu Desnoyers
2024-10-03 21:32 ` Steven Rostedt
2024-10-04 0:15 ` Mathieu Desnoyers
2024-10-04 1:06 ` Steven Rostedt
2024-10-04 1:34 ` Mathieu Desnoyers
2024-10-04 10:34 ` kernel test robot
2024-10-03 15:16 ` [PATCH v1 2/8] tracing/ftrace: guard syscall probe with preempt_notrace Mathieu Desnoyers
2024-10-03 22:23 ` Steven Rostedt
2024-10-04 0:26 ` Mathieu Desnoyers
2024-10-04 1:04 ` Steven Rostedt
2024-10-04 1:33 ` Mathieu Desnoyers
2024-10-04 13:26 ` Steven Rostedt [this message]
2024-10-04 14:18 ` Mathieu Desnoyers
2024-10-04 14:45 ` Steven Rostedt
2024-10-04 14:19 ` Mathieu Desnoyers
2024-10-04 14:52 ` Steven Rostedt
2024-10-04 14:51 ` Mathieu Desnoyers
2024-10-04 20:04 ` Andrii Nakryiko
2024-10-04 20:59 ` Steven Rostedt
2024-10-03 15:16 ` [PATCH v1 3/8] tracing/perf: " Mathieu Desnoyers
2024-10-03 22:25 ` Steven Rostedt
2024-10-04 0:17 ` Frederic Weisbecker
2024-10-03 15:16 ` [PATCH v1 4/8] tracing/bpf: " Mathieu Desnoyers
2024-10-03 22:26 ` Steven Rostedt
2024-10-03 23:05 ` Alexei Starovoitov
2024-10-04 0:30 ` Mathieu Desnoyers
2024-10-04 1:28 ` Mathieu Desnoyers
2024-10-03 15:16 ` [PATCH v1 5/8] tracing: Allow system call tracepoints to handle page faults Mathieu Desnoyers
2024-10-03 22:29 ` Steven Rostedt
2024-10-04 0:35 ` Mathieu Desnoyers
2024-10-03 15:16 ` [PATCH v1 6/8] tracing/ftrace: Add might_fault check to syscall probes Mathieu Desnoyers
2024-10-03 22:36 ` Steven Rostedt
2024-10-04 0:11 ` Mathieu Desnoyers
2024-10-03 15:16 ` [PATCH v1 7/8] tracing/perf: " Mathieu Desnoyers
2024-10-03 22:37 ` Steven Rostedt
2024-10-04 0:12 ` Mathieu Desnoyers
2024-10-03 15:16 ` [PATCH v1 8/8] tracing/bpf: " Mathieu Desnoyers
2024-10-03 22:38 ` Steven Rostedt
2024-10-04 0:13 ` Mathieu Desnoyers
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241004092619.0be53f90@gandalf.local.home \
--to=rostedt@goodmis.org \
--cc=acme@kernel.org \
--cc=alexander.shishkin@linux.intel.com \
--cc=andrii.nakryiko@gmail.com \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=joel@joelfernandes.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=mathieu.desnoyers@efficios.com \
--cc=mhiramat@kernel.org \
--cc=mingo@redhat.com \
--cc=mjeanson@efficios.com \
--cc=namhyung@kernel.org \
--cc=paulmck@kernel.org \
--cc=peterz@infradead.org \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).