* Re: [PATCH v2] tracing: Optimize event type allocation with IDA
From: Zheng Yejian @ 2022-11-23 8:01 UTC
To: rostedt
Cc: bpf, linux-kernel, mhiramat, yujie.liu, zhengyejian1,
linux-trace-kernel
On Tue, 22 Nov 2022 22:32:58 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:
> On Wed, 23 Nov 2022 11:18:06 +0800
> Zheng Yejian <zhengyejian1@huawei.com> wrote:
>
> > But Yujie Liu <yujie.liu@intel.com> reported a problem that is highly
> > reproducible after applying this patch:
> > Link: https://lore.kernel.org/lkml/54f23c9c-97ae-e326-5873-bfa5d2c81f52@intel.com/
> >
> > So please DO NOT apply this patch before I find out what happened.
>
> I know what the issue is.
>
> The current way of assigning types is to always increment, and not to
> reuse a freed number until the space fills up. Even then, it looks for
> the next available number.
>
> I'm guessing the IDA will reuse a number as soon as it is freed. This
Yes, it is.
> may also have uncovered a bug, as in reality, we must actually clear
> the tracing buffers every time a number is reused.
Thanks for your explanation!
That does seem to be the case, and with the current way of assigning
types, this problem could also happen when a type id is reused, am I right?
But instead of clearing the tracing buffers, would it be better to just
mark the record invalid, if we had a way of knowing that the format had
changed?
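
For reference, a minimal sketch of that reuse behavior (my own toy code,
not from the patch), assuming the standard <linux/idr.h> IDA API:

  #include <linux/idr.h>

  static DEFINE_IDA(event_type_ida);

  static void ida_reuse_demo(void)
  {
          int a, b;

          /* Grab the lowest free id in the range */
          a = ida_alloc_range(&event_type_ida, 100, 65535, GFP_KERNEL);
          ida_free(&event_type_ida, a);

          /* The IDA hands out the lowest free id again, so b == a here */
          b = ida_alloc_range(&event_type_ida, 100, 65535, GFP_KERNEL);
          ida_free(&event_type_ida, b);
  }

So a freed type id can come back on the very next allocation, long before
the id space is exhausted.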
>
> What happens is that the type number is associated with a print format.
> That is, the raw data is tagged with the type, and this type maps to how
> to parse the raw data. If you create a kprobe, it gets a new type number.
> If you free it and create another one, the IDA is likely to reassign the
> previously freed number to the new probe.
>
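If I follow, the only tie between the raw data and its format is the type
id at the start of each record, roughly like this (a simplified sketch;
the header layout follows include/linux/trace_events.h):

  /* Mirrors struct trace_entry in include/linux/trace_events.h */
  struct trace_entry {
          unsigned short  type;           /* ties raw data to a format */
          unsigned char   flags;
          unsigned char   preempt_count;
          int             pid;
  };

  /* Declared in <linux/trace_events.h> */
  struct trace_event *ftrace_find_event(int type);

  static struct trace_event *lookup_parser(struct trace_entry *entry)
  {
          /*
           * The type id is the only link from a binary blob in the
           * ring buffer back to its print format; if the id has been
           * recycled to a new probe, this finds the wrong parser for
           * old records.
           */
          return ftrace_find_event(entry->type);
  }
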
> To explain this better, let's look at the following scenario:
>
> echo 'p:foo val=$arg1:u64' > kprobe_events
> echo 1 > events/kprobes/foo/enable
> sleep 1
> echo 0 > events/kprobes/foo/enable
>
> echo 'p:bar val=+0($arg1):string' > kprobe_events
>
> # foo kprobe is deleted and bar is created and
> # with IDA, bar has the same number for type as foo
>
> cat trace
>
> When you read the trace, the reader sees a binary blob representing an
> event, marked with a type. Although the event was foo, the type now maps
> to bar, so foo's $arg1:u64 is read as bar's +0($arg1):string, and it
> will crash.
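
Right, a userspace toy (purely hypothetical, just to show the type
confusion) of the same bytes parsed with two different formats:

  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  int main(void)
  {
          /* foo recorded val=$arg1:u64: eight raw bytes of integer */
          uint64_t val = 0xffff8881deadbeefULL;
          unsigned char raw[8];
          uint64_t as_u64;

          memcpy(raw, &val, sizeof(raw));

          /* Parsed with foo's format: a plain integer, harmless */
          memcpy(&as_u64, raw, sizeof(as_u64));
          printf("foo: val=0x%llx\n", (unsigned long long)as_u64);

          /*
           * Parsed with bar's format, the same bytes would be taken as
           * the location of a string; following them means reading
           * memory the record never contained -- in the kernel, a
           * crash.
           */
          return 0;
  }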
>
> -- Steve
* Re: [PATCH v2] tracing: Optimize event type allocation with IDA
From: Steven Rostedt @ 2022-11-23 13:37 UTC
To: Zheng Yejian; +Cc: bpf, linux-kernel, mhiramat, yujie.liu, linux-trace-kernel
On Wed, 23 Nov 2022 16:01:33 +0800
Zheng Yejian <zhengyejian1@huawei.com> wrote:
> But instead of clearing the tracing buffers, would it be better to just
> mark the record invalid, if we had a way of knowing that the format had
> changed?
We'd have to scan all ring buffers for the records. The only way to know
what a binary blob in the ring buffer represents is this type.
But I already do the same thing for modules, and by hooking into that
infrastructure, I can run the above commands again with the IDA patch
included, and it works fine. If any dynamic event was enabled and then
removed, the corresponding buffer it was enabled in gets reset. This is
needed regardless of your patch, because once we get over 65536 types, the
old type assignment behaves the same as the IDA logic. The IDA code didn't
trigger a bug; it revealed an existing one.
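
Roughly, the old assignment logic behaves like this hand-wavy sketch (the
names are made up for illustration; this is not the actual trace_output.c
code):

  static int find_freed_type(void);     /* hypothetical helper */

  #define LAST_STATIC_TYPE 31           /* hypothetical boundary */

  static int next_type = LAST_STATIC_TYPE + 1;

  static int alloc_event_type(void)
  {
          /* Hand out fresh ids while the 16-bit space lasts: no reuse */
          if (next_type <= 0xffff)
                  return next_type++;

          /*
           * Only once we get over 65536 types does this hunt for a
           * freed id -- the same reuse the IDA does from the first
           * free.
           */
          return find_freed_type();
  }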
Care to test with the below patch added? I'll add this first (breaking it
up a little) and then apply your patch.
-- Steve
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 6a47fed2f473..93a75a97118f 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2180,10 +2180,12 @@ void tracing_reset_online_cpus(struct array_buffer *buf)
 }
 
 /* Must have trace_types_lock held */
-void tracing_reset_all_online_cpus(void)
+void tracing_reset_all_online_cpus_unlocked(void)
 {
 	struct trace_array *tr;
 
+	lockdep_assert_held(&trace_types_lock);
+
 	list_for_each_entry(tr, &ftrace_trace_arrays, list) {
 		if (!tr->clear_trace)
 			continue;
@@ -2195,6 +2197,13 @@ void tracing_reset_all_online_cpus(void)
 	}
 }
 
+void tracing_reset_all_online_cpus(void)
+{
+	mutex_lock(&trace_types_lock);
+	tracing_reset_all_online_cpus_unlocked();
+	mutex_unlock(&trace_types_lock);
+}
+
 /*
  * The tgid_map array maps from pid to tgid; i.e. the value stored at index i
  * is the tgid last observed corresponding to pid=i.
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index d9b470a0adf2..48643f07bc01 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -580,6 +580,7 @@ int tracing_is_enabled(void);
 void tracing_reset_online_cpus(struct array_buffer *buf);
 void tracing_reset_current(int cpu);
 void tracing_reset_all_online_cpus(void);
+void tracing_reset_all_online_cpus_unlocked(void);
 int tracing_open_generic(struct inode *inode, struct file *filp);
 int tracing_open_generic_tr(struct inode *inode, struct file *filp);
 bool tracing_is_disabled(void);
diff --git a/kernel/trace/trace_dynevent.c b/kernel/trace/trace_dynevent.c
index 154996684fb5..4376887e0d8a 100644
--- a/kernel/trace/trace_dynevent.c
+++ b/kernel/trace/trace_dynevent.c
@@ -118,6 +118,7 @@ int dyn_event_release(const char *raw_command, struct dyn_event_operations *type
 		if (ret)
 			break;
 	}
+	tracing_reset_all_online_cpus();
 	mutex_unlock(&event_mutex);
 out:
 	argv_free(argv);
@@ -214,6 +215,7 @@ int dyn_events_release_all(struct dyn_event_operations *type)
 			break;
 	}
 out:
+	tracing_reset_all_online_cpus();
 	mutex_unlock(&event_mutex);
 
 	return ret;
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 7372fffb8109..3bfaf560ecc4 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -2947,7 +2947,10 @@ static int probe_remove_event_call(struct trace_event_call *call)
 		 * TRACE_REG_UNREGISTER.
 		 */
 		if (file->flags & EVENT_FILE_FL_ENABLED)
-			return -EBUSY;
+			goto busy;
+
+		if (file->flags & EVENT_FILE_FL_WAS_ENABLED)
+			tr->clear_trace = true;
 		/*
 		 * The do_for_each_event_file_safe() is
 		 * a double loop. After finding the call for this
@@ -2960,6 +2963,12 @@ static int probe_remove_event_call(struct trace_event_call *call)
 	__trace_remove_event_call(call);
 
 	return 0;
+ busy:
+	/* No need to clear the trace now */
+	list_for_each_entry(tr, &ftrace_trace_arrays, list) {
+		tr->clear_trace = false;
+	}
+	return -EBUSY;
 }
 
 /* Remove an event_call */
@@ -3039,7 +3048,7 @@ static void trace_module_remove_events(struct module *mod)
 	 * over from this module may be passed to the new module events and
 	 * unexpected results may occur.
 	 */
-	tracing_reset_all_online_cpus();
+	tracing_reset_all_online_cpus_unlocked();
 }
 
 static int trace_module_notify(struct notifier_block *self,
diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
index 29fbfb27c2b2..c3b582d19b62 100644
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -1425,7 +1425,6 @@ int synth_event_delete(const char *event_name)
 	mutex_unlock(&event_mutex);
 
 	if (mod) {
-		mutex_lock(&trace_types_lock);
 		/*
 		 * It is safest to reset the ring buffer if the module
 		 * being unloaded registered any events that were
@@ -1437,7 +1436,6 @@ int synth_event_delete(const char *event_name)
 		 * occur.
 		 */
 		tracing_reset_all_online_cpus();
-		mutex_unlock(&trace_types_lock);
 	}
 
 	return ret;