* Re: 【ftrace_events list】race between trace_event_eval_update and __register_event discuss
[not found] <3c1259275c784bbd9e2f33e1b2d846c1@luxshare-ict.com>
@ 2025-07-17 14:50 ` Steven Rostedt
2025-07-17 14:52 ` Steven Rostedt
2025-07-17 14:59 ` Steven Rostedt
0 siblings, 2 replies; 3+ messages in thread
From: Steven Rostedt @ 2025-07-17 14:50 UTC (permalink / raw)
To: Fusheng Huang(黄富生)
Cc: mhiramat@kernel.org, GuoJin Dai( 戴国金),
linux-trace-kernel@vger.kernel.org
On Thu, 17 Jul 2025 12:37:37 +0000
Fusheng Huang(黄富生) <Fusheng.Huang@luxshare-ict.com> wrote:
> Dear rostedt, mhiramat:
>
> We encountered a small problem of ftrace module, and need your help .
>
> 【issue background】During the process of starting and loading ko in
> the Linux kernel, we encountered a crash of the CPU 7 null pointer
> caused by the synchronization operation of ftrace_events list, with a
> probability of less than 1/1000
>
> 【call stack】
>
> CPU7
>
> load_module()
> prepare_coming_module(inline)
> blocking_notifier_call_chain_robust()
> notifier_call_chain_robust(inline)
> notifier_call_chain(inline)
> trace_module_notify()
> trace_module_add_evals(inline)
> trace_insert_eval_map(inline)
> trace_event_eval_update()
> |
> |...
> |---down_write(&trace_event_sem);
> list_for_each_entry_safe(call, p, &ftrace_events, list) {
> /* events are usually grouped together with systems */
> if (!last_system || call->class->system != last_system) {
> first = true;
> last_i = 0;
> last_system = call->class->system;
> }
> ......
>
>
> CPU4
>
> load_module()
Hmm, for some reason I thought module loading was synchronized and
couldn't run in parallel. It may have been that way long ago, but
apparently it isn't now.
> prepare_coming_module(inline)
> blocking_notifier_call_chain_robust()
> notifier_call_chain_robust()
> trace_module_notify()
> trace_module_add_events(inline)
> __register_event(inline)
> |...
> |--list_add(&call->list, &ftrace_events);
> |...
>
>
>
>
> Function1: trace_event_eval_update --->traverse the ftrace_events list
>
> Function2: __register_event --->list_add ftrace_events ,but no write lock protect.
>
> Function1 and Function2 are running at the same time.
>
>
> We want to know Is this a small bug or are there other considerations?
This looks like a small bug.
>
> Should we could add write lock protection to optimize this issue ?
>
> __register_event(inline)
> |...
> |+++down_write(&trace_event_sem);
> |
> |---list_add(&call->list, &ftrace_events);
> |
> |+++down_up(&trace_event_sem);
> |...
>
Can you see if this patch works for you?
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 120531268abf..f00d3901cd3b 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -3675,10 +3675,12 @@ static void trace_module_add_events(struct module *mod)
start = mod->trace_events;
end = mod->trace_events + mod->num_trace_events;
+ down_write(&trace_event_sem);
for_each_event(call, start, end) {
__register_event(*call, mod);
__add_event_to_tracers(*call);
}
+ up_write(&trace_event_sem);
update_cache_events(mod);
}
The remove takes that semaphore, I don't know why the addition doesn't.
Looks to be a long standing bug. Goes back to 110bf2b764eb6 that was
added in 2009. Probably because I assumed load_module() was
synchronized :-p
Thanks for the report!
-- Steve
^ permalink raw reply related [flat|nested] 3+ messages in thread