public inbox for linux-trace-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Steven Rostedt <rostedt@goodmis.org>
To: "Fusheng Huang(黄富生)" <Fusheng.Huang@luxshare-ict.com>
Cc: "mhiramat@kernel.org" <mhiramat@kernel.org>,
	"GuoJin Dai( 戴国金)" <GuoJin.Dai@luxshare-ict.com>,
	"linux-trace-kernel@vger.kernel.org"
	<linux-trace-kernel@vger.kernel.org>
Subject: Re: 【ftrace_events list】race between trace_event_eval_update and __register_event discuss
Date: Thu, 17 Jul 2025 10:50:07 -0400	[thread overview]
Message-ID: <20250717105007.46ccd18f@batman.local.home> (raw)
In-Reply-To: <3c1259275c784bbd9e2f33e1b2d846c1@luxshare-ict.com>

On Thu, 17 Jul 2025 12:37:37 +0000
Fusheng Huang(黄富生) <Fusheng.Huang@luxshare-ict.com> wrote:

> Dear rostedt, mhiramat:
> 
> We encountered a small problem of ftrace module, and need your help .
> 
> 【issue background】During the process of starting and loading ko in
> the Linux kernel, we encountered a crash of the CPU 7 null pointer
> caused by the synchronization operation of ftrace_events list, with a
> probability of less than 1/1000
> 
> 【call stack】
> 
>        CPU7
> 
>        load_module()
>        prepare_coming_module(inline)
>        blocking_notifier_call_chain_robust()
>        notifier_call_chain_robust(inline)
>        notifier_call_chain(inline)
>        trace_module_notify()
>        trace_module_add_evals(inline)
>        trace_insert_eval_map(inline)
>        trace_event_eval_update()
>                      |
>                      |...
>                      |---down_write(&trace_event_sem);
>                             list_for_each_entry_safe(call, p, &ftrace_events, list) {
>                             /* events are usually grouped together with systems */
>                             if (!last_system || call->class->system != last_system) {
>                                    first = true;
>                                    last_i = 0;
>                                    last_system = call->class->system;
>                             }
>                             ......
> 
> 
>             CPU4
> 
>         load_module()

Hmm, for some reason I thought module loading was synchronized and
couldn't run in parallel. It may have been that way long ago, but
apparently it isn't now.

>          prepare_coming_module(inline)
>          blocking_notifier_call_chain_robust()
>          notifier_call_chain_robust()
>          trace_module_notify()
>          trace_module_add_events(inline)
>          __register_event(inline)
>                      |...
>                      |--list_add(&call->list, &ftrace_events);
>                      |...
> 
> 
> 
> 
>        Function1: trace_event_eval_update --->traverse the ftrace_events list
> 
>        Function2: __register_event --->list_add ftrace_events ,but no write lock protect.
> 
>      Function1 and Function2 are running at the same time.
> 
> 
>       We want to know Is this a small bug or are there other considerations?

This looks like a small bug.

> 
>       Should we could add write lock protection to optimize this issue ?
> 
>       __register_event(inline)
>                 |...
>                 |+++down_write(&trace_event_sem);
>                 |
>                 |---list_add(&call->list, &ftrace_events);
>                 |
>                 |+++down_up(&trace_event_sem);
>               |...
> 

Can you see if this patch works for you?

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 120531268abf..f00d3901cd3b 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -3675,10 +3675,12 @@ static void trace_module_add_events(struct module *mod)
 	start = mod->trace_events;
 	end = mod->trace_events + mod->num_trace_events;
 
+	down_write(&trace_event_sem);
 	for_each_event(call, start, end) {
 		__register_event(*call, mod);
 		__add_event_to_tracers(*call);
 	}
+	up_write(&trace_event_sem);
 
 	update_cache_events(mod);
 }

The remove takes that semaphore, I don't know why the addition doesn't.
Looks to be a long standing bug. Goes back to 110bf2b764eb6 that was
added in 2009. Probably because I assumed load_module() was
synchronized :-p

Thanks for the report!

-- Steve

       reply	other threads:[~2025-07-17 14:50 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <3c1259275c784bbd9e2f33e1b2d846c1@luxshare-ict.com>
2025-07-17 14:50 ` Steven Rostedt [this message]
2025-07-17 14:52   ` 【ftrace_events list】race between trace_event_eval_update and __register_event discuss Steven Rostedt
2025-07-17 14:59   ` Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250717105007.46ccd18f@batman.local.home \
    --to=rostedt@goodmis.org \
    --cc=Fusheng.Huang@luxshare-ict.com \
    --cc=GuoJin.Dai@luxshare-ict.com \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=mhiramat@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox