public inbox for linux-trace-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Petr Pavlu <petr.pavlu@suse.com>
To: Cao Ruichuang <create0818@163.com>
Cc: linux-trace-kernel@vger.kernel.org, linux-kernel@vger.kernel.org,
	mhiramat@kernel.org, rostedt@goodmis.org
Subject: Re: [PATCH] tracing: separate module tracepoint strings from trace_printk formats
Date: Tue, 14 Apr 2026 13:37:45 +0200	[thread overview]
Message-ID: <65f9ebb2-3d3f-4bf4-9c33-2f3855ed008e@suse.com> (raw)
In-Reply-To: <20260413123359.32517-1-create0818@163.com>

On 4/13/26 2:33 PM, Cao Ruichuang wrote:
> The previous module tracepoint_string() fix took the smallest
> implementation path and reused the existing module trace_printk format
> storage.
> 
> That was enough to make module __tracepoint_str entries show up in
> printk_formats and be accepted by trace_is_tracepoint_string(), but it
> also made those copied mappings persist after module unload. That does
> not match the expected module lifetime semantics.
> 
> Handle module tracepoint_string() mappings separately instead of mixing
> them into the module trace_printk format list. Keep copying the strings
> into tracing-managed storage while the module is loaded, but track them
> on their own list and drop them again on MODULE_STATE_GOING.
> 
> Keep module trace_printk format handling unchanged.
> 
> This split is intentional: module trace_printk formats and module
> tracepoint_string() mappings do not have the same lifetime requirements.
> Keeping them in one shared structure would either preserve
> tracepoint_string() mappings too long again, or require mixed
> ownership/refcount rules in a trace_printk-oriented structure.
> 
> The separate module tracepoint_string() list intentionally keeps one
> copied mapping per module entry instead of trying to share copies across
> modules by string contents. printk_formats is address-based, and sharing
> those copies would add another layer of shared ownership/refcounting
> without changing the lifetime rule this fix is trying to restore.

I believe it would be more productive to first answer the question in my
previous email and give others time to comment on the noted issue and
discuss a possible solution. The previous patch hasn't been accepted
yet, so sending a new patch on top of it just hours after my comment
seems unusual.

The patch contains some unrelated changes, such as renaming last_index
to next_index in find_next(). I suspect it was generated with the help
of AI but it isn't attributed as such. Please check
Documentation/process/coding-assistants.rst.

The implementation splits the trace_bprintk_fmt_list into two lists.
While this looks sensible, it creates nearly duplicate code for handling
the two lists, for example, hold_module_trace_format() and
hold_module_trace_format(), or lookup_mod_format_ptr() and
lookup_mod_tracepoint_ptr(). An alternative would be to keep one list
and have a 'struct module *mod' member to differentiate between the
strings. 'mod == NULL' for printk fmts and 'mod != NULL' for tracepoint
strings.

I also considered whether kernel/trace/trace_printk.c could avoid
tracking tracepoint strings altogether and instead have
find_next_mod_entry() directly traverse the modules list and
module::tracepoint_strings_start. However, this would require taking
both btrace_mutex and the RCU lock, and I believe it would be more
complex.

Since the tracepoint strings are now tracked only for the duration of
a module's lifetime, kernel/trace/trace_printk.c doesn't need to copy
the strings and can reference directly the original module data.
Alternatively, the original strings could be placed in an .init section
so that once trace_printk.c copies them and the module is initialized,
they get dropped.

-- 
Thanks,
Petr


> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217196
> Signed-off-by: Cao Ruichuang <create0818@163.com>
> ---
>  include/linux/tracepoint.h  |   9 +-
>  kernel/trace/trace_printk.c | 250 ++++++++++++++++++++++--------------
>  2 files changed, 157 insertions(+), 102 deletions(-)
> 
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index f14da542402..aec598a4017 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -479,11 +479,10 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
>   *
>   * For built-in code, the tracing system uses the original string address.
>   * For modules, the tracing code saves tracepoint strings into
> - * tracing-managed storage when the module loads, so their mappings remain
> - * available through printk_formats and trace string checks even after the
> - * module's own memory goes away. As long as the string does not change
> - * during the life of the module, it is fine to use tracepoint_string()
> - * within a module.
> + * tracing-managed storage while the module is loaded, and drops those
> + * mappings again when the module unloads. As long as the string does not
> + * change during the life of the module, it is fine to use
> + * tracepoint_string() within a module.
>   */
>  #define tracepoint_string(str)						\
>  	({								\
> diff --git a/kernel/trace/trace_printk.c b/kernel/trace/trace_printk.c
> index 9f67ce42ef6..0420ffcff93 100644
> --- a/kernel/trace/trace_printk.c
> +++ b/kernel/trace/trace_printk.c
> @@ -21,24 +21,24 @@
>  
>  #ifdef CONFIG_MODULES
>  
> -/*
> - * modules trace_printk() formats and tracepoint_string() strings are
> - * autosaved in struct trace_bprintk_fmt, which are queued on
> - * trace_bprintk_fmt_list.
> - */
> +/* module trace_printk() formats are autosaved on trace_bprintk_fmt_list. */
>  static LIST_HEAD(trace_bprintk_fmt_list);
> +/* module tracepoint_string() copies live on tracepoint_string_list. */
> +static LIST_HEAD(tracepoint_string_list);
>  
> -/* serialize accesses to trace_bprintk_fmt_list */
> +/* serialize accesses to module format and tracepoint-string lists */
>  static DEFINE_MUTEX(btrace_mutex);
>  
>  struct trace_bprintk_fmt {
>  	struct list_head list;
>  	const char *fmt;
> -	unsigned int type;
>  };
>  
> -#define TRACE_BPRINTK_TYPE		BIT(0)
> -#define TRACE_TRACEPOINT_TYPE		BIT(1)
> +struct tracepoint_string_entry {
> +	struct list_head list;
> +	struct module *mod;
> +	const char *str;
> +};
>  
>  static inline struct trace_bprintk_fmt *lookup_format(const char *fmt)
>  {
> @@ -54,24 +54,21 @@ static inline struct trace_bprintk_fmt *lookup_format(const char *fmt)
>  	return NULL;
>  }
>  
> -static void hold_module_trace_format(const char **start, const char **end,
> -				     unsigned int type)
> +static void hold_module_trace_bprintk_format(const char **start, const char **end)
>  {
>  	const char **iter;
>  	char *fmt;
>  
>  	/* allocate the trace_printk per cpu buffers */
> -	if ((type & TRACE_BPRINTK_TYPE) && start != end)
> +	if (start != end)
>  		trace_printk_init_buffers();
>  
>  	mutex_lock(&btrace_mutex);
>  	for (iter = start; iter < end; iter++) {
>  		struct trace_bprintk_fmt *tb_fmt = lookup_format(*iter);
>  		if (tb_fmt) {
> -			if (!IS_ERR(tb_fmt)) {
> -				tb_fmt->type |= type;
> +			if (!IS_ERR(tb_fmt))
>  				*iter = tb_fmt->fmt;
> -			}
>  			continue;
>  		}
>  
> @@ -83,7 +80,6 @@ static void hold_module_trace_format(const char **start, const char **end,
>  				list_add_tail(&tb_fmt->list, &trace_bprintk_fmt_list);
>  				strcpy(fmt, *iter);
>  				tb_fmt->fmt = fmt;
> -				tb_fmt->type = type;
>  			} else
>  				kfree(tb_fmt);
>  		}
> @@ -93,89 +89,156 @@ static void hold_module_trace_format(const char **start, const char **end,
>  	mutex_unlock(&btrace_mutex);
>  }
>  
> +static void hold_module_tracepoint_strings(struct module *mod,
> +					   const char **start,
> +					   const char **end)
> +{
> +	const char **iter;
> +
> +	mutex_lock(&btrace_mutex);
> +	for (iter = start; iter < end; iter++) {
> +		struct tracepoint_string_entry *tp_entry;
> +		char *str;
> +
> +		tp_entry = kmalloc_obj(*tp_entry);
> +		if (!tp_entry)
> +			continue;
> +
> +		str = kstrdup(*iter, GFP_KERNEL);
> +		if (!str) {
> +			kfree(tp_entry);
> +			continue;
> +		}
> +
> +		tp_entry->mod = mod;
> +		tp_entry->str = str;
> +		list_add_tail(&tp_entry->list, &tracepoint_string_list);
> +		*iter = tp_entry->str;
> +	}
> +	mutex_unlock(&btrace_mutex);
> +}
> +
> +static void release_module_tracepoint_strings(struct module *mod)
> +{
> +	struct tracepoint_string_entry *tp_entry, *n;
> +
> +	mutex_lock(&btrace_mutex);
> +	list_for_each_entry_safe(tp_entry, n, &tracepoint_string_list, list) {
> +		if (tp_entry->mod != mod)
> +			continue;
> +		list_del(&tp_entry->list);
> +		kfree(tp_entry->str);
> +		kfree(tp_entry);
> +	}
> +	mutex_unlock(&btrace_mutex);
> +}
> +
>  static int module_trace_format_notify(struct notifier_block *self,
>  				      unsigned long val, void *data)
>  {
>  	struct module *mod = data;
>  
> -	if (val != MODULE_STATE_COMING)
> -		return NOTIFY_OK;
> +	switch (val) {
> +	case MODULE_STATE_COMING:
> +		if (mod->num_trace_bprintk_fmt) {
> +			const char **start = mod->trace_bprintk_fmt_start;
> +			const char **end = start + mod->num_trace_bprintk_fmt;
>  
> -	if (mod->num_trace_bprintk_fmt) {
> -		const char **start = mod->trace_bprintk_fmt_start;
> -		const char **end = start + mod->num_trace_bprintk_fmt;
> +			hold_module_trace_bprintk_format(start, end);
> +		}
> +
> +		if (mod->num_tracepoint_strings) {
> +			const char **start = mod->tracepoint_strings_start;
> +			const char **end = start + mod->num_tracepoint_strings;
>  
> -		hold_module_trace_format(start, end, TRACE_BPRINTK_TYPE);
> +			hold_module_tracepoint_strings(mod, start, end);
> +		}
> +		break;
> +	case MODULE_STATE_GOING:
> +		release_module_tracepoint_strings(mod);
> +		break;
>  	}
>  
> -	if (mod->num_tracepoint_strings) {
> -		const char **start = mod->tracepoint_strings_start;
> -		const char **end = start + mod->num_tracepoint_strings;
> +	return NOTIFY_OK;
> +}
>  
> -		hold_module_trace_format(start, end, TRACE_TRACEPOINT_TYPE);
> +static const char **find_first_mod_entry(void)
> +{
> +	struct trace_bprintk_fmt *tb_fmt;
> +	struct tracepoint_string_entry *tp_entry;
> +
> +	if (!list_empty(&trace_bprintk_fmt_list)) {
> +		tb_fmt = list_first_entry(&trace_bprintk_fmt_list,
> +					  typeof(*tb_fmt), list);
> +		return &tb_fmt->fmt;
>  	}
>  
> -	return NOTIFY_OK;
> +	if (!list_empty(&tracepoint_string_list)) {
> +		tp_entry = list_first_entry(&tracepoint_string_list,
> +					    typeof(*tp_entry), list);
> +		return &tp_entry->str;
> +	}
> +
> +	return NULL;
>  }
>  
> -/*
> - * The debugfs/tracing/printk_formats file maps the addresses with
> - * the ASCII formats that are used in the bprintk events in the
> - * buffer. For userspace tools to be able to decode the events from
> - * the buffer, they need to be able to map the address with the format.
> - *
> - * The addresses of the bprintk formats are in their own section
> - * __trace_printk_fmt. But for modules we copy them into a link list.
> - * The code to print the formats and their addresses passes around the
> - * address of the fmt string. If the fmt address passed into the seq
> - * functions is within the kernel core __trace_printk_fmt section, then
> - * it simply uses the next pointer in the list.
> - *
> - * When the fmt pointer is outside the kernel core __trace_printk_fmt
> - * section, then we need to read the link list pointers. The trick is
> - * we pass the address of the string to the seq function just like
> - * we do for the kernel core formats. To get back the structure that
> - * holds the format, we simply use container_of() and then go to the
> - * next format in the list.
> - */
> -static const char **
> -find_next_mod_format(int start_index, void *v, const char **fmt, loff_t *pos)
> +static struct trace_bprintk_fmt *lookup_mod_format_ptr(const char **fmt_ptr)
>  {
> -	struct trace_bprintk_fmt *mod_fmt;
> +	struct trace_bprintk_fmt *tb_fmt;
>  
> -	if (list_empty(&trace_bprintk_fmt_list))
> -		return NULL;
> +	list_for_each_entry(tb_fmt, &trace_bprintk_fmt_list, list) {
> +		if (fmt_ptr == &tb_fmt->fmt)
> +			return tb_fmt;
> +	}
>  
> -	/*
> -	 * v will point to the address of the fmt record from t_next
> -	 * v will be NULL from t_start.
> -	 * If this is the first pointer or called from start
> -	 * then we need to walk the list.
> -	 */
> -	if (!v || start_index == *pos) {
> -		struct trace_bprintk_fmt *p;
> -
> -		/* search the module list */
> -		list_for_each_entry(p, &trace_bprintk_fmt_list, list) {
> -			if (start_index == *pos)
> -				return &p->fmt;
> -			start_index++;
> +	return NULL;
> +}
> +
> +static struct tracepoint_string_entry *lookup_mod_tracepoint_ptr(const char **str_ptr)
> +{
> +	struct tracepoint_string_entry *tp_entry;
> +
> +	list_for_each_entry(tp_entry, &tracepoint_string_list, list) {
> +		if (str_ptr == &tp_entry->str)
> +			return tp_entry;
> +	}
> +
> +	return NULL;
> +}
> +
> +static const char **find_next_mod_entry(int start_index, void *v, loff_t *pos)
> +{
> +	struct trace_bprintk_fmt *tb_fmt;
> +	struct tracepoint_string_entry *tp_entry;
> +
> +	if (!v || start_index == *pos)
> +		return find_first_mod_entry();
> +
> +	tb_fmt = lookup_mod_format_ptr(v);
> +	if (tb_fmt) {
> +		if (tb_fmt->list.next != &trace_bprintk_fmt_list) {
> +			tb_fmt = list_next_entry(tb_fmt, list);
> +			return &tb_fmt->fmt;
>  		}
> -		/* pos > index */
> +
> +		if (!list_empty(&tracepoint_string_list)) {
> +			tp_entry = list_first_entry(&tracepoint_string_list,
> +						    typeof(*tp_entry), list);
> +			return &tp_entry->str;
> +		}
> +
>  		return NULL;
>  	}
>  
> -	/*
> -	 * v points to the address of the fmt field in the mod list
> -	 * structure that holds the module print format.
> -	 */
> -	mod_fmt = container_of(v, typeof(*mod_fmt), fmt);
> -	if (mod_fmt->list.next == &trace_bprintk_fmt_list)
> +	tp_entry = lookup_mod_tracepoint_ptr(v);
> +	if (!tp_entry)
>  		return NULL;
>  
> -	mod_fmt = container_of(mod_fmt->list.next, typeof(*mod_fmt), list);
> +	if (tp_entry->list.next == &tracepoint_string_list)
> +		return NULL;
>  
> -	return &mod_fmt->fmt;
> +	tp_entry = list_next_entry(tp_entry, list);
> +	return &tp_entry->str;
>  }
>  
>  static void format_mod_start(void)
> @@ -195,8 +258,8 @@ module_trace_format_notify(struct notifier_block *self,
>  {
>  	return NOTIFY_OK;
>  }
> -static inline const char **
> -find_next_mod_format(int start_index, void *v, const char **fmt, loff_t *pos)
> +static inline const char **find_next_mod_entry(int start_index, void *v,
> +					       loff_t *pos)
>  {
>  	return NULL;
>  }
> @@ -274,7 +337,7 @@ bool trace_is_tracepoint_string(const char *str)
>  {
>  	const char **ptr = __start___tracepoint_str;
>  #ifdef CONFIG_MODULES
> -	struct trace_bprintk_fmt *tb_fmt;
> +	struct tracepoint_string_entry *tp_entry;
>  #endif
>  
>  	for (ptr = __start___tracepoint_str; ptr < __stop___tracepoint_str; ptr++) {
> @@ -284,8 +347,8 @@ bool trace_is_tracepoint_string(const char *str)
>  
>  #ifdef CONFIG_MODULES
>  	mutex_lock(&btrace_mutex);
> -	list_for_each_entry(tb_fmt, &trace_bprintk_fmt_list, list) {
> -		if ((tb_fmt->type & TRACE_TRACEPOINT_TYPE) && str == tb_fmt->fmt) {
> +	list_for_each_entry(tp_entry, &tracepoint_string_list, list) {
> +		if (str == tp_entry->str) {
>  			mutex_unlock(&btrace_mutex);
>  			return true;
>  		}
> @@ -297,9 +360,8 @@ bool trace_is_tracepoint_string(const char *str)
>  
>  static const char **find_next(void *v, loff_t *pos)
>  {
> -	const char **fmt = v;
>  	int start_index;
> -	int last_index;
> +	int next_index;
>  
>  	start_index = __stop___trace_bprintk_fmt - __start___trace_bprintk_fmt;
>  
> @@ -307,25 +369,19 @@ static const char **find_next(void *v, loff_t *pos)
>  		return __start___trace_bprintk_fmt + *pos;
>  
>  	/*
> -	 * The __tracepoint_str section is treated the same as the
> -	 * __trace_printk_fmt section. The difference is that the
> -	 * __trace_printk_fmt section should only be used by trace_printk()
> -	 * in a debugging environment, as if anything exists in that section
> -	 * the trace_prink() helper buffers are allocated, which would just
> -	 * waste space in a production environment.
> -	 *
> -	 * The __tracepoint_str sections on the other hand are used by
> -	 * tracepoints which need to map pointers to their strings to
> -	 * the ASCII text for userspace.
> +	 * Built-in __tracepoint_str entries are exported directly from the
> +	 * core section. Module tracepoint_string() mappings are kept on a
> +	 * separate tracing-managed list below, because their lifetime is tied
> +	 * to module load/unload and differs from module trace_printk() formats.
>  	 */
> -	last_index = start_index;
> +	next_index = start_index;
>  	start_index = __stop___tracepoint_str - __start___tracepoint_str;
>  
> -	if (*pos < last_index + start_index)
> -		return __start___tracepoint_str + (*pos - last_index);
> +	if (*pos < next_index + start_index)
> +		return __start___tracepoint_str + (*pos - next_index);
>  
> -	start_index += last_index;
> -	return find_next_mod_format(start_index, v, fmt, pos);
> +	start_index += next_index;
> +	return find_next_mod_entry(start_index, v, pos);
>  }
>  
>  static void *


  reply	other threads:[~2026-04-14 11:37 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-06 17:09 [PATCH] tracing: preserve module tracepoint strings Cao Ruichuang
2026-04-08 20:32 ` Steven Rostedt
2026-04-09 12:37 ` Petr Pavlu
2026-04-10  5:18 ` [PATCH v2] " Cao Ruichuang
2026-04-13  9:40   ` Petr Pavlu
2026-04-13 12:33     ` [PATCH] tracing: separate module tracepoint strings from trace_printk formats Cao Ruichuang
2026-04-14 11:37       ` Petr Pavlu [this message]
2026-04-16  8:03         ` Cao Ruichuang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=65f9ebb2-3d3f-4bf4-9c33-2f3855ed008e@suse.com \
    --to=petr.pavlu@suse.com \
    --cc=create0818@163.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=mhiramat@kernel.org \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox