* [PATCH 0/4] tracing: improve symbolic printing @ 2023-09-21 8:51 Johannes Berg 2023-09-21 8:51 ` [RFC PATCH 1/4] tracing: add __print_sym() to replace __print_symbolic() Johannes Berg ` (4 more replies) 0 siblings, 5 replies; 16+ messages in thread From: Johannes Berg @ 2023-09-21 8:51 UTC (permalink / raw) To: linux-kernel; +Cc: linux-trace-kernel, netdev, linux-wireless So I was frustrated with not seeing the names of SKB dropreasons for all but the core reasons, and then while looking into this all, realized, that the current __print_symbolic() is pretty bad anyway. So I came up with a new approach, using a separate declaration of the symbols, and __print_sym() in there, but to userspace it all doesn't matter, it shows it the same way, just dyamically instead of munging with the strings all the time. This is a huge .data savings as far as I can tell, with a modest amount (~4k) of .text addition, while making it all dynamic and in the SKB dropreason case even reusing the existing list that dropmonitor uses today. Surely patch 3 isn't needed here, but it felt right. Anyway, I think it's a pretty reasonable approach overall, and it does works. I've listed a number of open questions in the first patch since that's where the real changes for this are. johannes ^ permalink raw reply [flat|nested] 16+ messages in thread
* [RFC PATCH 1/4] tracing: add __print_sym() to replace __print_symbolic() 2023-09-21 8:51 [PATCH 0/4] tracing: improve symbolic printing Johannes Berg @ 2023-09-21 8:51 ` Johannes Berg 2023-09-21 15:17 ` Johannes Berg 2023-09-21 8:51 ` [RFC PATCH 2/4] net: dropreason: use new __print_sym() in tracing Johannes Berg ` (3 subsequent siblings) 4 siblings, 1 reply; 16+ messages in thread From: Johannes Berg @ 2023-09-21 8:51 UTC (permalink / raw) To: linux-kernel; +Cc: linux-trace-kernel, netdev, linux-wireless, Johannes Berg From: Johannes Berg <johannes.berg@intel.com> The way __print_symbolic() works is limited and inefficient in multiple ways: - you can only use it with a static list of symbols, but e.g. the SKB dropreasons are now a dynamic list - it builds the list in memory _three_ times, so it takes a lot of memory: - The print_fmt contains the list (since it's passed to the macro there). This actually contains the names _twice_, which is fixed up at runtime. - TRACE_DEFINE_ENUM() puts a 24-byte struct trace_eval_map for every entry, plus the string pointed to by it, which cannot be deduplicated with the strings in the print_fmt - The in-kernel symbolic printing creates yet another list of struct trace_print_flags for trace_print_symbols_seq() - it also requires runtime fixup during init, which is a lot of string parsing due to the print_fmt fixup Introduce __print_sym() to - over time - replace the old one. We can easily extend this also to __print_flags later, but I cared more about the SKB dropreasons for now. This new __print_sym() requires only a single list of items, created by TRACE_DEFINE_SYM_LIST(), for can even use another already existing list by using TRACE_DEFINE_SYM_FNS() with lookup and show methods. Then, instead of doing an init-time fixup, just do this at the time when userspace reads the print_fmt. This way, dynamically updated lists are possible. For userspace, nothing actually changes, because the print_fmt is shown exactly the same way the old __print_symbolic() was. This adds about 4k .text in my test builds, but that'll be more than paid for by the actual conversions. OPEN QUESTIONS: - do we need TRACE_EVENT_FL_DYNPRINT or should we just always go through show_print_fmt()? If we do, we can get rid of set_event_dynprint() which is also a stupid function, and somewhat inefficient. - Is the WARN_ON() in show_print_fmt() correct? Can we have a dynamic event using this infrastructure? I don't know about dynamic events enough. - Is the double-slash (//) thing a good approach? I was looking for something that couldn't happen in C, and that's a comment so it shouldn't be possible ... It's only internal though so we could always change it. Parsing to the next , means we'd have to parse a LOT of C, so that's out of the question. - Is it correct that we can assume RCU critical section when in the lookup function? The SKB code currently does, but I may not ever have actually run this code yet. - Where should all the functions be implemented :-) Signed-off-by: Johannes Berg <johannes.berg@intel.com> --- include/asm-generic/vmlinux.lds.h | 3 +- include/linux/module.h | 2 + include/linux/trace_events.h | 10 ++ include/linux/tracepoint.h | 20 ++++ include/trace/stages/init.h | 54 +++++++++ include/trace/stages/stage2_data_offsets.h | 6 + include/trace/stages/stage3_trace_output.h | 9 ++ include/trace/stages/stage7_class_define.h | 4 + kernel/module/main.c | 3 + kernel/trace/trace_events.c | 131 ++++++++++++++++++++- kernel/trace/trace_output.c | 43 +++++++ 11 files changed, 282 insertions(+), 3 deletions(-) diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index 9c59409104f6..b2b45ff0a701 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -263,7 +263,8 @@ #define FTRACE_EVENTS() \ . = ALIGN(8); \ BOUNDED_SECTION(_ftrace_events) \ - BOUNDED_SECTION_BY(_ftrace_eval_map, _ftrace_eval_maps) + BOUNDED_SECTION_BY(_ftrace_eval_map, _ftrace_eval_maps) \ + BOUNDED_SECTION(_ftrace_sym_defs) #else #define FTRACE_EVENTS() #endif diff --git a/include/linux/module.h b/include/linux/module.h index a98e188cf37b..bf14b74a872c 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -524,6 +524,8 @@ struct module { unsigned int num_trace_events; struct trace_eval_map **trace_evals; unsigned int num_trace_evals; + struct trace_sym_def **trace_sym_defs; + unsigned int num_trace_sym_defs; #endif #ifdef CONFIG_FTRACE_MCOUNT_RECORD unsigned int num_ftrace_callsites; diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h index eb5c3add939b..9ab651a8fa0b 100644 --- a/include/linux/trace_events.h +++ b/include/linux/trace_events.h @@ -24,6 +24,13 @@ const char *trace_print_flags_seq(struct trace_seq *p, const char *delim, const char *trace_print_symbols_seq(struct trace_seq *p, unsigned long val, const struct trace_print_flags *symbol_array); +const char *trace_print_sym_seq(struct trace_seq *p, unsigned long long val, + const char *(*lookup)(unsigned long long val)); +const char *trace_sym_lookup(const struct trace_sym_entry *list, + size_t len, unsigned long long value); +void trace_sym_show(struct seq_file *m, + const struct trace_sym_entry *list, size_t len); + #if BITS_PER_LONG == 32 const char *trace_print_flags_seq_u64(struct trace_seq *p, const char *delim, unsigned long long flags, @@ -331,6 +338,7 @@ enum { TRACE_EVENT_FL_EPROBE_BIT, TRACE_EVENT_FL_FPROBE_BIT, TRACE_EVENT_FL_CUSTOM_BIT, + TRACE_EVENT_FL_DYNPRINT_BIT, }; /* @@ -348,6 +356,7 @@ enum { * CUSTOM - Event is a custom event (to be attached to an exsiting tracepoint) * This is set when the custom event has not been attached * to a tracepoint yet, then it is cleared when it is. + * DYNPRINT - Event has dynamic symbol prints */ enum { TRACE_EVENT_FL_FILTERED = (1 << TRACE_EVENT_FL_FILTERED_BIT), @@ -361,6 +370,7 @@ enum { TRACE_EVENT_FL_EPROBE = (1 << TRACE_EVENT_FL_EPROBE_BIT), TRACE_EVENT_FL_FPROBE = (1 << TRACE_EVENT_FL_FPROBE_BIT), TRACE_EVENT_FL_CUSTOM = (1 << TRACE_EVENT_FL_CUSTOM_BIT), + TRACE_EVENT_FL_DYNPRINT = (1 << TRACE_EVENT_FL_DYNPRINT_BIT), }; #define TRACE_EVENT_FL_UKPROBE (TRACE_EVENT_FL_KPROBE | TRACE_EVENT_FL_UPROBE) diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h index 88c0ba623ee6..7650019f4347 100644 --- a/include/linux/tracepoint.h +++ b/include/linux/tracepoint.h @@ -31,6 +31,24 @@ struct trace_eval_map { unsigned long eval_value; }; +struct trace_sym_def { + const char *system; + const char *symbol_id; + /* may return NULL */ + const char * (*lookup)(unsigned long long); + /* + * Must print the list: ', { val, "name"}, ...' + * with no trailing comma, but with the leading ', ' + * to simplify things: + */ + void (*show)(struct seq_file *); +}; + +struct trace_sym_entry { + unsigned long long value; + const char *name; +}; + #define TRACEPOINT_DEFAULT_PRIO 10 extern struct srcu_struct tracepoint_srcu; @@ -109,6 +127,8 @@ extern void syscall_unregfunc(void); #define TRACE_DEFINE_ENUM(x) #define TRACE_DEFINE_SIZEOF(x) +#define TRACE_DEFINE_SYM_FNS(...) +#define TRACE_DEFINE_SYM_LIST(...) #ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) diff --git a/include/trace/stages/init.h b/include/trace/stages/init.h index 000bcfc8dd2e..251ab22485d1 100644 --- a/include/trace/stages/init.h +++ b/include/trace/stages/init.h @@ -23,6 +23,60 @@ TRACE_MAKE_SYSTEM_STR(); __section("_ftrace_eval_map") \ *TRACE_SYSTEM##_##a = &__##TRACE_SYSTEM##_##a +/* + * Define a symbol for __print_sym by giving lookup and + * show functions. See &struct trace_sym_def. + */ +#define TRACE_DEFINE_SYM_FNS(_symbol_id, _lookup, _show) \ + _TRACE_DEFINE_SYM_FNS(TRACE_SYSTEM, _symbol_id, _lookup, _show) +#define _TRACE_DEFINE_SYM_FNS(_system, _symbol_id, _lookup, _show) \ + __TRACE_DEFINE_SYM_FNS(_system, _symbol_id, _lookup, _show) +#define __TRACE_DEFINE_SYM_FNS(_system, _symbol_id, _lookup, _show) \ + ___TRACE_DEFINE_SYM_FNS(_system ## _ ## _symbol_id, _symbol_id, \ + _lookup, _show) +#define ___TRACE_DEFINE_SYM_FNS(_name, _symbol_id, _lookup, _show) \ + static struct trace_sym_def \ + __trace_sym_def_ ## _name = { \ + .system = TRACE_SYSTEM_STRING, \ + /* need the ) for later strcmp */ \ + .symbol_id = #_symbol_id ")", \ + .lookup = _lookup, \ + .show = _show, \ + }; \ + static struct trace_sym_def \ + __section("_ftrace_sym_defs") \ + *__trace_sym_def_p_ ## _name = &__trace_sym_def_ ## _name + +/* + * Define a symbol for __print_sym by giving lookup and + * show functions. See &struct trace_sym_def. + */ +#define TRACE_DEFINE_SYM_LIST(_symbol_id, ...) \ + _TRACE_DEFINE_SYM_LIST(TRACE_SYSTEM, _symbol_id, __VA_ARGS__) +#define _TRACE_DEFINE_SYM_LIST(_system, _symbol_id, ...) \ + __TRACE_DEFINE_SYM_LIST(_system, _symbol_id, __VA_ARGS__) +#define __TRACE_DEFINE_SYM_LIST(_system, _symbol_id, ...) \ + ___TRACE_DEFINE_SYM_LIST(_system ## _ ## _symbol_id, _symbol_id,\ + __VA_ARGS__) +#define ___TRACE_DEFINE_SYM_LIST(_name, _symbol_id, ...) \ + static struct trace_sym_entry \ + __trace_sym_list_ ## _name[] = { __VA_ARGS__ }; \ + static const char * \ + __trace_sym_lookup_ ## _name(unsigned long long value) \ + { \ + return trace_sym_lookup(__trace_sym_list_ ## _name, \ + ARRAY_SIZE(__trace_sym_list_ ## _name), value); \ + } \ + static void \ + __trace_sym_show_ ## _name(struct seq_file *m) \ + { \ + trace_sym_show(m, __trace_sym_list_ ## _name, \ + ARRAY_SIZE(__trace_sym_list_ ## _name)); \ + } \ + ___TRACE_DEFINE_SYM_FNS(_name, _symbol_id, \ + __trace_sym_lookup_ ## _name, \ + __trace_sym_show_ ## _name) + #undef TRACE_DEFINE_SIZEOF #define TRACE_DEFINE_SIZEOF(a) \ static struct trace_eval_map __used __initdata \ diff --git a/include/trace/stages/stage2_data_offsets.h b/include/trace/stages/stage2_data_offsets.h index 469b6a64293d..83a287f78501 100644 --- a/include/trace/stages/stage2_data_offsets.h +++ b/include/trace/stages/stage2_data_offsets.h @@ -5,6 +5,12 @@ #undef TRACE_DEFINE_ENUM #define TRACE_DEFINE_ENUM(a) +#undef TRACE_DEFINE_SYM_FNS +#define TRACE_DEFINE_SYM_FNS(_symbol_id, _lookup, _show) + +#undef TRACE_DEFINE_SYM_LIST +#define TRACE_DEFINE_SYM_LIST(_symbol_id, ...) + #undef TRACE_DEFINE_SIZEOF #define TRACE_DEFINE_SIZEOF(a) diff --git a/include/trace/stages/stage3_trace_output.h b/include/trace/stages/stage3_trace_output.h index c1fb1355d309..d2c6458b62dc 100644 --- a/include/trace/stages/stage3_trace_output.h +++ b/include/trace/stages/stage3_trace_output.h @@ -79,6 +79,15 @@ trace_print_symbols_seq(p, value, symbols); \ }) +#undef __print_sym +#define __print_sym(value, symbol_id) \ + ___print_sym(TRACE_SYSTEM, value, symbol_id) +#define ___print_sym(sys, value, symbol_id) \ + ____print_sym(sys, value, symbol_id) +#define ____print_sym(sys, value, symbol_id) \ + trace_print_sym_seq(p, value, \ + __trace_sym_def_p_ ## sys ## _ ## symbol_id->lookup) + #undef __print_flags_u64 #undef __print_symbolic_u64 #if BITS_PER_LONG == 32 diff --git a/include/trace/stages/stage7_class_define.h b/include/trace/stages/stage7_class_define.h index bcb960d16fc0..ce802172b5d7 100644 --- a/include/trace/stages/stage7_class_define.h +++ b/include/trace/stages/stage7_class_define.h @@ -25,6 +25,10 @@ #undef __print_hex_dump #undef __get_buf +#undef __print_sym +/* use // as the separator since it'd be a comment in C */ +#define __print_sym(value, symbol_id) __print_sym(value\/\/symbol_id) + /* * The below is not executed in the kernel. It is only what is * displayed in the print format for userspace to parse. diff --git a/kernel/module/main.c b/kernel/module/main.c index 98fedfdb8db5..7752c3883cd2 100644 --- a/kernel/module/main.c +++ b/kernel/module/main.c @@ -2161,6 +2161,9 @@ static int find_module_sections(struct module *mod, struct load_info *info) mod->trace_evals = section_objs(info, "_ftrace_eval_map", sizeof(*mod->trace_evals), &mod->num_trace_evals); + mod->trace_sym_defs = section_objs(info, "_ftrace_sym_defs", + sizeof(*mod->trace_sym_defs), + &mod->num_trace_sym_defs); #endif #ifdef CONFIG_TRACING mod->trace_bprintk_fmt_start = section_objs(info, "__trace_printk_fmt", diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c index ed367d713be0..38eaa7dce078 100644 --- a/kernel/trace/trace_events.c +++ b/kernel/trace/trace_events.c @@ -1559,6 +1559,104 @@ static void *f_next(struct seq_file *m, void *v, loff_t *pos) return node; } +extern struct trace_sym_def *__start_ftrace_sym_defs[]; +extern struct trace_sym_def *__stop_ftrace_sym_defs[]; + +static void show_sym_list(struct seq_file *m, struct trace_event_call *call, + const char *name) +{ + struct trace_sym_def **sym_defs; + unsigned int n_sym_defs, i; + + if (call->module) { + struct module *mod = call->module; + + sym_defs = mod->trace_sym_defs; + n_sym_defs = mod->num_trace_sym_defs; + } else { + sym_defs = __start_ftrace_sym_defs; + n_sym_defs = __stop_ftrace_sym_defs - __start_ftrace_sym_defs; + } + + for (i = 0; i < n_sym_defs; i++) { + if (!sym_defs[i]) + continue; + if (sym_defs[i]->system != call->class->system) + continue; + if (strncmp(name, sym_defs[i]->symbol_id, + strlen(sym_defs[i]->symbol_id))) + continue; + if (sym_defs[i]->show) + sym_defs[i]->show(m); + break; + } +} + +static void show_print_fmt(struct seq_file *m, struct trace_event_call *call) +{ + char *ptr = call->print_fmt; + int quote = 0; + int wait_for_doubleslash = 0; + + if (!(call->flags & TRACE_EVENT_FL_DYNPRINT) || + /* WARN because dynamic events have no module */ + WARN_ON_ONCE(call->flags & TRACE_EVENT_FL_DYNAMIC)) { + seq_printf(m, "\nprint fmt: %s\n", call->print_fmt); + return; + } + + seq_puts(m, "\nprint fmt: "); + while (*ptr) { + if (*ptr == '\\') { + seq_putc(m, *ptr); + ptr++; + /* paranoid */ + if (!*ptr) + break; + goto next; + } + if (*ptr == '"') { + quote ^= 1; + goto next; + } + if (quote) + goto next; + + if (wait_for_doubleslash == 1 && *ptr == '/') { + wait_for_doubleslash = 2; + ptr++; + continue; + } else if (wait_for_doubleslash == 2 && *ptr != '/') { + seq_putc(m, '/'); + wait_for_doubleslash = 1; + goto next; + } else if (wait_for_doubleslash == 2 && *ptr == '/') { + const char *name; + + ptr++; + name = ptr; + /* skip the name */ + while (*ptr && *ptr != ')') + ptr++; + wait_for_doubleslash = 0; + show_sym_list(m, call, name); + continue; + } + + if (strncmp(ptr, "__print_sym(", 12) == 0) { + ptr += 12; + seq_puts(m, "__print_symbolic("); + wait_for_doubleslash = 1; + continue; + } +next: + seq_putc(m, *ptr); + ptr++; + } + + seq_putc(m, '\n'); +} + static int f_show(struct seq_file *m, void *v) { struct trace_event_call *call = event_file_data(m->private); @@ -1577,8 +1675,7 @@ static int f_show(struct seq_file *m, void *v) return 0; case FORMAT_PRINTFMT: - seq_printf(m, "\nprint fmt: %s\n", - call->print_fmt); + show_print_fmt(m, call); return 0; } @@ -2552,6 +2649,32 @@ static int event_init(struct trace_event_call *call) return ret; } +static void set_event_dynprint(struct trace_event_call *call) +{ + char *ptr; + int quote = 0; + + for (ptr = call->print_fmt; *ptr; ptr++) { + if (*ptr == '\\') { + ptr++; + /* paranoid */ + if (!*ptr) + break; + continue; + } + if (*ptr == '"') { + quote ^= 1; + continue; + } + if (quote) + continue; + if (strncmp(ptr, "__print_sym(", 12) == 0) { + call->flags |= TRACE_EVENT_FL_DYNPRINT; + break; + } + } +} + static int __register_event(struct trace_event_call *call, struct module *mod) { @@ -2567,6 +2690,8 @@ __register_event(struct trace_event_call *call, struct module *mod) else call->module = mod; + set_event_dynprint(call); + return 0; } @@ -3810,6 +3935,8 @@ static __init int event_trace_enable(void) ret = event_init(call); if (!ret) list_add(&call->list, &ftrace_events); + + set_event_dynprint(call); } register_trigger_cmds(); diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c index db575094c498..9196f408a1d2 100644 --- a/kernel/trace/trace_output.c +++ b/kernel/trace/trace_output.c @@ -124,6 +124,49 @@ trace_print_symbols_seq(struct trace_seq *p, unsigned long val, } EXPORT_SYMBOL(trace_print_symbols_seq); +const char *trace_sym_lookup(const struct trace_sym_entry *list, + size_t len, unsigned long long value) +{ + size_t i; + + for (i = 0; i < len; i++) { + if (list[i].value == value) + return list[i].name; + } + return NULL; +} +EXPORT_SYMBOL(trace_sym_lookup); + +void trace_sym_show(struct seq_file *m, + const struct trace_sym_entry *list, size_t len) +{ + size_t i; + + for (i = 0; i < len; i++) + seq_printf(m, ", { %lld, \"%s\" }", + list[i].value, list[i].name); +} +EXPORT_SYMBOL(trace_sym_show); + +const char * +trace_print_sym_seq(struct trace_seq *p, unsigned long long val, + const char *(*lookup)(unsigned long long val)) +{ + const char *ret = trace_seq_buffer_ptr(p); + const char *name; + + name = lookup(val); + if (name) + trace_seq_puts(p, name); + else + trace_seq_printf(p, "0x%llx", val); + + trace_seq_putc(p, 0); + + return ret; +} +EXPORT_SYMBOL(trace_print_sym_seq); + #if BITS_PER_LONG == 32 const char * trace_print_flags_seq_u64(struct trace_seq *p, const char *delim, -- 2.41.0 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [RFC PATCH 1/4] tracing: add __print_sym() to replace __print_symbolic() 2023-09-21 8:51 ` [RFC PATCH 1/4] tracing: add __print_sym() to replace __print_symbolic() Johannes Berg @ 2023-09-21 15:17 ` Johannes Berg 0 siblings, 0 replies; 16+ messages in thread From: Johannes Berg @ 2023-09-21 15:17 UTC (permalink / raw) To: linux-kernel@vger.kernel.org Cc: linux-trace-kernel@vger.kernel.org, netdev@vger.kernel.org, linux-wireless@vger.kernel.org On Thu, 2023-09-21 at 08:51 +0000, Johannes Berg wrote: > > - Is it correct that we can assume RCU critical section when in > the lookup function? The SKB code currently does, but I may > not ever have actually run this code yet. Well, I could easily answer that myself, and no, it's incorrect. It'd be really useful though for these lookups to be able to do them under RCU, so I think I'll fold this? --- a/include/linux/tracepoint.h +++ b/include/linux/tracepoint.h @@ -34,7 +34,7 @@ struct trace_eval_map { struct trace_sym_def { const char *system; const char *symbol_id; - /* may return NULL */ + /* may return NULL, called under rcu_read_lock() */ const char * (*lookup)(unsigned long long); /* * Must print the list: ', { val, "name"}, ...' --- a/kernel/trace/trace_output.c +++ b/kernel/trace/trace_output.c @@ -155,11 +155,13 @@ trace_print_sym_seq(struct trace_seq *p, unsigned long long val, const char *ret = trace_seq_buffer_ptr(p); const char *name; + rcu_read_lock(); name = lookup(val); if (name) trace_seq_puts(p, name); else trace_seq_printf(p, "0x%llx", val); + rcu_read_unlock(); trace_seq_putc(p, 0); johannes ^ permalink raw reply [flat|nested] 16+ messages in thread
* [RFC PATCH 2/4] net: dropreason: use new __print_sym() in tracing 2023-09-21 8:51 [PATCH 0/4] tracing: improve symbolic printing Johannes Berg 2023-09-21 8:51 ` [RFC PATCH 1/4] tracing: add __print_sym() to replace __print_symbolic() Johannes Berg @ 2023-09-21 8:51 ` Johannes Berg 2023-09-21 8:51 ` [RFC PATCH 3/4] net: drop_monitor: use drop_reason_lookup() Johannes Berg ` (2 subsequent siblings) 4 siblings, 0 replies; 16+ messages in thread From: Johannes Berg @ 2023-09-21 8:51 UTC (permalink / raw) To: linux-kernel; +Cc: linux-trace-kernel, netdev, linux-wireless, Johannes Berg From: Johannes Berg <johannes.berg@intel.com> The __print_symbolic() could only ever print the core drop reasons, since that's the way the infrastructure works. Now that we have __print_sym() with all the advantages mentioned in that commit, convert to that and get all the drop reasons from all subsystems. As we already have a list of them, that's really easy. This is a little bit of .text (~100 bytes in my build) and saves a lot of .data (~17k). Signed-off-by: Johannes Berg <johannes.berg@intel.com> --- include/net/dropreason.h | 5 +++++ include/trace/events/skb.h | 16 +++----------- net/core/skbuff.c | 43 ++++++++++++++++++++++++++++++++++++++ 3 files changed, 51 insertions(+), 13 deletions(-) diff --git a/include/net/dropreason.h b/include/net/dropreason.h index 56cb7be92244..c157070b5303 100644 --- a/include/net/dropreason.h +++ b/include/net/dropreason.h @@ -42,6 +42,11 @@ struct drop_reason_list { extern const struct drop_reason_list __rcu * drop_reasons_by_subsys[SKB_DROP_REASON_SUBSYS_NUM]; +#ifdef CONFIG_TRACEPOINTS +const char *drop_reason_lookup(unsigned long long value); +void drop_reason_show(struct seq_file *m); +#endif + void drop_reasons_register_subsys(enum skb_drop_reason_subsys subsys, const struct drop_reason_list *list); void drop_reasons_unregister_subsys(enum skb_drop_reason_subsys subsys); diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h index 07e0715628ec..8a1a63f9e796 100644 --- a/include/trace/events/skb.h +++ b/include/trace/events/skb.h @@ -8,15 +8,9 @@ #include <linux/skbuff.h> #include <linux/netdevice.h> #include <linux/tracepoint.h> +#include <net/dropreason.h> -#undef FN -#define FN(reason) TRACE_DEFINE_ENUM(SKB_DROP_REASON_##reason); -DEFINE_DROP_REASON(FN, FN) - -#undef FN -#undef FNe -#define FN(reason) { SKB_DROP_REASON_##reason, #reason }, -#define FNe(reason) { SKB_DROP_REASON_##reason, #reason } +TRACE_DEFINE_SYM_FNS(drop_reason, drop_reason_lookup, drop_reason_show); /* * Tracepoint for free an sk_buff: @@ -44,13 +38,9 @@ TRACE_EVENT(kfree_skb, TP_printk("skbaddr=%p protocol=%u location=%pS reason: %s", __entry->skbaddr, __entry->protocol, __entry->location, - __print_symbolic(__entry->reason, - DEFINE_DROP_REASON(FN, FNe))) + __print_sym(__entry->reason, drop_reason )) ); -#undef FN -#undef FNe - TRACE_EVENT(consume_skb, TP_PROTO(struct sk_buff *skb, void *location), diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 4eaf7ed0d1f4..415329b76921 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -132,6 +132,49 @@ drop_reasons_by_subsys[SKB_DROP_REASON_SUBSYS_NUM] = { }; EXPORT_SYMBOL(drop_reasons_by_subsys); +#ifdef CONFIG_TRACEPOINTS +const char *drop_reason_lookup(unsigned long long value) +{ + unsigned long long subsys_id = value >> SKB_DROP_REASON_SUBSYS_SHIFT; + u32 reason = value & ~SKB_DROP_REASON_SUBSYS_MASK; + const struct drop_reason_list *subsys; + + if (subsys_id >= SKB_DROP_REASON_SUBSYS_NUM) + return NULL; + + subsys = rcu_dereference(drop_reasons_by_subsys[subsys_id]); + if (!subsys) + return NULL; + if (reason >= subsys->n_reasons) + return NULL; + return subsys->reasons[reason]; +} + +void drop_reason_show(struct seq_file *m) +{ + u32 subsys_id; + + rcu_read_lock(); + for (subsys_id = 0; subsys_id < SKB_DROP_REASON_SUBSYS_NUM; subsys_id++) { + const struct drop_reason_list *subsys; + u32 i; + + subsys = rcu_dereference(drop_reasons_by_subsys[subsys_id]); + if (!subsys) + continue; + + for (i = 0; i < subsys->n_reasons; i++) { + if (!subsys->reasons[i]) + continue; + seq_printf(m, ", { %u, \"%s\" }", + (subsys_id << SKB_DROP_REASON_SUBSYS_SHIFT) | i, + subsys->reasons[i]); + } + } + rcu_read_unlock(); +} +#endif + /** * drop_reasons_register_subsys - register another drop reason subsystem * @subsys: the subsystem to register, must not be the core -- 2.41.0 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC PATCH 3/4] net: drop_monitor: use drop_reason_lookup() 2023-09-21 8:51 [PATCH 0/4] tracing: improve symbolic printing Johannes Berg 2023-09-21 8:51 ` [RFC PATCH 1/4] tracing: add __print_sym() to replace __print_symbolic() Johannes Berg 2023-09-21 8:51 ` [RFC PATCH 2/4] net: dropreason: use new __print_sym() in tracing Johannes Berg @ 2023-09-21 8:51 ` Johannes Berg 2023-09-21 17:54 ` Johannes Berg 2023-09-21 8:51 ` [RFC PATCH 4/4] tracing/timer: use __print_sym() Johannes Berg 2023-10-04 16:22 ` [PATCH 0/4] tracing: improve symbolic printing Jakub Kicinski 4 siblings, 1 reply; 16+ messages in thread From: Johannes Berg @ 2023-09-21 8:51 UTC (permalink / raw) To: linux-kernel; +Cc: linux-trace-kernel, netdev, linux-wireless, Johannes Berg From: Johannes Berg <johannes.berg@intel.com> Now that we have drop_reason_lookup(), we can just use it for drop_monitor as well, rather than exporting the list itself. Signed-off-by: Johannes Berg <johannes.berg@intel.com> --- include/net/dropreason.h | 4 ---- net/core/drop_monitor.c | 18 +++--------------- net/core/skbuff.c | 6 +++--- 3 files changed, 6 insertions(+), 22 deletions(-) diff --git a/include/net/dropreason.h b/include/net/dropreason.h index c157070b5303..0e2195ccf2cd 100644 --- a/include/net/dropreason.h +++ b/include/net/dropreason.h @@ -38,10 +38,6 @@ struct drop_reason_list { size_t n_reasons; }; -/* Note: due to dynamic registrations, access must be under RCU */ -extern const struct drop_reason_list __rcu * -drop_reasons_by_subsys[SKB_DROP_REASON_SUBSYS_NUM]; - #ifdef CONFIG_TRACEPOINTS const char *drop_reason_lookup(unsigned long long value); void drop_reason_show(struct seq_file *m); diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c index aff31cd944c2..d110a16cde46 100644 --- a/net/core/drop_monitor.c +++ b/net/core/drop_monitor.c @@ -610,9 +610,8 @@ static int net_dm_packet_report_fill(struct sk_buff *msg, struct sk_buff *skb, size_t payload_len) { struct net_dm_skb_cb *cb = NET_DM_SKB_CB(skb); - const struct drop_reason_list *list = NULL; - unsigned int subsys, subsys_reason; char buf[NET_DM_MAX_SYMBOL_LEN]; + const char *reason_str; struct nlattr *attr; void *hdr; int rc; @@ -630,19 +629,8 @@ static int net_dm_packet_report_fill(struct sk_buff *msg, struct sk_buff *skb, goto nla_put_failure; rcu_read_lock(); - subsys = u32_get_bits(cb->reason, SKB_DROP_REASON_SUBSYS_MASK); - if (subsys < SKB_DROP_REASON_SUBSYS_NUM) - list = rcu_dereference(drop_reasons_by_subsys[subsys]); - subsys_reason = cb->reason & ~SKB_DROP_REASON_SUBSYS_MASK; - if (!list || - subsys_reason >= list->n_reasons || - !list->reasons[subsys_reason] || - strlen(list->reasons[subsys_reason]) > NET_DM_MAX_REASON_LEN) { - list = rcu_dereference(drop_reasons_by_subsys[SKB_DROP_REASON_SUBSYS_CORE]); - subsys_reason = SKB_DROP_REASON_NOT_SPECIFIED; - } - if (nla_put_string(msg, NET_DM_ATTR_REASON, - list->reasons[subsys_reason])) { + reason_str = drop_reason_lookup(cb->reason); + if (nla_put_string(msg, NET_DM_ATTR_REASON, reason_str)) { rcu_read_unlock(); goto nla_put_failure; } diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 415329b76921..9efde769949d 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -126,13 +126,11 @@ static const struct drop_reason_list drop_reasons_core = { .n_reasons = ARRAY_SIZE(drop_reasons), }; -const struct drop_reason_list __rcu * +static const struct drop_reason_list __rcu * drop_reasons_by_subsys[SKB_DROP_REASON_SUBSYS_NUM] = { [SKB_DROP_REASON_SUBSYS_CORE] = RCU_INITIALIZER(&drop_reasons_core), }; -EXPORT_SYMBOL(drop_reasons_by_subsys); -#ifdef CONFIG_TRACEPOINTS const char *drop_reason_lookup(unsigned long long value) { unsigned long long subsys_id = value >> SKB_DROP_REASON_SUBSYS_SHIFT; @@ -149,7 +147,9 @@ const char *drop_reason_lookup(unsigned long long value) return NULL; return subsys->reasons[reason]; } +EXPORT_SYMBOL(drop_reason_lookup); +#ifdef CONFIG_TRACEPOINTS void drop_reason_show(struct seq_file *m) { u32 subsys_id; -- 2.41.0 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [RFC PATCH 3/4] net: drop_monitor: use drop_reason_lookup() 2023-09-21 8:51 ` [RFC PATCH 3/4] net: drop_monitor: use drop_reason_lookup() Johannes Berg @ 2023-09-21 17:54 ` Johannes Berg 0 siblings, 0 replies; 16+ messages in thread From: Johannes Berg @ 2023-09-21 17:54 UTC (permalink / raw) To: linux-kernel; +Cc: linux-trace-kernel, netdev, linux-wireless On Thu, 2023-09-21 at 10:51 +0200, Johannes Berg wrote: > > - if (!list || > - subsys_reason >= list->n_reasons || > - !list->reasons[subsys_reason] || > - strlen(list->reasons[subsys_reason]) > NET_DM_MAX_REASON_LEN) { > - list = rcu_dereference(drop_reasons_by_subsys[SKB_DROP_REASON_SUBSYS_CORE]); > - subsys_reason = SKB_DROP_REASON_NOT_SPECIFIED; > - } Oops, I lost this translation of erroneous ones to the core SKB_DROP_REASON_NOT_SPECIFIED. Maybe we should just not have the attribute in that case? But that could be a different change too. > - if (nla_put_string(msg, NET_DM_ATTR_REASON, > - list->reasons[subsys_reason])) { > + reason_str = drop_reason_lookup(cb->reason); > + if (nla_put_string(msg, NET_DM_ATTR_REASON, reason_str)) { > rcu_read_unlock(); johannes ^ permalink raw reply [flat|nested] 16+ messages in thread
* [RFC PATCH 4/4] tracing/timer: use __print_sym() 2023-09-21 8:51 [PATCH 0/4] tracing: improve symbolic printing Johannes Berg ` (2 preceding siblings ...) 2023-09-21 8:51 ` [RFC PATCH 3/4] net: drop_monitor: use drop_reason_lookup() Johannes Berg @ 2023-09-21 8:51 ` Johannes Berg 2023-10-04 16:22 ` [PATCH 0/4] tracing: improve symbolic printing Jakub Kicinski 4 siblings, 0 replies; 16+ messages in thread From: Johannes Berg @ 2023-09-21 8:51 UTC (permalink / raw) To: linux-kernel; +Cc: linux-trace-kernel, netdev, linux-wireless, Johannes Berg From: Johannes Berg <johannes.berg@intel.com> Use the new __print_sym() in the timer tracing, just to show how to convert something. This adds ~80 bytes of .text for a saving of ~1.5K of data in my builds. Note the format changes from print fmt: "success=%d dependency=%s", REC->success, __print_symbolic(REC->dependency, { 0, "NONE" }, { (1 << 0), "POSIX_TIMER" }, { (1 << 1), "PERF_EVENTS" }, { (1 << 2), "SCHED" }, { (1 << 3), "CLOCK_UNSTABLE" }, { (1 << 4), "RCU" }, { (1 << 5), "RCU_EXP" }) to print fmt: "success=%d dependency=%s", REC->success, __print_symbolic(REC->dependency, { 0, "NONE" }, { 1, "POSIX_TIMER" }, { 2, "PERF_EVENTS" }, { 4, "SCHED" }, { 8, "CLOCK_UNSTABLE" }, { 16, "RCU" }, { 32, "RCU_EXP" }) Since the values are now just printed in the show function as pure decimal values. Signed-off-by: Johannes Berg <johannes.berg@intel.com> --- include/trace/events/timer.h | 22 +++++++--------------- 1 file changed, 7 insertions(+), 15 deletions(-) diff --git a/include/trace/events/timer.h b/include/trace/events/timer.h index b4bc2828fa09..cb8294da29d0 100644 --- a/include/trace/events/timer.h +++ b/include/trace/events/timer.h @@ -382,26 +382,18 @@ TRACE_EVENT(itimer_expire, #undef tick_dep_mask_name #undef tick_dep_name_end -/* The MASK will convert to their bits and they need to be processed too */ -#define tick_dep_name(sdep) TRACE_DEFINE_ENUM(TICK_DEP_BIT_##sdep); \ - TRACE_DEFINE_ENUM(TICK_DEP_MASK_##sdep); -#define tick_dep_name_end(sdep) TRACE_DEFINE_ENUM(TICK_DEP_BIT_##sdep); \ - TRACE_DEFINE_ENUM(TICK_DEP_MASK_##sdep); -/* NONE only has a mask defined for it */ -#define tick_dep_mask_name(sdep) TRACE_DEFINE_ENUM(TICK_DEP_MASK_##sdep); - -TICK_DEP_NAMES - -#undef tick_dep_name -#undef tick_dep_mask_name -#undef tick_dep_name_end - #define tick_dep_name(sdep) { TICK_DEP_MASK_##sdep, #sdep }, #define tick_dep_mask_name(sdep) { TICK_DEP_MASK_##sdep, #sdep }, #define tick_dep_name_end(sdep) { TICK_DEP_MASK_##sdep, #sdep } +TRACE_DEFINE_SYM_LIST(tick_dep_names, TICK_DEP_NAMES); + +#undef tick_dep_name +#undef tick_dep_mask_name +#undef tick_dep_name_end + #define show_tick_dep_name(val) \ - __print_symbolic(val, TICK_DEP_NAMES) + __print_sym(val, tick_dep_names) TRACE_EVENT(tick_stop, -- 2.41.0 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 0/4] tracing: improve symbolic printing 2023-09-21 8:51 [PATCH 0/4] tracing: improve symbolic printing Johannes Berg ` (3 preceding siblings ...) 2023-09-21 8:51 ` [RFC PATCH 4/4] tracing/timer: use __print_sym() Johannes Berg @ 2023-10-04 16:22 ` Jakub Kicinski 2023-10-04 16:35 ` Steven Rostedt 2023-10-04 18:38 ` Johannes Berg 4 siblings, 2 replies; 16+ messages in thread From: Jakub Kicinski @ 2023-10-04 16:22 UTC (permalink / raw) To: Johannes Berg; +Cc: linux-kernel, linux-trace-kernel, netdev, linux-wireless On Thu, 21 Sep 2023 10:51:30 +0200 Johannes Berg wrote: > So I was frustrated with not seeing the names of SKB dropreasons > for all but the core reasons, and then while looking into this > all, realized, that the current __print_symbolic() is pretty bad > anyway. > > So I came up with a new approach, using a separate declaration > of the symbols, and __print_sym() in there, but to userspace it > all doesn't matter, it shows it the same way, just dyamically > instead of munging with the strings all the time. > > This is a huge .data savings as far as I can tell, with a modest > amount (~4k) of .text addition, while making it all dynamic and > in the SKB dropreason case even reusing the existing list that > dropmonitor uses today. Surely patch 3 isn't needed here, but it > felt right. > > Anyway, I think it's a pretty reasonable approach overall, and > it does works. > > I've listed a number of open questions in the first patch since > that's where the real changes for this are. Potentially naive question - the trace point holds enum skb_drop_reason. The user space can get the names from BTF. Can we not teach user space to generically look up names of enums in BTF? ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/4] tracing: improve symbolic printing 2023-10-04 16:22 ` [PATCH 0/4] tracing: improve symbolic printing Jakub Kicinski @ 2023-10-04 16:35 ` Steven Rostedt 2023-10-04 16:54 ` Jakub Kicinski 2023-10-04 18:38 ` Johannes Berg 1 sibling, 1 reply; 16+ messages in thread From: Steven Rostedt @ 2023-10-04 16:35 UTC (permalink / raw) To: Jakub Kicinski Cc: Johannes Berg, linux-kernel, linux-trace-kernel, netdev, linux-wireless On Wed, 4 Oct 2023 09:22:05 -0700 Jakub Kicinski <kuba@kernel.org> wrote: > Potentially naive question - the trace point holds enum skb_drop_reason. > The user space can get the names from BTF. Can we not teach user space > to generically look up names of enums in BTF? That puts a hard requirement to include BTF in builds where it was not needed before. I really do not want to build with BTF just to get access to these symbols. And since this is used by the embedded world, and BTF is extremely bloated, the short answer is "No". -- Steve ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/4] tracing: improve symbolic printing 2023-10-04 16:35 ` Steven Rostedt @ 2023-10-04 16:54 ` Jakub Kicinski 2023-10-04 17:29 ` Steven Rostedt 0 siblings, 1 reply; 16+ messages in thread From: Jakub Kicinski @ 2023-10-04 16:54 UTC (permalink / raw) To: Steven Rostedt Cc: Johannes Berg, linux-kernel, linux-trace-kernel, netdev, linux-wireless On Wed, 4 Oct 2023 12:35:24 -0400 Steven Rostedt wrote: > > Potentially naive question - the trace point holds enum skb_drop_reason. > > The user space can get the names from BTF. Can we not teach user space > > to generically look up names of enums in BTF? > > That puts a hard requirement to include BTF in builds where it was not > needed before. I really do not want to build with BTF just to get access to > these symbols. And since this is used by the embedded world, and BTF is > extremely bloated, the short answer is "No". Dunno. BTF is there most of the time. It could make the life of majority of the users far more pleasant. I hope we can at least agree that the current methods of generating the string arrays at C level are... aesthetically displeasing. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/4] tracing: improve symbolic printing 2023-10-04 16:54 ` Jakub Kicinski @ 2023-10-04 17:29 ` Steven Rostedt 2023-10-04 21:35 ` Alan Maguire 0 siblings, 1 reply; 16+ messages in thread From: Steven Rostedt @ 2023-10-04 17:29 UTC (permalink / raw) To: Jakub Kicinski Cc: Johannes Berg, linux-kernel, linux-trace-kernel, netdev, linux-wireless On Wed, 4 Oct 2023 09:54:31 -0700 Jakub Kicinski <kuba@kernel.org> wrote: > On Wed, 4 Oct 2023 12:35:24 -0400 Steven Rostedt wrote: > > > Potentially naive question - the trace point holds enum skb_drop_reason. > > > The user space can get the names from BTF. Can we not teach user space > > > to generically look up names of enums in BTF? > > > > That puts a hard requirement to include BTF in builds where it was not > > needed before. I really do not want to build with BTF just to get access to > > these symbols. And since this is used by the embedded world, and BTF is > > extremely bloated, the short answer is "No". > > Dunno. BTF is there most of the time. It could make the life of > majority of the users far more pleasant. BTF isn't there for a lot of developers working in embedded who use this code. Most my users that I deal with have minimal environments, so BTF is a showstopper. > > I hope we can at least agree that the current methods of generating > the string arrays at C level are... aesthetically displeasing. I don't know, I kinda like it ;-) -- Steve ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/4] tracing: improve symbolic printing 2023-10-04 17:29 ` Steven Rostedt @ 2023-10-04 21:35 ` Alan Maguire 2023-10-04 21:43 ` Steven Rostedt 0 siblings, 1 reply; 16+ messages in thread From: Alan Maguire @ 2023-10-04 21:35 UTC (permalink / raw) To: Steven Rostedt, Jakub Kicinski Cc: Johannes Berg, linux-kernel, linux-trace-kernel, netdev, linux-wireless On 04/10/2023 18:29, Steven Rostedt wrote: > On Wed, 4 Oct 2023 09:54:31 -0700 > Jakub Kicinski <kuba@kernel.org> wrote: > >> On Wed, 4 Oct 2023 12:35:24 -0400 Steven Rostedt wrote: >>>> Potentially naive question - the trace point holds enum skb_drop_reason. >>>> The user space can get the names from BTF. Can we not teach user space >>>> to generically look up names of enums in BTF? >>> >>> That puts a hard requirement to include BTF in builds where it was not >>> needed before. I really do not want to build with BTF just to get access to >>> these symbols. And since this is used by the embedded world, and BTF is >>> extremely bloated, the short answer is "No". >> >> Dunno. BTF is there most of the time. It could make the life of >> majority of the users far more pleasant. > > BTF isn't there for a lot of developers working in embedded who use this > code. Most my users that I deal with have minimal environments, so BTF is a > showstopper. One thing we've heard from some embedded folks [1] is that having kernel BTF loadable as a separate module (rather than embedded in vmlinux) would help, as there are size limits on vmlinux that they can workaround by having modules on a different partition. We're hoping to get that working soon. I was wondering if you see other issues around BTF adoption for embedded systems that we could put on the to-do list? Not necessarily for this particular use-case (since there are complications with trace data as you describe), but just trying to make sure we can remove barriers to BTF adoption where possible. Thanks! Alan [1] https://lore.kernel.org/bpf/CAHBbfcUkr6fTm2X9GNsFNqV75fTG=aBQXFx_8Ayk+4hk7heB-g@mail.gmail.com/ > >> >> I hope we can at least agree that the current methods of generating >> the string arrays at C level are... aesthetically displeasing. > > I don't know, I kinda like it ;-) > > -- Steve > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/4] tracing: improve symbolic printing 2023-10-04 21:35 ` Alan Maguire @ 2023-10-04 21:43 ` Steven Rostedt 2023-10-04 22:07 ` Alan Maguire 0 siblings, 1 reply; 16+ messages in thread From: Steven Rostedt @ 2023-10-04 21:43 UTC (permalink / raw) To: Alan Maguire Cc: Jakub Kicinski, Johannes Berg, linux-kernel, linux-trace-kernel, netdev, linux-wireless On Wed, 4 Oct 2023 22:35:07 +0100 Alan Maguire <alan.maguire@oracle.com> wrote: > One thing we've heard from some embedded folks [1] is that having > kernel BTF loadable as a separate module (rather than embedded in > vmlinux) would help, as there are size limits on vmlinux that they can > workaround by having modules on a different partition. We're hoping > to get that working soon. I was wondering if you see other issues around > BTF adoption for embedded systems that we could put on the to-do list? > Not necessarily for this particular use-case (since there are > complications with trace data as you describe), but just trying to make > sure we can remove barriers to BTF adoption where possible. I wonder how easy is it to create subsets of BTF. For one thing, in the future we want to be able to trace the arguments of all functions. That is, tracing all functions at the same time (function tracer) and getting the arguments within the trace. This would only require information about functions and their arguments, which would be very useful. Is BTF easy to break apart? That is, just generate the information needed for function arguments? Note, pretty much all functions do not pass structures by values, and this would not need to know the contents of a pointer to a structure. This would mean that structure layout information is not needed. -- Steve ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/4] tracing: improve symbolic printing 2023-10-04 21:43 ` Steven Rostedt @ 2023-10-04 22:07 ` Alan Maguire 0 siblings, 0 replies; 16+ messages in thread From: Alan Maguire @ 2023-10-04 22:07 UTC (permalink / raw) To: Steven Rostedt Cc: Jakub Kicinski, Johannes Berg, linux-kernel, linux-trace-kernel, netdev, linux-wireless On 04/10/2023 22:43, Steven Rostedt wrote: > On Wed, 4 Oct 2023 22:35:07 +0100 > Alan Maguire <alan.maguire@oracle.com> wrote: > >> One thing we've heard from some embedded folks [1] is that having >> kernel BTF loadable as a separate module (rather than embedded in >> vmlinux) would help, as there are size limits on vmlinux that they can >> workaround by having modules on a different partition. We're hoping >> to get that working soon. I was wondering if you see other issues around >> BTF adoption for embedded systems that we could put on the to-do list? >> Not necessarily for this particular use-case (since there are >> complications with trace data as you describe), but just trying to make >> sure we can remove barriers to BTF adoption where possible. > > I wonder how easy is it to create subsets of BTF. For one thing, in the > future we want to be able to trace the arguments of all functions. That is, > tracing all functions at the same time (function tracer) and getting the > arguments within the trace. > > This would only require information about functions and their arguments, > which would be very useful. Is BTF easy to break apart? That is, just > generate the information needed for function arguments? > There has been a fair bit of effort around this from the userspace side; the BTF gen efforts were focused around applications carrying the minimum BTF for their needs, so just the structures needed by the particular BPF programs rather than the full set of vmlinux structures for example [1]. Parsing BTF in-kernel to pull out the BTF functions (BTF_KIND_FUNC), their prototypes (BTF_KIND_FUNC_PROTO) and all associated parameters would be pretty straightforward I think, especially if you don't need the structures that are passed via pointers. So if you're starting with the full BTF, creating a subset for use in tracing would be reasonably straightforward. My personal preference would always be to have the full BTF where possible, but if that wasn't feasible on some systems we'd need to add some options to pahole/libbpf to support such trimming during the DWARF->BTF translation process. Alan [1] https://lore.kernel.org/bpf/20220209222646.348365-7-mauricio@kinvolk.io/ > Note, pretty much all functions do not pass structures by values, and this > would not need to know the contents of a pointer to a structure. This would > mean that structure layout information is not needed. > > -- Steve > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/4] tracing: improve symbolic printing 2023-10-04 16:22 ` [PATCH 0/4] tracing: improve symbolic printing Jakub Kicinski 2023-10-04 16:35 ` Steven Rostedt @ 2023-10-04 18:38 ` Johannes Berg 2023-10-04 18:46 ` Steven Rostedt 1 sibling, 1 reply; 16+ messages in thread From: Johannes Berg @ 2023-10-04 18:38 UTC (permalink / raw) To: Jakub Kicinski; +Cc: linux-kernel, linux-trace-kernel, netdev, linux-wireless On Wed, 2023-10-04 at 09:22 -0700, Jakub Kicinski wrote: > > Potentially naive question - the trace point holds enum skb_drop_reason. > The user space can get the names from BTF. Can we not teach user space > to generically look up names of enums in BTF? I'll note that, unrelated to the discussion about whether or not we could use BTF, we couldn't do it in this case anyway since the whole drop reasons aren't captured in enum skb_drop_reason, that contains only the core ones, and now other subsystems are adding their own somewhat dynamically later. johannes ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/4] tracing: improve symbolic printing 2023-10-04 18:38 ` Johannes Berg @ 2023-10-04 18:46 ` Steven Rostedt 0 siblings, 0 replies; 16+ messages in thread From: Steven Rostedt @ 2023-10-04 18:46 UTC (permalink / raw) To: Johannes Berg Cc: Jakub Kicinski, linux-kernel, linux-trace-kernel, netdev, linux-wireless On Wed, 04 Oct 2023 20:38:46 +0200 Johannes Berg <johannes@sipsolutions.net> wrote: > On Wed, 2023-10-04 at 09:22 -0700, Jakub Kicinski wrote: > > > > Potentially naive question - the trace point holds enum skb_drop_reason. > > The user space can get the names from BTF. Can we not teach user space > > to generically look up names of enums in BTF? > > I'll note that, unrelated to the discussion about whether or not we > could use BTF, we couldn't do it in this case anyway since the whole > drop reasons aren't captured in enum skb_drop_reason, that contains only > the core ones, and now other subsystems are adding their own somewhat > dynamically later. Another issue with using BTF, is that the BTF would need to be saved in the trace.dat or perf.data file, as many times the trace data is moved off to another machine for offline analysis. And using the vmlinux would not be useful, because there is several times you have multiple trace files for various versions of a build and that would require mapping which vmlinux/btf file goes with which trace data. Right now, the conversions can easily be saved in the trace file directly. -- Steve ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2023-10-04 22:08 UTC | newest] Thread overview: 16+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-09-21 8:51 [PATCH 0/4] tracing: improve symbolic printing Johannes Berg 2023-09-21 8:51 ` [RFC PATCH 1/4] tracing: add __print_sym() to replace __print_symbolic() Johannes Berg 2023-09-21 15:17 ` Johannes Berg 2023-09-21 8:51 ` [RFC PATCH 2/4] net: dropreason: use new __print_sym() in tracing Johannes Berg 2023-09-21 8:51 ` [RFC PATCH 3/4] net: drop_monitor: use drop_reason_lookup() Johannes Berg 2023-09-21 17:54 ` Johannes Berg 2023-09-21 8:51 ` [RFC PATCH 4/4] tracing/timer: use __print_sym() Johannes Berg 2023-10-04 16:22 ` [PATCH 0/4] tracing: improve symbolic printing Jakub Kicinski 2023-10-04 16:35 ` Steven Rostedt 2023-10-04 16:54 ` Jakub Kicinski 2023-10-04 17:29 ` Steven Rostedt 2023-10-04 21:35 ` Alan Maguire 2023-10-04 21:43 ` Steven Rostedt 2023-10-04 22:07 ` Alan Maguire 2023-10-04 18:38 ` Johannes Berg 2023-10-04 18:46 ` Steven Rostedt
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).