* [PATCH 0/4] tracing: improve symbolic printing
@ 2023-09-21 8:51 Johannes Berg
2023-09-21 8:51 ` [RFC PATCH 1/4] tracing: add __print_sym() to replace __print_symbolic() Johannes Berg
` (4 more replies)
0 siblings, 5 replies; 16+ messages in thread
From: Johannes Berg @ 2023-09-21 8:51 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-trace-kernel, netdev, linux-wireless
So I was frustrated with not seeing the names of SKB dropreasons
for all but the core reasons, and then while looking into this
all, realized, that the current __print_symbolic() is pretty bad
anyway.
So I came up with a new approach, using a separate declaration
of the symbols, and __print_sym() in there, but to userspace it
all doesn't matter, it shows it the same way, just dyamically
instead of munging with the strings all the time.
This is a huge .data savings as far as I can tell, with a modest
amount (~4k) of .text addition, while making it all dynamic and
in the SKB dropreason case even reusing the existing list that
dropmonitor uses today. Surely patch 3 isn't needed here, but it
felt right.
Anyway, I think it's a pretty reasonable approach overall, and
it does works.
I've listed a number of open questions in the first patch since
that's where the real changes for this are.
johannes
^ permalink raw reply [flat|nested] 16+ messages in thread
* [RFC PATCH 1/4] tracing: add __print_sym() to replace __print_symbolic()
2023-09-21 8:51 [PATCH 0/4] tracing: improve symbolic printing Johannes Berg
@ 2023-09-21 8:51 ` Johannes Berg
2023-09-21 15:17 ` Johannes Berg
2023-09-21 8:51 ` [RFC PATCH 2/4] net: dropreason: use new __print_sym() in tracing Johannes Berg
` (3 subsequent siblings)
4 siblings, 1 reply; 16+ messages in thread
From: Johannes Berg @ 2023-09-21 8:51 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-trace-kernel, netdev, linux-wireless, Johannes Berg
From: Johannes Berg <johannes.berg@intel.com>
The way __print_symbolic() works is limited and inefficient
in multiple ways:
- you can only use it with a static list of symbols, but
e.g. the SKB dropreasons are now a dynamic list
- it builds the list in memory _three_ times, so it takes
a lot of memory:
- The print_fmt contains the list (since it's passed to
the macro there). This actually contains the names
_twice_, which is fixed up at runtime.
- TRACE_DEFINE_ENUM() puts a 24-byte struct trace_eval_map
for every entry, plus the string pointed to by it, which
cannot be deduplicated with the strings in the print_fmt
- The in-kernel symbolic printing creates yet another list
of struct trace_print_flags for trace_print_symbols_seq()
- it also requires runtime fixup during init, which is a lot
of string parsing due to the print_fmt fixup
Introduce __print_sym() to - over time - replace the old one.
We can easily extend this also to __print_flags later, but I
cared more about the SKB dropreasons for now.
This new __print_sym() requires only a single list of items,
created by TRACE_DEFINE_SYM_LIST(), for can even use another
already existing list by using TRACE_DEFINE_SYM_FNS() with
lookup and show methods.
Then, instead of doing an init-time fixup, just do this at the
time when userspace reads the print_fmt. This way, dynamically
updated lists are possible.
For userspace, nothing actually changes, because the print_fmt
is shown exactly the same way the old __print_symbolic() was.
This adds about 4k .text in my test builds, but that'll be
more than paid for by the actual conversions.
OPEN QUESTIONS:
- do we need TRACE_EVENT_FL_DYNPRINT or should we just always
go through show_print_fmt()? If we do, we can get rid of
set_event_dynprint() which is also a stupid function, and
somewhat inefficient.
- Is the WARN_ON() in show_print_fmt() correct? Can we have a
dynamic event using this infrastructure? I don't know about
dynamic events enough.
- Is the double-slash (//) thing a good approach? I was looking
for something that couldn't happen in C, and that's a comment
so it shouldn't be possible ... It's only internal though so
we could always change it. Parsing to the next , means we'd
have to parse a LOT of C, so that's out of the question.
- Is it correct that we can assume RCU critical section when in
the lookup function? The SKB code currently does, but I may
not ever have actually run this code yet.
- Where should all the functions be implemented :-)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---
include/asm-generic/vmlinux.lds.h | 3 +-
include/linux/module.h | 2 +
include/linux/trace_events.h | 10 ++
include/linux/tracepoint.h | 20 ++++
include/trace/stages/init.h | 54 +++++++++
include/trace/stages/stage2_data_offsets.h | 6 +
include/trace/stages/stage3_trace_output.h | 9 ++
include/trace/stages/stage7_class_define.h | 4 +
kernel/module/main.c | 3 +
kernel/trace/trace_events.c | 131 ++++++++++++++++++++-
kernel/trace/trace_output.c | 43 +++++++
11 files changed, 282 insertions(+), 3 deletions(-)
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 9c59409104f6..b2b45ff0a701 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -263,7 +263,8 @@
#define FTRACE_EVENTS() \
. = ALIGN(8); \
BOUNDED_SECTION(_ftrace_events) \
- BOUNDED_SECTION_BY(_ftrace_eval_map, _ftrace_eval_maps)
+ BOUNDED_SECTION_BY(_ftrace_eval_map, _ftrace_eval_maps) \
+ BOUNDED_SECTION(_ftrace_sym_defs)
#else
#define FTRACE_EVENTS()
#endif
diff --git a/include/linux/module.h b/include/linux/module.h
index a98e188cf37b..bf14b74a872c 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -524,6 +524,8 @@ struct module {
unsigned int num_trace_events;
struct trace_eval_map **trace_evals;
unsigned int num_trace_evals;
+ struct trace_sym_def **trace_sym_defs;
+ unsigned int num_trace_sym_defs;
#endif
#ifdef CONFIG_FTRACE_MCOUNT_RECORD
unsigned int num_ftrace_callsites;
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index eb5c3add939b..9ab651a8fa0b 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -24,6 +24,13 @@ const char *trace_print_flags_seq(struct trace_seq *p, const char *delim,
const char *trace_print_symbols_seq(struct trace_seq *p, unsigned long val,
const struct trace_print_flags *symbol_array);
+const char *trace_print_sym_seq(struct trace_seq *p, unsigned long long val,
+ const char *(*lookup)(unsigned long long val));
+const char *trace_sym_lookup(const struct trace_sym_entry *list,
+ size_t len, unsigned long long value);
+void trace_sym_show(struct seq_file *m,
+ const struct trace_sym_entry *list, size_t len);
+
#if BITS_PER_LONG == 32
const char *trace_print_flags_seq_u64(struct trace_seq *p, const char *delim,
unsigned long long flags,
@@ -331,6 +338,7 @@ enum {
TRACE_EVENT_FL_EPROBE_BIT,
TRACE_EVENT_FL_FPROBE_BIT,
TRACE_EVENT_FL_CUSTOM_BIT,
+ TRACE_EVENT_FL_DYNPRINT_BIT,
};
/*
@@ -348,6 +356,7 @@ enum {
* CUSTOM - Event is a custom event (to be attached to an exsiting tracepoint)
* This is set when the custom event has not been attached
* to a tracepoint yet, then it is cleared when it is.
+ * DYNPRINT - Event has dynamic symbol prints
*/
enum {
TRACE_EVENT_FL_FILTERED = (1 << TRACE_EVENT_FL_FILTERED_BIT),
@@ -361,6 +370,7 @@ enum {
TRACE_EVENT_FL_EPROBE = (1 << TRACE_EVENT_FL_EPROBE_BIT),
TRACE_EVENT_FL_FPROBE = (1 << TRACE_EVENT_FL_FPROBE_BIT),
TRACE_EVENT_FL_CUSTOM = (1 << TRACE_EVENT_FL_CUSTOM_BIT),
+ TRACE_EVENT_FL_DYNPRINT = (1 << TRACE_EVENT_FL_DYNPRINT_BIT),
};
#define TRACE_EVENT_FL_UKPROBE (TRACE_EVENT_FL_KPROBE | TRACE_EVENT_FL_UPROBE)
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 88c0ba623ee6..7650019f4347 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -31,6 +31,24 @@ struct trace_eval_map {
unsigned long eval_value;
};
+struct trace_sym_def {
+ const char *system;
+ const char *symbol_id;
+ /* may return NULL */
+ const char * (*lookup)(unsigned long long);
+ /*
+ * Must print the list: ', { val, "name"}, ...'
+ * with no trailing comma, but with the leading ', '
+ * to simplify things:
+ */
+ void (*show)(struct seq_file *);
+};
+
+struct trace_sym_entry {
+ unsigned long long value;
+ const char *name;
+};
+
#define TRACEPOINT_DEFAULT_PRIO 10
extern struct srcu_struct tracepoint_srcu;
@@ -109,6 +127,8 @@ extern void syscall_unregfunc(void);
#define TRACE_DEFINE_ENUM(x)
#define TRACE_DEFINE_SIZEOF(x)
+#define TRACE_DEFINE_SYM_FNS(...)
+#define TRACE_DEFINE_SYM_LIST(...)
#ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
diff --git a/include/trace/stages/init.h b/include/trace/stages/init.h
index 000bcfc8dd2e..251ab22485d1 100644
--- a/include/trace/stages/init.h
+++ b/include/trace/stages/init.h
@@ -23,6 +23,60 @@ TRACE_MAKE_SYSTEM_STR();
__section("_ftrace_eval_map") \
*TRACE_SYSTEM##_##a = &__##TRACE_SYSTEM##_##a
+/*
+ * Define a symbol for __print_sym by giving lookup and
+ * show functions. See &struct trace_sym_def.
+ */
+#define TRACE_DEFINE_SYM_FNS(_symbol_id, _lookup, _show) \
+ _TRACE_DEFINE_SYM_FNS(TRACE_SYSTEM, _symbol_id, _lookup, _show)
+#define _TRACE_DEFINE_SYM_FNS(_system, _symbol_id, _lookup, _show) \
+ __TRACE_DEFINE_SYM_FNS(_system, _symbol_id, _lookup, _show)
+#define __TRACE_DEFINE_SYM_FNS(_system, _symbol_id, _lookup, _show) \
+ ___TRACE_DEFINE_SYM_FNS(_system ## _ ## _symbol_id, _symbol_id, \
+ _lookup, _show)
+#define ___TRACE_DEFINE_SYM_FNS(_name, _symbol_id, _lookup, _show) \
+ static struct trace_sym_def \
+ __trace_sym_def_ ## _name = { \
+ .system = TRACE_SYSTEM_STRING, \
+ /* need the ) for later strcmp */ \
+ .symbol_id = #_symbol_id ")", \
+ .lookup = _lookup, \
+ .show = _show, \
+ }; \
+ static struct trace_sym_def \
+ __section("_ftrace_sym_defs") \
+ *__trace_sym_def_p_ ## _name = &__trace_sym_def_ ## _name
+
+/*
+ * Define a symbol for __print_sym by giving lookup and
+ * show functions. See &struct trace_sym_def.
+ */
+#define TRACE_DEFINE_SYM_LIST(_symbol_id, ...) \
+ _TRACE_DEFINE_SYM_LIST(TRACE_SYSTEM, _symbol_id, __VA_ARGS__)
+#define _TRACE_DEFINE_SYM_LIST(_system, _symbol_id, ...) \
+ __TRACE_DEFINE_SYM_LIST(_system, _symbol_id, __VA_ARGS__)
+#define __TRACE_DEFINE_SYM_LIST(_system, _symbol_id, ...) \
+ ___TRACE_DEFINE_SYM_LIST(_system ## _ ## _symbol_id, _symbol_id,\
+ __VA_ARGS__)
+#define ___TRACE_DEFINE_SYM_LIST(_name, _symbol_id, ...) \
+ static struct trace_sym_entry \
+ __trace_sym_list_ ## _name[] = { __VA_ARGS__ }; \
+ static const char * \
+ __trace_sym_lookup_ ## _name(unsigned long long value) \
+ { \
+ return trace_sym_lookup(__trace_sym_list_ ## _name, \
+ ARRAY_SIZE(__trace_sym_list_ ## _name), value); \
+ } \
+ static void \
+ __trace_sym_show_ ## _name(struct seq_file *m) \
+ { \
+ trace_sym_show(m, __trace_sym_list_ ## _name, \
+ ARRAY_SIZE(__trace_sym_list_ ## _name)); \
+ } \
+ ___TRACE_DEFINE_SYM_FNS(_name, _symbol_id, \
+ __trace_sym_lookup_ ## _name, \
+ __trace_sym_show_ ## _name)
+
#undef TRACE_DEFINE_SIZEOF
#define TRACE_DEFINE_SIZEOF(a) \
static struct trace_eval_map __used __initdata \
diff --git a/include/trace/stages/stage2_data_offsets.h b/include/trace/stages/stage2_data_offsets.h
index 469b6a64293d..83a287f78501 100644
--- a/include/trace/stages/stage2_data_offsets.h
+++ b/include/trace/stages/stage2_data_offsets.h
@@ -5,6 +5,12 @@
#undef TRACE_DEFINE_ENUM
#define TRACE_DEFINE_ENUM(a)
+#undef TRACE_DEFINE_SYM_FNS
+#define TRACE_DEFINE_SYM_FNS(_symbol_id, _lookup, _show)
+
+#undef TRACE_DEFINE_SYM_LIST
+#define TRACE_DEFINE_SYM_LIST(_symbol_id, ...)
+
#undef TRACE_DEFINE_SIZEOF
#define TRACE_DEFINE_SIZEOF(a)
diff --git a/include/trace/stages/stage3_trace_output.h b/include/trace/stages/stage3_trace_output.h
index c1fb1355d309..d2c6458b62dc 100644
--- a/include/trace/stages/stage3_trace_output.h
+++ b/include/trace/stages/stage3_trace_output.h
@@ -79,6 +79,15 @@
trace_print_symbols_seq(p, value, symbols); \
})
+#undef __print_sym
+#define __print_sym(value, symbol_id) \
+ ___print_sym(TRACE_SYSTEM, value, symbol_id)
+#define ___print_sym(sys, value, symbol_id) \
+ ____print_sym(sys, value, symbol_id)
+#define ____print_sym(sys, value, symbol_id) \
+ trace_print_sym_seq(p, value, \
+ __trace_sym_def_p_ ## sys ## _ ## symbol_id->lookup)
+
#undef __print_flags_u64
#undef __print_symbolic_u64
#if BITS_PER_LONG == 32
diff --git a/include/trace/stages/stage7_class_define.h b/include/trace/stages/stage7_class_define.h
index bcb960d16fc0..ce802172b5d7 100644
--- a/include/trace/stages/stage7_class_define.h
+++ b/include/trace/stages/stage7_class_define.h
@@ -25,6 +25,10 @@
#undef __print_hex_dump
#undef __get_buf
+#undef __print_sym
+/* use // as the separator since it'd be a comment in C */
+#define __print_sym(value, symbol_id) __print_sym(value\/\/symbol_id)
+
/*
* The below is not executed in the kernel. It is only what is
* displayed in the print format for userspace to parse.
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 98fedfdb8db5..7752c3883cd2 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -2161,6 +2161,9 @@ static int find_module_sections(struct module *mod, struct load_info *info)
mod->trace_evals = section_objs(info, "_ftrace_eval_map",
sizeof(*mod->trace_evals),
&mod->num_trace_evals);
+ mod->trace_sym_defs = section_objs(info, "_ftrace_sym_defs",
+ sizeof(*mod->trace_sym_defs),
+ &mod->num_trace_sym_defs);
#endif
#ifdef CONFIG_TRACING
mod->trace_bprintk_fmt_start = section_objs(info, "__trace_printk_fmt",
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index ed367d713be0..38eaa7dce078 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -1559,6 +1559,104 @@ static void *f_next(struct seq_file *m, void *v, loff_t *pos)
return node;
}
+extern struct trace_sym_def *__start_ftrace_sym_defs[];
+extern struct trace_sym_def *__stop_ftrace_sym_defs[];
+
+static void show_sym_list(struct seq_file *m, struct trace_event_call *call,
+ const char *name)
+{
+ struct trace_sym_def **sym_defs;
+ unsigned int n_sym_defs, i;
+
+ if (call->module) {
+ struct module *mod = call->module;
+
+ sym_defs = mod->trace_sym_defs;
+ n_sym_defs = mod->num_trace_sym_defs;
+ } else {
+ sym_defs = __start_ftrace_sym_defs;
+ n_sym_defs = __stop_ftrace_sym_defs - __start_ftrace_sym_defs;
+ }
+
+ for (i = 0; i < n_sym_defs; i++) {
+ if (!sym_defs[i])
+ continue;
+ if (sym_defs[i]->system != call->class->system)
+ continue;
+ if (strncmp(name, sym_defs[i]->symbol_id,
+ strlen(sym_defs[i]->symbol_id)))
+ continue;
+ if (sym_defs[i]->show)
+ sym_defs[i]->show(m);
+ break;
+ }
+}
+
+static void show_print_fmt(struct seq_file *m, struct trace_event_call *call)
+{
+ char *ptr = call->print_fmt;
+ int quote = 0;
+ int wait_for_doubleslash = 0;
+
+ if (!(call->flags & TRACE_EVENT_FL_DYNPRINT) ||
+ /* WARN because dynamic events have no module */
+ WARN_ON_ONCE(call->flags & TRACE_EVENT_FL_DYNAMIC)) {
+ seq_printf(m, "\nprint fmt: %s\n", call->print_fmt);
+ return;
+ }
+
+ seq_puts(m, "\nprint fmt: ");
+ while (*ptr) {
+ if (*ptr == '\\') {
+ seq_putc(m, *ptr);
+ ptr++;
+ /* paranoid */
+ if (!*ptr)
+ break;
+ goto next;
+ }
+ if (*ptr == '"') {
+ quote ^= 1;
+ goto next;
+ }
+ if (quote)
+ goto next;
+
+ if (wait_for_doubleslash == 1 && *ptr == '/') {
+ wait_for_doubleslash = 2;
+ ptr++;
+ continue;
+ } else if (wait_for_doubleslash == 2 && *ptr != '/') {
+ seq_putc(m, '/');
+ wait_for_doubleslash = 1;
+ goto next;
+ } else if (wait_for_doubleslash == 2 && *ptr == '/') {
+ const char *name;
+
+ ptr++;
+ name = ptr;
+ /* skip the name */
+ while (*ptr && *ptr != ')')
+ ptr++;
+ wait_for_doubleslash = 0;
+ show_sym_list(m, call, name);
+ continue;
+ }
+
+ if (strncmp(ptr, "__print_sym(", 12) == 0) {
+ ptr += 12;
+ seq_puts(m, "__print_symbolic(");
+ wait_for_doubleslash = 1;
+ continue;
+ }
+next:
+ seq_putc(m, *ptr);
+ ptr++;
+ }
+
+ seq_putc(m, '\n');
+}
+
static int f_show(struct seq_file *m, void *v)
{
struct trace_event_call *call = event_file_data(m->private);
@@ -1577,8 +1675,7 @@ static int f_show(struct seq_file *m, void *v)
return 0;
case FORMAT_PRINTFMT:
- seq_printf(m, "\nprint fmt: %s\n",
- call->print_fmt);
+ show_print_fmt(m, call);
return 0;
}
@@ -2552,6 +2649,32 @@ static int event_init(struct trace_event_call *call)
return ret;
}
+static void set_event_dynprint(struct trace_event_call *call)
+{
+ char *ptr;
+ int quote = 0;
+
+ for (ptr = call->print_fmt; *ptr; ptr++) {
+ if (*ptr == '\\') {
+ ptr++;
+ /* paranoid */
+ if (!*ptr)
+ break;
+ continue;
+ }
+ if (*ptr == '"') {
+ quote ^= 1;
+ continue;
+ }
+ if (quote)
+ continue;
+ if (strncmp(ptr, "__print_sym(", 12) == 0) {
+ call->flags |= TRACE_EVENT_FL_DYNPRINT;
+ break;
+ }
+ }
+}
+
static int
__register_event(struct trace_event_call *call, struct module *mod)
{
@@ -2567,6 +2690,8 @@ __register_event(struct trace_event_call *call, struct module *mod)
else
call->module = mod;
+ set_event_dynprint(call);
+
return 0;
}
@@ -3810,6 +3935,8 @@ static __init int event_trace_enable(void)
ret = event_init(call);
if (!ret)
list_add(&call->list, &ftrace_events);
+
+ set_event_dynprint(call);
}
register_trigger_cmds();
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index db575094c498..9196f408a1d2 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -124,6 +124,49 @@ trace_print_symbols_seq(struct trace_seq *p, unsigned long val,
}
EXPORT_SYMBOL(trace_print_symbols_seq);
+const char *trace_sym_lookup(const struct trace_sym_entry *list,
+ size_t len, unsigned long long value)
+{
+ size_t i;
+
+ for (i = 0; i < len; i++) {
+ if (list[i].value == value)
+ return list[i].name;
+ }
+ return NULL;
+}
+EXPORT_SYMBOL(trace_sym_lookup);
+
+void trace_sym_show(struct seq_file *m,
+ const struct trace_sym_entry *list, size_t len)
+{
+ size_t i;
+
+ for (i = 0; i < len; i++)
+ seq_printf(m, ", { %lld, \"%s\" }",
+ list[i].value, list[i].name);
+}
+EXPORT_SYMBOL(trace_sym_show);
+
+const char *
+trace_print_sym_seq(struct trace_seq *p, unsigned long long val,
+ const char *(*lookup)(unsigned long long val))
+{
+ const char *ret = trace_seq_buffer_ptr(p);
+ const char *name;
+
+ name = lookup(val);
+ if (name)
+ trace_seq_puts(p, name);
+ else
+ trace_seq_printf(p, "0x%llx", val);
+
+ trace_seq_putc(p, 0);
+
+ return ret;
+}
+EXPORT_SYMBOL(trace_print_sym_seq);
+
#if BITS_PER_LONG == 32
const char *
trace_print_flags_seq_u64(struct trace_seq *p, const char *delim,
--
2.41.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC PATCH 2/4] net: dropreason: use new __print_sym() in tracing
2023-09-21 8:51 [PATCH 0/4] tracing: improve symbolic printing Johannes Berg
2023-09-21 8:51 ` [RFC PATCH 1/4] tracing: add __print_sym() to replace __print_symbolic() Johannes Berg
@ 2023-09-21 8:51 ` Johannes Berg
2023-09-21 8:51 ` [RFC PATCH 3/4] net: drop_monitor: use drop_reason_lookup() Johannes Berg
` (2 subsequent siblings)
4 siblings, 0 replies; 16+ messages in thread
From: Johannes Berg @ 2023-09-21 8:51 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-trace-kernel, netdev, linux-wireless, Johannes Berg
From: Johannes Berg <johannes.berg@intel.com>
The __print_symbolic() could only ever print the core
drop reasons, since that's the way the infrastructure
works. Now that we have __print_sym() with all the
advantages mentioned in that commit, convert to that
and get all the drop reasons from all subsystems. As
we already have a list of them, that's really easy.
This is a little bit of .text (~100 bytes in my build)
and saves a lot of .data (~17k).
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---
include/net/dropreason.h | 5 +++++
include/trace/events/skb.h | 16 +++-----------
net/core/skbuff.c | 43 ++++++++++++++++++++++++++++++++++++++
3 files changed, 51 insertions(+), 13 deletions(-)
diff --git a/include/net/dropreason.h b/include/net/dropreason.h
index 56cb7be92244..c157070b5303 100644
--- a/include/net/dropreason.h
+++ b/include/net/dropreason.h
@@ -42,6 +42,11 @@ struct drop_reason_list {
extern const struct drop_reason_list __rcu *
drop_reasons_by_subsys[SKB_DROP_REASON_SUBSYS_NUM];
+#ifdef CONFIG_TRACEPOINTS
+const char *drop_reason_lookup(unsigned long long value);
+void drop_reason_show(struct seq_file *m);
+#endif
+
void drop_reasons_register_subsys(enum skb_drop_reason_subsys subsys,
const struct drop_reason_list *list);
void drop_reasons_unregister_subsys(enum skb_drop_reason_subsys subsys);
diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
index 07e0715628ec..8a1a63f9e796 100644
--- a/include/trace/events/skb.h
+++ b/include/trace/events/skb.h
@@ -8,15 +8,9 @@
#include <linux/skbuff.h>
#include <linux/netdevice.h>
#include <linux/tracepoint.h>
+#include <net/dropreason.h>
-#undef FN
-#define FN(reason) TRACE_DEFINE_ENUM(SKB_DROP_REASON_##reason);
-DEFINE_DROP_REASON(FN, FN)
-
-#undef FN
-#undef FNe
-#define FN(reason) { SKB_DROP_REASON_##reason, #reason },
-#define FNe(reason) { SKB_DROP_REASON_##reason, #reason }
+TRACE_DEFINE_SYM_FNS(drop_reason, drop_reason_lookup, drop_reason_show);
/*
* Tracepoint for free an sk_buff:
@@ -44,13 +38,9 @@ TRACE_EVENT(kfree_skb,
TP_printk("skbaddr=%p protocol=%u location=%pS reason: %s",
__entry->skbaddr, __entry->protocol, __entry->location,
- __print_symbolic(__entry->reason,
- DEFINE_DROP_REASON(FN, FNe)))
+ __print_sym(__entry->reason, drop_reason ))
);
-#undef FN
-#undef FNe
-
TRACE_EVENT(consume_skb,
TP_PROTO(struct sk_buff *skb, void *location),
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 4eaf7ed0d1f4..415329b76921 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -132,6 +132,49 @@ drop_reasons_by_subsys[SKB_DROP_REASON_SUBSYS_NUM] = {
};
EXPORT_SYMBOL(drop_reasons_by_subsys);
+#ifdef CONFIG_TRACEPOINTS
+const char *drop_reason_lookup(unsigned long long value)
+{
+ unsigned long long subsys_id = value >> SKB_DROP_REASON_SUBSYS_SHIFT;
+ u32 reason = value & ~SKB_DROP_REASON_SUBSYS_MASK;
+ const struct drop_reason_list *subsys;
+
+ if (subsys_id >= SKB_DROP_REASON_SUBSYS_NUM)
+ return NULL;
+
+ subsys = rcu_dereference(drop_reasons_by_subsys[subsys_id]);
+ if (!subsys)
+ return NULL;
+ if (reason >= subsys->n_reasons)
+ return NULL;
+ return subsys->reasons[reason];
+}
+
+void drop_reason_show(struct seq_file *m)
+{
+ u32 subsys_id;
+
+ rcu_read_lock();
+ for (subsys_id = 0; subsys_id < SKB_DROP_REASON_SUBSYS_NUM; subsys_id++) {
+ const struct drop_reason_list *subsys;
+ u32 i;
+
+ subsys = rcu_dereference(drop_reasons_by_subsys[subsys_id]);
+ if (!subsys)
+ continue;
+
+ for (i = 0; i < subsys->n_reasons; i++) {
+ if (!subsys->reasons[i])
+ continue;
+ seq_printf(m, ", { %u, \"%s\" }",
+ (subsys_id << SKB_DROP_REASON_SUBSYS_SHIFT) | i,
+ subsys->reasons[i]);
+ }
+ }
+ rcu_read_unlock();
+}
+#endif
+
/**
* drop_reasons_register_subsys - register another drop reason subsystem
* @subsys: the subsystem to register, must not be the core
--
2.41.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC PATCH 3/4] net: drop_monitor: use drop_reason_lookup()
2023-09-21 8:51 [PATCH 0/4] tracing: improve symbolic printing Johannes Berg
2023-09-21 8:51 ` [RFC PATCH 1/4] tracing: add __print_sym() to replace __print_symbolic() Johannes Berg
2023-09-21 8:51 ` [RFC PATCH 2/4] net: dropreason: use new __print_sym() in tracing Johannes Berg
@ 2023-09-21 8:51 ` Johannes Berg
2023-09-21 17:54 ` Johannes Berg
2023-09-21 8:51 ` [RFC PATCH 4/4] tracing/timer: use __print_sym() Johannes Berg
2023-10-04 16:22 ` [PATCH 0/4] tracing: improve symbolic printing Jakub Kicinski
4 siblings, 1 reply; 16+ messages in thread
From: Johannes Berg @ 2023-09-21 8:51 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-trace-kernel, netdev, linux-wireless, Johannes Berg
From: Johannes Berg <johannes.berg@intel.com>
Now that we have drop_reason_lookup(), we can just use it for
drop_monitor as well, rather than exporting the list itself.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---
include/net/dropreason.h | 4 ----
net/core/drop_monitor.c | 18 +++---------------
net/core/skbuff.c | 6 +++---
3 files changed, 6 insertions(+), 22 deletions(-)
diff --git a/include/net/dropreason.h b/include/net/dropreason.h
index c157070b5303..0e2195ccf2cd 100644
--- a/include/net/dropreason.h
+++ b/include/net/dropreason.h
@@ -38,10 +38,6 @@ struct drop_reason_list {
size_t n_reasons;
};
-/* Note: due to dynamic registrations, access must be under RCU */
-extern const struct drop_reason_list __rcu *
-drop_reasons_by_subsys[SKB_DROP_REASON_SUBSYS_NUM];
-
#ifdef CONFIG_TRACEPOINTS
const char *drop_reason_lookup(unsigned long long value);
void drop_reason_show(struct seq_file *m);
diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
index aff31cd944c2..d110a16cde46 100644
--- a/net/core/drop_monitor.c
+++ b/net/core/drop_monitor.c
@@ -610,9 +610,8 @@ static int net_dm_packet_report_fill(struct sk_buff *msg, struct sk_buff *skb,
size_t payload_len)
{
struct net_dm_skb_cb *cb = NET_DM_SKB_CB(skb);
- const struct drop_reason_list *list = NULL;
- unsigned int subsys, subsys_reason;
char buf[NET_DM_MAX_SYMBOL_LEN];
+ const char *reason_str;
struct nlattr *attr;
void *hdr;
int rc;
@@ -630,19 +629,8 @@ static int net_dm_packet_report_fill(struct sk_buff *msg, struct sk_buff *skb,
goto nla_put_failure;
rcu_read_lock();
- subsys = u32_get_bits(cb->reason, SKB_DROP_REASON_SUBSYS_MASK);
- if (subsys < SKB_DROP_REASON_SUBSYS_NUM)
- list = rcu_dereference(drop_reasons_by_subsys[subsys]);
- subsys_reason = cb->reason & ~SKB_DROP_REASON_SUBSYS_MASK;
- if (!list ||
- subsys_reason >= list->n_reasons ||
- !list->reasons[subsys_reason] ||
- strlen(list->reasons[subsys_reason]) > NET_DM_MAX_REASON_LEN) {
- list = rcu_dereference(drop_reasons_by_subsys[SKB_DROP_REASON_SUBSYS_CORE]);
- subsys_reason = SKB_DROP_REASON_NOT_SPECIFIED;
- }
- if (nla_put_string(msg, NET_DM_ATTR_REASON,
- list->reasons[subsys_reason])) {
+ reason_str = drop_reason_lookup(cb->reason);
+ if (nla_put_string(msg, NET_DM_ATTR_REASON, reason_str)) {
rcu_read_unlock();
goto nla_put_failure;
}
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 415329b76921..9efde769949d 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -126,13 +126,11 @@ static const struct drop_reason_list drop_reasons_core = {
.n_reasons = ARRAY_SIZE(drop_reasons),
};
-const struct drop_reason_list __rcu *
+static const struct drop_reason_list __rcu *
drop_reasons_by_subsys[SKB_DROP_REASON_SUBSYS_NUM] = {
[SKB_DROP_REASON_SUBSYS_CORE] = RCU_INITIALIZER(&drop_reasons_core),
};
-EXPORT_SYMBOL(drop_reasons_by_subsys);
-#ifdef CONFIG_TRACEPOINTS
const char *drop_reason_lookup(unsigned long long value)
{
unsigned long long subsys_id = value >> SKB_DROP_REASON_SUBSYS_SHIFT;
@@ -149,7 +147,9 @@ const char *drop_reason_lookup(unsigned long long value)
return NULL;
return subsys->reasons[reason];
}
+EXPORT_SYMBOL(drop_reason_lookup);
+#ifdef CONFIG_TRACEPOINTS
void drop_reason_show(struct seq_file *m)
{
u32 subsys_id;
--
2.41.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC PATCH 4/4] tracing/timer: use __print_sym()
2023-09-21 8:51 [PATCH 0/4] tracing: improve symbolic printing Johannes Berg
` (2 preceding siblings ...)
2023-09-21 8:51 ` [RFC PATCH 3/4] net: drop_monitor: use drop_reason_lookup() Johannes Berg
@ 2023-09-21 8:51 ` Johannes Berg
2023-10-04 16:22 ` [PATCH 0/4] tracing: improve symbolic printing Jakub Kicinski
4 siblings, 0 replies; 16+ messages in thread
From: Johannes Berg @ 2023-09-21 8:51 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-trace-kernel, netdev, linux-wireless, Johannes Berg
From: Johannes Berg <johannes.berg@intel.com>
Use the new __print_sym() in the timer tracing, just to show
how to convert something. This adds ~80 bytes of .text for a
saving of ~1.5K of data in my builds.
Note the format changes from
print fmt: "success=%d dependency=%s", REC->success, __print_symbolic(REC->dependency, { 0, "NONE" }, { (1 << 0), "POSIX_TIMER" }, { (1 << 1), "PERF_EVENTS" }, { (1 << 2), "SCHED" }, { (1 << 3), "CLOCK_UNSTABLE" }, { (1 << 4), "RCU" }, { (1 << 5), "RCU_EXP" })
to
print fmt: "success=%d dependency=%s", REC->success, __print_symbolic(REC->dependency, { 0, "NONE" }, { 1, "POSIX_TIMER" }, { 2, "PERF_EVENTS" }, { 4, "SCHED" }, { 8, "CLOCK_UNSTABLE" }, { 16, "RCU" }, { 32, "RCU_EXP" })
Since the values are now just printed in the show function as
pure decimal values.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---
include/trace/events/timer.h | 22 +++++++---------------
1 file changed, 7 insertions(+), 15 deletions(-)
diff --git a/include/trace/events/timer.h b/include/trace/events/timer.h
index b4bc2828fa09..cb8294da29d0 100644
--- a/include/trace/events/timer.h
+++ b/include/trace/events/timer.h
@@ -382,26 +382,18 @@ TRACE_EVENT(itimer_expire,
#undef tick_dep_mask_name
#undef tick_dep_name_end
-/* The MASK will convert to their bits and they need to be processed too */
-#define tick_dep_name(sdep) TRACE_DEFINE_ENUM(TICK_DEP_BIT_##sdep); \
- TRACE_DEFINE_ENUM(TICK_DEP_MASK_##sdep);
-#define tick_dep_name_end(sdep) TRACE_DEFINE_ENUM(TICK_DEP_BIT_##sdep); \
- TRACE_DEFINE_ENUM(TICK_DEP_MASK_##sdep);
-/* NONE only has a mask defined for it */
-#define tick_dep_mask_name(sdep) TRACE_DEFINE_ENUM(TICK_DEP_MASK_##sdep);
-
-TICK_DEP_NAMES
-
-#undef tick_dep_name
-#undef tick_dep_mask_name
-#undef tick_dep_name_end
-
#define tick_dep_name(sdep) { TICK_DEP_MASK_##sdep, #sdep },
#define tick_dep_mask_name(sdep) { TICK_DEP_MASK_##sdep, #sdep },
#define tick_dep_name_end(sdep) { TICK_DEP_MASK_##sdep, #sdep }
+TRACE_DEFINE_SYM_LIST(tick_dep_names, TICK_DEP_NAMES);
+
+#undef tick_dep_name
+#undef tick_dep_mask_name
+#undef tick_dep_name_end
+
#define show_tick_dep_name(val) \
- __print_symbolic(val, TICK_DEP_NAMES)
+ __print_sym(val, tick_dep_names)
TRACE_EVENT(tick_stop,
--
2.41.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [RFC PATCH 1/4] tracing: add __print_sym() to replace __print_symbolic()
2023-09-21 8:51 ` [RFC PATCH 1/4] tracing: add __print_sym() to replace __print_symbolic() Johannes Berg
@ 2023-09-21 15:17 ` Johannes Berg
0 siblings, 0 replies; 16+ messages in thread
From: Johannes Berg @ 2023-09-21 15:17 UTC (permalink / raw)
To: linux-kernel@vger.kernel.org
Cc: linux-trace-kernel@vger.kernel.org, netdev@vger.kernel.org,
linux-wireless@vger.kernel.org
On Thu, 2023-09-21 at 08:51 +0000, Johannes Berg wrote:
>
> - Is it correct that we can assume RCU critical section when in
> the lookup function? The SKB code currently does, but I may
> not ever have actually run this code yet.
Well, I could easily answer that myself, and no, it's incorrect.
It'd be really useful though for these lookups to be able to do them
under RCU, so I think I'll fold this?
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -34,7 +34,7 @@ struct trace_eval_map {
struct trace_sym_def {
const char *system;
const char *symbol_id;
- /* may return NULL */
+ /* may return NULL, called under rcu_read_lock() */
const char * (*lookup)(unsigned long long);
/*
* Must print the list: ', { val, "name"}, ...'
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -155,11 +155,13 @@ trace_print_sym_seq(struct trace_seq *p, unsigned
long long val,
const char *ret = trace_seq_buffer_ptr(p);
const char *name;
+ rcu_read_lock();
name = lookup(val);
if (name)
trace_seq_puts(p, name);
else
trace_seq_printf(p, "0x%llx", val);
+ rcu_read_unlock();
trace_seq_putc(p, 0);
johannes
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC PATCH 3/4] net: drop_monitor: use drop_reason_lookup()
2023-09-21 8:51 ` [RFC PATCH 3/4] net: drop_monitor: use drop_reason_lookup() Johannes Berg
@ 2023-09-21 17:54 ` Johannes Berg
0 siblings, 0 replies; 16+ messages in thread
From: Johannes Berg @ 2023-09-21 17:54 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-trace-kernel, netdev, linux-wireless
On Thu, 2023-09-21 at 10:51 +0200, Johannes Berg wrote:
>
> - if (!list ||
> - subsys_reason >= list->n_reasons ||
> - !list->reasons[subsys_reason] ||
> - strlen(list->reasons[subsys_reason]) > NET_DM_MAX_REASON_LEN) {
> - list = rcu_dereference(drop_reasons_by_subsys[SKB_DROP_REASON_SUBSYS_CORE]);
> - subsys_reason = SKB_DROP_REASON_NOT_SPECIFIED;
> - }
Oops, I lost this translation of erroneous ones to the core
SKB_DROP_REASON_NOT_SPECIFIED.
Maybe we should just not have the attribute in that case? But that could
be a different change too.
> - if (nla_put_string(msg, NET_DM_ATTR_REASON,
> - list->reasons[subsys_reason])) {
> + reason_str = drop_reason_lookup(cb->reason);
> + if (nla_put_string(msg, NET_DM_ATTR_REASON, reason_str)) {
> rcu_read_unlock();
johannes
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/4] tracing: improve symbolic printing
2023-09-21 8:51 [PATCH 0/4] tracing: improve symbolic printing Johannes Berg
` (3 preceding siblings ...)
2023-09-21 8:51 ` [RFC PATCH 4/4] tracing/timer: use __print_sym() Johannes Berg
@ 2023-10-04 16:22 ` Jakub Kicinski
2023-10-04 16:35 ` Steven Rostedt
2023-10-04 18:38 ` Johannes Berg
4 siblings, 2 replies; 16+ messages in thread
From: Jakub Kicinski @ 2023-10-04 16:22 UTC (permalink / raw)
To: Johannes Berg; +Cc: linux-kernel, linux-trace-kernel, netdev, linux-wireless
On Thu, 21 Sep 2023 10:51:30 +0200 Johannes Berg wrote:
> So I was frustrated with not seeing the names of SKB dropreasons
> for all but the core reasons, and then while looking into this
> all, realized, that the current __print_symbolic() is pretty bad
> anyway.
>
> So I came up with a new approach, using a separate declaration
> of the symbols, and __print_sym() in there, but to userspace it
> all doesn't matter, it shows it the same way, just dyamically
> instead of munging with the strings all the time.
>
> This is a huge .data savings as far as I can tell, with a modest
> amount (~4k) of .text addition, while making it all dynamic and
> in the SKB dropreason case even reusing the existing list that
> dropmonitor uses today. Surely patch 3 isn't needed here, but it
> felt right.
>
> Anyway, I think it's a pretty reasonable approach overall, and
> it does works.
>
> I've listed a number of open questions in the first patch since
> that's where the real changes for this are.
Potentially naive question - the trace point holds enum skb_drop_reason.
The user space can get the names from BTF. Can we not teach user space
to generically look up names of enums in BTF?
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/4] tracing: improve symbolic printing
2023-10-04 16:22 ` [PATCH 0/4] tracing: improve symbolic printing Jakub Kicinski
@ 2023-10-04 16:35 ` Steven Rostedt
2023-10-04 16:54 ` Jakub Kicinski
2023-10-04 18:38 ` Johannes Berg
1 sibling, 1 reply; 16+ messages in thread
From: Steven Rostedt @ 2023-10-04 16:35 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Johannes Berg, linux-kernel, linux-trace-kernel, netdev,
linux-wireless
On Wed, 4 Oct 2023 09:22:05 -0700
Jakub Kicinski <kuba@kernel.org> wrote:
> Potentially naive question - the trace point holds enum skb_drop_reason.
> The user space can get the names from BTF. Can we not teach user space
> to generically look up names of enums in BTF?
That puts a hard requirement to include BTF in builds where it was not
needed before. I really do not want to build with BTF just to get access to
these symbols. And since this is used by the embedded world, and BTF is
extremely bloated, the short answer is "No".
-- Steve
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/4] tracing: improve symbolic printing
2023-10-04 16:35 ` Steven Rostedt
@ 2023-10-04 16:54 ` Jakub Kicinski
2023-10-04 17:29 ` Steven Rostedt
0 siblings, 1 reply; 16+ messages in thread
From: Jakub Kicinski @ 2023-10-04 16:54 UTC (permalink / raw)
To: Steven Rostedt
Cc: Johannes Berg, linux-kernel, linux-trace-kernel, netdev,
linux-wireless
On Wed, 4 Oct 2023 12:35:24 -0400 Steven Rostedt wrote:
> > Potentially naive question - the trace point holds enum skb_drop_reason.
> > The user space can get the names from BTF. Can we not teach user space
> > to generically look up names of enums in BTF?
>
> That puts a hard requirement to include BTF in builds where it was not
> needed before. I really do not want to build with BTF just to get access to
> these symbols. And since this is used by the embedded world, and BTF is
> extremely bloated, the short answer is "No".
Dunno. BTF is there most of the time. It could make the life of
majority of the users far more pleasant.
I hope we can at least agree that the current methods of generating
the string arrays at C level are... aesthetically displeasing.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/4] tracing: improve symbolic printing
2023-10-04 16:54 ` Jakub Kicinski
@ 2023-10-04 17:29 ` Steven Rostedt
2023-10-04 21:35 ` Alan Maguire
0 siblings, 1 reply; 16+ messages in thread
From: Steven Rostedt @ 2023-10-04 17:29 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Johannes Berg, linux-kernel, linux-trace-kernel, netdev,
linux-wireless
On Wed, 4 Oct 2023 09:54:31 -0700
Jakub Kicinski <kuba@kernel.org> wrote:
> On Wed, 4 Oct 2023 12:35:24 -0400 Steven Rostedt wrote:
> > > Potentially naive question - the trace point holds enum skb_drop_reason.
> > > The user space can get the names from BTF. Can we not teach user space
> > > to generically look up names of enums in BTF?
> >
> > That puts a hard requirement to include BTF in builds where it was not
> > needed before. I really do not want to build with BTF just to get access to
> > these symbols. And since this is used by the embedded world, and BTF is
> > extremely bloated, the short answer is "No".
>
> Dunno. BTF is there most of the time. It could make the life of
> majority of the users far more pleasant.
BTF isn't there for a lot of developers working in embedded who use this
code. Most my users that I deal with have minimal environments, so BTF is a
showstopper.
>
> I hope we can at least agree that the current methods of generating
> the string arrays at C level are... aesthetically displeasing.
I don't know, I kinda like it ;-)
-- Steve
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/4] tracing: improve symbolic printing
2023-10-04 16:22 ` [PATCH 0/4] tracing: improve symbolic printing Jakub Kicinski
2023-10-04 16:35 ` Steven Rostedt
@ 2023-10-04 18:38 ` Johannes Berg
2023-10-04 18:46 ` Steven Rostedt
1 sibling, 1 reply; 16+ messages in thread
From: Johannes Berg @ 2023-10-04 18:38 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: linux-kernel, linux-trace-kernel, netdev, linux-wireless
On Wed, 2023-10-04 at 09:22 -0700, Jakub Kicinski wrote:
>
> Potentially naive question - the trace point holds enum skb_drop_reason.
> The user space can get the names from BTF. Can we not teach user space
> to generically look up names of enums in BTF?
I'll note that, unrelated to the discussion about whether or not we
could use BTF, we couldn't do it in this case anyway since the whole
drop reasons aren't captured in enum skb_drop_reason, that contains only
the core ones, and now other subsystems are adding their own somewhat
dynamically later.
johannes
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/4] tracing: improve symbolic printing
2023-10-04 18:38 ` Johannes Berg
@ 2023-10-04 18:46 ` Steven Rostedt
0 siblings, 0 replies; 16+ messages in thread
From: Steven Rostedt @ 2023-10-04 18:46 UTC (permalink / raw)
To: Johannes Berg
Cc: Jakub Kicinski, linux-kernel, linux-trace-kernel, netdev,
linux-wireless
On Wed, 04 Oct 2023 20:38:46 +0200
Johannes Berg <johannes@sipsolutions.net> wrote:
> On Wed, 2023-10-04 at 09:22 -0700, Jakub Kicinski wrote:
> >
> > Potentially naive question - the trace point holds enum skb_drop_reason.
> > The user space can get the names from BTF. Can we not teach user space
> > to generically look up names of enums in BTF?
>
> I'll note that, unrelated to the discussion about whether or not we
> could use BTF, we couldn't do it in this case anyway since the whole
> drop reasons aren't captured in enum skb_drop_reason, that contains only
> the core ones, and now other subsystems are adding their own somewhat
> dynamically later.
Another issue with using BTF, is that the BTF would need to be saved in the
trace.dat or perf.data file, as many times the trace data is moved off to
another machine for offline analysis. And using the vmlinux would not be
useful, because there is several times you have multiple trace files for
various versions of a build and that would require mapping which
vmlinux/btf file goes with which trace data.
Right now, the conversions can easily be saved in the trace file directly.
-- Steve
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/4] tracing: improve symbolic printing
2023-10-04 17:29 ` Steven Rostedt
@ 2023-10-04 21:35 ` Alan Maguire
2023-10-04 21:43 ` Steven Rostedt
0 siblings, 1 reply; 16+ messages in thread
From: Alan Maguire @ 2023-10-04 21:35 UTC (permalink / raw)
To: Steven Rostedt, Jakub Kicinski
Cc: Johannes Berg, linux-kernel, linux-trace-kernel, netdev,
linux-wireless
On 04/10/2023 18:29, Steven Rostedt wrote:
> On Wed, 4 Oct 2023 09:54:31 -0700
> Jakub Kicinski <kuba@kernel.org> wrote:
>
>> On Wed, 4 Oct 2023 12:35:24 -0400 Steven Rostedt wrote:
>>>> Potentially naive question - the trace point holds enum skb_drop_reason.
>>>> The user space can get the names from BTF. Can we not teach user space
>>>> to generically look up names of enums in BTF?
>>>
>>> That puts a hard requirement to include BTF in builds where it was not
>>> needed before. I really do not want to build with BTF just to get access to
>>> these symbols. And since this is used by the embedded world, and BTF is
>>> extremely bloated, the short answer is "No".
>>
>> Dunno. BTF is there most of the time. It could make the life of
>> majority of the users far more pleasant.
>
> BTF isn't there for a lot of developers working in embedded who use this
> code. Most my users that I deal with have minimal environments, so BTF is a
> showstopper.
One thing we've heard from some embedded folks [1] is that having
kernel BTF loadable as a separate module (rather than embedded in
vmlinux) would help, as there are size limits on vmlinux that they can
workaround by having modules on a different partition. We're hoping
to get that working soon. I was wondering if you see other issues around
BTF adoption for embedded systems that we could put on the to-do list?
Not necessarily for this particular use-case (since there are
complications with trace data as you describe), but just trying to make
sure we can remove barriers to BTF adoption where possible.
Thanks!
Alan
[1]
https://lore.kernel.org/bpf/CAHBbfcUkr6fTm2X9GNsFNqV75fTG=aBQXFx_8Ayk+4hk7heB-g@mail.gmail.com/
>
>>
>> I hope we can at least agree that the current methods of generating
>> the string arrays at C level are... aesthetically displeasing.
>
> I don't know, I kinda like it ;-)
>
> -- Steve
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/4] tracing: improve symbolic printing
2023-10-04 21:35 ` Alan Maguire
@ 2023-10-04 21:43 ` Steven Rostedt
2023-10-04 22:07 ` Alan Maguire
0 siblings, 1 reply; 16+ messages in thread
From: Steven Rostedt @ 2023-10-04 21:43 UTC (permalink / raw)
To: Alan Maguire
Cc: Jakub Kicinski, Johannes Berg, linux-kernel, linux-trace-kernel,
netdev, linux-wireless
On Wed, 4 Oct 2023 22:35:07 +0100
Alan Maguire <alan.maguire@oracle.com> wrote:
> One thing we've heard from some embedded folks [1] is that having
> kernel BTF loadable as a separate module (rather than embedded in
> vmlinux) would help, as there are size limits on vmlinux that they can
> workaround by having modules on a different partition. We're hoping
> to get that working soon. I was wondering if you see other issues around
> BTF adoption for embedded systems that we could put on the to-do list?
> Not necessarily for this particular use-case (since there are
> complications with trace data as you describe), but just trying to make
> sure we can remove barriers to BTF adoption where possible.
I wonder how easy is it to create subsets of BTF. For one thing, in the
future we want to be able to trace the arguments of all functions. That is,
tracing all functions at the same time (function tracer) and getting the
arguments within the trace.
This would only require information about functions and their arguments,
which would be very useful. Is BTF easy to break apart? That is, just
generate the information needed for function arguments?
Note, pretty much all functions do not pass structures by values, and this
would not need to know the contents of a pointer to a structure. This would
mean that structure layout information is not needed.
-- Steve
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/4] tracing: improve symbolic printing
2023-10-04 21:43 ` Steven Rostedt
@ 2023-10-04 22:07 ` Alan Maguire
0 siblings, 0 replies; 16+ messages in thread
From: Alan Maguire @ 2023-10-04 22:07 UTC (permalink / raw)
To: Steven Rostedt
Cc: Jakub Kicinski, Johannes Berg, linux-kernel, linux-trace-kernel,
netdev, linux-wireless
On 04/10/2023 22:43, Steven Rostedt wrote:
> On Wed, 4 Oct 2023 22:35:07 +0100
> Alan Maguire <alan.maguire@oracle.com> wrote:
>
>> One thing we've heard from some embedded folks [1] is that having
>> kernel BTF loadable as a separate module (rather than embedded in
>> vmlinux) would help, as there are size limits on vmlinux that they can
>> workaround by having modules on a different partition. We're hoping
>> to get that working soon. I was wondering if you see other issues around
>> BTF adoption for embedded systems that we could put on the to-do list?
>> Not necessarily for this particular use-case (since there are
>> complications with trace data as you describe), but just trying to make
>> sure we can remove barriers to BTF adoption where possible.
>
> I wonder how easy is it to create subsets of BTF. For one thing, in the
> future we want to be able to trace the arguments of all functions. That is,
> tracing all functions at the same time (function tracer) and getting the
> arguments within the trace.
>
> This would only require information about functions and their arguments,
> which would be very useful. Is BTF easy to break apart? That is, just
> generate the information needed for function arguments?
>
There has been a fair bit of effort around this from the userspace side;
the BTF gen efforts were focused around applications carrying the
minimum BTF for their needs, so just the structures needed by the
particular BPF programs rather than the full set of vmlinux structures
for example [1].
Parsing BTF in-kernel to pull out the BTF functions (BTF_KIND_FUNC),
their prototypes (BTF_KIND_FUNC_PROTO) and all associated parameters
would be pretty straightforward I think, especially if you don't need
the structures that are passed via pointers. So if you're starting with
the full BTF, creating a subset for use in tracing would be reasonably
straightforward. My personal preference would always be to have the
full BTF where possible, but if that wasn't feasible on some systems
we'd need to add some options to pahole/libbpf to support such trimming
during the DWARF->BTF translation process.
Alan
[1] https://lore.kernel.org/bpf/20220209222646.348365-7-mauricio@kinvolk.io/
> Note, pretty much all functions do not pass structures by values, and this
> would not need to know the contents of a pointer to a structure. This would
> mean that structure layout information is not needed.
>
> -- Steve
>
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2023-10-04 22:08 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-21 8:51 [PATCH 0/4] tracing: improve symbolic printing Johannes Berg
2023-09-21 8:51 ` [RFC PATCH 1/4] tracing: add __print_sym() to replace __print_symbolic() Johannes Berg
2023-09-21 15:17 ` Johannes Berg
2023-09-21 8:51 ` [RFC PATCH 2/4] net: dropreason: use new __print_sym() in tracing Johannes Berg
2023-09-21 8:51 ` [RFC PATCH 3/4] net: drop_monitor: use drop_reason_lookup() Johannes Berg
2023-09-21 17:54 ` Johannes Berg
2023-09-21 8:51 ` [RFC PATCH 4/4] tracing/timer: use __print_sym() Johannes Berg
2023-10-04 16:22 ` [PATCH 0/4] tracing: improve symbolic printing Jakub Kicinski
2023-10-04 16:35 ` Steven Rostedt
2023-10-04 16:54 ` Jakub Kicinski
2023-10-04 17:29 ` Steven Rostedt
2023-10-04 21:35 ` Alan Maguire
2023-10-04 21:43 ` Steven Rostedt
2023-10-04 22:07 ` Alan Maguire
2023-10-04 18:38 ` Johannes Berg
2023-10-04 18:46 ` Steven Rostedt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).