* [PATCH dtrace 0/3] fprobe fallback kprobe support, sched fix
@ 2024-10-09 14:02 Alan Maguire
2024-10-09 14:02 ` [PATCH dtrace 1/3] dtrace: add support for probe-specific prog types, stack skips Alan Maguire
` (3 more replies)
0 siblings, 4 replies; 7+ messages in thread
From: Alan Maguire @ 2024-10-09 14:02 UTC (permalink / raw)
To: dtrace; +Cc: dtrace-devel, Alan Maguire
This series is focused on solving a few issues with fprobe-based
attachment which prevent us being able to attach to functions
like finish_task_switch.isra.0. Such functions are present in
available_filter_functions, but because they either lack BTF
representations or those representations are named without the
.isra suffix, attachment via fentry/fexit is currently impossible.
Falling back to kprobe attach is the best solution here.
However providers have been written with the implicit assumption
that certain aspects of the probes are shared across the provider;
program type, stack skips etc. Patch 1 addresses this by providing
optional provider implementation callbacks to return these values
for a specific probe. Patch 2 then updates the fbt provider to
use these mechanisms to support fallback to kprobe and to support
"."-suffixed functions. Finally patch 3 can then specify
fbt::finish_task_switch*:return as the underlying probe for
sched:::on-cpu.
Tested on upstream, UEK7 (5.15-based kernel) and UEK6 (5.4-based).
Alan Maguire (3):
dtrace: add support for probe-specific prog types, stack skips
fbt: support ".isra.0" suffixed functions
sched: fix on-cpu firing for kernels < 5.16
libdtrace/dt_bpf.c | 4 +-
libdtrace/dt_cc.c | 2 +-
libdtrace/dt_probe.c | 16 ++++++++
libdtrace/dt_probe.h | 2 +
libdtrace/dt_prov_fbt.c | 84 +++++++++++++++++++++++++++++++-------
libdtrace/dt_prov_rawtp.c | 2 +-
libdtrace/dt_prov_sched.c | 23 +----------
libdtrace/dt_prov_sdt.c | 2 +-
libdtrace/dt_provider.h | 4 ++
libdtrace/dt_provider_tp.c | 12 +++++-
libdtrace/dt_provider_tp.h | 9 +++-
11 files changed, 117 insertions(+), 43 deletions(-)
--
2.43.5
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH dtrace 1/3] dtrace: add support for probe-specific prog types, stack skips
2024-10-09 14:02 [PATCH dtrace 0/3] fprobe fallback kprobe support, sched fix Alan Maguire
@ 2024-10-09 14:02 ` Alan Maguire
2024-10-09 14:02 ` [PATCH dtrace 2/3] fbt: support ".isra.0" suffixed functions Alan Maguire
` (2 subsequent siblings)
3 siblings, 0 replies; 7+ messages in thread
From: Alan Maguire @ 2024-10-09 14:02 UTC (permalink / raw)
To: dtrace; +Cc: dtrace-devel, Alan Maguire
Because fprobe-based tracing does not support all the functions
kprobes+available_filter_functions do, it would be useful to
fall back to a kprobe-based prog where fentry/fexit is not
feasible.
However there are a few blockers to that today.
First, we specify program type and number of stack frames to
skip on a provider basis. Add support for a probe-specific
callback function that provider implementations can use
to return these values.
Secondly, the tp event id is ambiguous - it can represent a BTF
or a trace event id. Add a field to allow specification, as
this will be useful in determining whether a probe has a kprobe
or a BTF (fentry/fexit)-based attachment. This can be used
in provider implementation methods to determine which trampoline
to use for example.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
libdtrace/dt_bpf.c | 4 ++--
libdtrace/dt_cc.c | 2 +-
libdtrace/dt_probe.c | 16 ++++++++++++++++
libdtrace/dt_probe.h | 2 ++
libdtrace/dt_prov_fbt.c | 2 +-
libdtrace/dt_prov_rawtp.c | 2 +-
libdtrace/dt_prov_sdt.c | 2 +-
libdtrace/dt_provider.h | 4 ++++
libdtrace/dt_provider_tp.c | 12 +++++++++++-
libdtrace/dt_provider_tp.h | 9 +++++++--
10 files changed, 46 insertions(+), 9 deletions(-)
diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
index ad11d178..20b564aa 100644
--- a/libdtrace/dt_bpf.c
+++ b/libdtrace/dt_bpf.c
@@ -117,7 +117,7 @@ int
dt_bpf_prog_load(dtrace_hdl_t *dtp, const dt_probe_t *prp,
const dtrace_difo_t *dp, uint32_t lvl, char *buf, size_t sz)
{
- return dt_bpf_prog_attach(prp->prov->impl->prog_type, 0, 0, 0, dp,
+ return dt_bpf_prog_attach(dt_probe_prog_type(dtp, prp), 0, 0, 0, dp,
lvl, buf, sz);
}
@@ -1226,7 +1226,7 @@ dt_bpf_make_progs(dtrace_hdl_t *dtp, uint_t cflags)
* Enabled probes with no trampoline act like they exist but
* no code is generated for them.
*/
- if (prp->prov->impl->prog_type == BPF_PROG_TYPE_UNSPEC)
+ if (dt_probe_prog_type(dtp, prp) == BPF_PROG_TYPE_UNSPEC)
continue;
dp = dt_construct(dtp, prp, cflags, NULL);
diff --git a/libdtrace/dt_cc.c b/libdtrace/dt_cc.c
index 4202771a..ce0fd60b 100644
--- a/libdtrace/dt_cc.c
+++ b/libdtrace/dt_cc.c
@@ -1182,7 +1182,7 @@ dt_link_construct(dtrace_hdl_t *dtp, const dt_probe_t *prp, dtrace_difo_t *dp,
nrp->dofr_data = DMEM_STACK(dtp);
continue;
case DT_CONST_STACK_SKIP:
- nrp->dofr_data = prp->prov->impl->stack_skip;
+ nrp->dofr_data = dt_probe_stack_skip(dtp, prp);
continue;
}
diff --git a/libdtrace/dt_probe.c b/libdtrace/dt_probe.c
index 686e2a66..4be7a81c 100644
--- a/libdtrace/dt_probe.c
+++ b/libdtrace/dt_probe.c
@@ -1378,3 +1378,19 @@ dtrace_id2desc(dtrace_hdl_t *dtp, dtrace_id_t id, dtrace_probedesc_t *pdp)
return 0;
}
+
+int
+dt_probe_prog_type(dtrace_hdl_t *dtp, const dt_probe_t *prp)
+{
+ if (prp->prov->impl->get_prog_type)
+ return prp->prov->impl->get_prog_type(dtp, prp);
+ return prp->prov->impl->prog_type;
+}
+
+int
+dt_probe_stack_skip(dtrace_hdl_t *dtp, const dt_probe_t *prp)
+{
+ if (prp->prov->impl->get_stack_skip)
+ return prp->prov->impl->get_stack_skip(dtp, prp);
+ return prp->prov->impl->stack_skip;
+}
diff --git a/libdtrace/dt_probe.h b/libdtrace/dt_probe.h
index 2a78cb9c..03bac787 100644
--- a/libdtrace/dt_probe.h
+++ b/libdtrace/dt_probe.h
@@ -103,6 +103,8 @@ extern void dt_probe_init(dtrace_hdl_t *dtp);
extern void dt_probe_detach_all(dtrace_hdl_t *dtp);
extern void dt_probe_fini(dtrace_hdl_t *dtp);
extern void dt_probe_stats(dtrace_hdl_t *dtp);
+extern int dt_probe_prog_type(dtrace_hdl_t *, const dt_probe_t *);
+extern int dt_probe_stack_skip(dtrace_hdl_t *, const dt_probe_t *);
#ifdef __cplusplus
}
diff --git a/libdtrace/dt_prov_fbt.c b/libdtrace/dt_prov_fbt.c
index 21f63ddf..718aadbe 100644
--- a/libdtrace/dt_prov_fbt.c
+++ b/libdtrace/dt_prov_fbt.c
@@ -239,7 +239,7 @@ static int fprobe_probe_info(dtrace_hdl_t *dtp, const dt_probe_t *prp,
if (btf_id <= 0)
goto done;
- dt_tp_set_event_id(prp, btf_id);
+ dt_tp_set_event_id(prp, btf_id, TP_EVENT_ID_BTF);
if (strcmp(desc->prb, "return") == 0) {
/* Void function return probes only provide 1 argument. */
diff --git a/libdtrace/dt_prov_rawtp.c b/libdtrace/dt_prov_rawtp.c
index 778a6f9c..22a18050 100644
--- a/libdtrace/dt_prov_rawtp.c
+++ b/libdtrace/dt_prov_rawtp.c
@@ -221,7 +221,7 @@ done:
static int probe_info(dtrace_hdl_t *dtp, const dt_probe_t *prp,
int *argcp, dt_argdesc_t **argvp)
{
- dt_tp_set_event_id(prp, UINT32_MAX);
+ dt_tp_set_event_id(prp, UINT32_MAX, TP_EVENT_ID_TRACEPOINT);
#ifdef HAVE_LIBCTF
int rc, i;
diff --git a/libdtrace/dt_prov_sdt.c b/libdtrace/dt_prov_sdt.c
index 675e0458..8d43282a 100644
--- a/libdtrace/dt_prov_sdt.c
+++ b/libdtrace/dt_prov_sdt.c
@@ -241,7 +241,7 @@ static int probe_info(dtrace_hdl_t *dtp, const dt_probe_t *prp,
if (rc < 0 || id == 0)
return dt_set_errno(dtp, EDT_ENABLING_ERR);
- dt_tp_set_event_id(prp, id);
+ dt_tp_set_event_id(prp, id, TP_EVENT_ID_TRACEPOINT);
if (asprintf(&str, "struct trace_event_raw_%s", prp->desc->prb) == -1)
return dt_set_errno(dtp, EDT_NOMEM);
diff --git a/libdtrace/dt_provider.h b/libdtrace/dt_provider.h
index 17b1844c..724d72d7 100644
--- a/libdtrace/dt_provider.h
+++ b/libdtrace/dt_provider.h
@@ -44,7 +44,11 @@ typedef struct dt_argdesc {
typedef struct dt_provimpl {
const char *name; /* provider generic name */
int prog_type; /* BPF program type */
+ int (*get_prog_type)(dtrace_hdl_t *dtp, /* probe-specific type */
+ const struct dt_probe *prp);
uint32_t stack_skip; /* # of stack frames to skip */
+ int (*get_stack_skip)(dtrace_hdl_t *dtp,/* probe-specific # */
+ const struct dt_probe *prp);
int (*populate)(dtrace_hdl_t *dtp); /* register probes */
int (*provide)(dtrace_hdl_t *dtp, /* provide probes */
const dtrace_probedesc_t *pdp);
diff --git a/libdtrace/dt_provider_tp.c b/libdtrace/dt_provider_tp.c
index 50df2328..f8e68f24 100644
--- a/libdtrace/dt_provider_tp.c
+++ b/libdtrace/dt_provider_tp.c
@@ -33,6 +33,7 @@
struct tp_probe {
uint32_t event_id; /* tracepoint event id or BTF id */
int event_fd; /* tracepoint perf event fd */
+ tp_event_id_type event_type; /* tracepoint event or BTF-based */
};
/*
@@ -342,12 +343,21 @@ dt_tp_get_event_id(const dt_probe_t *prp)
return tpp->event_id;
}
+tp_event_id_type
+dt_tp_get_event_id_type(const dt_probe_t *prp)
+{
+ tp_probe_t *tpp = prp->prv_data;
+
+ return tpp->event_type;
+}
+
void
-dt_tp_set_event_id(const dt_probe_t *prp, uint32_t id)
+dt_tp_set_event_id(const dt_probe_t *prp, uint32_t id, tp_event_id_type type)
{
tp_probe_t *tpp = prp->prv_data;
tpp->event_id = id;
+ tpp->event_type = type;
}
/*
diff --git a/libdtrace/dt_provider_tp.h b/libdtrace/dt_provider_tp.h
index e7d52876..09002fa8 100644
--- a/libdtrace/dt_provider_tp.h
+++ b/libdtrace/dt_provider_tp.h
@@ -28,9 +28,14 @@ extern "C" {
typedef struct tp_probe tp_probe_t;
+typedef enum {
+ TP_EVENT_ID_TRACEPOINT = 0,
+ TP_EVENT_ID_BTF
+} tp_event_id_type;
+
extern tp_probe_t *dt_tp_alloc(dtrace_hdl_t *dtp);
extern int dt_tp_attach(dtrace_hdl_t *dtp, tp_probe_t *tpp, int bpf_fd);
-extern void dt_tp_set_event_id(const struct dt_probe *prp, uint32_t id);
+extern void dt_tp_set_event_id(const struct dt_probe *prp, uint32_t id, tp_event_id_type);
extern int dt_tp_has_info(const tp_probe_t *tpp);
extern int dt_tp_event_info(dtrace_hdl_t *dtp, FILE *f, int skip,
tp_probe_t *tpp, int *argcp, dt_argdesc_t **argvp);
@@ -42,7 +47,7 @@ extern struct dt_probe *dt_tp_probe_insert(dtrace_hdl_t *dtp,
const char *prv, const char *mod,
const char *fun, const char *prb);
extern uint32_t dt_tp_get_event_id(const struct dt_probe *prp);
-extern void dt_tp_set_event_id(const struct dt_probe *prp, uint32_t id);
+extern tp_event_id_type dt_tp_get_event_id_type(const struct dt_probe *prp);
extern int dt_tp_probe_info(dtrace_hdl_t *dtp, FILE *f, int skip,
const struct dt_probe *prp, int *argcp,
dt_argdesc_t **argvp);
--
2.43.5
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH dtrace 2/3] fbt: support ".isra.0" suffixed functions
2024-10-09 14:02 [PATCH dtrace 0/3] fprobe fallback kprobe support, sched fix Alan Maguire
2024-10-09 14:02 ` [PATCH dtrace 1/3] dtrace: add support for probe-specific prog types, stack skips Alan Maguire
@ 2024-10-09 14:02 ` Alan Maguire
2024-10-09 14:02 ` [PATCH dtrace 3/3] sched: fix on-cpu firing for kernels < 5.16 Alan Maguire
2024-10-09 17:34 ` [PATCH dtrace 0/3] fprobe fallback kprobe support, sched fix Kris Van Hees
3 siblings, 0 replies; 7+ messages in thread
From: Alan Maguire @ 2024-10-09 14:02 UTC (permalink / raw)
To: dtrace; +Cc: dtrace-devel, Alan Maguire
gcc adds suffixes when it carries out optimizations, but often
these leave parameters to functions intact. Many of these functions
(like finish_task_switch()) are important for tracing (and they
are present in available_filter_functions so are traceable) so it is
valuable to support probing them. For kprobes, all that is needed
is to ensure that the event name does not contain a ".".
For fprobes the situation is more complex - the function may be
in BTF, but even if it is, its BTF representation does not have
a "." suffix so searching kallsyms based on the BTF name fails.
As a result it is desirable to fall back to using kprobes for
such cases. Implement fallbacks in the fprobe provider methods
which use kprobe trampoline, attachment etc to cover these cases.
Having callback functions for stack size/prog type allows us to
dynamically determine these values on a per-probe basis.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
libdtrace/dt_prov_fbt.c | 82 ++++++++++++++++++++++++++++++++++-------
1 file changed, 69 insertions(+), 13 deletions(-)
diff --git a/libdtrace/dt_prov_fbt.c b/libdtrace/dt_prov_fbt.c
index 718aadbe..a7c2627a 100644
--- a/libdtrace/dt_prov_fbt.c
+++ b/libdtrace/dt_prov_fbt.c
@@ -57,6 +57,10 @@ static const dtrace_pattr_t pattr = {
{ DTRACE_STABILITY_PRIVATE, DTRACE_STABILITY_PRIVATE, DTRACE_CLASS_ISA },
};
+static int kprobe_trampoline(dt_pcb_t *pcb, uint_t exitlbl);
+static int kprobe_attach(dtrace_hdl_t *dtp, const dt_probe_t *prp, int bpf_fd);
+static void kprobe_detach(dtrace_hdl_t *dtp, const dt_probe_t *prp);
+
/*
* Scan the PROBE_LIST file and add entry and return probes for every function
* that is listed.
@@ -106,10 +110,6 @@ static int populate(dtrace_hdl_t *dtp)
p++;
}
- /* Weed out synthetic symbol names (that are invalid). */
- if (strchr(buf, '.') != NULL)
- continue;
-
#define strstarts(var, x) (strncmp(var, x, strlen (x)) == 0)
/* Weed out __ftrace_invalid_address___* entries. */
if (strstarts(buf, "__ftrace_invalid_address__") ||
@@ -176,6 +176,10 @@ static int fprobe_trampoline(dt_pcb_t *pcb, uint_t exitlbl)
dt_irlist_t *dlp = &pcb->pcb_ir;
dt_probe_t *prp = pcb->pcb_probe;
+ /* legacy kprobe fallback? */
+ if (dt_tp_get_event_id_type(prp) == TP_EVENT_ID_TRACEPOINT)
+ return kprobe_trampoline(pcb, exitlbl);
+
dt_cg_tramp_prologue(pcb);
if (strcmp(pcb->pcb_probe->desc->prb, "entry") == 0) {
@@ -236,8 +240,12 @@ static int fprobe_probe_info(dtrace_hdl_t *dtp, const dt_probe_t *prp,
btf_id = dt_btf_lookup_name_kind(dtp, dmp->dm_btf, desc->fun,
BTF_KIND_FUNC);
- if (btf_id <= 0)
+ if (btf_id <= 0) {
+ /* No BTF info, we will fall back to kprobe */
+ if (btf_id == -ENOENT)
+ return 0;
goto done;
+ }
dt_tp_set_event_id(prp, btf_id, TP_EVENT_ID_BTF);
@@ -294,6 +302,10 @@ static int fprobe_prog_load(dtrace_hdl_t *dtp, const dt_probe_t *prp,
const dtrace_probedesc_t *desc = prp->desc;
dt_module_t *dmp;
+ /* legacy kprobe fallback? */
+ if (dt_tp_get_event_id_type(prp) == TP_EVENT_ID_TRACEPOINT)
+ return dt_bpf_prog_load(dtp, prp, dp, lvl, buf, sz);
+
atype = strcmp(desc->prb, "entry") == 0 ? BPF_TRACE_FENTRY
: BPF_TRACE_FEXIT;
@@ -312,6 +324,34 @@ static int fprobe_prog_load(dtrace_hdl_t *dtp, const dt_probe_t *prp,
return rc;
}
+static int fprobe_attach(dtrace_hdl_t *dtp, const dt_probe_t *prp, int bpf_fd)
+{
+ if (dt_tp_get_event_id_type(prp) == TP_EVENT_ID_TRACEPOINT)
+ return kprobe_attach(dtp, prp, bpf_fd);
+ return dt_tp_probe_attach_raw(dtp, prp, bpf_fd);
+}
+
+static void fprobe_detach(dtrace_hdl_t *dtp, const dt_probe_t *prp)
+{
+ if (dt_tp_get_event_id_type(prp) == TP_EVENT_ID_TRACEPOINT)
+ return kprobe_detach(dtp, prp);
+ return dt_tp_probe_detach(dtp, prp);
+}
+
+static int fprobe_prog_type(dtrace_hdl_t *dtp, const dt_probe_t *prp)
+{
+ if (dt_tp_get_event_id_type(prp) == TP_EVENT_ID_TRACEPOINT)
+ return BPF_PROG_TYPE_KPROBE;
+ return BPF_PROG_TYPE_TRACING;
+}
+
+static int fprobe_stack_skip(dtrace_hdl_t *dtp, const dt_probe_t *prp)
+{
+ if (dt_tp_get_event_id_type(prp) == TP_EVENT_ID_TRACEPOINT)
+ return 0;
+ return 4;
+}
+
/*******************************\
* KPROBE-based implementation *
\*******************************/
@@ -363,7 +403,7 @@ static int kprobe_trampoline(dt_pcb_t *pcb, uint_t exitlbl)
static int kprobe_attach(dtrace_hdl_t *dtp, const dt_probe_t *prp, int bpf_fd)
{
if (!dt_tp_probe_has_info(prp)) {
- char *fn;
+ char *fn, *fun, *suffix;
FILE *f;
size_t len;
int fd, rc = -1;
@@ -376,22 +416,36 @@ static int kprobe_attach(dtrace_hdl_t *dtp, const dt_probe_t *prp, int bpf_fd)
if (fd == -1)
return -ENOENT;
+ /* remove possible .isra.0 suffix as "." is not supported in
+ * trace event names.
+ */
+ if (asprintf(&fun, "%s", prp->desc->fun) < 0)
+ return -ENOMEM;
+ suffix = strchr(fun, '.');
+ if (suffix)
+ *suffix = '\0';
+
rc = dprintf(fd, "%c:" FBT_GROUP_FMT "/%s %s\n",
prp->desc->prb[0] == 'e' ? 'p' : 'r',
- FBT_GROUP_DATA, prp->desc->fun, prp->desc->fun);
+ FBT_GROUP_DATA, fun, prp->desc->fun);
close(fd);
- if (rc == -1)
+ if (rc == -1) {
+ free(fun);
return -ENOENT;
+ }
/* create format file name */
len = snprintf(NULL, 0, "%s" FBT_GROUP_FMT "/%s/format",
- EVENTSFS, FBT_GROUP_DATA, prp->desc->fun) + 1;
+ EVENTSFS, FBT_GROUP_DATA, fun) + 1;
fn = dt_alloc(dtp, len);
- if (fn == NULL)
+ if (fn == NULL) {
+ free(fun);
return -ENOENT;
+ }
snprintf(fn, len, "%s" FBT_GROUP_FMT "/%s/format", EVENTSFS,
- FBT_GROUP_DATA, prp->desc->fun);
+ FBT_GROUP_DATA, fun);
+ free(fun);
/* open format file */
f = fopen(fn, "r");
@@ -443,13 +497,15 @@ static void kprobe_detach(dtrace_hdl_t *dtp, const dt_probe_t *prp)
dt_provimpl_t dt_fbt_fprobe = {
.name = prvname,
.prog_type = BPF_PROG_TYPE_TRACING,
+ .get_prog_type = fprobe_prog_type,
.stack_skip = 4,
+ .get_stack_skip = fprobe_stack_skip,
.populate = &populate,
.load_prog = &fprobe_prog_load,
.trampoline = &fprobe_trampoline,
- .attach = &dt_tp_probe_attach_raw,
+ .attach = &fprobe_attach,
.probe_info = &fprobe_probe_info,
- .detach = &dt_tp_probe_detach,
+ .detach = &fprobe_detach,
.probe_destroy = &dt_tp_probe_destroy,
};
--
2.43.5
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH dtrace 3/3] sched: fix on-cpu firing for kernels < 5.16
2024-10-09 14:02 [PATCH dtrace 0/3] fprobe fallback kprobe support, sched fix Alan Maguire
2024-10-09 14:02 ` [PATCH dtrace 1/3] dtrace: add support for probe-specific prog types, stack skips Alan Maguire
2024-10-09 14:02 ` [PATCH dtrace 2/3] fbt: support ".isra.0" suffixed functions Alan Maguire
@ 2024-10-09 14:02 ` Alan Maguire
2024-10-09 17:34 ` [PATCH dtrace 0/3] fprobe fallback kprobe support, sched fix Kris Van Hees
3 siblings, 0 replies; 7+ messages in thread
From: Alan Maguire @ 2024-10-09 14:02 UTC (permalink / raw)
To: dtrace; +Cc: dtrace-devel, Alan Maguire
The solution for sched:::on-cpu firing (probing
__perf_event_task_sched_in) only works on 5.16 and later as
the relevant function is not in available_filter_functions
in earlier kernels.
Instead use the fbt::finish_task_switch*:return probe as this
will cover .isra.0-suffixed optimizations if they are applied
to finish_task_switch().
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
libdtrace/dt_prov_sched.c | 23 ++---------------------
1 file changed, 2 insertions(+), 21 deletions(-)
diff --git a/libdtrace/dt_prov_sched.c b/libdtrace/dt_prov_sched.c
index 4fbd24ab..e9a1b0c9 100644
--- a/libdtrace/dt_prov_sched.c
+++ b/libdtrace/dt_prov_sched.c
@@ -28,7 +28,7 @@ static probe_dep_t probes[] = {
{ "off-cpu",
DTRACE_PROBESPEC_NAME, "rawtp:sched::sched_switch" },
{ "on-cpu",
- DTRACE_PROBESPEC_NAME, "fbt::__perf_event_task_sched_in:entry" },
+ DTRACE_PROBESPEC_NAME, "fbt:vmlinux:finish_task_switch*:return" },
{ "surrender",
DTRACE_PROBESPEC_NAME, "fbt::do_sched_yield:entry" },
{ "tick",
@@ -157,26 +157,7 @@ static int trampoline(dt_pcb_t *pcb, uint_t exitlbl)
*/
static void enable(dtrace_hdl_t *dtp, dt_probe_t *prp)
{
- struct perf_event_attr attr = {};
- int swfd;
-
- if (strcmp(prp->desc->prb, "on-cpu") != 0)
- return dt_sdt_enable(dtp, prp);
-
- memset(&attr, 0, sizeof(attr));
- attr.size = sizeof(attr);
- attr.type = PERF_TYPE_SOFTWARE;
- attr.config = PERF_COUNT_SW_CONTEXT_SWITCHES;
- attr.freq = 1;
- attr.sample_freq = 1000;
- attr.context_switch = 1;
-
- swfd = dt_perf_event_open(&attr, -1, 0, -1, 0);
- if (swfd < 0)
- dt_dprintf("perf event open failed for context_switch: %d\n", errno);
- else
- prp->prv_data = (void *)(long)swfd;
- dt_sdt_enable(dtp, prp);
+ return dt_sdt_enable(dtp, prp);
}
static void detach(dtrace_hdl_t *dtp, const dt_probe_t *prp)
--
2.43.5
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH dtrace 0/3] fprobe fallback kprobe support, sched fix
2024-10-09 14:02 [PATCH dtrace 0/3] fprobe fallback kprobe support, sched fix Alan Maguire
` (2 preceding siblings ...)
2024-10-09 14:02 ` [PATCH dtrace 3/3] sched: fix on-cpu firing for kernels < 5.16 Alan Maguire
@ 2024-10-09 17:34 ` Kris Van Hees
2024-10-10 21:03 ` Alan Maguire
3 siblings, 1 reply; 7+ messages in thread
From: Kris Van Hees @ 2024-10-09 17:34 UTC (permalink / raw)
To: Alan Maguire; +Cc: dtrace, dtrace-devel
Hi Alan,
I see what you are doing here and I think that it certainly has merit, but I
would say that the implementation is breaking some assumptions that are part
of the provider design in the code base. The way that a probe is related to
a provider implementation is rather fundamental and you end up breaking that
by having a fprobe-backed probe actually use the kprobe implementation. I am
thinking how we can establish the same without breaking that association.
There is also the fact to consider that if we end up attaching a given probe
both as an fprobe-backed FBT probe and a kprobe-based FBT probe, things can
get really interesting because the defined order of execution of clauses will
no longer be satisfied.
Now, what perhaps could work is if we were to first try to determine whether
we can use a fprobe-backed for an FBT probe, and if that is not possible, do
a fall-back to a kprobe-backed probe by removing the probe and re-inserting it
as a kprobe-backed FBT probe. Since a probe is associated with a provider by
a pointer to the provider implementation, that should work.
It should also still work for probe lookup (though that should no longer be
taking place at that point in processing) because the lookup code is based
on the provider name selecting the correct bucket, which can contain probes
that have that same provider name even if the implementation for them is
different.
This way we can never have a probe that is associated with the fprobe
implementation but really ends up using the kprobe implementation. The
association between probe and provider implementation would still be kept
as-is.
Do you think that would work?
ONe complication I can think of is cases where both func and func.<suffix>
should be traced when you use an fbt::func:entry (or return) probe. They
would both need to be done as kprobes, because (again) we cannot have a mix.
On Wed, Oct 09, 2024 at 03:02:33PM +0100, Alan Maguire wrote:
> This series is focused on solving a few issues with fprobe-based
> attachment which prevent us being able to attach to functions
> like finish_task_switch.isra.0. Such functions are present in
> available_filter_functions, but because they either lack BTF
> representations or those representations are named without the
> .isra suffix, attachment via fentry/fexit is currently impossible.
> Falling back to kprobe attach is the best solution here.
>
> However providers have been written with the implicit assumption
> that certain aspects of the probes are shared across the provider;
> program type, stack skips etc. Patch 1 addresses this by providing
> optional provider implementation callbacks to return these values
> for a specific probe. Patch 2 then updates the fbt provider to
> use these mechanisms to support fallback to kprobe and to support
> "."-suffixed functions. Finally patch 3 can then specify
> fbt::finish_task_switch*:return as the underlying probe for
> sched:::on-cpu.
>
> Tested on upstream, UEK7 (5.15-based kernel) and UEK6 (5.4-based).
>
> Alan Maguire (3):
> dtrace: add support for probe-specific prog types, stack skips
> fbt: support ".isra.0" suffixed functions
> sched: fix on-cpu firing for kernels < 5.16
>
> libdtrace/dt_bpf.c | 4 +-
> libdtrace/dt_cc.c | 2 +-
> libdtrace/dt_probe.c | 16 ++++++++
> libdtrace/dt_probe.h | 2 +
> libdtrace/dt_prov_fbt.c | 84 +++++++++++++++++++++++++++++++-------
> libdtrace/dt_prov_rawtp.c | 2 +-
> libdtrace/dt_prov_sched.c | 23 +----------
> libdtrace/dt_prov_sdt.c | 2 +-
> libdtrace/dt_provider.h | 4 ++
> libdtrace/dt_provider_tp.c | 12 +++++-
> libdtrace/dt_provider_tp.h | 9 +++-
> 11 files changed, 117 insertions(+), 43 deletions(-)
>
> --
> 2.43.5
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH dtrace 0/3] fprobe fallback kprobe support, sched fix
2024-10-09 17:34 ` [PATCH dtrace 0/3] fprobe fallback kprobe support, sched fix Kris Van Hees
@ 2024-10-10 21:03 ` Alan Maguire
2024-10-11 2:47 ` Kris Van Hees
0 siblings, 1 reply; 7+ messages in thread
From: Alan Maguire @ 2024-10-10 21:03 UTC (permalink / raw)
To: Kris Van Hees; +Cc: dtrace, dtrace-devel
hi Kris
thanks for the detailed explanation! Replies below..
On 09/10/2024 18:34, Kris Van Hees wrote:
> Hi Alan,
>
> I see what you are doing here and I think that it certainly has merit, but I
> would say that the implementation is breaking some assumptions that are part
> of the provider design in the code base. The way that a probe is related to
> a provider implementation is rather fundamental and you end up breaking that
> by having a fprobe-backed probe actually use the kprobe implementation. I am
> thinking how we can establish the same without breaking that association.
>
> There is also the fact to consider that if we end up attaching a given probe
> both as an fprobe-backed FBT probe and a kprobe-based FBT probe, things can
> get really interesting because the defined order of execution of clauses will
> no longer be satisfied.
>
> Now, what perhaps could work is if we were to first try to determine whether
> we can use a fprobe-backed for an FBT probe, and if that is not possible, do
> a fall-back to a kprobe-backed probe by removing the probe and re-inserting it
> as a kprobe-backed FBT probe. Since a probe is associated with a provider by
> a pointer to the provider implementation, that should work.
>
> It should also still work for probe lookup (though that should no longer be
> taking place at that point in processing) because the lookup code is based
> on the provider name selecting the correct bucket, which can contain probes
> that have that same provider name even if the implementation for them is
> different.
>
> This way we can never have a probe that is associated with the fprobe
> implementation but really ends up using the kprobe implementation. The
> association between probe and provider implementation would still be kept
> as-is.
>
> Do you think that would work?
I've been looking at the code, and the problem I see is that it's hard
to find a place where we can look at the actually _used_ probes (as
opposed to those we populate()) and make determinations around kprobe
support. My first go at this was simply to use the kprobe implementation
for all functions containing a ".". However, as you observed, that could
potentially result in a mix of fprobe and kprobe for a particular probe
context which isn't allowed.
And for programs where a probe participates in multiple contexts we'd
need to worry about whether probe replacement in one then had knock-on
effects for fprobe/kprobe mixing, timing etc. Specifically if we
replaced a fprobe by a kprobe for one clause, we'd need to do the same
for it in other clauses I think.
So I _think_ we're stuck with an either-or; either we can use fprobes
everywhere, or we have to use kprobes everywhere in a program. Is that
the case do you think?
If so, the question then becomes is there a way to iterate over the
actually used fbt probes to make sure they are usable as fprobes; if not
replacing all probes with a kprobe backend. The problem is the only
place I see that is in the compile of the various clauses. We'd need (I
think) some kind of prepare phase where we walk the clauses and figure
out which probes are associated to make this determination. Once we
start compiling clauses, I think it's too late to backpedal and replace
an implementation.
Or are there other places/ways to solve this I'm missing? Thanks!
Alan
> ONe complication I can think of is cases where both func and func.<suffix>
> should be traced when you use an fbt::func:entry (or return) probe. They
> would both need to be done as kprobes, because (again) we cannot have a mix.
>
> On Wed, Oct 09, 2024 at 03:02:33PM +0100, Alan Maguire wrote:
>> This series is focused on solving a few issues with fprobe-based
>> attachment which prevent us being able to attach to functions
>> like finish_task_switch.isra.0. Such functions are present in
>> available_filter_functions, but because they either lack BTF
>> representations or those representations are named without the
>> .isra suffix, attachment via fentry/fexit is currently impossible.
>> Falling back to kprobe attach is the best solution here.
>>
>> However providers have been written with the implicit assumption
>> that certain aspects of the probes are shared across the provider;
>> program type, stack skips etc. Patch 1 addresses this by providing
>> optional provider implementation callbacks to return these values
>> for a specific probe. Patch 2 then updates the fbt provider to
>> use these mechanisms to support fallback to kprobe and to support
>> "."-suffixed functions. Finally patch 3 can then specify
>> fbt::finish_task_switch*:return as the underlying probe for
>> sched:::on-cpu.
>>
>> Tested on upstream, UEK7 (5.15-based kernel) and UEK6 (5.4-based).
>>
>> Alan Maguire (3):
>> dtrace: add support for probe-specific prog types, stack skips
>> fbt: support ".isra.0" suffixed functions
>> sched: fix on-cpu firing for kernels < 5.16
>>
>> libdtrace/dt_bpf.c | 4 +-
>> libdtrace/dt_cc.c | 2 +-
>> libdtrace/dt_probe.c | 16 ++++++++
>> libdtrace/dt_probe.h | 2 +
>> libdtrace/dt_prov_fbt.c | 84 +++++++++++++++++++++++++++++++-------
>> libdtrace/dt_prov_rawtp.c | 2 +-
>> libdtrace/dt_prov_sched.c | 23 +----------
>> libdtrace/dt_prov_sdt.c | 2 +-
>> libdtrace/dt_provider.h | 4 ++
>> libdtrace/dt_provider_tp.c | 12 +++++-
>> libdtrace/dt_provider_tp.h | 9 +++-
>> 11 files changed, 117 insertions(+), 43 deletions(-)
>>
>> --
>> 2.43.5
>>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH dtrace 0/3] fprobe fallback kprobe support, sched fix
2024-10-10 21:03 ` Alan Maguire
@ 2024-10-11 2:47 ` Kris Van Hees
0 siblings, 0 replies; 7+ messages in thread
From: Kris Van Hees @ 2024-10-11 2:47 UTC (permalink / raw)
To: Alan Maguire; +Cc: Kris Van Hees, dtrace, dtrace-devel
On Thu, Oct 10, 2024 at 10:03:37PM +0100, Alan Maguire wrote:
> hi Kris
>
> thanks for the detailed explanation! Replies below..
>
> On 09/10/2024 18:34, Kris Van Hees wrote:
> > Hi Alan,
> >
> > I see what you are doing here and I think that it certainly has merit, but I
> > would say that the implementation is breaking some assumptions that are part
> > of the provider design in the code base. The way that a probe is related to
> > a provider implementation is rather fundamental and you end up breaking that
> > by having a fprobe-backed probe actually use the kprobe implementation. I am
> > thinking how we can establish the same without breaking that association.
> >
> > There is also the fact to consider that if we end up attaching a given probe
> > both as an fprobe-backed FBT probe and a kprobe-based FBT probe, things can
> > get really interesting because the defined order of execution of clauses will
> > no longer be satisfied.
> >
> > Now, what perhaps could work is if we were to first try to determine whether
> > we can use a fprobe-backed for an FBT probe, and if that is not possible, do
> > a fall-back to a kprobe-backed probe by removing the probe and re-inserting it
> > as a kprobe-backed FBT probe. Since a probe is associated with a provider by
> > a pointer to the provider implementation, that should work.
> >
> > It should also still work for probe lookup (though that should no longer be
> > taking place at that point in processing) because the lookup code is based
> > on the provider name selecting the correct bucket, which can contain probes
> > that have that same provider name even if the implementation for them is
> > different.
> >
> > This way we can never have a probe that is associated with the fprobe
> > implementation but really ends up using the kprobe implementation. The
> > association between probe and provider implementation would still be kept
> > as-is.
> >
> > Do you think that would work?
>
> I've been looking at the code, and the problem I see is that it's hard
> to find a place where we can look at the actually _used_ probes (as
> opposed to those we populate()) and make determinations around kprobe
> support. My first go at this was simply to use the kprobe implementation
> for all functions containing a ".". However, as you observed, that could
> potentially result in a mix of fprobe and kprobe for a particular probe
> context which isn't allowed.
>
> And for programs where a probe participates in multiple contexts we'd
> need to worry about whether probe replacement in one then had knock-on
> effects for fprobe/kprobe mixing, timing etc. Specifically if we
> replaced a fprobe by a kprobe for one clause, we'd need to do the same
> for it in other clauses I think.
Yes, but note that the compilation of clauses does not actually involve
knowing anything about the probe implementation. So that is no issue.
And since clauses for a particular probe are associated with the very
probe, replacing it (or perhaps moving it to another implementation) is
automatically going to affect all clauses associated with it.
> So I _think_ we're stuck with an either-or; either we can use fprobes
> everywhere, or we have to use kprobes everywhere in a program. Is that
> the case do you think?
Yes, but mostly for other reasons... DTrace guarantees the execution of
clauses in program-order. And that can only be provided for in BPF if we
execute the clauses for a particular probe within a single BPF program.
So, mixing kprobe and fprobe based instances of the same FBT probe would
result in 2 BPF programs, and thereby violate the clause order guarantee.
> If so, the question then becomes is there a way to iterate over the
> actually used fbt probes to make sure they are usable as fprobes; if not
> replacing all probes with a kprobe backend. The problem is the only
> place I see that is in the compile of the various clauses. We'd need (I
> think) some kind of prepare phase where we walk the clauses and figure
> out which probes are associated to make this determination. Once we
> start compiling clauses, I think it's too late to backpedal and replace
> an implementation.
Actually, on systems where fprobes are available, the probe definition (name,
arguments) won't change so changing the underlhing implementation won't be
an issue at the time of compiling the clauses. It is only when we generate
the trampolines that it matters which implementation is being used. So, we
can change the implementation after we know what the clauses need.
> Or are there other places/ways to solve this I'm missing? Thanks!
>
> Alan
>
> > ONe complication I can think of is cases where both func and func.<suffix>
> > should be traced when you use an fbt::func:entry (or return) probe. They
> > would both need to be done as kprobes, because (again) we cannot have a mix.
> >
> > On Wed, Oct 09, 2024 at 03:02:33PM +0100, Alan Maguire wrote:
> >> This series is focused on solving a few issues with fprobe-based
> >> attachment which prevent us being able to attach to functions
> >> like finish_task_switch.isra.0. Such functions are present in
> >> available_filter_functions, but because they either lack BTF
> >> representations or those representations are named without the
> >> .isra suffix, attachment via fentry/fexit is currently impossible.
> >> Falling back to kprobe attach is the best solution here.
> >>
> >> However providers have been written with the implicit assumption
> >> that certain aspects of the probes are shared across the provider;
> >> program type, stack skips etc. Patch 1 addresses this by providing
> >> optional provider implementation callbacks to return these values
> >> for a specific probe. Patch 2 then updates the fbt provider to
> >> use these mechanisms to support fallback to kprobe and to support
> >> "."-suffixed functions. Finally patch 3 can then specify
> >> fbt::finish_task_switch*:return as the underlying probe for
> >> sched:::on-cpu.
> >>
> >> Tested on upstream, UEK7 (5.15-based kernel) and UEK6 (5.4-based).
> >>
> >> Alan Maguire (3):
> >> dtrace: add support for probe-specific prog types, stack skips
> >> fbt: support ".isra.0" suffixed functions
> >> sched: fix on-cpu firing for kernels < 5.16
> >>
> >> libdtrace/dt_bpf.c | 4 +-
> >> libdtrace/dt_cc.c | 2 +-
> >> libdtrace/dt_probe.c | 16 ++++++++
> >> libdtrace/dt_probe.h | 2 +
> >> libdtrace/dt_prov_fbt.c | 84 +++++++++++++++++++++++++++++++-------
> >> libdtrace/dt_prov_rawtp.c | 2 +-
> >> libdtrace/dt_prov_sched.c | 23 +----------
> >> libdtrace/dt_prov_sdt.c | 2 +-
> >> libdtrace/dt_provider.h | 4 ++
> >> libdtrace/dt_provider_tp.c | 12 +++++-
> >> libdtrace/dt_provider_tp.h | 9 +++-
> >> 11 files changed, 117 insertions(+), 43 deletions(-)
> >>
> >> --
> >> 2.43.5
> >>
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2024-10-11 2:47 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-09 14:02 [PATCH dtrace 0/3] fprobe fallback kprobe support, sched fix Alan Maguire
2024-10-09 14:02 ` [PATCH dtrace 1/3] dtrace: add support for probe-specific prog types, stack skips Alan Maguire
2024-10-09 14:02 ` [PATCH dtrace 2/3] fbt: support ".isra.0" suffixed functions Alan Maguire
2024-10-09 14:02 ` [PATCH dtrace 3/3] sched: fix on-cpu firing for kernels < 5.16 Alan Maguire
2024-10-09 17:34 ` [PATCH dtrace 0/3] fprobe fallback kprobe support, sched fix Kris Van Hees
2024-10-10 21:03 ` Alan Maguire
2024-10-11 2:47 ` Kris Van Hees
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox