* [PATCH v3 dtrace 1/4] dt_provider_tp: add optional event data, freed on tp free
2024-10-16 15:54 [PATCH v3 dtrace 0/4] kprobe support for .isra.0, sched fix Alan Maguire
@ 2024-10-16 15:54 ` Alan Maguire
2024-11-14 5:26 ` [DTrace-devel] " Kris Van Hees
2024-10-16 15:54 ` [PATCH v3 dtrace 2/4] fbt: support "."-suffixed functions for kprobes Alan Maguire
` (3 subsequent siblings)
4 siblings, 1 reply; 12+ messages in thread
From: Alan Maguire @ 2024-10-16 15:54 UTC (permalink / raw)
To: dtrace; +Cc: dtrace-devel, Alan Maguire
It will be used to store suffixed function name for kprobe attach to
"foo.isra.0"-style functions. Such functions use the unsuffixed form
as probe name, so we need to store the full function name to support
attach.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
libdtrace/dt_provider_tp.c | 27 +++++++++++++++++++++++++++
libdtrace/dt_provider_tp.h | 8 ++++++++
2 files changed, 35 insertions(+)
diff --git a/libdtrace/dt_provider_tp.c b/libdtrace/dt_provider_tp.c
index 50df2328..fc3c216a 100644
--- a/libdtrace/dt_provider_tp.c
+++ b/libdtrace/dt_provider_tp.c
@@ -33,6 +33,7 @@
struct tp_probe {
uint32_t event_id; /* tracepoint event id or BTF id */
int event_fd; /* tracepoint perf event fd */
+ void *event_data; /* optional data */
};
/*
@@ -313,6 +314,8 @@ dt_tp_detach(dtrace_hdl_t *dtp, tp_probe_t *tpp)
void
dt_tp_destroy(dtrace_hdl_t *dtp, tp_probe_t *tpp)
{
+ if (tpp && tpp->event_data)
+ free(tpp->event_data);
dt_free(dtp, tpp);
}
@@ -334,6 +337,30 @@ dt_tp_probe_insert(dtrace_hdl_t *dtp, dt_provider_t *prov, const char *prv,
return dt_probe_insert(dtp, prov, prv, mod, fun, prb, tpp);
}
+dt_probe_t *
+dt_tp_probe_insert_data(dtrace_hdl_t *dtp, dt_provider_t *prov, const char *prv,
+ const char *mod, const char *fun, const char *prb,
+ void *data)
+{
+ tp_probe_t *tpp;
+
+ dt_probe_t *prp = dt_tp_probe_insert(dtp, prov, prv, mod, fun, prb);
+
+ if (prp) {
+ tpp = prp->prv_data;
+ tpp->event_data = data;
+ }
+ return prp;
+}
+
+void *
+dt_tp_get_event_data(const dt_probe_t *prp)
+{
+ tp_probe_t *tpp = prp->prv_data;
+
+ return tpp->event_data;
+}
+
uint32_t
dt_tp_get_event_id(const dt_probe_t *prp)
{
diff --git a/libdtrace/dt_provider_tp.h b/libdtrace/dt_provider_tp.h
index e7d52876..66a9bd80 100644
--- a/libdtrace/dt_provider_tp.h
+++ b/libdtrace/dt_provider_tp.h
@@ -41,6 +41,14 @@ extern struct dt_probe *dt_tp_probe_insert(dtrace_hdl_t *dtp,
dt_provider_t *prov,
const char *prv, const char *mod,
const char *fun, const char *prb);
+extern struct dt_probe *dt_tp_probe_insert_data(dtrace_hdl_t *dtp,
+ dt_provider_t *prov,
+ const char *prv,
+ const char *mod,
+ const char *fun,
+ const char *prb,
+ void *data);
+extern void *dt_tp_get_event_data(const struct dt_probe *prp);
extern uint32_t dt_tp_get_event_id(const struct dt_probe *prp);
extern void dt_tp_set_event_id(const struct dt_probe *prp, uint32_t id);
extern int dt_tp_probe_info(dtrace_hdl_t *dtp, FILE *f, int skip,
--
2.43.5
^ permalink raw reply related [flat|nested] 12+ messages in thread* Re: [DTrace-devel] [PATCH v3 dtrace 1/4] dt_provider_tp: add optional event data, freed on tp free
2024-10-16 15:54 ` [PATCH v3 dtrace 1/4] dt_provider_tp: add optional event data, freed on tp free Alan Maguire
@ 2024-11-14 5:26 ` Kris Van Hees
2024-11-14 16:45 ` Alan Maguire
0 siblings, 1 reply; 12+ messages in thread
From: Kris Van Hees @ 2024-11-14 5:26 UTC (permalink / raw)
To: Alan Maguire; +Cc: dtrace, dtrace-devel
I don't think this is the right approach. At least, with other code (uprobe
provider) we went the route of having the provider declare its own custom
struct to store provider-specific probe info along with a pointer to the
tp_probe_t (tracepoint probe info). I think we should do the same here.
I'm preparing a modified patch for this as part of a small series to fix a
few design issues with function naming in dt_provider_tp, to show what I mean.
I'll continue reviewing the rest of this series based on that upcomming change.
E.g. here we can use:
struct fbt_probe {
char *tp_name;
tp_probe_t *tp;
} fbt_probe_t;
and an fbt_probe_t instance will be passed to dt_probe_insert( and thus will
be available as prp->prv_data. Whenevef this provider needs to do anything
with an FBT probe's underlying tracepoint, it can be done by calling the
appropriate dt_tp_* function and passing in ((fbt_probe_t *prp->prv_data)->tp.
Anyway, series coming (hopefully) tomorrow.
On Wed, Oct 16, 2024 at 04:54:06PM +0100, Alan Maguire via DTrace-devel wrote:
> It will be used to store suffixed function name for kprobe attach to
> "foo.isra.0"-style functions. Such functions use the unsuffixed form
> as probe name, so we need to store the full function name to support
> attach.
>
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
> libdtrace/dt_provider_tp.c | 27 +++++++++++++++++++++++++++
> libdtrace/dt_provider_tp.h | 8 ++++++++
> 2 files changed, 35 insertions(+)
>
> diff --git a/libdtrace/dt_provider_tp.c b/libdtrace/dt_provider_tp.c
> index 50df2328..fc3c216a 100644
> --- a/libdtrace/dt_provider_tp.c
> +++ b/libdtrace/dt_provider_tp.c
> @@ -33,6 +33,7 @@
> struct tp_probe {
> uint32_t event_id; /* tracepoint event id or BTF id */
> int event_fd; /* tracepoint perf event fd */
> + void *event_data; /* optional data */
> };
>
> /*
> @@ -313,6 +314,8 @@ dt_tp_detach(dtrace_hdl_t *dtp, tp_probe_t *tpp)
> void
> dt_tp_destroy(dtrace_hdl_t *dtp, tp_probe_t *tpp)
> {
> + if (tpp && tpp->event_data)
> + free(tpp->event_data);
> dt_free(dtp, tpp);
> }
>
> @@ -334,6 +337,30 @@ dt_tp_probe_insert(dtrace_hdl_t *dtp, dt_provider_t *prov, const char *prv,
> return dt_probe_insert(dtp, prov, prv, mod, fun, prb, tpp);
> }
>
> +dt_probe_t *
> +dt_tp_probe_insert_data(dtrace_hdl_t *dtp, dt_provider_t *prov, const char *prv,
> + const char *mod, const char *fun, const char *prb,
> + void *data)
> +{
> + tp_probe_t *tpp;
> +
> + dt_probe_t *prp = dt_tp_probe_insert(dtp, prov, prv, mod, fun, prb);
> +
> + if (prp) {
> + tpp = prp->prv_data;
> + tpp->event_data = data;
> + }
> + return prp;
> +}
> +
> +void *
> +dt_tp_get_event_data(const dt_probe_t *prp)
> +{
> + tp_probe_t *tpp = prp->prv_data;
> +
> + return tpp->event_data;
> +}
> +
> uint32_t
> dt_tp_get_event_id(const dt_probe_t *prp)
> {
> diff --git a/libdtrace/dt_provider_tp.h b/libdtrace/dt_provider_tp.h
> index e7d52876..66a9bd80 100644
> --- a/libdtrace/dt_provider_tp.h
> +++ b/libdtrace/dt_provider_tp.h
> @@ -41,6 +41,14 @@ extern struct dt_probe *dt_tp_probe_insert(dtrace_hdl_t *dtp,
> dt_provider_t *prov,
> const char *prv, const char *mod,
> const char *fun, const char *prb);
> +extern struct dt_probe *dt_tp_probe_insert_data(dtrace_hdl_t *dtp,
> + dt_provider_t *prov,
> + const char *prv,
> + const char *mod,
> + const char *fun,
> + const char *prb,
> + void *data);
> +extern void *dt_tp_get_event_data(const struct dt_probe *prp);
> extern uint32_t dt_tp_get_event_id(const struct dt_probe *prp);
> extern void dt_tp_set_event_id(const struct dt_probe *prp, uint32_t id);
> extern int dt_tp_probe_info(dtrace_hdl_t *dtp, FILE *f, int skip,
> --
> 2.43.5
>
>
> _______________________________________________
> DTrace-devel mailing list
> DTrace-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/dtrace-devel
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [DTrace-devel] [PATCH v3 dtrace 1/4] dt_provider_tp: add optional event data, freed on tp free
2024-11-14 5:26 ` [DTrace-devel] " Kris Van Hees
@ 2024-11-14 16:45 ` Alan Maguire
0 siblings, 0 replies; 12+ messages in thread
From: Alan Maguire @ 2024-11-14 16:45 UTC (permalink / raw)
To: Kris Van Hees; +Cc: dtrace, dtrace-devel
On 14/11/2024 05:26, Kris Van Hees wrote:
> I don't think this is the right approach. At least, with other code (uprobe
> provider) we went the route of having the provider declare its own custom
> struct to store provider-specific probe info along with a pointer to the
> tp_probe_t (tracepoint probe info). I think we should do the same here.
>
> I'm preparing a modified patch for this as part of a small series to fix a
> few design issues with function naming in dt_provider_tp, to show what I mean.
>
> I'll continue reviewing the rest of this series based on that upcomming change.
>
> E.g. here we can use:
> struct fbt_probe {
> char *tp_name;
> tp_probe_t *tp;
> } fbt_probe_t;
>
> and an fbt_probe_t instance will be passed to dt_probe_insert( and thus will
> be available as prp->prv_data. Whenevef this provider needs to do anything
> with an FBT probe's underlying tracepoint, it can be done by calling the
> appropriate dt_tp_* function and passing in ((fbt_probe_t *prp->prv_data)->tp.
>
> Anyway, series coming (hopefully) tomorrow.
>
Sounds good, thanks!
Alan
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v3 dtrace 2/4] fbt: support "."-suffixed functions for kprobes
2024-10-16 15:54 [PATCH v3 dtrace 0/4] kprobe support for .isra.0, sched fix Alan Maguire
2024-10-16 15:54 ` [PATCH v3 dtrace 1/4] dt_provider_tp: add optional event data, freed on tp free Alan Maguire
@ 2024-10-16 15:54 ` Alan Maguire
2024-11-28 2:22 ` Kris Van Hees
2024-10-16 15:54 ` [PATCH v3 dtrace 3/4] fbt: avoid mix of kprobe, fprobe implementations for used probes Alan Maguire
` (2 subsequent siblings)
4 siblings, 1 reply; 12+ messages in thread
From: Alan Maguire @ 2024-10-16 15:54 UTC (permalink / raw)
To: dtrace; +Cc: dtrace-devel, Alan Maguire
gcc adds suffixes when it carries out optimizations, but often
these leave parameters to functions intact. Many of these functions
(like finish_task_switch()) are important for tracing (and they
are present in available_filter_functions so are traceable) so it is
valuable to support probing them. For kprobes, the probe name removes
the "." suffix, so store it as additional tp event data for kprobe
attach (attach time is the only time we need the full function name).
fprobe does not support such functions, so the kprobe impl is always
used in such cases.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
libdtrace/dt_prov_fbt.c | 76 ++++++++++++++++++++++++++++++++---------
1 file changed, 59 insertions(+), 17 deletions(-)
diff --git a/libdtrace/dt_prov_fbt.c b/libdtrace/dt_prov_fbt.c
index 21f63ddf..06ea78ca 100644
--- a/libdtrace/dt_prov_fbt.c
+++ b/libdtrace/dt_prov_fbt.c
@@ -21,6 +21,7 @@
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
+#include <port.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
@@ -63,8 +64,8 @@ static const dtrace_pattr_t pattr = {
*/
static int populate(dtrace_hdl_t *dtp)
{
- dt_provider_t *prv;
- dt_provimpl_t *impl;
+ dt_provider_t *default_prv, *kprobe_prv = NULL;
+ dt_provimpl_t *default_impl;
FILE *f;
char *buf = NULL;
char *p;
@@ -73,17 +74,25 @@ static int populate(dtrace_hdl_t *dtp)
dtrace_syminfo_t sip;
dtrace_probedesc_t pd;
- impl = BPF_HAS(dtp, BPF_FEAT_FENTRY) ? &dt_fbt_fprobe : &dt_fbt_kprobe;
+ default_impl = BPF_HAS(dtp, BPF_FEAT_FENTRY) ? &dt_fbt_fprobe : &dt_fbt_kprobe;
- prv = dt_provider_create(dtp, prvname, impl, &pattr, NULL);
- if (prv == NULL)
+ default_prv = dt_provider_create(dtp, prvname, default_impl, &pattr, NULL);
+ if (default_prv == NULL)
return -1; /* errno already set */
+ if (default_impl == &dt_fbt_kprobe)
+ kprobe_prv = default_prv;
f = fopen(PROBE_LIST, "r");
if (f == NULL)
return 0;
while (getline(&buf, &n, f) >= 0) {
+ dt_provimpl_t *impl = default_impl;
+ dt_provider_t *prv = default_prv;
+ char fun[DTRACE_FUNCNAMELEN] = {};
+ void *entry_data = NULL;
+ void *return_data = NULL;
+
/*
* Here buf is either "funcname\n" or "funcname [modname]\n".
* The last line may not have a linefeed.
@@ -106,10 +115,6 @@ static int populate(dtrace_hdl_t *dtp)
p++;
}
- /* Weed out synthetic symbol names (that are invalid). */
- if (strchr(buf, '.') != NULL)
- continue;
-
#define strstarts(var, x) (strncmp(var, x, strlen (x)) == 0)
/* Weed out __ftrace_invalid_address___* entries. */
if (strstarts(buf, "__ftrace_invalid_address__") ||
@@ -129,6 +134,31 @@ static int populate(dtrace_hdl_t *dtp)
} else
mod = p;
+ /* '.'-suffixed functions (e.g. foo.isra.0) must always be
+ * kprobe-backed as fprobe does not support them. They
+ * are represented with their unsuffixed names however, so
+ * the full name including suffix is stored as data associated
+ * with the tp event.
+ */
+ strlcpy(fun, buf, sizeof(fun));
+ if (strchr(buf, '.') != NULL) {
+ char *suffix;
+
+ impl = &dt_fbt_kprobe;
+ if (!kprobe_prv) {
+ kprobe_prv = dt_provider_create(dtp, prvname,
+ impl, &pattr,
+ NULL);
+ if (kprobe_prv == NULL)
+ return -1;
+ }
+ prv = kprobe_prv;
+ suffix = strchr(fun, '.');
+ *suffix = '\0';
+ entry_data = strdup(buf);
+ return_data = strdup(buf);
+ }
+
/*
* Due to the lack of module names in
* TRACEFS/available_filter_functions, there are some duplicate
@@ -138,14 +168,16 @@ static int populate(dtrace_hdl_t *dtp)
pd.id = DTRACE_IDNONE;
pd.prv = prvname;
pd.mod = mod;
- pd.fun = buf;
+ pd.fun = fun;
pd.prb = "entry";
if (dt_probe_lookup(dtp, &pd) != NULL)
continue;
- if (dt_tp_probe_insert(dtp, prv, prvname, mod, buf, "entry"))
+ if (dt_tp_probe_insert_data(dtp, prv, prvname, mod, fun,
+ "entry", entry_data))
n++;
- if (dt_tp_probe_insert(dtp, prv, prvname, mod, buf, "return"))
+ if (dt_tp_probe_insert_data(dtp, prv, prvname, mod, fun,
+ "return", return_data))
n++;
}
@@ -363,10 +395,11 @@ static int kprobe_trampoline(dt_pcb_t *pcb, uint_t exitlbl)
static int kprobe_attach(dtrace_hdl_t *dtp, const dt_probe_t *prp, int bpf_fd)
{
if (!dt_tp_probe_has_info(prp)) {
- char *fn;
- FILE *f;
- size_t len;
- int fd, rc = -1;
+ char *fn;
+ const char *fun;
+ FILE *f;
+ size_t len;
+ int fd, rc = -1;
/*
* Register the kprobe with the tracing subsystem. This will
@@ -376,9 +409,18 @@ static int kprobe_attach(dtrace_hdl_t *dtp, const dt_probe_t *prp, int bpf_fd)
if (fd == -1)
return -ENOENT;
+ /* If actual function name contained a '.', it is stored in
+ * tp data; use non-suffix name for event as event names
+ * cannot contain a '.'. User-visible probe name does
+ * not contain a '.'.
+ */
+ fun = dt_tp_get_event_data(prp);
+ if (!fun)
+ fun = prp->desc->fun;
+
rc = dprintf(fd, "%c:" FBT_GROUP_FMT "/%s %s\n",
prp->desc->prb[0] == 'e' ? 'p' : 'r',
- FBT_GROUP_DATA, prp->desc->fun, prp->desc->fun);
+ FBT_GROUP_DATA, prp->desc->fun, fun);
close(fd);
if (rc == -1)
return -ENOENT;
--
2.43.5
^ permalink raw reply related [flat|nested] 12+ messages in thread* Re: [PATCH v3 dtrace 2/4] fbt: support "."-suffixed functions for kprobes
2024-10-16 15:54 ` [PATCH v3 dtrace 2/4] fbt: support "."-suffixed functions for kprobes Alan Maguire
@ 2024-11-28 2:22 ` Kris Van Hees
2024-11-28 9:58 ` Alan Maguire
0 siblings, 1 reply; 12+ messages in thread
From: Kris Van Hees @ 2024-11-28 2:22 UTC (permalink / raw)
To: Alan Maguire; +Cc: dtrace, dtrace-devel
I am still workimg through these patches (and writing some additional
support patches). But one annoying detail has come up... kprobes do
not provide argument type data whereas we do have that for fprobes.
So, if there is a function (like find_bio_block) that can be probed
with fprobe and there is also a find_bio_block.isra.0 optimized form
that can be probed with probe, they *cannot* be represented by the
same DTrace probe because one has known argument data and the other
does not. And it does not seem safe to assume that the optimized
form will have the same signature as the regular function.
Even using the dependent probe mechanism won't help here because that
is based on being able to convert the arguments of the OS level probe
intp arguments for the dependent DTrace probe. But in this case the
OS level probe (kprobe on find_bio_block.isra.0) has no defined
arguments (i.e. they are unknown - lack of prototype data) yet the
dependent DTrace probe expects to be able to populate arguments.
I think that the only way to really get around this (at least until
prototype information is made available for the optimized functions,
if ever) is to have two distinct probes, and requiring the user to
specify them explicitly, e.g.
fbt::find_bio_block:entry, fbt::find_bio_block.*:entry
Another option would be to introduce a fbt-opt provider so you can
write:
fbt::find_bio_block:entry, fbt-opt::find_bio_block:entry
or even (although it might conflict with USDT probes):
fbt*::find_bio_block:entry
and the fbt-opt provider can place kprobes on any find_bio_block.*
symbols and they all would get reported as the same
fbt-opt::find_bio_block:entry probe firing.
Thoughts?
On Wed, Oct 16, 2024 at 04:54:07PM +0100, Alan Maguire wrote:
> gcc adds suffixes when it carries out optimizations, but often
> these leave parameters to functions intact. Many of these functions
> (like finish_task_switch()) are important for tracing (and they
> are present in available_filter_functions so are traceable) so it is
> valuable to support probing them. For kprobes, the probe name removes
> the "." suffix, so store it as additional tp event data for kprobe
> attach (attach time is the only time we need the full function name).
>
> fprobe does not support such functions, so the kprobe impl is always
> used in such cases.
>
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
> libdtrace/dt_prov_fbt.c | 76 ++++++++++++++++++++++++++++++++---------
> 1 file changed, 59 insertions(+), 17 deletions(-)
>
> diff --git a/libdtrace/dt_prov_fbt.c b/libdtrace/dt_prov_fbt.c
> index 21f63ddf..06ea78ca 100644
> --- a/libdtrace/dt_prov_fbt.c
> +++ b/libdtrace/dt_prov_fbt.c
> @@ -21,6 +21,7 @@
> #include <assert.h>
> #include <errno.h>
> #include <fcntl.h>
> +#include <port.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> @@ -63,8 +64,8 @@ static const dtrace_pattr_t pattr = {
> */
> static int populate(dtrace_hdl_t *dtp)
> {
> - dt_provider_t *prv;
> - dt_provimpl_t *impl;
> + dt_provider_t *default_prv, *kprobe_prv = NULL;
> + dt_provimpl_t *default_impl;
> FILE *f;
> char *buf = NULL;
> char *p;
> @@ -73,17 +74,25 @@ static int populate(dtrace_hdl_t *dtp)
> dtrace_syminfo_t sip;
> dtrace_probedesc_t pd;
>
> - impl = BPF_HAS(dtp, BPF_FEAT_FENTRY) ? &dt_fbt_fprobe : &dt_fbt_kprobe;
> + default_impl = BPF_HAS(dtp, BPF_FEAT_FENTRY) ? &dt_fbt_fprobe : &dt_fbt_kprobe;
>
> - prv = dt_provider_create(dtp, prvname, impl, &pattr, NULL);
> - if (prv == NULL)
> + default_prv = dt_provider_create(dtp, prvname, default_impl, &pattr, NULL);
> + if (default_prv == NULL)
> return -1; /* errno already set */
> + if (default_impl == &dt_fbt_kprobe)
> + kprobe_prv = default_prv;
>
> f = fopen(PROBE_LIST, "r");
> if (f == NULL)
> return 0;
>
> while (getline(&buf, &n, f) >= 0) {
> + dt_provimpl_t *impl = default_impl;
> + dt_provider_t *prv = default_prv;
> + char fun[DTRACE_FUNCNAMELEN] = {};
> + void *entry_data = NULL;
> + void *return_data = NULL;
> +
> /*
> * Here buf is either "funcname\n" or "funcname [modname]\n".
> * The last line may not have a linefeed.
> @@ -106,10 +115,6 @@ static int populate(dtrace_hdl_t *dtp)
> p++;
> }
>
> - /* Weed out synthetic symbol names (that are invalid). */
> - if (strchr(buf, '.') != NULL)
> - continue;
> -
> #define strstarts(var, x) (strncmp(var, x, strlen (x)) == 0)
> /* Weed out __ftrace_invalid_address___* entries. */
> if (strstarts(buf, "__ftrace_invalid_address__") ||
> @@ -129,6 +134,31 @@ static int populate(dtrace_hdl_t *dtp)
> } else
> mod = p;
>
> + /* '.'-suffixed functions (e.g. foo.isra.0) must always be
> + * kprobe-backed as fprobe does not support them. They
> + * are represented with their unsuffixed names however, so
> + * the full name including suffix is stored as data associated
> + * with the tp event.
> + */
> + strlcpy(fun, buf, sizeof(fun));
> + if (strchr(buf, '.') != NULL) {
> + char *suffix;
> +
> + impl = &dt_fbt_kprobe;
> + if (!kprobe_prv) {
> + kprobe_prv = dt_provider_create(dtp, prvname,
> + impl, &pattr,
> + NULL);
> + if (kprobe_prv == NULL)
> + return -1;
> + }
> + prv = kprobe_prv;
> + suffix = strchr(fun, '.');
> + *suffix = '\0';
> + entry_data = strdup(buf);
> + return_data = strdup(buf);
> + }
> +
> /*
> * Due to the lack of module names in
> * TRACEFS/available_filter_functions, there are some duplicate
> @@ -138,14 +168,16 @@ static int populate(dtrace_hdl_t *dtp)
> pd.id = DTRACE_IDNONE;
> pd.prv = prvname;
> pd.mod = mod;
> - pd.fun = buf;
> + pd.fun = fun;
> pd.prb = "entry";
> if (dt_probe_lookup(dtp, &pd) != NULL)
> continue;
>
> - if (dt_tp_probe_insert(dtp, prv, prvname, mod, buf, "entry"))
> + if (dt_tp_probe_insert_data(dtp, prv, prvname, mod, fun,
> + "entry", entry_data))
> n++;
> - if (dt_tp_probe_insert(dtp, prv, prvname, mod, buf, "return"))
> + if (dt_tp_probe_insert_data(dtp, prv, prvname, mod, fun,
> + "return", return_data))
> n++;
> }
>
> @@ -363,10 +395,11 @@ static int kprobe_trampoline(dt_pcb_t *pcb, uint_t exitlbl)
> static int kprobe_attach(dtrace_hdl_t *dtp, const dt_probe_t *prp, int bpf_fd)
> {
> if (!dt_tp_probe_has_info(prp)) {
> - char *fn;
> - FILE *f;
> - size_t len;
> - int fd, rc = -1;
> + char *fn;
> + const char *fun;
> + FILE *f;
> + size_t len;
> + int fd, rc = -1;
>
> /*
> * Register the kprobe with the tracing subsystem. This will
> @@ -376,9 +409,18 @@ static int kprobe_attach(dtrace_hdl_t *dtp, const dt_probe_t *prp, int bpf_fd)
> if (fd == -1)
> return -ENOENT;
>
> + /* If actual function name contained a '.', it is stored in
> + * tp data; use non-suffix name for event as event names
> + * cannot contain a '.'. User-visible probe name does
> + * not contain a '.'.
> + */
> + fun = dt_tp_get_event_data(prp);
> + if (!fun)
> + fun = prp->desc->fun;
> +
> rc = dprintf(fd, "%c:" FBT_GROUP_FMT "/%s %s\n",
> prp->desc->prb[0] == 'e' ? 'p' : 'r',
> - FBT_GROUP_DATA, prp->desc->fun, prp->desc->fun);
> + FBT_GROUP_DATA, prp->desc->fun, fun);
> close(fd);
> if (rc == -1)
> return -ENOENT;
> --
> 2.43.5
>
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH v3 dtrace 2/4] fbt: support "."-suffixed functions for kprobes
2024-11-28 2:22 ` Kris Van Hees
@ 2024-11-28 9:58 ` Alan Maguire
0 siblings, 0 replies; 12+ messages in thread
From: Alan Maguire @ 2024-11-28 9:58 UTC (permalink / raw)
To: Kris Van Hees; +Cc: dtrace, dtrace-devel
On 28/11/2024 02:22, Kris Van Hees wrote:
> I am still workimg through these patches (and writing some additional
> support patches). But one annoying detail has come up... kprobes do
> not provide argument type data whereas we do have that for fprobes.
> So, if there is a function (like find_bio_block) that can be probed
> with fprobe and there is also a find_bio_block.isra.0 optimized form
> that can be probed with probe, they *cannot* be represented by the
> same DTrace probe because one has known argument data and the other
> does not. And it does not seem safe to assume that the optimized
> form will have the same signature as the regular function.
>
> Even using the dependent probe mechanism won't help here because that
> is based on being able to convert the arguments of the OS level probe
> intp arguments for the dependent DTrace probe. But in this case the
> OS level probe (kprobe on find_bio_block.isra.0) has no defined
> arguments (i.e. they are unknown - lack of prototype data) yet the
> dependent DTrace probe expects to be able to populate arguments.
>
> I think that the only way to really get around this (at least until
> prototype information is made available for the optimized functions,
> if ever) is to have two distinct probes, and requiring the user to
> specify them explicitly, e.g.
>
> fbt::find_bio_block:entry, fbt::find_bio_block.*:entry
>
> Another option would be to introduce a fbt-opt provider so you can
> write:
>
> fbt::find_bio_block:entry, fbt-opt::find_bio_block:entry
>
> or even (although it might conflict with USDT probes):
>
> fbt*::find_bio_block:entry
>
> and the fbt-opt provider can place kprobes on any find_bio_block.*
> symbols and they all would get reported as the same
> fbt-opt::find_bio_block:entry probe firing.
>
> Thoughts?
>
I'm sort of wondering if we go even further and introduce a separate
kprobe provider. As well as not providing argument types, kprobes also
have a feature fentry does not - we can probe at function+offset. So
conceptually they are a separate thing already really, and though we
fall back to kprobes to provide fbt support when fprobe isn't available,
it might be worth having an explicitly kprobe-based provider since we
can offer different functionality with it. If we had a separate provider
(which still hid the "."-suffixes as these are unstable) we could
support the sched tracing case stably (using
kprobe::finish_task_switch:return) while not getting in the way of
fprobe-based fbt, _and_ covering tracing for those cases that
fprobe-based fbt cannot reach.
With this approach nothing needs to change from the fentry-based fprobe
side; no program-dependent logic handling cases where an unsupported by
fprobe fbt probe is used etc.
>
> On Wed, Oct 16, 2024 at 04:54:07PM +0100, Alan Maguire wrote:
>> gcc adds suffixes when it carries out optimizations, but often
>> these leave parameters to functions intact. Many of these functions
>> (like finish_task_switch()) are important for tracing (and they
>> are present in available_filter_functions so are traceable) so it is
>> valuable to support probing them. For kprobes, the probe name removes
>> the "." suffix, so store it as additional tp event data for kprobe
>> attach (attach time is the only time we need the full function name).
>>
>> fprobe does not support such functions, so the kprobe impl is always
>> used in such cases.
>>
>> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
>> ---
>> libdtrace/dt_prov_fbt.c | 76 ++++++++++++++++++++++++++++++++---------
>> 1 file changed, 59 insertions(+), 17 deletions(-)
>>
>> diff --git a/libdtrace/dt_prov_fbt.c b/libdtrace/dt_prov_fbt.c
>> index 21f63ddf..06ea78ca 100644
>> --- a/libdtrace/dt_prov_fbt.c
>> +++ b/libdtrace/dt_prov_fbt.c
>> @@ -21,6 +21,7 @@
>> #include <assert.h>
>> #include <errno.h>
>> #include <fcntl.h>
>> +#include <port.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <string.h>
>> @@ -63,8 +64,8 @@ static const dtrace_pattr_t pattr = {
>> */
>> static int populate(dtrace_hdl_t *dtp)
>> {
>> - dt_provider_t *prv;
>> - dt_provimpl_t *impl;
>> + dt_provider_t *default_prv, *kprobe_prv = NULL;
>> + dt_provimpl_t *default_impl;
>> FILE *f;
>> char *buf = NULL;
>> char *p;
>> @@ -73,17 +74,25 @@ static int populate(dtrace_hdl_t *dtp)
>> dtrace_syminfo_t sip;
>> dtrace_probedesc_t pd;
>>
>> - impl = BPF_HAS(dtp, BPF_FEAT_FENTRY) ? &dt_fbt_fprobe : &dt_fbt_kprobe;
>> + default_impl = BPF_HAS(dtp, BPF_FEAT_FENTRY) ? &dt_fbt_fprobe : &dt_fbt_kprobe;
>>
>> - prv = dt_provider_create(dtp, prvname, impl, &pattr, NULL);
>> - if (prv == NULL)
>> + default_prv = dt_provider_create(dtp, prvname, default_impl, &pattr, NULL);
>> + if (default_prv == NULL)
>> return -1; /* errno already set */
>> + if (default_impl == &dt_fbt_kprobe)
>> + kprobe_prv = default_prv;
>>
>> f = fopen(PROBE_LIST, "r");
>> if (f == NULL)
>> return 0;
>>
>> while (getline(&buf, &n, f) >= 0) {
>> + dt_provimpl_t *impl = default_impl;
>> + dt_provider_t *prv = default_prv;
>> + char fun[DTRACE_FUNCNAMELEN] = {};
>> + void *entry_data = NULL;
>> + void *return_data = NULL;
>> +
>> /*
>> * Here buf is either "funcname\n" or "funcname [modname]\n".
>> * The last line may not have a linefeed.
>> @@ -106,10 +115,6 @@ static int populate(dtrace_hdl_t *dtp)
>> p++;
>> }
>>
>> - /* Weed out synthetic symbol names (that are invalid). */
>> - if (strchr(buf, '.') != NULL)
>> - continue;
>> -
>> #define strstarts(var, x) (strncmp(var, x, strlen (x)) == 0)
>> /* Weed out __ftrace_invalid_address___* entries. */
>> if (strstarts(buf, "__ftrace_invalid_address__") ||
>> @@ -129,6 +134,31 @@ static int populate(dtrace_hdl_t *dtp)
>> } else
>> mod = p;
>>
>> + /* '.'-suffixed functions (e.g. foo.isra.0) must always be
>> + * kprobe-backed as fprobe does not support them. They
>> + * are represented with their unsuffixed names however, so
>> + * the full name including suffix is stored as data associated
>> + * with the tp event.
>> + */
>> + strlcpy(fun, buf, sizeof(fun));
>> + if (strchr(buf, '.') != NULL) {
>> + char *suffix;
>> +
>> + impl = &dt_fbt_kprobe;
>> + if (!kprobe_prv) {
>> + kprobe_prv = dt_provider_create(dtp, prvname,
>> + impl, &pattr,
>> + NULL);
>> + if (kprobe_prv == NULL)
>> + return -1;
>> + }
>> + prv = kprobe_prv;
>> + suffix = strchr(fun, '.');
>> + *suffix = '\0';
>> + entry_data = strdup(buf);
>> + return_data = strdup(buf);
>> + }
>> +
>> /*
>> * Due to the lack of module names in
>> * TRACEFS/available_filter_functions, there are some duplicate
>> @@ -138,14 +168,16 @@ static int populate(dtrace_hdl_t *dtp)
>> pd.id = DTRACE_IDNONE;
>> pd.prv = prvname;
>> pd.mod = mod;
>> - pd.fun = buf;
>> + pd.fun = fun;
>> pd.prb = "entry";
>> if (dt_probe_lookup(dtp, &pd) != NULL)
>> continue;
>>
>> - if (dt_tp_probe_insert(dtp, prv, prvname, mod, buf, "entry"))
>> + if (dt_tp_probe_insert_data(dtp, prv, prvname, mod, fun,
>> + "entry", entry_data))
>> n++;
>> - if (dt_tp_probe_insert(dtp, prv, prvname, mod, buf, "return"))
>> + if (dt_tp_probe_insert_data(dtp, prv, prvname, mod, fun,
>> + "return", return_data))
>> n++;
>> }
>>
>> @@ -363,10 +395,11 @@ static int kprobe_trampoline(dt_pcb_t *pcb, uint_t exitlbl)
>> static int kprobe_attach(dtrace_hdl_t *dtp, const dt_probe_t *prp, int bpf_fd)
>> {
>> if (!dt_tp_probe_has_info(prp)) {
>> - char *fn;
>> - FILE *f;
>> - size_t len;
>> - int fd, rc = -1;
>> + char *fn;
>> + const char *fun;
>> + FILE *f;
>> + size_t len;
>> + int fd, rc = -1;
>>
>> /*
>> * Register the kprobe with the tracing subsystem. This will
>> @@ -376,9 +409,18 @@ static int kprobe_attach(dtrace_hdl_t *dtp, const dt_probe_t *prp, int bpf_fd)
>> if (fd == -1)
>> return -ENOENT;
>>
>> + /* If actual function name contained a '.', it is stored in
>> + * tp data; use non-suffix name for event as event names
>> + * cannot contain a '.'. User-visible probe name does
>> + * not contain a '.'.
>> + */
>> + fun = dt_tp_get_event_data(prp);
>> + if (!fun)
>> + fun = prp->desc->fun;
>> +
>> rc = dprintf(fd, "%c:" FBT_GROUP_FMT "/%s %s\n",
>> prp->desc->prb[0] == 'e' ? 'p' : 'r',
>> - FBT_GROUP_DATA, prp->desc->fun, prp->desc->fun);
>> + FBT_GROUP_DATA, prp->desc->fun, fun);
>> close(fd);
>> if (rc == -1)
>> return -ENOENT;
>> --
>> 2.43.5
>>
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v3 dtrace 3/4] fbt: avoid mix of kprobe, fprobe implementations for used probes
2024-10-16 15:54 [PATCH v3 dtrace 0/4] kprobe support for .isra.0, sched fix Alan Maguire
2024-10-16 15:54 ` [PATCH v3 dtrace 1/4] dt_provider_tp: add optional event data, freed on tp free Alan Maguire
2024-10-16 15:54 ` [PATCH v3 dtrace 2/4] fbt: support "."-suffixed functions for kprobes Alan Maguire
@ 2024-10-16 15:54 ` Alan Maguire
2024-10-16 15:54 ` [PATCH v3 dtrace 4/4] sched: fix on-cpu firing for kernels < 5.16 Alan Maguire
2024-10-18 15:41 ` [PATCH v3 dtrace 0/4] kprobe support for .isra.0, sched fix Kris Van Hees
4 siblings, 0 replies; 12+ messages in thread
From: Alan Maguire @ 2024-10-16 15:54 UTC (permalink / raw)
To: dtrace; +Cc: dtrace-devel, Alan Maguire
If we have a mix of kprobe and fprobe implementations, it causes problems
for DTrace since we cannot guarantee execution in program order.
When fprobes are active, the only place kprobes are used as fbt
implementation is for "."-suffixed functions that cannot be supported
by fprobe. However, by using kprobes for this case, we run the risk of
mixing fprobe and kprobe. So use probe_info callbacks (invoked when
setting context during compilation) to count instances of fbt probe use
for each provider/implementation. If we have a mix of fprobes and kprobes,
revert to using the kprobe implementation for the existing fprobes.
This involves setting provider impl to kprobes, and resetting event
id for any existing fprobes (since it will be set to the BTF id of the
function).
There is no risk in having multiple probes for the same function since
a kprobe for a "."-suffixed function will not be added if the
unsuffixed probe already exists (in practice, such a case is not seen
but is not impossible).
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
libdtrace/dt_prov_fbt.c | 66 +++++++++++++++++++++++++++++++++++++++--
1 file changed, 64 insertions(+), 2 deletions(-)
diff --git a/libdtrace/dt_prov_fbt.c b/libdtrace/dt_prov_fbt.c
index 06ea78ca..259cf674 100644
--- a/libdtrace/dt_prov_fbt.c
+++ b/libdtrace/dt_prov_fbt.c
@@ -76,7 +76,7 @@ static int populate(dtrace_hdl_t *dtp)
default_impl = BPF_HAS(dtp, BPF_FEAT_FENTRY) ? &dt_fbt_fprobe : &dt_fbt_kprobe;
- default_prv = dt_provider_create(dtp, prvname, default_impl, &pattr, NULL);
+ default_prv = dt_provider_create(dtp, prvname, default_impl, &pattr, 0);
if (default_prv == NULL)
return -1; /* errno already set */
if (default_impl == &dt_fbt_kprobe)
@@ -148,7 +148,7 @@ static int populate(dtrace_hdl_t *dtp)
if (!kprobe_prv) {
kprobe_prv = dt_provider_create(dtp, prvname,
impl, &pattr,
- NULL);
+ 0);
if (kprobe_prv == NULL)
return -1;
}
@@ -187,6 +187,56 @@ static int populate(dtrace_hdl_t *dtp)
return n;
}
+static int
+fbt_unset_event_id(dtrace_hdl_t *dtp, dt_probe_t *prp, dt_provider_t *prv)
+{
+ if (prp->prov == prv)
+ dt_tp_set_event_id(prp, 0);
+ return 0;
+}
+
+/* Ensure we do not have a mix of kprobe and fprobe implementations for
+ * _actually used_ probes; fall back to using kprobe implementation for
+ * fprobe provider if so.
+ */
+void fbt_check_impl(struct dtrace_hdl *dtp, dt_provider_t *fbt_prv)
+{
+ dt_htab_next_t *it = NULL;
+ dt_provider_t *pvp;
+ dtrace_probedesc_t fbt_desc = { .prv = "fbt",
+ .mod = "",
+ .fun = "",
+ .prb = ""
+ };
+
+
+ /* count number of used probes for provider + impl */
+ fbt_prv->prv_data++;
+
+ /* check other provider + impl (if any) */
+ while ((pvp = dt_htab_next(dtp->dt_provs, &it)) != NULL) {
+ if (pvp == fbt_prv || strcmp(pvp->desc.dtvd_name, "fbt") != 0)
+ continue;
+ if (pvp->prv_data == 0)
+ return;
+ /* already done? */
+ if (pvp->impl == fbt_prv->impl)
+ return;
+ /* multiple fprobe, kprobe probes; revert to kprobes */
+ if (fbt_prv->impl != &dt_fbt_kprobe)
+ pvp = fbt_prv;
+ break;
+ }
+ if (!pvp)
+ return;
+ dt_dprintf("reverting to kprobe impl for %s due to fprobe/kprobe mix\n",
+ pvp->desc.dtvd_name);
+ pvp->impl = &dt_fbt_kprobe;
+ dt_probe_iter(dtp, &fbt_desc, (dt_probe_f *)fbt_unset_event_id, NULL,
+ pvp);
+
+}
+
/*******************************\
* FPROBE-based implementation *
\*******************************/
@@ -296,6 +346,8 @@ static int fprobe_probe_info(dtrace_hdl_t *dtp, const dt_probe_t *prp,
goto done;
}
+ fbt_check_impl(dtp, prp->prov);
+
argc = dt_btf_func_argc(dtp, dmp->dm_btf, btf_id);
if (argc == 0)
goto done;
@@ -453,6 +505,15 @@ static int kprobe_attach(dtrace_hdl_t *dtp, const dt_probe_t *prp, int bpf_fd)
return dt_tp_probe_attach(dtp, prp, bpf_fd);
}
+static int kprobe_probe_info(dtrace_hdl_t *dtp, const dt_probe_t *prp,
+ int *argcp, dt_argdesc_t **argvp)
+{
+ fbt_check_impl(dtp, prp->prov);
+ *argcp = 0;
+ *argvp = NULL;
+ return 0;
+}
+
/*
* Try to clean up system resources that may have been allocated for this
* probe.
@@ -502,6 +563,7 @@ dt_provimpl_t dt_fbt_kprobe = {
.load_prog = &dt_bpf_prog_load,
.trampoline = &kprobe_trampoline,
.attach = &kprobe_attach,
+ .probe_info = &kprobe_probe_info,
.detach = &kprobe_detach,
.probe_destroy = &dt_tp_probe_destroy,
};
--
2.43.5
^ permalink raw reply related [flat|nested] 12+ messages in thread* [PATCH v3 dtrace 4/4] sched: fix on-cpu firing for kernels < 5.16
2024-10-16 15:54 [PATCH v3 dtrace 0/4] kprobe support for .isra.0, sched fix Alan Maguire
` (2 preceding siblings ...)
2024-10-16 15:54 ` [PATCH v3 dtrace 3/4] fbt: avoid mix of kprobe, fprobe implementations for used probes Alan Maguire
@ 2024-10-16 15:54 ` Alan Maguire
2024-10-18 15:41 ` [PATCH v3 dtrace 0/4] kprobe support for .isra.0, sched fix Kris Van Hees
4 siblings, 0 replies; 12+ messages in thread
From: Alan Maguire @ 2024-10-16 15:54 UTC (permalink / raw)
To: dtrace; +Cc: dtrace-devel, Alan Maguire
The solution for sched:::on-cpu firing (probing
__perf_event_task_sched_in) only works on 5.16 and later as
the relevant function is not in available_filter_functions
in earlier kernels.
Instead use the fbt::finish_task_switch:return probe as this
will cover .isra.0-suffixed optimizations if they are applied
to finish_task_switch().
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
libdtrace/dt_prov_sched.c | 23 ++---------------------
1 file changed, 2 insertions(+), 21 deletions(-)
diff --git a/libdtrace/dt_prov_sched.c b/libdtrace/dt_prov_sched.c
index 4fbd24ab..796f39af 100644
--- a/libdtrace/dt_prov_sched.c
+++ b/libdtrace/dt_prov_sched.c
@@ -28,7 +28,7 @@ static probe_dep_t probes[] = {
{ "off-cpu",
DTRACE_PROBESPEC_NAME, "rawtp:sched::sched_switch" },
{ "on-cpu",
- DTRACE_PROBESPEC_NAME, "fbt::__perf_event_task_sched_in:entry" },
+ DTRACE_PROBESPEC_NAME, "fbt:vmlinux:finish_task_switch:return" },
{ "surrender",
DTRACE_PROBESPEC_NAME, "fbt::do_sched_yield:entry" },
{ "tick",
@@ -157,26 +157,7 @@ static int trampoline(dt_pcb_t *pcb, uint_t exitlbl)
*/
static void enable(dtrace_hdl_t *dtp, dt_probe_t *prp)
{
- struct perf_event_attr attr = {};
- int swfd;
-
- if (strcmp(prp->desc->prb, "on-cpu") != 0)
- return dt_sdt_enable(dtp, prp);
-
- memset(&attr, 0, sizeof(attr));
- attr.size = sizeof(attr);
- attr.type = PERF_TYPE_SOFTWARE;
- attr.config = PERF_COUNT_SW_CONTEXT_SWITCHES;
- attr.freq = 1;
- attr.sample_freq = 1000;
- attr.context_switch = 1;
-
- swfd = dt_perf_event_open(&attr, -1, 0, -1, 0);
- if (swfd < 0)
- dt_dprintf("perf event open failed for context_switch: %d\n", errno);
- else
- prp->prv_data = (void *)(long)swfd;
- dt_sdt_enable(dtp, prp);
+ return dt_sdt_enable(dtp, prp);
}
static void detach(dtrace_hdl_t *dtp, const dt_probe_t *prp)
--
2.43.5
^ permalink raw reply related [flat|nested] 12+ messages in thread* Re: [PATCH v3 dtrace 0/4] kprobe support for .isra.0, sched fix
2024-10-16 15:54 [PATCH v3 dtrace 0/4] kprobe support for .isra.0, sched fix Alan Maguire
` (3 preceding siblings ...)
2024-10-16 15:54 ` [PATCH v3 dtrace 4/4] sched: fix on-cpu firing for kernels < 5.16 Alan Maguire
@ 2024-10-18 15:41 ` Kris Van Hees
2024-10-18 16:15 ` Alan Maguire
4 siblings, 1 reply; 12+ messages in thread
From: Kris Van Hees @ 2024-10-18 15:41 UTC (permalink / raw)
To: Alan Maguire; +Cc: dtrace, dtrace-devel
Before I review further, I have a question... Do we need to consider the
<func>.<suffix> symbols as separate probes from <func> (at a user level),
or can we group them together? I am hoping that grouping them together
would be the preference if only because the suffix versions result from
compiler optimizations and it is therefore likely that a user would want to
be able to probe <func> and expect it to work even if the compiler decided
to do something under the covers that results in a suffix-variant to also
be created.
Kris
On Wed, Oct 16, 2024 at 04:54:05PM +0100, Alan Maguire wrote:
> This series is focused on solving a few issues with fprobe-based
> attachment which prevent us being able to attach to functions
> like finish_task_switch.isra.0. Such functions are present in
> available_filter_functions, and represent real function boundaries
> (since they correspond to the mcount function boundary sites)
> but because they either lack BTF representations, or because
> those BTF representations are named without the .isra suffix, attach
> via fentry/fexit is currently impossible. Falling back to the
> kprobe implementation is the best solution here.
>
> However, for stability, it is best to represent the probes for
> these functions without the ".isra" suffix, so we need to store
> the full function name (with suffix) in the tracepoint data when
> the probe is populated. Patch 1 supports this.
>
> Patch 2 ensures that we use kprobe implementation for any "."-suffixed
> functions. An additional fbt provider with kprobe implementation is
> created to support this (so as not to disturb existing fprobes for other
> functions). At kprobe attach we use the full function name stored
> as tp event data to carry out attach.
>
> Next we need to ensure we do not end up with a mix of kprobes and
> fprobes. Ideally we would do this in a more fine-grained manner, but
> for now just ensure we do not have an fprobe/kprobe mix program-wide.
> When fprobes are active, we will only use kprobes for "."-suffixed
> functions that are used, so in practice such mixes will be relatively
> rare.
>
> As Kris pointed out [1] at compilation time, trampolines have not yet been
> set up, so we can replace the provider underlying fbt at that time.
> The probe_info() callbacks are used to check for a mix of kprobe and
> fprobe implementations; we check for multiple fbt providers which
> have a count of used probes > 0; if this occurs, switch the fbt provider
> using fprobe to use the kprobe implementation and reset any event
> ids associated with fprobes from the BTF id used in fprobes to 0.
>
> Finally we can then use fbt::finish_task_switch:return as the
> dependent probe for sched:::on-cpu, as we now can probe it even
> if it becomes finish_task_switch.isra.0.
>
> So to recap:
>
> Patch 1 supports storing/freeing event data with tp events.
> Patch 2 allows tracing of "."-suffixed functions like
> finish_task_switch.isra.0 via a kprobe-backed fbt implementation.
> Patch 3 ensures we do not end up with a kprobe/fprobe mix.
> Patch 4 then uses the fact we can now trace "."-suffixed functions
> (with kprobe fallback) by using fbt:vmlinux:finish_task_switch:return
> as the kprobe dependent event for sched:::on-cpu . This function is
> often optimized to become finish_task_switch.isra.0.
>
> Tested on upstream, 5.15 and 5.4 kernels.
>
> Changes since v2:
>
> - probe function name exposed drops the suffix (Kris, patches 1, 2)
> - restrict kprobe use to "."-suffixed functions; this makes their use
> less likely in the fprobe environment. Do this instead of creating
> a "fake" fprobe probe with kprobe backing (Kris, patch 2)
> - modify fallback logic to handle kprobe/fprobe mix (patch 3)
> - modify sched:::on-cpu to use fbt::finish_task_switch:return ; no
> wildcard needed now that probe function name is unsuffixed.
>
> Changes since v1:
>
> - simplified approach by just swapping out probe impl when BTF lookup fails
> (Kris, patch 2)
>
> [1] https://lore.kernel.org/dtrace/20241009140236.883884-1-alan.maguire@oracle.com/
>
> Alan Maguire (4):
> dt_provider_tp: add optional event data, freed on tp free
> fbt: support "."-suffixed functions for kprobes
> fbt: avoid mix of kprobe, fprobe implementations for used probes
> sched: fix on-cpu firing for kernels < 5.16
>
> libdtrace/dt_prov_fbt.c | 138 ++++++++++++++++++++++++++++++++-----
> libdtrace/dt_prov_sched.c | 23 +------
> libdtrace/dt_provider_tp.c | 27 ++++++++
> libdtrace/dt_provider_tp.h | 8 +++
> 4 files changed, 158 insertions(+), 38 deletions(-)
>
> --
> 2.43.5
>
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH v3 dtrace 0/4] kprobe support for .isra.0, sched fix
2024-10-18 15:41 ` [PATCH v3 dtrace 0/4] kprobe support for .isra.0, sched fix Kris Van Hees
@ 2024-10-18 16:15 ` Alan Maguire
2024-10-18 18:38 ` Kris Van Hees
0 siblings, 1 reply; 12+ messages in thread
From: Alan Maguire @ 2024-10-18 16:15 UTC (permalink / raw)
To: Kris Van Hees; +Cc: dtrace, dtrace-devel
On 18/10/2024 16:41, Kris Van Hees wrote:
> Before I review further, I have a question... Do we need to consider the
> <func>.<suffix> symbols as separate probes from <func> (at a user level),
> or can we group them together? I am hoping that grouping them together
> would be the preference if only because the suffix versions result from
> compiler optimizations and it is therefore likely that a user would want to
> be able to probe <func> and expect it to work even if the compiler decided
> to do something under the covers that results in a suffix-variant to also
> be created.
>
Right, that's one of the changes in this patch series versus v2. We
expose the probe as <func>, even if it the underlying function has
become <func>.<suffix>. This provides more stability to the user and we
are guaranteed it still represents a function boundary because of its
presence in available_filter_functions (this implies it has an
associated mcount-tagged fentry site).
Alan
> Kris
>
> On Wed, Oct 16, 2024 at 04:54:05PM +0100, Alan Maguire wrote:
>> This series is focused on solving a few issues with fprobe-based
>> attachment which prevent us being able to attach to functions
>> like finish_task_switch.isra.0. Such functions are present in
>> available_filter_functions, and represent real function boundaries
>> (since they correspond to the mcount function boundary sites)
>> but because they either lack BTF representations, or because
>> those BTF representations are named without the .isra suffix, attach
>> via fentry/fexit is currently impossible. Falling back to the
>> kprobe implementation is the best solution here.
>>
>> However, for stability, it is best to represent the probes for
>> these functions without the ".isra" suffix, so we need to store
>> the full function name (with suffix) in the tracepoint data when
>> the probe is populated. Patch 1 supports this.
>>
>> Patch 2 ensures that we use kprobe implementation for any "."-suffixed
>> functions. An additional fbt provider with kprobe implementation is
>> created to support this (so as not to disturb existing fprobes for other
>> functions). At kprobe attach we use the full function name stored
>> as tp event data to carry out attach.
>>
>> Next we need to ensure we do not end up with a mix of kprobes and
>> fprobes. Ideally we would do this in a more fine-grained manner, but
>> for now just ensure we do not have an fprobe/kprobe mix program-wide.
>> When fprobes are active, we will only use kprobes for "."-suffixed
>> functions that are used, so in practice such mixes will be relatively
>> rare.
>>
>> As Kris pointed out [1] at compilation time, trampolines have not yet been
>> set up, so we can replace the provider underlying fbt at that time.
>> The probe_info() callbacks are used to check for a mix of kprobe and
>> fprobe implementations; we check for multiple fbt providers which
>> have a count of used probes > 0; if this occurs, switch the fbt provider
>> using fprobe to use the kprobe implementation and reset any event
>> ids associated with fprobes from the BTF id used in fprobes to 0.
>>
>> Finally we can then use fbt::finish_task_switch:return as the
>> dependent probe for sched:::on-cpu, as we now can probe it even
>> if it becomes finish_task_switch.isra.0.
>>
>> So to recap:
>>
>> Patch 1 supports storing/freeing event data with tp events.
>> Patch 2 allows tracing of "."-suffixed functions like
>> finish_task_switch.isra.0 via a kprobe-backed fbt implementation.
>> Patch 3 ensures we do not end up with a kprobe/fprobe mix.
>> Patch 4 then uses the fact we can now trace "."-suffixed functions
>> (with kprobe fallback) by using fbt:vmlinux:finish_task_switch:return
>> as the kprobe dependent event for sched:::on-cpu . This function is
>> often optimized to become finish_task_switch.isra.0.
>>
>> Tested on upstream, 5.15 and 5.4 kernels.
>>
>> Changes since v2:
>>
>> - probe function name exposed drops the suffix (Kris, patches 1, 2)
>> - restrict kprobe use to "."-suffixed functions; this makes their use
>> less likely in the fprobe environment. Do this instead of creating
>> a "fake" fprobe probe with kprobe backing (Kris, patch 2)
>> - modify fallback logic to handle kprobe/fprobe mix (patch 3)
>> - modify sched:::on-cpu to use fbt::finish_task_switch:return ; no
>> wildcard needed now that probe function name is unsuffixed.
>>
>> Changes since v1:
>>
>> - simplified approach by just swapping out probe impl when BTF lookup fails
>> (Kris, patch 2)
>>
>> [1] https://lore.kernel.org/dtrace/20241009140236.883884-1-alan.maguire@oracle.com/
>>
>> Alan Maguire (4):
>> dt_provider_tp: add optional event data, freed on tp free
>> fbt: support "."-suffixed functions for kprobes
>> fbt: avoid mix of kprobe, fprobe implementations for used probes
>> sched: fix on-cpu firing for kernels < 5.16
>>
>> libdtrace/dt_prov_fbt.c | 138 ++++++++++++++++++++++++++++++++-----
>> libdtrace/dt_prov_sched.c | 23 +------
>> libdtrace/dt_provider_tp.c | 27 ++++++++
>> libdtrace/dt_provider_tp.h | 8 +++
>> 4 files changed, 158 insertions(+), 38 deletions(-)
>>
>> --
>> 2.43.5
>>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3 dtrace 0/4] kprobe support for .isra.0, sched fix
2024-10-18 16:15 ` Alan Maguire
@ 2024-10-18 18:38 ` Kris Van Hees
0 siblings, 0 replies; 12+ messages in thread
From: Kris Van Hees @ 2024-10-18 18:38 UTC (permalink / raw)
To: Alan Maguire; +Cc: Kris Van Hees, dtrace, dtrace-devel
On Fri, Oct 18, 2024 at 05:15:59PM +0100, Alan Maguire wrote:
> On 18/10/2024 16:41, Kris Van Hees wrote:
> > Before I review further, I have a question... Do we need to consider the
> > <func>.<suffix> symbols as separate probes from <func> (at a user level),
> > or can we group them together? I am hoping that grouping them together
> > would be the preference if only because the suffix versions result from
> > compiler optimizations and it is therefore likely that a user would want to
> > be able to probe <func> and expect it to work even if the compiler decided
> > to do something under the covers that results in a suffix-variant to also
> > be created.
> >
>
> Right, that's one of the changes in this patch series versus v2. We
> expose the probe as <func>, even if it the underlying function has
> become <func>.<suffix>. This provides more stability to the user and we
> are guaranteed it still represents a function boundary because of its
> presence in available_filter_functions (this implies it has an
> associated mcount-tagged fentry site).
Good - then I am on the right track. Continuing review - focusing mostly on
whether we currently properly handle multiple FBT probes with the same fully
qualified probe name.
> > On Wed, Oct 16, 2024 at 04:54:05PM +0100, Alan Maguire wrote:
> >> This series is focused on solving a few issues with fprobe-based
> >> attachment which prevent us being able to attach to functions
> >> like finish_task_switch.isra.0. Such functions are present in
> >> available_filter_functions, and represent real function boundaries
> >> (since they correspond to the mcount function boundary sites)
> >> but because they either lack BTF representations, or because
> >> those BTF representations are named without the .isra suffix, attach
> >> via fentry/fexit is currently impossible. Falling back to the
> >> kprobe implementation is the best solution here.
> >>
> >> However, for stability, it is best to represent the probes for
> >> these functions without the ".isra" suffix, so we need to store
> >> the full function name (with suffix) in the tracepoint data when
> >> the probe is populated. Patch 1 supports this.
> >>
> >> Patch 2 ensures that we use kprobe implementation for any "."-suffixed
> >> functions. An additional fbt provider with kprobe implementation is
> >> created to support this (so as not to disturb existing fprobes for other
> >> functions). At kprobe attach we use the full function name stored
> >> as tp event data to carry out attach.
> >>
> >> Next we need to ensure we do not end up with a mix of kprobes and
> >> fprobes. Ideally we would do this in a more fine-grained manner, but
> >> for now just ensure we do not have an fprobe/kprobe mix program-wide.
> >> When fprobes are active, we will only use kprobes for "."-suffixed
> >> functions that are used, so in practice such mixes will be relatively
> >> rare.
> >>
> >> As Kris pointed out [1] at compilation time, trampolines have not yet been
> >> set up, so we can replace the provider underlying fbt at that time.
> >> The probe_info() callbacks are used to check for a mix of kprobe and
> >> fprobe implementations; we check for multiple fbt providers which
> >> have a count of used probes > 0; if this occurs, switch the fbt provider
> >> using fprobe to use the kprobe implementation and reset any event
> >> ids associated with fprobes from the BTF id used in fprobes to 0.
> >>
> >> Finally we can then use fbt::finish_task_switch:return as the
> >> dependent probe for sched:::on-cpu, as we now can probe it even
> >> if it becomes finish_task_switch.isra.0.
> >>
> >> So to recap:
> >>
> >> Patch 1 supports storing/freeing event data with tp events.
> >> Patch 2 allows tracing of "."-suffixed functions like
> >> finish_task_switch.isra.0 via a kprobe-backed fbt implementation.
> >> Patch 3 ensures we do not end up with a kprobe/fprobe mix.
> >> Patch 4 then uses the fact we can now trace "."-suffixed functions
> >> (with kprobe fallback) by using fbt:vmlinux:finish_task_switch:return
> >> as the kprobe dependent event for sched:::on-cpu . This function is
> >> often optimized to become finish_task_switch.isra.0.
> >>
> >> Tested on upstream, 5.15 and 5.4 kernels.
> >>
> >> Changes since v2:
> >>
> >> - probe function name exposed drops the suffix (Kris, patches 1, 2)
> >> - restrict kprobe use to "."-suffixed functions; this makes their use
> >> less likely in the fprobe environment. Do this instead of creating
> >> a "fake" fprobe probe with kprobe backing (Kris, patch 2)
> >> - modify fallback logic to handle kprobe/fprobe mix (patch 3)
> >> - modify sched:::on-cpu to use fbt::finish_task_switch:return ; no
> >> wildcard needed now that probe function name is unsuffixed.
> >>
> >> Changes since v1:
> >>
> >> - simplified approach by just swapping out probe impl when BTF lookup fails
> >> (Kris, patch 2)
> >>
> >> [1] https://lore.kernel.org/dtrace/20241009140236.883884-1-alan.maguire@oracle.com/
> >>
> >> Alan Maguire (4):
> >> dt_provider_tp: add optional event data, freed on tp free
> >> fbt: support "."-suffixed functions for kprobes
> >> fbt: avoid mix of kprobe, fprobe implementations for used probes
> >> sched: fix on-cpu firing for kernels < 5.16
> >>
> >> libdtrace/dt_prov_fbt.c | 138 ++++++++++++++++++++++++++++++++-----
> >> libdtrace/dt_prov_sched.c | 23 +------
> >> libdtrace/dt_provider_tp.c | 27 ++++++++
> >> libdtrace/dt_provider_tp.h | 8 +++
> >> 4 files changed, 158 insertions(+), 38 deletions(-)
> >>
> >> --
> >> 2.43.5
> >>
^ permalink raw reply [flat|nested] 12+ messages in thread