public inbox for dtrace@lists.linux.dev
 help / color / mirror / Atom feed
* [PATCH v3 dtrace 0/4] kprobe support for .isra.0, sched fix
@ 2024-10-16 15:54 Alan Maguire
  2024-10-16 15:54 ` [PATCH v3 dtrace 1/4] dt_provider_tp: add optional event data, freed on tp free Alan Maguire
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Alan Maguire @ 2024-10-16 15:54 UTC (permalink / raw)
  To: dtrace; +Cc: dtrace-devel, Alan Maguire

This series is focused on solving a few issues with fprobe-based
attachment which prevent us being able to attach to functions
like finish_task_switch.isra.0.  Such functions are present in
available_filter_functions, and represent real function boundaries
(since they correspond to the mcount function boundary sites)
but because they either lack BTF representations, or because
those BTF representations are named without the .isra suffix, attach
via fentry/fexit is currently impossible.  Falling back to the
kprobe implementation is the best solution here.

However, for stability, it is best to represent the probes for
these functions without the ".isra" suffix, so we need to store
the full function name (with suffix) in the tracepoint data when
the probe is populated.  Patch 1 supports this.

Patch 2 ensures that we use kprobe implementation for any "."-suffixed
functions.  An additional fbt provider with kprobe implementation is
created to support this (so as not to  disturb existing fprobes for other
functions).  At kprobe attach we use the full function name stored
as tp event data to carry out attach.

Next we need to ensure we do not end up with a mix of kprobes and
fprobes.  Ideally we would do this in a more fine-grained manner, but
for now just ensure we do not have an fprobe/kprobe mix program-wide.
When fprobes are active, we will only use kprobes for "."-suffixed
functions that are used, so in practice such mixes will be relatively
rare.

As Kris pointed out [1] at compilation time, trampolines have not yet been
set up, so we can replace the provider underlying fbt at that time.
The probe_info() callbacks are used to check for a mix of kprobe and
fprobe implementations; we check for multiple fbt providers which
have a count of used probes > 0; if this occurs, switch the fbt provider
using fprobe to use the kprobe implementation and reset any event
ids associated with fprobes from the BTF id used in fprobes to 0.

Finally we can then use fbt::finish_task_switch:return as the
dependent probe for sched:::on-cpu, as we now can probe it even
if it becomes finish_task_switch.isra.0.

So to recap:

Patch 1 supports storing/freeing event data with tp events.
Patch 2 allows tracing of "."-suffixed functions like
finish_task_switch.isra.0 via a kprobe-backed fbt implementation.
Patch 3 ensures we do not end up with a kprobe/fprobe mix.
Patch 4 then uses the fact we can now trace "."-suffixed functions
(with kprobe fallback) by using fbt:vmlinux:finish_task_switch:return
as the kprobe dependent event for sched:::on-cpu  . This function is
often optimized to become finish_task_switch.isra.0.

Tested on upstream, 5.15 and 5.4 kernels.

Changes since v2:

- probe function name exposed drops the suffix (Kris, patches 1, 2)
- restrict kprobe use to "."-suffixed functions; this makes their use
  less likely in the fprobe environment.  Do this instead of creating
  a "fake" fprobe probe with kprobe backing (Kris, patch 2)
- modify fallback logic to handle kprobe/fprobe mix (patch 3)
- modify sched:::on-cpu to use fbt::finish_task_switch:return ; no
  wildcard needed now that probe function name is unsuffixed.

Changes since v1:

- simplified approach by just swapping out probe impl when BTF lookup fails
  (Kris, patch 2)

[1] https://lore.kernel.org/dtrace/20241009140236.883884-1-alan.maguire@oracle.com/

Alan Maguire (4):
  dt_provider_tp: add optional event data, freed on tp free
  fbt: support "."-suffixed functions for kprobes
  fbt: avoid mix of kprobe, fprobe implementations for used probes
  sched: fix on-cpu firing for kernels < 5.16

 libdtrace/dt_prov_fbt.c    | 138 ++++++++++++++++++++++++++++++++-----
 libdtrace/dt_prov_sched.c  |  23 +------
 libdtrace/dt_provider_tp.c |  27 ++++++++
 libdtrace/dt_provider_tp.h |   8 +++
 4 files changed, 158 insertions(+), 38 deletions(-)

-- 
2.43.5


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v3 dtrace 1/4] dt_provider_tp: add optional event data, freed on tp free
  2024-10-16 15:54 [PATCH v3 dtrace 0/4] kprobe support for .isra.0, sched fix Alan Maguire
@ 2024-10-16 15:54 ` Alan Maguire
  2024-11-14  5:26   ` [DTrace-devel] " Kris Van Hees
  2024-10-16 15:54 ` [PATCH v3 dtrace 2/4] fbt: support "."-suffixed functions for kprobes Alan Maguire
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Alan Maguire @ 2024-10-16 15:54 UTC (permalink / raw)
  To: dtrace; +Cc: dtrace-devel, Alan Maguire

It will be used to store suffixed function name for kprobe attach to
"foo.isra.0"-style functions.  Such functions use the unsuffixed form
as probe name, so we need to store the full function name to support
attach.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 libdtrace/dt_provider_tp.c | 27 +++++++++++++++++++++++++++
 libdtrace/dt_provider_tp.h |  8 ++++++++
 2 files changed, 35 insertions(+)

diff --git a/libdtrace/dt_provider_tp.c b/libdtrace/dt_provider_tp.c
index 50df2328..fc3c216a 100644
--- a/libdtrace/dt_provider_tp.c
+++ b/libdtrace/dt_provider_tp.c
@@ -33,6 +33,7 @@
 struct tp_probe {
 	uint32_t	event_id;	/* tracepoint event id or BTF id */
 	int		event_fd;	/* tracepoint perf event fd */
+	void		*event_data;	/* optional data */
 };
 
 /*
@@ -313,6 +314,8 @@ dt_tp_detach(dtrace_hdl_t *dtp, tp_probe_t *tpp)
 void
 dt_tp_destroy(dtrace_hdl_t *dtp, tp_probe_t *tpp)
 {
+	if (tpp && tpp->event_data)
+		free(tpp->event_data);
 	dt_free(dtp, tpp);
 }
 
@@ -334,6 +337,30 @@ dt_tp_probe_insert(dtrace_hdl_t *dtp, dt_provider_t *prov, const char *prv,
 	return dt_probe_insert(dtp, prov, prv, mod, fun, prb, tpp);
 }
 
+dt_probe_t *
+dt_tp_probe_insert_data(dtrace_hdl_t *dtp, dt_provider_t *prov, const char *prv,
+			const char *mod, const char *fun, const char *prb,
+			void *data)
+{
+	tp_probe_t	*tpp;
+
+	dt_probe_t *prp = dt_tp_probe_insert(dtp, prov, prv, mod, fun, prb);
+
+	if (prp) {
+		tpp = prp->prv_data;
+		tpp->event_data = data;
+	}
+	return prp;
+}
+
+void *
+dt_tp_get_event_data(const dt_probe_t *prp)
+{
+	tp_probe_t	*tpp = prp->prv_data;
+
+	return tpp->event_data;
+}
+
 uint32_t
 dt_tp_get_event_id(const dt_probe_t *prp)
 {
diff --git a/libdtrace/dt_provider_tp.h b/libdtrace/dt_provider_tp.h
index e7d52876..66a9bd80 100644
--- a/libdtrace/dt_provider_tp.h
+++ b/libdtrace/dt_provider_tp.h
@@ -41,6 +41,14 @@ extern struct dt_probe *dt_tp_probe_insert(dtrace_hdl_t *dtp,
 					   dt_provider_t *prov,
 					   const char *prv, const char *mod,
 					   const char *fun, const char *prb);
+extern struct dt_probe *dt_tp_probe_insert_data(dtrace_hdl_t *dtp,
+						dt_provider_t *prov,
+						const char *prv,
+						const char *mod,
+						const char *fun,
+						const char *prb,
+						void *data);
+extern void *dt_tp_get_event_data(const struct dt_probe *prp);
 extern uint32_t dt_tp_get_event_id(const struct dt_probe *prp);
 extern void dt_tp_set_event_id(const struct dt_probe *prp, uint32_t id);
 extern int dt_tp_probe_info(dtrace_hdl_t *dtp, FILE *f, int skip,
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v3 dtrace 2/4] fbt: support "."-suffixed functions for kprobes
  2024-10-16 15:54 [PATCH v3 dtrace 0/4] kprobe support for .isra.0, sched fix Alan Maguire
  2024-10-16 15:54 ` [PATCH v3 dtrace 1/4] dt_provider_tp: add optional event data, freed on tp free Alan Maguire
@ 2024-10-16 15:54 ` Alan Maguire
  2024-11-28  2:22   ` Kris Van Hees
  2024-10-16 15:54 ` [PATCH v3 dtrace 3/4] fbt: avoid mix of kprobe, fprobe implementations for used probes Alan Maguire
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Alan Maguire @ 2024-10-16 15:54 UTC (permalink / raw)
  To: dtrace; +Cc: dtrace-devel, Alan Maguire

gcc adds suffixes when it carries out optimizations, but often
these leave parameters to functions intact.  Many of these functions
(like finish_task_switch()) are important for tracing (and they
are present in available_filter_functions so are traceable) so it is
valuable to support probing them.  For kprobes, the probe name removes
the "." suffix, so store it as additional tp event data for kprobe
attach (attach time is the only time we need the full function name).

fprobe does not support such functions, so the kprobe impl is always
used in such cases.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 libdtrace/dt_prov_fbt.c | 76 ++++++++++++++++++++++++++++++++---------
 1 file changed, 59 insertions(+), 17 deletions(-)

diff --git a/libdtrace/dt_prov_fbt.c b/libdtrace/dt_prov_fbt.c
index 21f63ddf..06ea78ca 100644
--- a/libdtrace/dt_prov_fbt.c
+++ b/libdtrace/dt_prov_fbt.c
@@ -21,6 +21,7 @@
 #include <assert.h>
 #include <errno.h>
 #include <fcntl.h>
+#include <port.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
@@ -63,8 +64,8 @@ static const dtrace_pattr_t	pattr = {
  */
 static int populate(dtrace_hdl_t *dtp)
 {
-	dt_provider_t		*prv;
-	dt_provimpl_t		*impl;
+	dt_provider_t		*default_prv, *kprobe_prv = NULL;
+	dt_provimpl_t		*default_impl;
 	FILE			*f;
 	char			*buf = NULL;
 	char			*p;
@@ -73,17 +74,25 @@ static int populate(dtrace_hdl_t *dtp)
 	dtrace_syminfo_t	sip;
 	dtrace_probedesc_t	pd;
 
-	impl = BPF_HAS(dtp, BPF_FEAT_FENTRY) ? &dt_fbt_fprobe : &dt_fbt_kprobe;
+	default_impl = BPF_HAS(dtp, BPF_FEAT_FENTRY) ? &dt_fbt_fprobe : &dt_fbt_kprobe;
 
-	prv = dt_provider_create(dtp, prvname, impl, &pattr, NULL);
-	if (prv == NULL)
+	default_prv = dt_provider_create(dtp, prvname, default_impl, &pattr, NULL);
+	if (default_prv == NULL)
 		return -1;			/* errno already set */
+	if (default_impl == &dt_fbt_kprobe)
+		kprobe_prv = default_prv;
 
 	f = fopen(PROBE_LIST, "r");
 	if (f == NULL)
 		return 0;
 
 	while (getline(&buf, &n, f) >= 0) {
+		dt_provimpl_t	*impl = default_impl;
+		dt_provider_t	*prv = default_prv;
+		char		fun[DTRACE_FUNCNAMELEN] = {};
+		void		*entry_data = NULL;
+		void		*return_data = NULL;
+
 		/*
 		 * Here buf is either "funcname\n" or "funcname [modname]\n".
 		 * The last line may not have a linefeed.
@@ -106,10 +115,6 @@ static int populate(dtrace_hdl_t *dtp)
 				p++;
 		}
 
-		/* Weed out synthetic symbol names (that are invalid). */
-		if (strchr(buf, '.') != NULL)
-			continue;
-
 #define strstarts(var, x) (strncmp(var, x, strlen (x)) == 0)
 		/* Weed out __ftrace_invalid_address___* entries. */
 		if (strstarts(buf, "__ftrace_invalid_address__") ||
@@ -129,6 +134,31 @@ static int populate(dtrace_hdl_t *dtp)
 		} else
 			mod = p;
 
+		/* '.'-suffixed functions (e.g. foo.isra.0) must always be
+		 * kprobe-backed as fprobe does not support them.  They
+		 * are represented with their unsuffixed names however, so
+		 * the full name including suffix is stored as data associated
+		 * with the tp event.
+		 */
+		strlcpy(fun, buf, sizeof(fun));
+		if (strchr(buf, '.') != NULL) {
+			char *suffix;
+
+			impl = &dt_fbt_kprobe;
+			if (!kprobe_prv) {
+				kprobe_prv = dt_provider_create(dtp, prvname,
+								impl, &pattr,
+								NULL);
+				if (kprobe_prv == NULL)
+					return -1;
+			}
+			prv = kprobe_prv;
+			suffix = strchr(fun, '.');
+			*suffix = '\0';
+			entry_data = strdup(buf);
+			return_data = strdup(buf);
+		}
+
 		/*
 		 * Due to the lack of module names in
 		 * TRACEFS/available_filter_functions, there are some duplicate
@@ -138,14 +168,16 @@ static int populate(dtrace_hdl_t *dtp)
 		pd.id = DTRACE_IDNONE;
 		pd.prv = prvname;
 		pd.mod = mod;
-		pd.fun = buf;
+		pd.fun = fun;
 		pd.prb = "entry";
 		if (dt_probe_lookup(dtp, &pd) != NULL)
 			continue;
 
-		if (dt_tp_probe_insert(dtp, prv, prvname, mod, buf, "entry"))
+		if (dt_tp_probe_insert_data(dtp, prv, prvname, mod, fun,
+					    "entry", entry_data))
 			n++;
-		if (dt_tp_probe_insert(dtp, prv, prvname, mod, buf, "return"))
+		if (dt_tp_probe_insert_data(dtp, prv, prvname, mod, fun,
+					    "return", return_data))
 			n++;
 	}
 
@@ -363,10 +395,11 @@ static int kprobe_trampoline(dt_pcb_t *pcb, uint_t exitlbl)
 static int kprobe_attach(dtrace_hdl_t *dtp, const dt_probe_t *prp, int bpf_fd)
 {
 	if (!dt_tp_probe_has_info(prp)) {
-		char	*fn;
-		FILE	*f;
-		size_t	len;
-		int	fd, rc = -1;
+		char		*fn;
+		const char	*fun;
+		FILE		*f;
+		size_t		len;
+		int		fd, rc = -1;
 
 		/*
 		 * Register the kprobe with the tracing subsystem.  This will
@@ -376,9 +409,18 @@ static int kprobe_attach(dtrace_hdl_t *dtp, const dt_probe_t *prp, int bpf_fd)
 		if (fd == -1)
 			return -ENOENT;
 
+		/* If actual function name contained a '.', it is stored in
+		 * tp data; use non-suffix name for event as event names
+		 * cannot contain a '.'.  User-visible probe name does
+		 * not contain a '.'.
+		 */
+		fun = dt_tp_get_event_data(prp);
+		if (!fun)
+			fun = prp->desc->fun;
+
 		rc = dprintf(fd, "%c:" FBT_GROUP_FMT "/%s %s\n",
 			     prp->desc->prb[0] == 'e' ? 'p' : 'r',
-			     FBT_GROUP_DATA, prp->desc->fun, prp->desc->fun);
+			     FBT_GROUP_DATA, prp->desc->fun, fun);
 		close(fd);
 		if (rc == -1)
 			return -ENOENT;
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v3 dtrace 3/4] fbt: avoid mix of kprobe, fprobe implementations for used probes
  2024-10-16 15:54 [PATCH v3 dtrace 0/4] kprobe support for .isra.0, sched fix Alan Maguire
  2024-10-16 15:54 ` [PATCH v3 dtrace 1/4] dt_provider_tp: add optional event data, freed on tp free Alan Maguire
  2024-10-16 15:54 ` [PATCH v3 dtrace 2/4] fbt: support "."-suffixed functions for kprobes Alan Maguire
@ 2024-10-16 15:54 ` Alan Maguire
  2024-10-16 15:54 ` [PATCH v3 dtrace 4/4] sched: fix on-cpu firing for kernels < 5.16 Alan Maguire
  2024-10-18 15:41 ` [PATCH v3 dtrace 0/4] kprobe support for .isra.0, sched fix Kris Van Hees
  4 siblings, 0 replies; 12+ messages in thread
From: Alan Maguire @ 2024-10-16 15:54 UTC (permalink / raw)
  To: dtrace; +Cc: dtrace-devel, Alan Maguire

If we have a mix of kprobe and fprobe implementations, it causes problems
for DTrace since we cannot guarantee execution in program order.

When fprobes are active, the only place kprobes are used as fbt
implementation is for "."-suffixed functions that cannot be supported
by fprobe.  However, by using kprobes for this case, we run the risk of
mixing fprobe and kprobe.  So use probe_info callbacks (invoked when
setting context during compilation) to count instances of fbt probe use
for each provider/implementation.  If we have a mix of fprobes and kprobes,
revert to using the kprobe implementation for the existing fprobes.
This involves setting provider impl to kprobes, and resetting event
id for any existing fprobes (since it will be set to the BTF id of the
function).

There is no risk in having multiple probes for the same function since
a kprobe for a "."-suffixed function will not be added if the
unsuffixed probe already exists (in practice, such a case is not seen
but is not impossible).

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 libdtrace/dt_prov_fbt.c | 66 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 64 insertions(+), 2 deletions(-)

diff --git a/libdtrace/dt_prov_fbt.c b/libdtrace/dt_prov_fbt.c
index 06ea78ca..259cf674 100644
--- a/libdtrace/dt_prov_fbt.c
+++ b/libdtrace/dt_prov_fbt.c
@@ -76,7 +76,7 @@ static int populate(dtrace_hdl_t *dtp)
 
 	default_impl = BPF_HAS(dtp, BPF_FEAT_FENTRY) ? &dt_fbt_fprobe : &dt_fbt_kprobe;
 
-	default_prv = dt_provider_create(dtp, prvname, default_impl, &pattr, NULL);
+	default_prv = dt_provider_create(dtp, prvname, default_impl, &pattr, 0);
 	if (default_prv == NULL)
 		return -1;			/* errno already set */
 	if (default_impl == &dt_fbt_kprobe)
@@ -148,7 +148,7 @@ static int populate(dtrace_hdl_t *dtp)
 			if (!kprobe_prv) {
 				kprobe_prv = dt_provider_create(dtp, prvname,
 								impl, &pattr,
-								NULL);
+								0);
 				if (kprobe_prv == NULL)
 					return -1;
 			}
@@ -187,6 +187,56 @@ static int populate(dtrace_hdl_t *dtp)
 	return n;
 }
 
+static int
+fbt_unset_event_id(dtrace_hdl_t *dtp, dt_probe_t *prp, dt_provider_t *prv)
+{
+	if (prp->prov == prv)
+		dt_tp_set_event_id(prp, 0);
+	return 0;
+}
+
+/* Ensure we do not have a mix of kprobe and fprobe implementations for
+ * _actually used_ probes; fall back to using kprobe implementation for
+ * fprobe provider if so.
+ */
+void fbt_check_impl(struct dtrace_hdl *dtp, dt_provider_t *fbt_prv)
+{
+	dt_htab_next_t          *it = NULL;
+	dt_provider_t		*pvp;
+	dtrace_probedesc_t	fbt_desc = {	.prv = "fbt",
+						.mod = "",
+						.fun = "",
+						.prb = ""
+					   };
+
+
+	/* count number of used probes for provider + impl */
+	fbt_prv->prv_data++;
+
+	/* check other provider + impl (if any) */
+	while ((pvp = dt_htab_next(dtp->dt_provs, &it)) != NULL) {
+		if (pvp == fbt_prv || strcmp(pvp->desc.dtvd_name, "fbt") != 0)
+			continue;
+		if (pvp->prv_data == 0)
+			return;
+		/* already done? */
+		if (pvp->impl == fbt_prv->impl)
+			return;
+		/* multiple fprobe, kprobe probes; revert to kprobes */
+		if (fbt_prv->impl != &dt_fbt_kprobe)
+			pvp = fbt_prv;
+		break;
+	}
+	if (!pvp)
+		return;
+	dt_dprintf("reverting to kprobe impl for %s due to fprobe/kprobe mix\n",
+		   pvp->desc.dtvd_name);
+	pvp->impl = &dt_fbt_kprobe;
+	dt_probe_iter(dtp, &fbt_desc, (dt_probe_f *)fbt_unset_event_id, NULL,
+		      pvp);
+
+}
+
 /*******************************\
  * FPROBE-based implementation *
 \*******************************/
@@ -296,6 +346,8 @@ static int fprobe_probe_info(dtrace_hdl_t *dtp, const dt_probe_t *prp,
 		goto done;
 	}
 
+	fbt_check_impl(dtp, prp->prov);
+
 	argc = dt_btf_func_argc(dtp, dmp->dm_btf, btf_id);
 	if (argc == 0)
 		goto done;
@@ -453,6 +505,15 @@ static int kprobe_attach(dtrace_hdl_t *dtp, const dt_probe_t *prp, int bpf_fd)
 	return dt_tp_probe_attach(dtp, prp, bpf_fd);
 }
 
+static int kprobe_probe_info(dtrace_hdl_t *dtp, const dt_probe_t *prp,
+			     int *argcp, dt_argdesc_t **argvp)
+{
+	fbt_check_impl(dtp, prp->prov);
+	*argcp = 0;
+	*argvp = NULL;
+	return 0;
+}
+
 /*
  * Try to clean up system resources that may have been allocated for this
  * probe.
@@ -502,6 +563,7 @@ dt_provimpl_t	dt_fbt_kprobe = {
 	.load_prog	= &dt_bpf_prog_load,
 	.trampoline	= &kprobe_trampoline,
 	.attach		= &kprobe_attach,
+	.probe_info	= &kprobe_probe_info,
 	.detach		= &kprobe_detach,
 	.probe_destroy	= &dt_tp_probe_destroy,
 };
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v3 dtrace 4/4] sched: fix on-cpu firing for kernels < 5.16
  2024-10-16 15:54 [PATCH v3 dtrace 0/4] kprobe support for .isra.0, sched fix Alan Maguire
                   ` (2 preceding siblings ...)
  2024-10-16 15:54 ` [PATCH v3 dtrace 3/4] fbt: avoid mix of kprobe, fprobe implementations for used probes Alan Maguire
@ 2024-10-16 15:54 ` Alan Maguire
  2024-10-18 15:41 ` [PATCH v3 dtrace 0/4] kprobe support for .isra.0, sched fix Kris Van Hees
  4 siblings, 0 replies; 12+ messages in thread
From: Alan Maguire @ 2024-10-16 15:54 UTC (permalink / raw)
  To: dtrace; +Cc: dtrace-devel, Alan Maguire

The solution for sched:::on-cpu firing (probing
__perf_event_task_sched_in) only works on 5.16 and later as
the relevant function is not in available_filter_functions
in earlier kernels.

Instead use the fbt::finish_task_switch:return probe as this
will cover .isra.0-suffixed optimizations if they are applied
to finish_task_switch().

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 libdtrace/dt_prov_sched.c | 23 ++---------------------
 1 file changed, 2 insertions(+), 21 deletions(-)

diff --git a/libdtrace/dt_prov_sched.c b/libdtrace/dt_prov_sched.c
index 4fbd24ab..796f39af 100644
--- a/libdtrace/dt_prov_sched.c
+++ b/libdtrace/dt_prov_sched.c
@@ -28,7 +28,7 @@ static probe_dep_t	probes[] = {
 	{ "off-cpu",
 	  DTRACE_PROBESPEC_NAME,	"rawtp:sched::sched_switch" },
 	{ "on-cpu",
-	  DTRACE_PROBESPEC_NAME,	"fbt::__perf_event_task_sched_in:entry" },
+	  DTRACE_PROBESPEC_NAME,	"fbt:vmlinux:finish_task_switch:return" },
 	{ "surrender",
 	  DTRACE_PROBESPEC_NAME,	"fbt::do_sched_yield:entry" },
 	{ "tick",
@@ -157,26 +157,7 @@ static int trampoline(dt_pcb_t *pcb, uint_t exitlbl)
  */
 static void enable(dtrace_hdl_t *dtp, dt_probe_t *prp)
 {
-	struct perf_event_attr attr = {};
-	int swfd;
-
-	if (strcmp(prp->desc->prb, "on-cpu") != 0)
-		return dt_sdt_enable(dtp, prp);
-
-	memset(&attr, 0, sizeof(attr));
-	attr.size = sizeof(attr);
-	attr.type = PERF_TYPE_SOFTWARE;
-	attr.config = PERF_COUNT_SW_CONTEXT_SWITCHES;
-	attr.freq = 1;
-	attr.sample_freq = 1000;
-	attr.context_switch = 1;
-
-	swfd = dt_perf_event_open(&attr, -1, 0, -1, 0);
-	if (swfd < 0)
-		dt_dprintf("perf event open failed for context_switch: %d\n", errno);
-	else
-		prp->prv_data = (void *)(long)swfd;
-	dt_sdt_enable(dtp, prp);
+	return dt_sdt_enable(dtp, prp);
 }
 
 static void detach(dtrace_hdl_t *dtp, const dt_probe_t *prp)
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 dtrace 0/4] kprobe support for .isra.0, sched fix
  2024-10-16 15:54 [PATCH v3 dtrace 0/4] kprobe support for .isra.0, sched fix Alan Maguire
                   ` (3 preceding siblings ...)
  2024-10-16 15:54 ` [PATCH v3 dtrace 4/4] sched: fix on-cpu firing for kernels < 5.16 Alan Maguire
@ 2024-10-18 15:41 ` Kris Van Hees
  2024-10-18 16:15   ` Alan Maguire
  4 siblings, 1 reply; 12+ messages in thread
From: Kris Van Hees @ 2024-10-18 15:41 UTC (permalink / raw)
  To: Alan Maguire; +Cc: dtrace, dtrace-devel

Before I review further, I have a question...  Do we need to consider the
<func>.<suffix> symbols as separate probes from <func> (at a user level),
or can we group them together?  I am hoping that grouping them together
would be the preference if only because the suffix versions result from
compiler optimizations and it is therefore likely that a user would want to
be able to probe <func> and expect it to work even if the compiler decided
to do something under the covers that results in a suffix-variant to also
be created.

Kris

On Wed, Oct 16, 2024 at 04:54:05PM +0100, Alan Maguire wrote:
> This series is focused on solving a few issues with fprobe-based
> attachment which prevent us being able to attach to functions
> like finish_task_switch.isra.0.  Such functions are present in
> available_filter_functions, and represent real function boundaries
> (since they correspond to the mcount function boundary sites)
> but because they either lack BTF representations, or because
> those BTF representations are named without the .isra suffix, attach
> via fentry/fexit is currently impossible.  Falling back to the
> kprobe implementation is the best solution here.
> 
> However, for stability, it is best to represent the probes for
> these functions without the ".isra" suffix, so we need to store
> the full function name (with suffix) in the tracepoint data when
> the probe is populated.  Patch 1 supports this.
> 
> Patch 2 ensures that we use kprobe implementation for any "."-suffixed
> functions.  An additional fbt provider with kprobe implementation is
> created to support this (so as not to  disturb existing fprobes for other
> functions).  At kprobe attach we use the full function name stored
> as tp event data to carry out attach.
> 
> Next we need to ensure we do not end up with a mix of kprobes and
> fprobes.  Ideally we would do this in a more fine-grained manner, but
> for now just ensure we do not have an fprobe/kprobe mix program-wide.
> When fprobes are active, we will only use kprobes for "."-suffixed
> functions that are used, so in practice such mixes will be relatively
> rare.
> 
> As Kris pointed out [1] at compilation time, trampolines have not yet been
> set up, so we can replace the provider underlying fbt at that time.
> The probe_info() callbacks are used to check for a mix of kprobe and
> fprobe implementations; we check for multiple fbt providers which
> have a count of used probes > 0; if this occurs, switch the fbt provider
> using fprobe to use the kprobe implementation and reset any event
> ids associated with fprobes from the BTF id used in fprobes to 0.
> 
> Finally we can then use fbt::finish_task_switch:return as the
> dependent probe for sched:::on-cpu, as we now can probe it even
> if it becomes finish_task_switch.isra.0.
> 
> So to recap:
> 
> Patch 1 supports storing/freeing event data with tp events.
> Patch 2 allows tracing of "."-suffixed functions like
> finish_task_switch.isra.0 via a kprobe-backed fbt implementation.
> Patch 3 ensures we do not end up with a kprobe/fprobe mix.
> Patch 4 then uses the fact we can now trace "."-suffixed functions
> (with kprobe fallback) by using fbt:vmlinux:finish_task_switch:return
> as the kprobe dependent event for sched:::on-cpu  . This function is
> often optimized to become finish_task_switch.isra.0.
> 
> Tested on upstream, 5.15 and 5.4 kernels.
> 
> Changes since v2:
> 
> - probe function name exposed drops the suffix (Kris, patches 1, 2)
> - restrict kprobe use to "."-suffixed functions; this makes their use
>   less likely in the fprobe environment.  Do this instead of creating
>   a "fake" fprobe probe with kprobe backing (Kris, patch 2)
> - modify fallback logic to handle kprobe/fprobe mix (patch 3)
> - modify sched:::on-cpu to use fbt::finish_task_switch:return ; no
>   wildcard needed now that probe function name is unsuffixed.
> 
> Changes since v1:
> 
> - simplified approach by just swapping out probe impl when BTF lookup fails
>   (Kris, patch 2)
> 
> [1] https://lore.kernel.org/dtrace/20241009140236.883884-1-alan.maguire@oracle.com/
> 
> Alan Maguire (4):
>   dt_provider_tp: add optional event data, freed on tp free
>   fbt: support "."-suffixed functions for kprobes
>   fbt: avoid mix of kprobe, fprobe implementations for used probes
>   sched: fix on-cpu firing for kernels < 5.16
> 
>  libdtrace/dt_prov_fbt.c    | 138 ++++++++++++++++++++++++++++++++-----
>  libdtrace/dt_prov_sched.c  |  23 +------
>  libdtrace/dt_provider_tp.c |  27 ++++++++
>  libdtrace/dt_provider_tp.h |   8 +++
>  4 files changed, 158 insertions(+), 38 deletions(-)
> 
> -- 
> 2.43.5
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 dtrace 0/4] kprobe support for .isra.0, sched fix
  2024-10-18 15:41 ` [PATCH v3 dtrace 0/4] kprobe support for .isra.0, sched fix Kris Van Hees
@ 2024-10-18 16:15   ` Alan Maguire
  2024-10-18 18:38     ` Kris Van Hees
  0 siblings, 1 reply; 12+ messages in thread
From: Alan Maguire @ 2024-10-18 16:15 UTC (permalink / raw)
  To: Kris Van Hees; +Cc: dtrace, dtrace-devel

On 18/10/2024 16:41, Kris Van Hees wrote:
> Before I review further, I have a question...  Do we need to consider the
> <func>.<suffix> symbols as separate probes from <func> (at a user level),
> or can we group them together?  I am hoping that grouping them together
> would be the preference if only because the suffix versions result from
> compiler optimizations and it is therefore likely that a user would want to
> be able to probe <func> and expect it to work even if the compiler decided
> to do something under the covers that results in a suffix-variant to also
> be created.
>

Right, that's one of the changes in this patch series versus v2. We
expose the probe as <func>, even if it the underlying function has
become <func>.<suffix>. This provides more stability to the user and we
are guaranteed it still represents a function boundary because of its
presence in available_filter_functions (this implies it has an
associated mcount-tagged fentry site).

Alan

> Kris
> 
> On Wed, Oct 16, 2024 at 04:54:05PM +0100, Alan Maguire wrote:
>> This series is focused on solving a few issues with fprobe-based
>> attachment which prevent us being able to attach to functions
>> like finish_task_switch.isra.0.  Such functions are present in
>> available_filter_functions, and represent real function boundaries
>> (since they correspond to the mcount function boundary sites)
>> but because they either lack BTF representations, or because
>> those BTF representations are named without the .isra suffix, attach
>> via fentry/fexit is currently impossible.  Falling back to the
>> kprobe implementation is the best solution here.
>>
>> However, for stability, it is best to represent the probes for
>> these functions without the ".isra" suffix, so we need to store
>> the full function name (with suffix) in the tracepoint data when
>> the probe is populated.  Patch 1 supports this.
>>
>> Patch 2 ensures that we use kprobe implementation for any "."-suffixed
>> functions.  An additional fbt provider with kprobe implementation is
>> created to support this (so as not to  disturb existing fprobes for other
>> functions).  At kprobe attach we use the full function name stored
>> as tp event data to carry out attach.
>>
>> Next we need to ensure we do not end up with a mix of kprobes and
>> fprobes.  Ideally we would do this in a more fine-grained manner, but
>> for now just ensure we do not have an fprobe/kprobe mix program-wide.
>> When fprobes are active, we will only use kprobes for "."-suffixed
>> functions that are used, so in practice such mixes will be relatively
>> rare.
>>
>> As Kris pointed out [1] at compilation time, trampolines have not yet been
>> set up, so we can replace the provider underlying fbt at that time.
>> The probe_info() callbacks are used to check for a mix of kprobe and
>> fprobe implementations; we check for multiple fbt providers which
>> have a count of used probes > 0; if this occurs, switch the fbt provider
>> using fprobe to use the kprobe implementation and reset any event
>> ids associated with fprobes from the BTF id used in fprobes to 0.
>>
>> Finally we can then use fbt::finish_task_switch:return as the
>> dependent probe for sched:::on-cpu, as we now can probe it even
>> if it becomes finish_task_switch.isra.0.
>>
>> So to recap:
>>
>> Patch 1 supports storing/freeing event data with tp events.
>> Patch 2 allows tracing of "."-suffixed functions like
>> finish_task_switch.isra.0 via a kprobe-backed fbt implementation.
>> Patch 3 ensures we do not end up with a kprobe/fprobe mix.
>> Patch 4 then uses the fact we can now trace "."-suffixed functions
>> (with kprobe fallback) by using fbt:vmlinux:finish_task_switch:return
>> as the kprobe dependent event for sched:::on-cpu  . This function is
>> often optimized to become finish_task_switch.isra.0.
>>
>> Tested on upstream, 5.15 and 5.4 kernels.
>>
>> Changes since v2:
>>
>> - probe function name exposed drops the suffix (Kris, patches 1, 2)
>> - restrict kprobe use to "."-suffixed functions; this makes their use
>>   less likely in the fprobe environment.  Do this instead of creating
>>   a "fake" fprobe probe with kprobe backing (Kris, patch 2)
>> - modify fallback logic to handle kprobe/fprobe mix (patch 3)
>> - modify sched:::on-cpu to use fbt::finish_task_switch:return ; no
>>   wildcard needed now that probe function name is unsuffixed.
>>
>> Changes since v1:
>>
>> - simplified approach by just swapping out probe impl when BTF lookup fails
>>   (Kris, patch 2)
>>
>> [1] https://lore.kernel.org/dtrace/20241009140236.883884-1-alan.maguire@oracle.com/
>>
>> Alan Maguire (4):
>>   dt_provider_tp: add optional event data, freed on tp free
>>   fbt: support "."-suffixed functions for kprobes
>>   fbt: avoid mix of kprobe, fprobe implementations for used probes
>>   sched: fix on-cpu firing for kernels < 5.16
>>
>>  libdtrace/dt_prov_fbt.c    | 138 ++++++++++++++++++++++++++++++++-----
>>  libdtrace/dt_prov_sched.c  |  23 +------
>>  libdtrace/dt_provider_tp.c |  27 ++++++++
>>  libdtrace/dt_provider_tp.h |   8 +++
>>  4 files changed, 158 insertions(+), 38 deletions(-)
>>
>> -- 
>> 2.43.5
>>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 dtrace 0/4] kprobe support for .isra.0, sched fix
  2024-10-18 16:15   ` Alan Maguire
@ 2024-10-18 18:38     ` Kris Van Hees
  0 siblings, 0 replies; 12+ messages in thread
From: Kris Van Hees @ 2024-10-18 18:38 UTC (permalink / raw)
  To: Alan Maguire; +Cc: Kris Van Hees, dtrace, dtrace-devel

On Fri, Oct 18, 2024 at 05:15:59PM +0100, Alan Maguire wrote:
> On 18/10/2024 16:41, Kris Van Hees wrote:
> > Before I review further, I have a question...  Do we need to consider the
> > <func>.<suffix> symbols as separate probes from <func> (at a user level),
> > or can we group them together?  I am hoping that grouping them together
> > would be the preference if only because the suffix versions result from
> > compiler optimizations and it is therefore likely that a user would want to
> > be able to probe <func> and expect it to work even if the compiler decided
> > to do something under the covers that results in a suffix-variant to also
> > be created.
> >
> 
> Right, that's one of the changes in this patch series versus v2. We
> expose the probe as <func>, even if it the underlying function has
> become <func>.<suffix>. This provides more stability to the user and we
> are guaranteed it still represents a function boundary because of its
> presence in available_filter_functions (this implies it has an
> associated mcount-tagged fentry site).

Good - then I am on the right track.  Continuing review - focusing mostly on
whether we currently properly handle multiple FBT probes with the same fully
qualified probe name.

> > On Wed, Oct 16, 2024 at 04:54:05PM +0100, Alan Maguire wrote:
> >> This series is focused on solving a few issues with fprobe-based
> >> attachment which prevent us being able to attach to functions
> >> like finish_task_switch.isra.0.  Such functions are present in
> >> available_filter_functions, and represent real function boundaries
> >> (since they correspond to the mcount function boundary sites)
> >> but because they either lack BTF representations, or because
> >> those BTF representations are named without the .isra suffix, attach
> >> via fentry/fexit is currently impossible.  Falling back to the
> >> kprobe implementation is the best solution here.
> >>
> >> However, for stability, it is best to represent the probes for
> >> these functions without the ".isra" suffix, so we need to store
> >> the full function name (with suffix) in the tracepoint data when
> >> the probe is populated.  Patch 1 supports this.
> >>
> >> Patch 2 ensures that we use kprobe implementation for any "."-suffixed
> >> functions.  An additional fbt provider with kprobe implementation is
> >> created to support this (so as not to  disturb existing fprobes for other
> >> functions).  At kprobe attach we use the full function name stored
> >> as tp event data to carry out attach.
> >>
> >> Next we need to ensure we do not end up with a mix of kprobes and
> >> fprobes.  Ideally we would do this in a more fine-grained manner, but
> >> for now just ensure we do not have an fprobe/kprobe mix program-wide.
> >> When fprobes are active, we will only use kprobes for "."-suffixed
> >> functions that are used, so in practice such mixes will be relatively
> >> rare.
> >>
> >> As Kris pointed out [1] at compilation time, trampolines have not yet been
> >> set up, so we can replace the provider underlying fbt at that time.
> >> The probe_info() callbacks are used to check for a mix of kprobe and
> >> fprobe implementations; we check for multiple fbt providers which
> >> have a count of used probes > 0; if this occurs, switch the fbt provider
> >> using fprobe to use the kprobe implementation and reset any event
> >> ids associated with fprobes from the BTF id used in fprobes to 0.
> >>
> >> Finally we can then use fbt::finish_task_switch:return as the
> >> dependent probe for sched:::on-cpu, as we now can probe it even
> >> if it becomes finish_task_switch.isra.0.
> >>
> >> So to recap:
> >>
> >> Patch 1 supports storing/freeing event data with tp events.
> >> Patch 2 allows tracing of "."-suffixed functions like
> >> finish_task_switch.isra.0 via a kprobe-backed fbt implementation.
> >> Patch 3 ensures we do not end up with a kprobe/fprobe mix.
> >> Patch 4 then uses the fact we can now trace "."-suffixed functions
> >> (with kprobe fallback) by using fbt:vmlinux:finish_task_switch:return
> >> as the kprobe dependent event for sched:::on-cpu  . This function is
> >> often optimized to become finish_task_switch.isra.0.
> >>
> >> Tested on upstream, 5.15 and 5.4 kernels.
> >>
> >> Changes since v2:
> >>
> >> - probe function name exposed drops the suffix (Kris, patches 1, 2)
> >> - restrict kprobe use to "."-suffixed functions; this makes their use
> >>   less likely in the fprobe environment.  Do this instead of creating
> >>   a "fake" fprobe probe with kprobe backing (Kris, patch 2)
> >> - modify fallback logic to handle kprobe/fprobe mix (patch 3)
> >> - modify sched:::on-cpu to use fbt::finish_task_switch:return ; no
> >>   wildcard needed now that probe function name is unsuffixed.
> >>
> >> Changes since v1:
> >>
> >> - simplified approach by just swapping out probe impl when BTF lookup fails
> >>   (Kris, patch 2)
> >>
> >> [1] https://lore.kernel.org/dtrace/20241009140236.883884-1-alan.maguire@oracle.com/
> >>
> >> Alan Maguire (4):
> >>   dt_provider_tp: add optional event data, freed on tp free
> >>   fbt: support "."-suffixed functions for kprobes
> >>   fbt: avoid mix of kprobe, fprobe implementations for used probes
> >>   sched: fix on-cpu firing for kernels < 5.16
> >>
> >>  libdtrace/dt_prov_fbt.c    | 138 ++++++++++++++++++++++++++++++++-----
> >>  libdtrace/dt_prov_sched.c  |  23 +------
> >>  libdtrace/dt_provider_tp.c |  27 ++++++++
> >>  libdtrace/dt_provider_tp.h |   8 +++
> >>  4 files changed, 158 insertions(+), 38 deletions(-)
> >>
> >> -- 
> >> 2.43.5
> >>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [DTrace-devel] [PATCH v3 dtrace 1/4] dt_provider_tp: add optional event data, freed on tp free
  2024-10-16 15:54 ` [PATCH v3 dtrace 1/4] dt_provider_tp: add optional event data, freed on tp free Alan Maguire
@ 2024-11-14  5:26   ` Kris Van Hees
  2024-11-14 16:45     ` Alan Maguire
  0 siblings, 1 reply; 12+ messages in thread
From: Kris Van Hees @ 2024-11-14  5:26 UTC (permalink / raw)
  To: Alan Maguire; +Cc: dtrace, dtrace-devel

I don't think this is the right approach.  At least, with other code (uprobe
provider) we went the route of having the provider declare its own custom
struct to store provider-specific probe info along with a pointer to the
tp_probe_t (tracepoint probe info).  I think we should do the same here.

I'm preparing a modified patch for this as part of a small series to fix a
few design issues with function naming in dt_provider_tp, to show what I mean.

I'll continue reviewing the rest of this series based on that upcomming change.

E.g. here we can use:
struct fbt_probe {
	char		*tp_name;
	tp_probe_t	*tp;
} fbt_probe_t;

and an fbt_probe_t instance will be passed to dt_probe_insert( and thus will
be available as prp->prv_data.  Whenevef this provider needs to do anything
with an FBT probe's underlying tracepoint, it can be done by calling the
appropriate dt_tp_* function and passing in ((fbt_probe_t *prp->prv_data)->tp.

Anyway, series coming (hopefully) tomorrow.

On Wed, Oct 16, 2024 at 04:54:06PM +0100, Alan Maguire via DTrace-devel wrote:
> It will be used to store suffixed function name for kprobe attach to
> "foo.isra.0"-style functions.  Such functions use the unsuffixed form
> as probe name, so we need to store the full function name to support
> attach.
> 
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
>  libdtrace/dt_provider_tp.c | 27 +++++++++++++++++++++++++++
>  libdtrace/dt_provider_tp.h |  8 ++++++++
>  2 files changed, 35 insertions(+)
> 
> diff --git a/libdtrace/dt_provider_tp.c b/libdtrace/dt_provider_tp.c
> index 50df2328..fc3c216a 100644
> --- a/libdtrace/dt_provider_tp.c
> +++ b/libdtrace/dt_provider_tp.c
> @@ -33,6 +33,7 @@
>  struct tp_probe {
>  	uint32_t	event_id;	/* tracepoint event id or BTF id */
>  	int		event_fd;	/* tracepoint perf event fd */
> +	void		*event_data;	/* optional data */
>  };
>  
>  /*
> @@ -313,6 +314,8 @@ dt_tp_detach(dtrace_hdl_t *dtp, tp_probe_t *tpp)
>  void
>  dt_tp_destroy(dtrace_hdl_t *dtp, tp_probe_t *tpp)
>  {
> +	if (tpp && tpp->event_data)
> +		free(tpp->event_data);
>  	dt_free(dtp, tpp);
>  }
>  
> @@ -334,6 +337,30 @@ dt_tp_probe_insert(dtrace_hdl_t *dtp, dt_provider_t *prov, const char *prv,
>  	return dt_probe_insert(dtp, prov, prv, mod, fun, prb, tpp);
>  }
>  
> +dt_probe_t *
> +dt_tp_probe_insert_data(dtrace_hdl_t *dtp, dt_provider_t *prov, const char *prv,
> +			const char *mod, const char *fun, const char *prb,
> +			void *data)
> +{
> +	tp_probe_t	*tpp;
> +
> +	dt_probe_t *prp = dt_tp_probe_insert(dtp, prov, prv, mod, fun, prb);
> +
> +	if (prp) {
> +		tpp = prp->prv_data;
> +		tpp->event_data = data;
> +	}
> +	return prp;
> +}
> +
> +void *
> +dt_tp_get_event_data(const dt_probe_t *prp)
> +{
> +	tp_probe_t	*tpp = prp->prv_data;
> +
> +	return tpp->event_data;
> +}
> +
>  uint32_t
>  dt_tp_get_event_id(const dt_probe_t *prp)
>  {
> diff --git a/libdtrace/dt_provider_tp.h b/libdtrace/dt_provider_tp.h
> index e7d52876..66a9bd80 100644
> --- a/libdtrace/dt_provider_tp.h
> +++ b/libdtrace/dt_provider_tp.h
> @@ -41,6 +41,14 @@ extern struct dt_probe *dt_tp_probe_insert(dtrace_hdl_t *dtp,
>  					   dt_provider_t *prov,
>  					   const char *prv, const char *mod,
>  					   const char *fun, const char *prb);
> +extern struct dt_probe *dt_tp_probe_insert_data(dtrace_hdl_t *dtp,
> +						dt_provider_t *prov,
> +						const char *prv,
> +						const char *mod,
> +						const char *fun,
> +						const char *prb,
> +						void *data);
> +extern void *dt_tp_get_event_data(const struct dt_probe *prp);
>  extern uint32_t dt_tp_get_event_id(const struct dt_probe *prp);
>  extern void dt_tp_set_event_id(const struct dt_probe *prp, uint32_t id);
>  extern int dt_tp_probe_info(dtrace_hdl_t *dtp, FILE *f, int skip,
> -- 
> 2.43.5
> 
> 
> _______________________________________________
> DTrace-devel mailing list
> DTrace-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/dtrace-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [DTrace-devel] [PATCH v3 dtrace 1/4] dt_provider_tp: add optional event data, freed on tp free
  2024-11-14  5:26   ` [DTrace-devel] " Kris Van Hees
@ 2024-11-14 16:45     ` Alan Maguire
  0 siblings, 0 replies; 12+ messages in thread
From: Alan Maguire @ 2024-11-14 16:45 UTC (permalink / raw)
  To: Kris Van Hees; +Cc: dtrace, dtrace-devel

On 14/11/2024 05:26, Kris Van Hees wrote:
> I don't think this is the right approach.  At least, with other code (uprobe
> provider) we went the route of having the provider declare its own custom
> struct to store provider-specific probe info along with a pointer to the
> tp_probe_t (tracepoint probe info).  I think we should do the same here.
> 
> I'm preparing a modified patch for this as part of a small series to fix a
> few design issues with function naming in dt_provider_tp, to show what I mean.
> 
> I'll continue reviewing the rest of this series based on that upcomming change.
> 
> E.g. here we can use:
> struct fbt_probe {
> 	char		*tp_name;
> 	tp_probe_t	*tp;
> } fbt_probe_t;
> 
> and an fbt_probe_t instance will be passed to dt_probe_insert( and thus will
> be available as prp->prv_data.  Whenevef this provider needs to do anything
> with an FBT probe's underlying tracepoint, it can be done by calling the
> appropriate dt_tp_* function and passing in ((fbt_probe_t *prp->prv_data)->tp.
> 
> Anyway, series coming (hopefully) tomorrow.
>

Sounds good, thanks!

Alan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 dtrace 2/4] fbt: support "."-suffixed functions for kprobes
  2024-10-16 15:54 ` [PATCH v3 dtrace 2/4] fbt: support "."-suffixed functions for kprobes Alan Maguire
@ 2024-11-28  2:22   ` Kris Van Hees
  2024-11-28  9:58     ` Alan Maguire
  0 siblings, 1 reply; 12+ messages in thread
From: Kris Van Hees @ 2024-11-28  2:22 UTC (permalink / raw)
  To: Alan Maguire; +Cc: dtrace, dtrace-devel

I am still workimg through these patches (and writing some additional
support patches).  But one annoying detail has come up...  kprobes do
not provide argument type data whereas we do have that for fprobes.
So, if there is a function (like find_bio_block) that can be probed
with fprobe and there is also a find_bio_block.isra.0 optimized form
that can be probed with probe, they *cannot* be represented by the
same DTrace probe because one has known argument data and the other
does not.  And it does not seem safe to assume that the optimized
form will have the same signature as the regular function.

Even using the dependent probe mechanism won't help here because that
is based on being able to convert the arguments of the OS level probe
intp arguments for the dependent DTrace probe.  But in this case the
OS level probe (kprobe on find_bio_block.isra.0) has no defined
arguments (i.e. they are unknown - lack of prototype data) yet the
dependent DTrace probe expects to be able to populate arguments.

I think that the only way to really get around this (at least until
prototype information is made available for the optimized functions,
if ever) is to have two distinct probes, and requiring the user to
specify them explicitly, e.g.

	fbt::find_bio_block:entry, fbt::find_bio_block.*:entry

Another option would be to introduce a fbt-opt provider so you can
write:

	fbt::find_bio_block:entry, fbt-opt::find_bio_block:entry

or even (although it might conflict with USDT probes):

	fbt*::find_bio_block:entry

and the fbt-opt provider can place kprobes on any find_bio_block.*
symbols and they all would get reported as the same
fbt-opt::find_bio_block:entry probe firing.

Thoughts?


On Wed, Oct 16, 2024 at 04:54:07PM +0100, Alan Maguire wrote:
> gcc adds suffixes when it carries out optimizations, but often
> these leave parameters to functions intact.  Many of these functions
> (like finish_task_switch()) are important for tracing (and they
> are present in available_filter_functions so are traceable) so it is
> valuable to support probing them.  For kprobes, the probe name removes
> the "." suffix, so store it as additional tp event data for kprobe
> attach (attach time is the only time we need the full function name).
> 
> fprobe does not support such functions, so the kprobe impl is always
> used in such cases.
> 
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
>  libdtrace/dt_prov_fbt.c | 76 ++++++++++++++++++++++++++++++++---------
>  1 file changed, 59 insertions(+), 17 deletions(-)
> 
> diff --git a/libdtrace/dt_prov_fbt.c b/libdtrace/dt_prov_fbt.c
> index 21f63ddf..06ea78ca 100644
> --- a/libdtrace/dt_prov_fbt.c
> +++ b/libdtrace/dt_prov_fbt.c
> @@ -21,6 +21,7 @@
>  #include <assert.h>
>  #include <errno.h>
>  #include <fcntl.h>
> +#include <port.h>
>  #include <stdio.h>
>  #include <stdlib.h>
>  #include <string.h>
> @@ -63,8 +64,8 @@ static const dtrace_pattr_t	pattr = {
>   */
>  static int populate(dtrace_hdl_t *dtp)
>  {
> -	dt_provider_t		*prv;
> -	dt_provimpl_t		*impl;
> +	dt_provider_t		*default_prv, *kprobe_prv = NULL;
> +	dt_provimpl_t		*default_impl;
>  	FILE			*f;
>  	char			*buf = NULL;
>  	char			*p;
> @@ -73,17 +74,25 @@ static int populate(dtrace_hdl_t *dtp)
>  	dtrace_syminfo_t	sip;
>  	dtrace_probedesc_t	pd;
>  
> -	impl = BPF_HAS(dtp, BPF_FEAT_FENTRY) ? &dt_fbt_fprobe : &dt_fbt_kprobe;
> +	default_impl = BPF_HAS(dtp, BPF_FEAT_FENTRY) ? &dt_fbt_fprobe : &dt_fbt_kprobe;
>  
> -	prv = dt_provider_create(dtp, prvname, impl, &pattr, NULL);
> -	if (prv == NULL)
> +	default_prv = dt_provider_create(dtp, prvname, default_impl, &pattr, NULL);
> +	if (default_prv == NULL)
>  		return -1;			/* errno already set */
> +	if (default_impl == &dt_fbt_kprobe)
> +		kprobe_prv = default_prv;
>  
>  	f = fopen(PROBE_LIST, "r");
>  	if (f == NULL)
>  		return 0;
>  
>  	while (getline(&buf, &n, f) >= 0) {
> +		dt_provimpl_t	*impl = default_impl;
> +		dt_provider_t	*prv = default_prv;
> +		char		fun[DTRACE_FUNCNAMELEN] = {};
> +		void		*entry_data = NULL;
> +		void		*return_data = NULL;
> +
>  		/*
>  		 * Here buf is either "funcname\n" or "funcname [modname]\n".
>  		 * The last line may not have a linefeed.
> @@ -106,10 +115,6 @@ static int populate(dtrace_hdl_t *dtp)
>  				p++;
>  		}
>  
> -		/* Weed out synthetic symbol names (that are invalid). */
> -		if (strchr(buf, '.') != NULL)
> -			continue;
> -
>  #define strstarts(var, x) (strncmp(var, x, strlen (x)) == 0)
>  		/* Weed out __ftrace_invalid_address___* entries. */
>  		if (strstarts(buf, "__ftrace_invalid_address__") ||
> @@ -129,6 +134,31 @@ static int populate(dtrace_hdl_t *dtp)
>  		} else
>  			mod = p;
>  
> +		/* '.'-suffixed functions (e.g. foo.isra.0) must always be
> +		 * kprobe-backed as fprobe does not support them.  They
> +		 * are represented with their unsuffixed names however, so
> +		 * the full name including suffix is stored as data associated
> +		 * with the tp event.
> +		 */
> +		strlcpy(fun, buf, sizeof(fun));
> +		if (strchr(buf, '.') != NULL) {
> +			char *suffix;
> +
> +			impl = &dt_fbt_kprobe;
> +			if (!kprobe_prv) {
> +				kprobe_prv = dt_provider_create(dtp, prvname,
> +								impl, &pattr,
> +								NULL);
> +				if (kprobe_prv == NULL)
> +					return -1;
> +			}
> +			prv = kprobe_prv;
> +			suffix = strchr(fun, '.');
> +			*suffix = '\0';
> +			entry_data = strdup(buf);
> +			return_data = strdup(buf);
> +		}
> +
>  		/*
>  		 * Due to the lack of module names in
>  		 * TRACEFS/available_filter_functions, there are some duplicate
> @@ -138,14 +168,16 @@ static int populate(dtrace_hdl_t *dtp)
>  		pd.id = DTRACE_IDNONE;
>  		pd.prv = prvname;
>  		pd.mod = mod;
> -		pd.fun = buf;
> +		pd.fun = fun;
>  		pd.prb = "entry";
>  		if (dt_probe_lookup(dtp, &pd) != NULL)
>  			continue;
>  
> -		if (dt_tp_probe_insert(dtp, prv, prvname, mod, buf, "entry"))
> +		if (dt_tp_probe_insert_data(dtp, prv, prvname, mod, fun,
> +					    "entry", entry_data))
>  			n++;
> -		if (dt_tp_probe_insert(dtp, prv, prvname, mod, buf, "return"))
> +		if (dt_tp_probe_insert_data(dtp, prv, prvname, mod, fun,
> +					    "return", return_data))
>  			n++;
>  	}
>  
> @@ -363,10 +395,11 @@ static int kprobe_trampoline(dt_pcb_t *pcb, uint_t exitlbl)
>  static int kprobe_attach(dtrace_hdl_t *dtp, const dt_probe_t *prp, int bpf_fd)
>  {
>  	if (!dt_tp_probe_has_info(prp)) {
> -		char	*fn;
> -		FILE	*f;
> -		size_t	len;
> -		int	fd, rc = -1;
> +		char		*fn;
> +		const char	*fun;
> +		FILE		*f;
> +		size_t		len;
> +		int		fd, rc = -1;
>  
>  		/*
>  		 * Register the kprobe with the tracing subsystem.  This will
> @@ -376,9 +409,18 @@ static int kprobe_attach(dtrace_hdl_t *dtp, const dt_probe_t *prp, int bpf_fd)
>  		if (fd == -1)
>  			return -ENOENT;
>  
> +		/* If actual function name contained a '.', it is stored in
> +		 * tp data; use non-suffix name for event as event names
> +		 * cannot contain a '.'.  User-visible probe name does
> +		 * not contain a '.'.
> +		 */
> +		fun = dt_tp_get_event_data(prp);
> +		if (!fun)
> +			fun = prp->desc->fun;
> +
>  		rc = dprintf(fd, "%c:" FBT_GROUP_FMT "/%s %s\n",
>  			     prp->desc->prb[0] == 'e' ? 'p' : 'r',
> -			     FBT_GROUP_DATA, prp->desc->fun, prp->desc->fun);
> +			     FBT_GROUP_DATA, prp->desc->fun, fun);
>  		close(fd);
>  		if (rc == -1)
>  			return -ENOENT;
> -- 
> 2.43.5
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 dtrace 2/4] fbt: support "."-suffixed functions for kprobes
  2024-11-28  2:22   ` Kris Van Hees
@ 2024-11-28  9:58     ` Alan Maguire
  0 siblings, 0 replies; 12+ messages in thread
From: Alan Maguire @ 2024-11-28  9:58 UTC (permalink / raw)
  To: Kris Van Hees; +Cc: dtrace, dtrace-devel

On 28/11/2024 02:22, Kris Van Hees wrote:
> I am still workimg through these patches (and writing some additional
> support patches).  But one annoying detail has come up...  kprobes do
> not provide argument type data whereas we do have that for fprobes.
> So, if there is a function (like find_bio_block) that can be probed
> with fprobe and there is also a find_bio_block.isra.0 optimized form
> that can be probed with probe, they *cannot* be represented by the
> same DTrace probe because one has known argument data and the other
> does not.  And it does not seem safe to assume that the optimized
> form will have the same signature as the regular function.
> 
> Even using the dependent probe mechanism won't help here because that
> is based on being able to convert the arguments of the OS level probe
> intp arguments for the dependent DTrace probe.  But in this case the
> OS level probe (kprobe on find_bio_block.isra.0) has no defined
> arguments (i.e. they are unknown - lack of prototype data) yet the
> dependent DTrace probe expects to be able to populate arguments.
> 
> I think that the only way to really get around this (at least until
> prototype information is made available for the optimized functions,
> if ever) is to have two distinct probes, and requiring the user to
> specify them explicitly, e.g.
> 
> 	fbt::find_bio_block:entry, fbt::find_bio_block.*:entry
> 
> Another option would be to introduce a fbt-opt provider so you can
> write:
> 
> 	fbt::find_bio_block:entry, fbt-opt::find_bio_block:entry
> 
> or even (although it might conflict with USDT probes):
> 
> 	fbt*::find_bio_block:entry
> 
> and the fbt-opt provider can place kprobes on any find_bio_block.*
> symbols and they all would get reported as the same
> fbt-opt::find_bio_block:entry probe firing.
> 
> Thoughts?
>

I'm sort of wondering if we go even further and introduce a separate
kprobe provider. As well as not providing argument types, kprobes also
have a feature fentry does not - we can probe at function+offset. So
conceptually they are a separate thing already really, and though we
fall back to kprobes to provide fbt support when fprobe isn't available,
it might be worth having an explicitly kprobe-based provider since we
can offer different functionality with it. If we had a separate provider
(which still hid the "."-suffixes as these are unstable) we could
support the sched tracing case stably (using
kprobe::finish_task_switch:return) while not getting in the way of
fprobe-based fbt, _and_ covering tracing for those cases that
fprobe-based fbt cannot reach.

With this approach nothing needs to change from the fentry-based fprobe
side; no program-dependent logic handling cases where an unsupported by
fprobe fbt probe is used etc.

> 
> On Wed, Oct 16, 2024 at 04:54:07PM +0100, Alan Maguire wrote:
>> gcc adds suffixes when it carries out optimizations, but often
>> these leave parameters to functions intact.  Many of these functions
>> (like finish_task_switch()) are important for tracing (and they
>> are present in available_filter_functions so are traceable) so it is
>> valuable to support probing them.  For kprobes, the probe name removes
>> the "." suffix, so store it as additional tp event data for kprobe
>> attach (attach time is the only time we need the full function name).
>>
>> fprobe does not support such functions, so the kprobe impl is always
>> used in such cases.
>>
>> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
>> ---
>>  libdtrace/dt_prov_fbt.c | 76 ++++++++++++++++++++++++++++++++---------
>>  1 file changed, 59 insertions(+), 17 deletions(-)
>>
>> diff --git a/libdtrace/dt_prov_fbt.c b/libdtrace/dt_prov_fbt.c
>> index 21f63ddf..06ea78ca 100644
>> --- a/libdtrace/dt_prov_fbt.c
>> +++ b/libdtrace/dt_prov_fbt.c
>> @@ -21,6 +21,7 @@
>>  #include <assert.h>
>>  #include <errno.h>
>>  #include <fcntl.h>
>> +#include <port.h>
>>  #include <stdio.h>
>>  #include <stdlib.h>
>>  #include <string.h>
>> @@ -63,8 +64,8 @@ static const dtrace_pattr_t	pattr = {
>>   */
>>  static int populate(dtrace_hdl_t *dtp)
>>  {
>> -	dt_provider_t		*prv;
>> -	dt_provimpl_t		*impl;
>> +	dt_provider_t		*default_prv, *kprobe_prv = NULL;
>> +	dt_provimpl_t		*default_impl;
>>  	FILE			*f;
>>  	char			*buf = NULL;
>>  	char			*p;
>> @@ -73,17 +74,25 @@ static int populate(dtrace_hdl_t *dtp)
>>  	dtrace_syminfo_t	sip;
>>  	dtrace_probedesc_t	pd;
>>  
>> -	impl = BPF_HAS(dtp, BPF_FEAT_FENTRY) ? &dt_fbt_fprobe : &dt_fbt_kprobe;
>> +	default_impl = BPF_HAS(dtp, BPF_FEAT_FENTRY) ? &dt_fbt_fprobe : &dt_fbt_kprobe;
>>  
>> -	prv = dt_provider_create(dtp, prvname, impl, &pattr, NULL);
>> -	if (prv == NULL)
>> +	default_prv = dt_provider_create(dtp, prvname, default_impl, &pattr, NULL);
>> +	if (default_prv == NULL)
>>  		return -1;			/* errno already set */
>> +	if (default_impl == &dt_fbt_kprobe)
>> +		kprobe_prv = default_prv;
>>  
>>  	f = fopen(PROBE_LIST, "r");
>>  	if (f == NULL)
>>  		return 0;
>>  
>>  	while (getline(&buf, &n, f) >= 0) {
>> +		dt_provimpl_t	*impl = default_impl;
>> +		dt_provider_t	*prv = default_prv;
>> +		char		fun[DTRACE_FUNCNAMELEN] = {};
>> +		void		*entry_data = NULL;
>> +		void		*return_data = NULL;
>> +
>>  		/*
>>  		 * Here buf is either "funcname\n" or "funcname [modname]\n".
>>  		 * The last line may not have a linefeed.
>> @@ -106,10 +115,6 @@ static int populate(dtrace_hdl_t *dtp)
>>  				p++;
>>  		}
>>  
>> -		/* Weed out synthetic symbol names (that are invalid). */
>> -		if (strchr(buf, '.') != NULL)
>> -			continue;
>> -
>>  #define strstarts(var, x) (strncmp(var, x, strlen (x)) == 0)
>>  		/* Weed out __ftrace_invalid_address___* entries. */
>>  		if (strstarts(buf, "__ftrace_invalid_address__") ||
>> @@ -129,6 +134,31 @@ static int populate(dtrace_hdl_t *dtp)
>>  		} else
>>  			mod = p;
>>  
>> +		/* '.'-suffixed functions (e.g. foo.isra.0) must always be
>> +		 * kprobe-backed as fprobe does not support them.  They
>> +		 * are represented with their unsuffixed names however, so
>> +		 * the full name including suffix is stored as data associated
>> +		 * with the tp event.
>> +		 */
>> +		strlcpy(fun, buf, sizeof(fun));
>> +		if (strchr(buf, '.') != NULL) {
>> +			char *suffix;
>> +
>> +			impl = &dt_fbt_kprobe;
>> +			if (!kprobe_prv) {
>> +				kprobe_prv = dt_provider_create(dtp, prvname,
>> +								impl, &pattr,
>> +								NULL);
>> +				if (kprobe_prv == NULL)
>> +					return -1;
>> +			}
>> +			prv = kprobe_prv;
>> +			suffix = strchr(fun, '.');
>> +			*suffix = '\0';
>> +			entry_data = strdup(buf);
>> +			return_data = strdup(buf);
>> +		}
>> +
>>  		/*
>>  		 * Due to the lack of module names in
>>  		 * TRACEFS/available_filter_functions, there are some duplicate
>> @@ -138,14 +168,16 @@ static int populate(dtrace_hdl_t *dtp)
>>  		pd.id = DTRACE_IDNONE;
>>  		pd.prv = prvname;
>>  		pd.mod = mod;
>> -		pd.fun = buf;
>> +		pd.fun = fun;
>>  		pd.prb = "entry";
>>  		if (dt_probe_lookup(dtp, &pd) != NULL)
>>  			continue;
>>  
>> -		if (dt_tp_probe_insert(dtp, prv, prvname, mod, buf, "entry"))
>> +		if (dt_tp_probe_insert_data(dtp, prv, prvname, mod, fun,
>> +					    "entry", entry_data))
>>  			n++;
>> -		if (dt_tp_probe_insert(dtp, prv, prvname, mod, buf, "return"))
>> +		if (dt_tp_probe_insert_data(dtp, prv, prvname, mod, fun,
>> +					    "return", return_data))
>>  			n++;
>>  	}
>>  
>> @@ -363,10 +395,11 @@ static int kprobe_trampoline(dt_pcb_t *pcb, uint_t exitlbl)
>>  static int kprobe_attach(dtrace_hdl_t *dtp, const dt_probe_t *prp, int bpf_fd)
>>  {
>>  	if (!dt_tp_probe_has_info(prp)) {
>> -		char	*fn;
>> -		FILE	*f;
>> -		size_t	len;
>> -		int	fd, rc = -1;
>> +		char		*fn;
>> +		const char	*fun;
>> +		FILE		*f;
>> +		size_t		len;
>> +		int		fd, rc = -1;
>>  
>>  		/*
>>  		 * Register the kprobe with the tracing subsystem.  This will
>> @@ -376,9 +409,18 @@ static int kprobe_attach(dtrace_hdl_t *dtp, const dt_probe_t *prp, int bpf_fd)
>>  		if (fd == -1)
>>  			return -ENOENT;
>>  
>> +		/* If actual function name contained a '.', it is stored in
>> +		 * tp data; use non-suffix name for event as event names
>> +		 * cannot contain a '.'.  User-visible probe name does
>> +		 * not contain a '.'.
>> +		 */
>> +		fun = dt_tp_get_event_data(prp);
>> +		if (!fun)
>> +			fun = prp->desc->fun;
>> +
>>  		rc = dprintf(fd, "%c:" FBT_GROUP_FMT "/%s %s\n",
>>  			     prp->desc->prb[0] == 'e' ? 'p' : 'r',
>> -			     FBT_GROUP_DATA, prp->desc->fun, prp->desc->fun);
>> +			     FBT_GROUP_DATA, prp->desc->fun, fun);
>>  		close(fd);
>>  		if (rc == -1)
>>  			return -ENOENT;
>> -- 
>> 2.43.5
>>


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2024-11-28  9:58 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-16 15:54 [PATCH v3 dtrace 0/4] kprobe support for .isra.0, sched fix Alan Maguire
2024-10-16 15:54 ` [PATCH v3 dtrace 1/4] dt_provider_tp: add optional event data, freed on tp free Alan Maguire
2024-11-14  5:26   ` [DTrace-devel] " Kris Van Hees
2024-11-14 16:45     ` Alan Maguire
2024-10-16 15:54 ` [PATCH v3 dtrace 2/4] fbt: support "."-suffixed functions for kprobes Alan Maguire
2024-11-28  2:22   ` Kris Van Hees
2024-11-28  9:58     ` Alan Maguire
2024-10-16 15:54 ` [PATCH v3 dtrace 3/4] fbt: avoid mix of kprobe, fprobe implementations for used probes Alan Maguire
2024-10-16 15:54 ` [PATCH v3 dtrace 4/4] sched: fix on-cpu firing for kernels < 5.16 Alan Maguire
2024-10-18 15:41 ` [PATCH v3 dtrace 0/4] kprobe support for .isra.0, sched fix Kris Van Hees
2024-10-18 16:15   ` Alan Maguire
2024-10-18 18:38     ` Kris Van Hees

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox