Linux DTrace development list
 help / color / mirror / Atom feed
* [PATCH v5 07/19] Create the BPF usdt_names and usdt_prids maps
@ 2024-10-09  5:45 eugene.loh
  2024-10-09  5:45 ` [PATCH v2 1/6] Deferred attach of underlying USDT probes eugene.loh
  2024-10-10  4:55 ` [PATCH v5 07/19] Create the BPF usdt_names and usdt_prids maps Kris Van Hees
  0 siblings, 2 replies; 9+ messages in thread
From: eugene.loh @ 2024-10-09  5:45 UTC (permalink / raw)
  To: dtrace, dtrace-devel

From: Eugene Loh <eugene.loh@oracle.com>

As USDT processes come and go, the overlying probes for an underlying
probe will change.  Hence, we will move to a scheme in which an
underlying probe will walk all possible clauses it could call,
deciding at run time (using a bit mask for the overlying USDT probe)
which of the clauses to call.

In this patch, we create and update the BPF "usdt_prids" map.  This
is a hash map, where:

  *)  the key (size: dtp->dt_usdt_pridsmap_ksz) comprises
        - the PID of the firing process
        - the PRID of the underlying probe

  *)  the value (size: dtp->dt_usdt_pridsmap_vsz) comprises
        - the PRID over the overlying USDT probe
        - a bit mask indicating which clauses should be called

As USDT processes start up, we also add new overlying USDT probes,
whose name elements must be retrieved by prid for the built-in
variables probeprov, probemod, probefunc, and probename.  While
this is currently done with the BPF "probes" and "strtab" maps,
the monolithic strtab map cannot safely be updated by the consumer
while the kernel might be reading the map.  Therefore, we introduce
a new variable, dtp->dt_ninitprobes, that records the number of
initial probes, those created when a dtrace starts up.  For a
prid < dtp->ninitprobes, the existing scheme is used to retrieve
probe name elements.  For newer prids, necessarily USDT probes,
we use new a new BPF "usdt_names" map.

The underlying-probes "update" function is called sporadically.
The focus of this patch is the function.  When it is called can
be tuned in future patches.

The number of new USDT probes that can be accommodated is set by
the new "nusdtprobes" option.

The size of the bit mask -- that is, the greatest number of clauses
an underlying probe might call.  This is relatively easy to extend;
nevertheless, that work is left for a future patch.

Signed-off-by: Eugene Loh <eugene.loh@oracle.com>
---
 bpf/get_bvar.c                   |  72 ++++++----
 include/dtrace/options_defines.h |   4 +-
 libdtrace/dt_bpf.c               |  39 ++++++
 libdtrace/dt_bpf.h               |   1 +
 libdtrace/dt_cc.c                |   3 +
 libdtrace/dt_dlibs.c             |   3 +
 libdtrace/dt_impl.h              |   7 +-
 libdtrace/dt_open.c              |   9 ++
 libdtrace/dt_options.c           |   1 +
 libdtrace/dt_prov_uprobe.c       | 222 +++++++++++++++++++++++++++++++
 libdtrace/dt_work.c              |   3 +
 11 files changed, 337 insertions(+), 27 deletions(-)

diff --git a/bpf/get_bvar.c b/bpf/get_bvar.c
index 37f29a591..bc184c50f 100644
--- a/bpf/get_bvar.c
+++ b/bpf/get_bvar.c
@@ -21,6 +21,7 @@
 
 extern struct bpf_map_def cpuinfo;
 extern struct bpf_map_def probes;
+extern struct bpf_map_def usdt_names;
 extern struct bpf_map_def state;
 
 extern uint64_t PC;
@@ -29,6 +30,7 @@ extern uint64_t STKSIZ;
 extern uint64_t BOOTTM;
 extern uint64_t STACK_OFF;
 extern uint64_t STACK_SKIP;
+extern uint64_t NINITPROBES;
 
 #define error(dctx, fault, illval) \
 	({ \
@@ -122,32 +124,52 @@ noinline uint64_t dt_get_bvar(const dt_dctx_t *dctx, uint32_t id, uint32_t idx)
 	case DIF_VAR_PROBEMOD:
 	case DIF_VAR_PROBEFUNC:
 	case DIF_VAR_PROBENAME: {
-		uint32_t	key;
-		dt_bpf_probe_t	*pinfo;
-		uint64_t	off;
-
-		key = mst->prid;
-		pinfo = bpf_map_lookup_elem(&probes, &key);
-		if (pinfo == NULL)
-			return (uint64_t)dctx->strtab;
-
-		switch (id) {
-		case DIF_VAR_PROBEPROV:
-			off = pinfo->prv;
-			break;
-		case DIF_VAR_PROBEMOD:
-			off = pinfo->mod;
-			break;
-		case DIF_VAR_PROBEFUNC:
-			off = pinfo->fun;
-			break;
-		case DIF_VAR_PROBENAME:
-			off = pinfo->prb;
+		uint32_t	key = mst->prid;
+
+		if (key < ((uint64_t)&NINITPROBES)) {
+			dt_bpf_probe_t	*pinfo;
+			uint64_t	off;
+
+			pinfo = bpf_map_lookup_elem(&probes, &key);
+			if (pinfo == NULL)
+				return (uint64_t)dctx->strtab;
+
+			switch (id) {
+			case DIF_VAR_PROBEPROV:
+				off = pinfo->prv;
+				break;
+			case DIF_VAR_PROBEMOD:
+				off = pinfo->mod;
+				break;
+			case DIF_VAR_PROBEFUNC:
+				off = pinfo->fun;
+				break;
+			case DIF_VAR_PROBENAME:
+				off = pinfo->prb;
+			}
+			if (off > (uint64_t)&STBSZ)
+				return (uint64_t)dctx->strtab;
+
+			return (uint64_t)(dctx->strtab + off);
+		} else {
+			char *off;
+
+			off = bpf_map_lookup_elem(&usdt_names, &key);
+			if (off == NULL)
+				return (uint64_t)dctx->strtab;
+
+			switch (id) {
+			case DIF_VAR_PROBENAME:
+				off += DTRACE_FUNCNAMELEN;
+			case DIF_VAR_PROBEFUNC:
+				off += DTRACE_MODNAMELEN;
+			case DIF_VAR_PROBEMOD:
+				off += DTRACE_PROVNAMELEN;
+			case DIF_VAR_PROBEPROV:
+			}
+
+			return (uint64_t)off;
 		}
-		if (off > (uint64_t)&STBSZ)
-			return (uint64_t)dctx->strtab;
-
-		return (uint64_t)(dctx->strtab + off);
 	}
 	case DIF_VAR_PID: {
 		uint64_t	val = bpf_get_current_pid_tgid();
diff --git a/include/dtrace/options_defines.h b/include/dtrace/options_defines.h
index 80246be8c..7a49b89f2 100644
--- a/include/dtrace/options_defines.h
+++ b/include/dtrace/options_defines.h
@@ -63,7 +63,9 @@
 #define	DTRACEOPT_SCRATCHSIZE	33	/* max scratch size permitted */
 #define	DTRACEOPT_LOCKMEM	34	/* max locked memory */
 #define	DTRACEOPT_PRINTSIZE	35	/* max # bytes printed by print() action */
-#define	DTRACEOPT_MAX		36	/* number of options */
+#define	DTRACEOPT_NUSDTPROBES	36	/* max number of (added) USDT probes */
+
+#define	DTRACEOPT_MAX		37	/* number of options */
 
 #define	DTRACEOPT_UNSET		(dtrace_optval_t)-2	/* unset option */
 
diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
index ad11d178e..7e4a4461a 100644
--- a/libdtrace/dt_bpf.c
+++ b/libdtrace/dt_bpf.c
@@ -940,6 +940,44 @@ gmap_create_probes(dtrace_hdl_t *dtp)
 	return 0;
 }
 
+/*
+ * Create the 'usdt_names' and 'usdt_prids' BPF maps.
+ *
+ * 'usdt_names':  for get_bvar() to look up probe name elements for any
+ *                prid that was created after the initial probes.
+ *
+ * 'usdt_prids':  a global hash map indexed by (pid, underlying probe ID).
+ *                The value is a probe ID for the overlying USDT probe and
+ *                a bit mask indicating which clauses to execute for this pid.
+ *
+ *                For a given (pid, PRID) key, there can be at most one
+ *                overlying USDT probe.
+ */
+static int
+gmap_create_usdt(dtrace_hdl_t *dtp)
+{
+	dtp->dt_usdt_namesmap_fd = create_gmap(dtp, "usdt_names", BPF_MAP_TYPE_HASH,
+	    sizeof(dtrace_id_t),
+	    DTRACE_PROVNAMELEN + DTRACE_MODNAMELEN + DTRACE_FUNCNAMELEN + DTRACE_NAMELEN,
+	    dtp->dt_options[DTRACEOPT_NUSDTPROBES]);
+	if (dtp->dt_usdt_namesmap_fd == -1)
+		return -1;
+
+	dtp->dt_usdt_pridsmap_fd = create_gmap(dtp, "usdt_prids", BPF_MAP_TYPE_HASH,
+	    dtp->dt_usdt_pridsmap_ksz, dtp->dt_usdt_pridsmap_vsz,
+	    dtp->dt_options[DTRACEOPT_NUSDTPROBES]);
+	if (dtp->dt_usdt_pridsmap_fd == -1)
+		return -1;
+
+	dtp->dt_ninitprobes = dtp->dt_probe_id;
+
+	/* Populate the newly created map.  FIXME:  this is probably not the right place for this. */
+	if (dt_uprobe.update(dtp, NULL) < 0)
+		return -1;
+
+	return 0;
+}
+
 /*
  * Create the 'gvars' BPF map.
  *
@@ -1045,6 +1083,7 @@ dt_bpf_gmap_create(dtrace_hdl_t *dtp)
 	CREATE_MAP(scratchmem)
 	CREATE_MAP(strtab)
 	CREATE_MAP(probes)
+	CREATE_MAP(usdt)
 	CREATE_MAP(gvars)
 	CREATE_MAP(lvars)
 	CREATE_MAP(dvars)
diff --git a/libdtrace/dt_bpf.h b/libdtrace/dt_bpf.h
index 5716d2320..64f6c00a2 100644
--- a/libdtrace/dt_bpf.h
+++ b/libdtrace/dt_bpf.h
@@ -54,6 +54,7 @@ extern "C" {
 #define DT_CONST_ZERO_OFF		20
 #define DT_CONST_STACK_OFF		21
 #define DT_CONST_STACK_SKIP		22
+#define DT_CONST_NINITPROBES		23
 
 #define DT_BPF_LOG_SIZE_DEFAULT	(UINT32_MAX >> 8)
 #define DT_BPF_LOG_SIZE_SMALL	4096
diff --git a/libdtrace/dt_cc.c b/libdtrace/dt_cc.c
index 4202771a9..421084381 100644
--- a/libdtrace/dt_cc.c
+++ b/libdtrace/dt_cc.c
@@ -1064,6 +1064,9 @@ dt_link_construct(dtrace_hdl_t *dtp, const dt_probe_t *prp, dtrace_difo_t *dp,
 				nrp->dofr_data = sizeof(uint64_t)
 				    * dtp->dt_options[DTRACEOPT_MAXFRAMES];
 				continue;
+			case DT_CONST_NINITPROBES:
+				nrp->dofr_data = dtp->dt_ninitprobes;
+				continue;
 			case DT_CONST_BOOTTM:
 				if (boottime == 0 && get_boottime())
 					return -1;
diff --git a/libdtrace/dt_dlibs.c b/libdtrace/dt_dlibs.c
index ba4d4abef..f7b31f7be 100644
--- a/libdtrace/dt_dlibs.c
+++ b/libdtrace/dt_dlibs.c
@@ -71,6 +71,8 @@ static const dt_ident_t		dt_bpf_symbols[] = {
 	DT_BPF_SYMBOL(state, DT_IDENT_PTR),
 	DT_BPF_SYMBOL(strtab, DT_IDENT_PTR),
 	DT_BPF_SYMBOL(tuples, DT_IDENT_PTR),
+	DT_BPF_SYMBOL(usdt_names, DT_IDENT_PTR),
+	DT_BPF_SYMBOL(usdt_prids, DT_IDENT_PTR),
 
 	/* BPF internal identifiers */
 	DT_BPF_SYMBOL_ID(PRID, DT_IDENT_SCALAR, DT_CONST_PRID),
@@ -95,6 +97,7 @@ static const dt_ident_t		dt_bpf_symbols[] = {
 	DT_BPF_SYMBOL_ID(ZERO_OFF, DT_IDENT_SCALAR, DT_CONST_ZERO_OFF),
 	DT_BPF_SYMBOL_ID(STACK_OFF, DT_IDENT_SCALAR, DT_CONST_STACK_OFF),
 	DT_BPF_SYMBOL_ID(STACK_SKIP, DT_IDENT_SCALAR, DT_CONST_STACK_SKIP),
+	DT_BPF_SYMBOL_ID(NINITPROBES, DT_IDENT_SCALAR, DT_CONST_NINITPROBES),
 
 	/* End-of-list marker */
 	{ NULL, }
diff --git a/libdtrace/dt_impl.h b/libdtrace/dt_impl.h
index 340dc1960..198405db2 100644
--- a/libdtrace/dt_impl.h
+++ b/libdtrace/dt_impl.h
@@ -319,7 +319,8 @@ struct dtrace_hdl {
 	 */
 	struct dt_probe **dt_probes; /* array of probes */
 	size_t dt_probes_sz;	/* size of array of probes */
-	uint32_t dt_probe_id;	/* next available probe id */
+	dtrace_id_t dt_probe_id; /* next available probe id */
+	dtrace_id_t dt_ninitprobes; /* number of initial probes */
 
 	struct dt_probe *dt_error; /* ERROR probe */
 
@@ -389,6 +390,10 @@ struct dtrace_hdl {
 	int dt_aggmap_fd;	/* file descriptor for the 'aggs' BPF map */
 	int dt_genmap_fd;	/* file descriptor for the 'agggen' BPF map */
 	int dt_cpumap_fd;	/* file descriptor for the 'cpuinfo' BPF map */
+	int dt_usdt_pridsmap_fd; /* file descriptor for the 'usdt_prids' BPF map */
+	int dt_usdt_pridsmap_ksz; /* 'usdt_prids' BPF map key size */
+	int dt_usdt_pridsmap_vsz; /* 'usdt_prids' BPF map value size */
+	int dt_usdt_namesmap_fd; /* file descriptor for the 'usdt_names' BPF map */
 	dtrace_handle_err_f *dt_errhdlr; /* error handler, if any */
 	void *dt_errarg;	/* error handler argument */
 	dtrace_handle_drop_f *dt_drophdlr; /* drop handler, if any */
diff --git a/libdtrace/dt_open.c b/libdtrace/dt_open.c
index 848141ddc..113ffff2b 100644
--- a/libdtrace/dt_open.c
+++ b/libdtrace/dt_open.c
@@ -817,6 +817,15 @@ dt_vopen(int version, int flags, int *errp,
 	dtp->dt_options[DTRACEOPT_SWITCHRATE] = 0;
 	dtp->dt_options[DTRACEOPT_AGGRATE] = 0;
 
+	/*
+	 * Set the default maximum number of (added) USDT probes.
+	 */
+	dtp->dt_options[DTRACEOPT_NUSDTPROBES] = 256;
+
+	/*
+	 * Pre-processor.
+	 */
+
 	dtp->dt_cpp_argv[0] = (char *)strbasename(dtp->dt_cpp_path);
 
 	snprintf(isadef, sizeof(isadef), "-D__SUNW_D_%u",
diff --git a/libdtrace/dt_options.c b/libdtrace/dt_options.c
index ec53358b3..377b396b0 100644
--- a/libdtrace/dt_options.c
+++ b/libdtrace/dt_options.c
@@ -1137,6 +1137,7 @@ static const dt_option_t _dtrace_rtoptions[] = {
 	{ "jstackstrsize", dt_opt_size, DTRACEOPT_JSTACKSTRSIZE },
 	{ "lockmem", dt_opt_lockmem, DTRACEOPT_LOCKMEM },
 	{ "maxframes", dt_opt_runtime, DTRACEOPT_MAXFRAMES },
+	{ "nusdtprobes", dt_opt_runtime, DTRACEOPT_NUSDTPROBES },
 	{ "nspec", dt_opt_runtime, DTRACEOPT_NSPEC },
 	{ "pcapsize", dt_opt_pcapsize, DTRACEOPT_PCAPSIZE },
 	{ "scratchsize", dt_opt_scratchsize, DTRACEOPT_SCRATCHSIZE },
diff --git a/libdtrace/dt_prov_uprobe.c b/libdtrace/dt_prov_uprobe.c
index bb172ace2..b02b84263 100644
--- a/libdtrace/dt_prov_uprobe.c
+++ b/libdtrace/dt_prov_uprobe.c
@@ -37,8 +37,10 @@
 #include "dt_list.h"
 #include "dt_provider_tp.h"
 #include "dt_probe.h"
+#include "dt_program.h"
 #include "dt_pid.h"
 #include "dt_string.h"
+#include "port.h"
 
 /* Provider name for the underlying probes. */
 static const char	prvname[] = "uprobe";
@@ -63,6 +65,15 @@ typedef struct list_probe {
 	dt_probe_t	*probe;
 } list_probe_t;
 
+typedef struct usdt_prids_map_key {
+	pid_t		pid;
+	dtrace_id_t	uprid;
+} usdt_prids_map_key_t;
+typedef struct usdt_prids_map_val {
+	dtrace_id_t	prid;
+	long long	mask;
+} usdt_prids_map_val_t;
+
 static const dtrace_pattr_t	pattr = {
 { DTRACE_STABILITY_EVOLVING, DTRACE_STABILITY_EVOLVING, DTRACE_CLASS_ISA },
 { DTRACE_STABILITY_PRIVATE, DTRACE_STABILITY_PRIVATE, DTRACE_CLASS_UNKNOWN },
@@ -77,6 +88,9 @@ dt_provimpl_t	dt_usdt;
 
 static int populate(dtrace_hdl_t *dtp)
 {
+	dtp->dt_usdt_pridsmap_ksz = sizeof(usdt_prids_map_key_t);
+	dtp->dt_usdt_pridsmap_vsz = sizeof(usdt_prids_map_val_t);
+
 	if (dt_provider_create(dtp, dt_uprobe.name, &dt_uprobe, &pattr,
 			       NULL) == NULL ||
 	    dt_provider_create(dtp, dt_uprobe_is_enabled.name,
@@ -123,6 +137,213 @@ static void probe_destroy(dtrace_hdl_t *dtp, void *datap)
 	free_probe_list(dtp, datap);
 }
 
+/*
+ * Clean up the BPF "usdt_prids" map.
+ */
+static int
+purge_BPFmap(dtrace_hdl_t *dtp)
+{
+	int			fdprids = dtp->dt_usdt_pridsmap_fd;
+	int			fdnames = dtp->dt_usdt_namesmap_fd;
+	usdt_prids_map_key_t	key, nxt;
+	usdt_prids_map_val_t	val;
+
+	/* Initialize key to a pid/uprid that cannot be found. */
+	key.pid = 0;
+	key.uprid = 0;
+
+	/* Loop over all entries. */
+	while (dt_bpf_map_next_key(fdprids, &key, &nxt) == 0) {
+		memcpy(&key, &nxt, sizeof(usdt_prids_map_key_t));
+
+		if (dt_bpf_map_lookup(fdprids, &key, &val) == -1)
+			return dt_set_errno(dtp, EDT_BPF);
+
+		/* Check if the process is still running. */
+		if (!Pexists(key.pid)) {
+			dt_bpf_map_delete(fdprids, &key);          // FIXME:  bpf_map_next_key() iteration restarts each time we delete an elem!!!
+			dt_bpf_map_delete(fdnames, &val.prid);
+			continue;
+		}
+
+		/*
+		 * FIXME.  There might be another case, where the process
+		 * is still running, but some of its USDT probes are gone?
+		 * So maybe we have to check for the existence of one of
+		 *     dtrace_probedesc_t *pdp = dtp->dt_probes[val.prid]->desc;
+		 *     char *prv = ...pdp->prv minus the numerial part;
+		 *
+		 *     /run/dtrace/probes/$pid/$pdp->prv/$pdp->mod/$pdp->fun/$pdp->prb
+		 *     /run/dtrace/stash/dof-pid/$pid/0/parsed/$prv:$pdp->mod:$pdp->fun:$pdp->prb
+		 *     /run/dtrace/stash/dof-pid/$pid/.../parsed/$prv:$pdp->mod:$pdp->fun:$pdp->prb
+		 */
+	}
+
+	return 0;
+}
+
+/*
+ * Update the uprobe provider.
+ */
+static int update_uprobe(dtrace_hdl_t *dtp, void *datap)
+{
+	dt_probe_t	*prp;
+	dt_probe_t	*prp_next;
+	int		i, prid = dtp->dt_probe_id;
+	dt_pcb_t	pcb;
+	int		fd = dtp->dt_usdt_namesmap_fd;
+	char		probnam[DTRACE_PROVNAMELEN + DTRACE_MODNAMELEN + DTRACE_FUNCNAMELEN + DTRACE_NAMELEN];
+
+	/* Clear stale pids. */
+	purge_BPFmap(dtp);
+
+	/* Update dt_probes[] and dt_enablings. */
+	/*
+	 * pcb is only used inside of dt_pid_error() to get:
+	 *     pcb->pcb_region
+	 *     pcb->pcb_filetag
+	 *     pcb->pcb_fileptr
+	 * While pcb cannot be NULL, these other things apparently can be.
+	 */
+	memset(&pcb, 0, sizeof(dt_pcb_t));
+	for (i = 0; i < dtp->dt_stmt_nextid; i++) {
+		dtrace_stmtdesc_t *stp;
+
+		stp = dtp->dt_stmts[i];
+		assert(stp != NULL);
+		dt_pid_create_usdt_probes(&stp->dtsd_ecbdesc->dted_probe, dtp, &pcb);
+	}
+
+	/*
+	 * Add probe name elements for any new USDT probes.
+	 */
+	while (prid < dtp->dt_probe_id) {
+		const dtrace_probedesc_t *pdp = dtp->dt_probes[prid]->desc;
+
+		dt_probe_enable(dtp, dtp->dt_probes[prid]);
+
+		if (dtp->dt_probes[prid]->prov->impl == &dt_usdt) {
+			snprintf(probnam, DTRACE_PROVNAMELEN, "%s", pdp->prv);
+			snprintf(&probnam[DTRACE_PROVNAMELEN],
+					  DTRACE_MODNAMELEN, "%s", pdp->mod);
+			snprintf(&probnam[DTRACE_PROVNAMELEN +
+					  DTRACE_MODNAMELEN],
+					  DTRACE_FUNCNAMELEN, "%s", pdp->fun);
+			snprintf(&probnam[DTRACE_PROVNAMELEN +
+					  DTRACE_MODNAMELEN +
+					  DTRACE_FUNCNAMELEN],
+					  DTRACE_NAMELEN, "%s", pdp->prb);
+			if (dt_bpf_map_update(fd, &prid, probnam) == -1)
+				assert(0);   // FIXME do something here
+		}
+
+		prid++;
+	}
+
+	/* Review enablings. */
+	for (prp = dt_list_next(&dtp->dt_enablings); prp != NULL; prp = prp_next) {
+		pid_t		pid;
+		list_probe_t	*pup;
+
+		prp_next = dt_list_next(prp);
+
+		/* Make sure it is an overlying USDT probe. */
+		if (prp->prov->impl != &dt_usdt)
+			continue;
+
+		/* FIXME passing in NULL pcb and dpr wreaks havoc on error reporting? */
+		/*
+		 * Nick writes:
+		 * This is a general problem with running compiler-adjacent things outside
+		 * compile time. I think we should adjust dt_pid_error() so that it works
+		 * with NULL pcb and dpr at once, probably by using the code path for
+		 * pcb != NULL and augmenting it so that it passes in NULL for the region and
+		 * filename args and 0 for the lineno if pcb is NULL. (dt_set_errmsg can
+		 * already handle this case.)
+		 */
+		pid = dt_pid_get_pid(prp->desc, dtp, NULL, NULL);
+
+		if (!Pexists(pid)) {
+			/* Remove from enablings. */
+			dt_list_delete(&dtp->dt_enablings, prp);
+
+			/* Make it evident from the probe that it is not in enablings. */
+			((dt_list_t *)prp)->dl_prev = NULL;
+			((dt_list_t *)prp)->dl_next = NULL;
+
+			/* Free up its list of underlying probes. */
+			while ((pup = dt_list_next(prp->prv_data)) != NULL) {
+				dt_list_delete(prp->prv_data, pup);
+				dt_free(dtp, pup);
+			}
+			dt_free(dtp, prp->prv_data);
+			prp->prv_data = NULL;
+
+			continue;
+		}
+
+		for (pup = prp->prv_data; pup != NULL; pup = dt_list_next(pup)) {
+			dt_probe_t		*uprp = pup->probe;
+			long long		mask = 0, bit = 1;
+			usdt_prids_map_key_t	key;
+			usdt_prids_map_val_t	val, oval;
+			dt_uprobe_t		*upp = uprp->prv_data;
+
+			/*
+			 * For is-enabled probes, the bit mask does not matter.
+			 * It is possible that we have this underlying probe due to
+			 * an overlying pid-offset probe and that we will not know
+			 * until later, when some new pid is created, that we also
+			 * have an overlying USDT is-enabled probe, but missing this
+			 * optimization opportunity is okay.
+			 */
+			if (uprp->prov->impl == &dt_uprobe && !(upp->flags & PP_IS_ENABLED)) {
+				int n;
+
+				for (n = 0; n < dtp->dt_stmt_nextid; n++) {
+					dtrace_stmtdesc_t *stp;
+
+					stp = dtp->dt_stmts[n];
+					assert(stp != NULL);
+
+					if (dt_gmatch(prp->desc->prv, stp->dtsd_ecbdesc->dted_probe.prv) &&
+					    dt_gmatch(prp->desc->mod, stp->dtsd_ecbdesc->dted_probe.mod) &&
+					    dt_gmatch(prp->desc->fun, stp->dtsd_ecbdesc->dted_probe.fun) &&
+					    dt_gmatch(prp->desc->prb, stp->dtsd_ecbdesc->dted_probe.prb))
+						mask |= bit;
+
+					bit <<= 1;
+				}
+			}
+
+			key.pid = pid;
+			key.uprid = uprp->desc->id;
+
+			val.prid = prp->desc->id;
+			val.mask = mask;
+
+			/* linux/bpf.h says error is -1, but it could be -2 */
+			if (dt_bpf_map_lookup(dtp->dt_usdt_pridsmap_fd, &key, &oval) < 0) {
+				/*
+				 * Set the map entry.
+				 */
+				// FIXME Check return value, but how should errors be handled?
+				dt_bpf_map_update(dtp->dt_usdt_pridsmap_fd, &key, &val);
+			} else if (val.prid != oval.prid || val.mask != oval.mask) {
+				/*
+				 * This can happen when two overlying probes map to the same underlying probe for the same pid.
+				 * E.g., pid:::entry and pid:::0, or pid:::$offset and usdt:::.
+				 */
+			} else {
+				/*
+				 * Nothing to do, it already is in the map.
+				 */
+			}
+		}
+	}
+
+	return 0;
+}
 
 /*
  * Look up or create an underlying (real) probe, corresponding directly to a
@@ -782,6 +1003,7 @@ dt_provimpl_t	dt_uprobe = {
 	.probe_info	= &probe_info,
 	.detach		= &detach,
 	.probe_destroy	= &probe_destroy_underlying,
+	.update		= &update_uprobe,
 };
 
 /*
diff --git a/libdtrace/dt_work.c b/libdtrace/dt_work.c
index 11335345a..2e7884cb9 100644
--- a/libdtrace/dt_work.c
+++ b/libdtrace/dt_work.c
@@ -362,6 +362,9 @@ dtrace_work(dtrace_hdl_t *dtp, FILE *fp, dtrace_consume_probe_f *pfunc,
 	dtrace_workstatus_t	rval;
 	int			gen;
 
+	if (dt_uprobe.update(dtp, NULL) < 0)
+		return DTRACE_WORKSTATUS_ERROR;
+
 	switch (dtrace_status(dtp)) {
 	case DTRACE_STATUS_EXITED:
 	case DTRACE_STATUS_STOPPED:
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 1/6] Deferred attach of underlying USDT probes
  2024-10-09  5:45 [PATCH v5 07/19] Create the BPF usdt_names and usdt_prids maps eugene.loh
@ 2024-10-09  5:45 ` eugene.loh
  2024-10-16 15:58   ` Eugene Loh
  2024-10-10  4:55 ` [PATCH v5 07/19] Create the BPF usdt_names and usdt_prids maps Kris Van Hees
  1 sibling, 1 reply; 9+ messages in thread
From: eugene.loh @ 2024-10-09  5:45 UTC (permalink / raw)
  To: dtrace, dtrace-devel

From: Eugene Loh <eugene.loh@oracle.com>

We would like dtrace to trace USDT processes that start
even after the dtrace session has been launched.  In that
case, the underlying probe cannot be attached when dtrace
starts up;  rather, the BPF program must be attached once
the USDT process has been detected.

Therefore:

*)  Make dt_bpf_load_prog() callable outside of dt_bpf.c.

*)  Have update_uprobe() call dt_construct(), dt_link(),
    dt_bpf_load_prog(), and attach() for any new underlying
    probes.

Signed-off-by: Eugene Loh <eugene.loh@oracle.com>
---
 libdtrace/dt_bpf.c                    |   2 +-
 libdtrace/dt_bpf.h                    |   3 +
 libdtrace/dt_prov_uprobe.c            |  65 +++++++++-
 test/unittest/usdt/tst.nusdtprobes.r  |   5 +
 test/unittest/usdt/tst.nusdtprobes.sh | 172 ++++++++++++++++++++++++++
 5 files changed, 244 insertions(+), 3 deletions(-)
 create mode 100644 test/unittest/usdt/tst.nusdtprobes.r
 create mode 100755 test/unittest/usdt/tst.nusdtprobes.sh

diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
index 7e4a4461a..7cdd73552 100644
--- a/libdtrace/dt_bpf.c
+++ b/libdtrace/dt_bpf.c
@@ -1132,7 +1132,7 @@ dt_bpf_reloc_prog(dtrace_hdl_t *dtp, const dtrace_difo_t *dp)
  *
  * Note that DTrace generates BPF programs that are licensed under the GPL.
  */
-static int
+int
 dt_bpf_load_prog(dtrace_hdl_t *dtp, const dt_probe_t *prp,
 		 const dtrace_difo_t *dp, uint_t cflags)
 {
diff --git a/libdtrace/dt_bpf.h b/libdtrace/dt_bpf.h
index 64f6c00a2..6757d4964 100644
--- a/libdtrace/dt_bpf.h
+++ b/libdtrace/dt_bpf.h
@@ -14,6 +14,7 @@
 #include <dtrace/difo.h>
 #include <dt_btf.h>
 #include <dt_impl.h>
+#include <dt_probe.h>
 
 struct dtrace_hdl;
 
@@ -89,6 +90,8 @@ extern int dt_bpf_prog_load(struct dtrace_hdl *, const struct dt_probe *prp,
 			    size_t sz);
 extern int dt_bpf_raw_tracepoint_open(const void *tp, int fd);
 extern int dt_bpf_make_progs(struct dtrace_hdl *, uint_t);
+extern int dt_bpf_load_prog(dtrace_hdl_t *dtp, const dt_probe_t *prp,
+			    const dtrace_difo_t *dp, uint_t cflags);
 extern int dt_bpf_load_progs(struct dtrace_hdl *, uint_t);
 extern void dt_bpf_init(struct dtrace_hdl *dtp);
 
diff --git a/libdtrace/dt_prov_uprobe.c b/libdtrace/dt_prov_uprobe.c
index 582fca6d1..f2ebf6cab 100644
--- a/libdtrace/dt_prov_uprobe.c
+++ b/libdtrace/dt_prov_uprobe.c
@@ -254,9 +254,13 @@ static int update_uprobe(dtrace_hdl_t *dtp, void *datap)
 	dt_probe_t	*prp_next;
 	int		i, prid = dtp->dt_probe_id;
 	dt_pcb_t	pcb;
+	dtrace_optval_t	dest_ok = DTRACEOPT_UNSET;
 	int		fd = dtp->dt_usdt_namesmap_fd;
 	char		probnam[DTRACE_PROVNAMELEN + DTRACE_MODNAMELEN + DTRACE_FUNCNAMELEN + DTRACE_NAMELEN];
 
+	/* Determine whether destructive actions are okay. */
+	dtrace_getopt(dtp, "destructive", &dest_ok);
+
 	/* Clear stale pids. */
 	purge_BPFmap(dtp);
 
@@ -311,6 +315,61 @@ static int update_uprobe(dtrace_hdl_t *dtp, void *datap)
 
 		prp_next = dt_list_next(prp);
 
+		/* Handle underlying probe. */
+		if (prp->prov->impl == &dt_uprobe) {
+			dtrace_difo_t   *dp;
+			int		cflags, fd, rc = -1;
+
+			/*
+			 * Strictly speaking, we want the value passed in to
+			 * dtrace_go().  In practice, its flags pertain to
+			 * compilation and disassembly, which at this stage
+			 * no longer interest us.
+			 */
+			cflags = 0;
+
+			/* Check if the probe is already set up. */
+			if (prp->difo)
+				continue;
+
+			/* Make program. */
+			dp = dt_construct(dtp, prp, cflags, NULL);
+			if (dp == NULL)
+				continue;        // FIXME in dt_bpf_make_progs() this is a fatal error; should we do the same here?
+			prp->difo = dp;
+
+			/* Load program. */
+			if (dt_link(dtp, prp, dp, NULL) == -1)
+				continue;        // FIXME in dt_bpf_load_progs() this is a fatal error; should we do the same here?
+			if (dp->dtdo_flags & DIFOFLG_DESTRUCTIVE &&
+			    dest_ok == DTRACEOPT_UNSET)
+				return dt_set_errno(dtp, EDT_DESTRUCTIVE);
+
+			fd = dt_bpf_load_prog(dtp, prp, dp, cflags);
+			if (fd == -1)
+				continue;        // FIXME in dt_bpf_load_progs() this is a fatal error; should we do the same here?
+
+			if (prp->prov->impl->attach)
+				rc = prp->prov->impl->attach(dtp, prp, fd);
+
+			if (rc == -ENOTSUPP) {
+				char    *s;
+
+				close(fd);
+				if (asprintf(&s, "Failed to enable %s:%s:%s:%s",
+					      prp->desc->prv, prp->desc->mod,
+					      prp->desc->fun, prp->desc->prb) == -1)
+					return dt_set_errno(dtp, EDT_ENABLING_ERR);
+				dt_handle_rawerr(dtp, s);
+				free(s);
+			} else if (rc < 0) {
+				close(fd);
+				return dt_set_errno(dtp, EDT_ENABLING_ERR);
+			}
+
+			continue;
+		}
+
 		/* Make sure it is an overlying USDT probe. */
 		if (prp->prov->impl != &dt_usdt)
 			continue;
@@ -398,9 +457,11 @@ static int update_uprobe(dtrace_hdl_t *dtp, void *datap)
 				dt_bpf_map_update(dtp->dt_usdt_pridsmap_fd, &key, &val);
 			} else if (val.prid != oval.prid || val.mask != oval.mask) {
 				/*
-				 * This can happen when two overlying probes map to the same underlying probe for the same pid.
-				 * E.g., pid:::entry and pid:::0, or pid:::$offset and usdt:::.
+				 * Two different USDT probes should never map to the same
+				 * underlying probe for the same pid.  Nor should the bit
+				 * mask ever change.
 				 */
+				assert(0);
 			} else {
 				/*
 				 * Nothing to do, it already is in the map.
diff --git a/test/unittest/usdt/tst.nusdtprobes.r b/test/unittest/usdt/tst.nusdtprobes.r
new file mode 100644
index 000000000..d894af92e
--- /dev/null
+++ b/test/unittest/usdt/tst.nusdtprobes.r
@@ -0,0 +1,5 @@
+try ""
+try "-xnusdtprobes=40"
+try "-xnusdtprobes=39"
+Files check.txt.sorted and dtrace.out.sorted differ
+success
diff --git a/test/unittest/usdt/tst.nusdtprobes.sh b/test/unittest/usdt/tst.nusdtprobes.sh
new file mode 100755
index 000000000..50f18a6ca
--- /dev/null
+++ b/test/unittest/usdt/tst.nusdtprobes.sh
@@ -0,0 +1,172 @@
+#!/bin/bash
+#
+# Oracle Linux DTrace.
+# Copyright (c) 2024, Oracle and/or its affiliates. All rights reserved.
+# Licensed under the Universal Permissive License v 1.0 as shown at
+# http://oss.oracle.com/licenses/upl.
+#
+# This test verifies the nusdtprobes option.
+# @@timeout: 100
+
+dtrace=$1
+
+# Set up test directory.
+
+DIRNAME=$tmpdir/nusdtprobes.$$.$RANDOM
+mkdir -p $DIRNAME
+cd $DIRNAME
+
+# Make the trigger.
+
+cat << EOF > prov.d
+provider testprov {
+	probe foo0();
+	probe foo1();
+	probe foo2();
+	probe foo3();
+	probe foo4();
+	probe foo5();
+	probe foo6();
+	probe foo7();
+	probe foo8();
+	probe foo9();
+};
+EOF
+
+cat << EOF > main.c
+#include <unistd.h>
+#include "prov.h"
+
+int
+main(int argc, char **argv)
+{
+	while (1) {
+		usleep(1000);
+
+		TESTPROV_FOO0();
+		TESTPROV_FOO1();
+		TESTPROV_FOO2();
+		TESTPROV_FOO3();
+		TESTPROV_FOO4();
+		TESTPROV_FOO5();
+		TESTPROV_FOO6();
+		TESTPROV_FOO7();
+		TESTPROV_FOO8();
+		TESTPROV_FOO9();
+	}
+
+	return 0;
+}
+EOF
+
+# Build the trigger.
+
+$dtrace -h -s prov.d
+if [ $? -ne 0 ]; then
+	echo "failed to generate header file" >&2
+	exit 1
+fi
+cc $test_cppflags -c main.c
+if [ $? -ne 0 ]; then
+	echo "failed to compile test" >&2
+	exit 1
+fi
+$dtrace -G -64 -s prov.d main.o
+if [ $? -ne 0 ]; then
+	echo "failed to create DOF" >&2
+	exit 1
+fi
+cc $test_cppflags -o main main.o prov.o
+if [ $? -ne 0 ]; then
+	echo "failed to link final executable" >&2
+	exit 1
+fi
+
+# Test nusdtprobes settings.
+#
+# We will start teams of processes, each with 4 members, each in turn
+# with 10 USDT probes.  So, regardless of how many teams are run in
+# succession, at any one time DTrace needs room for at least 40 USDT
+# probes.  The default and -xnusdtprobes=40 settings should work, but
+# -xnusdtprobes=39 should not.
+nteams=2
+nmmbrs=4
+
+for nusdt in "" "-xnusdtprobes=40" "-xnusdtprobes=39"; do
+
+	echo try '"'$nusdt'"'
+
+	# Start DTrace.
+
+	rm -f dtrace.out
+	$dtrace $dt_flags $nusdt -Zq -o dtrace.out -n '
+	testprov*:::
+	{
+		@[probeprov, probemod, probefunc, probename] = count();
+	}' &
+	dtpid=$!
+	sleep 2
+	if [[ ! -d /proc/$dtpid ]]; then
+		echo ERROR dtrace died
+		cat dtrace.out
+		exit 1
+	fi
+
+	# Start teams of processes, only one team at a time.
+
+	rm -f check.txt
+	for (( iteam = 0; iteam < $nteams; iteam++ )); do
+		# Start the team, writing out expected output.
+		sleep 2
+		for (( immbr = 0; immbr < $nmmbrs; immbr++ )); do
+			./main &
+			pids[$immbr]=$!
+			for j in `seq 0 9`; do
+				echo testprov${pids[$immbr]} main main foo$j >> check.txt
+			done
+		done
+
+		# Kill the team.
+		sleep 3
+		for (( immbr = 0; immbr < $nmmbrs; immbr++ )); do
+			kill ${pids[$immbr]}
+		done
+	done
+
+	# Kill DTrace.
+
+	kill $dtpid
+	wait
+
+	# Strip the count() value out since we do not know its exact value.
+
+	awk 'NF == 5 { print $1, $2, $3, $4 }' dtrace.out | sort > dtrace.out.sorted
+
+	# Check.
+
+	sort check.txt > check.txt.sorted
+	if [ x$nusdt == x"-xnusdtprobes=39" ]; then
+		# Results should not agree with check.txt.
+		if diff -q check.txt.sorted dtrace.out.sorted; then
+			echo ERROR unexpected agreement
+			cat dtrace.out
+			exit 0
+		fi
+	else
+		# Results should agree with check.txt.
+		if ! diff -q check.txt.sorted dtrace.out.sorted; then
+			echo ERROR output disagrees
+			echo === expected ===
+			cat check.txt.sorted
+			echo === got ===
+			cat dtrace.out.sorted
+			echo === diff ===
+			diff check.txt.sorted dtrace.out.sorted
+			exit 1
+		fi
+	fi
+done
+
+echo success
+
+exit 0
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v5 07/19] Create the BPF usdt_names and usdt_prids maps
  2024-10-09  5:45 [PATCH v5 07/19] Create the BPF usdt_names and usdt_prids maps eugene.loh
  2024-10-09  5:45 ` [PATCH v2 1/6] Deferred attach of underlying USDT probes eugene.loh
@ 2024-10-10  4:55 ` Kris Van Hees
  2024-10-10 20:58   ` Kris Van Hees
  2024-10-11 22:56   ` Eugene Loh
  1 sibling, 2 replies; 9+ messages in thread
From: Kris Van Hees @ 2024-10-10  4:55 UTC (permalink / raw)
  To: eugene.loh; +Cc: dtrace, dtrace-devel

On Wed, Oct 09, 2024 at 01:45:34AM -0400, eugene.loh@oracle.com wrote:
> From: Eugene Loh <eugene.loh@oracle.com>
> 
> As USDT processes come and go, the overlying probes for an underlying
> probe will change.  Hence, we will move to a scheme in which an
> underlying probe will walk all possible clauses it could call,
> deciding at run time (using a bit mask for the overlying USDT probe)
> which of the clauses to call.

I think this could be explained a bit better.  I'd say that it is the "set of
overlying probes for an underlying probe" that will change.  And I would add
that the underlying probe program will walk all possible clauses that any of
its overlying probes might call for each overlying probe, only executing the
clauses that apply to that overlying probe (using a bitmask).

> In this patch, we create and update the BPF "usdt_prids" map.  This
> is a hash map, where:
> 
>   *)  the key (size: dtp->dt_usdt_pridsmap_ksz) comprises
>         - the PID of the firing process
>         - the PRID of the underlying probe
> 
>   *)  the value (size: dtp->dt_usdt_pridsmap_vsz) comprises
>         - the PRID over the overlying USDT probe
>         - a bit mask indicating which clauses should be called

Why are these sizes stored in members of dtrace_hdl_t rather than being
defined as constants?  Since they are constant values, there should be no
reason to store them.  (Commenting on that below as well, for context.)
> 
> As USDT processes start up, we also add new overlying USDT probes,
> whose name elements must be retrieved by prid for the built-in
> variables probeprov, probemod, probefunc, and probename.  While
> this is currently done with the BPF "probes" and "strtab" maps,
> the monolithic strtab map cannot safely be updated by the consumer
> while the kernel might be reading the map.  Therefore, we introduce
> a new variable, dtp->dt_ninitprobes, that records the number of
> initial probes, those created when a dtrace starts up.  For a
> prid < dtp->ninitprobes, the existing scheme is used to retrieve
> probe name elements.  For newer prids, necessarily USDT probes,
> we use new a new BPF "usdt_names" map.

I think this could do with a bit of rewording for clarity.  I am also a bit
wondering about the dt_ninitprobes name.  I think that dt_nprobes would be
sufficient, as the number of probes that DTrace creates.  Any probes added
later are numbered higher than dt_nprobes.  Not an ideal name, but I think
it is sufficient (and pre-tracing is the only time we really care about
counting probes in general - the later ones are counter towards nusdtprobes).

> The underlying-probes "update" function is called sporadically.
> The focus of this patch is the function.  When it is called can
> be tuned in future patches.

This seems to be an old paragraph because the focus of this patch seems to
be the creation (and mgmt) of the two new maps.  Ideally, this would be
multiple patches actually (creating the maps, populating the maps, using the
maps, ...)  More on that function below...

> The number of new USDT probes that can be accommodated is set by
> the new "nusdtprobes" option.
> 
> The size of the bit mask -- that is, the greatest number of clauses
> an underlying probe might call.  This is relatively easy to extend;
> nevertheless, that work is left for a future patch.

The first sentence here seems to have part missing?  I assume you were
going to mention what the size is for now?

> Signed-off-by: Eugene Loh <eugene.loh@oracle.com>
> ---
>  bpf/get_bvar.c                   |  72 ++++++----
>  include/dtrace/options_defines.h |   4 +-
>  libdtrace/dt_bpf.c               |  39 ++++++
>  libdtrace/dt_bpf.h               |   1 +
>  libdtrace/dt_cc.c                |   3 +
>  libdtrace/dt_dlibs.c             |   3 +
>  libdtrace/dt_impl.h              |   7 +-
>  libdtrace/dt_open.c              |   9 ++
>  libdtrace/dt_options.c           |   1 +
>  libdtrace/dt_prov_uprobe.c       | 222 +++++++++++++++++++++++++++++++
>  libdtrace/dt_work.c              |   3 +
>  11 files changed, 337 insertions(+), 27 deletions(-)
> 
> diff --git a/bpf/get_bvar.c b/bpf/get_bvar.c
> index 37f29a591..bc184c50f 100644
> --- a/bpf/get_bvar.c
> +++ b/bpf/get_bvar.c
> @@ -21,6 +21,7 @@
>  
>  extern struct bpf_map_def cpuinfo;
>  extern struct bpf_map_def probes;
> +extern struct bpf_map_def usdt_names;
>  extern struct bpf_map_def state;

Pedantic but maybe add at the end for alphabetic order (since that is what
was there and there is not really any other ordering used here).

>  extern uint64_t PC;
> @@ -29,6 +30,7 @@ extern uint64_t STKSIZ;
>  extern uint64_t BOOTTM;
>  extern uint64_t STACK_OFF;
>  extern uint64_t STACK_SKIP;
> +extern uint64_t NINITPROBES;

As suggested above, I would name this NPROBES.

>  #define error(dctx, fault, illval) \
>  	({ \
> @@ -122,32 +124,52 @@ noinline uint64_t dt_get_bvar(const dt_dctx_t *dctx, uint32_t id, uint32_t idx)
>  	case DIF_VAR_PROBEMOD:
>  	case DIF_VAR_PROBEFUNC:
>  	case DIF_VAR_PROBENAME: {
> -		uint32_t	key;
> -		dt_bpf_probe_t	*pinfo;
> -		uint64_t	off;
> -
> -		key = mst->prid;
> -		pinfo = bpf_map_lookup_elem(&probes, &key);
> -		if (pinfo == NULL)
> -			return (uint64_t)dctx->strtab;
> -
> -		switch (id) {
> -		case DIF_VAR_PROBEPROV:
> -			off = pinfo->prv;
> -			break;
> -		case DIF_VAR_PROBEMOD:
> -			off = pinfo->mod;
> -			break;
> -		case DIF_VAR_PROBEFUNC:
> -			off = pinfo->fun;
> -			break;
> -		case DIF_VAR_PROBENAME:
> -			off = pinfo->prb;
> +		uint32_t	key = mst->prid;
> +
> +		if (key < ((uint64_t)&NINITPROBES)) {

		if (key < ((uint64_t)&NPROBES)) {

> +			dt_bpf_probe_t	*pinfo;
> +			uint64_t	off;
> +
> +			pinfo = bpf_map_lookup_elem(&probes, &key);
> +			if (pinfo == NULL)
> +				return (uint64_t)dctx->strtab;
> +
> +			switch (id) {
> +			case DIF_VAR_PROBEPROV:
> +				off = pinfo->prv;
> +				break;
> +			case DIF_VAR_PROBEMOD:
> +				off = pinfo->mod;
> +				break;
> +			case DIF_VAR_PROBEFUNC:
> +				off = pinfo->fun;
> +				break;
> +			case DIF_VAR_PROBENAME:
> +				off = pinfo->prb;
> +			}
> +			if (off > (uint64_t)&STBSZ)
> +				return (uint64_t)dctx->strtab;
> +
> +			return (uint64_t)(dctx->strtab + off);
> +		} else {
> +			char *off;

Hm?  I would not expect a variable named off to be a char *.  Especially since
the preceding leg of the conditional also has an off variable and that one is
uint64_t.  Why not name it s, str, or even ptr or something?

> +
> +			off = bpf_map_lookup_elem(&usdt_names, &key);
> +			if (off == NULL)
> +				return (uint64_t)dctx->strtab;
> +
> +			switch (id) {
> +			case DIF_VAR_PROBENAME:
> +				off += DTRACE_FUNCNAMELEN;
> +			case DIF_VAR_PROBEFUNC:
> +				off += DTRACE_MODNAMELEN;
> +			case DIF_VAR_PROBEMOD:
> +				off += DTRACE_PROVNAMELEN;
> +			case DIF_VAR_PROBEPROV:
> +			}
> +
> +			return (uint64_t)off;
>  		}
> -		if (off > (uint64_t)&STBSZ)
> -			return (uint64_t)dctx->strtab;
> -
> -		return (uint64_t)(dctx->strtab + off);
>  	}
>  	case DIF_VAR_PID: {
>  		uint64_t	val = bpf_get_current_pid_tgid();
> diff --git a/include/dtrace/options_defines.h b/include/dtrace/options_defines.h
> index 80246be8c..7a49b89f2 100644
> --- a/include/dtrace/options_defines.h
> +++ b/include/dtrace/options_defines.h
> @@ -63,7 +63,9 @@
>  #define	DTRACEOPT_SCRATCHSIZE	33	/* max scratch size permitted */
>  #define	DTRACEOPT_LOCKMEM	34	/* max locked memory */
>  #define	DTRACEOPT_PRINTSIZE	35	/* max # bytes printed by print() action */
> -#define	DTRACEOPT_MAX		36	/* number of options */
> +#define	DTRACEOPT_NUSDTPROBES	36	/* max number of (added) USDT probes */
> +
> +#define	DTRACEOPT_MAX		37	/* number of options */
>  
>  #define	DTRACEOPT_UNSET		(dtrace_optval_t)-2	/* unset option */
>  
> diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
> index ad11d178e..7e4a4461a 100644
> --- a/libdtrace/dt_bpf.c
> +++ b/libdtrace/dt_bpf.c
> @@ -940,6 +940,44 @@ gmap_create_probes(dtrace_hdl_t *dtp)
>  	return 0;
>  }
>  
> +/*
> + * Create the 'usdt_names' and 'usdt_prids' BPF maps.
> + *
> + * 'usdt_names':  for get_bvar() to look up probe name elements for any
> + *                prid that was created after the initial probes.

The convention is to mention what type of map this is, what the key is, and
what the value is.

> + *
> + * 'usdt_prids':  a global hash map indexed by (pid, underlying probe ID).
> + *                The value is a probe ID for the overlying USDT probe and
> + *                a bit mask indicating which clauses to execute for this pid.
> + *
> + *                For a given (pid, PRID) key, there can be at most one
> + *                overlying USDT probe.
> + */
> +static int
> +gmap_create_usdt(dtrace_hdl_t *dtp)
> +{
> +	dtp->dt_usdt_namesmap_fd = create_gmap(dtp, "usdt_names", BPF_MAP_TYPE_HASH,
> +	    sizeof(dtrace_id_t),
> +	    DTRACE_PROVNAMELEN + DTRACE_MODNAMELEN + DTRACE_FUNCNAMELEN + DTRACE_NAMELEN,

Any reason why you are not using DTRACE_FULLNAMELEN?

> +	    dtp->dt_options[DTRACEOPT_NUSDTPROBES]);

Probably want to put this in a variable because you will use it again below.

> +	if (dtp->dt_usdt_namesmap_fd == -1)
> +		return -1;
> +
> +	dtp->dt_usdt_pridsmap_fd = create_gmap(dtp, "usdt_prids", BPF_MAP_TYPE_HASH,
> +	    dtp->dt_usdt_pridsmap_ksz, dtp->dt_usdt_pridsmap_vsz,

These are constants so there is no need to store them in dtrace_hdl_t and use
them here.  Just define them in dt_bpf_maps.h and use those defs here.

> +	    dtp->dt_options[DTRACEOPT_NUSDTPROBES]);

empty line perhaps?0~

> +	if (dtp->dt_usdt_pridsmap_fd == -1)
> +		return -1;
> +
> +	dtp->dt_ninitprobes = dtp->dt_probe_id;

	dtp->dt_nprobes = dtp->dt_probe_id;

> +
> +	/* Populate the newly created map.  FIXME:  this is probably not the right place for this. */
> +	if (dt_uprobe.update(dtp, NULL) < 0)
> +		return -1;

Yes and no.  You are using the probe name data for USDT probes that already
exist at this time from the strtab so there is no need to update the usdt_names
map at this point.  The usdt_prids map does need updating, but I don't think it
makes sense to do that here.  During program load would seem a much better
time.  Also, we certainly wouldn't want to use a direct call to a function in
a specific provider.

> +
> +	return 0;
> +}
> +
>  /*
>   * Create the 'gvars' BPF map.
>   *
> @@ -1045,6 +1083,7 @@ dt_bpf_gmap_create(dtrace_hdl_t *dtp)
>  	CREATE_MAP(scratchmem)
>  	CREATE_MAP(strtab)
>  	CREATE_MAP(probes)
> +	CREATE_MAP(usdt)
>  	CREATE_MAP(gvars)
>  	CREATE_MAP(lvars)
>  	CREATE_MAP(dvars)
> diff --git a/libdtrace/dt_bpf.h b/libdtrace/dt_bpf.h
> index 5716d2320..64f6c00a2 100644
> --- a/libdtrace/dt_bpf.h
> +++ b/libdtrace/dt_bpf.h
> @@ -54,6 +54,7 @@ extern "C" {
>  #define DT_CONST_ZERO_OFF		20
>  #define DT_CONST_STACK_OFF		21
>  #define DT_CONST_STACK_SKIP		22
> +#define DT_CONST_NINITPROBES		23

DT_CONST_NPROBES

>  
>  #define DT_BPF_LOG_SIZE_DEFAULT	(UINT32_MAX >> 8)
>  #define DT_BPF_LOG_SIZE_SMALL	4096
> diff --git a/libdtrace/dt_cc.c b/libdtrace/dt_cc.c
> index 4202771a9..421084381 100644
> --- a/libdtrace/dt_cc.c
> +++ b/libdtrace/dt_cc.c
> @@ -1064,6 +1064,9 @@ dt_link_construct(dtrace_hdl_t *dtp, const dt_probe_t *prp, dtrace_difo_t *dp,
>  				nrp->dofr_data = sizeof(uint64_t)
>  				    * dtp->dt_options[DTRACEOPT_MAXFRAMES];
>  				continue;
> +			case DT_CONST_NINITPROBES:
> +				nrp->dofr_data = dtp->dt_ninitprobes;

			case DT_CONST_NPROBES:
				nrp->dofr_data = dtp->dt_nprobes;

> +				continue;
>  			case DT_CONST_BOOTTM:
>  				if (boottime == 0 && get_boottime())
>  					return -1;
> diff --git a/libdtrace/dt_dlibs.c b/libdtrace/dt_dlibs.c
> index ba4d4abef..f7b31f7be 100644
> --- a/libdtrace/dt_dlibs.c
> +++ b/libdtrace/dt_dlibs.c
> @@ -71,6 +71,8 @@ static const dt_ident_t		dt_bpf_symbols[] = {
>  	DT_BPF_SYMBOL(state, DT_IDENT_PTR),
>  	DT_BPF_SYMBOL(strtab, DT_IDENT_PTR),
>  	DT_BPF_SYMBOL(tuples, DT_IDENT_PTR),
> +	DT_BPF_SYMBOL(usdt_names, DT_IDENT_PTR),
> +	DT_BPF_SYMBOL(usdt_prids, DT_IDENT_PTR),
>  
>  	/* BPF internal identifiers */
>  	DT_BPF_SYMBOL_ID(PRID, DT_IDENT_SCALAR, DT_CONST_PRID),
> @@ -95,6 +97,7 @@ static const dt_ident_t		dt_bpf_symbols[] = {
>  	DT_BPF_SYMBOL_ID(ZERO_OFF, DT_IDENT_SCALAR, DT_CONST_ZERO_OFF),
>  	DT_BPF_SYMBOL_ID(STACK_OFF, DT_IDENT_SCALAR, DT_CONST_STACK_OFF),
>  	DT_BPF_SYMBOL_ID(STACK_SKIP, DT_IDENT_SCALAR, DT_CONST_STACK_SKIP),
> +	DT_BPF_SYMBOL_ID(NINITPROBES, DT_IDENT_SCALAR, DT_CONST_NINITPROBES),

	DT_BPF_SYMBOL_ID(NPROBES, DT_IDENT_SCALAR, DT_CONST_NPROBES),

>  
>  	/* End-of-list marker */
>  	{ NULL, }
> diff --git a/libdtrace/dt_impl.h b/libdtrace/dt_impl.h
> index 340dc1960..198405db2 100644
> --- a/libdtrace/dt_impl.h
> +++ b/libdtrace/dt_impl.h
> @@ -319,7 +319,8 @@ struct dtrace_hdl {
>  	 */
>  	struct dt_probe **dt_probes; /* array of probes */
>  	size_t dt_probes_sz;	/* size of array of probes */
> -	uint32_t dt_probe_id;	/* next available probe id */
> +	dtrace_id_t dt_probe_id; /* next available probe id */
> +	dtrace_id_t dt_ninitprobes; /* number of initial probes */

	dtrace_id_t dt_nprobes; /* number of probes (pre-'go') */

>  
>  	struct dt_probe *dt_error; /* ERROR probe */
>  
> @@ -389,6 +390,10 @@ struct dtrace_hdl {
>  	int dt_aggmap_fd;	/* file descriptor for the 'aggs' BPF map */
>  	int dt_genmap_fd;	/* file descriptor for the 'agggen' BPF map */
>  	int dt_cpumap_fd;	/* file descriptor for the 'cpuinfo' BPF map */
> +	int dt_usdt_pridsmap_fd; /* file descriptor for the 'usdt_prids' BPF map */
> +	int dt_usdt_pridsmap_ksz; /* 'usdt_prids' BPF map key size */
> +	int dt_usdt_pridsmap_vsz; /* 'usdt_prids' BPF map value size */

No need for these two size variables.  These values are constants.

> +	int dt_usdt_namesmap_fd; /* file descriptor for the 'usdt_names' BPF map */
>  	dtrace_handle_err_f *dt_errhdlr; /* error handler, if any */
>  	void *dt_errarg;	/* error handler argument */
>  	dtrace_handle_drop_f *dt_drophdlr; /* drop handler, if any */
> diff --git a/libdtrace/dt_open.c b/libdtrace/dt_open.c
> index 848141ddc..113ffff2b 100644
> --- a/libdtrace/dt_open.c
> +++ b/libdtrace/dt_open.c
> @@ -817,6 +817,15 @@ dt_vopen(int version, int flags, int *errp,
>  	dtp->dt_options[DTRACEOPT_SWITCHRATE] = 0;
>  	dtp->dt_options[DTRACEOPT_AGGRATE] = 0;
>  
> +	/*
> +	 * Set the default maximum number of (added) USDT probes.
> +	 */
> +	dtp->dt_options[DTRACEOPT_NUSDTPROBES] = 256;
> +
> +	/*
> +	 * Pre-processor.
> +	 */
> +

Drop empty line.

>  	dtp->dt_cpp_argv[0] = (char *)strbasename(dtp->dt_cpp_path);
>  
>  	snprintf(isadef, sizeof(isadef), "-D__SUNW_D_%u",
> diff --git a/libdtrace/dt_options.c b/libdtrace/dt_options.c
> index ec53358b3..377b396b0 100644
> --- a/libdtrace/dt_options.c
> +++ b/libdtrace/dt_options.c
> @@ -1137,6 +1137,7 @@ static const dt_option_t _dtrace_rtoptions[] = {
>  	{ "jstackstrsize", dt_opt_size, DTRACEOPT_JSTACKSTRSIZE },
>  	{ "lockmem", dt_opt_lockmem, DTRACEOPT_LOCKMEM },
>  	{ "maxframes", dt_opt_runtime, DTRACEOPT_MAXFRAMES },
> +	{ "nusdtprobes", dt_opt_runtime, DTRACEOPT_NUSDTPROBES },
>  	{ "nspec", dt_opt_runtime, DTRACEOPT_NSPEC },
>  	{ "pcapsize", dt_opt_pcapsize, DTRACEOPT_PCAPSIZE },
>  	{ "scratchsize", dt_opt_scratchsize, DTRACEOPT_SCRATCHSIZE },
> diff --git a/libdtrace/dt_prov_uprobe.c b/libdtrace/dt_prov_uprobe.c
> index bb172ace2..b02b84263 100644
> --- a/libdtrace/dt_prov_uprobe.c
> +++ b/libdtrace/dt_prov_uprobe.c
> @@ -37,8 +37,10 @@
>  #include "dt_list.h"
>  #include "dt_provider_tp.h"
>  #include "dt_probe.h"
> +#include "dt_program.h"
>  #include "dt_pid.h"
>  #include "dt_string.h"
> +#include "port.h"
>  
>  /* Provider name for the underlying probes. */
>  static const char	prvname[] = "uprobe";
> @@ -63,6 +65,15 @@ typedef struct list_probe {
>  	dt_probe_t	*probe;
>  } list_probe_t;
>  
> +typedef struct usdt_prids_map_key {
> +	pid_t		pid;
> +	dtrace_id_t	uprid;
> +} usdt_prids_map_key_t;
> +typedef struct usdt_prids_map_val {
> +	dtrace_id_t	prid;
> +	long long	mask;
> +} usdt_prids_map_val_t;

Move these to dt_bpf_maps.h?

> +
>  static const dtrace_pattr_t	pattr = {
>  { DTRACE_STABILITY_EVOLVING, DTRACE_STABILITY_EVOLVING, DTRACE_CLASS_ISA },
>  { DTRACE_STABILITY_PRIVATE, DTRACE_STABILITY_PRIVATE, DTRACE_CLASS_UNKNOWN },
> @@ -77,6 +88,9 @@ dt_provimpl_t	dt_usdt;
>  
>  static int populate(dtrace_hdl_t *dtp)
>  {
> +	dtp->dt_usdt_pridsmap_ksz = sizeof(usdt_prids_map_key_t);
> +	dtp->dt_usdt_pridsmap_vsz = sizeof(usdt_prids_map_val_t);

Unnecessary.

> +
>  	if (dt_provider_create(dtp, dt_uprobe.name, &dt_uprobe, &pattr,
>  			       NULL) == NULL ||
>  	    dt_provider_create(dtp, dt_uprobe_is_enabled.name,
> @@ -123,6 +137,213 @@ static void probe_destroy(dtrace_hdl_t *dtp, void *datap)
>  	free_probe_list(dtp, datap);
>  }
>  
> +/*
> + * Clean up the BPF "usdt_prids" map.
> + */
> +static int
> +purge_BPFmap(dtrace_hdl_t *dtp)

The function name is too generic compared to the comment that describes what
it does.  Since this performs cleanup for a particular BPF map, it function
name should reflect that.

Then again, it turns out that the function actually purges data from the two
USDT BPF maps, so while the name probably should still be a bit more
descriptive, the comment above it certainly needs updating.

> +{
> +	int			fdprids = dtp->dt_usdt_pridsmap_fd;
> +	int			fdnames = dtp->dt_usdt_namesmap_fd;
> +	usdt_prids_map_key_t	key, nxt;
> +	usdt_prids_map_val_t	val;
> +
> +	/* Initialize key to a pid/uprid that cannot be found. */
> +	key.pid = 0;
> +	key.uprid = 0;
> +
> +	/* Loop over all entries. */
> +	while (dt_bpf_map_next_key(fdprids, &key, &nxt) == 0) {
> +		memcpy(&key, &nxt, sizeof(usdt_prids_map_key_t));
> +
> +		if (dt_bpf_map_lookup(fdprids, &key, &val) == -1)
> +			return dt_set_errno(dtp, EDT_BPF);
> +
> +		/* Check if the process is still running. */
> +		if (!Pexists(key.pid)) {
> +			dt_bpf_map_delete(fdprids, &key);          // FIXME:  bpf_map_next_key() iteration restarts each time we delete an elem!!!
> +			dt_bpf_map_delete(fdnames, &val.prid);

I expect you may want to create a list of entries to delete, and then do a
batch delete (if bpf_map_delete_batch() exists) or a delete of each element
in the to-purge list.  That way you do not interfere with the iterator.

> +			continue;
> +		}
> +
> +		/*
> +		 * FIXME.  There might be another case, where the process
> +		 * is still running, but some of its USDT probes are gone?
> +		 * So maybe we have to check for the existence of one of
> +		 *     dtrace_probedesc_t *pdp = dtp->dt_probes[val.prid]->desc;
> +		 *     char *prv = ...pdp->prv minus the numerial part;
> +		 *
> +		 *     /run/dtrace/probes/$pid/$pdp->prv/$pdp->mod/$pdp->fun/$pdp->prb
> +		 *     /run/dtrace/stash/dof-pid/$pid/0/parsed/$prv:$pdp->mod:$pdp->fun:$pdp->prb
> +		 *     /run/dtrace/stash/dof-pid/$pid/.../parsed/$prv:$pdp->mod:$pdp->fun:$pdp->prb
> +		 */

Yes, there is the case where a dlclose() can cause some of the task's USDT
probes to be dropped.

> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * Update the uprobe provider.
> + */

I'll look at this function more tomorrow.  It is doing a lot and it is a bit
more complex, so I want to take my time with it.  One (minor) concern is that
the probnam is filled in with garbage potentially residing between probe name
components that do not fill the entire allocated space.

More tomorrow...

> +static int update_uprobe(dtrace_hdl_t *dtp, void *datap)
> +{
> +	dt_probe_t	*prp;
> +	dt_probe_t	*prp_next;
> +	int		i, prid = dtp->dt_probe_id;
> +	dt_pcb_t	pcb;
> +	int		fd = dtp->dt_usdt_namesmap_fd;
> +	char		probnam[DTRACE_PROVNAMELEN + DTRACE_MODNAMELEN + DTRACE_FUNCNAMELEN + DTRACE_NAMELEN];
> +
> +	/* Clear stale pids. */
> +	purge_BPFmap(dtp);
> +
> +	/* Update dt_probes[] and dt_enablings. */
> +	/*
> +	 * pcb is only used inside of dt_pid_error() to get:
> +	 *     pcb->pcb_region
> +	 *     pcb->pcb_filetag
> +	 *     pcb->pcb_fileptr
> +	 * While pcb cannot be NULL, these other things apparently can be.
> +	 */
> +	memset(&pcb, 0, sizeof(dt_pcb_t));
> +	for (i = 0; i < dtp->dt_stmt_nextid; i++) {
> +		dtrace_stmtdesc_t *stp;
> +
> +		stp = dtp->dt_stmts[i];
> +		assert(stp != NULL);
> +		dt_pid_create_usdt_probes(&stp->dtsd_ecbdesc->dted_probe, dtp, &pcb);
> +	}
> +
> +	/*
> +	 * Add probe name elements for any new USDT probes.
> +	 */
> +	while (prid < dtp->dt_probe_id) {
> +		const dtrace_probedesc_t *pdp = dtp->dt_probes[prid]->desc;
> +
> +		dt_probe_enable(dtp, dtp->dt_probes[prid]);
> +
> +		if (dtp->dt_probes[prid]->prov->impl == &dt_usdt) {
> +			snprintf(probnam, DTRACE_PROVNAMELEN, "%s", pdp->prv);
> +			snprintf(&probnam[DTRACE_PROVNAMELEN],
> +					  DTRACE_MODNAMELEN, "%s", pdp->mod);
> +			snprintf(&probnam[DTRACE_PROVNAMELEN +
> +					  DTRACE_MODNAMELEN],
> +					  DTRACE_FUNCNAMELEN, "%s", pdp->fun);
> +			snprintf(&probnam[DTRACE_PROVNAMELEN +
> +					  DTRACE_MODNAMELEN +
> +					  DTRACE_FUNCNAMELEN],
> +					  DTRACE_NAMELEN, "%s", pdp->prb);
> +			if (dt_bpf_map_update(fd, &prid, probnam) == -1)
> +				assert(0);   // FIXME do something here
> +		}
> +
> +		prid++;
> +	}
> +
> +	/* Review enablings. */
> +	for (prp = dt_list_next(&dtp->dt_enablings); prp != NULL; prp = prp_next) {
> +		pid_t		pid;
> +		list_probe_t	*pup;
> +
> +		prp_next = dt_list_next(prp);
> +
> +		/* Make sure it is an overlying USDT probe. */
> +		if (prp->prov->impl != &dt_usdt)
> +			continue;
> +
> +		/* FIXME passing in NULL pcb and dpr wreaks havoc on error reporting? */
> +		/*
> +		 * Nick writes:
> +		 * This is a general problem with running compiler-adjacent things outside
> +		 * compile time. I think we should adjust dt_pid_error() so that it works
> +		 * with NULL pcb and dpr at once, probably by using the code path for
> +		 * pcb != NULL and augmenting it so that it passes in NULL for the region and
> +		 * filename args and 0 for the lineno if pcb is NULL. (dt_set_errmsg can
> +		 * already handle this case.)
> +		 */
> +		pid = dt_pid_get_pid(prp->desc, dtp, NULL, NULL);
> +
> +		if (!Pexists(pid)) {
> +			/* Remove from enablings. */
> +			dt_list_delete(&dtp->dt_enablings, prp);
> +
> +			/* Make it evident from the probe that it is not in enablings. */
> +			((dt_list_t *)prp)->dl_prev = NULL;
> +			((dt_list_t *)prp)->dl_next = NULL;
> +
> +			/* Free up its list of underlying probes. */
> +			while ((pup = dt_list_next(prp->prv_data)) != NULL) {
> +				dt_list_delete(prp->prv_data, pup);
> +				dt_free(dtp, pup);
> +			}
> +			dt_free(dtp, prp->prv_data);
> +			prp->prv_data = NULL;
> +
> +			continue;
> +		}
> +
> +		for (pup = prp->prv_data; pup != NULL; pup = dt_list_next(pup)) {
> +			dt_probe_t		*uprp = pup->probe;
> +			long long		mask = 0, bit = 1;
> +			usdt_prids_map_key_t	key;
> +			usdt_prids_map_val_t	val, oval;
> +			dt_uprobe_t		*upp = uprp->prv_data;
> +
> +			/*
> +			 * For is-enabled probes, the bit mask does not matter.
> +			 * It is possible that we have this underlying probe due to
> +			 * an overlying pid-offset probe and that we will not know
> +			 * until later, when some new pid is created, that we also
> +			 * have an overlying USDT is-enabled probe, but missing this
> +			 * optimization opportunity is okay.
> +			 */
> +			if (uprp->prov->impl == &dt_uprobe && !(upp->flags & PP_IS_ENABLED)) {
> +				int n;
> +
> +				for (n = 0; n < dtp->dt_stmt_nextid; n++) {
> +					dtrace_stmtdesc_t *stp;
> +
> +					stp = dtp->dt_stmts[n];
> +					assert(stp != NULL);
> +
> +					if (dt_gmatch(prp->desc->prv, stp->dtsd_ecbdesc->dted_probe.prv) &&
> +					    dt_gmatch(prp->desc->mod, stp->dtsd_ecbdesc->dted_probe.mod) &&
> +					    dt_gmatch(prp->desc->fun, stp->dtsd_ecbdesc->dted_probe.fun) &&
> +					    dt_gmatch(prp->desc->prb, stp->dtsd_ecbdesc->dted_probe.prb))
> +						mask |= bit;
> +
> +					bit <<= 1;
> +				}
> +			}
> +
> +			key.pid = pid;
> +			key.uprid = uprp->desc->id;
> +
> +			val.prid = prp->desc->id;
> +			val.mask = mask;
> +
> +			/* linux/bpf.h says error is -1, but it could be -2 */
> +			if (dt_bpf_map_lookup(dtp->dt_usdt_pridsmap_fd, &key, &oval) < 0) {
> +				/*
> +				 * Set the map entry.
> +				 */
> +				// FIXME Check return value, but how should errors be handled?
> +				dt_bpf_map_update(dtp->dt_usdt_pridsmap_fd, &key, &val);
> +			} else if (val.prid != oval.prid || val.mask != oval.mask) {
> +				/*
> +				 * This can happen when two overlying probes map to the same underlying probe for the same pid.
> +				 * E.g., pid:::entry and pid:::0, or pid:::$offset and usdt:::.
> +				 */
> +			} else {
> +				/*
> +				 * Nothing to do, it already is in the map.
> +				 */
> +			}
> +		}
> +	}
> +
> +	return 0;
> +}
>  
>  /*
>   * Look up or create an underlying (real) probe, corresponding directly to a
> @@ -782,6 +1003,7 @@ dt_provimpl_t	dt_uprobe = {
>  	.probe_info	= &probe_info,
>  	.detach		= &detach,
>  	.probe_destroy	= &probe_destroy_underlying,
> +	.update		= &update_uprobe,

I would think that this functionality resides at the USDT level rather than at
the underlying uprobe level.

>  };
>  
>  /*
> diff --git a/libdtrace/dt_work.c b/libdtrace/dt_work.c
> index 11335345a..2e7884cb9 100644
> --- a/libdtrace/dt_work.c
> +++ b/libdtrace/dt_work.c
> @@ -362,6 +362,9 @@ dtrace_work(dtrace_hdl_t *dtp, FILE *fp, dtrace_consume_probe_f *pfunc,
>  	dtrace_workstatus_t	rval;
>  	int			gen;
>  
> +	if (dt_uprobe.update(dtp, NULL) < 0)
> +		return DTRACE_WORKSTATUS_ERROR;

We probably should to do this with a callback function instead of calling a
function directly in a specific provider.  Still thinking about a better name
for the callback though.  Perhaps dt_usdt->rematch() or something like that?
Or dt_usdt->matchall()?  One of those was used in the original DTrace code
base as far as I recall.

> +
>  	switch (dtrace_status(dtp)) {
>  	case DTRACE_STATUS_EXITED:
>  	case DTRACE_STATUS_STOPPED:
> -- 
> 2.43.5
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v5 07/19] Create the BPF usdt_names and usdt_prids maps
  2024-10-10  4:55 ` [PATCH v5 07/19] Create the BPF usdt_names and usdt_prids maps Kris Van Hees
@ 2024-10-10 20:58   ` Kris Van Hees
  2024-10-11 22:56   ` Eugene Loh
  1 sibling, 0 replies; 9+ messages in thread
From: Kris Van Hees @ 2024-10-10 20:58 UTC (permalink / raw)
  To: Kris Van Hees; +Cc: eugene.loh, dtrace, dtrace-devel

First of, this is (together with the entire series) a remarkable piece of work.
This is complex and very complete already.  Great job.  (I forgot to mention
that in my previous review email.)

About update_uprobe()...

I'm struggling a little with the model of having the function in the provider
(uprobe_update() in dt_prov_uprobe.c) being the one that does the work of
creating new USDT probes as needed (using dt_pid_create_usdt_probes()), and
then going through the newly created probes and then processing each of those
new USDT probes to add them to the running system.

The model that DTrace has used was more using the providers when necessary
once it is known that a particular probe needs work done.  That makes it more
suitable for also supporting other probes becoming available at a later time
(as could potentially be supported to handle kernel module loading etc).

The overall structure of the DTrace design is also more geared towards letting
the core probe handing code take care of matching probes against probe descs,
and then calling into the providers as needed to take action.

I would envision a slightly different flow...

From dtrace_work() (in dt_work.c), a call is made to a function to look for
new probes that match any of the probe descriptions for the statements that
are in use.  This was traditionally done at the kernel level whenever new
providers were registered and probes added.

We can no longer do it that way of course, but we can (and should) do it when
new USDT probes are registered through dtprobed.  That is where (soon) we will
hopefully be able to use inotify or another mechanism to alert dtrace when new
probes are available.  For now, making a call to the code that handles new
probes discovery from dtrace_work() is a good alternative indeed.

So, I would implement a dt_probe_discover() function that looks for new probes
by making a call to a discover() callback in each non-PID-based provider.  For
now, that would only be the dt_usdt provider implementation, of course.  It
makes sense to have dt_probe_discover() loop through the statements as you do
rather than loop through probes and try to match them to statements since we
know that the statements are used (as opposed to probes possibly not being of
interest to us).

You can still keep the logic that stores the dt_probe_id value af the start
of the processing and then uses that with the new value (if it changed) to
limit further work to just those probes.

So, once discover() has been called in all non-PID providers (and perhaps we
can even add a flag to providers to explicitly mark them as supporting new
probes - DT_PROVIDER_DISCOVER flag - that would reduce the overhead even more),
you can loop through all new probes, calling an add_probe() callback for each
probe to be enabled, etc...

What do you think?

On Thu, Oct 10, 2024 at 12:55:07AM -0400, Kris Van Hees wrote:
> On Wed, Oct 09, 2024 at 01:45:34AM -0400, eugene.loh@oracle.com wrote:

... skipped ...

> > +/*
> > + * Update the uprobe provider.
> > + */
> > +static int update_uprobe(dtrace_hdl_t *dtp, void *datap)
> > +{
> > +	dt_probe_t	*prp;
> > +	dt_probe_t	*prp_next;
> > +	int		i, prid = dtp->dt_probe_id;
> > +	dt_pcb_t	pcb;
> > +	int		fd = dtp->dt_usdt_namesmap_fd;
> > +	char		probnam[DTRACE_PROVNAMELEN + DTRACE_MODNAMELEN + DTRACE_FUNCNAMELEN + DTRACE_NAMELEN];
> > +
> > +	/* Clear stale pids. */
> > +	purge_BPFmap(dtp);
> > +
> > +	/* Update dt_probes[] and dt_enablings. */
> > +	/*
> > +	 * pcb is only used inside of dt_pid_error() to get:
> > +	 *     pcb->pcb_region
> > +	 *     pcb->pcb_filetag
> > +	 *     pcb->pcb_fileptr
> > +	 * While pcb cannot be NULL, these other things apparently can be.
> > +	 */
> > +	memset(&pcb, 0, sizeof(dt_pcb_t));
> > +	for (i = 0; i < dtp->dt_stmt_nextid; i++) {
> > +		dtrace_stmtdesc_t *stp;
> > +
> > +		stp = dtp->dt_stmts[i];
> > +		assert(stp != NULL);
> > +		dt_pid_create_usdt_probes(&stp->dtsd_ecbdesc->dted_probe, dtp, &pcb);
> > +	}
> > +
> > +	/*
> > +	 * Add probe name elements for any new USDT probes.
> > +	 */
> > +	while (prid < dtp->dt_probe_id) {
> > +		const dtrace_probedesc_t *pdp = dtp->dt_probes[prid]->desc;
> > +
> > +		dt_probe_enable(dtp, dtp->dt_probes[prid]);
> > +
> > +		if (dtp->dt_probes[prid]->prov->impl == &dt_usdt) {
> > +			snprintf(probnam, DTRACE_PROVNAMELEN, "%s", pdp->prv);
> > +			snprintf(&probnam[DTRACE_PROVNAMELEN],
> > +					  DTRACE_MODNAMELEN, "%s", pdp->mod);
> > +			snprintf(&probnam[DTRACE_PROVNAMELEN +
> > +					  DTRACE_MODNAMELEN],
> > +					  DTRACE_FUNCNAMELEN, "%s", pdp->fun);
> > +			snprintf(&probnam[DTRACE_PROVNAMELEN +
> > +					  DTRACE_MODNAMELEN +
> > +					  DTRACE_FUNCNAMELEN],
> > +					  DTRACE_NAMELEN, "%s", pdp->prb);
> > +			if (dt_bpf_map_update(fd, &prid, probnam) == -1)
> > +				assert(0);   // FIXME do something here
> > +		}
> > +
> > +		prid++;
> > +	}
> > +
> > +	/* Review enablings. */
> > +	for (prp = dt_list_next(&dtp->dt_enablings); prp != NULL; prp = prp_next) {
> > +		pid_t		pid;
> > +		list_probe_t	*pup;
> > +
> > +		prp_next = dt_list_next(prp);
> > +
> > +		/* Make sure it is an overlying USDT probe. */
> > +		if (prp->prov->impl != &dt_usdt)
> > +			continue;
> > +
> > +		/* FIXME passing in NULL pcb and dpr wreaks havoc on error reporting? */
> > +		/*
> > +		 * Nick writes:
> > +		 * This is a general problem with running compiler-adjacent things outside
> > +		 * compile time. I think we should adjust dt_pid_error() so that it works
> > +		 * with NULL pcb and dpr at once, probably by using the code path for
> > +		 * pcb != NULL and augmenting it so that it passes in NULL for the region and
> > +		 * filename args and 0 for the lineno if pcb is NULL. (dt_set_errmsg can
> > +		 * already handle this case.)
> > +		 */
> > +		pid = dt_pid_get_pid(prp->desc, dtp, NULL, NULL);
> > +
> > +		if (!Pexists(pid)) {
> > +			/* Remove from enablings. */
> > +			dt_list_delete(&dtp->dt_enablings, prp);
> > +
> > +			/* Make it evident from the probe that it is not in enablings. */
> > +			((dt_list_t *)prp)->dl_prev = NULL;
> > +			((dt_list_t *)prp)->dl_next = NULL;
> > +
> > +			/* Free up its list of underlying probes. */
> > +			while ((pup = dt_list_next(prp->prv_data)) != NULL) {
> > +				dt_list_delete(prp->prv_data, pup);
> > +				dt_free(dtp, pup);
> > +			}
> > +			dt_free(dtp, prp->prv_data);
> > +			prp->prv_data = NULL;
> > +
> > +			continue;
> > +		}
> > +
> > +		for (pup = prp->prv_data; pup != NULL; pup = dt_list_next(pup)) {
> > +			dt_probe_t		*uprp = pup->probe;
> > +			long long		mask = 0, bit = 1;
> > +			usdt_prids_map_key_t	key;
> > +			usdt_prids_map_val_t	val, oval;
> > +			dt_uprobe_t		*upp = uprp->prv_data;
> > +
> > +			/*
> > +			 * For is-enabled probes, the bit mask does not matter.
> > +			 * It is possible that we have this underlying probe due to
> > +			 * an overlying pid-offset probe and that we will not know
> > +			 * until later, when some new pid is created, that we also
> > +			 * have an overlying USDT is-enabled probe, but missing this
> > +			 * optimization opportunity is okay.
> > +			 */
> > +			if (uprp->prov->impl == &dt_uprobe && !(upp->flags & PP_IS_ENABLED)) {
> > +				int n;
> > +
> > +				for (n = 0; n < dtp->dt_stmt_nextid; n++) {
> > +					dtrace_stmtdesc_t *stp;
> > +
> > +					stp = dtp->dt_stmts[n];
> > +					assert(stp != NULL);
> > +
> > +					if (dt_gmatch(prp->desc->prv, stp->dtsd_ecbdesc->dted_probe.prv) &&
> > +					    dt_gmatch(prp->desc->mod, stp->dtsd_ecbdesc->dted_probe.mod) &&
> > +					    dt_gmatch(prp->desc->fun, stp->dtsd_ecbdesc->dted_probe.fun) &&
> > +					    dt_gmatch(prp->desc->prb, stp->dtsd_ecbdesc->dted_probe.prb))
> > +						mask |= bit;
> > +
> > +					bit <<= 1;
> > +				}
> > +			}
> > +
> > +			key.pid = pid;
> > +			key.uprid = uprp->desc->id;
> > +
> > +			val.prid = prp->desc->id;
> > +			val.mask = mask;
> > +
> > +			/* linux/bpf.h says error is -1, but it could be -2 */
> > +			if (dt_bpf_map_lookup(dtp->dt_usdt_pridsmap_fd, &key, &oval) < 0) {
> > +				/*
> > +				 * Set the map entry.
> > +				 */
> > +				// FIXME Check return value, but how should errors be handled?
> > +				dt_bpf_map_update(dtp->dt_usdt_pridsmap_fd, &key, &val);
> > +			} else if (val.prid != oval.prid || val.mask != oval.mask) {
> > +				/*
> > +				 * This can happen when two overlying probes map to the same underlying probe for the same pid.
> > +				 * E.g., pid:::entry and pid:::0, or pid:::$offset and usdt:::.
> > +				 */
> > +			} else {
> > +				/*
> > +				 * Nothing to do, it already is in the map.
> > +				 */
> > +			}
> > +		}
> > +	}
> > +
> > +	return 0;
> > +}

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v5 07/19] Create the BPF usdt_names and usdt_prids maps
  2024-10-10  4:55 ` [PATCH v5 07/19] Create the BPF usdt_names and usdt_prids maps Kris Van Hees
  2024-10-10 20:58   ` Kris Van Hees
@ 2024-10-11 22:56   ` Eugene Loh
  2024-10-11 23:53     ` [DTrace-devel] " Kris Van Hees
  1 sibling, 1 reply; 9+ messages in thread
From: Eugene Loh @ 2024-10-11 22:56 UTC (permalink / raw)
  To: dtrace, dtrace-devel

I know there is another review email closely related to this one, but 
here I'll try basically to respond to what's in this email.

Also, you make a number of suggestions that I could incorporate that, 
therefore, warrant no further discussion here.  If I don't comment on 
some particular point, that means I think I managed to incorporate that 
particular feedback.

On 10/10/24 00:55, Kris Van Hees wrote:
> On Wed, Oct 09, 2024 at 01:45:34AM -0400, eugene.loh@oracle.com wrote:
>> From: Eugene Loh <eugene.loh@oracle.com>
>>
>> As USDT processes come and go, the overlying probes for an underlying
>> probe will change.  Hence, we will move to a scheme in which an
>> underlying probe will walk all possible clauses it could call,
>> deciding at run time (using a bit mask for the overlying USDT probe)
>> which of the clauses to call.
> I think this could be explained a bit better.  I'd say that it is the "set of
> overlying probes for an underlying probe" that will change.  And I would add
> that the underlying probe program will walk all possible clauses that any of
> its overlying probes might call for each overlying probe, only executing the
> clauses that apply to that overlying probe (using a bitmask).

I'm not convinced I understand what you're getting at here, but the next 
version of the patch should reflect my best attempt to incorporate your 
suggestions.

>> As USDT processes start up, we also add new overlying USDT probes,
>> whose name elements must be retrieved by prid for the built-in
>> variables probeprov, probemod, probefunc, and probename.  While
>> this is currently done with the BPF "probes" and "strtab" maps,
>> the monolithic strtab map cannot safely be updated by the consumer
>> while the kernel might be reading the map.  Therefore, we introduce
>> a new variable, dtp->dt_ninitprobes, that records the number of
>> initial probes, those created when a dtrace starts up.  For a
>> prid < dtp->ninitprobes, the existing scheme is used to retrieve
>> probe name elements.  For newer prids, necessarily USDT probes,
>> we use new a new BPF "usdt_names" map.
> I think this could do with a bit of rewording for clarity.

I don't know what specifically you had in mind, but the next version of 
the patch should reflect my best understanding of what might have 
concerned you.

> I am also a bit
> wondering about the dt_ninitprobes name.  I think that dt_nprobes would be
> sufficient, as the number of probes that DTrace creates.  Any probes added
> later are numbered higher than dt_nprobes.  Not an ideal name, but I think
> it is sufficient (and pre-tracing is the only time we really care about
> counting probes in general - the later ones are counter towards nusdtprobes).

Okay.  Change made.

>> The underlying-probes "update" function is called sporadically.
>> The focus of this patch is the function.  When it is called can
>> be tuned in future patches.
> This seems to be an old paragraph because the focus of this patch seems to
> be the creation (and mgmt) of the two new maps.  Ideally, this would be
> multiple patches actually (creating the maps, populating the maps, using the
> maps, ...)

Ideally, yes.  The original patch series had finer granularity, but some 
disruption (I think the "enabled PID" change) produced lots of changes, 
which were easier to handle by lumping some patches together.  So, 
"ideally" some refactoring could be done for finer-grained patches for 
easier review, but at this point is the current factoring of the patches 
"good enough"?

>> diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
>> @@ -940,6 +940,44 @@ gmap_create_probes(dtrace_hdl_t *dtp)
>> +static int
>> +gmap_create_usdt(dtrace_hdl_t *dtp)
>> +{
>> +	dtp->dt_usdt_namesmap_fd = create_gmap(dtp, "usdt_names", BPF_MAP_TYPE_HASH,
>> +	    sizeof(dtrace_id_t),
>> +	    DTRACE_PROVNAMELEN + DTRACE_MODNAMELEN + DTRACE_FUNCNAMELEN + DTRACE_NAMELEN,
> Any reason why you are not using DTRACE_FULLNAMELEN?

Honestly, because I didn't know about it.

That said, FULLNAMELEN has the funny characteristic that it has a "+4".  
I wouldn't be surprised if that's a mistake.  Anyhow, it means that 
theoretically using FULLNAMELEN means using a little extra storage.

In practice, the component name lengths are all multiples of 8 bytes, 
the key is only 4 bytes, and the BPF map I think rounds to 8-byte 
alignment.  So, in practice, using FULLNAMELEN does not cost any extra 
space.

So, I converted to FULLNAMELEN.  Here, and one other place.

>> +	if (dtp->dt_usdt_namesmap_fd == -1)
>> +		return -1;
>> +
>> +	dtp->dt_usdt_pridsmap_fd = create_gmap(dtp, "usdt_prids", BPF_MAP_TYPE_HASH,
>> +	    dtp->dt_usdt_pridsmap_ksz, dtp->dt_usdt_pridsmap_vsz,
>> +	    dtp->dt_options[DTRACEOPT_NUSDTPROBES]);
> empty line perhaps?0~

Sorry, I don't understand this comment.

>> +	if (dtp->dt_usdt_pridsmap_fd == -1)
>> +		return -1;
>> +
>> +	dtp->dt_ninitprobes = dtp->dt_probe_id;
>> +
>> +	/* Populate the newly created map.  FIXME:  this is probably not the right place for this. */
>> +	if (dt_uprobe.update(dtp, NULL) < 0)
>> +		return -1;
> Yes and no.  You are using the probe name data for USDT probes that already
> exist at this time from the strtab so there is no need to update the usdt_names
> map at this point.  The usdt_prids map does need updating, but I don't think it
> makes sense to do that here.  During program load would seem a much better
> time.

I don't know what specifically you have in mind, but perhaps the real 
discussion is over on the other email.

> Also, we certainly wouldn't want to use a direct call to a function in
> a specific provider.

Yes, that smells stinky, but that can be changed once we have a better 
sense of where we want to go.  One solution is to have the consumer loop 
over all providers, calling the provider-specific update() function for 
any provider that supplies one.  Again, however, maybe this topic makes 
more sense over on your later email.

>> +
>> +	return 0;
>> +}
>> +
>> diff --git a/libdtrace/dt_prov_uprobe.c b/libdtrace/dt_prov_uprobe.c
>> @@ -63,6 +65,15 @@ typedef struct list_probe {
>> +typedef struct usdt_prids_map_key {
>> +	pid_t		pid;
>> +	dtrace_id_t	uprid;
>> +} usdt_prids_map_key_t;
>> +typedef struct usdt_prids_map_val {
>> +	dtrace_id_t	prid;
>> +	long long	mask;
>> +} usdt_prids_map_val_t;
> Move these to dt_bpf_maps.h?

Okay.  I did this, but this uses types like pid_t and dtrace_id_t. I had 
problems compiling the bpf/*.c files that included dt_bpf_maps.h...  
resolving those typedefs and/or dealing with the header files I tried to 
include to define the typedefs.

Not a big deal;  not a blocker.  I moved these things to dt_bpf_maps.h 
and left pid_t and dtrace_id_t as ints for now.  A FIXME indicates that 
this should ideally be cleaned up.

>> @@ -123,6 +137,213 @@ static void probe_destroy(dtrace_hdl_t *dtp, void *datap)
>> +/*
>> + * Clean up the BPF "usdt_prids" map.
>> + */
>> +static int
>> +purge_BPFmap(dtrace_hdl_t *dtp)
>> +{
>> +	int			fdprids = dtp->dt_usdt_pridsmap_fd;
>> +	int			fdnames = dtp->dt_usdt_namesmap_fd;
>> +	usdt_prids_map_key_t	key, nxt;
>> +	usdt_prids_map_val_t	val;
>> +
>> +	/* Initialize key to a pid/uprid that cannot be found. */
>> +	key.pid = 0;
>> +	key.uprid = 0;
>> +
>> +	/* Loop over all entries. */
>> +	while (dt_bpf_map_next_key(fdprids, &key, &nxt) == 0) {
>> +		memcpy(&key, &nxt, sizeof(usdt_prids_map_key_t));
>> +
>> +		if (dt_bpf_map_lookup(fdprids, &key, &val) == -1)
>> +			return dt_set_errno(dtp, EDT_BPF);
>> +
>> +		/* Check if the process is still running. */
>> +		if (!Pexists(key.pid)) {
>> +			dt_bpf_map_delete(fdprids, &key);          // FIXME:  bpf_map_next_key() iteration restarts each time we delete an elem!!!
>> +			dt_bpf_map_delete(fdnames, &val.prid);
>> +			continue;
>> +		}
>> +
>> +		/*
>> +		 * FIXME.  There might be another case, where the process
>> +		 * is still running, but some of its USDT probes are gone?
>> +		 * So maybe we have to check for the existence of one of
>> +		 *     dtrace_probedesc_t *pdp = dtp->dt_probes[val.prid]->desc;
>> +		 *     char *prv = ...pdp->prv minus the numerial part;
>> +		 *
>> +		 *     /run/dtrace/probes/$pid/$pdp->prv/$pdp->mod/$pdp->fun/$pdp->prb
>> +		 *     /run/dtrace/stash/dof-pid/$pid/0/parsed/$prv:$pdp->mod:$pdp->fun:$pdp->prb
>> +		 *     /run/dtrace/stash/dof-pid/$pid/.../parsed/$prv:$pdp->mod:$pdp->fun:$pdp->prb
>> +		 */
> Yes, there is the case where a dlclose() can cause some of the task's USDT
> probes to be dropped.

Shall this be addressed now, or can it wait until after this patch 
series?  (I vote for waiting.)

>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> diff --git a/libdtrace/dt_work.c b/libdtrace/dt_work.c
>> @@ -362,6 +362,9 @@ dtrace_work(dtrace_hdl_t *dtp, FILE *fp, dtrace_consume_probe_f *pfunc,
>>   	dtrace_workstatus_t	rval;
>>   	int			gen;
>>   
>> +	if (dt_uprobe.update(dtp, NULL) < 0)
>> +		return DTRACE_WORKSTATUS_ERROR;
> We probably should to do this with a callback function

Again, maybe we discuss more on the other email, but I don't get what 
you're suggesting here.

> instead of calling a
> function directly in a specific provider.

Again, it could loop over all providers, simply calling an update 
function for any provider that supplies one.

> Still thinking about a better name
> for the callback though.  Perhaps dt_usdt->rematch() or something like that?
> Or dt_usdt->matchall()?  One of those was used in the original DTrace code
> base as far as I recall.

I guess it depends on how we think of what's going on here and how that 
will generalize.  My thought was that, unless we have some event-driven, 
dtprobed, inotify() saying when the function should be called, we simply 
want to be called "periodically" and that other providers might someday 
want to do the same...  for purposes rather unlike what USDT wants this 
for.  So, that why I chose a name that was very generic, assuming only 
that the function would be called "periodically."

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [DTrace-devel] [PATCH v5 07/19] Create the BPF usdt_names and usdt_prids maps
  2024-10-11 22:56   ` Eugene Loh
@ 2024-10-11 23:53     ` Kris Van Hees
  2024-10-12  4:04       ` Eugene Loh
  0 siblings, 1 reply; 9+ messages in thread
From: Kris Van Hees @ 2024-10-11 23:53 UTC (permalink / raw)
  To: Eugene Loh; +Cc: dtrace, dtrace-devel

On Fri, Oct 11, 2024 at 06:56:16PM -0400, Eugene Loh via DTrace-devel wrote:
> I know there is another review email closely related to this one, but here
> I'll try basically to respond to what's in this email.
> 
> Also, you make a number of suggestions that I could incorporate that,
> therefore, warrant no further discussion here.  If I don't comment on some
> particular point, that means I think I managed to incorporate that
> particular feedback.
> 
> On 10/10/24 00:55, Kris Van Hees wrote:
> > On Wed, Oct 09, 2024 at 01:45:34AM -0400, eugene.loh@oracle.com wrote:
> > > From: Eugene Loh <eugene.loh@oracle.com>
> > > 
> > > As USDT processes come and go, the overlying probes for an underlying
> > > probe will change.  Hence, we will move to a scheme in which an
> > > underlying probe will walk all possible clauses it could call,
> > > deciding at run time (using a bit mask for the overlying USDT probe)
> > > which of the clauses to call.
> > I think this could be explained a bit better.  I'd say that it is the "set of
> > overlying probes for an underlying probe" that will change.  And I would add
> > that the underlying probe program will walk all possible clauses that any of
> > its overlying probes might call for each overlying probe, only executing the
> > clauses that apply to that overlying probe (using a bitmask).
> 
> I'm not convinced I understand what you're getting at here, but the next
> version of the patch should reflect my best attempt to incorporate your
> suggestions.
> 
> > > As USDT processes start up, we also add new overlying USDT probes,
> > > whose name elements must be retrieved by prid for the built-in
> > > variables probeprov, probemod, probefunc, and probename.  While
> > > this is currently done with the BPF "probes" and "strtab" maps,
> > > the monolithic strtab map cannot safely be updated by the consumer
> > > while the kernel might be reading the map.  Therefore, we introduce
> > > a new variable, dtp->dt_ninitprobes, that records the number of
> > > initial probes, those created when a dtrace starts up.  For a
> > > prid < dtp->ninitprobes, the existing scheme is used to retrieve
> > > probe name elements.  For newer prids, necessarily USDT probes,
> > > we use new a new BPF "usdt_names" map.
> > I think this could do with a bit of rewording for clarity.
> 
> I don't know what specifically you had in mind, but the next version of the
> patch should reflect my best understanding of what might have concerned you.
> 
> > I am also a bit
> > wondering about the dt_ninitprobes name.  I think that dt_nprobes would be
> > sufficient, as the number of probes that DTrace creates.  Any probes added
> > later are numbered higher than dt_nprobes.  Not an ideal name, but I think
> > it is sufficient (and pre-tracing is the only time we really care about
> > counting probes in general - the later ones are counter towards nusdtprobes).
> 
> Okay.  Change made.
> 
> > > The underlying-probes "update" function is called sporadically.
> > > The focus of this patch is the function.  When it is called can
> > > be tuned in future patches.
> > This seems to be an old paragraph because the focus of this patch seems to
> > be the creation (and mgmt) of the two new maps.  Ideally, this would be
> > multiple patches actually (creating the maps, populating the maps, using the
> > maps, ...)
> 
> Ideally, yes.  The original patch series had finer granularity, but some
> disruption (I think the "enabled PID" change) produced lots of changes,
> which were easier to handle by lumping some patches together.  So, "ideally"
> some refactoring could be done for finer-grained patches for easier review,
> but at this point is the current factoring of the patches "good enough"?
> 
> > > diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
> > > @@ -940,6 +940,44 @@ gmap_create_probes(dtrace_hdl_t *dtp)
> > > +static int
> > > +gmap_create_usdt(dtrace_hdl_t *dtp)
> > > +{
> > > +	dtp->dt_usdt_namesmap_fd = create_gmap(dtp, "usdt_names", BPF_MAP_TYPE_HASH,
> > > +	    sizeof(dtrace_id_t),
> > > +	    DTRACE_PROVNAMELEN + DTRACE_MODNAMELEN + DTRACE_FUNCNAMELEN + DTRACE_NAMELEN,
> > Any reason why you are not using DTRACE_FULLNAMELEN?
> 
> Honestly, because I didn't know about it.
> 
> That said, FULLNAMELEN has the funny characteristic that it has a "+4".  I
> wouldn't be surprised if that's a mistake.  Anyhow, it means that
> theoretically using FULLNAMELEN means using a little extra storage.

The +4 is to account for the ':' separator between name elements.  Which is
as far as I can see actually incorrect because the name length constants
include the terminating 0-byte, so when they are combined into a fully
qualified probe name, those 0-bytes become the ':' separators and thus there
is still no need for extra space.

I'll try a build with the +4 removed and see what happens.  I expect valgrind
won't report anything different.

> In practice, the component name lengths are all multiples of 8 bytes, the
> key is only 4 bytes, and the BPF map I think rounds to 8-byte alignment. 
> So, in practice, using FULLNAMELEN does not cost any extra space.
> 
> So, I converted to FULLNAMELEN.  Here, and one other place.
> 
> > > +	if (dtp->dt_usdt_namesmap_fd == -1)
> > > +		return -1;
> > > +
> > > +	dtp->dt_usdt_pridsmap_fd = create_gmap(dtp, "usdt_prids", BPF_MAP_TYPE_HASH,
> > > +	    dtp->dt_usdt_pridsmap_ksz, dtp->dt_usdt_pridsmap_vsz,
> > > +	    dtp->dt_options[DTRACEOPT_NUSDTPROBES]);
> > empty line perhaps?0~
> 
> Sorry, I don't understand this comment.

Similar to the previous assignment followed by a conditional, I think that it
would make sense to insert an empty line between the assignment and the
conditional.

> > > +	if (dtp->dt_usdt_pridsmap_fd == -1)
> > > +		return -1;
> > > +
> > > +	dtp->dt_ninitprobes = dtp->dt_probe_id;
> > > +
> > > +	/* Populate the newly created map.  FIXME:  this is probably not the right place for this. */
> > > +	if (dt_uprobe.update(dtp, NULL) < 0)
> > > +		return -1;
> > Yes and no.  You are using the probe name data for USDT probes that already
> > exist at this time from the strtab so there is no need to update the usdt_names
> > map at this point.  The usdt_prids map does need updating, but I don't think it
> > makes sense to do that here.  During program load would seem a much better
> > time.
> 
> I don't know what specifically you have in mind, but perhaps the real
> discussion is over on the other email.

Correct - see other email.

> > Also, we certainly wouldn't want to use a direct call to a function in
> > a specific provider.
> 
> Yes, that smells stinky, but that can be changed once we have a better sense
> of where we want to go.  One solution is to have the consumer loop over all
> providers, calling the provider-specific update() function for any provider
> that supplies one.  Again, however, maybe this topic makes more sense over
> on your later email.

Yes - see other email.

> > > +
> > > +	return 0;
> > > +}
> > > +
> > > diff --git a/libdtrace/dt_prov_uprobe.c b/libdtrace/dt_prov_uprobe.c
> > > @@ -63,6 +65,15 @@ typedef struct list_probe {
> > > +typedef struct usdt_prids_map_key {
> > > +	pid_t		pid;
> > > +	dtrace_id_t	uprid;
> > > +} usdt_prids_map_key_t;
> > > +typedef struct usdt_prids_map_val {
> > > +	dtrace_id_t	prid;
> > > +	long long	mask;
> > > +} usdt_prids_map_val_t;
> > Move these to dt_bpf_maps.h?
> 
> Okay.  I did this, but this uses types like pid_t and dtrace_id_t. I had
> problems compiling the bpf/*.c files that included dt_bpf_maps.h... 
> resolving those typedefs and/or dealing with the header files I tried to
> include to define the typedefs.
> 
> Not a big deal;  not a blocker.  I moved these things to dt_bpf_maps.h and
> left pid_t and dtrace_id_t as ints for now.  A FIXME indicates that this
> should ideally be cleaned up.

Yes, we should probably play a little with the include files, to separate out
some parts that the BPF code need without needing to pull in things that are
irrelevant (and might break compilation of BPF code).

> > > @@ -123,6 +137,213 @@ static void probe_destroy(dtrace_hdl_t *dtp, void *datap)
> > > +/*
> > > + * Clean up the BPF "usdt_prids" map.
> > > + */
> > > +static int
> > > +purge_BPFmap(dtrace_hdl_t *dtp)
> > > +{
> > > +	int			fdprids = dtp->dt_usdt_pridsmap_fd;
> > > +	int			fdnames = dtp->dt_usdt_namesmap_fd;
> > > +	usdt_prids_map_key_t	key, nxt;
> > > +	usdt_prids_map_val_t	val;
> > > +
> > > +	/* Initialize key to a pid/uprid that cannot be found. */
> > > +	key.pid = 0;
> > > +	key.uprid = 0;
> > > +
> > > +	/* Loop over all entries. */
> > > +	while (dt_bpf_map_next_key(fdprids, &key, &nxt) == 0) {
> > > +		memcpy(&key, &nxt, sizeof(usdt_prids_map_key_t));
> > > +
> > > +		if (dt_bpf_map_lookup(fdprids, &key, &val) == -1)
> > > +			return dt_set_errno(dtp, EDT_BPF);
> > > +
> > > +		/* Check if the process is still running. */
> > > +		if (!Pexists(key.pid)) {
> > > +			dt_bpf_map_delete(fdprids, &key);          // FIXME:  bpf_map_next_key() iteration restarts each time we delete an elem!!!
> > > +			dt_bpf_map_delete(fdnames, &val.prid);
> > > +			continue;
> > > +		}
> > > +
> > > +		/*
> > > +		 * FIXME.  There might be another case, where the process
> > > +		 * is still running, but some of its USDT probes are gone?
> > > +		 * So maybe we have to check for the existence of one of
> > > +		 *     dtrace_probedesc_t *pdp = dtp->dt_probes[val.prid]->desc;
> > > +		 *     char *prv = ...pdp->prv minus the numerial part;
> > > +		 *
> > > +		 *     /run/dtrace/probes/$pid/$pdp->prv/$pdp->mod/$pdp->fun/$pdp->prb
> > > +		 *     /run/dtrace/stash/dof-pid/$pid/0/parsed/$prv:$pdp->mod:$pdp->fun:$pdp->prb
> > > +		 *     /run/dtrace/stash/dof-pid/$pid/.../parsed/$prv:$pdp->mod:$pdp->fun:$pdp->prb
> > > +		 */
> > Yes, there is the case where a dlclose() can cause some of the task's USDT
> > probes to be dropped.
> 
> Shall this be addressed now, or can it wait until after this patch series? 
> (I vote for waiting.)

We can wait.  That is a less likely case and is probably easier dealt with once
we have a working implementation.

> > > +	}
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > diff --git a/libdtrace/dt_work.c b/libdtrace/dt_work.c
> > > @@ -362,6 +362,9 @@ dtrace_work(dtrace_hdl_t *dtp, FILE *fp, dtrace_consume_probe_f *pfunc,
> > >   	dtrace_workstatus_t	rval;
> > >   	int			gen;
> > > +	if (dt_uprobe.update(dtp, NULL) < 0)
> > > +		return DTRACE_WORKSTATUS_ERROR;
> > We probably should to do this with a callback function
> 
> Again, maybe we discuss more on the other email, but I don't get what you're
> suggesting here.

Yes - see other email.

> > instead of calling a
> > function directly in a specific provider.
> 
> Again, it could loop over all providers, simply calling an update function
> for any provider that supplies one.
> 
> > Still thinking about a better name
> > for the callback though.  Perhaps dt_usdt->rematch() or something like that?
> > Or dt_usdt->matchall()?  One of those was used in the original DTrace code
> > base as far as I recall.
> 
> I guess it depends on how we think of what's going on here and how that will
> generalize.  My thought was that, unless we have some event-driven,
> dtprobed, inotify() saying when the function should be called, we simply
> want to be called "periodically" and that other providers might someday want
> to do the same...  for purposes rather unlike what USDT wants this for.  So,
> that why I chose a name that was very generic, assuming only that the
> function would be called "periodically."

See other email for better suggestions.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [DTrace-devel] [PATCH v5 07/19] Create the BPF usdt_names and usdt_prids maps
  2024-10-11 23:53     ` [DTrace-devel] " Kris Van Hees
@ 2024-10-12  4:04       ` Eugene Loh
  2024-10-12  4:11         ` Kris Van Hees
  0 siblings, 1 reply; 9+ messages in thread
From: Eugene Loh @ 2024-10-12  4:04 UTC (permalink / raw)
  To: dtrace, dtrace-devel

On 10/11/24 19:53, Kris Van Hees wrote:

> On Fri, Oct 11, 2024 at 06:56:16PM -0400, Eugene Loh via DTrace-devel wrote:
>> On 10/10/24 00:55, Kris Van Hees wrote:
>>> On Wed, Oct 09, 2024 at 01:45:34AM -0400, eugene.loh@oracle.com wrote:
>>>> From: Eugene Loh <eugene.loh@oracle.com>
>>>> +	if (dtp->dt_usdt_namesmap_fd == -1)
>>>> +		return -1;
>>>> +
>>>> +	dtp->dt_usdt_pridsmap_fd = create_gmap(dtp, "usdt_prids", BPF_MAP_TYPE_HASH,
>>>> +	    dtp->dt_usdt_pridsmap_ksz, dtp->dt_usdt_pridsmap_vsz,
>>>> +	    dtp->dt_options[DTRACEOPT_NUSDTPROBES]);
>>> empty line perhaps?0~
>> Sorry, I don't understand this comment.
> Similar to the previous assignment followed by a conditional, I think that it
> would make sense to insert an empty line between the assignment and the
> conditional.

Curious.  It seemed to me to be very common for us to assign something 
and then immediately (no blank line) check its validity. Just a quick 
sanity check (not using this file, which at this point I've touched 
extensively), I tried libdtrace/dt_bpf.c.  Indeed, lots of this 
practice, and you seemed to be the last person to touch those spots.  I 
like the close spacing, but don't care a whole lot; I just wanted to 
double check since the patch seems to follow common practice in this 
particular regard.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [DTrace-devel] [PATCH v5 07/19] Create the BPF usdt_names and usdt_prids maps
  2024-10-12  4:04       ` Eugene Loh
@ 2024-10-12  4:11         ` Kris Van Hees
  0 siblings, 0 replies; 9+ messages in thread
From: Kris Van Hees @ 2024-10-12  4:11 UTC (permalink / raw)
  To: Eugene Loh; +Cc: dtrace, dtrace-devel

On Sat, Oct 12, 2024 at 12:04:59AM -0400, Eugene Loh wrote:
> On 10/11/24 19:53, Kris Van Hees wrote:
> 
> > On Fri, Oct 11, 2024 at 06:56:16PM -0400, Eugene Loh via DTrace-devel wrote:
> > > On 10/10/24 00:55, Kris Van Hees wrote:
> > > > On Wed, Oct 09, 2024 at 01:45:34AM -0400, eugene.loh@oracle.com wrote:
> > > > > From: Eugene Loh <eugene.loh@oracle.com>
> > > > > +	if (dtp->dt_usdt_namesmap_fd == -1)
> > > > > +		return -1;
> > > > > +
> > > > > +	dtp->dt_usdt_pridsmap_fd = create_gmap(dtp, "usdt_prids", BPF_MAP_TYPE_HASH,
> > > > > +	    dtp->dt_usdt_pridsmap_ksz, dtp->dt_usdt_pridsmap_vsz,
> > > > > +	    dtp->dt_options[DTRACEOPT_NUSDTPROBES]);
> > > > empty line perhaps?0~
> > > Sorry, I don't understand this comment.
> > Similar to the previous assignment followed by a conditional, I think that it
> > would make sense to insert an empty line between the assignment and the
> > conditional.
> 
> Curious.  It seemed to me to be very common for us to assign something and
> then immediately (no blank line) check its validity. Just a quick sanity
> check (not using this file, which at this point I've touched extensively), I
> tried libdtrace/dt_bpf.c.  Indeed, lots of this practice, and you seemed to
> be the last person to touch those spots.  I like the close spacing, but
> don't care a whole lot; I just wanted to double check since the patch seems
> to follow common practice in this particular regard.

Ah, oops, ignore me.  That was a comment I originally wrote, then meant to
delete (note the odd characters following it in my original review email), but
clearly botched that.  And then when you asked about it, I interpreted my
comment rather than actually looking back at what code it was really refering
to.

My mistake - please ignore.  Sorry for the confusion.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 1/6] Deferred attach of underlying USDT probes
  2024-10-09  5:45 ` [PATCH v2 1/6] Deferred attach of underlying USDT probes eugene.loh
@ 2024-10-16 15:58   ` Eugene Loh
  0 siblings, 0 replies; 9+ messages in thread
From: Eugene Loh @ 2024-10-16 15:58 UTC (permalink / raw)
  To: dtrace, dtrace-devel

I am withdrawing this patch;  the rest of the 6-patch series is still 
good since it deals with tests.  With Kris's suggestion of 
dt_provider_discover(), the "deferred attach" will be absorbed into 
earlier patches.  The revised patches will be:

v2->v3 02/19 Add a hook for a provider-specific "update" function
     and will be renamed "Add a dt_provider_discover() function"

v5->v6 07/19 Create the BPF usdt_names and usdt_prids maps

v3->v4 09/19 Use usdt_prids map to call clauses conditionally for USDT 
probes

v2->v3 14/19 Ignore clauses in USDT trampoline if we know they are 
impossible

On 10/9/24 01:45, eugene.loh@oracle.com wrote:
> From: Eugene Loh <eugene.loh@oracle.com>
>
> We would like dtrace to trace USDT processes that start
> even after the dtrace session has been launched.  In that
> case, the underlying probe cannot be attached when dtrace
> starts up;  rather, the BPF program must be attached once
> the USDT process has been detected.
>
> Therefore:
>
> *)  Make dt_bpf_load_prog() callable outside of dt_bpf.c.
>
> *)  Have update_uprobe() call dt_construct(), dt_link(),
>      dt_bpf_load_prog(), and attach() for any new underlying
>      probes.
>
> Signed-off-by: Eugene Loh <eugene.loh@oracle.com>

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2024-10-16 15:58 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-09  5:45 [PATCH v5 07/19] Create the BPF usdt_names and usdt_prids maps eugene.loh
2024-10-09  5:45 ` [PATCH v2 1/6] Deferred attach of underlying USDT probes eugene.loh
2024-10-16 15:58   ` Eugene Loh
2024-10-10  4:55 ` [PATCH v5 07/19] Create the BPF usdt_names and usdt_prids maps Kris Van Hees
2024-10-10 20:58   ` Kris Van Hees
2024-10-11 22:56   ` Eugene Loh
2024-10-11 23:53     ` [DTrace-devel] " Kris Van Hees
2024-10-12  4:04       ` Eugene Loh
2024-10-12  4:11         ` Kris Van Hees

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox