* [PATCH v2 02/19] Add a hook for a provider-specific "update" function
@ 2024-09-28 2:15 eugene.loh
2024-09-28 2:15 ` [PATCH v4 07/19] Create the BPF usdt_prids map eugene.loh
0 siblings, 1 reply; 7+ messages in thread
From: eugene.loh @ 2024-09-28 2:15 UTC (permalink / raw)
To: dtrace, dtrace-devel
From: Eugene Loh <eugene.loh@oracle.com>
For up-coming USDT-probe support, we need to update a BPF map
-- at least when the dtrace session starts but possibly also later
to support systemwide USDT tracing for processes that may start up
later.
One way to do this is with a USDT-specific update function.
For now, let's add a hook for providers to have provider-specific
update functions. User space can either call
for (i = 0; i < ARRAY_SIZE(dt_providers); i++) {
if (dt_providers[i]->update)
dt_providers[i]->update(...);
}
any time it likes. Or it can call dt_usdt.update(...).
This is for WIP. A different approach can be adopted later instead.
Signed-off-by: Eugene Loh <eugene.loh@oracle.com>
---
libdtrace/dt_provider.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/libdtrace/dt_provider.h b/libdtrace/dt_provider.h
index 17b1844cb..92df0cd28 100644
--- a/libdtrace/dt_provider.h
+++ b/libdtrace/dt_provider.h
@@ -68,6 +68,8 @@ typedef struct dt_provimpl {
void *datap);
void (*destroy)(dtrace_hdl_t *dtp, /* free provider data */
void *datap);
+ int (*update)(dtrace_hdl_t *dtp, /* update provider-specific info */
+ void *datap);
} dt_provimpl_t;
/* list dt_dtrace first */
--
2.43.5
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v4 07/19] Create the BPF usdt_prids map
2024-09-28 2:15 [PATCH v2 02/19] Add a hook for a provider-specific "update" function eugene.loh
@ 2024-09-28 2:15 ` eugene.loh
2024-10-03 5:04 ` [DTrace-devel] " Kris Van Hees
0 siblings, 1 reply; 7+ messages in thread
From: eugene.loh @ 2024-09-28 2:15 UTC (permalink / raw)
To: dtrace, dtrace-devel
From: Eugene Loh <eugene.loh@oracle.com>
As USDT processes come and go, the overlying probes for an underlying
probe will change. Hence, we will move to a scheme in which an
underlying probe will walk all possible clauses it could call,
deciding at run time (using a bit mask for the overlying USDT probe)
which of the clauses to call.
In this patch, we create and update the BPF "usdt_prids" map. This
is a hash map, where:
*) the key (size: dtp->dt_usdt_pridsmap_ksz) comprises
- the PID of the firing process
- the PRID of the underlying probe
*) the value (size: dtp->dt_usdt_pridsmap_vsz) comprises
- the PRID over the overlying USDT probe
- a bit mask indicating which clauses should be called
As processes start up, we also add new overlying USDT probes,
requiring updates of the BPF "probes" and "strtab" maps and of
the list of enablings.
The underlying-probes "update" function is called sporadically.
The focus of this patch is the function. When it is called can
be tuned in future patches.
This patch guesses certain sizes very crudely. These sizes should
be handled more robustly in future patches:
*) The allocated size of the string table. For example,
new provider names have to be added as new processes
start.
*) The number of entries in the BPF "usdt_prids" map.
There is a little relief here in that, as processes
terminate, they can be removed from the map.
*) The size of the bit mask -- that is, the greatest
number of clauses an underlying probe might call.
This is relatively easy to extend; nevertheless,
that work is left for a future patch.
Signed-off-by: Eugene Loh <eugene.loh@oracle.com>
---
libdtrace/dt_bpf.c | 43 +++++-
libdtrace/dt_cc.c | 2 +-
libdtrace/dt_dlibs.c | 1 +
libdtrace/dt_impl.h | 6 +
libdtrace/dt_prov_uprobe.c | 278 +++++++++++++++++++++++++++++++++++++
libdtrace/dt_work.c | 3 +
6 files changed, 331 insertions(+), 2 deletions(-)
diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
index ad11d178e..4acf6e494 100644
--- a/libdtrace/dt_bpf.c
+++ b/libdtrace/dt_bpf.c
@@ -841,7 +841,8 @@ gmap_create_strtab(dtrace_hdl_t *dtp)
int fd, rc, err;
dtp->dt_strlen = dt_strtab_size(dtp->dt_ccstab);
- dtp->dt_rooffset = P2ROUNDUP(dtp->dt_strlen, 8);
+ dtp->dt_strmax = dtp->dt_strlen + 1000; // FIXME pad some arbitrary amount to account for new USDT probes as new processes start
+ dtp->dt_rooffset = P2ROUNDUP(dtp->dt_strmax, 8);
dtp->dt_rosize = dt_rodata_size(dtp->dt_rodata);
dtp->dt_zerooffset = P2ROUNDUP(dtp->dt_rooffset + dtp->dt_rosize, 8);
dtp->dt_zerosize = 0;
@@ -899,6 +900,7 @@ gmap_create_strtab(dtrace_hdl_t *dtp)
if (rc == -1)
return dt_bpf_error(dtp, "cannot update BPF map 'strtab': %s\n",
strerror(err));
+ dtp->dt_strtabmap_fd = fd;
return 0;
}
@@ -936,6 +938,44 @@ gmap_create_probes(dtrace_hdl_t *dtp)
dtp, "cannot update BPF map 'probes': %s\n",
strerror(errno));
}
+ dtp->dt_probesmap_fd = fd;
+
+ return 0;
+}
+
+/*
+ * Create the 'usdt_prids' BPF map.
+ *
+ * USDT-PRID information map. This is a global hash map for use
+ * with USDT probes. It is indexed by (pid, underlying probe ID).
+ * The value is a probe ID for the overlying USDT probe and a bit
+ * mask indicating which clauses to execute for this pid.
+ *
+ * WIP. Just make up some sizes for now.
+ *
+ * How big is a probe ID? Sometimes, it's a dtrace_id_t.
+ * And uts/common/sys/dtrace_types.h gives us "typedef uint32_t dtrace_id_t".
+ * Meanwhile, libdtrace/dt_impl.h has "uint32_t dt_probe_id".
+ * So either uint32_t or dtrace_id_t is fine.
+ *
+ * How big should our bit mask be? Start with 8*sizeof(long long) bits.
+ *
+ * How many entries should we allow? Start with 1000.
+ *
+ * Note that for a given (pid, PRID) key, there can be at most one
+ * overlying USDT probe.
+ */
+static int
+gmap_create_usdt_prids(dtrace_hdl_t *dtp)
+{
+ dtp->dt_usdt_pridsmap_fd = create_gmap(dtp, "usdt_prids", BPF_MAP_TYPE_HASH,
+ dtp->dt_usdt_pridsmap_ksz, dtp->dt_usdt_pridsmap_vsz, 1000);
+ if (dtp->dt_usdt_pridsmap_fd == -1)
+ return -1;
+
+ /* Populate the newly created map. FIXME: this is probably not the right place for this. */
+ if (dt_uprobe.update(dtp, NULL) < 0)
+ return -1;
return 0;
}
@@ -1045,6 +1085,7 @@ dt_bpf_gmap_create(dtrace_hdl_t *dtp)
CREATE_MAP(scratchmem)
CREATE_MAP(strtab)
CREATE_MAP(probes)
+ CREATE_MAP(usdt_prids)
CREATE_MAP(gvars)
CREATE_MAP(lvars)
CREATE_MAP(dvars)
diff --git a/libdtrace/dt_cc.c b/libdtrace/dt_cc.c
index 4202771a9..12104fc21 100644
--- a/libdtrace/dt_cc.c
+++ b/libdtrace/dt_cc.c
@@ -1045,7 +1045,7 @@ dt_link_construct(dtrace_hdl_t *dtp, const dt_probe_t *prp, dtrace_difo_t *dp,
nrp->dofr_data = 0; /* FIXME */
continue;
case DT_CONST_STBSZ:
- nrp->dofr_data = dtp->dt_strlen;
+ nrp->dofr_data = dtp->dt_strmax;
continue;
case DT_CONST_STRSZ:
nrp->dofr_data =
diff --git a/libdtrace/dt_dlibs.c b/libdtrace/dt_dlibs.c
index ba4d4abef..924cf11e4 100644
--- a/libdtrace/dt_dlibs.c
+++ b/libdtrace/dt_dlibs.c
@@ -66,6 +66,7 @@ static const dt_ident_t dt_bpf_symbols[] = {
DT_BPF_SYMBOL(lvars, DT_IDENT_PTR),
DT_BPF_SYMBOL(mem, DT_IDENT_PTR),
DT_BPF_SYMBOL(probes, DT_IDENT_PTR),
+ DT_BPF_SYMBOL(usdt_prids, DT_IDENT_PTR),
DT_BPF_SYMBOL(scratchmem, DT_IDENT_PTR),
DT_BPF_SYMBOL(specs, DT_IDENT_PTR),
DT_BPF_SYMBOL(state, DT_IDENT_PTR),
diff --git a/libdtrace/dt_impl.h b/libdtrace/dt_impl.h
index 340dc1960..1d1248766 100644
--- a/libdtrace/dt_impl.h
+++ b/libdtrace/dt_impl.h
@@ -282,6 +282,7 @@ struct dtrace_hdl {
dt_strtab_t *dt_ccstab; /* global string table (during compilation) */
dt_rodata_t *dt_rodata; /* global read-only data */
uint_t dt_strlen; /* global string table (runtime) size */
+ uint_t dt_strmax; /* global string table (runtime) size, allocated */
uint_t dt_rooffset; /* read-only data offset */
uint_t dt_rosize; /* read-only data size */
uint_t dt_zerooffset; /* zero region, offset */
@@ -389,6 +390,11 @@ struct dtrace_hdl {
int dt_aggmap_fd; /* file descriptor for the 'aggs' BPF map */
int dt_genmap_fd; /* file descriptor for the 'agggen' BPF map */
int dt_cpumap_fd; /* file descriptor for the 'cpuinfo' BPF map */
+ int dt_strtabmap_fd; /* file descriptor for the 'strtab' BPF map */
+ int dt_probesmap_fd; /* file descriptor for the 'probes' BPF map */
+ int dt_usdt_pridsmap_fd; /* file descriptor for the 'usdt_prids' BPF map */
+ int dt_usdt_pridsmap_ksz; /* 'usdt_prids' BPF map key size */
+ int dt_usdt_pridsmap_vsz; /* 'usdt_prids' BPF map value size */
dtrace_handle_err_f *dt_errhdlr; /* error handler, if any */
void *dt_errarg; /* error handler argument */
dtrace_handle_drop_f *dt_drophdlr; /* drop handler, if any */
diff --git a/libdtrace/dt_prov_uprobe.c b/libdtrace/dt_prov_uprobe.c
index bb172ace2..d99915fa2 100644
--- a/libdtrace/dt_prov_uprobe.c
+++ b/libdtrace/dt_prov_uprobe.c
@@ -37,8 +37,10 @@
#include "dt_list.h"
#include "dt_provider_tp.h"
#include "dt_probe.h"
+#include "dt_program.h"
#include "dt_pid.h"
#include "dt_string.h"
+#include "port.h"
/* Provider name for the underlying probes. */
static const char prvname[] = "uprobe";
@@ -63,6 +65,15 @@ typedef struct list_probe {
dt_probe_t *probe;
} list_probe_t;
+typedef struct usdt_prids_map_key {
+ pid_t pid;
+ dtrace_id_t uprid;
+} usdt_prids_map_key_t;
+typedef struct usdt_prids_map_val {
+ dtrace_id_t prid;
+ long long mask;
+} usdt_prids_map_val_t;
+
static const dtrace_pattr_t pattr = {
{ DTRACE_STABILITY_EVOLVING, DTRACE_STABILITY_EVOLVING, DTRACE_CLASS_ISA },
{ DTRACE_STABILITY_PRIVATE, DTRACE_STABILITY_PRIVATE, DTRACE_CLASS_UNKNOWN },
@@ -77,6 +88,9 @@ dt_provimpl_t dt_usdt;
static int populate(dtrace_hdl_t *dtp)
{
+ dtp->dt_usdt_pridsmap_ksz = sizeof(usdt_prids_map_key_t);
+ dtp->dt_usdt_pridsmap_vsz = sizeof(usdt_prids_map_val_t);
+
if (dt_provider_create(dtp, dt_uprobe.name, &dt_uprobe, &pattr,
NULL) == NULL ||
dt_provider_create(dtp, dt_uprobe_is_enabled.name,
@@ -123,6 +137,269 @@ static void probe_destroy(dtrace_hdl_t *dtp, void *datap)
free_probe_list(dtp, datap);
}
+/*
+ * Clean up the BPF "usdt_prids" map.
+ */
+static int
+purge_BPFmap(dtrace_hdl_t *dtp)
+{
+ int fd = dtp->dt_usdt_pridsmap_fd;
+ usdt_prids_map_key_t key, nxt;
+ usdt_prids_map_val_t val;
+
+ /* Initialize key to a pid/uprid that cannot be found. */
+ key.pid = 0;
+ key.uprid = 0;
+
+ /* Loop over all entries. */
+ while (dt_bpf_map_next_key(fd, &key, &nxt) == 0) {
+ memcpy(&key, &nxt, sizeof(usdt_prids_map_key_t));
+
+ if (dt_bpf_map_lookup(fd, &key, &val) == -1)
+ return dt_set_errno(dtp, EDT_BPF);
+
+ /* Check if the process is still running. */
+ if (!Pexists(key.pid)) {
+ dt_bpf_map_delete(fd, &key); // FIXME: bpf_map_next_key() iteration restarts each time we delete an elem!!!
+ continue;
+ }
+
+ /*
+ * FIXME. There might be another case, where the process
+ * is still running, but some of its USDT probes are gone?
+ * So maybe we have to check for the existence of one of
+ * dtrace_probedesc_t *pdp = dtp->dt_probes[val.prid]->desc;
+ * char *prv = ...pdp->prv minus the numerial part;
+ *
+ * /run/dtrace/probes/$pid/$pdp->prv/$pdp->mod/$pdp->fun/$pdp->prb
+ * /run/dtrace/stash/dof-pid/$pid/0/parsed/$prv:$pdp->mod:$pdp->fun:$pdp->prb
+ * /run/dtrace/stash/dof-pid/$pid/.../parsed/$prv:$pdp->mod:$pdp->fun:$pdp->prb
+ */
+ }
+
+ return 0;
+}
+
+/*
+ * Grow the string table map. It includes the string table (which might
+ * grow due to new USDT probes). It also includes the read-only data and
+ * the block of zeros, but these remain fixed.
+ *
+ * FIXME Is there a danger of the BPF map update colliding with map reads?
+ */
+static int
+grow_strtab(dtrace_hdl_t *dtp)
+{
+ size_t sz = dtp->dt_zerooffset + dtp->dt_zerosize;
+ char *strtab;
+ uint8_t *buf, *end;
+ size_t strsize = dtp->dt_options[DTRACEOPT_STRSIZE];
+ uint32_t key = 0;
+ int rc, err;
+
+ strtab = dt_zalloc(dtp, sz);
+#if 0
+ // FIXME do something
+ if (strtab == NULL)
+ return dt_set_errno(dtp, EDT_NOMEM);
+#endif
+
+ /* Copy the string table data. */
+ dt_strtab_write(dtp->dt_ccstab, (dt_strtab_write_f *)dt_strtab_copystr, strtab);
+
+ /* Loop over the string table and truncate strings that are too long. */
+ buf = (uint8_t *)strtab;
+ end = buf + dtp->dt_strlen;
+ while (buf < end) {
+ uint_t len = strlen((char *)buf);
+
+ if (len > strsize)
+ buf[strsize] = '\0';
+
+ buf += len + 1;
+ }
+
+ /* Copy the read-only data. */
+ dt_rodata_write(dtp->dt_rodata, (dt_rodata_copy_f *)dt_rodata_copy, strtab + dtp->dt_rooffset);
+
+ rc = dt_bpf_map_update(dtp->dt_strtabmap_fd, &key, strtab);
+ err = errno;
+ dt_free(dtp, strtab);
+
+ // FIXME do something
+ if (rc == -1) {
+ strerror(err);
+ return -1;
+ // return dt_bpf_error(dtp, "cannot update BPF map 'strtab': %s\n", strerror(err));
+ }
+
+ return 0;
+}
+
+/*
+ * Update the uprobe provider.
+ */
+static int update_uprobe(dtrace_hdl_t *dtp, void *datap)
+{
+ dt_probe_t *prp;
+ dt_probe_t *prp_next;
+ int i, prid = dtp->dt_probe_id;
+ uint_t old_strlen = dtp->dt_strlen;
+ dt_pcb_t pcb;
+
+ /* Clear stale pids. */
+ purge_BPFmap(dtp);
+
+ /* Update dt_probes[] and dt_enablings. */
+ /*
+ * pcb is only used inside of dt_pid_error() to get:
+ * pcb->pcb_region
+ * pcb->pcb_filetag
+ * pcb->pcb_fileptr
+ * While pcb cannot be NULL, these other things apparently can be.
+ */
+ memset(&pcb, 0, sizeof(dt_pcb_t));
+ for (i = 0; i < dtp->dt_stmt_nextid; i++) {
+ dtrace_stmtdesc_t *stp;
+
+ stp = dtp->dt_stmts[i];
+ assert(stp != NULL);
+ dt_pid_create_usdt_probes(&stp->dtsd_ecbdesc->dted_probe, dtp, &pcb);
+ }
+
+ while (prid < dtp->dt_probe_id) {
+ dt_bpf_probe_t pinfo;
+ const dtrace_probedesc_t *pdp = dtp->dt_probes[prid]->desc;
+ int fd = dtp->dt_probesmap_fd;
+
+ dt_probe_enable(dtp, dtp->dt_probes[prid]);
+
+ pinfo.prv = dt_strtab_index(dtp->dt_ccstab, pdp->prv);
+ pinfo.mod = dt_strtab_index(dtp->dt_ccstab, pdp->mod);
+ pinfo.fun = dt_strtab_index(dtp->dt_ccstab, pdp->fun);
+ pinfo.prb = dt_strtab_index(dtp->dt_ccstab, pdp->prb);
+
+ if (dt_bpf_map_update(fd, &pdp->id, &pinfo) == -1)
+ assert(0); // FIXME do something here
+
+ prid++;
+ }
+
+ /* Grow the strtab if it has gotten bigger. */
+ dtp->dt_strlen = dt_strtab_size(dtp->dt_ccstab);
+
+ if (dtp->dt_strlen > dtp->dt_strmax)
+ assert(0); // FIXME handle this more gracefully
+ if (dtp->dt_strlen > old_strlen)
+ grow_strtab(dtp);
+
+ /* Review enablings. */
+ for (prp = dt_list_next(&dtp->dt_enablings); prp != NULL; prp = prp_next) {
+ pid_t pid;
+ list_probe_t *pup;
+
+ prp_next = dt_list_next(prp);
+
+ /* Make sure it is an overlying USDT probe. */
+ if (prp->prov->impl != &dt_usdt)
+ continue;
+
+ /* FIXME passing in NULL pcb and dpr wreaks havoc on error reporting? */
+ /*
+ * Nick writes:
+ * This is a general problem with running compiler-adjacent things outside
+ * compile time. I think we should adjust dt_pid_error() so that it works
+ * with NULL pcb and dpr at once, probably by using the code path for
+ * pcb != NULL and augmenting it so that it passes in NULL for the region and
+ * filename args and 0 for the lineno if pcb is NULL. (dt_set_errmsg can
+ * already handle this case.)
+ */
+ pid = dt_pid_get_pid(prp->desc, dtp, NULL, NULL);
+
+ if (!Pexists(pid)) {
+ /* Remove from enablings. */
+ dt_list_delete(&dtp->dt_enablings, prp);
+
+ /* Make it evident from the probe that it is not in enablings. */
+ ((dt_list_t *)prp)->dl_prev = NULL;
+ ((dt_list_t *)prp)->dl_next = NULL;
+
+ /* Free up its list of underlying probes. */
+ while ((pup = dt_list_next(prp->prv_data)) != NULL) {
+ dt_list_delete(prp->prv_data, pup);
+ dt_free(dtp, pup);
+ }
+ dt_free(dtp, prp->prv_data);
+ prp->prv_data = NULL;
+
+ /* Free up BPF "probes" map entry. */
+ dt_bpf_map_delete(dtp->dt_probesmap_fd, &prp->desc->id);
+
+ continue;
+ }
+
+ for (pup = prp->prv_data; pup != NULL; pup = dt_list_next(pup)) {
+ dt_probe_t *uprp = pup->probe;
+ long long mask = 0, bit = 1;
+ usdt_prids_map_key_t key;
+ usdt_prids_map_val_t val, oval;
+ dt_uprobe_t *upp = uprp->prv_data;
+
+ /*
+ * For is-enabled probes, the bit mask does not matter.
+ * It is possible that we have this underlying probe due to
+ * an overlying pid-offset probe and that we will not know
+ * until later, when some new pid is created, that we also
+ * have an overlying USDT is-enabled probe, but missing this
+ * optimization opportunity is okay.
+ */
+ if (uprp->prov->impl == &dt_uprobe && !(upp->flags & PP_IS_ENABLED)) {
+ int n;
+
+ for (n = 0; n < dtp->dt_stmt_nextid; n++) {
+ dtrace_stmtdesc_t *stp;
+
+ stp = dtp->dt_stmts[n];
+ assert(stp != NULL);
+
+ if (dt_gmatch(prp->desc->prv, stp->dtsd_ecbdesc->dted_probe.prv) &&
+ dt_gmatch(prp->desc->mod, stp->dtsd_ecbdesc->dted_probe.mod) &&
+ dt_gmatch(prp->desc->fun, stp->dtsd_ecbdesc->dted_probe.fun) &&
+ dt_gmatch(prp->desc->prb, stp->dtsd_ecbdesc->dted_probe.prb))
+ mask |= bit;
+
+ bit <<= 1;
+ }
+ }
+
+ key.pid = pid;
+ key.uprid = uprp->desc->id;
+
+ val.prid = prp->desc->id;
+ val.mask = mask;
+
+ /* linux/bpf.h says error is -1, but it could be -2 */
+ if (dt_bpf_map_lookup(dtp->dt_usdt_pridsmap_fd, &key, &oval) < 0) {
+ /*
+ * Set the map entry.
+ */
+ // FIXME Check return value, but how should errors be handled?
+ dt_bpf_map_update(dtp->dt_usdt_pridsmap_fd, &key, &val);
+ } else if (val.prid != oval.prid || val.mask != oval.mask) {
+ /*
+ * This can happen when two overlying probes map to the same underlying probe for the same pid.
+ * E.g., pid:::entry and pid:::0, or pid:::$offset and usdt:::.
+ */
+ } else {
+ /*
+ * Nothing to do, it already is in the map.
+ */
+ }
+ }
+ }
+
+ return 0;
+}
/*
* Look up or create an underlying (real) probe, corresponding directly to a
@@ -782,6 +1059,7 @@ dt_provimpl_t dt_uprobe = {
.probe_info = &probe_info,
.detach = &detach,
.probe_destroy = &probe_destroy_underlying,
+ .update = &update_uprobe,
};
/*
diff --git a/libdtrace/dt_work.c b/libdtrace/dt_work.c
index 40913a0f1..0efad3633 100644
--- a/libdtrace/dt_work.c
+++ b/libdtrace/dt_work.c
@@ -368,6 +368,9 @@ dtrace_work(dtrace_hdl_t *dtp, FILE *fp, dtrace_consume_probe_f *pfunc,
dtrace_workstatus_t rval;
int gen;
+ if (dt_uprobe.update(dtp, NULL) < 0)
+ return DTRACE_WORKSTATUS_ERROR;
+
switch (dtrace_status(dtp)) {
case DTRACE_STATUS_EXITED:
case DTRACE_STATUS_STOPPED:
--
2.43.5
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [DTrace-devel] [PATCH v4 07/19] Create the BPF usdt_prids map
2024-09-28 2:15 ` [PATCH v4 07/19] Create the BPF usdt_prids map eugene.loh
@ 2024-10-03 5:04 ` Kris Van Hees
2024-10-06 1:11 ` Eugene Loh
0 siblings, 1 reply; 7+ messages in thread
From: Kris Van Hees @ 2024-10-03 5:04 UTC (permalink / raw)
To: eugene.loh; +Cc: dtrace, dtrace-devel
The main problem with the current implementation is the fact that the strtab
BPF map is being updated while it is being used. That is not a safe operation,
and can have unpredictable behaviour depending on the actual implementation at
the kernel level. We cannot cuont on it being safe.
I propose introducing a usdtsize option (or whatever we name it) that sets the
number of USDT probes that we can support. It should be used for the size of
the usdt_prids map.
In order to be able to provide the data for probeprov, probemod, probefunc, and
probename, we need to have a way to store probe name elements dynamically. That
can be done using a usdt_probes hashmap that stores a string (128 bytes max) and
is indexed by an int. The bvara code will need to know to consult this new
BPF map to get the data. I can see two options:
1) increase the probes BPF map by usdtsize extra elements, and set the highest
order bit to indicate that the value for a probe name element should be
retrieved from the usdt_probes hashmap.
2) add probe name element key values to the usdt_prids map
Approach 1 uses the existing the mechanism for storing this data, but adds some
(very minor) performance hit to all probeprov, probemod, probefunc, probename
bvar lookups because we need to check for the highest order bit.
Approach 2 means we have an alternative place where we store the probe name
element data, but it remains localized with the other USDT probe data *and*
the performance impact is limited to USDT probes only (if the prid is not
found in the probes map, we need to look in the usdt_prids map).
On Fri, Sep 27, 2024 at 10:15:25PM -0400, eugene.loh--- via DTrace-devel wrote:
> From: Eugene Loh <eugene.loh@oracle.com>
>
> As USDT processes come and go, the overlying probes for an underlying
> probe will change. Hence, we will move to a scheme in which an
> underlying probe will walk all possible clauses it could call,
> deciding at run time (using a bit mask for the overlying USDT probe)
> which of the clauses to call.
>
> In this patch, we create and update the BPF "usdt_prids" map. This
> is a hash map, where:
>
> *) the key (size: dtp->dt_usdt_pridsmap_ksz) comprises
> - the PID of the firing process
> - the PRID of the underlying probe
>
> *) the value (size: dtp->dt_usdt_pridsmap_vsz) comprises
> - the PRID over the overlying USDT probe
> - a bit mask indicating which clauses should be called
>
> As processes start up, we also add new overlying USDT probes,
> requiring updates of the BPF "probes" and "strtab" maps and of
> the list of enablings.
>
> The underlying-probes "update" function is called sporadically.
> The focus of this patch is the function. When it is called can
> be tuned in future patches.
>
> This patch guesses certain sizes very crudely. These sizes should
> be handled more robustly in future patches:
>
> *) The allocated size of the string table. For example,
> new provider names have to be added as new processes
> start.
>
> *) The number of entries in the BPF "usdt_prids" map.
> There is a little relief here in that, as processes
> terminate, they can be removed from the map.
>
> *) The size of the bit mask -- that is, the greatest
> number of clauses an underlying probe might call.
> This is relatively easy to extend; nevertheless,
> that work is left for a future patch.
>
> Signed-off-by: Eugene Loh <eugene.loh@oracle.com>
> ---
> libdtrace/dt_bpf.c | 43 +++++-
> libdtrace/dt_cc.c | 2 +-
> libdtrace/dt_dlibs.c | 1 +
> libdtrace/dt_impl.h | 6 +
> libdtrace/dt_prov_uprobe.c | 278 +++++++++++++++++++++++++++++++++++++
> libdtrace/dt_work.c | 3 +
> 6 files changed, 331 insertions(+), 2 deletions(-)
>
> diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
> index ad11d178e..4acf6e494 100644
> --- a/libdtrace/dt_bpf.c
> +++ b/libdtrace/dt_bpf.c
> @@ -841,7 +841,8 @@ gmap_create_strtab(dtrace_hdl_t *dtp)
> int fd, rc, err;
>
> dtp->dt_strlen = dt_strtab_size(dtp->dt_ccstab);
> - dtp->dt_rooffset = P2ROUNDUP(dtp->dt_strlen, 8);
> + dtp->dt_strmax = dtp->dt_strlen + 1000; // FIXME pad some arbitrary amount to account for new USDT probes as new processes start
> + dtp->dt_rooffset = P2ROUNDUP(dtp->dt_strmax, 8);
> dtp->dt_rosize = dt_rodata_size(dtp->dt_rodata);
> dtp->dt_zerooffset = P2ROUNDUP(dtp->dt_rooffset + dtp->dt_rosize, 8);
> dtp->dt_zerosize = 0;
> @@ -899,6 +900,7 @@ gmap_create_strtab(dtrace_hdl_t *dtp)
> if (rc == -1)
> return dt_bpf_error(dtp, "cannot update BPF map 'strtab': %s\n",
> strerror(err));
> + dtp->dt_strtabmap_fd = fd;
>
> return 0;
> }
> @@ -936,6 +938,44 @@ gmap_create_probes(dtrace_hdl_t *dtp)
> dtp, "cannot update BPF map 'probes': %s\n",
> strerror(errno));
> }
> + dtp->dt_probesmap_fd = fd;
> +
> + return 0;
> +}
> +
> +/*
> + * Create the 'usdt_prids' BPF map.
> + *
> + * USDT-PRID information map. This is a global hash map for use
> + * with USDT probes. It is indexed by (pid, underlying probe ID).
> + * The value is a probe ID for the overlying USDT probe and a bit
> + * mask indicating which clauses to execute for this pid.
> + *
> + * WIP. Just make up some sizes for now.
> + *
> + * How big is a probe ID? Sometimes, it's a dtrace_id_t.
> + * And uts/common/sys/dtrace_types.h gives us "typedef uint32_t dtrace_id_t".
> + * Meanwhile, libdtrace/dt_impl.h has "uint32_t dt_probe_id".
> + * So either uint32_t or dtrace_id_t is fine.
> + *
> + * How big should our bit mask be? Start with 8*sizeof(long long) bits.
> + *
> + * How many entries should we allow? Start with 1000.
> + *
> + * Note that for a given (pid, PRID) key, there can be at most one
> + * overlying USDT probe.
> + */
> +static int
> +gmap_create_usdt_prids(dtrace_hdl_t *dtp)
> +{
> + dtp->dt_usdt_pridsmap_fd = create_gmap(dtp, "usdt_prids", BPF_MAP_TYPE_HASH,
> + dtp->dt_usdt_pridsmap_ksz, dtp->dt_usdt_pridsmap_vsz, 1000);
> + if (dtp->dt_usdt_pridsmap_fd == -1)
> + return -1;
> +
> + /* Populate the newly created map. FIXME: this is probably not the right place for this. */
> + if (dt_uprobe.update(dtp, NULL) < 0)
> + return -1;
>
> return 0;
> }
> @@ -1045,6 +1085,7 @@ dt_bpf_gmap_create(dtrace_hdl_t *dtp)
> CREATE_MAP(scratchmem)
> CREATE_MAP(strtab)
> CREATE_MAP(probes)
> + CREATE_MAP(usdt_prids)
> CREATE_MAP(gvars)
> CREATE_MAP(lvars)
> CREATE_MAP(dvars)
> diff --git a/libdtrace/dt_cc.c b/libdtrace/dt_cc.c
> index 4202771a9..12104fc21 100644
> --- a/libdtrace/dt_cc.c
> +++ b/libdtrace/dt_cc.c
> @@ -1045,7 +1045,7 @@ dt_link_construct(dtrace_hdl_t *dtp, const dt_probe_t *prp, dtrace_difo_t *dp,
> nrp->dofr_data = 0; /* FIXME */
> continue;
> case DT_CONST_STBSZ:
> - nrp->dofr_data = dtp->dt_strlen;
> + nrp->dofr_data = dtp->dt_strmax;
> continue;
> case DT_CONST_STRSZ:
> nrp->dofr_data =
> diff --git a/libdtrace/dt_dlibs.c b/libdtrace/dt_dlibs.c
> index ba4d4abef..924cf11e4 100644
> --- a/libdtrace/dt_dlibs.c
> +++ b/libdtrace/dt_dlibs.c
> @@ -66,6 +66,7 @@ static const dt_ident_t dt_bpf_symbols[] = {
> DT_BPF_SYMBOL(lvars, DT_IDENT_PTR),
> DT_BPF_SYMBOL(mem, DT_IDENT_PTR),
> DT_BPF_SYMBOL(probes, DT_IDENT_PTR),
> + DT_BPF_SYMBOL(usdt_prids, DT_IDENT_PTR),
> DT_BPF_SYMBOL(scratchmem, DT_IDENT_PTR),
> DT_BPF_SYMBOL(specs, DT_IDENT_PTR),
> DT_BPF_SYMBOL(state, DT_IDENT_PTR),
> diff --git a/libdtrace/dt_impl.h b/libdtrace/dt_impl.h
> index 340dc1960..1d1248766 100644
> --- a/libdtrace/dt_impl.h
> +++ b/libdtrace/dt_impl.h
> @@ -282,6 +282,7 @@ struct dtrace_hdl {
> dt_strtab_t *dt_ccstab; /* global string table (during compilation) */
> dt_rodata_t *dt_rodata; /* global read-only data */
> uint_t dt_strlen; /* global string table (runtime) size */
> + uint_t dt_strmax; /* global string table (runtime) size, allocated */
> uint_t dt_rooffset; /* read-only data offset */
> uint_t dt_rosize; /* read-only data size */
> uint_t dt_zerooffset; /* zero region, offset */
> @@ -389,6 +390,11 @@ struct dtrace_hdl {
> int dt_aggmap_fd; /* file descriptor for the 'aggs' BPF map */
> int dt_genmap_fd; /* file descriptor for the 'agggen' BPF map */
> int dt_cpumap_fd; /* file descriptor for the 'cpuinfo' BPF map */
> + int dt_strtabmap_fd; /* file descriptor for the 'strtab' BPF map */
> + int dt_probesmap_fd; /* file descriptor for the 'probes' BPF map */
> + int dt_usdt_pridsmap_fd; /* file descriptor for the 'usdt_prids' BPF map */
> + int dt_usdt_pridsmap_ksz; /* 'usdt_prids' BPF map key size */
> + int dt_usdt_pridsmap_vsz; /* 'usdt_prids' BPF map value size */
> dtrace_handle_err_f *dt_errhdlr; /* error handler, if any */
> void *dt_errarg; /* error handler argument */
> dtrace_handle_drop_f *dt_drophdlr; /* drop handler, if any */
> diff --git a/libdtrace/dt_prov_uprobe.c b/libdtrace/dt_prov_uprobe.c
> index bb172ace2..d99915fa2 100644
> --- a/libdtrace/dt_prov_uprobe.c
> +++ b/libdtrace/dt_prov_uprobe.c
> @@ -37,8 +37,10 @@
> #include "dt_list.h"
> #include "dt_provider_tp.h"
> #include "dt_probe.h"
> +#include "dt_program.h"
> #include "dt_pid.h"
> #include "dt_string.h"
> +#include "port.h"
>
> /* Provider name for the underlying probes. */
> static const char prvname[] = "uprobe";
> @@ -63,6 +65,15 @@ typedef struct list_probe {
> dt_probe_t *probe;
> } list_probe_t;
>
> +typedef struct usdt_prids_map_key {
> + pid_t pid;
> + dtrace_id_t uprid;
> +} usdt_prids_map_key_t;
> +typedef struct usdt_prids_map_val {
> + dtrace_id_t prid;
> + long long mask;
> +} usdt_prids_map_val_t;
> +
> static const dtrace_pattr_t pattr = {
> { DTRACE_STABILITY_EVOLVING, DTRACE_STABILITY_EVOLVING, DTRACE_CLASS_ISA },
> { DTRACE_STABILITY_PRIVATE, DTRACE_STABILITY_PRIVATE, DTRACE_CLASS_UNKNOWN },
> @@ -77,6 +88,9 @@ dt_provimpl_t dt_usdt;
>
> static int populate(dtrace_hdl_t *dtp)
> {
> + dtp->dt_usdt_pridsmap_ksz = sizeof(usdt_prids_map_key_t);
> + dtp->dt_usdt_pridsmap_vsz = sizeof(usdt_prids_map_val_t);
> +
> if (dt_provider_create(dtp, dt_uprobe.name, &dt_uprobe, &pattr,
> NULL) == NULL ||
> dt_provider_create(dtp, dt_uprobe_is_enabled.name,
> @@ -123,6 +137,269 @@ static void probe_destroy(dtrace_hdl_t *dtp, void *datap)
> free_probe_list(dtp, datap);
> }
>
> +/*
> + * Clean up the BPF "usdt_prids" map.
> + */
> +static int
> +purge_BPFmap(dtrace_hdl_t *dtp)
> +{
> + int fd = dtp->dt_usdt_pridsmap_fd;
> + usdt_prids_map_key_t key, nxt;
> + usdt_prids_map_val_t val;
> +
> + /* Initialize key to a pid/uprid that cannot be found. */
> + key.pid = 0;
> + key.uprid = 0;
> +
> + /* Loop over all entries. */
> + while (dt_bpf_map_next_key(fd, &key, &nxt) == 0) {
> + memcpy(&key, &nxt, sizeof(usdt_prids_map_key_t));
> +
> + if (dt_bpf_map_lookup(fd, &key, &val) == -1)
> + return dt_set_errno(dtp, EDT_BPF);
> +
> + /* Check if the process is still running. */
> + if (!Pexists(key.pid)) {
> + dt_bpf_map_delete(fd, &key); // FIXME: bpf_map_next_key() iteration restarts each time we delete an elem!!!
> + continue;
> + }
> +
> + /*
> + * FIXME. There might be another case, where the process
> + * is still running, but some of its USDT probes are gone?
> + * So maybe we have to check for the existence of one of
> + * dtrace_probedesc_t *pdp = dtp->dt_probes[val.prid]->desc;
> + * char *prv = ...pdp->prv minus the numerial part;
> + *
> + * /run/dtrace/probes/$pid/$pdp->prv/$pdp->mod/$pdp->fun/$pdp->prb
> + * /run/dtrace/stash/dof-pid/$pid/0/parsed/$prv:$pdp->mod:$pdp->fun:$pdp->prb
> + * /run/dtrace/stash/dof-pid/$pid/.../parsed/$prv:$pdp->mod:$pdp->fun:$pdp->prb
> + */
> + }
> +
> + return 0;
> +}
> +
> +/*
> + * Grow the string table map. It includes the string table (which might
> + * grow due to new USDT probes). It also includes the read-only data and
> + * the block of zeros, but these remain fixed.
> + *
> + * FIXME Is there a danger of the BPF map update colliding with map reads?
> + */
> +static int
> +grow_strtab(dtrace_hdl_t *dtp)
> +{
> + size_t sz = dtp->dt_zerooffset + dtp->dt_zerosize;
> + char *strtab;
> + uint8_t *buf, *end;
> + size_t strsize = dtp->dt_options[DTRACEOPT_STRSIZE];
> + uint32_t key = 0;
> + int rc, err;
> +
> + strtab = dt_zalloc(dtp, sz);
> +#if 0
> + // FIXME do something
> + if (strtab == NULL)
> + return dt_set_errno(dtp, EDT_NOMEM);
> +#endif
> +
> + /* Copy the string table data. */
> + dt_strtab_write(dtp->dt_ccstab, (dt_strtab_write_f *)dt_strtab_copystr, strtab);
> +
> + /* Loop over the string table and truncate strings that are too long. */
> + buf = (uint8_t *)strtab;
> + end = buf + dtp->dt_strlen;
> + while (buf < end) {
> + uint_t len = strlen((char *)buf);
> +
> + if (len > strsize)
> + buf[strsize] = '\0';
> +
> + buf += len + 1;
> + }
> +
> + /* Copy the read-only data. */
> + dt_rodata_write(dtp->dt_rodata, (dt_rodata_copy_f *)dt_rodata_copy, strtab + dtp->dt_rooffset);
> +
> + rc = dt_bpf_map_update(dtp->dt_strtabmap_fd, &key, strtab);
> + err = errno;
> + dt_free(dtp, strtab);
> +
> + // FIXME do something
> + if (rc == -1) {
> + strerror(err);
> + return -1;
> + // return dt_bpf_error(dtp, "cannot update BPF map 'strtab': %s\n", strerror(err));
> + }
> +
> + return 0;
> +}
> +
> +/*
> + * Update the uprobe provider.
> + */
> +static int update_uprobe(dtrace_hdl_t *dtp, void *datap)
> +{
> + dt_probe_t *prp;
> + dt_probe_t *prp_next;
> + int i, prid = dtp->dt_probe_id;
> + uint_t old_strlen = dtp->dt_strlen;
> + dt_pcb_t pcb;
> +
> + /* Clear stale pids. */
> + purge_BPFmap(dtp);
> +
> + /* Update dt_probes[] and dt_enablings. */
> + /*
> + * pcb is only used inside of dt_pid_error() to get:
> + * pcb->pcb_region
> + * pcb->pcb_filetag
> + * pcb->pcb_fileptr
> + * While pcb cannot be NULL, these other things apparently can be.
> + */
> + memset(&pcb, 0, sizeof(dt_pcb_t));
> + for (i = 0; i < dtp->dt_stmt_nextid; i++) {
> + dtrace_stmtdesc_t *stp;
> +
> + stp = dtp->dt_stmts[i];
> + assert(stp != NULL);
> + dt_pid_create_usdt_probes(&stp->dtsd_ecbdesc->dted_probe, dtp, &pcb);
> + }
> +
> + while (prid < dtp->dt_probe_id) {
> + dt_bpf_probe_t pinfo;
> + const dtrace_probedesc_t *pdp = dtp->dt_probes[prid]->desc;
> + int fd = dtp->dt_probesmap_fd;
> +
> + dt_probe_enable(dtp, dtp->dt_probes[prid]);
> +
> + pinfo.prv = dt_strtab_index(dtp->dt_ccstab, pdp->prv);
> + pinfo.mod = dt_strtab_index(dtp->dt_ccstab, pdp->mod);
> + pinfo.fun = dt_strtab_index(dtp->dt_ccstab, pdp->fun);
> + pinfo.prb = dt_strtab_index(dtp->dt_ccstab, pdp->prb);
> +
> + if (dt_bpf_map_update(fd, &pdp->id, &pinfo) == -1)
> + assert(0); // FIXME do something here
> +
> + prid++;
> + }
> +
> + /* Grow the strtab if it has gotten bigger. */
> + dtp->dt_strlen = dt_strtab_size(dtp->dt_ccstab);
> +
> + if (dtp->dt_strlen > dtp->dt_strmax)
> + assert(0); // FIXME handle this more gracefully
> + if (dtp->dt_strlen > old_strlen)
> + grow_strtab(dtp);
> +
> + /* Review enablings. */
> + for (prp = dt_list_next(&dtp->dt_enablings); prp != NULL; prp = prp_next) {
> + pid_t pid;
> + list_probe_t *pup;
> +
> + prp_next = dt_list_next(prp);
> +
> + /* Make sure it is an overlying USDT probe. */
> + if (prp->prov->impl != &dt_usdt)
> + continue;
> +
> + /* FIXME passing in NULL pcb and dpr wreaks havoc on error reporting? */
> + /*
> + * Nick writes:
> + * This is a general problem with running compiler-adjacent things outside
> + * compile time. I think we should adjust dt_pid_error() so that it works
> + * with NULL pcb and dpr at once, probably by using the code path for
> + * pcb != NULL and augmenting it so that it passes in NULL for the region and
> + * filename args and 0 for the lineno if pcb is NULL. (dt_set_errmsg can
> + * already handle this case.)
> + */
> + pid = dt_pid_get_pid(prp->desc, dtp, NULL, NULL);
> +
> + if (!Pexists(pid)) {
> + /* Remove from enablings. */
> + dt_list_delete(&dtp->dt_enablings, prp);
> +
> + /* Make it evident from the probe that it is not in enablings. */
> + ((dt_list_t *)prp)->dl_prev = NULL;
> + ((dt_list_t *)prp)->dl_next = NULL;
> +
> + /* Free up its list of underlying probes. */
> + while ((pup = dt_list_next(prp->prv_data)) != NULL) {
> + dt_list_delete(prp->prv_data, pup);
> + dt_free(dtp, pup);
> + }
> + dt_free(dtp, prp->prv_data);
> + prp->prv_data = NULL;
> +
> + /* Free up BPF "probes" map entry. */
> + dt_bpf_map_delete(dtp->dt_probesmap_fd, &prp->desc->id);
> +
> + continue;
> + }
> +
> + for (pup = prp->prv_data; pup != NULL; pup = dt_list_next(pup)) {
> + dt_probe_t *uprp = pup->probe;
> + long long mask = 0, bit = 1;
> + usdt_prids_map_key_t key;
> + usdt_prids_map_val_t val, oval;
> + dt_uprobe_t *upp = uprp->prv_data;
> +
> + /*
> + * For is-enabled probes, the bit mask does not matter.
> + * It is possible that we have this underlying probe due to
> + * an overlying pid-offset probe and that we will not know
> + * until later, when some new pid is created, that we also
> + * have an overlying USDT is-enabled probe, but missing this
> + * optimization opportunity is okay.
> + */
> + if (uprp->prov->impl == &dt_uprobe && !(upp->flags & PP_IS_ENABLED)) {
> + int n;
> +
> + for (n = 0; n < dtp->dt_stmt_nextid; n++) {
> + dtrace_stmtdesc_t *stp;
> +
> + stp = dtp->dt_stmts[n];
> + assert(stp != NULL);
> +
> + if (dt_gmatch(prp->desc->prv, stp->dtsd_ecbdesc->dted_probe.prv) &&
> + dt_gmatch(prp->desc->mod, stp->dtsd_ecbdesc->dted_probe.mod) &&
> + dt_gmatch(prp->desc->fun, stp->dtsd_ecbdesc->dted_probe.fun) &&
> + dt_gmatch(prp->desc->prb, stp->dtsd_ecbdesc->dted_probe.prb))
> + mask |= bit;
> +
> + bit <<= 1;
> + }
> + }
> +
> + key.pid = pid;
> + key.uprid = uprp->desc->id;
> +
> + val.prid = prp->desc->id;
> + val.mask = mask;
> +
> + /* linux/bpf.h says error is -1, but it could be -2 */
> + if (dt_bpf_map_lookup(dtp->dt_usdt_pridsmap_fd, &key, &oval) < 0) {
> + /*
> + * Set the map entry.
> + */
> + // FIXME Check return value, but how should errors be handled?
> + dt_bpf_map_update(dtp->dt_usdt_pridsmap_fd, &key, &val);
> + } else if (val.prid != oval.prid || val.mask != oval.mask) {
> + /*
> + * This can happen when two overlying probes map to the same underlying probe for the same pid.
> + * E.g., pid:::entry and pid:::0, or pid:::$offset and usdt:::.
> + */
> + } else {
> + /*
> + * Nothing to do, it already is in the map.
> + */
> + }
> + }
> + }
> +
> + return 0;
> +}
>
> /*
> * Look up or create an underlying (real) probe, corresponding directly to a
> @@ -782,6 +1059,7 @@ dt_provimpl_t dt_uprobe = {
> .probe_info = &probe_info,
> .detach = &detach,
> .probe_destroy = &probe_destroy_underlying,
> + .update = &update_uprobe,
> };
>
> /*
> diff --git a/libdtrace/dt_work.c b/libdtrace/dt_work.c
> index 40913a0f1..0efad3633 100644
> --- a/libdtrace/dt_work.c
> +++ b/libdtrace/dt_work.c
> @@ -368,6 +368,9 @@ dtrace_work(dtrace_hdl_t *dtp, FILE *fp, dtrace_consume_probe_f *pfunc,
> dtrace_workstatus_t rval;
> int gen;
>
> + if (dt_uprobe.update(dtp, NULL) < 0)
> + return DTRACE_WORKSTATUS_ERROR;
> +
> switch (dtrace_status(dtp)) {
> case DTRACE_STATUS_EXITED:
> case DTRACE_STATUS_STOPPED:
> --
> 2.43.5
>
>
> _______________________________________________
> DTrace-devel mailing list
> DTrace-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/dtrace-devel
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [DTrace-devel] [PATCH v4 07/19] Create the BPF usdt_prids map
2024-10-03 5:04 ` [DTrace-devel] " Kris Van Hees
@ 2024-10-06 1:11 ` Eugene Loh
2024-10-07 15:52 ` Kris Van Hees
0 siblings, 1 reply; 7+ messages in thread
From: Eugene Loh @ 2024-10-06 1:11 UTC (permalink / raw)
To: dtrace, dtrace-devel
On 10/3/24 01:04, Kris Van Hees wrote:
> The main problem with the current implementation is the fact that the strtab
> BPF map is being updated while it is being used. That is not a safe operation,
> and can have unpredictable behaviour depending on the actual implementation at
> the kernel level. We cannot cuont on it being safe.
>
> I propose introducing a usdtsize option (or whatever we name it) that sets the
> number of USDT probes that we can support. It should be used for the size of
> the usdt_prids map.
>
> In order to be able to provide the data for probeprov, probemod, probefunc, and
> probename, we need to have a way to store probe name elements dynamically. That
> can be done using a usdt_probes hashmap that stores a string (128 bytes max) and
> is indexed by an int. The bvara code will need to know to consult this new
> BPF map to get the data. I can see two options:
>
> 1) increase the probes BPF map by usdtsize extra elements, and set the highest
> order bit to indicate that the value for a probe name element should be
> retrieved from the usdt_probes hashmap.
>
> 2) add probe name element key values to the usdt_prids map
We usually want to access this by prid, and usdt_prids has prid in its
value rather than in its key.
> Approach 1 uses the existing the mechanism for storing this data, but adds some
> (very minor) performance hit to all probeprov, probemod, probefunc, probename
> bvar lookups because we need to check for the highest order bit.
>
> Approach 2 means we have an alternative place where we store the probe name
> element data, but it remains localized with the other USDT probe data *and*
> the performance impact is limited to USDT probes only (if the prid is not
> found in the probes map, we need to look in the usdt_prids map).
How about this. There are some probes that are already known when we
create the BPF maps. We simply remember how many probe IDs exist at
this time. Thereafter, if we have a probe whose ID is greater, its name
elements go into a new hash map. In get_bvar(), the first thing we do
is look whether the probe ID we're looking up is old (less than the
number of initial probes) or new. That tells us whether to look in the
standard, static hash map or in the new, dynamic hash map. Sound good?
I have this version working. If you agree with the approach, I'll clean
up the patch and post it for review.
Put another way, when we create the BPF maps, we memorize the current
value of dtp->dt_probe_id. I call this dtp->dt_ninitprobes, the number
of initial probes. Thereafter, we use the traditional BPF map when prid
<= dtp->dt_ninitprobes and the new dynamic hash map otherwise.
> On Fri, Sep 27, 2024 at 10:15:25PM -0400, eugene.loh--- via DTrace-devel wrote:
>> From: Eugene Loh <eugene.loh@oracle.com>
>>
>> As USDT processes come and go, the overlying probes for an underlying
>> probe will change. Hence, we will move to a scheme in which an
>> underlying probe will walk all possible clauses it could call,
>> deciding at run time (using a bit mask for the overlying USDT probe)
>> which of the clauses to call.
>>
>> In this patch, we create and update the BPF "usdt_prids" map. This
>> is a hash map, where:
>>
>> *) the key (size: dtp->dt_usdt_pridsmap_ksz) comprises
>> - the PID of the firing process
>> - the PRID of the underlying probe
>>
>> *) the value (size: dtp->dt_usdt_pridsmap_vsz) comprises
>> - the PRID over the overlying USDT probe
>> - a bit mask indicating which clauses should be called
>>
>> As processes start up, we also add new overlying USDT probes,
>> requiring updates of the BPF "probes" and "strtab" maps and of
>> the list of enablings.
>>
>> The underlying-probes "update" function is called sporadically.
>> The focus of this patch is the function. When it is called can
>> be tuned in future patches.
>>
>> This patch guesses certain sizes very crudely. These sizes should
>> be handled more robustly in future patches:
>>
>> *) The allocated size of the string table. For example,
>> new provider names have to be added as new processes
>> start.
>>
>> *) The number of entries in the BPF "usdt_prids" map.
>> There is a little relief here in that, as processes
>> terminate, they can be removed from the map.
>>
>> *) The size of the bit mask -- that is, the greatest
>> number of clauses an underlying probe might call.
>> This is relatively easy to extend; nevertheless,
>> that work is left for a future patch.
>>
>> Signed-off-by: Eugene Loh <eugene.loh@oracle.com>
>> ---
>> libdtrace/dt_bpf.c | 43 +++++-
>> libdtrace/dt_cc.c | 2 +-
>> libdtrace/dt_dlibs.c | 1 +
>> libdtrace/dt_impl.h | 6 +
>> libdtrace/dt_prov_uprobe.c | 278 +++++++++++++++++++++++++++++++++++++
>> libdtrace/dt_work.c | 3 +
>> 6 files changed, 331 insertions(+), 2 deletions(-)
>>
>> diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
>> index ad11d178e..4acf6e494 100644
>> --- a/libdtrace/dt_bpf.c
>> +++ b/libdtrace/dt_bpf.c
>> @@ -841,7 +841,8 @@ gmap_create_strtab(dtrace_hdl_t *dtp)
>> int fd, rc, err;
>>
>> dtp->dt_strlen = dt_strtab_size(dtp->dt_ccstab);
>> - dtp->dt_rooffset = P2ROUNDUP(dtp->dt_strlen, 8);
>> + dtp->dt_strmax = dtp->dt_strlen + 1000; // FIXME pad some arbitrary amount to account for new USDT probes as new processes start
>> + dtp->dt_rooffset = P2ROUNDUP(dtp->dt_strmax, 8);
>> dtp->dt_rosize = dt_rodata_size(dtp->dt_rodata);
>> dtp->dt_zerooffset = P2ROUNDUP(dtp->dt_rooffset + dtp->dt_rosize, 8);
>> dtp->dt_zerosize = 0;
>> @@ -899,6 +900,7 @@ gmap_create_strtab(dtrace_hdl_t *dtp)
>> if (rc == -1)
>> return dt_bpf_error(dtp, "cannot update BPF map 'strtab': %s\n",
>> strerror(err));
>> + dtp->dt_strtabmap_fd = fd;
>>
>> return 0;
>> }
>> @@ -936,6 +938,44 @@ gmap_create_probes(dtrace_hdl_t *dtp)
>> dtp, "cannot update BPF map 'probes': %s\n",
>> strerror(errno));
>> }
>> + dtp->dt_probesmap_fd = fd;
>> +
>> + return 0;
>> +}
>> +
>> +/*
>> + * Create the 'usdt_prids' BPF map.
>> + *
>> + * USDT-PRID information map. This is a global hash map for use
>> + * with USDT probes. It is indexed by (pid, underlying probe ID).
>> + * The value is a probe ID for the overlying USDT probe and a bit
>> + * mask indicating which clauses to execute for this pid.
>> + *
>> + * WIP. Just make up some sizes for now.
>> + *
>> + * How big is a probe ID? Sometimes, it's a dtrace_id_t.
>> + * And uts/common/sys/dtrace_types.h gives us "typedef uint32_t dtrace_id_t".
>> + * Meanwhile, libdtrace/dt_impl.h has "uint32_t dt_probe_id".
>> + * So either uint32_t or dtrace_id_t is fine.
>> + *
>> + * How big should our bit mask be? Start with 8*sizeof(long long) bits.
>> + *
>> + * How many entries should we allow? Start with 1000.
>> + *
>> + * Note that for a given (pid, PRID) key, there can be at most one
>> + * overlying USDT probe.
>> + */
>> +static int
>> +gmap_create_usdt_prids(dtrace_hdl_t *dtp)
>> +{
>> + dtp->dt_usdt_pridsmap_fd = create_gmap(dtp, "usdt_prids", BPF_MAP_TYPE_HASH,
>> + dtp->dt_usdt_pridsmap_ksz, dtp->dt_usdt_pridsmap_vsz, 1000);
>> + if (dtp->dt_usdt_pridsmap_fd == -1)
>> + return -1;
>> +
>> + /* Populate the newly created map. FIXME: this is probably not the right place for this. */
>> + if (dt_uprobe.update(dtp, NULL) < 0)
>> + return -1;
>>
>> return 0;
>> }
>> @@ -1045,6 +1085,7 @@ dt_bpf_gmap_create(dtrace_hdl_t *dtp)
>> CREATE_MAP(scratchmem)
>> CREATE_MAP(strtab)
>> CREATE_MAP(probes)
>> + CREATE_MAP(usdt_prids)
>> CREATE_MAP(gvars)
>> CREATE_MAP(lvars)
>> CREATE_MAP(dvars)
>> diff --git a/libdtrace/dt_cc.c b/libdtrace/dt_cc.c
>> index 4202771a9..12104fc21 100644
>> --- a/libdtrace/dt_cc.c
>> +++ b/libdtrace/dt_cc.c
>> @@ -1045,7 +1045,7 @@ dt_link_construct(dtrace_hdl_t *dtp, const dt_probe_t *prp, dtrace_difo_t *dp,
>> nrp->dofr_data = 0; /* FIXME */
>> continue;
>> case DT_CONST_STBSZ:
>> - nrp->dofr_data = dtp->dt_strlen;
>> + nrp->dofr_data = dtp->dt_strmax;
>> continue;
>> case DT_CONST_STRSZ:
>> nrp->dofr_data =
>> diff --git a/libdtrace/dt_dlibs.c b/libdtrace/dt_dlibs.c
>> index ba4d4abef..924cf11e4 100644
>> --- a/libdtrace/dt_dlibs.c
>> +++ b/libdtrace/dt_dlibs.c
>> @@ -66,6 +66,7 @@ static const dt_ident_t dt_bpf_symbols[] = {
>> DT_BPF_SYMBOL(lvars, DT_IDENT_PTR),
>> DT_BPF_SYMBOL(mem, DT_IDENT_PTR),
>> DT_BPF_SYMBOL(probes, DT_IDENT_PTR),
>> + DT_BPF_SYMBOL(usdt_prids, DT_IDENT_PTR),
>> DT_BPF_SYMBOL(scratchmem, DT_IDENT_PTR),
>> DT_BPF_SYMBOL(specs, DT_IDENT_PTR),
>> DT_BPF_SYMBOL(state, DT_IDENT_PTR),
>> diff --git a/libdtrace/dt_impl.h b/libdtrace/dt_impl.h
>> index 340dc1960..1d1248766 100644
>> --- a/libdtrace/dt_impl.h
>> +++ b/libdtrace/dt_impl.h
>> @@ -282,6 +282,7 @@ struct dtrace_hdl {
>> dt_strtab_t *dt_ccstab; /* global string table (during compilation) */
>> dt_rodata_t *dt_rodata; /* global read-only data */
>> uint_t dt_strlen; /* global string table (runtime) size */
>> + uint_t dt_strmax; /* global string table (runtime) size, allocated */
>> uint_t dt_rooffset; /* read-only data offset */
>> uint_t dt_rosize; /* read-only data size */
>> uint_t dt_zerooffset; /* zero region, offset */
>> @@ -389,6 +390,11 @@ struct dtrace_hdl {
>> int dt_aggmap_fd; /* file descriptor for the 'aggs' BPF map */
>> int dt_genmap_fd; /* file descriptor for the 'agggen' BPF map */
>> int dt_cpumap_fd; /* file descriptor for the 'cpuinfo' BPF map */
>> + int dt_strtabmap_fd; /* file descriptor for the 'strtab' BPF map */
>> + int dt_probesmap_fd; /* file descriptor for the 'probes' BPF map */
>> + int dt_usdt_pridsmap_fd; /* file descriptor for the 'usdt_prids' BPF map */
>> + int dt_usdt_pridsmap_ksz; /* 'usdt_prids' BPF map key size */
>> + int dt_usdt_pridsmap_vsz; /* 'usdt_prids' BPF map value size */
>> dtrace_handle_err_f *dt_errhdlr; /* error handler, if any */
>> void *dt_errarg; /* error handler argument */
>> dtrace_handle_drop_f *dt_drophdlr; /* drop handler, if any */
>> diff --git a/libdtrace/dt_prov_uprobe.c b/libdtrace/dt_prov_uprobe.c
>> index bb172ace2..d99915fa2 100644
>> --- a/libdtrace/dt_prov_uprobe.c
>> +++ b/libdtrace/dt_prov_uprobe.c
>> @@ -37,8 +37,10 @@
>> #include "dt_list.h"
>> #include "dt_provider_tp.h"
>> #include "dt_probe.h"
>> +#include "dt_program.h"
>> #include "dt_pid.h"
>> #include "dt_string.h"
>> +#include "port.h"
>>
>> /* Provider name for the underlying probes. */
>> static const char prvname[] = "uprobe";
>> @@ -63,6 +65,15 @@ typedef struct list_probe {
>> dt_probe_t *probe;
>> } list_probe_t;
>>
>> +typedef struct usdt_prids_map_key {
>> + pid_t pid;
>> + dtrace_id_t uprid;
>> +} usdt_prids_map_key_t;
>> +typedef struct usdt_prids_map_val {
>> + dtrace_id_t prid;
>> + long long mask;
>> +} usdt_prids_map_val_t;
>> +
>> static const dtrace_pattr_t pattr = {
>> { DTRACE_STABILITY_EVOLVING, DTRACE_STABILITY_EVOLVING, DTRACE_CLASS_ISA },
>> { DTRACE_STABILITY_PRIVATE, DTRACE_STABILITY_PRIVATE, DTRACE_CLASS_UNKNOWN },
>> @@ -77,6 +88,9 @@ dt_provimpl_t dt_usdt;
>>
>> static int populate(dtrace_hdl_t *dtp)
>> {
>> + dtp->dt_usdt_pridsmap_ksz = sizeof(usdt_prids_map_key_t);
>> + dtp->dt_usdt_pridsmap_vsz = sizeof(usdt_prids_map_val_t);
>> +
>> if (dt_provider_create(dtp, dt_uprobe.name, &dt_uprobe, &pattr,
>> NULL) == NULL ||
>> dt_provider_create(dtp, dt_uprobe_is_enabled.name,
>> @@ -123,6 +137,269 @@ static void probe_destroy(dtrace_hdl_t *dtp, void *datap)
>> free_probe_list(dtp, datap);
>> }
>>
>> +/*
>> + * Clean up the BPF "usdt_prids" map.
>> + */
>> +static int
>> +purge_BPFmap(dtrace_hdl_t *dtp)
>> +{
>> + int fd = dtp->dt_usdt_pridsmap_fd;
>> + usdt_prids_map_key_t key, nxt;
>> + usdt_prids_map_val_t val;
>> +
>> + /* Initialize key to a pid/uprid that cannot be found. */
>> + key.pid = 0;
>> + key.uprid = 0;
>> +
>> + /* Loop over all entries. */
>> + while (dt_bpf_map_next_key(fd, &key, &nxt) == 0) {
>> + memcpy(&key, &nxt, sizeof(usdt_prids_map_key_t));
>> +
>> + if (dt_bpf_map_lookup(fd, &key, &val) == -1)
>> + return dt_set_errno(dtp, EDT_BPF);
>> +
>> + /* Check if the process is still running. */
>> + if (!Pexists(key.pid)) {
>> + dt_bpf_map_delete(fd, &key); // FIXME: bpf_map_next_key() iteration restarts each time we delete an elem!!!
>> + continue;
>> + }
>> +
>> + /*
>> + * FIXME. There might be another case, where the process
>> + * is still running, but some of its USDT probes are gone?
>> + * So maybe we have to check for the existence of one of
>> + * dtrace_probedesc_t *pdp = dtp->dt_probes[val.prid]->desc;
>> + * char *prv = ...pdp->prv minus the numerial part;
>> + *
>> + * /run/dtrace/probes/$pid/$pdp->prv/$pdp->mod/$pdp->fun/$pdp->prb
>> + * /run/dtrace/stash/dof-pid/$pid/0/parsed/$prv:$pdp->mod:$pdp->fun:$pdp->prb
>> + * /run/dtrace/stash/dof-pid/$pid/.../parsed/$prv:$pdp->mod:$pdp->fun:$pdp->prb
>> + */
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +/*
>> + * Grow the string table map. It includes the string table (which might
>> + * grow due to new USDT probes). It also includes the read-only data and
>> + * the block of zeros, but these remain fixed.
>> + *
>> + * FIXME Is there a danger of the BPF map update colliding with map reads?
>> + */
>> +static int
>> +grow_strtab(dtrace_hdl_t *dtp)
>> +{
>> + size_t sz = dtp->dt_zerooffset + dtp->dt_zerosize;
>> + char *strtab;
>> + uint8_t *buf, *end;
>> + size_t strsize = dtp->dt_options[DTRACEOPT_STRSIZE];
>> + uint32_t key = 0;
>> + int rc, err;
>> +
>> + strtab = dt_zalloc(dtp, sz);
>> +#if 0
>> + // FIXME do something
>> + if (strtab == NULL)
>> + return dt_set_errno(dtp, EDT_NOMEM);
>> +#endif
>> +
>> + /* Copy the string table data. */
>> + dt_strtab_write(dtp->dt_ccstab, (dt_strtab_write_f *)dt_strtab_copystr, strtab);
>> +
>> + /* Loop over the string table and truncate strings that are too long. */
>> + buf = (uint8_t *)strtab;
>> + end = buf + dtp->dt_strlen;
>> + while (buf < end) {
>> + uint_t len = strlen((char *)buf);
>> +
>> + if (len > strsize)
>> + buf[strsize] = '\0';
>> +
>> + buf += len + 1;
>> + }
>> +
>> + /* Copy the read-only data. */
>> + dt_rodata_write(dtp->dt_rodata, (dt_rodata_copy_f *)dt_rodata_copy, strtab + dtp->dt_rooffset);
>> +
>> + rc = dt_bpf_map_update(dtp->dt_strtabmap_fd, &key, strtab);
>> + err = errno;
>> + dt_free(dtp, strtab);
>> +
>> + // FIXME do something
>> + if (rc == -1) {
>> + strerror(err);
>> + return -1;
>> + // return dt_bpf_error(dtp, "cannot update BPF map 'strtab': %s\n", strerror(err));
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +/*
>> + * Update the uprobe provider.
>> + */
>> +static int update_uprobe(dtrace_hdl_t *dtp, void *datap)
>> +{
>> + dt_probe_t *prp;
>> + dt_probe_t *prp_next;
>> + int i, prid = dtp->dt_probe_id;
>> + uint_t old_strlen = dtp->dt_strlen;
>> + dt_pcb_t pcb;
>> +
>> + /* Clear stale pids. */
>> + purge_BPFmap(dtp);
>> +
>> + /* Update dt_probes[] and dt_enablings. */
>> + /*
>> + * pcb is only used inside of dt_pid_error() to get:
>> + * pcb->pcb_region
>> + * pcb->pcb_filetag
>> + * pcb->pcb_fileptr
>> + * While pcb cannot be NULL, these other things apparently can be.
>> + */
>> + memset(&pcb, 0, sizeof(dt_pcb_t));
>> + for (i = 0; i < dtp->dt_stmt_nextid; i++) {
>> + dtrace_stmtdesc_t *stp;
>> +
>> + stp = dtp->dt_stmts[i];
>> + assert(stp != NULL);
>> + dt_pid_create_usdt_probes(&stp->dtsd_ecbdesc->dted_probe, dtp, &pcb);
>> + }
>> +
>> + while (prid < dtp->dt_probe_id) {
>> + dt_bpf_probe_t pinfo;
>> + const dtrace_probedesc_t *pdp = dtp->dt_probes[prid]->desc;
>> + int fd = dtp->dt_probesmap_fd;
>> +
>> + dt_probe_enable(dtp, dtp->dt_probes[prid]);
>> +
>> + pinfo.prv = dt_strtab_index(dtp->dt_ccstab, pdp->prv);
>> + pinfo.mod = dt_strtab_index(dtp->dt_ccstab, pdp->mod);
>> + pinfo.fun = dt_strtab_index(dtp->dt_ccstab, pdp->fun);
>> + pinfo.prb = dt_strtab_index(dtp->dt_ccstab, pdp->prb);
>> +
>> + if (dt_bpf_map_update(fd, &pdp->id, &pinfo) == -1)
>> + assert(0); // FIXME do something here
>> +
>> + prid++;
>> + }
>> +
>> + /* Grow the strtab if it has gotten bigger. */
>> + dtp->dt_strlen = dt_strtab_size(dtp->dt_ccstab);
>> +
>> + if (dtp->dt_strlen > dtp->dt_strmax)
>> + assert(0); // FIXME handle this more gracefully
>> + if (dtp->dt_strlen > old_strlen)
>> + grow_strtab(dtp);
>> +
>> + /* Review enablings. */
>> + for (prp = dt_list_next(&dtp->dt_enablings); prp != NULL; prp = prp_next) {
>> + pid_t pid;
>> + list_probe_t *pup;
>> +
>> + prp_next = dt_list_next(prp);
>> +
>> + /* Make sure it is an overlying USDT probe. */
>> + if (prp->prov->impl != &dt_usdt)
>> + continue;
>> +
>> + /* FIXME passing in NULL pcb and dpr wreaks havoc on error reporting? */
>> + /*
>> + * Nick writes:
>> + * This is a general problem with running compiler-adjacent things outside
>> + * compile time. I think we should adjust dt_pid_error() so that it works
>> + * with NULL pcb and dpr at once, probably by using the code path for
>> + * pcb != NULL and augmenting it so that it passes in NULL for the region and
>> + * filename args and 0 for the lineno if pcb is NULL. (dt_set_errmsg can
>> + * already handle this case.)
>> + */
>> + pid = dt_pid_get_pid(prp->desc, dtp, NULL, NULL);
>> +
>> + if (!Pexists(pid)) {
>> + /* Remove from enablings. */
>> + dt_list_delete(&dtp->dt_enablings, prp);
>> +
>> + /* Make it evident from the probe that it is not in enablings. */
>> + ((dt_list_t *)prp)->dl_prev = NULL;
>> + ((dt_list_t *)prp)->dl_next = NULL;
>> +
>> + /* Free up its list of underlying probes. */
>> + while ((pup = dt_list_next(prp->prv_data)) != NULL) {
>> + dt_list_delete(prp->prv_data, pup);
>> + dt_free(dtp, pup);
>> + }
>> + dt_free(dtp, prp->prv_data);
>> + prp->prv_data = NULL;
>> +
>> + /* Free up BPF "probes" map entry. */
>> + dt_bpf_map_delete(dtp->dt_probesmap_fd, &prp->desc->id);
>> +
>> + continue;
>> + }
>> +
>> + for (pup = prp->prv_data; pup != NULL; pup = dt_list_next(pup)) {
>> + dt_probe_t *uprp = pup->probe;
>> + long long mask = 0, bit = 1;
>> + usdt_prids_map_key_t key;
>> + usdt_prids_map_val_t val, oval;
>> + dt_uprobe_t *upp = uprp->prv_data;
>> +
>> + /*
>> + * For is-enabled probes, the bit mask does not matter.
>> + * It is possible that we have this underlying probe due to
>> + * an overlying pid-offset probe and that we will not know
>> + * until later, when some new pid is created, that we also
>> + * have an overlying USDT is-enabled probe, but missing this
>> + * optimization opportunity is okay.
>> + */
>> + if (uprp->prov->impl == &dt_uprobe && !(upp->flags & PP_IS_ENABLED)) {
>> + int n;
>> +
>> + for (n = 0; n < dtp->dt_stmt_nextid; n++) {
>> + dtrace_stmtdesc_t *stp;
>> +
>> + stp = dtp->dt_stmts[n];
>> + assert(stp != NULL);
>> +
>> + if (dt_gmatch(prp->desc->prv, stp->dtsd_ecbdesc->dted_probe.prv) &&
>> + dt_gmatch(prp->desc->mod, stp->dtsd_ecbdesc->dted_probe.mod) &&
>> + dt_gmatch(prp->desc->fun, stp->dtsd_ecbdesc->dted_probe.fun) &&
>> + dt_gmatch(prp->desc->prb, stp->dtsd_ecbdesc->dted_probe.prb))
>> + mask |= bit;
>> +
>> + bit <<= 1;
>> + }
>> + }
>> +
>> + key.pid = pid;
>> + key.uprid = uprp->desc->id;
>> +
>> + val.prid = prp->desc->id;
>> + val.mask = mask;
>> +
>> + /* linux/bpf.h says error is -1, but it could be -2 */
>> + if (dt_bpf_map_lookup(dtp->dt_usdt_pridsmap_fd, &key, &oval) < 0) {
>> + /*
>> + * Set the map entry.
>> + */
>> + // FIXME Check return value, but how should errors be handled?
>> + dt_bpf_map_update(dtp->dt_usdt_pridsmap_fd, &key, &val);
>> + } else if (val.prid != oval.prid || val.mask != oval.mask) {
>> + /*
>> + * This can happen when two overlying probes map to the same underlying probe for the same pid.
>> + * E.g., pid:::entry and pid:::0, or pid:::$offset and usdt:::.
>> + */
>> + } else {
>> + /*
>> + * Nothing to do, it already is in the map.
>> + */
>> + }
>> + }
>> + }
>> +
>> + return 0;
>> +}
>>
>> /*
>> * Look up or create an underlying (real) probe, corresponding directly to a
>> @@ -782,6 +1059,7 @@ dt_provimpl_t dt_uprobe = {
>> .probe_info = &probe_info,
>> .detach = &detach,
>> .probe_destroy = &probe_destroy_underlying,
>> + .update = &update_uprobe,
>> };
>>
>> /*
>> diff --git a/libdtrace/dt_work.c b/libdtrace/dt_work.c
>> index 40913a0f1..0efad3633 100644
>> --- a/libdtrace/dt_work.c
>> +++ b/libdtrace/dt_work.c
>> @@ -368,6 +368,9 @@ dtrace_work(dtrace_hdl_t *dtp, FILE *fp, dtrace_consume_probe_f *pfunc,
>> dtrace_workstatus_t rval;
>> int gen;
>>
>> + if (dt_uprobe.update(dtp, NULL) < 0)
>> + return DTRACE_WORKSTATUS_ERROR;
>> +
>> switch (dtrace_status(dtp)) {
>> case DTRACE_STATUS_EXITED:
>> case DTRACE_STATUS_STOPPED:
>> --
>> 2.43.5
>>
>>
>> _______________________________________________
>> DTrace-devel mailing list
>> DTrace-devel@oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/dtrace-devel
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [DTrace-devel] [PATCH v4 07/19] Create the BPF usdt_prids map
2024-10-06 1:11 ` Eugene Loh
@ 2024-10-07 15:52 ` Kris Van Hees
2024-10-07 16:57 ` Eugene Loh
0 siblings, 1 reply; 7+ messages in thread
From: Kris Van Hees @ 2024-10-07 15:52 UTC (permalink / raw)
To: Eugene Loh; +Cc: dtrace, dtrace-devel
On Sat, Oct 05, 2024 at 09:11:51PM -0400, Eugene Loh via DTrace-devel wrote:
> On 10/3/24 01:04, Kris Van Hees wrote:
>
> > The main problem with the current implementation is the fact that the strtab
> > BPF map is being updated while it is being used. That is not a safe operation,
> > and can have unpredictable behaviour depending on the actual implementation at
> > the kernel level. We cannot cuont on it being safe.
> >
> > I propose introducing a usdtsize option (or whatever we name it) that sets the
> > number of USDT probes that we can support. It should be used for the size of
> > the usdt_prids map.
> >
> > In order to be able to provide the data for probeprov, probemod, probefunc, and
> > probename, we need to have a way to store probe name elements dynamically. That
> > can be done using a usdt_probes hashmap that stores a string (128 bytes max) and
> > is indexed by an int. The bvara code will need to know to consult this new
> > BPF map to get the data. I can see two options:
> >
> > 1) increase the probes BPF map by usdtsize extra elements, and set the highest
> > order bit to indicate that the value for a probe name element should be
> > retrieved from the usdt_probes hashmap.
> >
> > 2) add probe name element key values to the usdt_prids map
>
> We usually want to access this by prid, and usdt_prids has prid in its value
> rather than in its key.
Ah yes.
> > Approach 1 uses the existing the mechanism for storing this data, but adds some
> > (very minor) performance hit to all probeprov, probemod, probefunc, probename
> > bvar lookups because we need to check for the highest order bit.
> >
> > Approach 2 means we have an alternative place where we store the probe name
> > element data, but it remains localized with the other USDT probe data *and*
> > the performance impact is limited to USDT probes only (if the prid is not
> > found in the probes map, we need to look in the usdt_prids map).
>
> How about this. There are some probes that are already known when we create
> the BPF maps. We simply remember how many probe IDs exist at this time.
> Thereafter, if we have a probe whose ID is greater, its name elements go
> into a new hash map. In get_bvar(), the first thing we do is look whether
> the probe ID we're looking up is old (less than the number of initial
> probes) or new. That tells us whether to look in the standard, static hash
> map or in the new, dynamic hash map. Sound good? I have this version
> working. If you agree with the approach, I'll clean up the patch and post
> it for review.
>
> Put another way, when we create the BPF maps, we memorize the current value
> of dtp->dt_probe_id. I call this dtp->dt_ninitprobes, the number of initial
> probes. Thereafter, we use the traditional BPF map when prid <=
> dtp->dt_ninitprobes and the new dynamic hash map otherwise.
Yes, I think that works fine. You probably could even just name it
dtp->dt_nprobes (meaning the number of probes that is created statically),
and introduce a NPROBES symbol that the relocator will fill in with the
dtp->dt_nprobes value. I assume you assign the dtp->dt_nprobes value when
we start probing (e.g. when the probes BPF map is created)?
> > On Fri, Sep 27, 2024 at 10:15:25PM -0400, eugene.loh--- via DTrace-devel wrote:
> > > From: Eugene Loh <eugene.loh@oracle.com>
> > >
> > > As USDT processes come and go, the overlying probes for an underlying
> > > probe will change. Hence, we will move to a scheme in which an
> > > underlying probe will walk all possible clauses it could call,
> > > deciding at run time (using a bit mask for the overlying USDT probe)
> > > which of the clauses to call.
> > >
> > > In this patch, we create and update the BPF "usdt_prids" map. This
> > > is a hash map, where:
> > >
> > > *) the key (size: dtp->dt_usdt_pridsmap_ksz) comprises
> > > - the PID of the firing process
> > > - the PRID of the underlying probe
> > >
> > > *) the value (size: dtp->dt_usdt_pridsmap_vsz) comprises
> > > - the PRID over the overlying USDT probe
> > > - a bit mask indicating which clauses should be called
> > >
> > > As processes start up, we also add new overlying USDT probes,
> > > requiring updates of the BPF "probes" and "strtab" maps and of
> > > the list of enablings.
> > >
> > > The underlying-probes "update" function is called sporadically.
> > > The focus of this patch is the function. When it is called can
> > > be tuned in future patches.
> > >
> > > This patch guesses certain sizes very crudely. These sizes should
> > > be handled more robustly in future patches:
> > >
> > > *) The allocated size of the string table. For example,
> > > new provider names have to be added as new processes
> > > start.
> > >
> > > *) The number of entries in the BPF "usdt_prids" map.
> > > There is a little relief here in that, as processes
> > > terminate, they can be removed from the map.
> > >
> > > *) The size of the bit mask -- that is, the greatest
> > > number of clauses an underlying probe might call.
> > > This is relatively easy to extend; nevertheless,
> > > that work is left for a future patch.
> > >
> > > Signed-off-by: Eugene Loh <eugene.loh@oracle.com>
> > > ---
> > > libdtrace/dt_bpf.c | 43 +++++-
> > > libdtrace/dt_cc.c | 2 +-
> > > libdtrace/dt_dlibs.c | 1 +
> > > libdtrace/dt_impl.h | 6 +
> > > libdtrace/dt_prov_uprobe.c | 278 +++++++++++++++++++++++++++++++++++++
> > > libdtrace/dt_work.c | 3 +
> > > 6 files changed, 331 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
> > > index ad11d178e..4acf6e494 100644
> > > --- a/libdtrace/dt_bpf.c
> > > +++ b/libdtrace/dt_bpf.c
> > > @@ -841,7 +841,8 @@ gmap_create_strtab(dtrace_hdl_t *dtp)
> > > int fd, rc, err;
> > > dtp->dt_strlen = dt_strtab_size(dtp->dt_ccstab);
> > > - dtp->dt_rooffset = P2ROUNDUP(dtp->dt_strlen, 8);
> > > + dtp->dt_strmax = dtp->dt_strlen + 1000; // FIXME pad some arbitrary amount to account for new USDT probes as new processes start
> > > + dtp->dt_rooffset = P2ROUNDUP(dtp->dt_strmax, 8);
> > > dtp->dt_rosize = dt_rodata_size(dtp->dt_rodata);
> > > dtp->dt_zerooffset = P2ROUNDUP(dtp->dt_rooffset + dtp->dt_rosize, 8);
> > > dtp->dt_zerosize = 0;
> > > @@ -899,6 +900,7 @@ gmap_create_strtab(dtrace_hdl_t *dtp)
> > > if (rc == -1)
> > > return dt_bpf_error(dtp, "cannot update BPF map 'strtab': %s\n",
> > > strerror(err));
> > > + dtp->dt_strtabmap_fd = fd;
> > > return 0;
> > > }
> > > @@ -936,6 +938,44 @@ gmap_create_probes(dtrace_hdl_t *dtp)
> > > dtp, "cannot update BPF map 'probes': %s\n",
> > > strerror(errno));
> > > }
> > > + dtp->dt_probesmap_fd = fd;
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +/*
> > > + * Create the 'usdt_prids' BPF map.
> > > + *
> > > + * USDT-PRID information map. This is a global hash map for use
> > > + * with USDT probes. It is indexed by (pid, underlying probe ID).
> > > + * The value is a probe ID for the overlying USDT probe and a bit
> > > + * mask indicating which clauses to execute for this pid.
> > > + *
> > > + * WIP. Just make up some sizes for now.
> > > + *
> > > + * How big is a probe ID? Sometimes, it's a dtrace_id_t.
> > > + * And uts/common/sys/dtrace_types.h gives us "typedef uint32_t dtrace_id_t".
> > > + * Meanwhile, libdtrace/dt_impl.h has "uint32_t dt_probe_id".
> > > + * So either uint32_t or dtrace_id_t is fine.
> > > + *
> > > + * How big should our bit mask be? Start with 8*sizeof(long long) bits.
> > > + *
> > > + * How many entries should we allow? Start with 1000.
> > > + *
> > > + * Note that for a given (pid, PRID) key, there can be at most one
> > > + * overlying USDT probe.
> > > + */
> > > +static int
> > > +gmap_create_usdt_prids(dtrace_hdl_t *dtp)
> > > +{
> > > + dtp->dt_usdt_pridsmap_fd = create_gmap(dtp, "usdt_prids", BPF_MAP_TYPE_HASH,
> > > + dtp->dt_usdt_pridsmap_ksz, dtp->dt_usdt_pridsmap_vsz, 1000);
> > > + if (dtp->dt_usdt_pridsmap_fd == -1)
> > > + return -1;
> > > +
> > > + /* Populate the newly created map. FIXME: this is probably not the right place for this. */
> > > + if (dt_uprobe.update(dtp, NULL) < 0)
> > > + return -1;
> > > return 0;
> > > }
> > > @@ -1045,6 +1085,7 @@ dt_bpf_gmap_create(dtrace_hdl_t *dtp)
> > > CREATE_MAP(scratchmem)
> > > CREATE_MAP(strtab)
> > > CREATE_MAP(probes)
> > > + CREATE_MAP(usdt_prids)
> > > CREATE_MAP(gvars)
> > > CREATE_MAP(lvars)
> > > CREATE_MAP(dvars)
> > > diff --git a/libdtrace/dt_cc.c b/libdtrace/dt_cc.c
> > > index 4202771a9..12104fc21 100644
> > > --- a/libdtrace/dt_cc.c
> > > +++ b/libdtrace/dt_cc.c
> > > @@ -1045,7 +1045,7 @@ dt_link_construct(dtrace_hdl_t *dtp, const dt_probe_t *prp, dtrace_difo_t *dp,
> > > nrp->dofr_data = 0; /* FIXME */
> > > continue;
> > > case DT_CONST_STBSZ:
> > > - nrp->dofr_data = dtp->dt_strlen;
> > > + nrp->dofr_data = dtp->dt_strmax;
> > > continue;
> > > case DT_CONST_STRSZ:
> > > nrp->dofr_data =
> > > diff --git a/libdtrace/dt_dlibs.c b/libdtrace/dt_dlibs.c
> > > index ba4d4abef..924cf11e4 100644
> > > --- a/libdtrace/dt_dlibs.c
> > > +++ b/libdtrace/dt_dlibs.c
> > > @@ -66,6 +66,7 @@ static const dt_ident_t dt_bpf_symbols[] = {
> > > DT_BPF_SYMBOL(lvars, DT_IDENT_PTR),
> > > DT_BPF_SYMBOL(mem, DT_IDENT_PTR),
> > > DT_BPF_SYMBOL(probes, DT_IDENT_PTR),
> > > + DT_BPF_SYMBOL(usdt_prids, DT_IDENT_PTR),
> > > DT_BPF_SYMBOL(scratchmem, DT_IDENT_PTR),
> > > DT_BPF_SYMBOL(specs, DT_IDENT_PTR),
> > > DT_BPF_SYMBOL(state, DT_IDENT_PTR),
> > > diff --git a/libdtrace/dt_impl.h b/libdtrace/dt_impl.h
> > > index 340dc1960..1d1248766 100644
> > > --- a/libdtrace/dt_impl.h
> > > +++ b/libdtrace/dt_impl.h
> > > @@ -282,6 +282,7 @@ struct dtrace_hdl {
> > > dt_strtab_t *dt_ccstab; /* global string table (during compilation) */
> > > dt_rodata_t *dt_rodata; /* global read-only data */
> > > uint_t dt_strlen; /* global string table (runtime) size */
> > > + uint_t dt_strmax; /* global string table (runtime) size, allocated */
> > > uint_t dt_rooffset; /* read-only data offset */
> > > uint_t dt_rosize; /* read-only data size */
> > > uint_t dt_zerooffset; /* zero region, offset */
> > > @@ -389,6 +390,11 @@ struct dtrace_hdl {
> > > int dt_aggmap_fd; /* file descriptor for the 'aggs' BPF map */
> > > int dt_genmap_fd; /* file descriptor for the 'agggen' BPF map */
> > > int dt_cpumap_fd; /* file descriptor for the 'cpuinfo' BPF map */
> > > + int dt_strtabmap_fd; /* file descriptor for the 'strtab' BPF map */
> > > + int dt_probesmap_fd; /* file descriptor for the 'probes' BPF map */
> > > + int dt_usdt_pridsmap_fd; /* file descriptor for the 'usdt_prids' BPF map */
> > > + int dt_usdt_pridsmap_ksz; /* 'usdt_prids' BPF map key size */
> > > + int dt_usdt_pridsmap_vsz; /* 'usdt_prids' BPF map value size */
> > > dtrace_handle_err_f *dt_errhdlr; /* error handler, if any */
> > > void *dt_errarg; /* error handler argument */
> > > dtrace_handle_drop_f *dt_drophdlr; /* drop handler, if any */
> > > diff --git a/libdtrace/dt_prov_uprobe.c b/libdtrace/dt_prov_uprobe.c
> > > index bb172ace2..d99915fa2 100644
> > > --- a/libdtrace/dt_prov_uprobe.c
> > > +++ b/libdtrace/dt_prov_uprobe.c
> > > @@ -37,8 +37,10 @@
> > > #include "dt_list.h"
> > > #include "dt_provider_tp.h"
> > > #include "dt_probe.h"
> > > +#include "dt_program.h"
> > > #include "dt_pid.h"
> > > #include "dt_string.h"
> > > +#include "port.h"
> > > /* Provider name for the underlying probes. */
> > > static const char prvname[] = "uprobe";
> > > @@ -63,6 +65,15 @@ typedef struct list_probe {
> > > dt_probe_t *probe;
> > > } list_probe_t;
> > > +typedef struct usdt_prids_map_key {
> > > + pid_t pid;
> > > + dtrace_id_t uprid;
> > > +} usdt_prids_map_key_t;
> > > +typedef struct usdt_prids_map_val {
> > > + dtrace_id_t prid;
> > > + long long mask;
> > > +} usdt_prids_map_val_t;
> > > +
> > > static const dtrace_pattr_t pattr = {
> > > { DTRACE_STABILITY_EVOLVING, DTRACE_STABILITY_EVOLVING, DTRACE_CLASS_ISA },
> > > { DTRACE_STABILITY_PRIVATE, DTRACE_STABILITY_PRIVATE, DTRACE_CLASS_UNKNOWN },
> > > @@ -77,6 +88,9 @@ dt_provimpl_t dt_usdt;
> > > static int populate(dtrace_hdl_t *dtp)
> > > {
> > > + dtp->dt_usdt_pridsmap_ksz = sizeof(usdt_prids_map_key_t);
> > > + dtp->dt_usdt_pridsmap_vsz = sizeof(usdt_prids_map_val_t);
> > > +
> > > if (dt_provider_create(dtp, dt_uprobe.name, &dt_uprobe, &pattr,
> > > NULL) == NULL ||
> > > dt_provider_create(dtp, dt_uprobe_is_enabled.name,
> > > @@ -123,6 +137,269 @@ static void probe_destroy(dtrace_hdl_t *dtp, void *datap)
> > > free_probe_list(dtp, datap);
> > > }
> > > +/*
> > > + * Clean up the BPF "usdt_prids" map.
> > > + */
> > > +static int
> > > +purge_BPFmap(dtrace_hdl_t *dtp)
> > > +{
> > > + int fd = dtp->dt_usdt_pridsmap_fd;
> > > + usdt_prids_map_key_t key, nxt;
> > > + usdt_prids_map_val_t val;
> > > +
> > > + /* Initialize key to a pid/uprid that cannot be found. */
> > > + key.pid = 0;
> > > + key.uprid = 0;
> > > +
> > > + /* Loop over all entries. */
> > > + while (dt_bpf_map_next_key(fd, &key, &nxt) == 0) {
> > > + memcpy(&key, &nxt, sizeof(usdt_prids_map_key_t));
> > > +
> > > + if (dt_bpf_map_lookup(fd, &key, &val) == -1)
> > > + return dt_set_errno(dtp, EDT_BPF);
> > > +
> > > + /* Check if the process is still running. */
> > > + if (!Pexists(key.pid)) {
> > > + dt_bpf_map_delete(fd, &key); // FIXME: bpf_map_next_key() iteration restarts each time we delete an elem!!!
> > > + continue;
> > > + }
> > > +
> > > + /*
> > > + * FIXME. There might be another case, where the process
> > > + * is still running, but some of its USDT probes are gone?
> > > + * So maybe we have to check for the existence of one of
> > > + * dtrace_probedesc_t *pdp = dtp->dt_probes[val.prid]->desc;
> > > + * char *prv = ...pdp->prv minus the numerial part;
> > > + *
> > > + * /run/dtrace/probes/$pid/$pdp->prv/$pdp->mod/$pdp->fun/$pdp->prb
> > > + * /run/dtrace/stash/dof-pid/$pid/0/parsed/$prv:$pdp->mod:$pdp->fun:$pdp->prb
> > > + * /run/dtrace/stash/dof-pid/$pid/.../parsed/$prv:$pdp->mod:$pdp->fun:$pdp->prb
> > > + */
> > > + }
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +/*
> > > + * Grow the string table map. It includes the string table (which might
> > > + * grow due to new USDT probes). It also includes the read-only data and
> > > + * the block of zeros, but these remain fixed.
> > > + *
> > > + * FIXME Is there a danger of the BPF map update colliding with map reads?
> > > + */
> > > +static int
> > > +grow_strtab(dtrace_hdl_t *dtp)
> > > +{
> > > + size_t sz = dtp->dt_zerooffset + dtp->dt_zerosize;
> > > + char *strtab;
> > > + uint8_t *buf, *end;
> > > + size_t strsize = dtp->dt_options[DTRACEOPT_STRSIZE];
> > > + uint32_t key = 0;
> > > + int rc, err;
> > > +
> > > + strtab = dt_zalloc(dtp, sz);
> > > +#if 0
> > > + // FIXME do something
> > > + if (strtab == NULL)
> > > + return dt_set_errno(dtp, EDT_NOMEM);
> > > +#endif
> > > +
> > > + /* Copy the string table data. */
> > > + dt_strtab_write(dtp->dt_ccstab, (dt_strtab_write_f *)dt_strtab_copystr, strtab);
> > > +
> > > + /* Loop over the string table and truncate strings that are too long. */
> > > + buf = (uint8_t *)strtab;
> > > + end = buf + dtp->dt_strlen;
> > > + while (buf < end) {
> > > + uint_t len = strlen((char *)buf);
> > > +
> > > + if (len > strsize)
> > > + buf[strsize] = '\0';
> > > +
> > > + buf += len + 1;
> > > + }
> > > +
> > > + /* Copy the read-only data. */
> > > + dt_rodata_write(dtp->dt_rodata, (dt_rodata_copy_f *)dt_rodata_copy, strtab + dtp->dt_rooffset);
> > > +
> > > + rc = dt_bpf_map_update(dtp->dt_strtabmap_fd, &key, strtab);
> > > + err = errno;
> > > + dt_free(dtp, strtab);
> > > +
> > > + // FIXME do something
> > > + if (rc == -1) {
> > > + strerror(err);
> > > + return -1;
> > > + // return dt_bpf_error(dtp, "cannot update BPF map 'strtab': %s\n", strerror(err));
> > > + }
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +/*
> > > + * Update the uprobe provider.
> > > + */
> > > +static int update_uprobe(dtrace_hdl_t *dtp, void *datap)
> > > +{
> > > + dt_probe_t *prp;
> > > + dt_probe_t *prp_next;
> > > + int i, prid = dtp->dt_probe_id;
> > > + uint_t old_strlen = dtp->dt_strlen;
> > > + dt_pcb_t pcb;
> > > +
> > > + /* Clear stale pids. */
> > > + purge_BPFmap(dtp);
> > > +
> > > + /* Update dt_probes[] and dt_enablings. */
> > > + /*
> > > + * pcb is only used inside of dt_pid_error() to get:
> > > + * pcb->pcb_region
> > > + * pcb->pcb_filetag
> > > + * pcb->pcb_fileptr
> > > + * While pcb cannot be NULL, these other things apparently can be.
> > > + */
> > > + memset(&pcb, 0, sizeof(dt_pcb_t));
> > > + for (i = 0; i < dtp->dt_stmt_nextid; i++) {
> > > + dtrace_stmtdesc_t *stp;
> > > +
> > > + stp = dtp->dt_stmts[i];
> > > + assert(stp != NULL);
> > > + dt_pid_create_usdt_probes(&stp->dtsd_ecbdesc->dted_probe, dtp, &pcb);
> > > + }
> > > +
> > > + while (prid < dtp->dt_probe_id) {
> > > + dt_bpf_probe_t pinfo;
> > > + const dtrace_probedesc_t *pdp = dtp->dt_probes[prid]->desc;
> > > + int fd = dtp->dt_probesmap_fd;
> > > +
> > > + dt_probe_enable(dtp, dtp->dt_probes[prid]);
> > > +
> > > + pinfo.prv = dt_strtab_index(dtp->dt_ccstab, pdp->prv);
> > > + pinfo.mod = dt_strtab_index(dtp->dt_ccstab, pdp->mod);
> > > + pinfo.fun = dt_strtab_index(dtp->dt_ccstab, pdp->fun);
> > > + pinfo.prb = dt_strtab_index(dtp->dt_ccstab, pdp->prb);
> > > +
> > > + if (dt_bpf_map_update(fd, &pdp->id, &pinfo) == -1)
> > > + assert(0); // FIXME do something here
> > > +
> > > + prid++;
> > > + }
> > > +
> > > + /* Grow the strtab if it has gotten bigger. */
> > > + dtp->dt_strlen = dt_strtab_size(dtp->dt_ccstab);
> > > +
> > > + if (dtp->dt_strlen > dtp->dt_strmax)
> > > + assert(0); // FIXME handle this more gracefully
> > > + if (dtp->dt_strlen > old_strlen)
> > > + grow_strtab(dtp);
> > > +
> > > + /* Review enablings. */
> > > + for (prp = dt_list_next(&dtp->dt_enablings); prp != NULL; prp = prp_next) {
> > > + pid_t pid;
> > > + list_probe_t *pup;
> > > +
> > > + prp_next = dt_list_next(prp);
> > > +
> > > + /* Make sure it is an overlying USDT probe. */
> > > + if (prp->prov->impl != &dt_usdt)
> > > + continue;
> > > +
> > > + /* FIXME passing in NULL pcb and dpr wreaks havoc on error reporting? */
> > > + /*
> > > + * Nick writes:
> > > + * This is a general problem with running compiler-adjacent things outside
> > > + * compile time. I think we should adjust dt_pid_error() so that it works
> > > + * with NULL pcb and dpr at once, probably by using the code path for
> > > + * pcb != NULL and augmenting it so that it passes in NULL for the region and
> > > + * filename args and 0 for the lineno if pcb is NULL. (dt_set_errmsg can
> > > + * already handle this case.)
> > > + */
> > > + pid = dt_pid_get_pid(prp->desc, dtp, NULL, NULL);
> > > +
> > > + if (!Pexists(pid)) {
> > > + /* Remove from enablings. */
> > > + dt_list_delete(&dtp->dt_enablings, prp);
> > > +
> > > + /* Make it evident from the probe that it is not in enablings. */
> > > + ((dt_list_t *)prp)->dl_prev = NULL;
> > > + ((dt_list_t *)prp)->dl_next = NULL;
> > > +
> > > + /* Free up its list of underlying probes. */
> > > + while ((pup = dt_list_next(prp->prv_data)) != NULL) {
> > > + dt_list_delete(prp->prv_data, pup);
> > > + dt_free(dtp, pup);
> > > + }
> > > + dt_free(dtp, prp->prv_data);
> > > + prp->prv_data = NULL;
> > > +
> > > + /* Free up BPF "probes" map entry. */
> > > + dt_bpf_map_delete(dtp->dt_probesmap_fd, &prp->desc->id);
> > > +
> > > + continue;
> > > + }
> > > +
> > > + for (pup = prp->prv_data; pup != NULL; pup = dt_list_next(pup)) {
> > > + dt_probe_t *uprp = pup->probe;
> > > + long long mask = 0, bit = 1;
> > > + usdt_prids_map_key_t key;
> > > + usdt_prids_map_val_t val, oval;
> > > + dt_uprobe_t *upp = uprp->prv_data;
> > > +
> > > + /*
> > > + * For is-enabled probes, the bit mask does not matter.
> > > + * It is possible that we have this underlying probe due to
> > > + * an overlying pid-offset probe and that we will not know
> > > + * until later, when some new pid is created, that we also
> > > + * have an overlying USDT is-enabled probe, but missing this
> > > + * optimization opportunity is okay.
> > > + */
> > > + if (uprp->prov->impl == &dt_uprobe && !(upp->flags & PP_IS_ENABLED)) {
> > > + int n;
> > > +
> > > + for (n = 0; n < dtp->dt_stmt_nextid; n++) {
> > > + dtrace_stmtdesc_t *stp;
> > > +
> > > + stp = dtp->dt_stmts[n];
> > > + assert(stp != NULL);
> > > +
> > > + if (dt_gmatch(prp->desc->prv, stp->dtsd_ecbdesc->dted_probe.prv) &&
> > > + dt_gmatch(prp->desc->mod, stp->dtsd_ecbdesc->dted_probe.mod) &&
> > > + dt_gmatch(prp->desc->fun, stp->dtsd_ecbdesc->dted_probe.fun) &&
> > > + dt_gmatch(prp->desc->prb, stp->dtsd_ecbdesc->dted_probe.prb))
> > > + mask |= bit;
> > > +
> > > + bit <<= 1;
> > > + }
> > > + }
> > > +
> > > + key.pid = pid;
> > > + key.uprid = uprp->desc->id;
> > > +
> > > + val.prid = prp->desc->id;
> > > + val.mask = mask;
> > > +
> > > + /* linux/bpf.h says error is -1, but it could be -2 */
> > > + if (dt_bpf_map_lookup(dtp->dt_usdt_pridsmap_fd, &key, &oval) < 0) {
> > > + /*
> > > + * Set the map entry.
> > > + */
> > > + // FIXME Check return value, but how should errors be handled?
> > > + dt_bpf_map_update(dtp->dt_usdt_pridsmap_fd, &key, &val);
> > > + } else if (val.prid != oval.prid || val.mask != oval.mask) {
> > > + /*
> > > + * This can happen when two overlying probes map to the same underlying probe for the same pid.
> > > + * E.g., pid:::entry and pid:::0, or pid:::$offset and usdt:::.
> > > + */
> > > + } else {
> > > + /*
> > > + * Nothing to do, it already is in the map.
> > > + */
> > > + }
> > > + }
> > > + }
> > > +
> > > + return 0;
> > > +}
> > > /*
> > > * Look up or create an underlying (real) probe, corresponding directly to a
> > > @@ -782,6 +1059,7 @@ dt_provimpl_t dt_uprobe = {
> > > .probe_info = &probe_info,
> > > .detach = &detach,
> > > .probe_destroy = &probe_destroy_underlying,
> > > + .update = &update_uprobe,
> > > };
> > > /*
> > > diff --git a/libdtrace/dt_work.c b/libdtrace/dt_work.c
> > > index 40913a0f1..0efad3633 100644
> > > --- a/libdtrace/dt_work.c
> > > +++ b/libdtrace/dt_work.c
> > > @@ -368,6 +368,9 @@ dtrace_work(dtrace_hdl_t *dtp, FILE *fp, dtrace_consume_probe_f *pfunc,
> > > dtrace_workstatus_t rval;
> > > int gen;
> > > + if (dt_uprobe.update(dtp, NULL) < 0)
> > > + return DTRACE_WORKSTATUS_ERROR;
> > > +
> > > switch (dtrace_status(dtp)) {
> > > case DTRACE_STATUS_EXITED:
> > > case DTRACE_STATUS_STOPPED:
> > > --
> > > 2.43.5
> > >
> > >
> > > _______________________________________________
> > > DTrace-devel mailing list
> > > DTrace-devel@oss.oracle.com
> > > https://oss.oracle.com/mailman/listinfo/dtrace-devel
>
> _______________________________________________
> DTrace-devel mailing list
> DTrace-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/dtrace-devel
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [DTrace-devel] [PATCH v4 07/19] Create the BPF usdt_prids map
2024-10-07 15:52 ` Kris Van Hees
@ 2024-10-07 16:57 ` Eugene Loh
2024-10-09 4:58 ` Eugene Loh
0 siblings, 1 reply; 7+ messages in thread
From: Eugene Loh @ 2024-10-07 16:57 UTC (permalink / raw)
To: dtrace, dtrace-devel
On 10/7/24 11:52, Kris Van Hees wrote:
> Yes, I think that works fine.
Cool. I'm adding a new test but will otherwise just refactor the
changes into the patch series and repost this patch (and any other
impacted patches) for new review.
> You probably could even just name it
> dtp->dt_nprobes (meaning the number of probes that is created statically),
> and introduce a NPROBES symbol that the relocator will fill in with the
> dtp->dt_nprobes value. I assume you assign the dtp->dt_nprobes value when
> we start probing (e.g. when the probes BPF map is created)?
Right.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [DTrace-devel] [PATCH v4 07/19] Create the BPF usdt_prids map
2024-10-07 16:57 ` Eugene Loh
@ 2024-10-09 4:58 ` Eugene Loh
0 siblings, 0 replies; 7+ messages in thread
From: Eugene Loh @ 2024-10-09 4:58 UTC (permalink / raw)
To: dtrace, dtrace-devel
Okay, v5 coming up. It's slightly rebranded as "Create the BPF
usdt_names and usdt_prids maps".
On 10/7/24 12:57, Eugene Loh wrote:
> On 10/7/24 11:52, Kris Van Hees wrote:
>
>> Yes, I think that works fine.
>
> Cool. I'm adding a new test but will otherwise just refactor the
> changes into the patch series and repost this patch (and any other
> impacted patches) for new review.
>
>> You probably could even just name it
>> dtp->dt_nprobes (meaning the number of probes that is created
>> statically),
>> and introduce a NPROBES symbol that the relocator will fill in with the
>> dtp->dt_nprobes value. I assume you assign the dtp->dt_nprobes value
>> when
>> we start probing (e.g. when the probes BPF map is created)?
>
> Right.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2024-10-09 4:58 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-28 2:15 [PATCH v2 02/19] Add a hook for a provider-specific "update" function eugene.loh
2024-09-28 2:15 ` [PATCH v4 07/19] Create the BPF usdt_prids map eugene.loh
2024-10-03 5:04 ` [DTrace-devel] " Kris Van Hees
2024-10-06 1:11 ` Eugene Loh
2024-10-07 15:52 ` Kris Van Hees
2024-10-07 16:57 ` Eugene Loh
2024-10-09 4:58 ` Eugene Loh
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox