[PATCH v2] Optimize USDT discovery a little

public inbox for dtrace@lists.linux.dev
 help / color / mirror / Atom feed

* [PATCH v2] Optimize USDT discovery a little
@ 2025-06-25  6:03 eugene.loh
  2025-06-25  6:03 ` [PATCH v2 3/4] Sync up the version numbers eugene.loh
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: eugene.loh @ 2025-06-25  6:03 UTC (permalink / raw)
  To: dtrace, dtrace-devel

From: Eugene Loh <eugene.loh@oracle.com>

We want to reduce the performance cost of USDT discovery if possible.

Specifically, a number of statements -- with probe descriptions such as
"dtrace:::BEGIN" that could never specify USDT probes -- will not get
their clause flags set with DT_CLSFLAG_USDT_EXCLUDE.  So these statements
get considered unnecessarily during periodic probe discovery.

Therefore:

*)  Expand ignore_clause(dtp, n, uprp) to support the case uprp==NULL.
    This case is independent of any knowledge of a specific underlying
    probe.

*)  During probe discovery, check ignore_clause(dtp, i, NULL).  This
    sets the DT_CLSFLAG_USDT_[INCLUDE|EXCLUDE] flag and allows faster
    exclusion of statements that do not need to be reconsidered.

To take advantage of this optimization, users should specify providers.
E.g., instead of "BEGIN" (which could conceivably be a USDT probe),
specify "dtrace:::BEGIN".

Signed-off-by: Eugene Loh <eugene.loh@oracle.com>
---
 libdtrace/dt_prov_uprobe.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/libdtrace/dt_prov_uprobe.c b/libdtrace/dt_prov_uprobe.c
index cf5cfd431..5ba50b678 100644
--- a/libdtrace/dt_prov_uprobe.c
+++ b/libdtrace/dt_prov_uprobe.c
@@ -504,7 +504,8 @@ clean_usdt_probes(dtrace_hdl_t *dtp)
 
 /*
  * Judge whether clause "n" could ever be called as a USDT probe
- * for this underlying probe.
+ * for this underlying probe.  We can pass uprp==NULL to see if
+ * the clause can be excluded for every probe.
  */
 static int
 ignore_clause(dtrace_hdl_t *dtp, int n, const dt_probe_t *uprp)
@@ -512,6 +513,9 @@ ignore_clause(dtrace_hdl_t *dtp, int n, const dt_probe_t *uprp)
 	dtrace_stmtdesc_t	*stp = dtp->dt_stmts[n];
 	dtrace_probedesc_t	*pdp = &stp->dtsd_ecbdesc->dted_probe;
 
+	if (stp == NULL)
+		return 1;
+
 	/*
 	 * Some clauses could never be called for a USDT probe,
 	 * regardless of the underlying probe uprp.  Cache this
@@ -525,7 +529,7 @@ ignore_clause(dtrace_hdl_t *dtp, int n, const dt_probe_t *uprp)
 		 * neither '*' nor a digit, it cannot be a USDT probe.
 		 */
 		if (len > 1) {
-			char	lastchar = pdp->prv[len - 1];
+			char	lastchar = (pdp->prv[0] != '\0' ? pdp->prv[len - 1] : '*');
 
 			if (lastchar != '*' && !isdigit(lastchar)) {
 				dt_stmt_clsflag_set(stp, DT_CLSFLAG_USDT_EXCLUDE);
@@ -555,6 +559,8 @@ ignore_clause(dtrace_hdl_t *dtp, int n, const dt_probe_t *uprp)
 	}
 	if (dt_stmt_clsflag_test(stp, DT_CLSFLAG_USDT_EXCLUDE) == 1)
 		return 1;
+	if (uprp == NULL)
+		return 0;
 
 	/*
 	 * If we cannot ignore this statement, try to use uprp.
@@ -751,13 +757,9 @@ static int discover(dtrace_hdl_t *dtp)
 	 */
 	memset(&pcb, 0, sizeof(dt_pcb_t));
 	for (i = 0; i < dtp->dt_stmt_nextid; i++) {
-		dtrace_stmtdesc_t *stp;
-
-		stp = dtp->dt_stmts[i];
-		if (stp == NULL)
+		if (ignore_clause(dtp, i, NULL))
 			continue;
-		if (dt_stmt_clsflag_test(stp, DT_CLSFLAG_USDT_EXCLUDE) != 1)
-			dt_pid_create_usdt_probes(&stp->dtsd_ecbdesc->dted_probe, dtp, &pcb);
+		dt_pid_create_usdt_probes(&dtp->dt_stmts[i]->dtsd_ecbdesc->dted_probe, dtp, &pcb);
 	}
 
 	return 0;
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 3/4] Sync up the version numbers
  2025-06-25  6:03 [PATCH v2] Optimize USDT discovery a little eugene.loh
@ 2025-06-25  6:03 ` eugene.loh
  2025-07-22 10:51   ` [DTrace-devel] " Nick Alcock
  2025-06-25  6:03 ` [PATCH v2 4/4] test: Add test for predefined preprocessor definitions eugene.loh
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: eugene.loh @ 2025-06-25  6:03 UTC (permalink / raw)
  To: dtrace, dtrace-devel

From: Eugene Loh <eugene.loh@oracle.com>

DTrace has many version numbers -- e.g., for the release, packaging,
and API.

In reality, the variations in numbering have become nearly meaningless:

- Packaging numbers -- like the Version in the dtrace*spec RPM spec
  file and the VERSION in GNUmakefile -- have basically been tracking
  the DTrace release since 2.0 anyway.

- Stability attributes for idents are haphazard.  Generally, they
  are 1.0 (or sometimes other 1.x), and all the BPFs are 2.0.  While
  this is generally accurate, it is not exactly robust, and idents
  are sometimes introduced or modified without careful regard to
  the version number.  Further, the stability of user D scripts is
  likely to depend more on kernel variations -- e.g., the contents of
  available_filter_functions -- than on D changes.

- Version number updates are susceptible to mistakes, and so there
  have been version mismatches.

Bring the version numbers into sync for 2.0.3.  Clean up the descriptions
in dt_version.h.

Signed-off-by: Eugene Loh <eugene.loh@oracle.com>
---
 GNUmakefile                                  |  5 +-
 dtrace.spec                                  |  1 +
 libdtrace/dt_open.c                          |  4 ++
 libdtrace/dt_version.h                       | 54 ++++++++++++--------
 test/unittest/dtrace-util/tst.APIVersion.r   |  2 +-
 test/unittest/dtrace-util/tst.APIVersion.r.p |  6 +++
 test/unittest/options/tst.version.r          |  2 +-
 test/unittest/options/tst.version.sh         |  4 +-
 8 files changed, 52 insertions(+), 26 deletions(-)
 create mode 100755 test/unittest/dtrace-util/tst.APIVersion.r.p

diff --git a/GNUmakefile b/GNUmakefile
index d1e18bb1b..00269785b 100644
--- a/GNUmakefile
+++ b/GNUmakefile
@@ -3,7 +3,7 @@
 # Build files in subdirectories are included by this file.
 #
 # Oracle Linux DTrace.
-# Copyright (c) 2011, 2024, Oracle and/or its affiliates. All rights reserved.
+# Copyright (c) 2011, 2025, Oracle and/or its affiliates. All rights reserved.
 # Licensed under the Universal Permissive License v 1.0 as shown at
 # http://oss.oracle.com/licenses/upl.
 
@@ -14,7 +14,8 @@
 SHELL = /bin/bash
 
 PROJECT := dtrace
-VERSION := 2.0.1
+# When updating version, see comments in dt_version.h.
+VERSION := 2.0.3
 
 # Verify supported hardware.
 
diff --git a/dtrace.spec b/dtrace.spec
index 6bb792acb..dd44a5b67 100644
--- a/dtrace.spec
+++ b/dtrace.spec
@@ -72,6 +72,7 @@ Requires:     libdtrace-ctf >= 1.1.0
 BuildRequires: libdtrace-ctf-devel >= 1.1.0
 %endif
 Summary:      DTrace user interface.
+# When updating version, see comments in dt_version.h.
 Version:      2.0.3
 Release:      1%{?dist}
 Source:       dtrace-%{version}.tar.bz2
diff --git a/libdtrace/dt_open.c b/libdtrace/dt_open.c
index 7d47cab34..6cb797df6 100644
--- a/libdtrace/dt_open.c
+++ b/libdtrace/dt_open.c
@@ -60,6 +60,10 @@ const dt_version_t _dtrace_versions[] = {
 	DT_VERS_1_6_3,	/* D API 1.6.3 */
 	DT_VERS_1_6_4,	/* D API 1.6.4 */
 	DT_VERS_2_0,	/* D API 2.0 */
+	DT_VERS_2_0_1,	/* D API 2.0.1 */
+	DT_VERS_2_0_2,	/* D API 2.0.2 */
+	DT_VERS_2_0_3,	/* D API 2.0.3 */
+	/* When updating version, see comments in dt_version.h. */
 	0
 };
 
diff --git a/libdtrace/dt_version.h b/libdtrace/dt_version.h
index 0f47ea70d..5cfcab69a 100644
--- a/libdtrace/dt_version.h
+++ b/libdtrace/dt_version.h
@@ -37,32 +37,28 @@ extern "C" {
  *
  * These #defines are used in identifier tables to fill in the version fields
  * associated with each identifier.  The DT_VERS_* macros declare the encoded
- * integer values of all versions used so far.  DT_VERS_STRING must be an ASCII
- * string that contains the latest version within it along with any suffixes
- * (e.g. Beta).  You must update DT_VERS_STRING when adding a new version,
- * and then add the new version to the _dtrace_versions[] array declared in
- * dt_open.c.
+ * integer values of all versions used so far.
  *
- * Refer to the Solaris Dynamic Tracing Guide Versioning chapter for an
- * explanation of these DTrace features and their values.
+ * The major number should be incremented when a fundamental change has been
+ * made that would affect all consumers, and would reflect sweeping changes
+ * to DTrace or the D language.
+ *
+ * The minor number should be incremented when a change is introduced that
+ * could break scripts that had previously worked;  for example, adding a
+ * new built-in variable could break a script which was already using that
+ * identifier.
+ *
+ * The micro number should be changed when introducing functionality changes
+ * or major bug fixes that do not affect backward compatibility -- this is
+ * merely to make capabilities easily determined from the version number.
+ *
+ * Minor bugs do not require any modification to the version number.
  *
  * NOTE: Although the DTrace versioning scheme supports the labeling and
  *       introduction of incompatible changes (e.g. dropping an interface in a
  *       major release), the libdtrace code does not currently support this.
  *       All versions are assumed to strictly inherit from one another.  If
  *       we ever need to provide divergent interfaces, this will need work.
- *
- * The version number should be increased for every customer visible release
- * of Solaris. The major number should be incremented when a fundamental
- * change has been made that would affect all consumers, and would reflect
- * sweeping changes to DTrace or the D language. The minor number should be
- * incremented when a change is introduced that could break scripts that had
- * previously worked; for example, adding a new built-in variable could break
- * a script which was already using that identifier. The micro number should
- * be changed when introducing functionality changes or major bug fixes that
- * do not affect backward compatibility -- this is merely to make capabilities
- * easily determined from the version number. Minor bugs do not require any
- * modification to the version number.
  */
 #define	DT_VERS_1_0	DT_VERSION_NUMBER(1, 0, 0)
 #define	DT_VERS_1_1	DT_VERSION_NUMBER(1, 1, 0)
@@ -80,8 +76,26 @@ extern "C" {
 #define	DT_VERS_1_6_4	DT_VERSION_NUMBER(1, 6, 4)
 #define	DT_VERS_2_0	DT_VERSION_NUMBER(2, 0, 0)
 #define	DT_VERS_2_0_1	DT_VERSION_NUMBER(2, 0, 1)
+#define	DT_VERS_2_0_2	DT_VERSION_NUMBER(2, 0, 2)
+#define	DT_VERS_2_0_3	DT_VERSION_NUMBER(2, 0, 3)
+
+/*
+ * When the version number is updated, the following must be kept in sync:
+ *
+ *   libdtrace/dt_version.h    DT_VERS_STRING, an ASCII string that contains
+ *                             the latest version within it along with any
+ *                             suffixes (e.g. Beta)
+ *
+ *   libdtrace/dt_open.c       _dtrace_versions[]
+ *
+ *   dtrace.spec               Version
+ *
+ *   libdtrace/Build           libdtrace_VERSION
+ *
+ *   GNUmakefile               VERSION
+ */
 
-#define	DT_VERS_STRING	"Oracle D 2.0"
+#define	DT_VERS_STRING	"Oracle D 2.0.3"
 
 #ifdef  __cplusplus
 }
diff --git a/test/unittest/dtrace-util/tst.APIVersion.r b/test/unittest/dtrace-util/tst.APIVersion.r
index 6bc7b9d72..02bf03150 100644
--- a/test/unittest/dtrace-util/tst.APIVersion.r
+++ b/test/unittest/dtrace-util/tst.APIVersion.r
@@ -1 +1 @@
-dtrace: Oracle D 2.0
+dtrace: Oracle D 2.0.x
diff --git a/test/unittest/dtrace-util/tst.APIVersion.r.p b/test/unittest/dtrace-util/tst.APIVersion.r.p
new file mode 100755
index 000000000..32ec94df4
--- /dev/null
+++ b/test/unittest/dtrace-util/tst.APIVersion.r.p
@@ -0,0 +1,6 @@
+#!/usr/bin/gawk -f
+
+# The test allows the version string to vary in micro number as well as
+# other suffixes (like "Beta").  The .r.p and .r files still need to be
+# updated for each minor number change.
+{ sub("^dtrace: Oracle D 2\\.0\\..*$", "dtrace: Oracle D 2.0.x"); print }
diff --git a/test/unittest/options/tst.version.r b/test/unittest/options/tst.version.r
index 15010b3db..882e208ed 100644
--- a/test/unittest/options/tst.version.r
+++ b/test/unittest/options/tst.version.r
@@ -1,2 +1,2 @@
-version is 2.0
+version is 2.0.x
 
diff --git a/test/unittest/options/tst.version.sh b/test/unittest/options/tst.version.sh
index ffffcdd8b..684120af8 100755
--- a/test/unittest/options/tst.version.sh
+++ b/test/unittest/options/tst.version.sh
@@ -1,7 +1,7 @@
 #!/bin/bash
 #
 # Oracle Linux DTrace.
-# Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved.
+# Copyright (c) 2023, 2025, Oracle and/or its affiliates. All rights reserved.
 # Licensed under the Universal Permissive License v 1.0 as shown at
 # http://oss.oracle.com/licenses/upl.
 #
@@ -9,7 +9,7 @@
 dtrace=$1
 
 myversion=`$dtrace $dt_flags -V | gawk '{ print $NF }'`
-echo version is $myversion
+echo version is $myversion | sed 's:2.0.[0-9]:2.0.x:'
 
 $dtrace $dt_flags -xversion=$myversion -qn 'BEGIN { exit(0) }'
 exit $?
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 4/4] test: Add test for predefined preprocessor definitions
  2025-06-25  6:03 [PATCH v2] Optimize USDT discovery a little eugene.loh
  2025-06-25  6:03 ` [PATCH v2 3/4] Sync up the version numbers eugene.loh
@ 2025-06-25  6:03 ` eugene.loh
  2025-06-25  6:03 ` [PATCH v2 2/2] Extend the USDT bit mask to multiple words eugene.loh
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: eugene.loh @ 2025-06-25  6:03 UTC (permalink / raw)
  To: dtrace, dtrace-devel

From: Eugene Loh <eugene.loh@oracle.com>

Orabug: 28763074
Signed-off-by: Eugene Loh <eugene.loh@oracle.com>
Reviewed-by: Kris Van Hees <kris.van.hees@oracle.com>
---
 COMMANDLINE-OPTIONS                          |  10 +-
 test/unittest/preprocessor/tst.predefined.r  |   1 +
 test/unittest/preprocessor/tst.predefined.sh | 119 +++++++++++++++++++
 3 files changed, 125 insertions(+), 5 deletions(-)
 create mode 100644 test/unittest/preprocessor/tst.predefined.r
 create mode 100755 test/unittest/preprocessor/tst.predefined.sh

diff --git a/COMMANDLINE-OPTIONS b/COMMANDLINE-OPTIONS
index 40561af91..73be89b1f 100644
--- a/COMMANDLINE-OPTIONS
+++ b/COMMANDLINE-OPTIONS
@@ -321,12 +321,12 @@ definitions are always specified and valid in all modes:
     * __sparcv9 (on SPARC® systems only when 64–bit programs are compiled)
     * __i386 (on x86 systems only when 32–bit programs are compiled)
     * __amd64 (on x86 systems only when 64–bit programs are compiled)
-    * _`uname -s` (for example, __Linux)
+    * __`uname -s` (for example, __Linux)
     * __SUNW_D=1
-    * _SUNW_D_VERSION=0x_MMmmmuuu (where MM is the Major release value
-      in hexadecimal, mmm is the Minor release value in hexadecimal,
-      and uuu is the Micro release value in hexadecimal; see Chapter
-      41, Versioning for more information about DTrace versioning)
+    * _SUNW_D_VERSION=(MM << 24 | mmm << 12 | uuu), where
+      MM is the Major release value
+      mmm is the Minor release value
+      uuu is the Micro release value
 
 -Z
 Permit probe descriptions that match zero probes. If the -Z option is
diff --git a/test/unittest/preprocessor/tst.predefined.r b/test/unittest/preprocessor/tst.predefined.r
new file mode 100644
index 000000000..2e9ba477f
--- /dev/null
+++ b/test/unittest/preprocessor/tst.predefined.r
@@ -0,0 +1 @@
+success
diff --git a/test/unittest/preprocessor/tst.predefined.sh b/test/unittest/preprocessor/tst.predefined.sh
new file mode 100755
index 000000000..47e35d9c6
--- /dev/null
+++ b/test/unittest/preprocessor/tst.predefined.sh
@@ -0,0 +1,119 @@
+#!/bin/bash
+#
+# Oracle Linux DTrace.
+# Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved.
+# Licensed under the Universal Permissive License v 1.0 as shown at
+# http://oss.oracle.com/licenses/upl.
+#
+# Confirm preprocessor pre-definitions.
+
+dtrace=$1
+
+DIRNAME=$tmpdir/predefined.$$.$RANDOM
+mkdir -p $DIRNAME
+cd $DIRNAME
+
+# Arg 1 is macro that we check is defined.
+
+function check_defined() {
+	# Add to script: #ifdef is okay, else is ERROR.
+	echo '#ifdef' $1                         >> D.d
+	echo 'printf("'$1' okay\n");'            >> D.d
+	echo '#else'                             >> D.d
+	echo 'printf("ERROR!  missing '$1'\n");' >> D.d
+	echo '#endif'                            >> D.d
+
+	# Add to check file: expect "okay" message.
+	echo $1 okay                             >> chk.txt
+}
+
+# Arg 1 is macro whose value we check to be arg 2.
+
+function check_value() {
+	# Add to script: print value.
+	echo 'printf("'$1'=%x\n", '$1');'        >> D.d
+
+	# Add to check file: expected value.
+	echo $1=$2                               >> chk.txt
+}
+
+# Arg 1 is macro that we check is not defined.
+
+function check_undef() {
+	# Add to script: #ifdef is ERROR, else is okay.
+	echo '#ifdef' $1                         >> D.d
+	echo 'printf("ERROR!  found '$1'\n");'   >> D.d
+	echo '#else'                             >> D.d
+	echo 'printf("missing '$1' is okay\n");' >> D.d
+	echo '#endif'                            >> D.d
+
+	# Add to check file: expect "okay" message.
+	echo missing $1 is okay                  >> chk.txt
+}
+
+# Construct version string (major, minor, micro).
+
+read MM mmm uuu <<< `$dtrace -vV | awk '/^This is DTrace / { gsub("\\\.", " "); print $(NF-2), $(NF-1), $NF }'`
+vers=`printf "%x" $(($MM << 24 | $mmm << 12 | $uuu))`
+
+# Start setting up the D script.
+
+echo 'BEGIN {' > D.d
+
+# Check for the preprocessor definitions of COMMANDLINE-OPTIONS.
+
+check_defined __linux
+check_defined __unix
+check_defined __SVR4
+if [ `uname -m` == x86_64 ]; then
+check_defined __amd64
+else
+check_undef   __amd64
+fi
+check_defined __`uname -s`
+check_value   __SUNW_D 1
+check_value   __SUNW_D_VERSION $vers
+
+# Confirm other preprocessor definitions.
+
+check_defined __SUNW_D_64
+
+# Confirm that __GNUC__ is not present.
+
+check_undef __GNUC__
+
+# Finish setting up the D script.
+
+echo 'exit(0); }' >> D.d
+echo              >> chk.txt
+
+# Run the D script.
+
+$dtrace $dt_flags -qCs D.d -o out.txt
+if [ $? -ne 0 ]; then
+	echo ERROR: DTrace failed
+	echo "==== D.d"
+	cat        D.d
+	echo "==== out.txt"
+	cat        out.txt
+	exit 1
+fi
+
+# Check.
+
+if ! diff -q chk.txt out.txt; then
+	echo ERROR output disagrees
+	echo === expect ===
+	cat chk.txt
+	echo === actual ===
+	cat out.txt
+	echo === diff ===
+	diff chk.txt out.txt
+	exit 1
+fi
+
+# Indicate success.
+
+echo success
+
+exit 0
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 2/2] Extend the USDT bit mask to multiple words
  2025-06-25  6:03 [PATCH v2] Optimize USDT discovery a little eugene.loh
  2025-06-25  6:03 ` [PATCH v2 3/4] Sync up the version numbers eugene.loh
  2025-06-25  6:03 ` [PATCH v2 4/4] test: Add test for predefined preprocessor definitions eugene.loh
@ 2025-06-25  6:03 ` eugene.loh
  2025-07-22 12:34   ` Nick Alcock
  2025-06-25  6:03 ` [PATCH] test: Extend timeout for many-pids test eugene.loh
  2025-07-22 10:45 ` [PATCH v2] Optimize USDT discovery a little Nick Alcock
  4 siblings, 1 reply; 10+ messages in thread
From: eugene.loh @ 2025-06-25  6:03 UTC (permalink / raw)
  To: dtrace, dtrace-devel

From: Eugene Loh <eugene.loh@oracle.com>

Currently, USDT is limited to 64 probe descriptions since the
underlying probe uses a 64-bit mask to decide which probes to execute.

Change to a multi-word bit mask that can be extended to however many
probe descriptions there are.

Also, change the mask words to be 32-bit rather than 64-bit.  The reason
is that, commonly, there will be fewer than 32 probe descriptions.  In
this case, we shorten the value of the "USDT prids" BPF map from 16 bytes
        uint32_t        prid;
        long long       mask[1];
down to 8 bytes
        uint32_t        prid;
        uint32_t        mask[1];
(The second member is smaller and no longer costs extra padding.)

We also add an
        extern int usdt_prids_map_val_extra_bytes;
to denote how many extra bytes will be needed for the extended mask.
This value is computed by usdt_prids_map_val_extra_bytes_init().
Currently, this function is awkwardly called in gmap_create_usdt(),
just before the value is needed.  Such a call to a provider-specific
function is clumsy, but there are no other calls to the provider
between compilation (where the number of statements is determined)
and this map creation.

Signed-off-by: Eugene Loh <eugene.loh@oracle.com>
---
 libdtrace/dt_bpf.c                            |   6 +-
 libdtrace/dt_bpf_maps.h                       |   5 +-
 libdtrace/dt_prov_uprobe.c                    |  81 ++++++++---
 .../unittest/usdt/tst.manyprobedescriptions.r |   1 +
 .../usdt/tst.manyprobedescriptions.sh         |  64 +++++++++
 .../usdt/tst.manyprobedescriptions2.r         |   1 +
 .../usdt/tst.manyprobedescriptions2.sh        | 127 ++++++++++++++++++
 7 files changed, 261 insertions(+), 24 deletions(-)
 create mode 100644 test/unittest/usdt/tst.manyprobedescriptions.r
 create mode 100755 test/unittest/usdt/tst.manyprobedescriptions.sh
 create mode 100644 test/unittest/usdt/tst.manyprobedescriptions2.r
 create mode 100755 test/unittest/usdt/tst.manyprobedescriptions2.sh

diff --git a/libdtrace/dt_bpf.c b/libdtrace/dt_bpf.c
index ddd849d0b..029d5fcdb 100644
--- a/libdtrace/dt_bpf.c
+++ b/libdtrace/dt_bpf.c
@@ -967,6 +967,7 @@ gmap_create_probes(dtrace_hdl_t *dtp)
 	return 0;
 }
 
+void usdt_prids_map_val_extra_bytes_init(dtrace_hdl_t *dtp);
 /*
  * Create the 'usdt_names' and 'usdt_prids' BPF maps.
  *
@@ -992,8 +993,11 @@ gmap_create_usdt(dtrace_hdl_t *dtp)
 	if (dtp->dt_usdt_namesmap_fd == -1)
 		return -1;
 
+	usdt_prids_map_val_extra_bytes_init(dtp);
+
 	dtp->dt_usdt_pridsmap_fd = create_gmap(dtp, "usdt_prids", BPF_MAP_TYPE_HASH,
-	    sizeof(usdt_prids_map_key_t), sizeof(usdt_prids_map_val_t), nusdtprobes);
+	    sizeof(usdt_prids_map_key_t),
+	    sizeof(usdt_prids_map_val_t) + usdt_prids_map_val_extra_bytes, nusdtprobes);
 	if (dtp->dt_usdt_pridsmap_fd == -1)
 		return -1;
 
diff --git a/libdtrace/dt_bpf_maps.h b/libdtrace/dt_bpf_maps.h
index 884dc3983..bdb20c9f2 100644
--- a/libdtrace/dt_bpf_maps.h
+++ b/libdtrace/dt_bpf_maps.h
@@ -1,6 +1,6 @@
 /*
  * Oracle Linux DTrace.
- * Copyright (c) 2019, 2024, Oracle and/or its affiliates. All rights reserved.
+ * Copyright (c) 2019, 2025, Oracle and/or its affiliates. All rights reserved.
  * Licensed under the Universal Permissive License v 1.0 as shown at
  * http://oss.oracle.com/licenses/upl.
  */
@@ -48,8 +48,9 @@ typedef struct usdt_prids_map_key {
 } usdt_prids_map_key_t;
 typedef struct usdt_prids_map_val {
 	uint32_t	prid;		/* should be dtrace_id_t, sys/dtrace_types.h */
-	long long	mask;
+	uint32_t	mask[1];
 } usdt_prids_map_val_t;
+extern int usdt_prids_map_val_extra_bytes;
 
 #ifdef  __cplusplus
 }
diff --git a/libdtrace/dt_prov_uprobe.c b/libdtrace/dt_prov_uprobe.c
index 5ba50b678..65413cba6 100644
--- a/libdtrace/dt_prov_uprobe.c
+++ b/libdtrace/dt_prov_uprobe.c
@@ -303,6 +303,8 @@ typedef struct list_key {
 	usdt_prids_map_key_t	key;
 } list_key_t;
 
+int usdt_prids_map_val_extra_bytes;
+
 static const dtrace_pattr_t	pattr = {
 { DTRACE_STABILITY_EVOLVING, DTRACE_STABILITY_EVOLVING, DTRACE_CLASS_ISA },
 { DTRACE_STABILITY_PRIVATE, DTRACE_STABILITY_PRIVATE, DTRACE_CLASS_UNKNOWN },
@@ -403,7 +405,7 @@ clean_usdt_probes(dtrace_hdl_t *dtp)
 	int			fdprids = dtp->dt_usdt_pridsmap_fd;
 	int			fdnames = dtp->dt_usdt_namesmap_fd;
 	usdt_prids_map_key_t	key, nxt;
-	usdt_prids_map_val_t	val;
+	usdt_prids_map_val_t	*val = alloca(sizeof(usdt_prids_map_val_t) + usdt_prids_map_val_extra_bytes);
 	list_key_t		keys_to_delete, *elem, *elem_next;
 	dt_probe_t		*prp, *prp_next;
 
@@ -418,7 +420,7 @@ clean_usdt_probes(dtrace_hdl_t *dtp)
 	while (dt_bpf_map_next_key(fdprids, &key, &nxt) == 0) {
 		memcpy(&key, &nxt, sizeof(usdt_prids_map_key_t));
 
-		if (dt_bpf_map_lookup(fdprids, &key, &val) == -1)
+		if (dt_bpf_map_lookup(fdprids, &key, val) == -1)
 			return dt_set_errno(dtp, EDT_BPF);
 
 		/* Check if the process is still running. */
@@ -431,7 +433,7 @@ clean_usdt_probes(dtrace_hdl_t *dtp)
 			 * we might delete the same usdt_names entry
 			 * multiple times.  That's okay.
 			 */
-			dt_bpf_map_delete(fdnames, &val.prid);
+			dt_bpf_map_delete(fdnames, &val->prid);
 
 			/*
 			 * Delete the usdt_prids entry.
@@ -452,7 +454,7 @@ clean_usdt_probes(dtrace_hdl_t *dtp)
 		 * FIXME.  There might be another case, where the process
 		 * is still running, but some of its USDT probes are gone?
 		 * So maybe we have to check for the existence of one of
-		 *     dtrace_probedesc_t *pdp = dtp->dt_probes[val.prid]->desc;
+		 *     dtrace_probedesc_t *pdp = dtp->dt_probes[val->prid]->desc;
 		 *     char *prv = ...pdp->prv minus the numerial part;
 		 *
 		 *     /run/dtrace/probes/$pid/$pdp->prv/$pdp->mod/$pdp->fun/$pdp->prb
@@ -589,6 +591,33 @@ static void usdt_error(dt_pcb_t *pcb, const char *fmt, ...)
 	longjmp(pcb->pcb_jmpbuf, EDT_COMPILER);
 }
 
+void usdt_prids_map_val_extra_bytes_init(dtrace_hdl_t *dtp) {
+	int i, n = 0, w = sizeof(((usdt_prids_map_val_t *)0)->mask[0]);
+
+	/* Count how many statements cannot be ignored, regardless of uprp. */
+	for (i = 0; i < dtp->dt_stmt_nextid; i++) {
+		dtrace_stmtdesc_t *stp;
+
+		stp = dtp->dt_stmts[i];
+		if (stp == NULL || ignore_clause(dtp, i, NULL))
+			continue;
+
+		n++;
+	}
+
+	/* Determine how many bytes are needed for this many bits. */
+	n = (n + CHAR_BIT - 1) / CHAR_BIT;
+
+	/* Determine how many words are needed for this many bytes. */
+	n = (n + w - 1) / w;
+
+	/* Determine how many extra bytes are needed. */
+	if (n > 1)
+		usdt_prids_map_val_extra_bytes = (n - 1) * w;
+	else
+		usdt_prids_map_val_extra_bytes = 0;
+}
+
 static int add_probe_uprobe(dtrace_hdl_t *dtp, dt_probe_t *prp)
 {
 	dtrace_difo_t   *dp;
@@ -650,6 +679,7 @@ static int add_probe_usdt(dtrace_hdl_t *dtp, dt_probe_t *prp)
 	int				fd = dtp->dt_usdt_namesmap_fd;
 	pid_t				pid;
 	list_probe_t			*pup;
+	usdt_prids_map_val_t		*val;
 
 	/* Add probe name elements to usdt_names map. */
 	p = probnam;
@@ -685,11 +715,11 @@ static int add_probe_usdt(dtrace_hdl_t *dtp, dt_probe_t *prp)
 	}
 
 	/* Add prid and bit mask to usdt_prids map. */
+	val = alloca(sizeof(usdt_prids_map_val_t) + usdt_prids_map_val_extra_bytes);
 	for (pup = prp->prv_data; pup != NULL; pup = dt_list_next(pup)) {
 		dt_probe_t		*uprp = pup->probe;
-		long long		mask = 0, bit = 1;
+		uint32_t		iword = 0, mask = 0, bit = 1;
 		usdt_prids_map_key_t	key;
-		usdt_prids_map_val_t	val;
 		dt_uprobe_t		*upp = uprp->prv_data;
 
 		/*
@@ -707,11 +737,15 @@ static int add_probe_usdt(dtrace_hdl_t *dtp, dt_probe_t *prp)
 				dtrace_stmtdesc_t *stp;
 
 				stp = dtp->dt_stmts[n];
-				if (stp == NULL)
+				if (stp == NULL || ignore_clause(dtp, n, uprp))
 					continue;
 
-				if (ignore_clause(dtp, n, uprp))
-					continue;
+				if (bit == 0) {
+					val->mask[iword] = mask;
+					mask = 0;
+					iword++;
+					bit = 1;
+				}
 
 				if (dt_gmatch(prp->desc->prv, stp->dtsd_ecbdesc->dted_probe.prv) &&
 				    dt_gmatch(prp->desc->mod, stp->dtsd_ecbdesc->dted_probe.mod) &&
@@ -726,11 +760,11 @@ static int add_probe_usdt(dtrace_hdl_t *dtp, dt_probe_t *prp)
 		key.pid = pid;
 		key.uprid = uprp->desc->id;
 
-		val.prid = prp->desc->id;
-		val.mask = mask;
+		val->prid = prp->desc->id;
+		val->mask[iword] = mask;
 
 		// FIXME Check return value, but how should errors be handled?
-		dt_bpf_map_update(dtp->dt_usdt_pridsmap_fd, &key, &val);
+		dt_bpf_map_update(dtp->dt_usdt_pridsmap_fd, &key, val);
 	}
 
 	return 0;
@@ -1451,7 +1485,7 @@ static int trampoline(dt_pcb_t *pcb, uint_t exitlbl)
 	const list_probe_t	*pop;
 	uint_t			lbl_exit = pcb->pcb_exitlbl;
 	dt_ident_t		*usdt_prids = dt_dlib_get_map(dtp, "usdt_prids");
-	int			n;
+	int			n, ibit, w = CHAR_BIT * sizeof(((usdt_prids_map_val_t *)0)->mask[0]);
 
 	assert(usdt_prids != NULL);
 
@@ -1538,7 +1572,8 @@ static int trampoline(dt_pcb_t *pcb, uint_t exitlbl)
 	 */
 	assert(sizeof(usdt_prids_map_key_t) <= DT_STK_SLOT_SZ);
 	emit(dlp,  BPF_STORE(BPF_W, BPF_REG_FP, DT_TRAMP_SP_SLOT(0), BPF_REG_0));
-	emit(dlp,  BPF_STORE_IMM(BPF_W, BPF_REG_FP, DT_TRAMP_SP_SLOT(0) + (int)sizeof(pid_t), uprp->desc->id));
+	emit(dlp,  BPF_STORE_IMM(BPF_W, BPF_REG_FP,
+		   DT_TRAMP_SP_SLOT(0) + (int)sizeof(pid_t), uprp->desc->id));
 	dt_cg_xsetx(dlp, usdt_prids, DT_LBL_NONE, BPF_REG_1, usdt_prids->di_id);
 	emit(dlp,  BPF_MOV_REG(BPF_REG_2, BPF_REG_FP));
 	emit(dlp,  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, DT_TRAMP_SP_SLOT(0)));
@@ -1572,8 +1607,8 @@ static int trampoline(dt_pcb_t *pcb, uint_t exitlbl)
 	emit(dlp,  BPF_LOAD(BPF_W, BPF_REG_1, BPF_REG_0, 0));
 	emit(dlp,  BPF_STORE(BPF_W, BPF_REG_7, DMST_PRID, BPF_REG_1));
 
-	/* Read the bit mask from the table lookup in %r6. */    // FIXME someday, extend this past 64 bits
-	emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_6, BPF_REG_0, offsetof(usdt_prids_map_val_t, mask)));
+	/* Store the value key for reuse. */
+	emit(dlp,  BPF_STORE(BPF_DW, BPF_REG_FP, DT_TRAMP_SP_SLOT(0), BPF_REG_0));
 
 	/*
 	 * Apply arg mappings, if needed.
@@ -1587,21 +1622,24 @@ static int trampoline(dt_pcb_t *pcb, uint_t exitlbl)
 	/*
 	 * Hold the bit mask in %r6 between clause calls.
 	 */
-	for (n = 0; n < dtp->dt_stmt_nextid; n++) {
+	for (ibit = n = 0; n < dtp->dt_stmt_nextid; n++) {
 		dtrace_stmtdesc_t *stp;
 		dt_ident_t	*idp;
 		uint_t		lbl_next;
 
 		stp = dtp->dt_stmts[n];
-		if (stp == NULL)
-			continue;
-
-		if (ignore_clause(dtp, n, uprp))
+		if (stp == NULL || ignore_clause(dtp, n, uprp))
 			continue;
 
 		idp = stp->dtsd_clause;
 		lbl_next = dt_irlist_label(dlp);
 
+		/* Load the next word of the bit mask into %r6. */
+		if (ibit % w == 0) {
+			emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_FP, DT_TRAMP_SP_SLOT(0)));
+			emit(dlp,  BPF_LOAD(BPF_W, BPF_REG_6, BPF_REG_0, offsetof(usdt_prids_map_val_t, mask[ibit / w])));
+		}
+
 		/* If the lowest %r6 bit is 0, skip over this clause. */
 		emit(dlp,  BPF_MOV_REG(BPF_REG_1, BPF_REG_6));
 		emit(dlp,  BPF_ALU64_IMM(BPF_AND, BPF_REG_1, 1));
@@ -1629,6 +1667,7 @@ static int trampoline(dt_pcb_t *pcb, uint_t exitlbl)
 
 		/* Right-shift %r6. */
 		emit(dlp,  BPF_ALU64_IMM(BPF_RSH, BPF_REG_6, 1));
+		ibit++;
 	}
 
 out:
diff --git a/test/unittest/usdt/tst.manyprobedescriptions.r b/test/unittest/usdt/tst.manyprobedescriptions.r
new file mode 100644
index 000000000..2e9ba477f
--- /dev/null
+++ b/test/unittest/usdt/tst.manyprobedescriptions.r
@@ -0,0 +1 @@
+success
diff --git a/test/unittest/usdt/tst.manyprobedescriptions.sh b/test/unittest/usdt/tst.manyprobedescriptions.sh
new file mode 100755
index 000000000..92a61d5b7
--- /dev/null
+++ b/test/unittest/usdt/tst.manyprobedescriptions.sh
@@ -0,0 +1,64 @@
+#!/bin/bash
+#
+# Oracle Linux DTrace.
+# Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved.
+# Licensed under the Universal Permissive License v 1.0 as shown at
+# http://oss.oracle.com/licenses/upl.
+
+dtrace=$1
+TRIGGER=$PWD/test/triggers/usdt-tst-args
+
+DIRNAME="$tmpdir/usdt-many_probe_descriptions.$$.$RANDOM"
+mkdir -p $DIRNAME
+cd $DIRNAME
+
+# Construct the D scripts and output files.
+# We stick 80 probe descriptions in each of 3 scripts to test
+# USDT's ability to handle hundreds of probe descriptions.
+for d in 0 1 2; do
+for x in 00 01 02 03 04 05 06 07 08 09 \
+         10 11 12 13 14 15 16 17 18 19 \
+         20 21 22 23 24 25 26 27 28 29 \
+         30 31 32 33 34 35 36 37 38 39 \
+         40 41 42 43 44 45 46 47 48 49 \
+         50 51 52 53 54 55 56 57 58 59 \
+         60 61 62 63 64 65 66 67 68 69 \
+         70 71 72 73 74 75 76 77 78 79 \
+; do
+	echo 'test_prov$target:::place { printf("'$d$x'\n"); }' >> D$d.d
+	echo $d$x >> expect.txt
+done
+done
+echo 'test_prov$target:::place { exit(0); }' >> D$d.d
+echo >> expect.txt
+
+# Run DTrace.
+
+$dtrace $dt_flags -c $TRIGGER -q -s D0.d -s D1.d -s D2.d >& actual.txt
+if [ $? -eq 0 ]; then
+	if diff -q expect.txt actual.txt > /dev/null; then
+		echo success
+		exit 0
+	else
+		echo ERROR: did not get expected results
+		echo === expect.txt
+		cat      expect.txt
+		echo === actual.txt
+		cat      actual.txt
+		echo === diff
+		diff expect.txt actual.txt
+	fi
+else
+	echo ERROR: dtrace error
+	echo ==== output
+	cat actual.txt
+fi
+
+echo ==== script D0.d
+cat D0.d
+echo ==== script D1.d
+cat D1.d
+echo ==== script D2.d
+cat D2.d
+
+exit 1
diff --git a/test/unittest/usdt/tst.manyprobedescriptions2.r b/test/unittest/usdt/tst.manyprobedescriptions2.r
new file mode 100644
index 000000000..2e9ba477f
--- /dev/null
+++ b/test/unittest/usdt/tst.manyprobedescriptions2.r
@@ -0,0 +1 @@
+success
diff --git a/test/unittest/usdt/tst.manyprobedescriptions2.sh b/test/unittest/usdt/tst.manyprobedescriptions2.sh
new file mode 100755
index 000000000..8001cec0b
--- /dev/null
+++ b/test/unittest/usdt/tst.manyprobedescriptions2.sh
@@ -0,0 +1,127 @@
+#!/bin/bash
+#
+# Oracle Linux DTrace.
+# Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved.
+# Licensed under the Universal Permissive License v 1.0 as shown at
+# http://oss.oracle.com/licenses/upl.
+
+# This test uses many probes and probe descriptions.  Therefore, the
+# number of BPF programs to load into the kernel -- dt_bpf_load_prog()
+# calling prp->prov->impl->load_prog(), which is dt_bpf_prog_load() --
+# and the duration of each load are both increasing.
+# @@timeout: 400
+
+dtrace=$1
+
+DIRNAME="$tmpdir/usdt-many_probe_descriptions2.$$.$RANDOM"
+mkdir -p $DIRNAME
+cd $DIRNAME
+
+# Set the lists.
+# - The probes will be foo$x$y.
+# - The probe descriptions will be foo$x* and foo*$y, for each $d.
+# So if there are nx items in xlist, ny in ylist, and nd in dlist,
+# - there will be roughly nx*ny probes
+# - there will be roughly (nx+ny)*nd probe descriptions
+
+xlist="a b c d e f g h i j k l m"
+ylist="n o p q r s t u v w x y z"
+dlist="0 1 2 3 4 5 6 7 8"
+
+# Make the trigger:  Preambles.
+
+echo "provider testprov {" > prov.d
+
+echo '#include "prov.h"' > main.c
+echo 'int main(int argc, char **argv) {' >> main.c
+
+# Make the trigger:  Loop over the probes.
+
+for x in $xlist; do
+for y in $ylist; do
+    echo "probe foo$x$y();" >> prov.d
+    echo "TESTPROV_FOO$x$y();" | awk '{ print(toupper($1)) }' >> main.c
+done
+done
+
+# Make the trigger:  Epilogues.
+
+echo "};" >> prov.d
+echo "return 0; }" >> main.c
+
+# Build the trigger.
+
+$dtrace $dt_flags -h -s prov.d
+if [ $? -ne 0 ]; then
+	echo "failed to generate header file" >&2
+	cat prov.d
+	exit 1
+fi
+$CC $test_cppflags -c main.c
+if [ $? -ne 0 ]; then
+	echo "failed to compile test" >&2
+	cat main.c
+	exit 1
+fi
+$dtrace $dt_flags -G -64 -s prov.d main.o
+if [ $? -ne 0 ]; then
+	echo "failed to create DOF" >&2
+	exit 1
+fi
+$CC $test_ldflags -o main main.o prov.o
+if [ $? -ne 0 ]; then
+	echo "failed to link final executable" >&2
+	exit 1
+fi
+
+# Prepare the D script, generating the probe descriptions.
+
+rm -f D.d
+for d in $dlist; do
+	for x in $xlist; do
+		echo 'testprov$target:::foo'$x'* { printf("'$d' '$x'* %s\n", probename) }' >> D.d
+	done
+	for y in $ylist; do
+		echo 'testprov$target:::foo*'$y' { printf("'$d' *'$y' %s\n", probename) }' >> D.d
+	done
+done
+
+# Prepare the expected output.
+
+for x in $xlist; do
+for y in $ylist; do
+for d in $dlist; do
+	echo $d $x'*' foo$x$y >> expect.txt
+	echo $d '*'$y foo$x$y >> expect.txt
+done
+done
+done
+echo >> expect.txt
+
+# Run DTrace.
+
+$dtrace $dt_flags -c ./main -qs D.d >& actual.txt
+if [ $? -ne 0 ]; then
+	echo ERROR: dtrace error
+	echo "==== D script"
+	cat D.d
+	echo "==== output"
+	cat actual.txt
+	exit 1
+fi
+
+# Check results.
+
+if diff -q expect.txt actual.txt; then
+	echo success
+	exit 0
+else
+	echo ERROR: unexpected results
+	echo "==== expect"
+	cat expect.txt
+	echo "==== actual"
+	cat actual.txt
+	echo "==== diff"
+	diff expect.txt actual.txt
+	exit 1
+fi
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH] test: Extend timeout for many-pids test
  2025-06-25  6:03 [PATCH v2] Optimize USDT discovery a little eugene.loh
                   ` (2 preceding siblings ...)
  2025-06-25  6:03 ` [PATCH v2 2/2] Extend the USDT bit mask to multiple words eugene.loh
@ 2025-06-25  6:03 ` eugene.loh
  2025-07-22 12:44   ` Nick Alcock
  2025-07-22 10:45 ` [PATCH v2] Optimize USDT discovery a little Nick Alcock
  4 siblings, 1 reply; 10+ messages in thread
From: eugene.loh @ 2025-06-25  6:03 UTC (permalink / raw)
  To: dtrace, dtrace-devel

From: Eugene Loh <eugene.loh@oracle.com>

Signed-off-by: Eugene Loh <eugene.loh@oracle.com>
---
 test/unittest/pid/tst.manypids.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git a/test/unittest/pid/tst.manypids.sh b/test/unittest/pid/tst.manypids.sh
index 4f77564f9..38757d0e4 100755
--- a/test/unittest/pid/tst.manypids.sh
+++ b/test/unittest/pid/tst.manypids.sh
@@ -5,6 +5,7 @@
 # Licensed under the Universal Permissive License v 1.0 as shown at
 # http://oss.oracle.com/licenses/upl.
 #
+# @@timeout: 80
 
 if [ $# != 1 ]; then
 	echo expected one argument: '<'dtrace-path'>'
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] Optimize USDT discovery a little
  2025-06-25  6:03 [PATCH v2] Optimize USDT discovery a little eugene.loh
                   ` (3 preceding siblings ...)
  2025-06-25  6:03 ` [PATCH] test: Extend timeout for many-pids test eugene.loh
@ 2025-07-22 10:45 ` Nick Alcock
  4 siblings, 0 replies; 10+ messages in thread
From: Nick Alcock @ 2025-07-22 10:45 UTC (permalink / raw)
  To: eugene.loh; +Cc: dtrace, dtrace-devel

On 25 Jun 2025, eugene loh outgrape:

> From: Eugene Loh <eugene.loh@oracle.com>
>
> We want to reduce the performance cost of USDT discovery if possible.
>
> Specifically, a number of statements -- with probe descriptions such as
> "dtrace:::BEGIN" that could never specify USDT probes -- will not get
> their clause flags set with DT_CLSFLAG_USDT_EXCLUDE.  So these statements
> get considered unnecessarily during periodic probe discovery.
>
> Therefore:
>
> *)  Expand ignore_clause(dtp, n, uprp) to support the case uprp==NULL.
>     This case is independent of any knowledge of a specific underlying
>     probe.
>
> *)  During probe discovery, check ignore_clause(dtp, i, NULL).  This
>     sets the DT_CLSFLAG_USDT_[INCLUDE|EXCLUDE] flag and allows faster
>     exclusion of statements that do not need to be reconsidered.

I'm pretty amazed you can even detect this optimization, but the code
change feels like a good idea anyway, if just on code reuse grounds.

> To take advantage of this optimization, users should specify providers.
> E.g., instead of "BEGIN" (which could conceivably be a USDT probe),
> specify "dtrace:::BEGIN".

It's a shame this is necessary though. (In practice, nobody is ever
going to do it for BEGIN, at least -- but it's common enough with other
probes to be worth taking advantage of.)

> Signed-off-by: Eugene Loh <eugene.loh@oracle.com>

Reviewed-by: Nick Alcock <nick.alcock@oracle.com>

modulo the tiny nit below.

> -			char	lastchar = pdp->prv[len - 1];
> +			char	lastchar = (pdp->prv[0] != '\0' ? pdp->prv[len - 1] : '*');

I'd be a bit more belt-and-braces here, given that if this goes wrong
it's a buffer overrun:

> +			char	lastchar = ((pdp->prv[0] != '\0' && len > 0) ? pdp->prv[len - 1] : '*');

> @@ -555,6 +559,8 @@ ignore_clause(dtrace_hdl_t *dtp, int n, const dt_probe_t *uprp)
>  	}
>  	if (dt_stmt_clsflag_test(stp, DT_CLSFLAG_USDT_EXCLUDE) == 1)
>  		return 1;
> +	if (uprp == NULL)
> +		return 0;

This could do with a blank line above the first if in this hunk, I think?

>  
>  	/*
>  	 * If we cannot ignore this statement, try to use uprp.
> @@ -751,13 +757,9 @@ static int discover(dtrace_hdl_t *dtp)
>  	 */
>  	memset(&pcb, 0, sizeof(dt_pcb_t));
>  	for (i = 0; i < dtp->dt_stmt_nextid; i++) {
> -		dtrace_stmtdesc_t *stp;
> -
> -		stp = dtp->dt_stmts[i];
> -		if (stp == NULL)
> +		if (ignore_clause(dtp, i, NULL))
>  			continue;
> -		if (dt_stmt_clsflag_test(stp, DT_CLSFLAG_USDT_EXCLUDE) != 1)
> -			dt_pid_create_usdt_probes(&stp->dtsd_ecbdesc->dted_probe, dtp, &pcb);
> +		dt_pid_create_usdt_probes(&dtp->dt_stmts[i]->dtsd_ecbdesc->dted_probe, dtp, &pcb);

(straight moves into the callee, ok.)

-- 
NULL && (void)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [DTrace-devel] [PATCH v2 3/4] Sync up the version numbers
  2025-06-25  6:03 ` [PATCH v2 3/4] Sync up the version numbers eugene.loh
@ 2025-07-22 10:51   ` Nick Alcock
  0 siblings, 0 replies; 10+ messages in thread
From: Nick Alcock @ 2025-07-22 10:51 UTC (permalink / raw)
  To: eugene.loh; +Cc: dtrace-devel, dtrace

On 25 Jun 2025, eugene loh verbalised:

> From: Eugene Loh <eugene.loh@oracle.com>
>
> DTrace has many version numbers -- e.g., for the release, packaging,
> and API.
>
> In reality, the variations in numbering have become nearly meaningless:

Tell me about it :(

> - Packaging numbers -- like the Version in the dtrace*spec RPM spec
>   file and the VERSION in GNUmakefile -- have basically been tracking
>   the DTrace release since 2.0 anyway.
>
> - Stability attributes for idents are haphazard.  Generally, they
>   are 1.0 (or sometimes other 1.x), and all the BPFs are 2.0.  While
>   this is generally accurate, it is not exactly robust, and idents
>   are sometimes introduced or modified without careful regard to
>   the version number.  Further, the stability of user D scripts is
>   likely to depend more on kernel variations -- e.g., the contents of
>   available_filter_functions -- than on D changes.
>
> - Version number updates are susceptible to mistakes, and so there
>   have been version mismatches.
>
> Bring the version numbers into sync for 2.0.3.  Clean up the descriptions
> in dt_version.h.

So... this means all the ident version numbers change to 2? I doubt it :)
but at least we *have* 2.0.[23] stability-related version numbers now.

Reviewed-by: Nick Alcock <nick.alcock@oracle.com>

modulo the tiny nit below.

> Signed-off-by: Eugene Loh <eugene.loh@oracle.com>
> ---
>  GNUmakefile                                  |  5 +-
>  dtrace.spec                                  |  1 +
>  libdtrace/dt_open.c                          |  4 ++
>  libdtrace/dt_version.h                       | 54 ++++++++++++--------
>  test/unittest/dtrace-util/tst.APIVersion.r   |  2 +-
>  test/unittest/dtrace-util/tst.APIVersion.r.p |  6 +++
>  test/unittest/options/tst.version.r          |  2 +-
>  test/unittest/options/tst.version.sh         |  4 +-
>  8 files changed, 52 insertions(+), 26 deletions(-)
>  create mode 100755 test/unittest/dtrace-util/tst.APIVersion.r.p
>
> diff --git a/GNUmakefile b/GNUmakefile
> index d1e18bb1b..00269785b 100644
> --- a/GNUmakefile
> +++ b/GNUmakefile
> @@ -3,7 +3,7 @@
>  # Build files in subdirectories are included by this file.
>  #
>  # Oracle Linux DTrace.
> -# Copyright (c) 2011, 2024, Oracle and/or its affiliates. All rights reserved.
> +# Copyright (c) 2011, 2025, Oracle and/or its affiliates. All rights reserved.
>  # Licensed under the Universal Permissive License v 1.0 as shown at
>  # http://oss.oracle.com/licenses/upl.
>  
> @@ -14,7 +14,8 @@
>  SHELL = /bin/bash
>  
>  PROJECT := dtrace
> -VERSION := 2.0.1
> +# When updating version, see comments in dt_version.h.
> +VERSION := 2.0.3

Oh dear :( this one isn't invisible to users...

> diff --git a/libdtrace/dt_version.h b/libdtrace/dt_version.h
> index 0f47ea70d..5cfcab69a 100644
> --- a/libdtrace/dt_version.h
> +++ b/libdtrace/dt_version.h
> @@ -37,32 +37,28 @@ extern "C" {
>   *
>   * These #defines are used in identifier tables to fill in the version fields
>   * associated with each identifier.  The DT_VERS_* macros declare the encoded
> - * integer values of all versions used so far.  DT_VERS_STRING must be an ASCII
> - * string that contains the latest version within it along with any suffixes
> - * (e.g. Beta).  You must update DT_VERS_STRING when adding a new version,
> - * and then add the new version to the _dtrace_versions[] array declared in
> - * dt_open.c.
> + * integer values of all versions used so far.
>   *
> - * Refer to the Solaris Dynamic Tracing Guide Versioning chapter for an
> - * explanation of these DTrace features and their values.
> + * The major number should be incremented when a fundamental change has been
> + * made that would affect all consumers, and would reflect sweeping changes
> + * to DTrace or the D language.
> + *
> + * The minor number should be incremented when a change is introduced that
> + * could break scripts that had previously worked;  for example, adding a
> + * new built-in variable could break a script which was already using that
> + * identifier.
> + *
> + * The micro number should be changed when introducing functionality changes
> + * or major bug fixes that do not affect backward compatibility -- this is
> + * merely to make capabilities easily determined from the version number.
> + *
> + * Minor bugs do not require any modification to the version number.

Nice general rules! Now we just have to follow them :)

> + * When the version number is updated, the following must be kept in sync:
> + *
> + *   libdtrace/dt_version.h    DT_VERS_STRING, an ASCII string that contains
> + *                             the latest version within it along with any
> + *                             suffixes (e.g. Beta)
> + *
> + *   libdtrace/dt_open.c       _dtrace_versions[]
> + *
> + *   dtrace.spec               Version
> + *
> + *   libdtrace/Build           libdtrace_VERSION
> + *
> + *   GNUmakefile               VERSION

I don't think that's quite true -- in particular, libdtrace_VERSION
should only be bumped if it experiences incompatible API or ABI changes,
whether or not DTrace itself has seen e.g. major language changes (which
is why libdtrace_VERSION is not simply $(VERSION) already).

-- 
NULL && (void)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 2/2] Extend the USDT bit mask to multiple words
  2025-06-25  6:03 ` [PATCH v2 2/2] Extend the USDT bit mask to multiple words eugene.loh
@ 2025-07-22 12:34   ` Nick Alcock
  2025-07-23  0:47     ` Eugene Loh
  0 siblings, 1 reply; 10+ messages in thread
From: Nick Alcock @ 2025-07-22 12:34 UTC (permalink / raw)
  To: eugene.loh; +Cc: dtrace, dtrace-devel

On 25 Jun 2025, eugene loh outgrape:

> From: Eugene Loh <eugene.loh@oracle.com>
>
> Currently, USDT is limited to 64 probe descriptions since the
> underlying probe uses a 64-bit mask to decide which probes to execute.
>
> Change to a multi-word bit mask that can be extended to however many
> probe descriptions there are.

Oh yeah!!!!

Not quite happy enough with this one to r-b it, but it's sooo close...
:)

> Also, change the mask words to be 32-bit rather than 64-bit.  The reason
> is that, commonly, there will be fewer than 32 probe descriptions.  In
> this case, we shorten the value of the "USDT prids" BPF map from 16 bytes
>         uint32_t        prid;
>         long long       mask[1];
> down to 8 bytes
>         uint32_t        prid;
>         uint32_t        mask[1];
> (The second member is smaller and no longer costs extra padding.)

Nice!

> We also add an
>         extern int usdt_prids_map_val_extra_bytes;
> to denote how many extra bytes will be needed for the extended mask.

Ugh. Is this really the best we can do? A global variable referenced via
extern? Can't we put it in the dtp or anything? This will fail as soon
as we have more than one dtrace handle... and yes I now that's not
exactly tested and probably doesn't work, but we shouldn't break it
worse when keeping it working as well as it is now is so simple.

> @@ -48,8 +48,9 @@ typedef struct usdt_prids_map_key {
>  } usdt_prids_map_key_t;
>  typedef struct usdt_prids_map_val {
>  	uint32_t	prid;		/* should be dtrace_id_t, sys/dtrace_types.h */
> -	long long	mask;
> +	uint32_t	mask[1];
>  } usdt_prids_map_val_t;
> +extern int usdt_prids_map_val_extra_bytes;

Note that this is not how you're supposed to implement VLAs: [1] is
extremely obsolete (as in, pre-C99) and very deprecated at this point.
[] is the recommended approach, if you can live with its restrictions
(and I think we can here: all we need to do is remove the extra in
extra_bytes and *always* allocate at least one byte, which actually
simplifies the code in usdt_prids_map_val_extra_bytes(). [] does include
the size of necessary padding: mask will start at precisely
sizeof(usdt_prids_map_val_t).)

> diff --git a/libdtrace/dt_prov_uprobe.c b/libdtrace/dt_prov_uprobe.c
> index 5ba50b678..65413cba6 100644
> --- a/libdtrace/dt_prov_uprobe.c
> +++ b/libdtrace/dt_prov_uprobe.c
> @@ -303,6 +303,8 @@ typedef struct list_key {
>  	usdt_prids_map_key_t	key;
>  } list_key_t;
>  
> +int usdt_prids_map_val_extra_bytes;

Ugh!

>  static const dtrace_pattr_t	pattr = {
>  { DTRACE_STABILITY_EVOLVING, DTRACE_STABILITY_EVOLVING, DTRACE_CLASS_ISA },
>  { DTRACE_STABILITY_PRIVATE, DTRACE_STABILITY_PRIVATE, DTRACE_CLASS_UNKNOWN },
> @@ -403,7 +405,7 @@ clean_usdt_probes(dtrace_hdl_t *dtp)
>  	int			fdprids = dtp->dt_usdt_pridsmap_fd;
>  	int			fdnames = dtp->dt_usdt_namesmap_fd;
>  	usdt_prids_map_key_t	key, nxt;
> -	usdt_prids_map_val_t	val;
> +	usdt_prids_map_val_t	*val = alloca(sizeof(usdt_prids_map_val_t) + usdt_prids_map_val_extra_bytes);

(this bit will need changing slightly if you use [].)

>  
> +void usdt_prids_map_val_extra_bytes_init(dtrace_hdl_t *dtp) {

Wrongly-placed {.

> +	int i, n = 0, w = sizeof(((usdt_prids_map_val_t *)0)->mask[0]);
> +
> +	/* Count how many statements cannot be ignored, regardless of uprp. */
> +	for (i = 0; i < dtp->dt_stmt_nextid; i++) {
> +		dtrace_stmtdesc_t *stp;
> +
> +		stp = dtp->dt_stmts[i];
> +		if (stp == NULL || ignore_clause(dtp, i, NULL))
> +			continue;

I was about to complain about the cost of all these ignore_clause()s,
but of course this is only invoked once per run...

> +		n++;
> +	}
> +
> +	/* Determine how many bytes are needed for this many bits. */
> +	n = (n + CHAR_BIT - 1) / CHAR_BIT;
> +
> +	/* Determine how many words are needed for this many bytes. */
> +	n = (n + w - 1) / w;

Thank God you commented this -- it took me ages to realise how it works.
It feels like it should be a standard idiom, but I've never encountered
it before...

> +	/* Determine how many extra bytes are needed. */
> +	if (n > 1)
> +		usdt_prids_map_val_extra_bytes = (n - 1) * w;
> +	else
> +		usdt_prids_map_val_extra_bytes = 0;

If you use a [] array, this can simply become

usdt_prids_map_val_bytes = n * w;

> +}
> +
>  static int add_probe_uprobe(dtrace_hdl_t *dtp, dt_probe_t *prp)
>  {
>  	dtrace_difo_t   *dp;
> @@ -650,6 +679,7 @@ static int add_probe_usdt(dtrace_hdl_t *dtp, dt_probe_t *prp)
>  	int				fd = dtp->dt_usdt_namesmap_fd;
>  	pid_t				pid;
>  	list_probe_t			*pup;
> +	usdt_prids_map_val_t		*val;
>  
>  	/* Add probe name elements to usdt_names map. */
>  	p = probnam;
> @@ -685,11 +715,11 @@ static int add_probe_usdt(dtrace_hdl_t *dtp, dt_probe_t *prp)
>  	}
>  
>  	/* Add prid and bit mask to usdt_prids map. */
> +	val = alloca(sizeof(usdt_prids_map_val_t) + usdt_prids_map_val_extra_bytes);
>  	for (pup = prp->prv_data; pup != NULL; pup = dt_list_next(pup)) {
>  		dt_probe_t		*uprp = pup->probe;
> -		long long		mask = 0, bit = 1;
> +		uint32_t		iword = 0, mask = 0, bit = 1;
>  		usdt_prids_map_key_t	key;
> -		usdt_prids_map_val_t	val;
>  		dt_uprobe_t		*upp = uprp->prv_data;
>  
>  		/*
> @@ -707,11 +737,15 @@ static int add_probe_usdt(dtrace_hdl_t *dtp, dt_probe_t *prp)
>  				dtrace_stmtdesc_t *stp;
>  
>  				stp = dtp->dt_stmts[n];
> -				if (stp == NULL)
> +				if (stp == NULL || ignore_clause(dtp, n, uprp))
>  					continue;
>  
> -				if (ignore_clause(dtp, n, uprp))
> -					continue;
> +				if (bit == 0) {
> +					val->mask[iword] = mask;
> +					mask = 0;
> +					iword++;
> +					bit = 1;
> +				}
>  
>  				if (dt_gmatch(prp->desc->prv, stp->dtsd_ecbdesc->dted_probe.prv) &&
>  				    dt_gmatch(prp->desc->mod, stp->dtsd_ecbdesc->dted_probe.mod) &&
> @@ -726,11 +760,11 @@ static int add_probe_usdt(dtrace_hdl_t *dtp, dt_probe_t *prp)
>  		key.pid = pid;
>  		key.uprid = uprp->desc->id;
>  
> -		val.prid = prp->desc->id;
> -		val.mask = mask;
> +		val->prid = prp->desc->id;
> +		val->mask[iword] = mask;
>
>  		// FIXME Check return value, but how should errors be handled?
> -		dt_bpf_map_update(dtp->dt_usdt_pridsmap_fd, &key, &val);
> +		dt_bpf_map_update(dtp->dt_usdt_pridsmap_fd, &key, val);
>  	}

... am I missing something, or do you never modify "mask" (the local
variable) anywhere in this function? So it ends up always zero...

> @@ -1451,7 +1485,7 @@ static int trampoline(dt_pcb_t *pcb, uint_t exitlbl)
>  	const list_probe_t	*pop;
>  	uint_t			lbl_exit = pcb->pcb_exitlbl;
>  	dt_ident_t		*usdt_prids = dt_dlib_get_map(dtp, "usdt_prids");
> -	int			n;
> +	int			n, ibit, w = CHAR_BIT * sizeof(((usdt_prids_map_val_t *)0)->mask[0]);

We could really do with better variable names than "n" and "w". w seems
to be the bit-width of a single word: word_width? width?

> @@ -1538,7 +1572,8 @@ static int trampoline(dt_pcb_t *pcb, uint_t exitlbl)
>  	 */
>  	assert(sizeof(usdt_prids_map_key_t) <= DT_STK_SLOT_SZ);
>  	emit(dlp,  BPF_STORE(BPF_W, BPF_REG_FP, DT_TRAMP_SP_SLOT(0), BPF_REG_0));
> -	emit(dlp,  BPF_STORE_IMM(BPF_W, BPF_REG_FP, DT_TRAMP_SP_SLOT(0) + (int)sizeof(pid_t), uprp->desc->id));
> +	emit(dlp,  BPF_STORE_IMM(BPF_W, BPF_REG_FP,
> +		   DT_TRAMP_SP_SLOT(0) + (int)sizeof(pid_t), uprp->desc->id));
>  	dt_cg_xsetx(dlp, usdt_prids, DT_LBL_NONE, BPF_REG_1, usdt_prids->di_id);
>  	emit(dlp,  BPF_MOV_REG(BPF_REG_2, BPF_REG_FP));
>  	emit(dlp,  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, DT_TRAMP_SP_SLOT(0)));
> @@ -1572,8 +1607,8 @@ static int trampoline(dt_pcb_t *pcb, uint_t exitlbl)
>  	emit(dlp,  BPF_LOAD(BPF_W, BPF_REG_1, BPF_REG_0, 0));
>  	emit(dlp,  BPF_STORE(BPF_W, BPF_REG_7, DMST_PRID, BPF_REG_1));
>  
> -	/* Read the bit mask from the table lookup in %r6. */    // FIXME someday, extend this past 64 bits
> -	emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_6, BPF_REG_0, offsetof(usdt_prids_map_val_t, mask)));
> +	/* Store the value key for reuse. */
> +	emit(dlp,  BPF_STORE(BPF_DW, BPF_REG_FP, DT_TRAMP_SP_SLOT(0), BPF_REG_0));

This all makes sense...

> @@ -1587,21 +1622,24 @@ static int trampoline(dt_pcb_t *pcb, uint_t exitlbl)
>  	/*
>  	 * Hold the bit mask in %r6 between clause calls.
>  	 */
> -	for (n = 0; n < dtp->dt_stmt_nextid; n++) {
> +	for (ibit = n = 0; n < dtp->dt_stmt_nextid; n++) {
>  		dtrace_stmtdesc_t *stp;
>  		dt_ident_t	*idp;
>  		uint_t		lbl_next;
>  
>  		stp = dtp->dt_stmts[n];
> -		if (stp == NULL)
> -			continue;
> -
> -		if (ignore_clause(dtp, n, uprp))
> +		if (stp == NULL || ignore_clause(dtp, n, uprp))
>  			continue;
>  
>  		idp = stp->dtsd_clause;
>  		lbl_next = dt_irlist_label(dlp);

... as does this (although we are now calling ignore_clause() at least
twice for every clause, I think three times)...

> +		/* Load the next word of the bit mask into %r6. */
> +		if (ibit % w == 0) {
> +			emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_FP, DT_TRAMP_SP_SLOT(0)));
> +			emit(dlp,  BPF_LOAD(BPF_W, BPF_REG_6, BPF_REG_0, offsetof(usdt_prids_map_val_t, mask[ibit / w])));

ok, so this uses the 'w' thing to get an array offset out of a bit
count, and then you can just use the existing repated-shift code. Should
work.

> +		}
> +
>  		/* If the lowest %r6 bit is 0, skip over this clause. */
>  		emit(dlp,  BPF_MOV_REG(BPF_REG_1, BPF_REG_6));
>  		emit(dlp,  BPF_ALU64_IMM(BPF_AND, BPF_REG_1, 1));
> @@ -1629,6 +1667,7 @@ static int trampoline(dt_pcb_t *pcb, uint_t exitlbl)
>  
>  		/* Right-shift %r6. */
>  		emit(dlp,  BPF_ALU64_IMM(BPF_RSH, BPF_REG_6, 1));
> +		ibit++;

... so, this isn't in the loop conditionl because the values are only
filled for non-ignored, non-NULL statements. This assumption is
duplicated in two places (here and in usdt_prids_map_val_extra_bytes())
and if it ever drifts we get uninitialized rubbish or buffer overruns.
Should it be centralized somewhere, somehow? At least it needs a
comment.

> +# We stick 80 probe descriptions in each of 3 scripts to test
> +# USDT's ability to handle hundreds of probe descriptions.

> +++ b/test/unittest/usdt/tst.manyprobedescriptions2.sh

... maybe manyprobedescriptions-bytesize.sh :)

> @@ -0,0 +1,127 @@
> +#!/bin/bash
> +#
> +# Oracle Linux DTrace.
> +# Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved.
> +# Licensed under the Universal Permissive License v 1.0 as shown at
> +# http://oss.oracle.com/licenses/upl.
> +
> +# This test uses many probes and probe descriptions.  Therefore, the
> +# number of BPF programs to load into the kernel -- dt_bpf_load_prog()
> +# calling prp->prov->impl->load_prog(), which is dt_bpf_prog_load() --
> +# and the duration of each load are both increasing.
> +# @@timeout: 400
> +
> +dtrace=$1
> +
> +DIRNAME="$tmpdir/usdt-many_probe_descriptions2.$$.$RANDOM"
> +mkdir -p $DIRNAME
> +cd $DIRNAME
> +
> +# Set the lists.
> +# - The probes will be foo$x$y.
> +# - The probe descriptions will be foo$x* and foo*$y, for each $d.
> +# So if there are nx items in xlist, ny in ylist, and nd in dlist,
> +# - there will be roughly nx*ny probes
> +# - there will be roughly (nx+ny)*nd probe descriptions
> +
> +xlist="a b c d e f g h i j k l m"
> +ylist="n o p q r s t u v w x y z"
> +dlist="0 1 2 3 4 5 6 7 8"

... and with 8... maybe we should also test 0, 1, 7 and 9, checking all
the boundary conditions (and refactor things so we don't need to copy
all of this every time).

(You'll only really see things go wrong under valgrind, but that's fine,
valgrinding should be routine before releases anyway.)

-- 
NULL && (void)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] test: Extend timeout for many-pids test
  2025-06-25  6:03 ` [PATCH] test: Extend timeout for many-pids test eugene.loh
@ 2025-07-22 12:44   ` Nick Alcock
  0 siblings, 0 replies; 10+ messages in thread
From: Nick Alcock @ 2025-07-22 12:44 UTC (permalink / raw)
  To: eugene.loh; +Cc: dtrace, dtrace-devel

On 25 Jun 2025, eugene loh said:

> From: Eugene Loh <eugene.loh@oracle.com>
>
> Signed-off-by: Eugene Loh <eugene.loh@oracle.com>

Reviewed-by: Nick Alcock <nick.alcock@oracle.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 2/2] Extend the USDT bit mask to multiple words
  2025-07-22 12:34   ` Nick Alcock
@ 2025-07-23  0:47     ` Eugene Loh
  0 siblings, 0 replies; 10+ messages in thread
From: Eugene Loh @ 2025-07-23  0:47 UTC (permalink / raw)
  To: Nick Alcock; +Cc: dtrace, dtrace-devel

On 7/22/25 08:34, Nick Alcock wrote:

> On 25 Jun 2025, eugene loh outgrape:
>> From: Eugene Loh <eugene.loh@oracle.com>
>>
>> Currently, USDT is limited to 64 probe descriptions since the
>> underlying probe uses a 64-bit mask to decide which probes to execute.
>>
>> Change to a multi-word bit mask that can be extended to however many
>> probe descriptions there are.
> Oh yeah!!!!
>
> Not quite happy enough with this one to r-b it, but it's sooo close...
> :)
>
>> Also, change the mask words to be 32-bit rather than 64-bit.  The reason
>> is that, commonly, there will be fewer than 32 probe descriptions.  In
>> this case, we shorten the value of the "USDT prids" BPF map from 16 bytes
>>          uint32_t        prid;
>>          long long       mask[1];
>> down to 8 bytes
>>          uint32_t        prid;
>>          uint32_t        mask[1];
>> (The second member is smaller and no longer costs extra padding.)
> Nice!
>
>> We also add an
>>          extern int usdt_prids_map_val_extra_bytes;
>> to denote how many extra bytes will be needed for the extended mask.
> Ugh. Is this really the best we can do? A global variable referenced via
> extern? Can't we put it in the dtp or anything? This will fail as soon
> as we have more than one dtrace handle... and yes I now that's not
> exactly tested and probably doesn't work, but we shouldn't break it
> worse when keeping it working as well as it is now is so simple.

Okay, changed.

>> @@ -48,8 +48,9 @@ typedef struct usdt_prids_map_key {
>>   } usdt_prids_map_key_t;
>>   typedef struct usdt_prids_map_val {
>>   	uint32_t	prid;		/* should be dtrace_id_t, sys/dtrace_types.h */
>> -	long long	mask;
>> +	uint32_t	mask[1];
>>   } usdt_prids_map_val_t;
>> +extern int usdt_prids_map_val_extra_bytes;
> Note that this is not how you're supposed to implement VLAs: [1] is
> extremely obsolete (as in, pre-C99) and very deprecated at this point.
> [] is the recommended approach, if you can live with its restrictions
> (and I think we can here: all we need to do is remove the extra in
> extra_bytes and *always* allocate at least one byte, which actually
> simplifies the code in usdt_prids_map_val_extra_bytes(). [] does include
> the size of necessary padding: mask will start at precisely
> sizeof(usdt_prids_map_val_t).)

Also changed.

>> diff --git a/libdtrace/dt_prov_uprobe.c b/libdtrace/dt_prov_uprobe.c
>> index 5ba50b678..65413cba6 100644
>> --- a/libdtrace/dt_prov_uprobe.c
>> +++ b/libdtrace/dt_prov_uprobe.c
>> @@ -303,6 +303,8 @@ typedef struct list_key {
>>   	usdt_prids_map_key_t	key;
>>   } list_key_t;
>>   
>> +int usdt_prids_map_val_extra_bytes;
> Ugh!

Ditto.

>>   static const dtrace_pattr_t	pattr = {
>>   { DTRACE_STABILITY_EVOLVING, DTRACE_STABILITY_EVOLVING, DTRACE_CLASS_ISA },
>>   { DTRACE_STABILITY_PRIVATE, DTRACE_STABILITY_PRIVATE, DTRACE_CLASS_UNKNOWN },
>> @@ -403,7 +405,7 @@ clean_usdt_probes(dtrace_hdl_t *dtp)
>>   	int			fdprids = dtp->dt_usdt_pridsmap_fd;
>>   	int			fdnames = dtp->dt_usdt_namesmap_fd;
>>   	usdt_prids_map_key_t	key, nxt;
>> -	usdt_prids_map_val_t	val;
>> +	usdt_prids_map_val_t	*val = alloca(sizeof(usdt_prids_map_val_t) + usdt_prids_map_val_extra_bytes);
> (this bit will need changing slightly if you use [].)
>
>>   
>> +void usdt_prids_map_val_extra_bytes_init(dtrace_hdl_t *dtp) {
> Wrongly-placed {.

Got it.

>> +	int i, n = 0, w = sizeof(((usdt_prids_map_val_t *)0)->mask[0]);
>> +
>> +	/* Count how many statements cannot be ignored, regardless of uprp. */
>> +	for (i = 0; i < dtp->dt_stmt_nextid; i++) {
>> +		dtrace_stmtdesc_t *stp;
>> +
>> +		stp = dtp->dt_stmts[i];
>> +		if (stp == NULL || ignore_clause(dtp, i, NULL))
>> +			continue;
> I was about to complain about the cost of all these ignore_clause()s,
> but of course this is only invoked once per run...
>
>> +		n++;
>> +	}
>> +
>> +	/* Determine how many bytes are needed for this many bits. */
>> +	n = (n + CHAR_BIT - 1) / CHAR_BIT;
>> +
>> +	/* Determine how many words are needed for this many bytes. */
>> +	n = (n + w - 1) / w;
> Thank God you commented this -- it took me ages to realise how it works.
> It feels like it should be a standard idiom, but I've never encountered
> it before...
>
>> +	/* Determine how many extra bytes are needed. */
>> +	if (n > 1)
>> +		usdt_prids_map_val_extra_bytes = (n - 1) * w;
>> +	else
>> +		usdt_prids_map_val_extra_bytes = 0;
> If you use a [] array, this can simply become
>
> usdt_prids_map_val_bytes = n * w;

Earlier you wrote, "*always* allocate at least one byte, which actually 
simplifies the code in usdt_prids_map_val_extra_bytes()." So I'm missing 
what you mean in the case n=0.

>> +}
>> +
>>   static int add_probe_uprobe(dtrace_hdl_t *dtp, dt_probe_t *prp)
>>   {
>>   	dtrace_difo_t   *dp;
>> @@ -650,6 +679,7 @@ static int add_probe_usdt(dtrace_hdl_t *dtp, dt_probe_t *prp)
>>   	int				fd = dtp->dt_usdt_namesmap_fd;
>>   	pid_t				pid;
>>   	list_probe_t			*pup;
>> +	usdt_prids_map_val_t		*val;
>>   
>>   	/* Add probe name elements to usdt_names map. */
>>   	p = probnam;
>> @@ -685,11 +715,11 @@ static int add_probe_usdt(dtrace_hdl_t *dtp, dt_probe_t *prp)
>>   	}
>>   
>>   	/* Add prid and bit mask to usdt_prids map. */
>> +	val = alloca(sizeof(usdt_prids_map_val_t) + usdt_prids_map_val_extra_bytes);
>>   	for (pup = prp->prv_data; pup != NULL; pup = dt_list_next(pup)) {
>>   		dt_probe_t		*uprp = pup->probe;
>> -		long long		mask = 0, bit = 1;
>> +		uint32_t		iword = 0, mask = 0, bit = 1;
>>   		usdt_prids_map_key_t	key;
>> -		usdt_prids_map_val_t	val;
>>   		dt_uprobe_t		*upp = uprp->prv_data;
>>   
>>   		/*
>> @@ -707,11 +737,15 @@ static int add_probe_usdt(dtrace_hdl_t *dtp, dt_probe_t *prp)
>>   				dtrace_stmtdesc_t *stp;
>>   
>>   				stp = dtp->dt_stmts[n];
>> -				if (stp == NULL)
>> +				if (stp == NULL || ignore_clause(dtp, n, uprp))
>>   					continue;
>>   
>> -				if (ignore_clause(dtp, n, uprp))
>> -					continue;
>> +				if (bit == 0) {
>> +					val->mask[iword] = mask;
>> +					mask = 0;
>> +					iword++;
>> +					bit = 1;
>> +				}
>>   
>>   				if (dt_gmatch(prp->desc->prv, stp->dtsd_ecbdesc->dted_probe.prv) &&
>>   				    dt_gmatch(prp->desc->mod, stp->dtsd_ecbdesc->dted_probe.mod) &&
>> @@ -726,11 +760,11 @@ static int add_probe_usdt(dtrace_hdl_t *dtp, dt_probe_t *prp)
>>   		key.pid = pid;
>>   		key.uprid = uprp->desc->id;
>>   
>> -		val.prid = prp->desc->id;
>> -		val.mask = mask;
>> +		val->prid = prp->desc->id;
>> +		val->mask[iword] = mask;
>>
>>   		// FIXME Check return value, but how should errors be handled?
>> -		dt_bpf_map_update(dtp->dt_usdt_pridsmap_fd, &key, &val);
>> +		dt_bpf_map_update(dtp->dt_usdt_pridsmap_fd, &key, val);
>>   	}
> ... am I missing something, or do you never modify "mask" (the local
> variable) anywhere in this function? So it ends up always zero...

Perhaps you're missing something?  mask |= bit

>> @@ -1451,7 +1485,7 @@ static int trampoline(dt_pcb_t *pcb, uint_t exitlbl)
>>   	const list_probe_t	*pop;
>>   	uint_t			lbl_exit = pcb->pcb_exitlbl;
>>   	dt_ident_t		*usdt_prids = dt_dlib_get_map(dtp, "usdt_prids");
>> -	int			n;
>> +	int			n, ibit, w = CHAR_BIT * sizeof(((usdt_prids_map_val_t *)0)->mask[0]);
> We could really do with better variable names than "n" and "w". w seems
> to be the bit-width of a single word: word_width? width?

Hopefully, this is good enough:  "w" for "word width" or "width"? Mimic 
other similar code closely.  Seemed like a reasonable choice to me.

>> @@ -1538,7 +1572,8 @@ static int trampoline(dt_pcb_t *pcb, uint_t exitlbl)
>>   	 */
>>   	assert(sizeof(usdt_prids_map_key_t) <= DT_STK_SLOT_SZ);
>>   	emit(dlp,  BPF_STORE(BPF_W, BPF_REG_FP, DT_TRAMP_SP_SLOT(0), BPF_REG_0));
>> -	emit(dlp,  BPF_STORE_IMM(BPF_W, BPF_REG_FP, DT_TRAMP_SP_SLOT(0) + (int)sizeof(pid_t), uprp->desc->id));
>> +	emit(dlp,  BPF_STORE_IMM(BPF_W, BPF_REG_FP,
>> +		   DT_TRAMP_SP_SLOT(0) + (int)sizeof(pid_t), uprp->desc->id));
>>   	dt_cg_xsetx(dlp, usdt_prids, DT_LBL_NONE, BPF_REG_1, usdt_prids->di_id);
>>   	emit(dlp,  BPF_MOV_REG(BPF_REG_2, BPF_REG_FP));
>>   	emit(dlp,  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, DT_TRAMP_SP_SLOT(0)));
>> @@ -1572,8 +1607,8 @@ static int trampoline(dt_pcb_t *pcb, uint_t exitlbl)
>>   	emit(dlp,  BPF_LOAD(BPF_W, BPF_REG_1, BPF_REG_0, 0));
>>   	emit(dlp,  BPF_STORE(BPF_W, BPF_REG_7, DMST_PRID, BPF_REG_1));
>>   
>> -	/* Read the bit mask from the table lookup in %r6. */    // FIXME someday, extend this past 64 bits
>> -	emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_6, BPF_REG_0, offsetof(usdt_prids_map_val_t, mask)));
>> +	/* Store the value key for reuse. */
>> +	emit(dlp,  BPF_STORE(BPF_DW, BPF_REG_FP, DT_TRAMP_SP_SLOT(0), BPF_REG_0));
> This all makes sense...
>
>> @@ -1587,21 +1622,24 @@ static int trampoline(dt_pcb_t *pcb, uint_t exitlbl)
>>   	/*
>>   	 * Hold the bit mask in %r6 between clause calls.
>>   	 */
>> -	for (n = 0; n < dtp->dt_stmt_nextid; n++) {
>> +	for (ibit = n = 0; n < dtp->dt_stmt_nextid; n++) {
>>   		dtrace_stmtdesc_t *stp;
>>   		dt_ident_t	*idp;
>>   		uint_t		lbl_next;
>>   
>>   		stp = dtp->dt_stmts[n];
>> -		if (stp == NULL)
>> -			continue;
>> -
>> -		if (ignore_clause(dtp, n, uprp))
>> +		if (stp == NULL || ignore_clause(dtp, n, uprp))
>>   			continue;
>>   
>>   		idp = stp->dtsd_clause;
>>   		lbl_next = dt_irlist_label(dlp);
> ... as does this (although we are now calling ignore_clause() at least
> twice for every clause, I think three times)...
>
>> +		/* Load the next word of the bit mask into %r6. */
>> +		if (ibit % w == 0) {
>> +			emit(dlp,  BPF_LOAD(BPF_DW, BPF_REG_0, BPF_REG_FP, DT_TRAMP_SP_SLOT(0)));
>> +			emit(dlp,  BPF_LOAD(BPF_W, BPF_REG_6, BPF_REG_0, offsetof(usdt_prids_map_val_t, mask[ibit / w])));
> ok, so this uses the 'w' thing to get an array offset out of a bit
> count, and then you can just use the existing repated-shift code. Should
> work.
>
>> +		}
>> +
>>   		/* If the lowest %r6 bit is 0, skip over this clause. */
>>   		emit(dlp,  BPF_MOV_REG(BPF_REG_1, BPF_REG_6));
>>   		emit(dlp,  BPF_ALU64_IMM(BPF_AND, BPF_REG_1, 1));
>> @@ -1629,6 +1667,7 @@ static int trampoline(dt_pcb_t *pcb, uint_t exitlbl)
>>   
>>   		/* Right-shift %r6. */
>>   		emit(dlp,  BPF_ALU64_IMM(BPF_RSH, BPF_REG_6, 1));
>> +		ibit++;
> ... so, this isn't in the loop conditionl because the values are only
> filled for non-ignored, non-NULL statements. This assumption is
> duplicated in two places (here and in usdt_prids_map_val_extra_bytes())
> and if it ever drifts we get uninitialized rubbish or buffer overruns.
> Should it be centralized somewhere, somehow? At least it needs a
> comment.

The loop bodies differ.  Okay, though, I added comments.

>> +# We stick 80 probe descriptions in each of 3 scripts to test
>> +# USDT's ability to handle hundreds of probe descriptions.
>> +++ b/test/unittest/usdt/tst.manyprobedescriptions2.sh
> ... maybe manyprobedescriptions-bytesize.sh :)

What is "byte size"?  Anyhow, the point is that there are many probe 
descriptions.

>> @@ -0,0 +1,127 @@
>> +#!/bin/bash
>> +#
>> +# Oracle Linux DTrace.
>> +# Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved.
>> +# Licensed under the Universal Permissive License v 1.0 as shown at
>> +# http://oss.oracle.com/licenses/upl.
>> +
>> +# This test uses many probes and probe descriptions.  Therefore, the
>> +# number of BPF programs to load into the kernel -- dt_bpf_load_prog()
>> +# calling prp->prov->impl->load_prog(), which is dt_bpf_prog_load() --
>> +# and the duration of each load are both increasing.
>> +# @@timeout: 400
>> +
>> +dtrace=$1
>> +
>> +DIRNAME="$tmpdir/usdt-many_probe_descriptions2.$$.$RANDOM"
>> +mkdir -p $DIRNAME
>> +cd $DIRNAME
>> +
>> +# Set the lists.
>> +# - The probes will be foo$x$y.
>> +# - The probe descriptions will be foo$x* and foo*$y, for each $d.
>> +# So if there are nx items in xlist, ny in ylist, and nd in dlist,
>> +# - there will be roughly nx*ny probes
>> +# - there will be roughly (nx+ny)*nd probe descriptions
>> +
>> +xlist="a b c d e f g h i j k l m"
>> +ylist="n o p q r s t u v w x y z"
>> +dlist="0 1 2 3 4 5 6 7 8"
> ... and with 8... maybe we should also test 0, 1, 7 and 9, checking all
> the boundary conditions (and refactor things so we don't need to copy
> all of this every time).

But this is not a case of filling some byte up to 8 bits.  The test has 
nested loops over these lists, appending more probes or probe 
descriptions, as appropriate.  There are 13x13 probes, one after 
another.  There are (13+13)*9 probe descriptions, one after another.  
There are 13x13x9 probe firings, one after another.  It just so happens 
that none of those numbers is a multiple of 8;  the only point is that 
each of those numbers is "big."

Thanks for the review.  Next version momentarily.

> (You'll only really see things go wrong under valgrind, but that's fine,
> valgrinding should be routine before releases anyway.)

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2025-07-23  0:47 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-25  6:03 [PATCH v2] Optimize USDT discovery a little eugene.loh
2025-06-25  6:03 ` [PATCH v2 3/4] Sync up the version numbers eugene.loh
2025-07-22 10:51   ` [DTrace-devel] " Nick Alcock
2025-06-25  6:03 ` [PATCH v2 4/4] test: Add test for predefined preprocessor definitions eugene.loh
2025-06-25  6:03 ` [PATCH v2 2/2] Extend the USDT bit mask to multiple words eugene.loh
2025-07-22 12:34   ` Nick Alcock
2025-07-23  0:47     ` Eugene Loh
2025-06-25  6:03 ` [PATCH] test: Extend timeout for many-pids test eugene.loh
2025-07-22 12:44   ` Nick Alcock
2025-07-22 10:45 ` [PATCH v2] Optimize USDT discovery a little Nick Alcock

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox