From: Kris Van Hees <kris.van.hees@oracle.com>
To: eugene.loh@oracle.com
Cc: dtrace@lists.linux.dev, dtrace-devel@oss.oracle.com
Subject: Re: [PATCH] Improve dt_pid_create_usdt_probes() error handling
Date: Mon, 6 Jan 2025 14:13:43 -0500 [thread overview]
Message-ID: <Z3wrZ7dmS/vaVxF3@oracle.com> (raw)
In-Reply-To: <20241202042741.12328-1-eugene.loh@oracle.com>
Reviewed-by: Kris Van Hees <kris.van.hees@oracle.com>
... although I think that the problem is (in part) on the design side of
USDT handling, where we do a bit too much work at this point of probe
discovery, when we do not even know whether we will be using those probes.
But that is subject matter for a more extensive reworking of the code.
This patch seems sufficient to deal with the existing problem of DTrace
being affected by unruly processes.
On Sun, Dec 01, 2024 at 11:27:41PM -0500, eugene.loh@oracle.com wrote:
> From: Eugene Loh <eugene.loh@oracle.com>
>
> In one run of the test suite, roughly 90 consecutive tests failed.
> The exact circumstances are difficult to reconstruct, but basically
> they complained with messages like:
>
> dtrace: failed to compile script ...: failed to grab process 23132
>
> The same pid was always indicated. Then, the errors stopped.
>
> Apparently, the problem was trying "dtrace -s foo.d" where the script
> has a BEGIN (or END) probe. Then we get to
>
> cmd/dtrace.c compile_file
> -> libdtrace/dt_cc.c dt_program_compile
> -> libdtrace/dt_cc.c dt_compile
> -> libdtrace/dt_cc.c dt_compile_one_clause
> -> libdtrace/dt_cc.c dt_setcontext
> -> libdtrace/dt_pid.c dt_pid_create_usdt_probes
>
> We look for processes that might have USDT probes. Since the provider
> description is blank, we check every process in .../run/dtrace. It
> turned out, that pid 23132 could not be locked. This sent an error
> back up the call stack.
>
> We do not know why process 23132 could not be locked, but having dtrace
> fail under these circumstances (using BEGIN or END probe) seems severe.
>
> Change dt_pid_create_usdt_probes() error handling. Even if there is a
> problem with some USDT processes, report success anyhow if there was a
> glob pid specification. If, on the other hand, a pid was specifically
> requested, then a problem with that pid results in an error.
>
> Signed-off-by: Eugene Loh <eugene.loh@oracle.com>
> ---
> libdtrace/dt_pid.c | 35 +++++++++++++++++++++++++++++++++--
> 1 file changed, 33 insertions(+), 2 deletions(-)
>
> diff --git a/libdtrace/dt_pid.c b/libdtrace/dt_pid.c
> index e0a26d5aa..fd94a0706 100644
> --- a/libdtrace/dt_pid.c
> +++ b/libdtrace/dt_pid.c
> @@ -1252,7 +1252,8 @@ dt_pid_create_usdt_probes(dtrace_probedesc_t *pdp, dtrace_hdl_t *dtp, dt_pcb_t *
> DTRACE_PROC_SHORTLIVED) < 0) {
> dt_pid_error(dtp, pcb, NULL, D_PROC_GRAB,
> "failed to grab process %d", (int)pid);
> - return -1;
> + err = 1; // FIXME but do we want to set the error if we end up return 0?
> + continue;
> }
> dpr = dt_proc_lookup(dtp, pid);
> assert(dpr != NULL);
> @@ -1272,7 +1273,37 @@ dt_pid_create_usdt_probes(dtrace_probedesc_t *pdp, dtrace_hdl_t *dtp, dt_pcb_t *
> free(globpat);
> globfree(&globbuf);
>
> - return err ? -1 : 0;
> + /* If no errors, report success. */
> + if (err == 0)
> + return 0;
> +
> + /* If provider description was blank, report success. */
> + if (pdp->prv[0] == '\0')
> + return 0;
> +
> + /* Look to see if the provider description had a pid glob. */
> + for (i = strlen(pdp->prv) - 1; i >= 0; i--) {
> + /*
> + * If we hit a '*' before a nondigit, we have a pid glob.
> + * So, even though err==0, we declare success.
> + */
> + if (pdp->prv[i] == '*')
> + return 0;
> +
> + /*
> + * If we hit a nondigit before a '*', we do not have a pid glob.
> + * Since a pid was specified explicitly, err==1 means an error.
> + */
> + if (!isdigit(pdp->prv[i]))
> + return -1;
> + }
> +
> + /*
> + * If the provider description was exclusively digits,
> + * it was not a legitimate USDT provider description.
> + * So it makes perfect sense not to return any probes.
> + */
> + return 0;
> }
>
> int
> --
> 2.43.5
>
prev parent reply other threads:[~2025-01-06 19:13 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-02 4:27 [PATCH] Improve dt_pid_create_usdt_probes() error handling eugene.loh
2025-01-06 19:13 ` Kris Van Hees [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z3wrZ7dmS/vaVxF3@oracle.com \
--to=kris.van.hees@oracle.com \
--cc=dtrace-devel@oss.oracle.com \
--cc=dtrace@lists.linux.dev \
--cc=eugene.loh@oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox