* [PATCH 3/4] test: fix tst.depth.sh to use proper trigger
@ 2025-11-10 15:27 Kris Van Hees
2025-11-11 0:10 ` Eugene Loh
0 siblings, 1 reply; 7+ messages in thread
From: Kris Van Hees @ 2025-11-10 15:27 UTC (permalink / raw)
To: dtrace, dtrace-devel
The test used 'date' as its trigger but that caused the test to XFAIL
for the wrong reason: date does not have symbols and therefore dtrace
failed to enable probes for it.
Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
---
test/unittest/ustack/tst.depth.sh | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/test/unittest/ustack/tst.depth.sh b/test/unittest/ustack/tst.depth.sh
index dec16a3a..977d1b42 100755
--- a/test/unittest/ustack/tst.depth.sh
+++ b/test/unittest/ustack/tst.depth.sh
@@ -16,7 +16,7 @@ dtrace=$1
rm -f $file
-$dtrace $dt_flags -o $file -c date -s /dev/stdin <<EOF
+$dtrace $dt_flags -o $file -c test/triggers/delaydie -s /dev/stdin <<EOF
#pragma D option quiet
#pragma D option bufsize=1M
@@ -45,7 +45,7 @@ $dtrace $dt_flags -o $file -c date -s /dev/stdin <<EOF
EOF
status=$?
-if [ "$status" -ne 0 ]; then
+if [ $status -ne 0 ]; then
echo $tst: dtrace failed
rm -f $file
exit $status
@@ -78,8 +78,8 @@ EOF
status=$?
-count=`wc -l $file | cut -f1 -do`
-if [ "$count" -lt 1000 ]; then
+count=`wc -l $file | cut -f1 -d' '`
+if [ $count -lt 1000 ]; then
echo $tst: output was too short
status=1
fi
--
2.43.5
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 3/4] test: fix tst.depth.sh to use proper trigger
2025-11-10 15:27 [PATCH 3/4] test: fix tst.depth.sh to use proper trigger Kris Van Hees
@ 2025-11-11 0:10 ` Eugene Loh
2025-11-11 3:00 ` Kris Van Hees
0 siblings, 1 reply; 7+ messages in thread
From: Eugene Loh @ 2025-11-11 0:10 UTC (permalink / raw)
To: Kris Van Hees, dtrace, dtrace-devel
I'm against this patch.
The picky stuff is:
- the Copyright year should be updated
- the timeout probe could replace
tick-1sec /n++ == 10/
with simply
tick-10sec
- the "delay" env var be set for delaydie???
- the ustacks for delaydie are perhaps not very interesting
A little more serious is that all this patch does is swap one failure
mode for another. What for? It's XFAIL anyhow, and we do not check
that we're getting a particular, expected failure.
The real problem, meanwhile, is that this patch leads to lots of test
failures, at least for me. Not every VM, but many. Oh, and I do not
mean on this test, because this test remains XFAIL anyhow.
Consider a subset of the VMs I run on. I get these failures on the
test/unittest/ustack/tst.*.d tests. Why on these? Because they run
after tst.depth.sh.
+------------------ arm OL8 UEK7
| +---------------- x86 OL9 UEK7
| | +-------------- arm OL9 UEK7
| | | +------------ x86 OL9 UEK8
| | | | +---------- arm OL9 UEK8
| | | | | +-------- arm OL10 UEK8
| | | | | |
v v v v v v
x x x x x x tst.kthread.d FAIL: timed out.
x x x x tst.uaddr.d FAIL: timed out.
x x x x tst.uaddr-pid0.d FAIL: timed out.
x x tst.ufunc.d FAIL: timed out.
x x x tst.ufunc-pid0.d FAIL: timed out.
x x tst.umod.d FAIL: timed out.
x x tst.ustack25_max95_profile.d FAIL: timed out.
x x tst.ustack25_pid.d FAIL: timed out.
x x tst.ustack25_profile.d FAIL: timed out.
x x tst.ustack95_max25_profile.d FAIL: timed out.
x x tst.ustack95_profile.d FAIL: timed out.
x x tst.ustack_max25_profile.d FAIL: timed out.
x x tst.ustack_profile.d FAIL: timed out.
x tst.usym.d FAIL: timed out.
x tst.usym-pid0.d FAIL: timed out.
I think the problem is that tst.depth.sh -- with this patch, in any case
-- turns on a huge number of probes. That causes problems for this
test, but not in the sense of it FAILing since it is marked XFAIL
anyhow. (All this patch does is change the FAIL from one problem to
another, but we still do not specify what failure we expect to see nor
check that that is what we get.)
So, the problem is not with this test, but that the system is left in
some whacked state due to having attempted so many probes. As a result,
the process "dtrace -o $file -c delaydie" is still around well after the
depth test has finished, the load average remains high, the pid0 process
does not run, and many tests time out. How long (how many seconds or
how many tests) does this effect last? It depends, as one can see above.
This raises all sorts of other questions as well, such as whether tests
in our test suite should be allowed to depend on running on a somewhat
idle system, whether we (dtrace or the test suite) are cleaning up well
enough, etc.
On 11/10/25 10:27, Kris Van Hees wrote:
> The test used 'date' as its trigger but that caused the test to XFAIL
> for the wrong reason: date does not have symbols and therefore dtrace
> failed to enable probes for it.
>
> Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
> ---
> test/unittest/ustack/tst.depth.sh | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/test/unittest/ustack/tst.depth.sh b/test/unittest/ustack/tst.depth.sh
> index dec16a3a..977d1b42 100755
> --- a/test/unittest/ustack/tst.depth.sh
> +++ b/test/unittest/ustack/tst.depth.sh
> @@ -16,7 +16,7 @@ dtrace=$1
>
> rm -f $file
>
> -$dtrace $dt_flags -o $file -c date -s /dev/stdin <<EOF
> +$dtrace $dt_flags -o $file -c test/triggers/delaydie -s /dev/stdin <<EOF
>
> #pragma D option quiet
> #pragma D option bufsize=1M
> @@ -45,7 +45,7 @@ $dtrace $dt_flags -o $file -c date -s /dev/stdin <<EOF
> EOF
>
> status=$?
> -if [ "$status" -ne 0 ]; then
> +if [ $status -ne 0 ]; then
> echo $tst: dtrace failed
> rm -f $file
> exit $status
> @@ -78,8 +78,8 @@ EOF
>
> status=$?
>
> -count=`wc -l $file | cut -f1 -do`
> -if [ "$count" -lt 1000 ]; then
> +count=`wc -l $file | cut -f1 -d' '`
> +if [ $count -lt 1000 ]; then
> echo $tst: output was too short
> status=1
> fi
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 3/4] test: fix tst.depth.sh to use proper trigger
2025-11-11 0:10 ` Eugene Loh
@ 2025-11-11 3:00 ` Kris Van Hees
2025-11-11 20:50 ` Eugene Loh
0 siblings, 1 reply; 7+ messages in thread
From: Kris Van Hees @ 2025-11-11 3:00 UTC (permalink / raw)
To: Eugene Loh; +Cc: Kris Van Hees, dtrace, dtrace-devel
On Mon, Nov 10, 2025 at 07:10:52PM -0500, Eugene Loh wrote:
> I'm against this patch.
>
> The picky stuff is:
> - the Copyright year should be updated
Easy to fix - done.
> - the timeout probe could replace
> tick-1sec /n++ == 10/
> with simply
> tick-10sec
Arguably, the two are not identical. It depends on why one is used over the
other. Here we could probably go either way, though it might be nice to have
the tick probe firing every second as a way to generate a bit more activity.
Given the frequency of the other probes, not really important though as far as
I can see. Note that I didn't touch the timeout probe primarily because that
was not the purpose of this patch anyway.
> - the "delay" env var be set for delaydie???
Why? We are simply using delaydie to have a trigger executable to execute.
This is a valid use of it (and it is used that same way in other tests also).
> - the ustacks for delaydie are perhaps not very interesting
That is not the point of the test, and for that matter, I highly doubt that
ustacks from date would have been any more interesting (aside from the fact
that you'd never know because the test didn't even work with date).
> A little more serious is that all this patch does is swap one failure mode
> for another. What for? It's XFAIL anyhow, and we do not check that we're
> getting a particular, expected failure.
The difference is that the pre-patch failure was due to a problem with the
actual test. But since it is XFAIL that was reported as an expected failure,
which was entirely bogus. After the patch, it still fails but at least it is
failing for valid reasons. I.e. now we have a failure we should look into.
> The real problem, meanwhile, is that this patch leads to lots of test
> failures, at least for me. Not every VM, but many. Oh, and I do not mean
> on this test, because this test remains XFAIL anyhow.
>
> Consider a subset of the VMs I run on. I get these failures on the
> test/unittest/ustack/tst.*.d tests. Why on these? Because they run after
> tst.depth.sh.
>
> +------------------ arm OL8 UEK7
> | +---------------- x86 OL9 UEK7
> | | +-------------- arm OL9 UEK7
> | | | +------------ x86 OL9 UEK8
> | | | | +---------- arm OL9 UEK8
> | | | | | +-------- arm OL10 UEK8
> | | | | | |
> v v v v v v
>
> x x x x x x tst.kthread.d FAIL: timed out.
> x x x x tst.uaddr.d FAIL: timed out.
> x x x x tst.uaddr-pid0.d FAIL: timed out.
> x x tst.ufunc.d FAIL: timed out.
> x x x tst.ufunc-pid0.d FAIL: timed out.
> x x tst.umod.d FAIL: timed out.
> x x tst.ustack25_max95_profile.d FAIL: timed out.
> x x tst.ustack25_pid.d FAIL: timed out.
> x x tst.ustack25_profile.d FAIL: timed out.
> x x tst.ustack95_max25_profile.d FAIL: timed out.
> x x tst.ustack95_profile.d FAIL: timed out.
> x x tst.ustack_max25_profile.d FAIL: timed out.
> x x tst.ustack_profile.d FAIL: timed out.
> x tst.usym.d FAIL: timed out.
> x tst.usym-pid0.d FAIL: timed out.
>
> I think the problem is that tst.depth.sh -- with this patch, in any case --
> turns on a huge number of probes. That causes problems for this test, but
> not in the sense of it FAILing since it is marked XFAIL anyhow. (All this
> patch does is change the FAIL from one problem to another, but we still do
> not specify what failure we expect to see nor check that that is what we
> get.)
>
> So, the problem is not with this test, but that the system is left in some
> whacked state due to having attempted so many probes. As a result, the
> process "dtrace -o $file -c delaydie" is still around well after the depth
> test has finished, the load average remains high, the pid0 process does not
> run, and many tests time out. How long (how many seconds or how many tests)
> does this effect last? It depends, as one can see above.
>
> This raises all sorts of other questions as well, such as whether tests in
> our test suite should be allowed to depend on running on a somewhat idle
> system, whether we (dtrace or the test suite) are cleaning up well enough,
> etc.
There is of course the aspect that this patch is part of a series, and the
next patch in the series should resolve the impact issues of this test. If
it makes you more happy, I'd happily move this patch to the end of the series.
I don't see how it matters, because this crazy fallout from tst.depth.sh will
be an issue either way until the per-PID uprohes support is applied.
> On 11/10/25 10:27, Kris Van Hees wrote:
> > The test used 'date' as its trigger but that caused the test to XFAIL
> > for the wrong reason: date does not have symbols and therefore dtrace
> > failed to enable probes for it.
> >
> > Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
> > ---
> > test/unittest/ustack/tst.depth.sh | 8 ++++----
> > 1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/test/unittest/ustack/tst.depth.sh b/test/unittest/ustack/tst.depth.sh
> > index dec16a3a..977d1b42 100755
> > --- a/test/unittest/ustack/tst.depth.sh
> > +++ b/test/unittest/ustack/tst.depth.sh
> > @@ -16,7 +16,7 @@ dtrace=$1
> > rm -f $file
> > -$dtrace $dt_flags -o $file -c date -s /dev/stdin <<EOF
> > +$dtrace $dt_flags -o $file -c test/triggers/delaydie -s /dev/stdin <<EOF
> > #pragma D option quiet
> > #pragma D option bufsize=1M
> > @@ -45,7 +45,7 @@ $dtrace $dt_flags -o $file -c date -s /dev/stdin <<EOF
> > EOF
> > status=$?
> > -if [ "$status" -ne 0 ]; then
> > +if [ $status -ne 0 ]; then
> > echo $tst: dtrace failed
> > rm -f $file
> > exit $status
> > @@ -78,8 +78,8 @@ EOF
> > status=$?
> > -count=`wc -l $file | cut -f1 -do`
> > -if [ "$count" -lt 1000 ]; then
> > +count=`wc -l $file | cut -f1 -d' '`
> > +if [ $count -lt 1000 ]; then
> > echo $tst: output was too short
> > status=1
> > fi
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 3/4] test: fix tst.depth.sh to use proper trigger
2025-11-11 3:00 ` Kris Van Hees
@ 2025-11-11 20:50 ` Eugene Loh
2025-11-11 22:13 ` Kris Van Hees
0 siblings, 1 reply; 7+ messages in thread
From: Eugene Loh @ 2025-11-11 20:50 UTC (permalink / raw)
To: Kris Van Hees; +Cc: dtrace, dtrace-devel
On 11/10/25 22:00, Kris Van Hees wrote:
> On Mon, Nov 10, 2025 at 07:10:52PM -0500, Eugene Loh wrote:
>> I'm against this patch.
>>
>> The picky stuff is:
>> - the Copyright year should be updated
> Easy to fix - done.
>
>> - the timeout probe could replace
>> tick-1sec /n++ == 10/
>> with simply
>> tick-10sec
> Arguably, the two are not identical. It depends on why one is used over the
> other. Here we could probably go either way, though it might be nice to have
> the tick probe firing every second as a way to generate a bit more activity.
> Given the frequency of the other probes, not really important though as far as
> I can see. Note that I didn't touch the timeout probe primarily because that
> was not the purpose of this patch anyway.
>
>> - the "delay" env var be set for delaydie???
> Why? We are simply using delaydie to have a trigger executable to execute.
> This is a valid use of it (and it is used that same way in other tests also).
>
>> - the ustacks for delaydie are perhaps not very interesting
> That is not the point of the test, and for that matter, I highly doubt that
> ustacks from date would have been any more interesting (aside from the fact
> that you'd never know because the test didn't even work with date).
>
>> A little more serious is that all this patch does is swap one failure mode
>> for another. What for? It's XFAIL anyhow, and we do not check that we're
>> getting a particular, expected failure.
> The difference is that the pre-patch failure was due to a problem with the
> actual test. But since it is XFAIL that was reported as an expected failure,
> which was entirely bogus. After the patch, it still fails but at least it is
> failing for valid reasons. I.e. now we have a failure we should look into.
>
>> The real problem, meanwhile, is that this patch leads to lots of test
>> failures, at least for me. Not every VM, but many. Oh, and I do not mean
>> on this test, because this test remains XFAIL anyhow.
>>
>> Consider a subset of the VMs I run on. I get these failures on the
>> test/unittest/ustack/tst.*.d tests. Why on these? Because they run after
>> tst.depth.sh.
>>
>> +------------------ arm OL8 UEK7
>> | +---------------- x86 OL9 UEK7
>> | | +-------------- arm OL9 UEK7
>> | | | +------------ x86 OL9 UEK8
>> | | | | +---------- arm OL9 UEK8
>> | | | | | +-------- arm OL10 UEK8
>> | | | | | |
>> v v v v v v
>>
>> x x x x x x tst.kthread.d FAIL: timed out.
>> x x x x tst.uaddr.d FAIL: timed out.
>> x x x x tst.uaddr-pid0.d FAIL: timed out.
>> x x tst.ufunc.d FAIL: timed out.
>> x x x tst.ufunc-pid0.d FAIL: timed out.
>> x x tst.umod.d FAIL: timed out.
>> x x tst.ustack25_max95_profile.d FAIL: timed out.
>> x x tst.ustack25_pid.d FAIL: timed out.
>> x x tst.ustack25_profile.d FAIL: timed out.
>> x x tst.ustack95_max25_profile.d FAIL: timed out.
>> x x tst.ustack95_profile.d FAIL: timed out.
>> x x tst.ustack_max25_profile.d FAIL: timed out.
>> x x tst.ustack_profile.d FAIL: timed out.
>> x tst.usym.d FAIL: timed out.
>> x tst.usym-pid0.d FAIL: timed out.
>>
>> I think the problem is that tst.depth.sh -- with this patch, in any case --
>> turns on a huge number of probes. That causes problems for this test, but
>> not in the sense of it FAILing since it is marked XFAIL anyhow. (All this
>> patch does is change the FAIL from one problem to another, but we still do
>> not specify what failure we expect to see nor check that that is what we
>> get.)
>>
>> So, the problem is not with this test, but that the system is left in some
>> whacked state due to having attempted so many probes. As a result, the
>> process "dtrace -o $file -c delaydie" is still around well after the depth
>> test has finished, the load average remains high, the pid0 process does not
>> run, and many tests time out. How long (how many seconds or how many tests)
>> does this effect last? It depends, as one can see above.
>>
>> This raises all sorts of other questions as well, such as whether tests in
>> our test suite should be allowed to depend on running on a somewhat idle
>> system, whether we (dtrace or the test suite) are cleaning up well enough,
>> etc.
> There is of course the aspect that this patch is part of a series, and the
> next patch in the series should resolve the impact issues of this test. If
> it makes you more happy, I'd happily move this patch to the end of the series.
> I don't see how it matters, because this crazy fallout from tst.depth.sh will
> be an issue either way until the per-PID uprohes support is applied.
I don't get it.
First of all, the crazy fallout results with the patch and it doesn't
without the patch. Without the patch the test was basically SKIPing
(admittedly as a mistaken side effect of a test defect).
Second, the above results, with the crazy fallout, included the entire
patch series. That is, the tip of the branch looked like this:
59a49ce9 (tag: 2.0.4) doc: Add blank line before bold text so it is
rendered correctly
61e69762 bpf: fix file descriptor leak
20780963 uprobe: remove unnecessary enable_*() functions
170c4cf9 test: fix tst.depth.sh to use proper trigger
cf7ae35b uprobe: Implement PID-specific uprobes
If there are other bits/patches I should try, let me know.
>> On 11/10/25 10:27, Kris Van Hees wrote:
>>> The test used 'date' as its trigger but that caused the test to XFAIL
>>> for the wrong reason: date does not have symbols and therefore dtrace
>>> failed to enable probes for it.
>>>
>>> Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
>>> ---
>>> test/unittest/ustack/tst.depth.sh | 8 ++++----
>>> 1 file changed, 4 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/test/unittest/ustack/tst.depth.sh b/test/unittest/ustack/tst.depth.sh
>>> index dec16a3a..977d1b42 100755
>>> --- a/test/unittest/ustack/tst.depth.sh
>>> +++ b/test/unittest/ustack/tst.depth.sh
>>> @@ -16,7 +16,7 @@ dtrace=$1
>>> rm -f $file
>>> -$dtrace $dt_flags -o $file -c date -s /dev/stdin <<EOF
>>> +$dtrace $dt_flags -o $file -c test/triggers/delaydie -s /dev/stdin <<EOF
>>> #pragma D option quiet
>>> #pragma D option bufsize=1M
>>> @@ -45,7 +45,7 @@ $dtrace $dt_flags -o $file -c date -s /dev/stdin <<EOF
>>> EOF
>>> status=$?
>>> -if [ "$status" -ne 0 ]; then
>>> +if [ $status -ne 0 ]; then
>>> echo $tst: dtrace failed
>>> rm -f $file
>>> exit $status
>>> @@ -78,8 +78,8 @@ EOF
>>> status=$?
>>> -count=`wc -l $file | cut -f1 -do`
>>> -if [ "$count" -lt 1000 ]; then
>>> +count=`wc -l $file | cut -f1 -d' '`
>>> +if [ $count -lt 1000 ]; then
>>> echo $tst: output was too short
>>> status=1
>>> fi
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 3/4] test: fix tst.depth.sh to use proper trigger
2025-11-11 20:50 ` Eugene Loh
@ 2025-11-11 22:13 ` Kris Van Hees
2025-11-11 22:55 ` Eugene Loh
0 siblings, 1 reply; 7+ messages in thread
From: Kris Van Hees @ 2025-11-11 22:13 UTC (permalink / raw)
To: Eugene Loh; +Cc: Kris Van Hees, dtrace, dtrace-devel
On Tue, Nov 11, 2025 at 03:50:37PM -0500, Eugene Loh wrote:
> On 11/10/25 22:00, Kris Van Hees wrote:
>
> > On Mon, Nov 10, 2025 at 07:10:52PM -0500, Eugene Loh wrote:
> > > I'm against this patch.
> > >
> > > The picky stuff is:
> > > - the Copyright year should be updated
> > Easy to fix - done.
> >
> > > - the timeout probe could replace
> > > tick-1sec /n++ == 10/
> > > with simply
> > > tick-10sec
> > Arguably, the two are not identical. It depends on why one is used over the
> > other. Here we could probably go either way, though it might be nice to have
> > the tick probe firing every second as a way to generate a bit more activity.
> > Given the frequency of the other probes, not really important though as far as
> > I can see. Note that I didn't touch the timeout probe primarily because that
> > was not the purpose of this patch anyway.
> >
> > > - the "delay" env var be set for delaydie???
> > Why? We are simply using delaydie to have a trigger executable to execute.
> > This is a valid use of it (and it is used that same way in other tests also).
> >
> > > - the ustacks for delaydie are perhaps not very interesting
> > That is not the point of the test, and for that matter, I highly doubt that
> > ustacks from date would have been any more interesting (aside from the fact
> > that you'd never know because the test didn't even work with date).
> >
> > > A little more serious is that all this patch does is swap one failure mode
> > > for another. What for? It's XFAIL anyhow, and we do not check that we're
> > > getting a particular, expected failure.
> > The difference is that the pre-patch failure was due to a problem with the
> > actual test. But since it is XFAIL that was reported as an expected failure,
> > which was entirely bogus. After the patch, it still fails but at least it is
> > failing for valid reasons. I.e. now we have a failure we should look into.
> >
> > > The real problem, meanwhile, is that this patch leads to lots of test
> > > failures, at least for me. Not every VM, but many. Oh, and I do not mean
> > > on this test, because this test remains XFAIL anyhow.
> > >
> > > Consider a subset of the VMs I run on. I get these failures on the
> > > test/unittest/ustack/tst.*.d tests. Why on these? Because they run after
> > > tst.depth.sh.
> > >
> > > +------------------ arm OL8 UEK7
> > > | +---------------- x86 OL9 UEK7
> > > | | +-------------- arm OL9 UEK7
> > > | | | +------------ x86 OL9 UEK8
> > > | | | | +---------- arm OL9 UEK8
> > > | | | | | +-------- arm OL10 UEK8
> > > | | | | | |
> > > v v v v v v
> > >
> > > x x x x x x tst.kthread.d FAIL: timed out.
> > > x x x x tst.uaddr.d FAIL: timed out.
> > > x x x x tst.uaddr-pid0.d FAIL: timed out.
> > > x x tst.ufunc.d FAIL: timed out.
> > > x x x tst.ufunc-pid0.d FAIL: timed out.
> > > x x tst.umod.d FAIL: timed out.
> > > x x tst.ustack25_max95_profile.d FAIL: timed out.
> > > x x tst.ustack25_pid.d FAIL: timed out.
> > > x x tst.ustack25_profile.d FAIL: timed out.
> > > x x tst.ustack95_max25_profile.d FAIL: timed out.
> > > x x tst.ustack95_profile.d FAIL: timed out.
> > > x x tst.ustack_max25_profile.d FAIL: timed out.
> > > x x tst.ustack_profile.d FAIL: timed out.
> > > x tst.usym.d FAIL: timed out.
> > > x tst.usym-pid0.d FAIL: timed out.
> > >
> > > I think the problem is that tst.depth.sh -- with this patch, in any case --
> > > turns on a huge number of probes. That causes problems for this test, but
> > > not in the sense of it FAILing since it is marked XFAIL anyhow. (All this
> > > patch does is change the FAIL from one problem to another, but we still do
> > > not specify what failure we expect to see nor check that that is what we
> > > get.)
> > >
> > > So, the problem is not with this test, but that the system is left in some
> > > whacked state due to having attempted so many probes. As a result, the
> > > process "dtrace -o $file -c delaydie" is still around well after the depth
> > > test has finished, the load average remains high, the pid0 process does not
> > > run, and many tests time out. How long (how many seconds or how many tests)
> > > does this effect last? It depends, as one can see above.
> > >
> > > This raises all sorts of other questions as well, such as whether tests in
> > > our test suite should be allowed to depend on running on a somewhat idle
> > > system, whether we (dtrace or the test suite) are cleaning up well enough,
> > > etc.
> > There is of course the aspect that this patch is part of a series, and the
> > next patch in the series should resolve the impact issues of this test. If
> > it makes you more happy, I'd happily move this patch to the end of the series.
> > I don't see how it matters, because this crazy fallout from tst.depth.sh will
> > be an issue either way until the per-PID uprohes support is applied.
>
> I don't get it.
>
> First of all, the crazy fallout results with the patch and it doesn't
> without the patch. Without the patch the test was basically SKIPing
> (admittedly as a mistaken side effect of a test defect).
>
> Second, the above results, with the crazy fallout, included the entire patch
> series. That is, the tip of the branch looked like this:
Well, it was not clear from your email that you were testing this after you
applied the entire series. That of course changes the situation.
Clearly, there may be more wrong with this test. Or it might be pointing on a
potential issue where DTrace might leave things hanging in a state that takes
too long to clean up. Either way, though, the change in this patch is needed
to make this test even remotely useful.
If the patch causes the test to trigger other tests to fail, then more work
is needed. I would still recommend that we let this patch go through, as it
*is* a correct fix for a problem.
And then look at what the new issue is that this test brings to light, and fix
what is needed to fix that.
> 59a49ce9 (tag: 2.0.4) doc: Add blank line before bold text so it is rendered
> correctly
> 61e69762 bpf: fix file descriptor leak
> 20780963 uprobe: remove unnecessary enable_*() functions
> 170c4cf9 test: fix tst.depth.sh to use proper trigger
> cf7ae35b uprobe: Implement PID-specific uprobes
>
> If there are other bits/patches I should try, let me know.
>
> > > On 11/10/25 10:27, Kris Van Hees wrote:
> > > > The test used 'date' as its trigger but that caused the test to XFAIL
> > > > for the wrong reason: date does not have symbols and therefore dtrace
> > > > failed to enable probes for it.
> > > >
> > > > Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
> > > > ---
> > > > test/unittest/ustack/tst.depth.sh | 8 ++++----
> > > > 1 file changed, 4 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/test/unittest/ustack/tst.depth.sh b/test/unittest/ustack/tst.depth.sh
> > > > index dec16a3a..977d1b42 100755
> > > > --- a/test/unittest/ustack/tst.depth.sh
> > > > +++ b/test/unittest/ustack/tst.depth.sh
> > > > @@ -16,7 +16,7 @@ dtrace=$1
> > > > rm -f $file
> > > > -$dtrace $dt_flags -o $file -c date -s /dev/stdin <<EOF
> > > > +$dtrace $dt_flags -o $file -c test/triggers/delaydie -s /dev/stdin <<EOF
> > > > #pragma D option quiet
> > > > #pragma D option bufsize=1M
> > > > @@ -45,7 +45,7 @@ $dtrace $dt_flags -o $file -c date -s /dev/stdin <<EOF
> > > > EOF
> > > > status=$?
> > > > -if [ "$status" -ne 0 ]; then
> > > > +if [ $status -ne 0 ]; then
> > > > echo $tst: dtrace failed
> > > > rm -f $file
> > > > exit $status
> > > > @@ -78,8 +78,8 @@ EOF
> > > > status=$?
> > > > -count=`wc -l $file | cut -f1 -do`
> > > > -if [ "$count" -lt 1000 ]; then
> > > > +count=`wc -l $file | cut -f1 -d' '`
> > > > +if [ $count -lt 1000 ]; then
> > > > echo $tst: output was too short
> > > > status=1
> > > > fi
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 3/4] test: fix tst.depth.sh to use proper trigger
2025-11-11 22:13 ` Kris Van Hees
@ 2025-11-11 22:55 ` Eugene Loh
2025-11-11 23:09 ` Kris Van Hees
0 siblings, 1 reply; 7+ messages in thread
From: Eugene Loh @ 2025-11-11 22:55 UTC (permalink / raw)
To: Kris Van Hees; +Cc: dtrace, dtrace-devel
On 11/11/25 17:13, Kris Van Hees wrote:
> On Tue, Nov 11, 2025 at 03:50:37PM -0500, Eugene Loh wrote:
>> On 11/10/25 22:00, Kris Van Hees wrote:
>>
>>> On Mon, Nov 10, 2025 at 07:10:52PM -0500, Eugene Loh wrote:
>>>> I'm against this patch.
>>>>
>>>> The picky stuff is:
>>>> - the Copyright year should be updated
>>> Easy to fix - done.
>>>
>>>> - the timeout probe could replace
>>>> tick-1sec /n++ == 10/
>>>> with simply
>>>> tick-10sec
>>> Arguably, the two are not identical. It depends on why one is used over the
>>> other. Here we could probably go either way, though it might be nice to have
>>> the tick probe firing every second as a way to generate a bit more activity.
>>> Given the frequency of the other probes, not really important though as far as
>>> I can see. Note that I didn't touch the timeout probe primarily because that
>>> was not the purpose of this patch anyway.
>>>
>>>> - the "delay" env var be set for delaydie???
>>> Why? We are simply using delaydie to have a trigger executable to execute.
>>> This is a valid use of it (and it is used that same way in other tests also).
>>>
>>>> - the ustacks for delaydie are perhaps not very interesting
>>> That is not the point of the test, and for that matter, I highly doubt that
>>> ustacks from date would have been any more interesting (aside from the fact
>>> that you'd never know because the test didn't even work with date).
>>>
>>>> A little more serious is that all this patch does is swap one failure mode
>>>> for another. What for? It's XFAIL anyhow, and we do not check that we're
>>>> getting a particular, expected failure.
>>> The difference is that the pre-patch failure was due to a problem with the
>>> actual test. But since it is XFAIL that was reported as an expected failure,
>>> which was entirely bogus. After the patch, it still fails but at least it is
>>> failing for valid reasons. I.e. now we have a failure we should look into.
>>>
>>>> The real problem, meanwhile, is that this patch leads to lots of test
>>>> failures, at least for me. Not every VM, but many. Oh, and I do not mean
>>>> on this test, because this test remains XFAIL anyhow.
>>>>
>>>> Consider a subset of the VMs I run on. I get these failures on the
>>>> test/unittest/ustack/tst.*.d tests. Why on these? Because they run after
>>>> tst.depth.sh.
>>>>
>>>> +------------------ arm OL8 UEK7
>>>> | +---------------- x86 OL9 UEK7
>>>> | | +-------------- arm OL9 UEK7
>>>> | | | +------------ x86 OL9 UEK8
>>>> | | | | +---------- arm OL9 UEK8
>>>> | | | | | +-------- arm OL10 UEK8
>>>> | | | | | |
>>>> v v v v v v
>>>>
>>>> x x x x x x tst.kthread.d FAIL: timed out.
>>>> x x x x tst.uaddr.d FAIL: timed out.
>>>> x x x x tst.uaddr-pid0.d FAIL: timed out.
>>>> x x tst.ufunc.d FAIL: timed out.
>>>> x x x tst.ufunc-pid0.d FAIL: timed out.
>>>> x x tst.umod.d FAIL: timed out.
>>>> x x tst.ustack25_max95_profile.d FAIL: timed out.
>>>> x x tst.ustack25_pid.d FAIL: timed out.
>>>> x x tst.ustack25_profile.d FAIL: timed out.
>>>> x x tst.ustack95_max25_profile.d FAIL: timed out.
>>>> x x tst.ustack95_profile.d FAIL: timed out.
>>>> x x tst.ustack_max25_profile.d FAIL: timed out.
>>>> x x tst.ustack_profile.d FAIL: timed out.
>>>> x tst.usym.d FAIL: timed out.
>>>> x tst.usym-pid0.d FAIL: timed out.
>>>>
>>>> I think the problem is that tst.depth.sh -- with this patch, in any case --
>>>> turns on a huge number of probes. That causes problems for this test, but
>>>> not in the sense of it FAILing since it is marked XFAIL anyhow. (All this
>>>> patch does is change the FAIL from one problem to another, but we still do
>>>> not specify what failure we expect to see nor check that that is what we
>>>> get.)
>>>>
>>>> So, the problem is not with this test, but that the system is left in some
>>>> whacked state due to having attempted so many probes. As a result, the
>>>> process "dtrace -o $file -c delaydie" is still around well after the depth
>>>> test has finished, the load average remains high, the pid0 process does not
>>>> run, and many tests time out. How long (how many seconds or how many tests)
>>>> does this effect last? It depends, as one can see above.
>>>>
>>>> This raises all sorts of other questions as well, such as whether tests in
>>>> our test suite should be allowed to depend on running on a somewhat idle
>>>> system, whether we (dtrace or the test suite) are cleaning up well enough,
>>>> etc.
>>> There is of course the aspect that this patch is part of a series, and the
>>> next patch in the series should resolve the impact issues of this test. If
>>> it makes you more happy, I'd happily move this patch to the end of the series.
>>> I don't see how it matters, because this crazy fallout from tst.depth.sh will
>>> be an issue either way until the per-PID uprohes support is applied.
>> I don't get it.
>>
>> First of all, the crazy fallout results with the patch and it doesn't
>> without the patch. Without the patch the test was basically SKIPing
>> (admittedly as a mistaken side effect of a test defect).
>>
>> Second, the above results, with the crazy fallout, included the entire patch
>> series. That is, the tip of the branch looked like this:
> Well, it was not clear from your email that you were testing this after you
> applied the entire series. That of course changes the situation.
>
> Clearly, there may be more wrong with this test. Or it might be pointing on a
> potential issue where DTrace might leave things hanging in a state that takes
> too long to clean up. Either way, though, the change in this patch is needed
> to make this test even remotely useful.
>
> If the patch causes the test to trigger other tests to fail, then more work
> is needed. I would still recommend that we let this patch go through, as it
> *is* a correct fix for a problem.
>
> And then look at what the new issue is that this test brings to light, and fix
> what is needed to fix that.
How about having the patch also SKIP the test and we simply file a bug
to track the issue. Otherwise, we're simply invalidating a bunch of
other tests (in the process, increasing the run time of the test
suite). The cascade of FAILs is even longer than I indicated for one of
the VMs. The patch is a fix to a problem, but it's only a partial fix
to what's going on here, a partial fix that sabotages quite a few other
tests. The related problems can be tracked much more efficiently and
directly with a bug ticket.
>> 59a49ce9 (tag: 2.0.4) doc: Add blank line before bold text so it is rendered
>> correctly
>> 61e69762 bpf: fix file descriptor leak
>> 20780963 uprobe: remove unnecessary enable_*() functions
>> 170c4cf9 test: fix tst.depth.sh to use proper trigger
>> cf7ae35b uprobe: Implement PID-specific uprobes
>>
>> If there are other bits/patches I should try, let me know.
>>
>>>> On 11/10/25 10:27, Kris Van Hees wrote:
>>>>> The test used 'date' as its trigger but that caused the test to XFAIL
>>>>> for the wrong reason: date does not have symbols and therefore dtrace
>>>>> failed to enable probes for it.
>>>>>
>>>>> Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
>>>>> ---
>>>>> test/unittest/ustack/tst.depth.sh | 8 ++++----
>>>>> 1 file changed, 4 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/test/unittest/ustack/tst.depth.sh b/test/unittest/ustack/tst.depth.sh
>>>>> index dec16a3a..977d1b42 100755
>>>>> --- a/test/unittest/ustack/tst.depth.sh
>>>>> +++ b/test/unittest/ustack/tst.depth.sh
>>>>> @@ -16,7 +16,7 @@ dtrace=$1
>>>>> rm -f $file
>>>>> -$dtrace $dt_flags -o $file -c date -s /dev/stdin <<EOF
>>>>> +$dtrace $dt_flags -o $file -c test/triggers/delaydie -s /dev/stdin <<EOF
>>>>> #pragma D option quiet
>>>>> #pragma D option bufsize=1M
>>>>> @@ -45,7 +45,7 @@ $dtrace $dt_flags -o $file -c date -s /dev/stdin <<EOF
>>>>> EOF
>>>>> status=$?
>>>>> -if [ "$status" -ne 0 ]; then
>>>>> +if [ $status -ne 0 ]; then
>>>>> echo $tst: dtrace failed
>>>>> rm -f $file
>>>>> exit $status
>>>>> @@ -78,8 +78,8 @@ EOF
>>>>> status=$?
>>>>> -count=`wc -l $file | cut -f1 -do`
>>>>> -if [ "$count" -lt 1000 ]; then
>>>>> +count=`wc -l $file | cut -f1 -d' '`
>>>>> +if [ $count -lt 1000 ]; then
>>>>> echo $tst: output was too short
>>>>> status=1
>>>>> fi
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 3/4] test: fix tst.depth.sh to use proper trigger
2025-11-11 22:55 ` Eugene Loh
@ 2025-11-11 23:09 ` Kris Van Hees
0 siblings, 0 replies; 7+ messages in thread
From: Kris Van Hees @ 2025-11-11 23:09 UTC (permalink / raw)
To: Eugene Loh; +Cc: Kris Van Hees, dtrace, dtrace-devel
On Tue, Nov 11, 2025 at 05:55:48PM -0500, Eugene Loh wrote:
> On 11/11/25 17:13, Kris Van Hees wrote:
>
> > On Tue, Nov 11, 2025 at 03:50:37PM -0500, Eugene Loh wrote:
> > > On 11/10/25 22:00, Kris Van Hees wrote:
> > >
> > > > On Mon, Nov 10, 2025 at 07:10:52PM -0500, Eugene Loh wrote:
> > > > > I'm against this patch.
> > > > >
> > > > > The picky stuff is:
> > > > > - the Copyright year should be updated
> > > > Easy to fix - done.
> > > >
> > > > > - the timeout probe could replace
> > > > > tick-1sec /n++ == 10/
> > > > > with simply
> > > > > tick-10sec
> > > > Arguably, the two are not identical. It depends on why one is used over the
> > > > other. Here we could probably go either way, though it might be nice to have
> > > > the tick probe firing every second as a way to generate a bit more activity.
> > > > Given the frequency of the other probes, not really important though as far as
> > > > I can see. Note that I didn't touch the timeout probe primarily because that
> > > > was not the purpose of this patch anyway.
> > > >
> > > > > - the "delay" env var be set for delaydie???
> > > > Why? We are simply using delaydie to have a trigger executable to execute.
> > > > This is a valid use of it (and it is used that same way in other tests also).
> > > >
> > > > > - the ustacks for delaydie are perhaps not very interesting
> > > > That is not the point of the test, and for that matter, I highly doubt that
> > > > ustacks from date would have been any more interesting (aside from the fact
> > > > that you'd never know because the test didn't even work with date).
> > > >
> > > > > A little more serious is that all this patch does is swap one failure mode
> > > > > for another. What for? It's XFAIL anyhow, and we do not check that we're
> > > > > getting a particular, expected failure.
> > > > The difference is that the pre-patch failure was due to a problem with the
> > > > actual test. But since it is XFAIL that was reported as an expected failure,
> > > > which was entirely bogus. After the patch, it still fails but at least it is
> > > > failing for valid reasons. I.e. now we have a failure we should look into.
> > > >
> > > > > The real problem, meanwhile, is that this patch leads to lots of test
> > > > > failures, at least for me. Not every VM, but many. Oh, and I do not mean
> > > > > on this test, because this test remains XFAIL anyhow.
> > > > >
> > > > > Consider a subset of the VMs I run on. I get these failures on the
> > > > > test/unittest/ustack/tst.*.d tests. Why on these? Because they run after
> > > > > tst.depth.sh.
> > > > >
> > > > > +------------------ arm OL8 UEK7
> > > > > | +---------------- x86 OL9 UEK7
> > > > > | | +-------------- arm OL9 UEK7
> > > > > | | | +------------ x86 OL9 UEK8
> > > > > | | | | +---------- arm OL9 UEK8
> > > > > | | | | | +-------- arm OL10 UEK8
> > > > > | | | | | |
> > > > > v v v v v v
> > > > >
> > > > > x x x x x x tst.kthread.d FAIL: timed out.
> > > > > x x x x tst.uaddr.d FAIL: timed out.
> > > > > x x x x tst.uaddr-pid0.d FAIL: timed out.
> > > > > x x tst.ufunc.d FAIL: timed out.
> > > > > x x x tst.ufunc-pid0.d FAIL: timed out.
> > > > > x x tst.umod.d FAIL: timed out.
> > > > > x x tst.ustack25_max95_profile.d FAIL: timed out.
> > > > > x x tst.ustack25_pid.d FAIL: timed out.
> > > > > x x tst.ustack25_profile.d FAIL: timed out.
> > > > > x x tst.ustack95_max25_profile.d FAIL: timed out.
> > > > > x x tst.ustack95_profile.d FAIL: timed out.
> > > > > x x tst.ustack_max25_profile.d FAIL: timed out.
> > > > > x x tst.ustack_profile.d FAIL: timed out.
> > > > > x tst.usym.d FAIL: timed out.
> > > > > x tst.usym-pid0.d FAIL: timed out.
> > > > >
> > > > > I think the problem is that tst.depth.sh -- with this patch, in any case --
> > > > > turns on a huge number of probes. That causes problems for this test, but
> > > > > not in the sense of it FAILing since it is marked XFAIL anyhow. (All this
> > > > > patch does is change the FAIL from one problem to another, but we still do
> > > > > not specify what failure we expect to see nor check that that is what we
> > > > > get.)
> > > > >
> > > > > So, the problem is not with this test, but that the system is left in some
> > > > > whacked state due to having attempted so many probes. As a result, the
> > > > > process "dtrace -o $file -c delaydie" is still around well after the depth
> > > > > test has finished, the load average remains high, the pid0 process does not
> > > > > run, and many tests time out. How long (how many seconds or how many tests)
> > > > > does this effect last? It depends, as one can see above.
> > > > >
> > > > > This raises all sorts of other questions as well, such as whether tests in
> > > > > our test suite should be allowed to depend on running on a somewhat idle
> > > > > system, whether we (dtrace or the test suite) are cleaning up well enough,
> > > > > etc.
> > > > There is of course the aspect that this patch is part of a series, and the
> > > > next patch in the series should resolve the impact issues of this test. If
> > > > it makes you more happy, I'd happily move this patch to the end of the series.
> > > > I don't see how it matters, because this crazy fallout from tst.depth.sh will
> > > > be an issue either way until the per-PID uprohes support is applied.
> > > I don't get it.
> > >
> > > First of all, the crazy fallout results with the patch and it doesn't
> > > without the patch. Without the patch the test was basically SKIPing
> > > (admittedly as a mistaken side effect of a test defect).
> > >
> > > Second, the above results, with the crazy fallout, included the entire patch
> > > series. That is, the tip of the branch looked like this:
> > Well, it was not clear from your email that you were testing this after you
> > applied the entire series. That of course changes the situation.
> >
> > Clearly, there may be more wrong with this test. Or it might be pointing on a
> > potential issue where DTrace might leave things hanging in a state that takes
> > too long to clean up. Either way, though, the change in this patch is needed
> > to make this test even remotely useful.
> >
> > If the patch causes the test to trigger other tests to fail, then more work
> > is needed. I would still recommend that we let this patch go through, as it
> > *is* a correct fix for a problem.
> >
> > And then look at what the new issue is that this test brings to light, and fix
> > what is needed to fix that.
>
> How about having the patch also SKIP the test and we simply file a bug to
> track the issue. Otherwise, we're simply invalidating a bunch of other
> tests (in the process, increasing the run time of the test suite). The
> cascade of FAILs is even longer than I indicated for one of the VMs. The
> patch is a fix to a problem, but it's only a partial fix to what's going on
> here, a partial fix that sabotages quite a few other tests. The related
> problems can be tracked much more efficiently and directly with a bug
> ticket.
Sure, but then I would suggest you submit a tiny patch to do the skipping, so
that we can revert that later without also reverting the needed change from
this patch.
> > > 59a49ce9 (tag: 2.0.4) doc: Add blank line before bold text so it is rendered
> > > correctly
> > > 61e69762 bpf: fix file descriptor leak
> > > 20780963 uprobe: remove unnecessary enable_*() functions
> > > 170c4cf9 test: fix tst.depth.sh to use proper trigger
> > > cf7ae35b uprobe: Implement PID-specific uprobes
> > >
> > > If there are other bits/patches I should try, let me know.
> > >
> > > > > On 11/10/25 10:27, Kris Van Hees wrote:
> > > > > > The test used 'date' as its trigger but that caused the test to XFAIL
> > > > > > for the wrong reason: date does not have symbols and therefore dtrace
> > > > > > failed to enable probes for it.
> > > > > >
> > > > > > Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
> > > > > > ---
> > > > > > test/unittest/ustack/tst.depth.sh | 8 ++++----
> > > > > > 1 file changed, 4 insertions(+), 4 deletions(-)
> > > > > >
> > > > > > diff --git a/test/unittest/ustack/tst.depth.sh b/test/unittest/ustack/tst.depth.sh
> > > > > > index dec16a3a..977d1b42 100755
> > > > > > --- a/test/unittest/ustack/tst.depth.sh
> > > > > > +++ b/test/unittest/ustack/tst.depth.sh
> > > > > > @@ -16,7 +16,7 @@ dtrace=$1
> > > > > > rm -f $file
> > > > > > -$dtrace $dt_flags -o $file -c date -s /dev/stdin <<EOF
> > > > > > +$dtrace $dt_flags -o $file -c test/triggers/delaydie -s /dev/stdin <<EOF
> > > > > > #pragma D option quiet
> > > > > > #pragma D option bufsize=1M
> > > > > > @@ -45,7 +45,7 @@ $dtrace $dt_flags -o $file -c date -s /dev/stdin <<EOF
> > > > > > EOF
> > > > > > status=$?
> > > > > > -if [ "$status" -ne 0 ]; then
> > > > > > +if [ $status -ne 0 ]; then
> > > > > > echo $tst: dtrace failed
> > > > > > rm -f $file
> > > > > > exit $status
> > > > > > @@ -78,8 +78,8 @@ EOF
> > > > > > status=$?
> > > > > > -count=`wc -l $file | cut -f1 -do`
> > > > > > -if [ "$count" -lt 1000 ]; then
> > > > > > +count=`wc -l $file | cut -f1 -d' '`
> > > > > > +if [ $count -lt 1000 ]; then
> > > > > > echo $tst: output was too short
> > > > > > status=1
> > > > > > fi
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2025-11-11 23:09 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-10 15:27 [PATCH 3/4] test: fix tst.depth.sh to use proper trigger Kris Van Hees
2025-11-11 0:10 ` Eugene Loh
2025-11-11 3:00 ` Kris Van Hees
2025-11-11 20:50 ` Eugene Loh
2025-11-11 22:13 ` Kris Van Hees
2025-11-11 22:55 ` Eugene Loh
2025-11-11 23:09 ` Kris Van Hees
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox