* [PATCH 3/4] test: fix tst.depth.sh to use proper trigger @ 2025-11-10 15:27 Kris Van Hees 2025-11-11 0:10 ` Eugene Loh 0 siblings, 1 reply; 7+ messages in thread From: Kris Van Hees @ 2025-11-10 15:27 UTC (permalink / raw) To: dtrace, dtrace-devel The test used 'date' as its trigger but that caused the test to XFAIL for the wrong reason: date does not have symbols and therefore dtrace failed to enable probes for it. Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com> --- test/unittest/ustack/tst.depth.sh | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/test/unittest/ustack/tst.depth.sh b/test/unittest/ustack/tst.depth.sh index dec16a3a..977d1b42 100755 --- a/test/unittest/ustack/tst.depth.sh +++ b/test/unittest/ustack/tst.depth.sh @@ -16,7 +16,7 @@ dtrace=$1 rm -f $file -$dtrace $dt_flags -o $file -c date -s /dev/stdin <<EOF +$dtrace $dt_flags -o $file -c test/triggers/delaydie -s /dev/stdin <<EOF #pragma D option quiet #pragma D option bufsize=1M @@ -45,7 +45,7 @@ $dtrace $dt_flags -o $file -c date -s /dev/stdin <<EOF EOF status=$? -if [ "$status" -ne 0 ]; then +if [ $status -ne 0 ]; then echo $tst: dtrace failed rm -f $file exit $status @@ -78,8 +78,8 @@ EOF status=$? -count=`wc -l $file | cut -f1 -do` -if [ "$count" -lt 1000 ]; then +count=`wc -l $file | cut -f1 -d' '` +if [ $count -lt 1000 ]; then echo $tst: output was too short status=1 fi -- 2.43.5 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 3/4] test: fix tst.depth.sh to use proper trigger 2025-11-10 15:27 [PATCH 3/4] test: fix tst.depth.sh to use proper trigger Kris Van Hees @ 2025-11-11 0:10 ` Eugene Loh 2025-11-11 3:00 ` Kris Van Hees 0 siblings, 1 reply; 7+ messages in thread From: Eugene Loh @ 2025-11-11 0:10 UTC (permalink / raw) To: Kris Van Hees, dtrace, dtrace-devel I'm against this patch. The picky stuff is: - the Copyright year should be updated - the timeout probe could replace tick-1sec /n++ == 10/ with simply tick-10sec - the "delay" env var be set for delaydie??? - the ustacks for delaydie are perhaps not very interesting A little more serious is that all this patch does is swap one failure mode for another. What for? It's XFAIL anyhow, and we do not check that we're getting a particular, expected failure. The real problem, meanwhile, is that this patch leads to lots of test failures, at least for me. Not every VM, but many. Oh, and I do not mean on this test, because this test remains XFAIL anyhow. Consider a subset of the VMs I run on. I get these failures on the test/unittest/ustack/tst.*.d tests. Why on these? Because they run after tst.depth.sh. +------------------ arm OL8 UEK7 | +---------------- x86 OL9 UEK7 | | +-------------- arm OL9 UEK7 | | | +------------ x86 OL9 UEK8 | | | | +---------- arm OL9 UEK8 | | | | | +-------- arm OL10 UEK8 | | | | | | v v v v v v x x x x x x tst.kthread.d FAIL: timed out. x x x x tst.uaddr.d FAIL: timed out. x x x x tst.uaddr-pid0.d FAIL: timed out. x x tst.ufunc.d FAIL: timed out. x x x tst.ufunc-pid0.d FAIL: timed out. x x tst.umod.d FAIL: timed out. x x tst.ustack25_max95_profile.d FAIL: timed out. x x tst.ustack25_pid.d FAIL: timed out. x x tst.ustack25_profile.d FAIL: timed out. x x tst.ustack95_max25_profile.d FAIL: timed out. x x tst.ustack95_profile.d FAIL: timed out. x x tst.ustack_max25_profile.d FAIL: timed out. x x tst.ustack_profile.d FAIL: timed out. x tst.usym.d FAIL: timed out. x tst.usym-pid0.d FAIL: timed out. I think the problem is that tst.depth.sh -- with this patch, in any case -- turns on a huge number of probes. That causes problems for this test, but not in the sense of it FAILing since it is marked XFAIL anyhow. (All this patch does is change the FAIL from one problem to another, but we still do not specify what failure we expect to see nor check that that is what we get.) So, the problem is not with this test, but that the system is left in some whacked state due to having attempted so many probes. As a result, the process "dtrace -o $file -c delaydie" is still around well after the depth test has finished, the load average remains high, the pid0 process does not run, and many tests time out. How long (how many seconds or how many tests) does this effect last? It depends, as one can see above. This raises all sorts of other questions as well, such as whether tests in our test suite should be allowed to depend on running on a somewhat idle system, whether we (dtrace or the test suite) are cleaning up well enough, etc. On 11/10/25 10:27, Kris Van Hees wrote: > The test used 'date' as its trigger but that caused the test to XFAIL > for the wrong reason: date does not have symbols and therefore dtrace > failed to enable probes for it. > > Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com> > --- > test/unittest/ustack/tst.depth.sh | 8 ++++---- > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/test/unittest/ustack/tst.depth.sh b/test/unittest/ustack/tst.depth.sh > index dec16a3a..977d1b42 100755 > --- a/test/unittest/ustack/tst.depth.sh > +++ b/test/unittest/ustack/tst.depth.sh > @@ -16,7 +16,7 @@ dtrace=$1 > > rm -f $file > > -$dtrace $dt_flags -o $file -c date -s /dev/stdin <<EOF > +$dtrace $dt_flags -o $file -c test/triggers/delaydie -s /dev/stdin <<EOF > > #pragma D option quiet > #pragma D option bufsize=1M > @@ -45,7 +45,7 @@ $dtrace $dt_flags -o $file -c date -s /dev/stdin <<EOF > EOF > > status=$? > -if [ "$status" -ne 0 ]; then > +if [ $status -ne 0 ]; then > echo $tst: dtrace failed > rm -f $file > exit $status > @@ -78,8 +78,8 @@ EOF > > status=$? > > -count=`wc -l $file | cut -f1 -do` > -if [ "$count" -lt 1000 ]; then > +count=`wc -l $file | cut -f1 -d' '` > +if [ $count -lt 1000 ]; then > echo $tst: output was too short > status=1 > fi ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 3/4] test: fix tst.depth.sh to use proper trigger 2025-11-11 0:10 ` Eugene Loh @ 2025-11-11 3:00 ` Kris Van Hees 2025-11-11 20:50 ` Eugene Loh 0 siblings, 1 reply; 7+ messages in thread From: Kris Van Hees @ 2025-11-11 3:00 UTC (permalink / raw) To: Eugene Loh; +Cc: Kris Van Hees, dtrace, dtrace-devel On Mon, Nov 10, 2025 at 07:10:52PM -0500, Eugene Loh wrote: > I'm against this patch. > > The picky stuff is: > - the Copyright year should be updated Easy to fix - done. > - the timeout probe could replace > tick-1sec /n++ == 10/ > with simply > tick-10sec Arguably, the two are not identical. It depends on why one is used over the other. Here we could probably go either way, though it might be nice to have the tick probe firing every second as a way to generate a bit more activity. Given the frequency of the other probes, not really important though as far as I can see. Note that I didn't touch the timeout probe primarily because that was not the purpose of this patch anyway. > - the "delay" env var be set for delaydie??? Why? We are simply using delaydie to have a trigger executable to execute. This is a valid use of it (and it is used that same way in other tests also). > - the ustacks for delaydie are perhaps not very interesting That is not the point of the test, and for that matter, I highly doubt that ustacks from date would have been any more interesting (aside from the fact that you'd never know because the test didn't even work with date). > A little more serious is that all this patch does is swap one failure mode > for another. What for? It's XFAIL anyhow, and we do not check that we're > getting a particular, expected failure. The difference is that the pre-patch failure was due to a problem with the actual test. But since it is XFAIL that was reported as an expected failure, which was entirely bogus. After the patch, it still fails but at least it is failing for valid reasons. I.e. now we have a failure we should look into. > The real problem, meanwhile, is that this patch leads to lots of test > failures, at least for me. Not every VM, but many. Oh, and I do not mean > on this test, because this test remains XFAIL anyhow. > > Consider a subset of the VMs I run on. I get these failures on the > test/unittest/ustack/tst.*.d tests. Why on these? Because they run after > tst.depth.sh. > > +------------------ arm OL8 UEK7 > | +---------------- x86 OL9 UEK7 > | | +-------------- arm OL9 UEK7 > | | | +------------ x86 OL9 UEK8 > | | | | +---------- arm OL9 UEK8 > | | | | | +-------- arm OL10 UEK8 > | | | | | | > v v v v v v > > x x x x x x tst.kthread.d FAIL: timed out. > x x x x tst.uaddr.d FAIL: timed out. > x x x x tst.uaddr-pid0.d FAIL: timed out. > x x tst.ufunc.d FAIL: timed out. > x x x tst.ufunc-pid0.d FAIL: timed out. > x x tst.umod.d FAIL: timed out. > x x tst.ustack25_max95_profile.d FAIL: timed out. > x x tst.ustack25_pid.d FAIL: timed out. > x x tst.ustack25_profile.d FAIL: timed out. > x x tst.ustack95_max25_profile.d FAIL: timed out. > x x tst.ustack95_profile.d FAIL: timed out. > x x tst.ustack_max25_profile.d FAIL: timed out. > x x tst.ustack_profile.d FAIL: timed out. > x tst.usym.d FAIL: timed out. > x tst.usym-pid0.d FAIL: timed out. > > I think the problem is that tst.depth.sh -- with this patch, in any case -- > turns on a huge number of probes. That causes problems for this test, but > not in the sense of it FAILing since it is marked XFAIL anyhow. (All this > patch does is change the FAIL from one problem to another, but we still do > not specify what failure we expect to see nor check that that is what we > get.) > > So, the problem is not with this test, but that the system is left in some > whacked state due to having attempted so many probes. As a result, the > process "dtrace -o $file -c delaydie" is still around well after the depth > test has finished, the load average remains high, the pid0 process does not > run, and many tests time out. How long (how many seconds or how many tests) > does this effect last? It depends, as one can see above. > > This raises all sorts of other questions as well, such as whether tests in > our test suite should be allowed to depend on running on a somewhat idle > system, whether we (dtrace or the test suite) are cleaning up well enough, > etc. There is of course the aspect that this patch is part of a series, and the next patch in the series should resolve the impact issues of this test. If it makes you more happy, I'd happily move this patch to the end of the series. I don't see how it matters, because this crazy fallout from tst.depth.sh will be an issue either way until the per-PID uprohes support is applied. > On 11/10/25 10:27, Kris Van Hees wrote: > > The test used 'date' as its trigger but that caused the test to XFAIL > > for the wrong reason: date does not have symbols and therefore dtrace > > failed to enable probes for it. > > > > Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com> > > --- > > test/unittest/ustack/tst.depth.sh | 8 ++++---- > > 1 file changed, 4 insertions(+), 4 deletions(-) > > > > diff --git a/test/unittest/ustack/tst.depth.sh b/test/unittest/ustack/tst.depth.sh > > index dec16a3a..977d1b42 100755 > > --- a/test/unittest/ustack/tst.depth.sh > > +++ b/test/unittest/ustack/tst.depth.sh > > @@ -16,7 +16,7 @@ dtrace=$1 > > rm -f $file > > -$dtrace $dt_flags -o $file -c date -s /dev/stdin <<EOF > > +$dtrace $dt_flags -o $file -c test/triggers/delaydie -s /dev/stdin <<EOF > > #pragma D option quiet > > #pragma D option bufsize=1M > > @@ -45,7 +45,7 @@ $dtrace $dt_flags -o $file -c date -s /dev/stdin <<EOF > > EOF > > status=$? > > -if [ "$status" -ne 0 ]; then > > +if [ $status -ne 0 ]; then > > echo $tst: dtrace failed > > rm -f $file > > exit $status > > @@ -78,8 +78,8 @@ EOF > > status=$? > > -count=`wc -l $file | cut -f1 -do` > > -if [ "$count" -lt 1000 ]; then > > +count=`wc -l $file | cut -f1 -d' '` > > +if [ $count -lt 1000 ]; then > > echo $tst: output was too short > > status=1 > > fi ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 3/4] test: fix tst.depth.sh to use proper trigger 2025-11-11 3:00 ` Kris Van Hees @ 2025-11-11 20:50 ` Eugene Loh 2025-11-11 22:13 ` Kris Van Hees 0 siblings, 1 reply; 7+ messages in thread From: Eugene Loh @ 2025-11-11 20:50 UTC (permalink / raw) To: Kris Van Hees; +Cc: dtrace, dtrace-devel On 11/10/25 22:00, Kris Van Hees wrote: > On Mon, Nov 10, 2025 at 07:10:52PM -0500, Eugene Loh wrote: >> I'm against this patch. >> >> The picky stuff is: >> - the Copyright year should be updated > Easy to fix - done. > >> - the timeout probe could replace >> tick-1sec /n++ == 10/ >> with simply >> tick-10sec > Arguably, the two are not identical. It depends on why one is used over the > other. Here we could probably go either way, though it might be nice to have > the tick probe firing every second as a way to generate a bit more activity. > Given the frequency of the other probes, not really important though as far as > I can see. Note that I didn't touch the timeout probe primarily because that > was not the purpose of this patch anyway. > >> - the "delay" env var be set for delaydie??? > Why? We are simply using delaydie to have a trigger executable to execute. > This is a valid use of it (and it is used that same way in other tests also). > >> - the ustacks for delaydie are perhaps not very interesting > That is not the point of the test, and for that matter, I highly doubt that > ustacks from date would have been any more interesting (aside from the fact > that you'd never know because the test didn't even work with date). > >> A little more serious is that all this patch does is swap one failure mode >> for another. What for? It's XFAIL anyhow, and we do not check that we're >> getting a particular, expected failure. > The difference is that the pre-patch failure was due to a problem with the > actual test. But since it is XFAIL that was reported as an expected failure, > which was entirely bogus. After the patch, it still fails but at least it is > failing for valid reasons. I.e. now we have a failure we should look into. > >> The real problem, meanwhile, is that this patch leads to lots of test >> failures, at least for me. Not every VM, but many. Oh, and I do not mean >> on this test, because this test remains XFAIL anyhow. >> >> Consider a subset of the VMs I run on. I get these failures on the >> test/unittest/ustack/tst.*.d tests. Why on these? Because they run after >> tst.depth.sh. >> >> +------------------ arm OL8 UEK7 >> | +---------------- x86 OL9 UEK7 >> | | +-------------- arm OL9 UEK7 >> | | | +------------ x86 OL9 UEK8 >> | | | | +---------- arm OL9 UEK8 >> | | | | | +-------- arm OL10 UEK8 >> | | | | | | >> v v v v v v >> >> x x x x x x tst.kthread.d FAIL: timed out. >> x x x x tst.uaddr.d FAIL: timed out. >> x x x x tst.uaddr-pid0.d FAIL: timed out. >> x x tst.ufunc.d FAIL: timed out. >> x x x tst.ufunc-pid0.d FAIL: timed out. >> x x tst.umod.d FAIL: timed out. >> x x tst.ustack25_max95_profile.d FAIL: timed out. >> x x tst.ustack25_pid.d FAIL: timed out. >> x x tst.ustack25_profile.d FAIL: timed out. >> x x tst.ustack95_max25_profile.d FAIL: timed out. >> x x tst.ustack95_profile.d FAIL: timed out. >> x x tst.ustack_max25_profile.d FAIL: timed out. >> x x tst.ustack_profile.d FAIL: timed out. >> x tst.usym.d FAIL: timed out. >> x tst.usym-pid0.d FAIL: timed out. >> >> I think the problem is that tst.depth.sh -- with this patch, in any case -- >> turns on a huge number of probes. That causes problems for this test, but >> not in the sense of it FAILing since it is marked XFAIL anyhow. (All this >> patch does is change the FAIL from one problem to another, but we still do >> not specify what failure we expect to see nor check that that is what we >> get.) >> >> So, the problem is not with this test, but that the system is left in some >> whacked state due to having attempted so many probes. As a result, the >> process "dtrace -o $file -c delaydie" is still around well after the depth >> test has finished, the load average remains high, the pid0 process does not >> run, and many tests time out. How long (how many seconds or how many tests) >> does this effect last? It depends, as one can see above. >> >> This raises all sorts of other questions as well, such as whether tests in >> our test suite should be allowed to depend on running on a somewhat idle >> system, whether we (dtrace or the test suite) are cleaning up well enough, >> etc. > There is of course the aspect that this patch is part of a series, and the > next patch in the series should resolve the impact issues of this test. If > it makes you more happy, I'd happily move this patch to the end of the series. > I don't see how it matters, because this crazy fallout from tst.depth.sh will > be an issue either way until the per-PID uprohes support is applied. I don't get it. First of all, the crazy fallout results with the patch and it doesn't without the patch. Without the patch the test was basically SKIPing (admittedly as a mistaken side effect of a test defect). Second, the above results, with the crazy fallout, included the entire patch series. That is, the tip of the branch looked like this: 59a49ce9 (tag: 2.0.4) doc: Add blank line before bold text so it is rendered correctly 61e69762 bpf: fix file descriptor leak 20780963 uprobe: remove unnecessary enable_*() functions 170c4cf9 test: fix tst.depth.sh to use proper trigger cf7ae35b uprobe: Implement PID-specific uprobes If there are other bits/patches I should try, let me know. >> On 11/10/25 10:27, Kris Van Hees wrote: >>> The test used 'date' as its trigger but that caused the test to XFAIL >>> for the wrong reason: date does not have symbols and therefore dtrace >>> failed to enable probes for it. >>> >>> Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com> >>> --- >>> test/unittest/ustack/tst.depth.sh | 8 ++++---- >>> 1 file changed, 4 insertions(+), 4 deletions(-) >>> >>> diff --git a/test/unittest/ustack/tst.depth.sh b/test/unittest/ustack/tst.depth.sh >>> index dec16a3a..977d1b42 100755 >>> --- a/test/unittest/ustack/tst.depth.sh >>> +++ b/test/unittest/ustack/tst.depth.sh >>> @@ -16,7 +16,7 @@ dtrace=$1 >>> rm -f $file >>> -$dtrace $dt_flags -o $file -c date -s /dev/stdin <<EOF >>> +$dtrace $dt_flags -o $file -c test/triggers/delaydie -s /dev/stdin <<EOF >>> #pragma D option quiet >>> #pragma D option bufsize=1M >>> @@ -45,7 +45,7 @@ $dtrace $dt_flags -o $file -c date -s /dev/stdin <<EOF >>> EOF >>> status=$? >>> -if [ "$status" -ne 0 ]; then >>> +if [ $status -ne 0 ]; then >>> echo $tst: dtrace failed >>> rm -f $file >>> exit $status >>> @@ -78,8 +78,8 @@ EOF >>> status=$? >>> -count=`wc -l $file | cut -f1 -do` >>> -if [ "$count" -lt 1000 ]; then >>> +count=`wc -l $file | cut -f1 -d' '` >>> +if [ $count -lt 1000 ]; then >>> echo $tst: output was too short >>> status=1 >>> fi ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 3/4] test: fix tst.depth.sh to use proper trigger 2025-11-11 20:50 ` Eugene Loh @ 2025-11-11 22:13 ` Kris Van Hees 2025-11-11 22:55 ` Eugene Loh 0 siblings, 1 reply; 7+ messages in thread From: Kris Van Hees @ 2025-11-11 22:13 UTC (permalink / raw) To: Eugene Loh; +Cc: Kris Van Hees, dtrace, dtrace-devel On Tue, Nov 11, 2025 at 03:50:37PM -0500, Eugene Loh wrote: > On 11/10/25 22:00, Kris Van Hees wrote: > > > On Mon, Nov 10, 2025 at 07:10:52PM -0500, Eugene Loh wrote: > > > I'm against this patch. > > > > > > The picky stuff is: > > > - the Copyright year should be updated > > Easy to fix - done. > > > > > - the timeout probe could replace > > > tick-1sec /n++ == 10/ > > > with simply > > > tick-10sec > > Arguably, the two are not identical. It depends on why one is used over the > > other. Here we could probably go either way, though it might be nice to have > > the tick probe firing every second as a way to generate a bit more activity. > > Given the frequency of the other probes, not really important though as far as > > I can see. Note that I didn't touch the timeout probe primarily because that > > was not the purpose of this patch anyway. > > > > > - the "delay" env var be set for delaydie??? > > Why? We are simply using delaydie to have a trigger executable to execute. > > This is a valid use of it (and it is used that same way in other tests also). > > > > > - the ustacks for delaydie are perhaps not very interesting > > That is not the point of the test, and for that matter, I highly doubt that > > ustacks from date would have been any more interesting (aside from the fact > > that you'd never know because the test didn't even work with date). > > > > > A little more serious is that all this patch does is swap one failure mode > > > for another. What for? It's XFAIL anyhow, and we do not check that we're > > > getting a particular, expected failure. > > The difference is that the pre-patch failure was due to a problem with the > > actual test. But since it is XFAIL that was reported as an expected failure, > > which was entirely bogus. After the patch, it still fails but at least it is > > failing for valid reasons. I.e. now we have a failure we should look into. > > > > > The real problem, meanwhile, is that this patch leads to lots of test > > > failures, at least for me. Not every VM, but many. Oh, and I do not mean > > > on this test, because this test remains XFAIL anyhow. > > > > > > Consider a subset of the VMs I run on. I get these failures on the > > > test/unittest/ustack/tst.*.d tests. Why on these? Because they run after > > > tst.depth.sh. > > > > > > +------------------ arm OL8 UEK7 > > > | +---------------- x86 OL9 UEK7 > > > | | +-------------- arm OL9 UEK7 > > > | | | +------------ x86 OL9 UEK8 > > > | | | | +---------- arm OL9 UEK8 > > > | | | | | +-------- arm OL10 UEK8 > > > | | | | | | > > > v v v v v v > > > > > > x x x x x x tst.kthread.d FAIL: timed out. > > > x x x x tst.uaddr.d FAIL: timed out. > > > x x x x tst.uaddr-pid0.d FAIL: timed out. > > > x x tst.ufunc.d FAIL: timed out. > > > x x x tst.ufunc-pid0.d FAIL: timed out. > > > x x tst.umod.d FAIL: timed out. > > > x x tst.ustack25_max95_profile.d FAIL: timed out. > > > x x tst.ustack25_pid.d FAIL: timed out. > > > x x tst.ustack25_profile.d FAIL: timed out. > > > x x tst.ustack95_max25_profile.d FAIL: timed out. > > > x x tst.ustack95_profile.d FAIL: timed out. > > > x x tst.ustack_max25_profile.d FAIL: timed out. > > > x x tst.ustack_profile.d FAIL: timed out. > > > x tst.usym.d FAIL: timed out. > > > x tst.usym-pid0.d FAIL: timed out. > > > > > > I think the problem is that tst.depth.sh -- with this patch, in any case -- > > > turns on a huge number of probes. That causes problems for this test, but > > > not in the sense of it FAILing since it is marked XFAIL anyhow. (All this > > > patch does is change the FAIL from one problem to another, but we still do > > > not specify what failure we expect to see nor check that that is what we > > > get.) > > > > > > So, the problem is not with this test, but that the system is left in some > > > whacked state due to having attempted so many probes. As a result, the > > > process "dtrace -o $file -c delaydie" is still around well after the depth > > > test has finished, the load average remains high, the pid0 process does not > > > run, and many tests time out. How long (how many seconds or how many tests) > > > does this effect last? It depends, as one can see above. > > > > > > This raises all sorts of other questions as well, such as whether tests in > > > our test suite should be allowed to depend on running on a somewhat idle > > > system, whether we (dtrace or the test suite) are cleaning up well enough, > > > etc. > > There is of course the aspect that this patch is part of a series, and the > > next patch in the series should resolve the impact issues of this test. If > > it makes you more happy, I'd happily move this patch to the end of the series. > > I don't see how it matters, because this crazy fallout from tst.depth.sh will > > be an issue either way until the per-PID uprohes support is applied. > > I don't get it. > > First of all, the crazy fallout results with the patch and it doesn't > without the patch. Without the patch the test was basically SKIPing > (admittedly as a mistaken side effect of a test defect). > > Second, the above results, with the crazy fallout, included the entire patch > series. That is, the tip of the branch looked like this: Well, it was not clear from your email that you were testing this after you applied the entire series. That of course changes the situation. Clearly, there may be more wrong with this test. Or it might be pointing on a potential issue where DTrace might leave things hanging in a state that takes too long to clean up. Either way, though, the change in this patch is needed to make this test even remotely useful. If the patch causes the test to trigger other tests to fail, then more work is needed. I would still recommend that we let this patch go through, as it *is* a correct fix for a problem. And then look at what the new issue is that this test brings to light, and fix what is needed to fix that. > 59a49ce9 (tag: 2.0.4) doc: Add blank line before bold text so it is rendered > correctly > 61e69762 bpf: fix file descriptor leak > 20780963 uprobe: remove unnecessary enable_*() functions > 170c4cf9 test: fix tst.depth.sh to use proper trigger > cf7ae35b uprobe: Implement PID-specific uprobes > > If there are other bits/patches I should try, let me know. > > > > On 11/10/25 10:27, Kris Van Hees wrote: > > > > The test used 'date' as its trigger but that caused the test to XFAIL > > > > for the wrong reason: date does not have symbols and therefore dtrace > > > > failed to enable probes for it. > > > > > > > > Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com> > > > > --- > > > > test/unittest/ustack/tst.depth.sh | 8 ++++---- > > > > 1 file changed, 4 insertions(+), 4 deletions(-) > > > > > > > > diff --git a/test/unittest/ustack/tst.depth.sh b/test/unittest/ustack/tst.depth.sh > > > > index dec16a3a..977d1b42 100755 > > > > --- a/test/unittest/ustack/tst.depth.sh > > > > +++ b/test/unittest/ustack/tst.depth.sh > > > > @@ -16,7 +16,7 @@ dtrace=$1 > > > > rm -f $file > > > > -$dtrace $dt_flags -o $file -c date -s /dev/stdin <<EOF > > > > +$dtrace $dt_flags -o $file -c test/triggers/delaydie -s /dev/stdin <<EOF > > > > #pragma D option quiet > > > > #pragma D option bufsize=1M > > > > @@ -45,7 +45,7 @@ $dtrace $dt_flags -o $file -c date -s /dev/stdin <<EOF > > > > EOF > > > > status=$? > > > > -if [ "$status" -ne 0 ]; then > > > > +if [ $status -ne 0 ]; then > > > > echo $tst: dtrace failed > > > > rm -f $file > > > > exit $status > > > > @@ -78,8 +78,8 @@ EOF > > > > status=$? > > > > -count=`wc -l $file | cut -f1 -do` > > > > -if [ "$count" -lt 1000 ]; then > > > > +count=`wc -l $file | cut -f1 -d' '` > > > > +if [ $count -lt 1000 ]; then > > > > echo $tst: output was too short > > > > status=1 > > > > fi ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 3/4] test: fix tst.depth.sh to use proper trigger 2025-11-11 22:13 ` Kris Van Hees @ 2025-11-11 22:55 ` Eugene Loh 2025-11-11 23:09 ` Kris Van Hees 0 siblings, 1 reply; 7+ messages in thread From: Eugene Loh @ 2025-11-11 22:55 UTC (permalink / raw) To: Kris Van Hees; +Cc: dtrace, dtrace-devel On 11/11/25 17:13, Kris Van Hees wrote: > On Tue, Nov 11, 2025 at 03:50:37PM -0500, Eugene Loh wrote: >> On 11/10/25 22:00, Kris Van Hees wrote: >> >>> On Mon, Nov 10, 2025 at 07:10:52PM -0500, Eugene Loh wrote: >>>> I'm against this patch. >>>> >>>> The picky stuff is: >>>> - the Copyright year should be updated >>> Easy to fix - done. >>> >>>> - the timeout probe could replace >>>> tick-1sec /n++ == 10/ >>>> with simply >>>> tick-10sec >>> Arguably, the two are not identical. It depends on why one is used over the >>> other. Here we could probably go either way, though it might be nice to have >>> the tick probe firing every second as a way to generate a bit more activity. >>> Given the frequency of the other probes, not really important though as far as >>> I can see. Note that I didn't touch the timeout probe primarily because that >>> was not the purpose of this patch anyway. >>> >>>> - the "delay" env var be set for delaydie??? >>> Why? We are simply using delaydie to have a trigger executable to execute. >>> This is a valid use of it (and it is used that same way in other tests also). >>> >>>> - the ustacks for delaydie are perhaps not very interesting >>> That is not the point of the test, and for that matter, I highly doubt that >>> ustacks from date would have been any more interesting (aside from the fact >>> that you'd never know because the test didn't even work with date). >>> >>>> A little more serious is that all this patch does is swap one failure mode >>>> for another. What for? It's XFAIL anyhow, and we do not check that we're >>>> getting a particular, expected failure. >>> The difference is that the pre-patch failure was due to a problem with the >>> actual test. But since it is XFAIL that was reported as an expected failure, >>> which was entirely bogus. After the patch, it still fails but at least it is >>> failing for valid reasons. I.e. now we have a failure we should look into. >>> >>>> The real problem, meanwhile, is that this patch leads to lots of test >>>> failures, at least for me. Not every VM, but many. Oh, and I do not mean >>>> on this test, because this test remains XFAIL anyhow. >>>> >>>> Consider a subset of the VMs I run on. I get these failures on the >>>> test/unittest/ustack/tst.*.d tests. Why on these? Because they run after >>>> tst.depth.sh. >>>> >>>> +------------------ arm OL8 UEK7 >>>> | +---------------- x86 OL9 UEK7 >>>> | | +-------------- arm OL9 UEK7 >>>> | | | +------------ x86 OL9 UEK8 >>>> | | | | +---------- arm OL9 UEK8 >>>> | | | | | +-------- arm OL10 UEK8 >>>> | | | | | | >>>> v v v v v v >>>> >>>> x x x x x x tst.kthread.d FAIL: timed out. >>>> x x x x tst.uaddr.d FAIL: timed out. >>>> x x x x tst.uaddr-pid0.d FAIL: timed out. >>>> x x tst.ufunc.d FAIL: timed out. >>>> x x x tst.ufunc-pid0.d FAIL: timed out. >>>> x x tst.umod.d FAIL: timed out. >>>> x x tst.ustack25_max95_profile.d FAIL: timed out. >>>> x x tst.ustack25_pid.d FAIL: timed out. >>>> x x tst.ustack25_profile.d FAIL: timed out. >>>> x x tst.ustack95_max25_profile.d FAIL: timed out. >>>> x x tst.ustack95_profile.d FAIL: timed out. >>>> x x tst.ustack_max25_profile.d FAIL: timed out. >>>> x x tst.ustack_profile.d FAIL: timed out. >>>> x tst.usym.d FAIL: timed out. >>>> x tst.usym-pid0.d FAIL: timed out. >>>> >>>> I think the problem is that tst.depth.sh -- with this patch, in any case -- >>>> turns on a huge number of probes. That causes problems for this test, but >>>> not in the sense of it FAILing since it is marked XFAIL anyhow. (All this >>>> patch does is change the FAIL from one problem to another, but we still do >>>> not specify what failure we expect to see nor check that that is what we >>>> get.) >>>> >>>> So, the problem is not with this test, but that the system is left in some >>>> whacked state due to having attempted so many probes. As a result, the >>>> process "dtrace -o $file -c delaydie" is still around well after the depth >>>> test has finished, the load average remains high, the pid0 process does not >>>> run, and many tests time out. How long (how many seconds or how many tests) >>>> does this effect last? It depends, as one can see above. >>>> >>>> This raises all sorts of other questions as well, such as whether tests in >>>> our test suite should be allowed to depend on running on a somewhat idle >>>> system, whether we (dtrace or the test suite) are cleaning up well enough, >>>> etc. >>> There is of course the aspect that this patch is part of a series, and the >>> next patch in the series should resolve the impact issues of this test. If >>> it makes you more happy, I'd happily move this patch to the end of the series. >>> I don't see how it matters, because this crazy fallout from tst.depth.sh will >>> be an issue either way until the per-PID uprohes support is applied. >> I don't get it. >> >> First of all, the crazy fallout results with the patch and it doesn't >> without the patch. Without the patch the test was basically SKIPing >> (admittedly as a mistaken side effect of a test defect). >> >> Second, the above results, with the crazy fallout, included the entire patch >> series. That is, the tip of the branch looked like this: > Well, it was not clear from your email that you were testing this after you > applied the entire series. That of course changes the situation. > > Clearly, there may be more wrong with this test. Or it might be pointing on a > potential issue where DTrace might leave things hanging in a state that takes > too long to clean up. Either way, though, the change in this patch is needed > to make this test even remotely useful. > > If the patch causes the test to trigger other tests to fail, then more work > is needed. I would still recommend that we let this patch go through, as it > *is* a correct fix for a problem. > > And then look at what the new issue is that this test brings to light, and fix > what is needed to fix that. How about having the patch also SKIP the test and we simply file a bug to track the issue. Otherwise, we're simply invalidating a bunch of other tests (in the process, increasing the run time of the test suite). The cascade of FAILs is even longer than I indicated for one of the VMs. The patch is a fix to a problem, but it's only a partial fix to what's going on here, a partial fix that sabotages quite a few other tests. The related problems can be tracked much more efficiently and directly with a bug ticket. >> 59a49ce9 (tag: 2.0.4) doc: Add blank line before bold text so it is rendered >> correctly >> 61e69762 bpf: fix file descriptor leak >> 20780963 uprobe: remove unnecessary enable_*() functions >> 170c4cf9 test: fix tst.depth.sh to use proper trigger >> cf7ae35b uprobe: Implement PID-specific uprobes >> >> If there are other bits/patches I should try, let me know. >> >>>> On 11/10/25 10:27, Kris Van Hees wrote: >>>>> The test used 'date' as its trigger but that caused the test to XFAIL >>>>> for the wrong reason: date does not have symbols and therefore dtrace >>>>> failed to enable probes for it. >>>>> >>>>> Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com> >>>>> --- >>>>> test/unittest/ustack/tst.depth.sh | 8 ++++---- >>>>> 1 file changed, 4 insertions(+), 4 deletions(-) >>>>> >>>>> diff --git a/test/unittest/ustack/tst.depth.sh b/test/unittest/ustack/tst.depth.sh >>>>> index dec16a3a..977d1b42 100755 >>>>> --- a/test/unittest/ustack/tst.depth.sh >>>>> +++ b/test/unittest/ustack/tst.depth.sh >>>>> @@ -16,7 +16,7 @@ dtrace=$1 >>>>> rm -f $file >>>>> -$dtrace $dt_flags -o $file -c date -s /dev/stdin <<EOF >>>>> +$dtrace $dt_flags -o $file -c test/triggers/delaydie -s /dev/stdin <<EOF >>>>> #pragma D option quiet >>>>> #pragma D option bufsize=1M >>>>> @@ -45,7 +45,7 @@ $dtrace $dt_flags -o $file -c date -s /dev/stdin <<EOF >>>>> EOF >>>>> status=$? >>>>> -if [ "$status" -ne 0 ]; then >>>>> +if [ $status -ne 0 ]; then >>>>> echo $tst: dtrace failed >>>>> rm -f $file >>>>> exit $status >>>>> @@ -78,8 +78,8 @@ EOF >>>>> status=$? >>>>> -count=`wc -l $file | cut -f1 -do` >>>>> -if [ "$count" -lt 1000 ]; then >>>>> +count=`wc -l $file | cut -f1 -d' '` >>>>> +if [ $count -lt 1000 ]; then >>>>> echo $tst: output was too short >>>>> status=1 >>>>> fi ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 3/4] test: fix tst.depth.sh to use proper trigger 2025-11-11 22:55 ` Eugene Loh @ 2025-11-11 23:09 ` Kris Van Hees 0 siblings, 0 replies; 7+ messages in thread From: Kris Van Hees @ 2025-11-11 23:09 UTC (permalink / raw) To: Eugene Loh; +Cc: Kris Van Hees, dtrace, dtrace-devel On Tue, Nov 11, 2025 at 05:55:48PM -0500, Eugene Loh wrote: > On 11/11/25 17:13, Kris Van Hees wrote: > > > On Tue, Nov 11, 2025 at 03:50:37PM -0500, Eugene Loh wrote: > > > On 11/10/25 22:00, Kris Van Hees wrote: > > > > > > > On Mon, Nov 10, 2025 at 07:10:52PM -0500, Eugene Loh wrote: > > > > > I'm against this patch. > > > > > > > > > > The picky stuff is: > > > > > - the Copyright year should be updated > > > > Easy to fix - done. > > > > > > > > > - the timeout probe could replace > > > > > tick-1sec /n++ == 10/ > > > > > with simply > > > > > tick-10sec > > > > Arguably, the two are not identical. It depends on why one is used over the > > > > other. Here we could probably go either way, though it might be nice to have > > > > the tick probe firing every second as a way to generate a bit more activity. > > > > Given the frequency of the other probes, not really important though as far as > > > > I can see. Note that I didn't touch the timeout probe primarily because that > > > > was not the purpose of this patch anyway. > > > > > > > > > - the "delay" env var be set for delaydie??? > > > > Why? We are simply using delaydie to have a trigger executable to execute. > > > > This is a valid use of it (and it is used that same way in other tests also). > > > > > > > > > - the ustacks for delaydie are perhaps not very interesting > > > > That is not the point of the test, and for that matter, I highly doubt that > > > > ustacks from date would have been any more interesting (aside from the fact > > > > that you'd never know because the test didn't even work with date). > > > > > > > > > A little more serious is that all this patch does is swap one failure mode > > > > > for another. What for? It's XFAIL anyhow, and we do not check that we're > > > > > getting a particular, expected failure. > > > > The difference is that the pre-patch failure was due to a problem with the > > > > actual test. But since it is XFAIL that was reported as an expected failure, > > > > which was entirely bogus. After the patch, it still fails but at least it is > > > > failing for valid reasons. I.e. now we have a failure we should look into. > > > > > > > > > The real problem, meanwhile, is that this patch leads to lots of test > > > > > failures, at least for me. Not every VM, but many. Oh, and I do not mean > > > > > on this test, because this test remains XFAIL anyhow. > > > > > > > > > > Consider a subset of the VMs I run on. I get these failures on the > > > > > test/unittest/ustack/tst.*.d tests. Why on these? Because they run after > > > > > tst.depth.sh. > > > > > > > > > > +------------------ arm OL8 UEK7 > > > > > | +---------------- x86 OL9 UEK7 > > > > > | | +-------------- arm OL9 UEK7 > > > > > | | | +------------ x86 OL9 UEK8 > > > > > | | | | +---------- arm OL9 UEK8 > > > > > | | | | | +-------- arm OL10 UEK8 > > > > > | | | | | | > > > > > v v v v v v > > > > > > > > > > x x x x x x tst.kthread.d FAIL: timed out. > > > > > x x x x tst.uaddr.d FAIL: timed out. > > > > > x x x x tst.uaddr-pid0.d FAIL: timed out. > > > > > x x tst.ufunc.d FAIL: timed out. > > > > > x x x tst.ufunc-pid0.d FAIL: timed out. > > > > > x x tst.umod.d FAIL: timed out. > > > > > x x tst.ustack25_max95_profile.d FAIL: timed out. > > > > > x x tst.ustack25_pid.d FAIL: timed out. > > > > > x x tst.ustack25_profile.d FAIL: timed out. > > > > > x x tst.ustack95_max25_profile.d FAIL: timed out. > > > > > x x tst.ustack95_profile.d FAIL: timed out. > > > > > x x tst.ustack_max25_profile.d FAIL: timed out. > > > > > x x tst.ustack_profile.d FAIL: timed out. > > > > > x tst.usym.d FAIL: timed out. > > > > > x tst.usym-pid0.d FAIL: timed out. > > > > > > > > > > I think the problem is that tst.depth.sh -- with this patch, in any case -- > > > > > turns on a huge number of probes. That causes problems for this test, but > > > > > not in the sense of it FAILing since it is marked XFAIL anyhow. (All this > > > > > patch does is change the FAIL from one problem to another, but we still do > > > > > not specify what failure we expect to see nor check that that is what we > > > > > get.) > > > > > > > > > > So, the problem is not with this test, but that the system is left in some > > > > > whacked state due to having attempted so many probes. As a result, the > > > > > process "dtrace -o $file -c delaydie" is still around well after the depth > > > > > test has finished, the load average remains high, the pid0 process does not > > > > > run, and many tests time out. How long (how many seconds or how many tests) > > > > > does this effect last? It depends, as one can see above. > > > > > > > > > > This raises all sorts of other questions as well, such as whether tests in > > > > > our test suite should be allowed to depend on running on a somewhat idle > > > > > system, whether we (dtrace or the test suite) are cleaning up well enough, > > > > > etc. > > > > There is of course the aspect that this patch is part of a series, and the > > > > next patch in the series should resolve the impact issues of this test. If > > > > it makes you more happy, I'd happily move this patch to the end of the series. > > > > I don't see how it matters, because this crazy fallout from tst.depth.sh will > > > > be an issue either way until the per-PID uprohes support is applied. > > > I don't get it. > > > > > > First of all, the crazy fallout results with the patch and it doesn't > > > without the patch. Without the patch the test was basically SKIPing > > > (admittedly as a mistaken side effect of a test defect). > > > > > > Second, the above results, with the crazy fallout, included the entire patch > > > series. That is, the tip of the branch looked like this: > > Well, it was not clear from your email that you were testing this after you > > applied the entire series. That of course changes the situation. > > > > Clearly, there may be more wrong with this test. Or it might be pointing on a > > potential issue where DTrace might leave things hanging in a state that takes > > too long to clean up. Either way, though, the change in this patch is needed > > to make this test even remotely useful. > > > > If the patch causes the test to trigger other tests to fail, then more work > > is needed. I would still recommend that we let this patch go through, as it > > *is* a correct fix for a problem. > > > > And then look at what the new issue is that this test brings to light, and fix > > what is needed to fix that. > > How about having the patch also SKIP the test and we simply file a bug to > track the issue. Otherwise, we're simply invalidating a bunch of other > tests (in the process, increasing the run time of the test suite). The > cascade of FAILs is even longer than I indicated for one of the VMs. The > patch is a fix to a problem, but it's only a partial fix to what's going on > here, a partial fix that sabotages quite a few other tests. The related > problems can be tracked much more efficiently and directly with a bug > ticket. Sure, but then I would suggest you submit a tiny patch to do the skipping, so that we can revert that later without also reverting the needed change from this patch. > > > 59a49ce9 (tag: 2.0.4) doc: Add blank line before bold text so it is rendered > > > correctly > > > 61e69762 bpf: fix file descriptor leak > > > 20780963 uprobe: remove unnecessary enable_*() functions > > > 170c4cf9 test: fix tst.depth.sh to use proper trigger > > > cf7ae35b uprobe: Implement PID-specific uprobes > > > > > > If there are other bits/patches I should try, let me know. > > > > > > > > On 11/10/25 10:27, Kris Van Hees wrote: > > > > > > The test used 'date' as its trigger but that caused the test to XFAIL > > > > > > for the wrong reason: date does not have symbols and therefore dtrace > > > > > > failed to enable probes for it. > > > > > > > > > > > > Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com> > > > > > > --- > > > > > > test/unittest/ustack/tst.depth.sh | 8 ++++---- > > > > > > 1 file changed, 4 insertions(+), 4 deletions(-) > > > > > > > > > > > > diff --git a/test/unittest/ustack/tst.depth.sh b/test/unittest/ustack/tst.depth.sh > > > > > > index dec16a3a..977d1b42 100755 > > > > > > --- a/test/unittest/ustack/tst.depth.sh > > > > > > +++ b/test/unittest/ustack/tst.depth.sh > > > > > > @@ -16,7 +16,7 @@ dtrace=$1 > > > > > > rm -f $file > > > > > > -$dtrace $dt_flags -o $file -c date -s /dev/stdin <<EOF > > > > > > +$dtrace $dt_flags -o $file -c test/triggers/delaydie -s /dev/stdin <<EOF > > > > > > #pragma D option quiet > > > > > > #pragma D option bufsize=1M > > > > > > @@ -45,7 +45,7 @@ $dtrace $dt_flags -o $file -c date -s /dev/stdin <<EOF > > > > > > EOF > > > > > > status=$? > > > > > > -if [ "$status" -ne 0 ]; then > > > > > > +if [ $status -ne 0 ]; then > > > > > > echo $tst: dtrace failed > > > > > > rm -f $file > > > > > > exit $status > > > > > > @@ -78,8 +78,8 @@ EOF > > > > > > status=$? > > > > > > -count=`wc -l $file | cut -f1 -do` > > > > > > -if [ "$count" -lt 1000 ]; then > > > > > > +count=`wc -l $file | cut -f1 -d' '` > > > > > > +if [ $count -lt 1000 ]; then > > > > > > echo $tst: output was too short > > > > > > status=1 > > > > > > fi ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2025-11-11 23:09 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-11-10 15:27 [PATCH 3/4] test: fix tst.depth.sh to use proper trigger Kris Van Hees 2025-11-11 0:10 ` Eugene Loh 2025-11-11 3:00 ` Kris Van Hees 2025-11-11 20:50 ` Eugene Loh 2025-11-11 22:13 ` Kris Van Hees 2025-11-11 22:55 ` Eugene Loh 2025-11-11 23:09 ` Kris Van Hees
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox