linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] perf trace: do not lose last events in a race
@ 2024-11-06 23:45 Benjamin Peterson
  2024-11-06 23:45 ` [PATCH 2/2] perf tests: add test for trace output loss Benjamin Peterson
  2024-11-07 21:46 ` [PATCH 1/2] perf trace: do not lose last events in a race Howard Chu
  0 siblings, 2 replies; 14+ messages in thread
From: Benjamin Peterson @ 2024-11-06 23:45 UTC (permalink / raw)
  Cc: Benjamin Peterson, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	Liang, Kan, open list:PERFORMANCE EVENTS SUBSYSTEM,
	open list:PERFORMANCE EVENTS SUBSYSTEM

If a perf trace event selector specifies a maximum number of events to output
(i.e., "/nr=N/" syntax), the event printing handler, trace__event_handler,
disables the event selector after the maximum number events are
printed. Furthermore, trace__event_handler checked if the event selector was
disabled before doing any work. This avoided exceeding the maximum number of
events to print if more events were in the buffer before the selector was
disabled. However, the event selector can be disabled for reasons other than
exceeding the maximum number of events. In particular, when the traced
subprocess exits, the main loop disables all event selectors. This meant the
last events of a traced subprocess might be lost to the printing handler's
short-circuiting logic.

This nondeterministic problem could be seen by running the following many times:

  $ perf trace -e syscalls:sys_enter_exit_group true

trace__event_handler should simply check for exceeding the maximum number of
events to print rather than the state of the event selector.

Fixes: a9c5e6c1e9bff ("perf trace: Introduce per-event maximum number of events property")
Signed-off-by: Benjamin Peterson <benjamin@engflow.com>
---
 tools/perf/builtin-trace.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index d3f11b90d025..f6179b13b8b4 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -3096,13 +3096,8 @@ static int trace__event_handler(struct trace *trace, struct evsel *evsel,
 {
 	struct thread *thread;
 	int callchain_ret = 0;
-	/*
-	 * Check if we called perf_evsel__disable(evsel) due to, for instance,
-	 * this event's max_events having been hit and this is an entry coming
-	 * from the ring buffer that we should discard, since the max events
-	 * have already been considered/printed.
-	 */
-	if (evsel->disabled)
+
+	if (evsel->nr_events_printed >= evsel->max_events)
 		return 0;
 
 	thread = machine__findnew_thread(trace->host, sample->pid, sample->tid);
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/2] perf tests: add test for trace output loss
  2024-11-06 23:45 [PATCH 1/2] perf trace: do not lose last events in a race Benjamin Peterson
@ 2024-11-06 23:45 ` Benjamin Peterson
  2024-11-07 22:06   ` Howard Chu
  2024-11-14 17:37   ` Arnaldo Carvalho de Melo
  2024-11-07 21:46 ` [PATCH 1/2] perf trace: do not lose last events in a race Howard Chu
  1 sibling, 2 replies; 14+ messages in thread
From: Benjamin Peterson @ 2024-11-06 23:45 UTC (permalink / raw)
  Cc: Benjamin Peterson, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	Liang, Kan, open list, open list:PERFORMANCE EVENTS SUBSYSTEM

Add a test that checks that trace output is not lost to races. This is
accomplished by tracing the exit_group syscall of "true" multiple times and
checking for correct output.

Conveniently, this test also serves as a regression test for 5fb8e56542a3 ("perf
trace: avoid garbage when not printing a trace event's arguments") because
exit_group triggers the previously buggy printing behavior.

Signed-off-by: Benjamin Peterson <benjamin@engflow.com>
---
 tools/perf/tests/shell/trace_exit_race.sh | 31 +++++++++++++++++++++++
 1 file changed, 31 insertions(+)
 create mode 100755 tools/perf/tests/shell/trace_exit_race.sh

diff --git a/tools/perf/tests/shell/trace_exit_race.sh b/tools/perf/tests/shell/trace_exit_race.sh
new file mode 100755
index 000000000000..8b70324bc5b4
--- /dev/null
+++ b/tools/perf/tests/shell/trace_exit_race.sh
@@ -0,0 +1,31 @@
+#!/bin/sh
+# perf trace exit race
+# SPDX-License-Identifier: GPL-2.0
+
+# Check that the last events of a perf trace'd subprocess are not
+# lost. Specifically, trace the exiting syscall of "true" 100 times and ensure
+# the output contains 100 correct lines.
+
+# shellcheck source=lib/probe.sh
+. "$(dirname $0)"/lib/probe.sh
+
+skip_if_no_perf_trace || exit 2
+
+trace_shutdown_race() {
+	for i in $(seq 100); do
+		perf trace -e syscalls:sys_enter_exit_group true 2>>$file
+	done
+	[ $(grep -c -E " +[0-9]+\.[0-9]+ +true/[0-9]+ syscalls:sys_enter_exit_group\(\)$" $file) = "100" ]
+}
+
+
+file=$(mktemp /tmp/temporary_file.XXXXX)
+
+# Do not use whatever ~/.perfconfig file, it may change the output
+# via trace.{show_timestamp,show_prefix,etc}
+export PERF_CONFIG=/dev/null
+
+trace_shutdown_race
+err=$?
+rm -f ${file}
+exit $err
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] perf trace: do not lose last events in a race
  2024-11-06 23:45 [PATCH 1/2] perf trace: do not lose last events in a race Benjamin Peterson
  2024-11-06 23:45 ` [PATCH 2/2] perf tests: add test for trace output loss Benjamin Peterson
@ 2024-11-07 21:46 ` Howard Chu
  1 sibling, 0 replies; 14+ messages in thread
From: Howard Chu @ 2024-11-07 21:46 UTC (permalink / raw)
  To: Benjamin Peterson
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Liang, Kan,
	open list:PERFORMANCE EVENTS SUBSYSTEM,
	open list:PERFORMANCE EVENTS SUBSYSTEM

Hello Benjamin,

Without your patch (no output):
perf $ ./perf trace -e syscalls:sys_enter_exit_group true
perf $

With your patch:

perf $ ./perf trace -e syscalls:sys_enter_exit_group true
     0.000 true/1530009 syscalls:sys_enter_exit_group()

The result looks good.

Tested-by: Howard Chu <howardchu95@gmail.com>

Thanks,
Howard

On Wed, Nov 6, 2024 at 3:45 PM Benjamin Peterson <benjamin@engflow.com> wrote:
>
> If a perf trace event selector specifies a maximum number of events to output
> (i.e., "/nr=N/" syntax), the event printing handler, trace__event_handler,
> disables the event selector after the maximum number events are
> printed. Furthermore, trace__event_handler checked if the event selector was
> disabled before doing any work. This avoided exceeding the maximum number of
> events to print if more events were in the buffer before the selector was
> disabled. However, the event selector can be disabled for reasons other than
> exceeding the maximum number of events. In particular, when the traced
> subprocess exits, the main loop disables all event selectors. This meant the
> last events of a traced subprocess might be lost to the printing handler's
> short-circuiting logic.
>
> This nondeterministic problem could be seen by running the following many times:
>
>   $ perf trace -e syscalls:sys_enter_exit_group true
>
> trace__event_handler should simply check for exceeding the maximum number of
> events to print rather than the state of the event selector.
>
> Fixes: a9c5e6c1e9bff ("perf trace: Introduce per-event maximum number of events property")
> Signed-off-by: Benjamin Peterson <benjamin@engflow.com>
> ---
>  tools/perf/builtin-trace.c | 9 ++-------
>  1 file changed, 2 insertions(+), 7 deletions(-)
>
> diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
> index d3f11b90d025..f6179b13b8b4 100644
> --- a/tools/perf/builtin-trace.c
> +++ b/tools/perf/builtin-trace.c
> @@ -3096,13 +3096,8 @@ static int trace__event_handler(struct trace *trace, struct evsel *evsel,
>  {
>         struct thread *thread;
>         int callchain_ret = 0;
> -       /*
> -        * Check if we called perf_evsel__disable(evsel) due to, for instance,
> -        * this event's max_events having been hit and this is an entry coming
> -        * from the ring buffer that we should discard, since the max events
> -        * have already been considered/printed.
> -        */
> -       if (evsel->disabled)
> +
> +       if (evsel->nr_events_printed >= evsel->max_events)
>                 return 0;
>
>         thread = machine__findnew_thread(trace->host, sample->pid, sample->tid);
> --
> 2.39.5
>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] perf tests: add test for trace output loss
  2024-11-06 23:45 ` [PATCH 2/2] perf tests: add test for trace output loss Benjamin Peterson
@ 2024-11-07 22:06   ` Howard Chu
  2024-11-07 23:17     ` Benjamin Peterson
  2024-11-14 17:37   ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 14+ messages in thread
From: Howard Chu @ 2024-11-07 22:06 UTC (permalink / raw)
  To: Benjamin Peterson
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Liang, Kan, open list,
	open list:PERFORMANCE EVENTS SUBSYSTEM

Hello,

On Wed, Nov 6, 2024 at 3:46 PM Benjamin Peterson <benjamin@engflow.com> wrote:
>
> Add a test that checks that trace output is not lost to races. This is
> accomplished by tracing the exit_group syscall of "true" multiple times and
> checking for correct output.
>
> Conveniently, this test also serves as a regression test for 5fb8e56542a3 ("perf
> trace: avoid garbage when not printing a trace event's arguments") because
> exit_group triggers the previously buggy printing behavior.
>
> Signed-off-by: Benjamin Peterson <benjamin@engflow.com>
> ---
>  tools/perf/tests/shell/trace_exit_race.sh | 31 +++++++++++++++++++++++
>  1 file changed, 31 insertions(+)
>  create mode 100755 tools/perf/tests/shell/trace_exit_race.sh
>
> diff --git a/tools/perf/tests/shell/trace_exit_race.sh b/tools/perf/tests/shell/trace_exit_race.sh
> new file mode 100755
> index 000000000000..8b70324bc5b4
> --- /dev/null
> +++ b/tools/perf/tests/shell/trace_exit_race.sh
> @@ -0,0 +1,31 @@
> +#!/bin/sh
> +# perf trace exit race
> +# SPDX-License-Identifier: GPL-2.0
> +
> +# Check that the last events of a perf trace'd subprocess are not
> +# lost. Specifically, trace the exiting syscall of "true" 100 times and ensure
> +# the output contains 100 correct lines.
> +
> +# shellcheck source=lib/probe.sh
> +. "$(dirname $0)"/lib/probe.sh
> +
> +skip_if_no_perf_trace || exit 2
> +
> +trace_shutdown_race() {
> +       for i in $(seq 100); do
> +               perf trace -e syscalls:sys_enter_exit_group true 2>>$file
> +       done
> +       [ $(grep -c -E " +[0-9]+\.[0-9]+ +true/[0-9]+ syscalls:sys_enter_exit_group\(\)$" $file) = "100" ]

The test failed due to regex mismatched, I think because of this:

     0.000 true/1526046 syscalls:sys_enter_exit_group( )
     0.000 true/1526212 syscalls:sys_enter_exit_group( )
     0.000 true/1526383 syscalls:sys_enter_exit_group(SSSSSSSSSSSSSSSS)

Ironically the junk buffer problem you fixed last time, thanks for
finding another printing problem in perf trace :), I'll figure out the
cause of this.

Another thing is this test takes a long time to finish

perf $ time ./perf test 109
109: perf trace exit race                                            : FAILED!

real    0m38.762s
user    0m15.090s
sys     0m21.794s

Is it really necessary to run it 100 times? To me it seems to be just
a wrong handling logic of draining samples, will there be coincidence?

Thanks,
Howard
> +}
> +
> +
> +file=$(mktemp /tmp/temporary_file.XXXXX)
> +
> +# Do not use whatever ~/.perfconfig file, it may change the output
> +# via trace.{show_timestamp,show_prefix,etc}
> +export PERF_CONFIG=/dev/null
> +
> +trace_shutdown_race
> +err=$?
> +rm -f ${file}
> +exit $err
> --
> 2.39.5
>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] perf tests: add test for trace output loss
  2024-11-07 22:06   ` Howard Chu
@ 2024-11-07 23:17     ` Benjamin Peterson
  2024-11-08 16:20       ` Howard Chu
  0 siblings, 1 reply; 14+ messages in thread
From: Benjamin Peterson @ 2024-11-07 23:17 UTC (permalink / raw)
  To: Howard Chu
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Liang, Kan, open list,
	open list:PERFORMANCE EVENTS SUBSYSTEM

On Thu, Nov 7, 2024 at 2:07 PM Howard Chu <howardchu95@gmail.com> wrote:
>
> Hello,
>
> On Wed, Nov 6, 2024 at 3:46 PM Benjamin Peterson <benjamin@engflow.com> wrote:
> >
> > Add a test that checks that trace output is not lost to races. This is
> > accomplished by tracing the exit_group syscall of "true" multiple times and
> > checking for correct output.
> >
> > Conveniently, this test also serves as a regression test for 5fb8e56542a3 ("perf
> > trace: avoid garbage when not printing a trace event's arguments") because
> > exit_group triggers the previously buggy printing behavior.
> >
> > Signed-off-by: Benjamin Peterson <benjamin@engflow.com>
> > ---
> >  tools/perf/tests/shell/trace_exit_race.sh | 31 +++++++++++++++++++++++
> >  1 file changed, 31 insertions(+)
> >  create mode 100755 tools/perf/tests/shell/trace_exit_race.sh
> >
> > diff --git a/tools/perf/tests/shell/trace_exit_race.sh b/tools/perf/tests/shell/trace_exit_race.sh
> > new file mode 100755
> > index 000000000000..8b70324bc5b4
> > --- /dev/null
> > +++ b/tools/perf/tests/shell/trace_exit_race.sh
> > @@ -0,0 +1,31 @@
> > +#!/bin/sh
> > +# perf trace exit race
> > +# SPDX-License-Identifier: GPL-2.0
> > +
> > +# Check that the last events of a perf trace'd subprocess are not
> > +# lost. Specifically, trace the exiting syscall of "true" 100 times and ensure
> > +# the output contains 100 correct lines.
> > +
> > +# shellcheck source=lib/probe.sh
> > +. "$(dirname $0)"/lib/probe.sh
> > +
> > +skip_if_no_perf_trace || exit 2
> > +
> > +trace_shutdown_race() {
> > +       for i in $(seq 100); do
> > +               perf trace -e syscalls:sys_enter_exit_group true 2>>$file
> > +       done
> > +       [ $(grep -c -E " +[0-9]+\.[0-9]+ +true/[0-9]+ syscalls:sys_enter_exit_group\(\)$" $file) = "100" ]
>
> The test failed due to regex mismatched, I think because of this:
>
>      0.000 true/1526046 syscalls:sys_enter_exit_group( )
>      0.000 true/1526212 syscalls:sys_enter_exit_group( )
>      0.000 true/1526383 syscalls:sys_enter_exit_group(SSSSSSSSSSSSSSSS)
>
> Ironically the junk buffer problem you fixed last time, thanks for
> finding another printing problem in perf trace :), I'll figure out the
> cause of this.

Thanks, the problem is a parallel issue with trace__fprintf_sys_enter.
I will include a patch for the problem in my spin of this series.

>
> Another thing is this test takes a long time to finish
>
> perf $ time ./perf test 109
> 109: perf trace exit race                                            : FAILED!
>
> real    0m38.762s
> user    0m15.090s
> sys     0m21.794s
>
> Is it really necessary to run it 100 times? To me it seems to be just
> a wrong handling logic of draining samples, will there be coincidence?

Yes, it's racy. In my testing, sometimes the correct output would
appear. However, I will reduce the number of iterations to 10.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] perf tests: add test for trace output loss
  2024-11-07 23:17     ` Benjamin Peterson
@ 2024-11-08 16:20       ` Howard Chu
  2024-11-08 16:59         ` Benjamin Peterson
  0 siblings, 1 reply; 14+ messages in thread
From: Howard Chu @ 2024-11-08 16:20 UTC (permalink / raw)
  To: Benjamin Peterson
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Liang, Kan, open list,
	open list:PERFORMANCE EVENTS SUBSYSTEM

Hello Benjamin,

On Thu, Nov 7, 2024 at 3:18 PM Benjamin Peterson <benjamin@engflow.com> wrote:
>
> On Thu, Nov 7, 2024 at 2:07 PM Howard Chu <howardchu95@gmail.com> wrote:
> >
> > Hello,
> >
> > On Wed, Nov 6, 2024 at 3:46 PM Benjamin Peterson <benjamin@engflow.com> wrote:
> > >
> > > Add a test that checks that trace output is not lost to races. This is
> > > accomplished by tracing the exit_group syscall of "true" multiple times and
> > > checking for correct output.
> > >
> > > Conveniently, this test also serves as a regression test for 5fb8e56542a3 ("perf
> > > trace: avoid garbage when not printing a trace event's arguments") because
> > > exit_group triggers the previously buggy printing behavior.
> > >
> > > Signed-off-by: Benjamin Peterson <benjamin@engflow.com>
> > > ---
> > >  tools/perf/tests/shell/trace_exit_race.sh | 31 +++++++++++++++++++++++
> > >  1 file changed, 31 insertions(+)
> > >  create mode 100755 tools/perf/tests/shell/trace_exit_race.sh
> > >
> > > diff --git a/tools/perf/tests/shell/trace_exit_race.sh b/tools/perf/tests/shell/trace_exit_race.sh
> > > new file mode 100755
> > > index 000000000000..8b70324bc5b4
> > > --- /dev/null
> > > +++ b/tools/perf/tests/shell/trace_exit_race.sh
> > > @@ -0,0 +1,31 @@
> > > +#!/bin/sh
> > > +# perf trace exit race
> > > +# SPDX-License-Identifier: GPL-2.0
> > > +
> > > +# Check that the last events of a perf trace'd subprocess are not
> > > +# lost. Specifically, trace the exiting syscall of "true" 100 times and ensure
> > > +# the output contains 100 correct lines.
> > > +
> > > +# shellcheck source=lib/probe.sh
> > > +. "$(dirname $0)"/lib/probe.sh
> > > +
> > > +skip_if_no_perf_trace || exit 2
> > > +
> > > +trace_shutdown_race() {
> > > +       for i in $(seq 100); do
> > > +               perf trace -e syscalls:sys_enter_exit_group true 2>>$file
> > > +       done
> > > +       [ $(grep -c -E " +[0-9]+\.[0-9]+ +true/[0-9]+ syscalls:sys_enter_exit_group\(\)$" $file) = "100" ]
> >
> > The test failed due to regex mismatched, I think because of this:
> >
> >      0.000 true/1526046 syscalls:sys_enter_exit_group( )
> >      0.000 true/1526212 syscalls:sys_enter_exit_group( )
> >      0.000 true/1526383 syscalls:sys_enter_exit_group(SSSSSSSSSSSSSSSS)
> >
> > Ironically the junk buffer problem you fixed last time, thanks for
> > finding another printing problem in perf trace :), I'll figure out the
> > cause of this.
>
> Thanks, the problem is a parallel issue with trace__fprintf_sys_enter.
> I will include a patch for the problem in my spin of this series.
>
> >
> > Another thing is this test takes a long time to finish
> >
> > perf $ time ./perf test 109
> > 109: perf trace exit race                                            : FAILED!
> >
> > real    0m38.762s
> > user    0m15.090s
> > sys     0m21.794s
> >
> > Is it really necessary to run it 100 times? To me it seems to be just
> > a wrong handling logic of draining samples, will there be coincidence?
>
> Yes, it's racy. In my testing, sometimes the correct output would
> appear. However, I will reduce the number of iterations to 10.

Could you be a bit more specific about this race condition issue?
Could you please tell us how one might emulate and test the scenario
you mentioned? I ran the command 1,000 times before and after applying
the patch, and each time I got the expected output. Could you please
provide more information about your use cases?

Thanks,
Howard

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] perf tests: add test for trace output loss
  2024-11-08 16:20       ` Howard Chu
@ 2024-11-08 16:59         ` Benjamin Peterson
  0 siblings, 0 replies; 14+ messages in thread
From: Benjamin Peterson @ 2024-11-08 16:59 UTC (permalink / raw)
  To: Howard Chu
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Liang, Kan, open list,
	open list:PERFORMANCE EVENTS SUBSYSTEM

On Fri, Nov 8, 2024 at 8:20 AM Howard Chu <howardchu95@gmail.com> wrote:
>
> Hello Benjamin,
>
> On Thu, Nov 7, 2024 at 3:18 PM Benjamin Peterson <benjamin@engflow.com> wrote:
> >
> > On Thu, Nov 7, 2024 at 2:07 PM Howard Chu <howardchu95@gmail.com> wrote:
> > >
> > > Hello,
> > >
> > > On Wed, Nov 6, 2024 at 3:46 PM Benjamin Peterson <benjamin@engflow.com> wrote:
> > > >
> > > > Add a test that checks that trace output is not lost to races. This is
> > > > accomplished by tracing the exit_group syscall of "true" multiple times and
> > > > checking for correct output.
> > > >
> > > > Conveniently, this test also serves as a regression test for 5fb8e56542a3 ("perf
> > > > trace: avoid garbage when not printing a trace event's arguments") because
> > > > exit_group triggers the previously buggy printing behavior.
> > > >
> > > > Signed-off-by: Benjamin Peterson <benjamin@engflow.com>
> > > > ---
> > > >  tools/perf/tests/shell/trace_exit_race.sh | 31 +++++++++++++++++++++++
> > > >  1 file changed, 31 insertions(+)
> > > >  create mode 100755 tools/perf/tests/shell/trace_exit_race.sh
> > > >
> > > > diff --git a/tools/perf/tests/shell/trace_exit_race.sh b/tools/perf/tests/shell/trace_exit_race.sh
> > > > new file mode 100755
> > > > index 000000000000..8b70324bc5b4
> > > > --- /dev/null
> > > > +++ b/tools/perf/tests/shell/trace_exit_race.sh
> > > > @@ -0,0 +1,31 @@
> > > > +#!/bin/sh
> > > > +# perf trace exit race
> > > > +# SPDX-License-Identifier: GPL-2.0
> > > > +
> > > > +# Check that the last events of a perf trace'd subprocess are not
> > > > +# lost. Specifically, trace the exiting syscall of "true" 100 times and ensure
> > > > +# the output contains 100 correct lines.
> > > > +
> > > > +# shellcheck source=lib/probe.sh
> > > > +. "$(dirname $0)"/lib/probe.sh
> > > > +
> > > > +skip_if_no_perf_trace || exit 2
> > > > +
> > > > +trace_shutdown_race() {
> > > > +       for i in $(seq 100); do
> > > > +               perf trace -e syscalls:sys_enter_exit_group true 2>>$file
> > > > +       done
> > > > +       [ $(grep -c -E " +[0-9]+\.[0-9]+ +true/[0-9]+ syscalls:sys_enter_exit_group\(\)$" $file) = "100" ]
> > >
> > > The test failed due to regex mismatched, I think because of this:
> > >
> > >      0.000 true/1526046 syscalls:sys_enter_exit_group( )
> > >      0.000 true/1526212 syscalls:sys_enter_exit_group( )
> > >      0.000 true/1526383 syscalls:sys_enter_exit_group(SSSSSSSSSSSSSSSS)
> > >
> > > Ironically the junk buffer problem you fixed last time, thanks for
> > > finding another printing problem in perf trace :), I'll figure out the
> > > cause of this.
> >
> > Thanks, the problem is a parallel issue with trace__fprintf_sys_enter.
> > I will include a patch for the problem in my spin of this series.
> >
> > >
> > > Another thing is this test takes a long time to finish
> > >
> > > perf $ time ./perf test 109
> > > 109: perf trace exit race                                            : FAILED!
> > >
> > > real    0m38.762s
> > > user    0m15.090s
> > > sys     0m21.794s
> > >
> > > Is it really necessary to run it 100 times? To me it seems to be just
> > > a wrong handling logic of draining samples, will there be coincidence?
> >
> > Yes, it's racy. In my testing, sometimes the correct output would
> > appear. However, I will reduce the number of iterations to 10.
>
> Could you be a bit more specific about this race condition issue?

From the code, it's apparent that if the SIGCHLD from the traced
process is seen before the exit_group trace event, output will be
lost. If the signal is received after the trace event, the trace event
will be correctly printed.

> Could you please tell us how one might emulate and test the scenario
> you mentioned? I ran the command 1,000 times before and after applying
> the patch, and each time I got the expected output. Could you please
> provide more information about your use cases?

I can see it simply by running the unpatched perf few times in the
terminal, but it might be helpful for reproduction to create some
scheduling contention with stress.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] perf tests: add test for trace output loss
  2024-11-06 23:45 ` [PATCH 2/2] perf tests: add test for trace output loss Benjamin Peterson
  2024-11-07 22:06   ` Howard Chu
@ 2024-11-14 17:37   ` Arnaldo Carvalho de Melo
  2024-11-14 17:43     ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 14+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-11-14 17:37 UTC (permalink / raw)
  To: Benjamin Peterson
  Cc: Peter Zijlstra, Ingo Molnar, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	Liang, Kan, open list, open list:PERFORMANCE EVENTS SUBSYSTEM

On Wed, Nov 06, 2024 at 11:45:18PM +0000, Benjamin Peterson wrote:
> Add a test that checks that trace output is not lost to races. This is
> accomplished by tracing the exit_group syscall of "true" multiple times and
> checking for correct output.
> 
> Conveniently, this test also serves as a regression test for 5fb8e56542a3 ("perf
> trace: avoid garbage when not printing a trace event's arguments") because
> exit_group triggers the previously buggy printing behavior.
> 
> Signed-off-by: Benjamin Peterson <benjamin@engflow.com>
> ---
>  tools/perf/tests/shell/trace_exit_race.sh | 31 +++++++++++++++++++++++
>  1 file changed, 31 insertions(+)
>  create mode 100755 tools/perf/tests/shell/trace_exit_race.sh
> 
> diff --git a/tools/perf/tests/shell/trace_exit_race.sh b/tools/perf/tests/shell/trace_exit_race.sh
> new file mode 100755
> index 000000000000..8b70324bc5b4
> --- /dev/null
> +++ b/tools/perf/tests/shell/trace_exit_race.sh
> @@ -0,0 +1,31 @@
> +#!/bin/sh
> +# perf trace exit race
> +# SPDX-License-Identifier: GPL-2.0
> +
> +# Check that the last events of a perf trace'd subprocess are not
> +# lost. Specifically, trace the exiting syscall of "true" 100 times and ensure
> +# the output contains 100 correct lines.
> +
> +# shellcheck source=lib/probe.sh
> +. "$(dirname $0)"/lib/probe.sh
> +
> +skip_if_no_perf_trace || exit 2
> +
> +trace_shutdown_race() {
> +	for i in $(seq 100); do
> +		perf trace -e syscalls:sys_enter_exit_group true 2>>$file
> +	done
> +	[ $(grep -c -E " +[0-9]+\.[0-9]+ +true/[0-9]+ syscalls:sys_enter_exit_group\(\)$" $file) = "100" ]
> +}
> +
> +
> +file=$(mktemp /tmp/temporary_file.XXXXX)
> +
> +# Do not use whatever ~/.perfconfig file, it may change the output
> +# via trace.{show_timestamp,show_prefix,etc}
> +export PERF_CONFIG=/dev/null
> +
> +trace_shutdown_race
> +err=$?
> +rm -f ${file}
> +exit $err
> -- 

Its failing with shellcheck, I'm trying to fix it:

  CC      /tmp/build/perf-tools-next/builtin-trace.o
  TEST    /tmp/build/perf-tools-next/tests/shell/trace_exit_race.sh.shellcheck_log

In tests/shell/trace_exit_race.sh line 15:
	for i in $(seq 100); do
        ^-^ SC2034 (warning): i appears unused. Verify use (or export if used externally).


In tests/shell/trace_exit_race.sh line 18:
	[ $(grep -c -E " +[0-9]+\.[0-9]+ +true/[0-9]+ syscalls:sys_enter_exit_group\(\)$" $file) = "100" ]
          ^-- SC2046 (warning): Quote this to prevent word splitting.

For more information:
  https://www.shellcheck.net/wiki/SC2034 -- i appears unused. Verify use (or ...
  https://www.shellcheck.net/wiki/SC2046 -- Quote this to prevent word splitt...
make[4]: *** [tests/Build:91: /tmp/build/perf-tools-next/tests/shell/trace_exit_race.sh.shellcheck_log] Error 1
make[3]: *** [/home/acme/git/perf-tools-next/tools/build/Makefile.build:158: tests] Error 2
make[2]: *** [Makefile.perf:777: /tmp/build/perf-tools-next/perf-test-in.o] Error 2
make[2]: *** Waiting for unfinished jobs....


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] perf tests: add test for trace output loss
  2024-11-14 17:37   ` Arnaldo Carvalho de Melo
@ 2024-11-14 17:43     ` Arnaldo Carvalho de Melo
  2024-11-14 17:44       ` Benjamin Peterson
  0 siblings, 1 reply; 14+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-11-14 17:43 UTC (permalink / raw)
  To: Benjamin Peterson
  Cc: Peter Zijlstra, Ingo Molnar, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	Liang, Kan, open list, open list:PERFORMANCE EVENTS SUBSYSTEM

On Thu, Nov 14, 2024 at 02:37:52PM -0300, Arnaldo Carvalho de Melo wrote:
> On Wed, Nov 06, 2024 at 11:45:18PM +0000, Benjamin Peterson wrote:
> > Add a test that checks that trace output is not lost to races. This is
> > accomplished by tracing the exit_group syscall of "true" multiple times and
> > checking for correct output.
> > 
> > Conveniently, this test also serves as a regression test for 5fb8e56542a3 ("perf
> > trace: avoid garbage when not printing a trace event's arguments") because
> > exit_group triggers the previously buggy printing behavior.
> > 
> > Signed-off-by: Benjamin Peterson <benjamin@engflow.com>
> > ---
> >  tools/perf/tests/shell/trace_exit_race.sh | 31 +++++++++++++++++++++++
> >  1 file changed, 31 insertions(+)
> >  create mode 100755 tools/perf/tests/shell/trace_exit_race.sh
> > 
> > diff --git a/tools/perf/tests/shell/trace_exit_race.sh b/tools/perf/tests/shell/trace_exit_race.sh
> > new file mode 100755
> > index 000000000000..8b70324bc5b4
> > --- /dev/null
> > +++ b/tools/perf/tests/shell/trace_exit_race.sh
> > @@ -0,0 +1,31 @@
> > +#!/bin/sh
> > +# perf trace exit race
> > +# SPDX-License-Identifier: GPL-2.0
> > +
> > +# Check that the last events of a perf trace'd subprocess are not
> > +# lost. Specifically, trace the exiting syscall of "true" 100 times and ensure
> > +# the output contains 100 correct lines.
> > +
> > +# shellcheck source=lib/probe.sh
> > +. "$(dirname $0)"/lib/probe.sh
> > +
> > +skip_if_no_perf_trace || exit 2
> > +
> > +trace_shutdown_race() {
> > +	for i in $(seq 100); do
> > +		perf trace -e syscalls:sys_enter_exit_group true 2>>$file
> > +	done
> > +	[ $(grep -c -E " +[0-9]+\.[0-9]+ +true/[0-9]+ syscalls:sys_enter_exit_group\(\)$" $file) = "100" ]
> > +}
> > +
> > +
> > +file=$(mktemp /tmp/temporary_file.XXXXX)
> > +
> > +# Do not use whatever ~/.perfconfig file, it may change the output
> > +# via trace.{show_timestamp,show_prefix,etc}
> > +export PERF_CONFIG=/dev/null
> > +
> > +trace_shutdown_race
> > +err=$?
> > +rm -f ${file}
> > +exit $err
> > -- 
> 
> Its failing with shellcheck, I'm trying to fix it:
> 
>   CC      /tmp/build/perf-tools-next/builtin-trace.o
>   TEST    /tmp/build/perf-tools-next/tests/shell/trace_exit_race.sh.shellcheck_log
> 
> In tests/shell/trace_exit_race.sh line 15:
> 	for i in $(seq 100); do
>         ^-^ SC2034 (warning): i appears unused. Verify use (or export if used externally).
> 
> 
> In tests/shell/trace_exit_race.sh line 18:
> 	[ $(grep -c -E " +[0-9]+\.[0-9]+ +true/[0-9]+ syscalls:sys_enter_exit_group\(\)$" $file) = "100" ]
>           ^-- SC2046 (warning): Quote this to prevent word splitting.
> 
> For more information:
>   https://www.shellcheck.net/wiki/SC2034 -- i appears unused. Verify use (or ...
>   https://www.shellcheck.net/wiki/SC2046 -- Quote this to prevent word splitt...
> make[4]: *** [tests/Build:91: /tmp/build/perf-tools-next/tests/shell/trace_exit_race.sh.shellcheck_log] Error 1
> make[3]: *** [/home/acme/git/perf-tools-next/tools/build/Makefile.build:158: tests] Error 2
> make[2]: *** [Makefile.perf:777: /tmp/build/perf-tools-next/perf-test-in.o] Error 2
> make[2]: *** Waiting for unfinished jobs....

I've read the links provided by ShellCheck and folded this to satisfy
it, please consider installing ShellCheck, the perf build process will
use it when avaiable.

- Arnaldo

diff --git a/tools/perf/tests/shell/trace_exit_race.sh b/tools/perf/tests/shell/trace_exit_race.sh
index 8b70324bc5b4fb4c..c37ed6bb9f7e8fab 100755
--- a/tools/perf/tests/shell/trace_exit_race.sh
+++ b/tools/perf/tests/shell/trace_exit_race.sh
@@ -12,10 +12,10 @@
 skip_if_no_perf_trace || exit 2
 
 trace_shutdown_race() {
-	for i in $(seq 100); do
+	for _ in $(seq 100); do
 		perf trace -e syscalls:sys_enter_exit_group true 2>>$file
 	done
-	[ $(grep -c -E " +[0-9]+\.[0-9]+ +true/[0-9]+ syscalls:sys_enter_exit_group\(\)$" $file) = "100" ]
+	[ "$(grep -c -E ' +[0-9]+\.[0-9]+ +true/[0-9]+ syscalls:sys_enter_exit_group\(\)$' $file)" = "100" ]
 }
 
 

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] perf tests: add test for trace output loss
  2024-11-14 17:43     ` Arnaldo Carvalho de Melo
@ 2024-11-14 17:44       ` Benjamin Peterson
  2024-11-14 20:09         ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 14+ messages in thread
From: Benjamin Peterson @ 2024-11-14 17:44 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Ingo Molnar, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	Liang, Kan, open list, open list:PERFORMANCE EVENTS SUBSYSTEM

On Thu, Nov 14, 2024 at 9:43 AM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> On Thu, Nov 14, 2024 at 02:37:52PM -0300, Arnaldo Carvalho de Melo wrote:
> > On Wed, Nov 06, 2024 at 11:45:18PM +0000, Benjamin Peterson wrote:
> > > Add a test that checks that trace output is not lost to races. This is
> > > accomplished by tracing the exit_group syscall of "true" multiple times and
> > > checking for correct output.
> > >
> > > Conveniently, this test also serves as a regression test for 5fb8e56542a3 ("perf
> > > trace: avoid garbage when not printing a trace event's arguments") because
> > > exit_group triggers the previously buggy printing behavior.
> > >
> > > Signed-off-by: Benjamin Peterson <benjamin@engflow.com>
> > > ---
> > >  tools/perf/tests/shell/trace_exit_race.sh | 31 +++++++++++++++++++++++
> > >  1 file changed, 31 insertions(+)
> > >  create mode 100755 tools/perf/tests/shell/trace_exit_race.sh
> > >
> > > diff --git a/tools/perf/tests/shell/trace_exit_race.sh b/tools/perf/tests/shell/trace_exit_race.sh
> > > new file mode 100755
> > > index 000000000000..8b70324bc5b4
> > > --- /dev/null
> > > +++ b/tools/perf/tests/shell/trace_exit_race.sh
> > > @@ -0,0 +1,31 @@
> > > +#!/bin/sh
> > > +# perf trace exit race
> > > +# SPDX-License-Identifier: GPL-2.0
> > > +
> > > +# Check that the last events of a perf trace'd subprocess are not
> > > +# lost. Specifically, trace the exiting syscall of "true" 100 times and ensure
> > > +# the output contains 100 correct lines.
> > > +
> > > +# shellcheck source=lib/probe.sh
> > > +. "$(dirname $0)"/lib/probe.sh
> > > +
> > > +skip_if_no_perf_trace || exit 2
> > > +
> > > +trace_shutdown_race() {
> > > +   for i in $(seq 100); do
> > > +           perf trace -e syscalls:sys_enter_exit_group true 2>>$file
> > > +   done
> > > +   [ $(grep -c -E " +[0-9]+\.[0-9]+ +true/[0-9]+ syscalls:sys_enter_exit_group\(\)$" $file) = "100" ]
> > > +}
> > > +
> > > +
> > > +file=$(mktemp /tmp/temporary_file.XXXXX)
> > > +
> > > +# Do not use whatever ~/.perfconfig file, it may change the output
> > > +# via trace.{show_timestamp,show_prefix,etc}
> > > +export PERF_CONFIG=/dev/null
> > > +
> > > +trace_shutdown_race
> > > +err=$?
> > > +rm -f ${file}
> > > +exit $err
> > > --
> >
> > Its failing with shellcheck, I'm trying to fix it:
> >
> >   CC      /tmp/build/perf-tools-next/builtin-trace.o
> >   TEST    /tmp/build/perf-tools-next/tests/shell/trace_exit_race.sh.shellcheck_log
> >
> > In tests/shell/trace_exit_race.sh line 15:
> >       for i in $(seq 100); do
> >         ^-^ SC2034 (warning): i appears unused. Verify use (or export if used externally).
> >
> >
> > In tests/shell/trace_exit_race.sh line 18:
> >       [ $(grep -c -E " +[0-9]+\.[0-9]+ +true/[0-9]+ syscalls:sys_enter_exit_group\(\)$" $file) = "100" ]
> >           ^-- SC2046 (warning): Quote this to prevent word splitting.
> >
> > For more information:
> >   https://www.shellcheck.net/wiki/SC2034 -- i appears unused. Verify use (or ...
> >   https://www.shellcheck.net/wiki/SC2046 -- Quote this to prevent word splitt...
> > make[4]: *** [tests/Build:91: /tmp/build/perf-tools-next/tests/shell/trace_exit_race.sh.shellcheck_log] Error 1
> > make[3]: *** [/home/acme/git/perf-tools-next/tools/build/Makefile.build:158: tests] Error 2
> > make[2]: *** [Makefile.perf:777: /tmp/build/perf-tools-next/perf-test-in.o] Error 2
> > make[2]: *** Waiting for unfinished jobs....
>
> I've read the links provided by ShellCheck and folded this to satisfy
> it, please consider installing ShellCheck, the perf build process will
> use it when avaiable.

Thank you.

>
> - Arnaldo
>
> diff --git a/tools/perf/tests/shell/trace_exit_race.sh b/tools/perf/tests/shell/trace_exit_race.sh
> index 8b70324bc5b4fb4c..c37ed6bb9f7e8fab 100755
> --- a/tools/perf/tests/shell/trace_exit_race.sh
> +++ b/tools/perf/tests/shell/trace_exit_race.sh
> @@ -12,10 +12,10 @@
>  skip_if_no_perf_trace || exit 2
>
>  trace_shutdown_race() {
> -       for i in $(seq 100); do
> +       for _ in $(seq 100); do
>                 perf trace -e syscalls:sys_enter_exit_group true 2>>$file
>         done
> -       [ $(grep -c -E " +[0-9]+\.[0-9]+ +true/[0-9]+ syscalls:sys_enter_exit_group\(\)$" $file) = "100" ]
> +       [ "$(grep -c -E ' +[0-9]+\.[0-9]+ +true/[0-9]+ syscalls:sys_enter_exit_group\(\)$' $file)" = "100" ]
>  }

This all looks okay.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] perf tests: add test for trace output loss
  2024-11-14 17:44       ` Benjamin Peterson
@ 2024-11-14 20:09         ` Arnaldo Carvalho de Melo
  2024-11-14 20:16           ` Howard Chu
  2024-11-14 20:50           ` Arnaldo Carvalho de Melo
  0 siblings, 2 replies; 14+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-11-14 20:09 UTC (permalink / raw)
  To: Benjamin Peterson
  Cc: Peter Zijlstra, Ingo Molnar, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	Liang, Kan, open list, open list:PERFORMANCE EVENTS SUBSYSTEM

On Thu, Nov 14, 2024 at 09:44:56AM -0800, Benjamin Peterson wrote:
> > On Thu, Nov 14, 2024 at 02:37:52PM -0300, Arnaldo Carvalho de Melo wrote:
> > diff --git a/tools/perf/tests/shell/trace_exit_race.sh b/tools/perf/tests/shell/trace_exit_race.sh
> > index 8b70324bc5b4fb4c..c37ed6bb9f7e8fab 100755
> > --- a/tools/perf/tests/shell/trace_exit_race.sh
> > +++ b/tools/perf/tests/shell/trace_exit_race.sh
> > @@ -12,10 +12,10 @@
> >  skip_if_no_perf_trace || exit 2

> >  trace_shutdown_race() {
> > -       for i in $(seq 100); do
> > +       for _ in $(seq 100); do
> >                 perf trace -e syscalls:sys_enter_exit_group true 2>>$file
> >         done
> > -       [ $(grep -c -E " +[0-9]+\.[0-9]+ +true/[0-9]+ syscalls:sys_enter_exit_group\(\)$" $file) = "100" ]
> > +       [ "$(grep -c -E ' +[0-9]+\.[0-9]+ +true/[0-9]+ syscalls:sys_enter_exit_group\(\)$' $file)" = "100" ]
> >  }
 
> This all looks okay.

The test is failing for me as there is garbage in the output of perf
trace even with Namhyung having already applied your previous patch:

⬢ [acme@toolbox perf-tools-next]$ git log --oneline --author benjamin@engflow.com
12bd434f68ea45c7 (HEAD -> perf-tools-next) perf tests: Add test for 'perf trace' output loss
efbcd2cd7eac10a9 perf trace: Do not lose last events in a race
5fb8e56542a3cf46 (perf-tools-next/tmp.perf-tools-next) perf trace: avoid garbage when not printing a trace event's arguments
⬢ [acme@toolbox perf-tools-next]$

root@number:~# perf trace -e syscalls:sys_enter_exit_group true
     0.000 true/1982428 syscalls:sys_enter_exit_group( k?m)
root@number:~# perf trace -e syscalls:sys_enter_exit_group true
     0.000 true/1982430 syscalls:sys_enter_exit_group()
root@number:~# perf trace -e syscalls:sys_enter_exit_group true
     0.000 true/1982432 syscalls:sys_enter_exit_group()
root@number:~# perf trace -e syscalls:sys_enter_exit_group true
     0.000 true/1982434 syscalls:sys_enter_exit_group()
root@number:~# perf trace -e syscalls:sys_enter_exit_group true
     0.000 true/1982437 syscalls:sys_enter_exit_group( k�)
root@number:~# perf trace -e syscalls:sys_enter_exit_group true
     0.000 true/1982439 syscalls:sys_enter_exit_group(, loads 8��1)
root@number:~#

So we don't _miss_ the events, which I was noticing and brought me to
test your latest 2 patches, which I applied and added a Tested-by, now
tryint to figure out this garbage...

- Arnaldo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] perf tests: add test for trace output loss
  2024-11-14 20:09         ` Arnaldo Carvalho de Melo
@ 2024-11-14 20:16           ` Howard Chu
  2024-11-14 21:03             ` Arnaldo Carvalho de Melo
  2024-11-14 20:50           ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 14+ messages in thread
From: Howard Chu @ 2024-11-14 20:16 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Benjamin Peterson, Peter Zijlstra, Ingo Molnar, Namhyung Kim,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, Liang, Kan, open list,
	open list:PERFORMANCE EVENTS SUBSYSTEM

Hello,

On Thu, Nov 14, 2024 at 12:09 PM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> On Thu, Nov 14, 2024 at 09:44:56AM -0800, Benjamin Peterson wrote:
> > > On Thu, Nov 14, 2024 at 02:37:52PM -0300, Arnaldo Carvalho de Melo wrote:
> > > diff --git a/tools/perf/tests/shell/trace_exit_race.sh b/tools/perf/tests/shell/trace_exit_race.sh
> > > index 8b70324bc5b4fb4c..c37ed6bb9f7e8fab 100755
> > > --- a/tools/perf/tests/shell/trace_exit_race.sh
> > > +++ b/tools/perf/tests/shell/trace_exit_race.sh
> > > @@ -12,10 +12,10 @@
> > >  skip_if_no_perf_trace || exit 2
>
> > >  trace_shutdown_race() {
> > > -       for i in $(seq 100); do
> > > +       for _ in $(seq 100); do
> > >                 perf trace -e syscalls:sys_enter_exit_group true 2>>$file
> > >         done
> > > -       [ $(grep -c -E " +[0-9]+\.[0-9]+ +true/[0-9]+ syscalls:sys_enter_exit_group\(\)$" $file) = "100" ]
> > > +       [ "$(grep -c -E ' +[0-9]+\.[0-9]+ +true/[0-9]+ syscalls:sys_enter_exit_group\(\)$' $file)" = "100" ]
> > >  }
>
> > This all looks okay.
>
> The test is failing for me as there is garbage in the output of perf
> trace even with Namhyung having already applied your previous patch:
>
> ⬢ [acme@toolbox perf-tools-next]$ git log --oneline --author benjamin@engflow.com
> 12bd434f68ea45c7 (HEAD -> perf-tools-next) perf tests: Add test for 'perf trace' output loss
> efbcd2cd7eac10a9 perf trace: Do not lose last events in a race
> 5fb8e56542a3cf46 (perf-tools-next/tmp.perf-tools-next) perf trace: avoid garbage when not printing a trace event's arguments
> ⬢ [acme@toolbox perf-tools-next]$
>
> root@number:~# perf trace -e syscalls:sys_enter_exit_group true
>      0.000 true/1982428 syscalls:sys_enter_exit_group( k?m)
> root@number:~# perf trace -e syscalls:sys_enter_exit_group true
>      0.000 true/1982430 syscalls:sys_enter_exit_group()
> root@number:~# perf trace -e syscalls:sys_enter_exit_group true
>      0.000 true/1982432 syscalls:sys_enter_exit_group()
> root@number:~# perf trace -e syscalls:sys_enter_exit_group true
>      0.000 true/1982434 syscalls:sys_enter_exit_group()
> root@number:~# perf trace -e syscalls:sys_enter_exit_group true
>      0.000 true/1982437 syscalls:sys_enter_exit_group( k�)
> root@number:~# perf trace -e syscalls:sys_enter_exit_group true
>      0.000 true/1982439 syscalls:sys_enter_exit_group(, loads 8��1)
> root@number:~#
>
> So we don't _miss_ the events, which I was noticing and brought me to
> test your latest 2 patches, which I applied and added a Tested-by, now
> tryint to figure out this garbage...

Benjamin has already sent v2 to fix this:
https://lore.kernel.org/linux-perf-users/20241107232128.108981-2-benjamin@engflow.com/

Thanks,
Howard

>
> - Arnaldo
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] perf tests: add test for trace output loss
  2024-11-14 20:09         ` Arnaldo Carvalho de Melo
  2024-11-14 20:16           ` Howard Chu
@ 2024-11-14 20:50           ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 14+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-11-14 20:50 UTC (permalink / raw)
  To: Benjamin Peterson
  Cc: Peter Zijlstra, Ingo Molnar, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	Liang, Kan, open list, open list:PERFORMANCE EVENTS SUBSYSTEM

On Thu, Nov 14, 2024 at 05:09:17PM -0300, Arnaldo Carvalho de Melo wrote:
> On Thu, Nov 14, 2024 at 09:44:56AM -0800, Benjamin Peterson wrote:
> > > On Thu, Nov 14, 2024 at 02:37:52PM -0300, Arnaldo Carvalho de Melo wrote:
> > > diff --git a/tools/perf/tests/shell/trace_exit_race.sh b/tools/perf/tests/shell/trace_exit_race.sh
> > > index 8b70324bc5b4fb4c..c37ed6bb9f7e8fab 100755
> > > --- a/tools/perf/tests/shell/trace_exit_race.sh
> > > +++ b/tools/perf/tests/shell/trace_exit_race.sh
> > > @@ -12,10 +12,10 @@
> > >  skip_if_no_perf_trace || exit 2
> 
> > >  trace_shutdown_race() {
> > > -       for i in $(seq 100); do
> > > +       for _ in $(seq 100); do
> > >                 perf trace -e syscalls:sys_enter_exit_group true 2>>$file
> > >         done
> > > -       [ $(grep -c -E " +[0-9]+\.[0-9]+ +true/[0-9]+ syscalls:sys_enter_exit_group\(\)$" $file) = "100" ]
> > > +       [ "$(grep -c -E ' +[0-9]+\.[0-9]+ +true/[0-9]+ syscalls:sys_enter_exit_group\(\)$' $file)" = "100" ]
> > >  }
>  
> > This all looks okay.
> 
> The test is failing for me as there is garbage in the output of perf
> trace even with Namhyung having already applied your previous patch:
> 
> ⬢ [acme@toolbox perf-tools-next]$ git log --oneline --author benjamin@engflow.com
> 12bd434f68ea45c7 (HEAD -> perf-tools-next) perf tests: Add test for 'perf trace' output loss
> efbcd2cd7eac10a9 perf trace: Do not lose last events in a race
> 5fb8e56542a3cf46 (perf-tools-next/tmp.perf-tools-next) perf trace: avoid garbage when not printing a trace event's arguments
> ⬢ [acme@toolbox perf-tools-next]$
> 
> root@number:~# perf trace -e syscalls:sys_enter_exit_group true
>      0.000 true/1982428 syscalls:sys_enter_exit_group( k?m)
> root@number:~# perf trace -e syscalls:sys_enter_exit_group true
>      0.000 true/1982430 syscalls:sys_enter_exit_group()
> root@number:~# perf trace -e syscalls:sys_enter_exit_group true
>      0.000 true/1982432 syscalls:sys_enter_exit_group()
> root@number:~# perf trace -e syscalls:sys_enter_exit_group true
>      0.000 true/1982434 syscalls:sys_enter_exit_group()
> root@number:~# perf trace -e syscalls:sys_enter_exit_group true
>      0.000 true/1982437 syscalls:sys_enter_exit_group( k�)
> root@number:~# perf trace -e syscalls:sys_enter_exit_group true
>      0.000 true/1982439 syscalls:sys_enter_exit_group(, loads 8��1)
> root@number:~#
> 
> So we don't _miss_ the events, which I was noticing and brought me to
> test your latest 2 patches, which I applied and added a Tested-by, now
> tryint to figure out this garbage...

So, I fixed it with this simple patch:

And then, after adding this patch to the 'perf test' for the exit race
that you contributed:

⬢ [acme@toolbox perf-tools-next]$ git diff tools/perf/tests/
diff --git a/tools/perf/tests/shell/trace_exit_race.sh b/tools/perf/tests/shell/trace_exit_race.sh
index c37ed6bb9f7e8fab..2eb39abaadffc344 100755
--- a/tools/perf/tests/shell/trace_exit_race.sh
+++ b/tools/perf/tests/shell/trace_exit_race.sh
@@ -11,11 +11,17 @@
 
 skip_if_no_perf_trace || exit 2
 
+if [ "$1" = "-v" ]; then
+        verbose="1"
+fi
+
+regexp=" +[0-9]+\.[0-9]+ +true/[0-9]+ syscalls:sys_enter_exit_group\(\)$"
+
 trace_shutdown_race() {
        for _ in $(seq 100); do
                perf trace -e syscalls:sys_enter_exit_group true 2>>$file
        done
-       [ "$(grep -c -E ' +[0-9]+\.[0-9]+ +true/[0-9]+ syscalls:sys_enter_exit_group\(\)$' $file)" = "100" ]
+       [ "$(grep -c -E '$regexp' $file)" = "100" ]
 }
 
 
@@ -27,5 +33,11 @@ export PERF_CONFIG=/dev/null
 
 trace_shutdown_race
 err=$?
+
+if [ $err != 0 ] && [ "${verbose}" = "1" ]; then
+       echo "Lines not matching the expected regexp: '$regexp':"
+       grep -v -E "$regexp" $file
+fi
+
 rm -f ${file}
 exit $err
⬢ [acme@toolbox perf-tools-next]

⬢ [acme@toolbox perf-tools-next]$ git diff tools/perf/builtin-trace.c
diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index ca5c3f5d6c389427..0834a11940c18f05 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -2405,6 +2405,9 @@ static size_t syscall__scnprintf_args(struct syscall *sc, char *bf, size_t size,
                }
        }
 
+       if (printed == 0)
+               bf[0] = '\0';
+
        return printed;
 }
 
⬢ [acme@toolbox perf-tools-next]$

The test still fails but for another reason:

root@number:~# perf test -v "trace exit race"
--- start ---
test child forked, pid 2002298
Lines not matching the expected regexp: ' +[0-9]+\.[0-9]+ +true/[0-9]+ syscalls:sys_enter_exit_group\(\)$':
     0.000 :2002317/2002317 syscalls:sys_enter_exit_group()
     0.000 :2002319/2002319 syscalls:sys_enter_exit_group()
     0.000 :2002353/2002353 syscalls:sys_enter_exit_group()
     0.000 :2002357/2002357 syscalls:sys_enter_exit_group()
     0.000 :2002376/2002376 syscalls:sys_enter_exit_group()
     0.000 :2002378/2002378 syscalls:sys_enter_exit_group()
     0.000 :2002393/2002393 syscalls:sys_enter_exit_group()
     0.000 :2002444/2002444 syscalls:sys_enter_exit_group()
     0.000 :2002491/2002491 syscalls:sys_enter_exit_group()
     0.000 :2002503/2002503 syscalls:sys_enter_exit_group()
---- end(-1) ----
110: perf trace exit race                                            : FAILED!
root@number:~# 

Somehow we're missing the PERF_RECORD_COMM or racing to get it from
/proc for those processes and thus we end up printing just the pid/ppid

Since what we are testing here is garbage when printing the argument
list, we can just disable comm printing and get consistent results:

root@number:~# perf trace --no-comm -e syscalls:sys_enter_exit_group true
     0.000 2003312 syscalls:sys_enter_exit_group()
root@number:~#

I'll take this route and sent a patch series.

- Arnaldo

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] perf tests: add test for trace output loss
  2024-11-14 20:16           ` Howard Chu
@ 2024-11-14 21:03             ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 14+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-11-14 21:03 UTC (permalink / raw)
  To: Howard Chu
  Cc: Benjamin Peterson, Peter Zijlstra, Ingo Molnar, Namhyung Kim,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, Liang, Kan, open list,
	open list:PERFORMANCE EVENTS SUBSYSTEM

On Thu, Nov 14, 2024 at 12:16:46PM -0800, Howard Chu wrote:
> Hello,
> 
> On Thu, Nov 14, 2024 at 12:09 PM Arnaldo Carvalho de Melo
> <acme@kernel.org> wrote:
> >
> > On Thu, Nov 14, 2024 at 09:44:56AM -0800, Benjamin Peterson wrote:
> > > > On Thu, Nov 14, 2024 at 02:37:52PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > diff --git a/tools/perf/tests/shell/trace_exit_race.sh b/tools/perf/tests/shell/trace_exit_race.sh
> > > > index 8b70324bc5b4fb4c..c37ed6bb9f7e8fab 100755
> > > > --- a/tools/perf/tests/shell/trace_exit_race.sh
> > > > +++ b/tools/perf/tests/shell/trace_exit_race.sh
> > > > @@ -12,10 +12,10 @@
> > > >  skip_if_no_perf_trace || exit 2
> >
> > > >  trace_shutdown_race() {
> > > > -       for i in $(seq 100); do
> > > > +       for _ in $(seq 100); do
> > > >                 perf trace -e syscalls:sys_enter_exit_group true 2>>$file
> > > >         done
> > > > -       [ $(grep -c -E " +[0-9]+\.[0-9]+ +true/[0-9]+ syscalls:sys_enter_exit_group\(\)$" $file) = "100" ]
> > > > +       [ "$(grep -c -E ' +[0-9]+\.[0-9]+ +true/[0-9]+ syscalls:sys_enter_exit_group\(\)$' $file)" = "100" ]
> > > >  }
> >
> > > This all looks okay.
> >
> > The test is failing for me as there is garbage in the output of perf
> > trace even with Namhyung having already applied your previous patch:
> >
> > ⬢ [acme@toolbox perf-tools-next]$ git log --oneline --author benjamin@engflow.com
> > 12bd434f68ea45c7 (HEAD -> perf-tools-next) perf tests: Add test for 'perf trace' output loss
> > efbcd2cd7eac10a9 perf trace: Do not lose last events in a race
> > 5fb8e56542a3cf46 (perf-tools-next/tmp.perf-tools-next) perf trace: avoid garbage when not printing a trace event's arguments
> > ⬢ [acme@toolbox perf-tools-next]$
> >
> > root@number:~# perf trace -e syscalls:sys_enter_exit_group true
> >      0.000 true/1982428 syscalls:sys_enter_exit_group( k?m)
> > root@number:~# perf trace -e syscalls:sys_enter_exit_group true
> >      0.000 true/1982430 syscalls:sys_enter_exit_group()
> > root@number:~# perf trace -e syscalls:sys_enter_exit_group true
> >      0.000 true/1982432 syscalls:sys_enter_exit_group()
> > root@number:~# perf trace -e syscalls:sys_enter_exit_group true
> >      0.000 true/1982434 syscalls:sys_enter_exit_group()
> > root@number:~# perf trace -e syscalls:sys_enter_exit_group true
> >      0.000 true/1982437 syscalls:sys_enter_exit_group( k�)
> > root@number:~# perf trace -e syscalls:sys_enter_exit_group true
> >      0.000 true/1982439 syscalls:sys_enter_exit_group(, loads 8��1)
> > root@number:~#
> >
> > So we don't _miss_ the events, which I was noticing and brought me to
> > test your latest 2 patches, which I applied and added a Tested-by, now
> > tryint to figure out this garbage...
> 
> Benjamin has already sent v2 to fix this:
> https://lore.kernel.org/linux-perf-users/20241107232128.108981-2-benjamin@engflow.com/

Interesting, I wonder why b4 didn't pick the v2 :-\

I'll try it, but see my other message, maybe the way I fixed is simpler?
Its a two liner that could be turned into a one liner by right at the
start of the function setting bf[0] = '\0', that way we wouldn't need
even to check if printed == 0 at the end.

Anyway, will test the newer version.

- Arnaldo

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2024-11-14 21:03 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-06 23:45 [PATCH 1/2] perf trace: do not lose last events in a race Benjamin Peterson
2024-11-06 23:45 ` [PATCH 2/2] perf tests: add test for trace output loss Benjamin Peterson
2024-11-07 22:06   ` Howard Chu
2024-11-07 23:17     ` Benjamin Peterson
2024-11-08 16:20       ` Howard Chu
2024-11-08 16:59         ` Benjamin Peterson
2024-11-14 17:37   ` Arnaldo Carvalho de Melo
2024-11-14 17:43     ` Arnaldo Carvalho de Melo
2024-11-14 17:44       ` Benjamin Peterson
2024-11-14 20:09         ` Arnaldo Carvalho de Melo
2024-11-14 20:16           ` Howard Chu
2024-11-14 21:03             ` Arnaldo Carvalho de Melo
2024-11-14 20:50           ` Arnaldo Carvalho de Melo
2024-11-07 21:46 ` [PATCH 1/2] perf trace: do not lose last events in a race Howard Chu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).