Linux Trace Kernel
 help / color / mirror / Atom feed
* [PATCH 0/5] rtla/timerlat: Stop on signal properly when overloaded
@ 2025-01-16 14:49 Tomas Glozar
  2025-01-16 14:49 ` [PATCH 1/5] rtla: Add trace_instance_stop Tomas Glozar
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: Tomas Glozar @ 2025-01-16 14:49 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-trace-kernel, linux-kernel, John Kacur, Luis Goncalves,
	Gabriele Monaco, Tomas Glozar

We have been seeing an issue where if rtla is run on machines with a high
number of CPUs (100+), timerlat can generate more samples than rtla is able
to process via tracefs_iterate_raw_events. This is especially common when
the interval is set to 100us (rteval and cyclictest default) as opposed
to the rtla default of 1000us, but also happens with the rtla default.

Currently, this leads to rtla hanging and having to be terminated with
SIGTERM. SIGINT setting stop_tracing is not enough, since more and more
events are coming and tracefs_iterate_raw_events never exits.

This patchset contains two changes:
- Stop the timerlat tracer on SIGINT/SIGALRM to ensure no more events are
generated when rtla is supposed exit. This fixes rtla hanging and should
go to stable.
- On receiving SIGINT/SIGALRM twice, abort iteration immediately with
tracefs_iterate_stop, making rtla exit right away instead of waiting for
all events to be processed. This is more of a usability feature: if the user
is in a hurry, they can Ctrl-C twice (or once after the duration has expired)
and exit immediately, discarding any events pending processing.

Note: I am sending those together only because the second one depends on
the first. Also this should be fixed in osnoise, too.

In the future, two more patchsets will be sent: one to display how many
events/samples were dropped (either left in tracefs buffer or by buffer
overflow), one to improve sample processing performance to be on par with
cyclictest (ideally) so that samples are not dropped in the cases mentioned
in the beginning of the email.

Tomas Glozar (5):
  rtla: Add trace_instance_stop
  rtla/timerlat_hist: Stop timerlat tracer on signal
  rtla/timerlat_top: Stop timerlat tracer on signal
  rtla/timerlat_hist: Abort event processing on second signal
  rtla/timerlat_top: Abort event processing on second signal

 tools/tracing/rtla/src/timerlat_hist.c | 19 ++++++++++++++++++-
 tools/tracing/rtla/src/timerlat_top.c  | 20 +++++++++++++++++++-
 tools/tracing/rtla/src/trace.c         |  8 ++++++++
 tools/tracing/rtla/src/trace.h         |  1 +
 4 files changed, 46 insertions(+), 2 deletions(-)

-- 
2.47.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/5] rtla: Add trace_instance_stop
  2025-01-16 14:49 [PATCH 0/5] rtla/timerlat: Stop on signal properly when overloaded Tomas Glozar
@ 2025-01-16 14:49 ` Tomas Glozar
  2025-01-16 14:49 ` [PATCH 2/5] rtla/timerlat_hist: Stop timerlat tracer on signal Tomas Glozar
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Tomas Glozar @ 2025-01-16 14:49 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-trace-kernel, linux-kernel, John Kacur, Luis Goncalves,
	Gabriele Monaco, Tomas Glozar

Support not only turning trace on for thet timerlat tracer, but also
turning it off.

This will be used in subsequent patches to stop the timerlat tracer
without also wiping the trace buffer.

Cc: stable@vger.kernel.org
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
---
 tools/tracing/rtla/src/trace.c | 8 ++++++++
 tools/tracing/rtla/src/trace.h | 1 +
 2 files changed, 9 insertions(+)

diff --git a/tools/tracing/rtla/src/trace.c b/tools/tracing/rtla/src/trace.c
index 170a706248ab..440323a997c6 100644
--- a/tools/tracing/rtla/src/trace.c
+++ b/tools/tracing/rtla/src/trace.c
@@ -196,6 +196,14 @@ int trace_instance_start(struct trace_instance *trace)
 	return tracefs_trace_on(trace->inst);
 }
 
+/*
+ * trace_instance_stop - stop tracing a given rtla instance
+ */
+int trace_instance_stop(struct trace_instance *trace)
+{
+	return tracefs_trace_off(trace->inst);
+}
+
 /*
  * trace_events_free - free a list of trace events
  */
diff --git a/tools/tracing/rtla/src/trace.h b/tools/tracing/rtla/src/trace.h
index c7c92dc9a18a..76e1b77291ba 100644
--- a/tools/tracing/rtla/src/trace.h
+++ b/tools/tracing/rtla/src/trace.h
@@ -21,6 +21,7 @@ struct trace_instance {
 
 int trace_instance_init(struct trace_instance *trace, char *tool_name);
 int trace_instance_start(struct trace_instance *trace);
+int trace_instance_stop(struct trace_instance *trace);
 void trace_instance_destroy(struct trace_instance *trace);
 
 struct trace_seq *get_trace_seq(void);
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/5] rtla/timerlat_hist: Stop timerlat tracer on signal
  2025-01-16 14:49 [PATCH 0/5] rtla/timerlat: Stop on signal properly when overloaded Tomas Glozar
  2025-01-16 14:49 ` [PATCH 1/5] rtla: Add trace_instance_stop Tomas Glozar
@ 2025-01-16 14:49 ` Tomas Glozar
  2025-01-16 14:49 ` [PATCH 3/5] rtla/timerlat_top: " Tomas Glozar
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Tomas Glozar @ 2025-01-16 14:49 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-trace-kernel, linux-kernel, John Kacur, Luis Goncalves,
	Gabriele Monaco, Tomas Glozar

Currently, when either SIGINT from the user or SIGALRM from the duration
timer is caught by rtla-timerlat, stop_tracing is set to break out of
the main loop. This is not sufficient for cases where the timerlat
tracer is producing more data than rtla can consume, since in that case,
rtla is looping indefinitely inside tracefs_iterate_raw_events, never
reaches the check of stop_tracing and hangs.

In addition to setting stop_tracing, also stop the timerlat tracer on
received signal (SIGINT or SIGALRM). This will stop new samples so that
the existing samples may be processed and tracefs_iterate_raw_events
eventually exits.

Cc: stable@vger.kernel.org
Fixes: 1eeb6328e8b3 ("rtla/timerlat: Add timerlat hist mode")
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
---
 tools/tracing/rtla/src/timerlat_hist.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/tools/tracing/rtla/src/timerlat_hist.c b/tools/tracing/rtla/src/timerlat_hist.c
index 4403cc4eba30..e8d249e22251 100644
--- a/tools/tracing/rtla/src/timerlat_hist.c
+++ b/tools/tracing/rtla/src/timerlat_hist.c
@@ -1146,9 +1146,12 @@ static struct osnoise_tool
 }
 
 static int stop_tracing;
+static struct trace_instance *hist_inst = NULL;
 static void stop_hist(int sig)
 {
 	stop_tracing = 1;
+	if (hist_inst)
+		trace_instance_stop(hist_inst);
 }
 
 /*
@@ -1195,6 +1198,12 @@ int timerlat_hist_main(int argc, char *argv[])
 	}
 
 	trace = &tool->trace;
+	/*
+	 * Save trace instance into global variable so that SIGINT can stop
+	 * the timerlat tracer.
+	 * Otherwise, rtla could loop indefinitely when overloaded.
+	 */
+	hist_inst = trace;
 
 	retval = enable_timerlat(trace);
 	if (retval) {
@@ -1363,7 +1372,7 @@ int timerlat_hist_main(int argc, char *argv[])
 
 	return_value = 0;
 
-	if (trace_is_off(&tool->trace, &record->trace)) {
+	if (trace_is_off(&tool->trace, &record->trace) && !stop_tracing) {
 		printf("rtla timerlat hit stop tracing\n");
 
 		if (!params->no_aa)
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/5] rtla/timerlat_top: Stop timerlat tracer on signal
  2025-01-16 14:49 [PATCH 0/5] rtla/timerlat: Stop on signal properly when overloaded Tomas Glozar
  2025-01-16 14:49 ` [PATCH 1/5] rtla: Add trace_instance_stop Tomas Glozar
  2025-01-16 14:49 ` [PATCH 2/5] rtla/timerlat_hist: Stop timerlat tracer on signal Tomas Glozar
@ 2025-01-16 14:49 ` Tomas Glozar
  2025-01-17  0:56   ` Steven Rostedt
  2025-01-16 14:49 ` [PATCH 4/5] rtla/timerlat_hist: Abort event processing on second signal Tomas Glozar
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Tomas Glozar @ 2025-01-16 14:49 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-trace-kernel, linux-kernel, John Kacur, Luis Goncalves,
	Gabriele Monaco, Tomas Glozar

Apply the changes from the previous patch also to timerlat-top.

Cc: stable@vger.kernel.org
Fixes: a828cd18bc4a ("rtla: Add timerlat tool and timelart top mode")
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
---
 tools/tracing/rtla/src/timerlat_top.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/tools/tracing/rtla/src/timerlat_top.c b/tools/tracing/rtla/src/timerlat_top.c
index 059b468981e4..d21a21053917 100644
--- a/tools/tracing/rtla/src/timerlat_top.c
+++ b/tools/tracing/rtla/src/timerlat_top.c
@@ -900,9 +900,12 @@ static struct osnoise_tool
 }
 
 static int stop_tracing;
+static struct trace_instance *top_inst = NULL;
 static void stop_top(int sig)
 {
 	stop_tracing = 1;
+	if (top_inst)
+		trace_instance_stop(top_inst);
 }
 
 /*
@@ -950,6 +953,13 @@ int timerlat_top_main(int argc, char *argv[])
 	}
 
 	trace = &top->trace;
+	/*
+	* Save trace instance into global variable so that SIGINT can stop
+	* the timerlat tracer.
+	* Otherwise, rtla could loop indefinitely when overloaded.
+	*/
+	top_inst = trace;
+
 
 	retval = enable_timerlat(trace);
 	if (retval) {
@@ -1131,7 +1141,7 @@ int timerlat_top_main(int argc, char *argv[])
 
 	return_value = 0;
 
-	if (trace_is_off(&top->trace, &record->trace)) {
+	if (trace_is_off(&top->trace, &record->trace) && !stop_tracing) {
 		printf("rtla timerlat hit stop tracing\n");
 
 		if (!params->no_aa)
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 4/5] rtla/timerlat_hist: Abort event processing on second signal
  2025-01-16 14:49 [PATCH 0/5] rtla/timerlat: Stop on signal properly when overloaded Tomas Glozar
                   ` (2 preceding siblings ...)
  2025-01-16 14:49 ` [PATCH 3/5] rtla/timerlat_top: " Tomas Glozar
@ 2025-01-16 14:49 ` Tomas Glozar
  2025-01-16 14:49 ` [PATCH 5/5] rtla/timerlat_top: " Tomas Glozar
  2025-01-17  0:46 ` [PATCH 0/5] rtla/timerlat: Stop on signal properly when overloaded Steven Rostedt
  5 siblings, 0 replies; 15+ messages in thread
From: Tomas Glozar @ 2025-01-16 14:49 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-trace-kernel, linux-kernel, John Kacur, Luis Goncalves,
	Gabriele Monaco, Tomas Glozar

If either SIGINT is received twice, or after a SIGALRM (that is, after
timerlat was supposed to stop), abort processing events currently left
in the tracefs buffer and exit immediately.

This allows the user to exit rtla without waiting for processing all
events, should that take longer than wanted, at the cost of not
processing all samples.

Signed-off-by: Tomas Glozar <tglozar@redhat.com>
---
 tools/tracing/rtla/src/timerlat_hist.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/tools/tracing/rtla/src/timerlat_hist.c b/tools/tracing/rtla/src/timerlat_hist.c
index e8d249e22251..d0a4d20b7196 100644
--- a/tools/tracing/rtla/src/timerlat_hist.c
+++ b/tools/tracing/rtla/src/timerlat_hist.c
@@ -1149,6 +1149,14 @@ static int stop_tracing;
 static struct trace_instance *hist_inst = NULL;
 static void stop_hist(int sig)
 {
+	if (stop_tracing) {
+		/*
+		 * Stop requested twice in a row; abort event processing and
+		 * exit immediately
+		 */
+		tracefs_iterate_stop(hist_inst->inst);
+		return;
+	}
 	stop_tracing = 1;
 	if (hist_inst)
 		trace_instance_stop(hist_inst);
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 5/5] rtla/timerlat_top: Abort event processing on second signal
  2025-01-16 14:49 [PATCH 0/5] rtla/timerlat: Stop on signal properly when overloaded Tomas Glozar
                   ` (3 preceding siblings ...)
  2025-01-16 14:49 ` [PATCH 4/5] rtla/timerlat_hist: Abort event processing on second signal Tomas Glozar
@ 2025-01-16 14:49 ` Tomas Glozar
  2025-01-17  0:57   ` Steven Rostedt
  2025-01-17  6:58   ` Gabriele Monaco
  2025-01-17  0:46 ` [PATCH 0/5] rtla/timerlat: Stop on signal properly when overloaded Steven Rostedt
  5 siblings, 2 replies; 15+ messages in thread
From: Tomas Glozar @ 2025-01-16 14:49 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-trace-kernel, linux-kernel, John Kacur, Luis Goncalves,
	Gabriele Monaco, Tomas Glozar

Apply the changes from the previous patch also to timerlat-top.

Signed-off-by: Tomas Glozar <tglozar@redhat.com>
---
 tools/tracing/rtla/src/timerlat_top.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/tools/tracing/rtla/src/timerlat_top.c b/tools/tracing/rtla/src/timerlat_top.c
index d21a21053917..d358cd39f360 100644
--- a/tools/tracing/rtla/src/timerlat_top.c
+++ b/tools/tracing/rtla/src/timerlat_top.c
@@ -903,6 +903,14 @@ static int stop_tracing;
 static struct trace_instance *top_inst = NULL;
 static void stop_top(int sig)
 {
+	if (stop_tracing) {
+		/*
+		 * Stop requested twice in a row; abort event processing and
+		 * exit immediately
+		 */
+		tracefs_iterate_stop(top_inst->inst);
+		return;
+	}
 	stop_tracing = 1;
 	if (top_inst)
 		trace_instance_stop(top_inst);
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/5] rtla/timerlat: Stop on signal properly when overloaded
  2025-01-16 14:49 [PATCH 0/5] rtla/timerlat: Stop on signal properly when overloaded Tomas Glozar
                   ` (4 preceding siblings ...)
  2025-01-16 14:49 ` [PATCH 5/5] rtla/timerlat_top: " Tomas Glozar
@ 2025-01-17  0:46 ` Steven Rostedt
  2025-01-17 12:04   ` Tomas Glozar
  5 siblings, 1 reply; 15+ messages in thread
From: Steven Rostedt @ 2025-01-17  0:46 UTC (permalink / raw)
  To: Tomas Glozar
  Cc: linux-trace-kernel, linux-kernel, John Kacur, Luis Goncalves,
	Gabriele Monaco

On Thu, 16 Jan 2025 15:49:26 +0100
Tomas Glozar <tglozar@redhat.com> wrote:

> In the future, two more patchsets will be sent: one to display how many
> events/samples were dropped (either left in tracefs buffer or by buffer
> overflow), one to improve sample processing performance to be on par with
> cyclictest (ideally) so that samples are not dropped in the cases mentioned
> in the beginning of the email.

Hmm, I wonder if timerlat can handle per cpu data, then you could kick off
a thread per CPU (or a set of CPUs) where the thread is responsible for
handling the data.


		CPU_ZERO_S(cpu_size, cpusetp);
		CPU_SET_S(cpu, cpu_size, cpusetp);
                retval = tracefs_iterate_raw_events(trace->tep,
                                trace->inst,
                                cpusetp,
                                cpu_size,
                                collect_registered_events,
                                                    trace);

And then that iteration will only read over a subset of CPUs. Each thread
can do a different subset and then it should be able to keep up.

-- Steve

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/5] rtla/timerlat_top: Stop timerlat tracer on signal
  2025-01-16 14:49 ` [PATCH 3/5] rtla/timerlat_top: " Tomas Glozar
@ 2025-01-17  0:56   ` Steven Rostedt
  2025-01-17  7:13     ` Tomas Glozar
  0 siblings, 1 reply; 15+ messages in thread
From: Steven Rostedt @ 2025-01-17  0:56 UTC (permalink / raw)
  To: Tomas Glozar
  Cc: linux-trace-kernel, linux-kernel, John Kacur, Luis Goncalves,
	Gabriele Monaco

On Thu, 16 Jan 2025 15:49:29 +0100
Tomas Glozar <tglozar@redhat.com> wrote:

> Apply the changes from the previous patch also to timerlat-top.

A change log should never reference another patch. This is meaningless when
seen in a git log. All change logs must be complete stand alone.

I copied the previous patch change log here:

    rtla/timerlat_top: Stop timerlat tracer on signal
    
    Currently, when either SIGINT from the user or SIGALRM from the duration
    timer is caught by rtla-timerlat, stop_tracing is set to break out of
    the main loop. This is not sufficient for cases where the timerlat
    tracer is producing more data than rtla can consume, since in that case,
    rtla is looping indefinitely inside tracefs_iterate_raw_events, never
    reaches the check of stop_tracing and hangs.
    
    In addition to setting stop_tracing, also stop the timerlat tracer on
    received signal (SIGINT or SIGALRM). This will stop new samples so that
    the existing samples may be processed and tracefs_iterate_raw_events
    eventually exits.

-- Steve

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 5/5] rtla/timerlat_top: Abort event processing on second signal
  2025-01-16 14:49 ` [PATCH 5/5] rtla/timerlat_top: " Tomas Glozar
@ 2025-01-17  0:57   ` Steven Rostedt
  2025-01-17  6:58   ` Gabriele Monaco
  1 sibling, 0 replies; 15+ messages in thread
From: Steven Rostedt @ 2025-01-17  0:57 UTC (permalink / raw)
  To: Tomas Glozar
  Cc: linux-trace-kernel, linux-kernel, John Kacur, Luis Goncalves,
	Gabriele Monaco

On Thu, 16 Jan 2025 15:49:31 +0100
Tomas Glozar <tglozar@redhat.com> wrote:

> Apply the changes from the previous patch also to timerlat-top.
> 

Same here. I replaced this with:

    rtla/timerlat_top: Abort event processing on second signal
    
    If either SIGINT is received twice, or after a SIGALRM (that is, after
    timerlat was supposed to stop), abort processing events currently left
    in the tracefs buffer and exit immediately.
    
    This allows the user to exit rtla without waiting for processing all
    events, should that take longer than wanted, at the cost of not
    processing all samples.

-- Steve

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 5/5] rtla/timerlat_top: Abort event processing on second signal
  2025-01-16 14:49 ` [PATCH 5/5] rtla/timerlat_top: " Tomas Glozar
  2025-01-17  0:57   ` Steven Rostedt
@ 2025-01-17  6:58   ` Gabriele Monaco
  1 sibling, 0 replies; 15+ messages in thread
From: Gabriele Monaco @ 2025-01-17  6:58 UTC (permalink / raw)
  To: Tomas Glozar, Steven Rostedt
  Cc: linux-trace-kernel, linux-kernel, John Kacur, Luis Goncalves

On Thu, 2025-01-16 at 15:49 +0100, Tomas Glozar wrote:
> Apply the changes from the previous patch also to timerlat-top.
> 
> Signed-off-by: Tomas Glozar <tglozar@redhat.com>
> ---
>  tools/tracing/rtla/src/timerlat_top.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/tools/tracing/rtla/src/timerlat_top.c
> b/tools/tracing/rtla/src/timerlat_top.c
> index d21a21053917..d358cd39f360 100644
> --- a/tools/tracing/rtla/src/timerlat_top.c
> +++ b/tools/tracing/rtla/src/timerlat_top.c
> @@ -903,6 +903,14 @@ static int stop_tracing;
>  static struct trace_instance *top_inst = NULL;
>  static void stop_top(int sig)
>  {
> +	if (stop_tracing) {
> +		/*
> +		 * Stop requested twice in a row; abort event
> processing and
> +		 * exit immediately
> +		 */
> +		tracefs_iterate_stop(top_inst->inst);
> +		return;
> +	}
>  	stop_tracing = 1;
>  	if (top_inst)
>  		trace_instance_stop(top_inst);

I confirm the patchset works as expected on a 128 cores machine.
That is like the machine where the problem was first observed: run
timerlat with -p 100 and it would hang.

Now running it with -d terminates (a bit) after the expiration of the
timer with a sane report, sending a SIGINT terminates it too and
sending 2 SIGINT terminates it almost instantaneously.	

This works on both timerlat top and hist, with both -u and -k .
The report only gets printed in the end (as if -q was passed), but
these patches are not meant to fix that.

Tested-by: Gabriele Monaco <gmonaco@redhat.com>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/5] rtla/timerlat_top: Stop timerlat tracer on signal
  2025-01-17  0:56   ` Steven Rostedt
@ 2025-01-17  7:13     ` Tomas Glozar
  2025-01-17 10:50       ` Steven Rostedt
  0 siblings, 1 reply; 15+ messages in thread
From: Tomas Glozar @ 2025-01-17  7:13 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-trace-kernel, linux-kernel, John Kacur, Luis Goncalves,
	Gabriele Monaco

pá 17. 1. 2025 v 1:56 odesílatel Steven Rostedt <rostedt@goodmis.org> napsal:
>
> On Thu, 16 Jan 2025 15:49:29 +0100
> Tomas Glozar <tglozar@redhat.com> wrote:
>
> > Apply the changes from the previous patch also to timerlat-top.
>
> A change log should never reference another patch. This is meaningless when
> seen in a git log. All change logs must be complete stand alone.

If you look up "previous patch" in the Linux commit log, you will find
a considerable amount of patches which do this:

$ git log master | grep 'previous patch' | wc -l
3006

But you are right, if I apply this patch for example to fix an old
version of rtla that does not have timerlat-hist, the log will make no
sense. I will stop doing it.

>
> I copied the previous patch change log here:
>

Thank you.

Tomas


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/5] rtla/timerlat_top: Stop timerlat tracer on signal
  2025-01-17  7:13     ` Tomas Glozar
@ 2025-01-17 10:50       ` Steven Rostedt
  0 siblings, 0 replies; 15+ messages in thread
From: Steven Rostedt @ 2025-01-17 10:50 UTC (permalink / raw)
  To: Tomas Glozar
  Cc: linux-trace-kernel, linux-kernel, John Kacur, Luis Goncalves,
	Gabriele Monaco

On Fri, 17 Jan 2025 08:13:26 +0100
Tomas Glozar <tglozar@redhat.com> wrote:

> > A change log should never reference another patch. This is meaningless when
> > seen in a git log. All change logs must be complete stand alone.  
> 
> If you look up "previous patch" in the Linux commit log, you will find
> a considerable amount of patches which do this:
> 
> $ git log master | grep 'previous patch' | wc -l
> 3006

As Linus has scolded me before with "Just because someone else did it wrong
doesn't give you the excuse to do it wrong too" ;-)

-- Steve

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/5] rtla/timerlat: Stop on signal properly when overloaded
  2025-01-17  0:46 ` [PATCH 0/5] rtla/timerlat: Stop on signal properly when overloaded Steven Rostedt
@ 2025-01-17 12:04   ` Tomas Glozar
  2025-01-17 15:29     ` Steven Rostedt
  0 siblings, 1 reply; 15+ messages in thread
From: Tomas Glozar @ 2025-01-17 12:04 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-trace-kernel, linux-kernel, John Kacur, Luis Goncalves,
	Gabriele Monaco

pá 17. 1. 2025 v 1:46 odesílatel Steven Rostedt <rostedt@goodmis.org> napsal:
> Hmm, I wonder if timerlat can handle per cpu data, then you could kick off
> a thread per CPU (or a set of CPUs) where the thread is responsible for
> handling the data.
>
>
>                 CPU_ZERO_S(cpu_size, cpusetp);
>                 CPU_SET_S(cpu, cpu_size, cpusetp);
>                 retval = tracefs_iterate_raw_events(trace->tep,
>                                 trace->inst,
>                                 cpusetp,
>                                 cpu_size,
>                                 collect_registered_events,
>                                                     trace);
>
> And then that iteration will only read over a subset of CPUs. Each thread
> can do a different subset and then it should be able to keep up.
>

That's a good idea, I didn't think of that. But it doesn't help much
in a scenario where rtla is pinned to a few housekeeping CPUs with -H,
which is used for testing isolated-CPU-based setups.

I was thinking of turning timerlat_hist_handler/timerlat_top_handler
into a BPF program and having it executed right after the sample is
created, e.g. by using the BPF perf interface to hook it to a
tracepoint event. The histogram/counter would be stored in BPF maps,
which would be merely copied over in the main loop. This is
essentially how cyclictest does it, except in userspace. I expect this
solution to have good performance, but the obvious downside is that it
requires BPF. This is not a problem for us, but might be for other
rtla users and we'd likely have to keep both implementations of sample
processing in the code.

Also, before even starting with that, it would be likely necessary to
remove the duplicate code throughout timerlat/osnoise and test it
properly, so we don't have to do the same code changes twice or four
times.

Tomas


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/5] rtla/timerlat: Stop on signal properly when overloaded
  2025-01-17 12:04   ` Tomas Glozar
@ 2025-01-17 15:29     ` Steven Rostedt
  2025-01-17 15:55       ` Tomas Glozar
  0 siblings, 1 reply; 15+ messages in thread
From: Steven Rostedt @ 2025-01-17 15:29 UTC (permalink / raw)
  To: Tomas Glozar
  Cc: linux-trace-kernel, linux-kernel, John Kacur, Luis Goncalves,
	Gabriele Monaco

On Fri, 17 Jan 2025 13:04:07 +0100
Tomas Glozar <tglozar@redhat.com> wrote:


> I was thinking of turning timerlat_hist_handler/timerlat_top_handler
> into a BPF program and having it executed right after the sample is
> created, e.g. by using the BPF perf interface to hook it to a
> tracepoint event. The histogram/counter would be stored in BPF maps,
> which would be merely copied over in the main loop. This is
> essentially how cyclictest does it, except in userspace. I expect this
> solution to have good performance, but the obvious downside is that it
> requires BPF. This is not a problem for us, but might be for other
> rtla users and we'd likely have to keep both implementations of sample
> processing in the code.
> 
> Also, before even starting with that, it would be likely necessary to
> remove the duplicate code throughout timerlat/osnoise and test it
> properly, so we don't have to do the same code changes twice or four
> times.

We could also add kernel helpers to the code if it would help.

Hmm, the timerlat event could probably get access to a trigger to allow it
to do the work in the kernel like what the 'hist' triggers do. We can
extend on that.

The reason I haven't written a BPF program yet, is because when I feel
there's a useful operation that can be done, I just extend ftrace to do it
;-)

-- Steve

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/5] rtla/timerlat: Stop on signal properly when overloaded
  2025-01-17 15:29     ` Steven Rostedt
@ 2025-01-17 15:55       ` Tomas Glozar
  0 siblings, 0 replies; 15+ messages in thread
From: Tomas Glozar @ 2025-01-17 15:55 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-trace-kernel, linux-kernel, John Kacur, Luis Goncalves,
	Gabriele Monaco

pá 17. 1. 2025 v 16:29 odesílatel Steven Rostedt <rostedt@goodmis.org> napsal:
> Hmm, the timerlat event could probably get access to a trigger to allow it
> to do the work in the kernel like what the 'hist' triggers do. We can
> extend on that.
>

If we only need to count the numbers, that would be a perfect
solution: use the hist trigger in place of the BPF program (no need to
reinvent the wheel!). I'll have to look at exactly what we need.

> The reason I haven't written a BPF program yet, is because when I feel
> there's a useful operation that can be done, I just extend ftrace to do it
> ;-)

I used to work with BPF, so I usually think of a BPF solution first,
then I think about whether I can replace the BPF part :D

Tomas


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2025-01-17 15:55 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-16 14:49 [PATCH 0/5] rtla/timerlat: Stop on signal properly when overloaded Tomas Glozar
2025-01-16 14:49 ` [PATCH 1/5] rtla: Add trace_instance_stop Tomas Glozar
2025-01-16 14:49 ` [PATCH 2/5] rtla/timerlat_hist: Stop timerlat tracer on signal Tomas Glozar
2025-01-16 14:49 ` [PATCH 3/5] rtla/timerlat_top: " Tomas Glozar
2025-01-17  0:56   ` Steven Rostedt
2025-01-17  7:13     ` Tomas Glozar
2025-01-17 10:50       ` Steven Rostedt
2025-01-16 14:49 ` [PATCH 4/5] rtla/timerlat_hist: Abort event processing on second signal Tomas Glozar
2025-01-16 14:49 ` [PATCH 5/5] rtla/timerlat_top: " Tomas Glozar
2025-01-17  0:57   ` Steven Rostedt
2025-01-17  6:58   ` Gabriele Monaco
2025-01-17  0:46 ` [PATCH 0/5] rtla/timerlat: Stop on signal properly when overloaded Steven Rostedt
2025-01-17 12:04   ` Tomas Glozar
2025-01-17 15:29     ` Steven Rostedt
2025-01-17 15:55       ` Tomas Glozar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox