Re: [PATCH] tracing/timerlat: Add latency threshold

linux-trace-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Daniel Bristot de Oliveira <bristot@kernel.org>
To: Costa Shulyupin <costa.shul@redhat.com>, linux-rt-users@vger.kernel.org
Cc: Steven Rostedt <rostedt@goodmis.org>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Jonathan Corbet <corbet@lwn.net>,
	"open list:TRACING" <linux-kernel@vger.kernel.org>,
	"open list:TRACING" <linux-trace-kernel@vger.kernel.org>,
	"open list:DOCUMENTATION" <linux-doc@vger.kernel.org>
Subject: Re: [PATCH] tracing/timerlat: Add latency threshold
Date: Mon, 17 Jul 2023 17:36:00 +0200	[thread overview]
Message-ID: <9b212c49-ab51-0cd9-5871-c145c5f41b48@kernel.org> (raw)
In-Reply-To: <20230716193000.231406-1-costa.shul@redhat.com>

Hi Costa,

On 7/16/23 21:30, Costa Shulyupin wrote:
> The timerlat tracer generates a huge amount of traces.
> This affects the performance of the system and
> the delays we are trying to measure with timerlat.

More explanations in the code, but that is not the dominant factor,
neither relevant for large spikes.

> However, we are often interested in spikes of delay
> rather than small values.

That is why the stop tracing exists. One can set stop tracing,
ignore the trace buffer while tracing is on. Once tracing
turns off, tail the trace.

rtla timerlat has the --aa-only that ignores the trace until
the tracing stops, and then do the auto-analysis based on the
tail of the trace.

> 
> The patch effectively filters out irrelevant traces
> before they are generated and produces more reliable
> data.

More in the code, but it is not more reliable data... did
you mean human-readable data?

> 
> This patch helped to debug a very big problem
> and find this solution:
> https://lore.kernel.org/lkml/20221208075604.811710-1-junxiao.chang@intel.com/

Can you explain more about this? Because AFAICT, the patch above is
dealing with a SOFT timer. timerlat uses a HARD timer, so timerlat
is not affected by that bug... so I might be missing a point here.

> 
> Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
> ---
>  Documentation/trace/timerlat-tracer.rst |  1 +
>  kernel/trace/trace_osnoise.c            | 17 +++++++++++++++++
>  2 files changed, 18 insertions(+)
> 
> diff --git a/Documentation/trace/timerlat-tracer.rst b/Documentation/trace/timerlat-tracer.rst
> index 53a56823e903..71b1c63ca403 100644
> --- a/Documentation/trace/timerlat-tracer.rst
> +++ b/Documentation/trace/timerlat-tracer.rst
> @@ -68,6 +68,7 @@ directory. The timerlat configs are:
>  
>   - cpus: CPUs at which a timerlat thread will execute.
>   - timerlat_period_us: the period of the timerlat thread.
> + - timerlat_threshold_ns: filter out timer latencies below the threshold

All the interface is in us....

>   - stop_tracing_us: stop the system tracing if a
>     timer latency at the *irq* context higher than the configured
>     value happens. Writing 0 disables this option.
> diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
> index bd0d01d00fb9..43284a1e8bea 100644
> --- a/kernel/trace/trace_osnoise.c
> +++ b/kernel/trace/trace_osnoise.c
> @@ -346,6 +346,7 @@ static struct osnoise_data {
>  	u64	stop_tracing_total;	/* stop trace in the final operation (report/thread) */
>  #ifdef CONFIG_TIMERLAT_TRACER
>  	u64	timerlat_period;	/* timerlat period */
> +	u64	timerlat_threshold_ns;

comment....

>  	u64	print_stack;		/* print IRQ stack if total > */
>  	int	timerlat_tracer;	/* timerlat tracer */
>  #endif
> @@ -358,6 +359,7 @@ static struct osnoise_data {
>  #ifdef CONFIG_TIMERLAT_TRACER
>  	.print_stack			= 0,
>  	.timerlat_period		= DEFAULT_TIMERLAT_PERIOD,
> +	.timerlat_threshold_ns		= 0,
>  	.timerlat_tracer		= 0,
>  #endif
>  };
> @@ -597,6 +599,10 @@ static void trace_timerlat_sample(struct timerlat_sample *sample)
>  	struct osnoise_instance *inst;
>  	struct trace_buffer *buffer;
>  
> +	if (osnoise_data.timerlat_threshold_ns &&
> +	    sample->timer_latency < osnoise_data.timerlat_threshold_ns)
> +		return;
> +

From the entire timerlat machinery, skipping the print is not the
dominant factor. It can save a few nanoseconds, but the values are already
produced at this point... so why not?

You mentioned that this is useful for large spikes, so... it will not be the few
ns to cause problems.

And I'm not convinced that it produces more reliable results.
For example, if you find a large latency in the thread context, but skip
the IRQ context trace, you will lose useful data.... so your analysis is
prone to be less reliable.

Did you mean readable? improve the readability... That I can see.

Also, for debugging purposes, one needs to enable more events... at least
the osnoise: ones. These events happen more frequently than the timerlat
ones... and they are not being filtered here... Filtering them using the
event filtering interface will not help either, e.g., hiding 1k one us
interrupts bomb will mask you a 1ms delay...

So, as I said, I am not convinced that adding this interface file is
better than setting the stop-tracing option... and then waiting for
the trace to stop... parsing only the tail of the buffer.

Thoughts?

-- Daniel

next prev parent reply	other threads:[~2023-07-17 15:36 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-16 19:30 [PATCH] tracing/timerlat: Add latency threshold Costa Shulyupin
2023-07-17 15:36 ` Daniel Bristot de Oliveira [this message]
2023-07-22  8:04   ` Costa Shulyupin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9b212c49-ab51-0cd9-5871-c145c5f41b48@kernel.org \
    --to=bristot@kernel.org \
    --cc=corbet@lwn.net \
    --cc=costa.shul@redhat.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rt-users@vger.kernel.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=mhiramat@kernel.org \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).