How to find long execution times in kernel threads?

public inbox for linux-rt-users@vger.kernel.org
 help / color / mirror / Atom feed

* How to find long execution times in kernel threads?
@ 2013-04-30 13:48 Stanislav Meduna
  2013-04-30 16:52 ` Carsten Emde
  0 siblings, 1 reply; 6+ messages in thread
From: Stanislav Meduna @ 2013-04-30 13:48 UTC (permalink / raw)
  To: linux-rt-users@vger.kernel.org

Hi,

I am experimenting with the cyclictest on a system running
several RT threads to find out where some latencies are
coming from and to get a better undertanding how the system
behaves.

The setup is:

class prio   thread
FF    99     posixcputmr
FF    50     irq threads
FF    47     ksoftirqd
RR    45-25  my application exercising hrtimers, network, ...

Now I am trying to run
  cyclictest -l100000000 -m -n -a0 -t1 -i200 -h400
at several priorities. The results after a few minutes are:

prio   avg    max
55     31     74
49     33     1124 (takes tens of seconds to record >1000)
46     33     1190 (nearly immediately >1000)

This is with 3.4.41-rt55 on a 500 MHz Geode LX, so 1 ms is quite
a lot of work.

What is the best approach to find
- which irq generates the occasional spikes
- why the ksoftirqd regularly generates spikes and what
  soft-irq is responsible

These are obviously not 'hard' latencies - a runnable higher prio
thread gets the CPU - so the latency histograms won't show them.

Basically I'd like to trace what happened between a particular
process (not the highest priority one) was woken up until
it got scheduled, and stop if this took longer than some
defined time. Can this be done with the current tracing
infrastructure?

Thanks
-- 
                                       Stano

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: How to find long execution times in kernel threads?
  2013-04-30 13:48 How to find long execution times in kernel threads? Stanislav Meduna
@ 2013-04-30 16:52 ` Carsten Emde
  2013-04-30 18:02   ` Stanislav Meduna
  0 siblings, 1 reply; 6+ messages in thread
From: Carsten Emde @ 2013-04-30 16:52 UTC (permalink / raw)
  To: Stanislav Meduna; +Cc: Linux RT Users

Stano,

> I am experimenting with the cyclictest on a system running
> several RT threads to find out where some latencies are
> coming from and to get a better undertanding how the system
> behaves.
>[..]
> Basically I'd like to trace what happened between a particular
> process (not the highest priority one) was woken up until
> it got scheduled, and stop if this took longer than some
> defined time. Can this be done with the current tracing
> infrastructure?
Sure, record everything (trace-cmd record -e all) and analyze the 
traces. Use kernelshark, if you prefer a GUI. I wrote a blog article on 
this topic about three years ago:
   https://www.osadl.org/Single-View.111+M5dc445b8ee7.0.html

Once you get an idea what is going on and which traces, events and/or 
processes are particularly interesting, you may filter the recording 
(trace-cmd record --help) in order to focus on your case and to reduce 
Heisenberg.

Hope this helps,
	-Carsten.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: How to find long execution times in kernel threads?
  2013-04-30 16:52 ` Carsten Emde
@ 2013-04-30 18:02   ` Stanislav Meduna
  2013-05-01  3:58     ` Ashoka K
  2013-05-01  7:37     ` Carsten Emde
  0 siblings, 2 replies; 6+ messages in thread
From: Stanislav Meduna @ 2013-04-30 18:02 UTC (permalink / raw)
  To: Carsten Emde; +Cc: Linux RT Users

On 30.04.2013 18:52, Carsten Emde wrote:

> Sure, record everything (trace-cmd record -e all) and analyze the
> traces.

Well, the problem is that with record -e all I get 20 ms instead
of 1 ms, so I'd say Mr. Heisenberg laughs at me loudly here ;)

Thanks for the pointer anyway, I'll try to thoroughly read
the available documentation and come up with some sane set
of filters to start with.

Thanks
-- 
                                      Stano

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: How to find long execution times in kernel threads?
  2013-04-30 18:02   ` Stanislav Meduna
@ 2013-05-01  3:58     ` Ashoka K
  2013-05-01  7:37     ` Carsten Emde
  1 sibling, 0 replies; 6+ messages in thread
From: Ashoka K @ 2013-05-01  3:58 UTC (permalink / raw)
  To: Stanislav Meduna; +Cc: Carsten Emde, Linux RT Users

Even i had some issues in schedule latencies. i tried to use trace-cmd
record -e all and kernelshark.
What i found was since it is run as a separate process it also need to
get scheduled to get the traces and by default it is get scheduled in
SCHED_OTHER. This tracing is also affected by the values in
  /proc/sys/kernel/sched_rt_runtime_us  and  sched_rt_period_us.

So what i understood is for initial measurements kernelshark is a good
tool, but to go more deep this is not enough.
May be profiling tools like OPROF should be good. I am still exploring this.

Regards
Ashoka.  K

On Tue, Apr 30, 2013 at 11:32 PM, Stanislav Meduna <stano@meduna.org> wrote:
> On 30.04.2013 18:52, Carsten Emde wrote:
>
>> Sure, record everything (trace-cmd record -e all) and analyze the
>> traces.
>
> Well, the problem is that with record -e all I get 20 ms instead
> of 1 ms, so I'd say Mr. Heisenberg laughs at me loudly here ;)
>
> Thanks for the pointer anyway, I'll try to thoroughly read
> the available documentation and come up with some sane set
> of filters to start with.
>
> Thanks
> --
>                                       Stano
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: How to find long execution times in kernel threads?
  2013-04-30 18:02   ` Stanislav Meduna
  2013-05-01  3:58     ` Ashoka K
@ 2013-05-01  7:37     ` Carsten Emde
  2013-05-01  8:19       ` Stanislav Meduna
  1 sibling, 1 reply; 6+ messages in thread
From: Carsten Emde @ 2013-05-01  7:37 UTC (permalink / raw)
  To: Stanislav Meduna; +Cc: Linux RT Users

Stano,

>> Sure, record everything (trace-cmd record -e all) and analyze the
>> traces.
> Well, the problem is that with record -e all I get 20 ms instead
> of 1 ms, so I'd say Mr. Heisenberg laughs at me loudly here ;)
He who laughs last, laughs loudest. Since the entire system slows down, 
you may only need to adapt cycles and thresholds accordingly. Tracing is 
a very useful tool to identify sources of latencies and has helped a lot 
to make Linux RT as good as it is today. As a role of thumb, for 
example, we use five times longer cycles when enabling function tracing. 
If, for example, the output of

   cyclictest -m -M -Sp90 -i100 -d0

most of the time indicates a worst-case latency of about 50 microseconds 
but is interspersed with sporadic latencies of more than 500 microseconds,

   cyclictest -m -M -Sp90 -i500 -d0 -fb1000

probably will break at the first occurrence of the latency in question 
and let you diagnose its origin at the end of the trace output.

	-Carsten.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: How to find long execution times in kernel threads?
  2013-05-01  7:37     ` Carsten Emde
@ 2013-05-01  8:19       ` Stanislav Meduna
  0 siblings, 0 replies; 6+ messages in thread
From: Stanislav Meduna @ 2013-05-01  8:19 UTC (permalink / raw)
  To: Carsten Emde; +Cc: Linux RT Users

On 01.05.2013 09:37, Carsten Emde wrote:

> He who laughs last, laughs loudest. Since the entire system slows down,
> you may only need to adapt cycles and thresholds accordingly.

Yep, I will try it. However as this is connected to a real application
running parallel and generating some load pattern this is not so easy -
for example I'll need to go down with the HZ, otherwise the timer
ticking once per ms can be already too much, which in turn will
probably affect the RR threads etc. There are also constraints
from the outside world that have to be met while running - these
are still far, but with tracing enabled not anymore an order
of magnitude far.

That I am running this on a low-spec embedded systems that already
struggles with saving the data does not make it easier either.

>From what I was already able to capture I did not see anything
that seems plainly wrong, the ksoftirqd just had really a lot
of work with hrtimers, RCU, network etc. But I am not sure whether
I caught _the_ situation.

> Tracing is a very useful tool to identify sources of latencies
> and has helped a lot to make Linux RT as good as it is today.

Yep, the infrastructure is fantastic.

>   cyclictest -m -M -Sp90 -i500 -d0 -fb1000
> 
> probably will break at the first occurrence of the latency in question
> and let you diagnose its origin at the end of the trace output.

I tried this but I was unable to load a trace.dat produced
by breaking the tracing this way into kernelshark on at least
two occasions. Only stopping the trace from trace-cmd itself
worked. Are there some known issues?

I am using trace-cmd from git. The recording and examining
system are not the same, but are reasonably similar library-wise
(both are Debian squeeze-based).

Thanks
-- 
                                  Stano

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2013-05-01  8:19 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-04-30 13:48 How to find long execution times in kernel threads? Stanislav Meduna
2013-04-30 16:52 ` Carsten Emde
2013-04-30 18:02   ` Stanislav Meduna
2013-05-01  3:58     ` Ashoka K
2013-05-01  7:37     ` Carsten Emde
2013-05-01  8:19       ` Stanislav Meduna

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox