From mboxrd@z Thu Jan  1 00:00:00 1970
From: gilles.chanteperdrix@xenomai.org (Gilles Chanteperdrix)
Date: Mon, 27 Jun 2011 22:17:01 +0200
Subject: IRQS off tracer - is it useful or not?
In-Reply-To: <20110627165423.GA6222@n2100.arm.linux.org.uk>
References: <20110625132112.GH23234@n2100.arm.linux.org.uk>
	<4E08AF23.2060901@codeaurora.org>
	<20110627165423.GA6222@n2100.arm.linux.org.uk>
Message-ID: <4E08E53D.4050209@xenomai.org>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On 06/27/2011 06:54 PM, Russell King - ARM Linux wrote:
> On Mon, Jun 27, 2011 at 09:26:11AM -0700, Stephen Boyd wrote:
>> On 6/25/2011 6:21 AM, Russell King - ARM Linux wrote:
>>> I've just been looking at the IRQS off tracer for the first time. I'm
>>> getting output such as:
>>>
>>>   <idle>-0  0d.s3    0us!: _raw_spin_lock_irqsave <-_raw_spin_lock_irqsave
>>>   <idle>-0  0dNs4 1709us+: _raw_spin_unlock_irqrestore <-_raw_spin_unlock_irqrestore
>>>   <idle>-0  0dNs4 1770us : time_hardirqs_on <-_raw_spin_unlock_irqrestore
>>>   <idle>-0  0dNs4 1770us : <stack trace>
>>>
>>> from it, which doesn't seem to be very useful.  Figuring out that it
>>> may be because the EABI unwinder doesn't work too well with my toolchain,
>>> I decided to try going for the more reliable frame pointer stuff.  This
>>> gives me:
>>>
>>> kjournal-423   0d.s4    0us : trace_hardirqs_on <-_raw_spin_unlock_irq
>>> kjournal-423   0d.s4    0us : time_hardirqs_on <-_raw_spin_unlock_irq
>>> kjournal-423   0d.s3    0us!: trace_hardirqs_off <-_raw_spin_lock_irqsave
>>> kjournal-423   0d.s4 1709us+: trace_hardirqs_on <-_raw_spin_unlock_irqrestore
>>> kjournal-423   0d.s4 1770us : time_hardirqs_on <-_raw_spin_unlock_irqrestore
>>> kjournal-423   0d.s4 1770us : <stack trace>
>>>  => time_hardirqs_on
>>>  => trace_hardirqs_on_caller
>>>  => trace_hardirqs_on
>>>  => _raw_spin_unlock_irqrestore
>>>  => cfq_idle_slice_timer
>>>  => run_timer_softirq
>>>  => __do_softirq
>>>  => irq_exit
>>>
>>> which is no better.
>>> That it's telling me {trace,time}_hardirqs_o{n,ff} is
>>> involved is absurd - of course those functions are involved, because
>>> that's how these events are reported, and that detail is just not
>>> interesting.  And yet again, we still don't get to find out where IRQs
>>> were disabled.
>>
>> Is ftrace enabled (/proc/sys/kernel/ftrace_enabled)?  If it is, you
>> should at least see the functions that were called while irqs were off.
>>
>> There should also be a
>>
>>  => started at: func_irq_off
>>  => ended at: func_irq_on
>>
>> near the top of the latency trace, although it may not be entirely
>> useful unless spinlocks are inlined.  Perhaps we should start inlining
>> spinlocks?
>
> I tried enabling ftrace too, and while that improved things, it also
> made the latency results unbelievable.
>
> 1. ftrace didn't log entry into every function in the traced paths -
>    some were skipped despite there being an mcount call.
> 2. (and this is the more serious one) the overhead of ftrace is _far_
>    too large to draw any meaningful conclusions from the results.
>
> Are IRQs really disabled for 3ms on OMAP - does it really take 120us
> to get into an interrupt handler through all the NO_HZ stuff - does it
> really take 250us to do a few writes to the DMA controller?

Hi Russell,

Maybe this post is off-topic, but I have worked on reducing latencies on
omap3 with xenomai.  Most code paths are common with Linux, so this may
be of interest to this list.
The reasons for big latencies with xenomai on omap3 were:

- the nesting of the gpio masking/acking function calls; this can be
  reduced by inlining some functions, especially when not compiling for
  multiple omaps (but my guess is that if you want the lowest latency,
  you are not going to compile for all the omaps anyway);
- the gpio demuxing, which is one big irqs-off section; this can be
  reduced by re-enabling irqs in the middle of it;
- the gptimer 0/1 programming latency; this can be reduced by using
  another gptimer, and by configuring the timer used as clockevent in
  "non-posted" mode.

On omap4, the reasons for high latencies (measured on 2.6.37, so this
may be outdated) are:

- the gic mask/ack handlers, which take a spinlock and then call writel,
  which ends up calling outer_clean, which takes another spinlock.
  Solved by using writel_relaxed, removing the spinlock for mask and
  ack, and also implementing the mask_ack callback, which avoids one
  call through a function pointer that would otherwise end up calling
  two more functions by address and taking two spinlocks;
- the l2 cache flushes in general, protected by a spinlock; currently
  unsolved;
- the gptimer again, but here we have another clocksource, the
  Cortex-A9 global timer (though maybe not usable with CONFIG_PM?);
- the twd, which had a lot of early shots because the frequency
  calibration was too imprecise (503.6 MHz instead of 504 MHz on the
  board I have); this is solved by the patches that rely on the clock
  frequency instead.

Regards.

-- 
Gilles.
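For readers following along, the gic mask/ack change described above can be sketched roughly as below.  This is only a sketch against the 2.6.37-era irq_chip API, not the actual patch: the register offsets follow the classic GIC layout, but the helpers `gic_irq()`, `gic_dist_base()` and `gic_cpu_base()` stand in for whatever the real driver uses to reach its base addresses.

```c
/*
 * Sketch only: a combined, lockless mask_ack for the GIC, as described
 * above.  The point is that writel_relaxed() skips the DSB + outer-cache
 * sync (and the outer_clean spinlock) that a plain writel() performs on
 * these SoCs, and that a single mask_ack callback replaces two indirect
 * calls through the irq_chip.  Helper names are illustrative.
 */
#define GIC_DIST_ENABLE_CLEAR	0x180
#define GIC_CPU_EOI		0x10

static void gic_mask_ack_irq(struct irq_data *d)
{
	u32 irq = gic_irq(d);		/* hardware irq number */
	u32 mask = 1 << (irq % 32);

	/* Mask: one bit per interrupt in the distributor's clear-enable
	 * registers; no lock, no outer-cache maintenance. */
	writel_relaxed(mask, gic_dist_base(d) + GIC_DIST_ENABLE_CLEAR
					      + (irq / 32) * 4);

	/* Ack: write the interrupt number to the CPU interface's EOI
	 * register, again with a relaxed accessor. */
	writel_relaxed(irq, gic_cpu_base(d) + GIC_CPU_EOI);
}
```

Whether dropping the lock is safe depends on the rest of the driver never doing read-modify-write on the same registers; the clear-enable/set-enable register pairs of the GIC are exactly what makes plain stores sufficient here.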
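The gpio demux point can be sketched in the same spirit: a chained handler that masks its parent line, then re-enables hard irqs around the per-pin dispatch loop, so the demux is no longer one monolithic irqs-off section.  Everything here is illustrative - the bank structure, register names and irq numbering are made up for the sketch, and the chained-handler signature is the 2.6.3x one.

```c
/*
 * Sketch only: GPIO bank demux with irqs re-enabled in the middle,
 * as described above.  All names (struct gpio_bank, BANK_IRQSTATUS,
 * irq_base) are illustrative, not from a real driver.
 */
static void gpio_bank_demux(unsigned int irq, struct irq_desc *desc)
{
	struct gpio_bank *bank = irq_desc_get_handler_data(desc);
	u32 pending;

	/* Mask the parent line first, so that re-enabling hard irqs
	 * below cannot re-enter this handler for the same bank. */
	desc->irq_data.chip->irq_mask(&desc->irq_data);

	pending = readl_relaxed(bank->base + BANK_IRQSTATUS);
	writel_relaxed(pending, bank->base + BANK_IRQSTATUS);	/* ack */

	local_irq_enable();	/* shrink the hard-irqs-off window */
	while (pending) {
		unsigned int bit = __ffs(pending);

		pending &= ~BIT(bit);
		generic_handle_irq(bank->irq_base + bit);
	}
	local_irq_disable();

	desc->irq_data.chip->irq_unmask(&desc->irq_data);
}
```

The trade-off is that the per-pin handlers now run with other interrupts enabled, so they must not assume the whole demux executes atomically; the parent mask is what keeps the bank itself from nesting.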