* Re: How how latent should non-preemptive scheduling be? [not found] ` <fa.Jx/Ygtm46CVRawlA6OnfYNn6cN0@ifi.uio.no> @ 2008-09-18 7:26 ` Sitsofe Wheeler [not found] ` <fa.iIHgL48F3T5VAqFw3mqaf9Pzrs4@ifi.uio.no> 1 sibling, 0 replies; 17+ messages in thread From: Sitsofe Wheeler @ 2008-09-18 7:26 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Arjan van de Ven, linux-kernel, Ingo Molnar Peter Zijlstra wrote: > On Wed, 2008-09-17 at 14:54 -0700, Arjan van de Ven wrote: >> On Wed, 17 Sep 2008 22:48:55 +0100 >> >>>> Cause Maximum >>>> >> Percentage >> >> Scheduler: waiting for cpu 208 msec 59.4 % >> >> >> you're rather CPU bound, and your process was woken up but didn't run for over 200 milliseconds.. >> that sounds like a scheduler fairness issue! > > Really hard subject. Perfect fairness requires 0 latency - which with a > CPU only being able to run one thing at a time is impossible. So what > latency ends up being is a measure for the convergence towards fairness. > > Anyway - 200ms isn't too weird depending on the circumstances. We start > out with a 20ms latency for UP, we then multiply with 1+log2(nr_cpus) > which in say a quad core machine ends up with 60ms. That ought to mean > that under light load the max latency should not exceed twice that > (basically a consequence of the Nyquist-Shannon sampling theorem IIRC). > > Now, if you get get under some load (by default: nr_running > 5) the > expected latency starts to linearly grow with nr_running. > >>From what I gather from the reply to this email the machine was not > doing much (and after having looked up the original email I see its a > eeeeeeeee atom - which is dual cpu iirc, so that yields 40ms default) - > so 200 is definately on the high side. No, it's not an eeeeee atom. It's an eee celeron M (900Mhz) so it's definitely a single CPU with no hyperthreading (and SMP is not enabled in the kernel config either). It has less grunt that the atom and can't do cpu scaling either (although it seems to have C states). The load average is less than 0.5 but obviously I don't know if it is periodically spiking over 5 and then smoothing out. > What you can do to investigate this, is use the sched_wakeup tracer from > ftrace, that should give a function trace of the highest wakeup latency > showing what the kernel is doing. Thanks for the hint - I was wondering where to look next. ^ permalink raw reply [flat|nested] 17+ messages in thread
[parent not found: <fa.iIHgL48F3T5VAqFw3mqaf9Pzrs4@ifi.uio.no>]
[parent not found: <fa.Td8xkKZKMSMghlJmEYefTRVF2kc@ifi.uio.no>]
* Re: How how latent should non-preemptive scheduling be? [not found] ` <fa.Td8xkKZKMSMghlJmEYefTRVF2kc@ifi.uio.no> @ 2008-09-19 11:54 ` Sitsofe Wheeler 2008-09-19 14:20 ` Peter Zijlstra 2008-09-22 11:57 ` Ingo Molnar 0 siblings, 2 replies; 17+ messages in thread From: Sitsofe Wheeler @ 2008-09-19 11:54 UTC (permalink / raw) To: Ingo Molnar Cc: Peter Zijlstra, Arjan van de Ven, linux-kernel, Steven Rostedt Ingo Molnar wrote: > here's two quick howtos: > > http://redhat.com/~mingo/sched-devel.git/readme-tracer.txt > http://redhat.com/~mingo/sched-devel.git/howto-trace-latencies.txt These two files appear to be identical. Is this intentional? Anyway after following your instructions a putting together a small script to dice the output, I collated the 10 switches which took the longest: # Top ten longest switches # Rel TS Process Abs TS 0.122161 hald-3423 1867.821170 *** 0.039438 <idle>-0 1867.379054 0.036318 hald-3423 1867.669009 0.031362 <idle>-0 1868.002762 0.030000 hald-3423 1867.699009 0.028933 <idle>-0 1867.529238 0.028539 <idle>-0 1867.228861 0.028196 <idle>-0 1867.128731 0.027763 <idle>-0 1868.101449 0.027513 <idle>-0 1867.028606 # tracer: sched_switch from around longest switch # # TASK-PID CPU# TIMESTAMP FUNCTION # | | | | | <idle>-0 [00] 1867.608017: 0:140:R + 3:115:S <idle>-0 [00] 1867.608038: 0:140:R + 3423:120:D <idle>-0 [00] 1867.608045: 0:140:R ==> 3:115:R ksoftirqd/0-3 [00] 1867.608048: 3:115:S ==> 3423:120:R hald-3423 [00] 1867.629350: 3423:120:R + 6096:120:S hald-3423 [00] 1867.632691: 3423:120:R + 3827:120:S hald-3423 [00] 1867.669009: 3423:120:R + 3998:120:S hald-3423 [00] 1867.699009: 3423:120:R + 6097:120:S ***hald-3423 [00] 1867.821170: 3423:120:R ==> 6096:120:R rhythmbox-6096 [00] 1867.821219: 6096:120:S ==> 6097:120:R rhythmbox-6097 [00] 1867.821262: 6097:120:R + 3826:120:S rhythmbox-6097 [00] 1867.821289: 6097:120:S ==> 3826:120:R pulseaudio-3826 [00] 1867.821332: 3826:120:R + 6097:120:S pulseaudio-3826 [00] 1867.821374: 3826:120:S ==> 6097:120:R rhythmbox-6097 [00] 1867.821380: 6097:120:S ==> 3998:120:R rhythmbox-3998 [00] 1867.821709: 3998:120:S ==> 3827:120:R pulseaudio-3827 [00] 1867.824041: 3827:120:R + 3826:120:S > LatencyTOP version 0.4 (C) 2008 Intel Corporation > > Cause Maximum Percentage > Scheduler: waiting for cpu 152.4 msec 13.8 % > Userspace lock contention 5.0 msec 68.0 % > Waiting for event (poll) 5.0 msec 14.3 % > Waiting for event (select) 4.9 msec 3.5 % > msleep acpi_ec_wait acpi_ec_transaction acpi_ec_bu 1.9 msec 0.1 % > msleep acpi_ec_wait acpi_ec_transaction acpi_ec_re 1.9 msec 0.2 % > msleep acpi_ec_wait acpi_ec_transaction acpi_ec_bu 1.9 msec 0.1 % > Executing raw SCSI command 1.1 msec 0.0 % > Waiting for TTY to finish sending 0.4 msec 0.0 % > > > Process rhythmbox (3998) Total: 328.3 msec > Scheduler: waiting for cpu 152.4 msec 80.0 % > Userspace lock contention 4.1 msec 15.7 % > Waiting for event (poll) 2.7 msec 4.1 % > you need to enable: > > CONFIG_SCHED_TRACER=y > CONFIG_CONTEXT_SWITCH_TRACER=y I actually have both of these enabled but there's still no wakeup tracer (as mentioned in the git kernel documentation http://tinyurl.com/4f9s4l ). The good news is that your instructions don't need the wakeup tracer. > it's not particularly well named though. Why doesnt it say > LATENCY_TRACER or something? I agree it would be nicer if it had a better name. >> Additionally I think I found a trigger - unplugging the power cable >> from the EeePC and having it run on battery seems to then set off this >> periodic stall every 30 seconds... There's no CPU frequency scaling >> enabled either (Celeron M's seemingly don't have P states and support >> for cpufreq is configured out). > > sounds like potential SMM triggered latencies. I have just gone away and read about the SMM ( http://blogs.msdn.com/carmencr/archive/2005/08/31/458609.aspx ). If you're right there is pretty much nothing that can be done about the problem : ( ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: How how latent should non-preemptive scheduling be? 2008-09-19 11:54 ` Sitsofe Wheeler @ 2008-09-19 14:20 ` Peter Zijlstra 2008-09-22 11:57 ` Ingo Molnar 1 sibling, 0 replies; 17+ messages in thread From: Peter Zijlstra @ 2008-09-19 14:20 UTC (permalink / raw) To: Sitsofe Wheeler Cc: Ingo Molnar, Arjan van de Ven, linux-kernel, Steven Rostedt On Fri, 2008-09-19 at 12:54 +0100, Sitsofe Wheeler wrote: > Ingo Molnar wrote: > > here's two quick howtos: > > > > http://redhat.com/~mingo/sched-devel.git/readme-tracer.txt > > http://redhat.com/~mingo/sched-devel.git/howto-trace-latencies.txt > > These two files appear to be identical. Is this intentional? > > Anyway after following your instructions a putting together a small > script to dice the output, I collated the 10 switches which took the > longest: > > # Top ten longest switches > # Rel TS Process Abs TS > 0.122161 hald-3423 1867.821170 *** > 0.039438 <idle>-0 1867.379054 > 0.036318 hald-3423 1867.669009 > 0.031362 <idle>-0 1868.002762 > 0.030000 hald-3423 1867.699009 > 0.028933 <idle>-0 1867.529238 > 0.028539 <idle>-0 1867.228861 > 0.028196 <idle>-0 1867.128731 > 0.027763 <idle>-0 1868.101449 > 0.027513 <idle>-0 1867.028606 > > # tracer: sched_switch from around longest switch > # > # TASK-PID CPU# TIMESTAMP FUNCTION > # | | | | | > <idle>-0 [00] 1867.608017: 0:140:R + 3:115:S > <idle>-0 [00] 1867.608038: 0:140:R + 3423:120:D > <idle>-0 [00] 1867.608045: 0:140:R ==> 3:115:R > ksoftirqd/0-3 [00] 1867.608048: 3:115:S ==> 3423:120:R > hald-3423 [00] 1867.629350: 3423:120:R + 6096:120:S > hald-3423 [00] 1867.632691: 3423:120:R + 3827:120:S > hald-3423 [00] 1867.669009: 3423:120:R + 3998:120:S > hald-3423 [00] 1867.699009: 3423:120:R + 6097:120:S > ***hald-3423 [00] 1867.821170: 3423:120:R ==> 6096:120:R > rhythmbox-6096 [00] 1867.821219: 6096:120:S ==> 6097:120:R > rhythmbox-6097 [00] 1867.821262: 6097:120:R + 3826:120:S > rhythmbox-6097 [00] 1867.821289: 6097:120:S ==> 3826:120:R > pulseaudio-3826 [00] 1867.821332: 3826:120:R + 6097:120:S > pulseaudio-3826 [00] 1867.821374: 3826:120:S ==> 6097:120:R > rhythmbox-6097 [00] 1867.821380: 6097:120:S ==> 3998:120:R > rhythmbox-3998 [00] 1867.821709: 3998:120:S ==> 3827:120:R > pulseaudio-3827 [00] 1867.824041: 3827:120:R + 3826:120:S > Its actually function tracer output I'm interested in.. that shows what all its doing to make it take 120+ms. I thought we had a wakeup latency tracer exacty like we have preempt and irq off latency tracers, Steve, where'd that go? > > sounds like potential SMM triggered latencies. > > I have just gone away and read about the SMM ( > http://blogs.msdn.com/carmencr/archive/2005/08/31/458609.aspx ). If > you're right there is pretty much nothing that can be done about the > problem : ( Yeah, SMM/SMI is nasty stuff :-( ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: How how latent should non-preemptive scheduling be? 2008-09-19 11:54 ` Sitsofe Wheeler 2008-09-19 14:20 ` Peter Zijlstra @ 2008-09-22 11:57 ` Ingo Molnar 2008-09-22 12:07 ` Steven Rostedt 2008-09-23 6:33 ` Sitsofe Wheeler 1 sibling, 2 replies; 17+ messages in thread From: Ingo Molnar @ 2008-09-22 11:57 UTC (permalink / raw) To: Sitsofe Wheeler Cc: Peter Zijlstra, Arjan van de Ven, linux-kernel, Steven Rostedt * Sitsofe Wheeler <sitsofe@yahoo.com> wrote: > Ingo Molnar wrote: >> here's two quick howtos: >> >> http://redhat.com/~mingo/sched-devel.git/readme-tracer.txt >> http://redhat.com/~mingo/sched-devel.git/howto-trace-latencies.txt > > These two files appear to be identical. Is this intentional? > > Anyway after following your instructions a putting together a small > script to dice the output, I collated the 10 switches which took the > longest: > > # Top ten longest switches > # Rel TS Process Abs TS > 0.122161 hald-3423 1867.821170 *** > 0.039438 <idle>-0 1867.379054 > 0.036318 hald-3423 1867.669009 > 0.031362 <idle>-0 1868.002762 > 0.030000 hald-3423 1867.699009 > 0.028933 <idle>-0 1867.529238 > 0.028539 <idle>-0 1867.228861 > 0.028196 <idle>-0 1867.128731 > 0.027763 <idle>-0 1868.101449 > 0.027513 <idle>-0 1867.028606 > > # tracer: sched_switch from around longest switch > # > # TASK-PID CPU# TIMESTAMP FUNCTION > # | | | | | > <idle>-0 [00] 1867.608017: 0:140:R + 3:115:S > <idle>-0 [00] 1867.608038: 0:140:R + 3423:120:D > <idle>-0 [00] 1867.608045: 0:140:R ==> 3:115:R > ksoftirqd/0-3 [00] 1867.608048: 3:115:S ==> 3423:120:R > hald-3423 [00] 1867.629350: 3423:120:R + 6096:120:S > hald-3423 [00] 1867.632691: 3423:120:R + 3827:120:S > hald-3423 [00] 1867.669009: 3423:120:R + 3998:120:S > hald-3423 [00] 1867.699009: 3423:120:R + 6097:120:S > ***hald-3423 [00] 1867.821170: 3423:120:R ==> 6096:120:R > rhythmbox-6096 [00] 1867.821219: 6096:120:S ==> 6097:120:R > rhythmbox-6097 [00] 1867.821262: 6097:120:R + 3826:120:S > rhythmbox-6097 [00] 1867.821289: 6097:120:S ==> 3826:120:R > pulseaudio-3826 [00] 1867.821332: 3826:120:R + 6097:120:S > pulseaudio-3826 [00] 1867.821374: 3826:120:S ==> 6097:120:R > rhythmbox-6097 [00] 1867.821380: 6097:120:S ==> 3998:120:R > rhythmbox-3998 [00] 1867.821709: 3998:120:S ==> 3827:120:R > pulseaudio-3827 [00] 1867.824041: 3827:120:R + 3826:120:S > > >> LatencyTOP version 0.4 (C) 2008 Intel Corporation >> >> Cause Maximum Percentage >> Scheduler: waiting for cpu 152.4 msec 13.8 % >> Userspace lock contention 5.0 msec 68.0 % >> Waiting for event (poll) 5.0 msec 14.3 % >> Waiting for event (select) 4.9 msec 3.5 % >> msleep acpi_ec_wait acpi_ec_transaction acpi_ec_bu 1.9 msec 0.1 % >> msleep acpi_ec_wait acpi_ec_transaction acpi_ec_re 1.9 msec 0.2 % >> msleep acpi_ec_wait acpi_ec_transaction acpi_ec_bu 1.9 msec 0.1 % >> Executing raw SCSI command 1.1 msec 0.0 % >> Waiting for TTY to finish sending 0.4 msec 0.0 % >> >> >> Process rhythmbox (3998) Total: 328.3 msec >> Scheduler: waiting for cpu 152.4 msec 80.0 % >> Userspace lock contention 4.1 msec 15.7 % >> Waiting for event (poll) 2.7 msec 4.1 % > > >> you need to enable: >> >> CONFIG_SCHED_TRACER=y >> CONFIG_CONTEXT_SWITCH_TRACER=y > > I actually have both of these enabled but there's still no wakeup tracer > (as mentioned in the git kernel documentation http://tinyurl.com/4f9s4l > ). The good news is that your instructions don't need the wakeup tracer. > >> it's not particularly well named though. Why doesnt it say >> LATENCY_TRACER or something? > > I agree it would be nicer if it had a better name. > >>> Additionally I think I found a trigger - unplugging the power cable >>> from the EeePC and having it run on battery seems to then set off >>> this periodic stall every 30 seconds... There's no CPU frequency >>> scaling enabled either (Celeron M's seemingly don't have P states and >>> support for cpufreq is configured out). >> >> sounds like potential SMM triggered latencies. > > I have just gone away and read about the SMM ( > http://blogs.msdn.com/carmencr/archive/2005/08/31/458609.aspx ). If > you're right there is pretty much nothing that can be done about the > problem : ( well, since they went away after you enabled CONFIG_PREEMPT=y, they are definitely in-kernel latencies, not any external SMM latencies. I.e. they are inherently fixable. Could you enable: CONFIG_DYNAMIC_FTRACE=y CONFIG_FTRACE_MCOUNT_RECORD=y that should make the traces a lot more verbose - every kernel function executed in the latency path will be logged. That way we'll be able to say which one takes that long. note, you might have to increase /debug/tracing/trace_entries to get a long enough trace to capture the relevant portion of the latency. Ingo ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: How how latent should non-preemptive scheduling be? 2008-09-22 11:57 ` Ingo Molnar @ 2008-09-22 12:07 ` Steven Rostedt 2008-09-23 6:33 ` Sitsofe Wheeler 1 sibling, 0 replies; 17+ messages in thread From: Steven Rostedt @ 2008-09-22 12:07 UTC (permalink / raw) To: Ingo Molnar Cc: Sitsofe Wheeler, Peter Zijlstra, Arjan van de Ven, linux-kernel On Mon, 22 Sep 2008, Ingo Molnar wrote: > * Sitsofe Wheeler <sitsofe@yahoo.com> wrote: > >> > >> sounds like potential SMM triggered latencies. > > > > I have just gone away and read about the SMM ( > > http://blogs.msdn.com/carmencr/archive/2005/08/31/458609.aspx ). If > > you're right there is pretty much nothing that can be done about the > > problem : ( > > well, since they went away after you enabled CONFIG_PREEMPT=y, they are > definitely in-kernel latencies, not any external SMM latencies. > > I.e. they are inherently fixable. Could you enable: > > CONFIG_DYNAMIC_FTRACE=y > CONFIG_FTRACE_MCOUNT_RECORD=y > > that should make the traces a lot more verbose - every kernel function > executed in the latency path will be logged. That way we'll be able to > say which one takes that long. > > note, you might have to increase /debug/tracing/trace_entries to get a > long enough trace to capture the relevant portion of the latency. Also note, to modify trace_entries, you must be in the none (nop?) tracer, otherwise the size will not be effected. If you find the trace is also too big, you can echo a list of functions into: /debug/tracing/set_ftrace_notrace to not trace those functions. using '>' will remove any existing function in that file, but using '>>' will append functions to the file. For a list of functions that you can add, see: /debug/tracing/available_filter_functions -- Steve ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: How how latent should non-preemptive scheduling be? 2008-09-22 11:57 ` Ingo Molnar 2008-09-22 12:07 ` Steven Rostedt @ 2008-09-23 6:33 ` Sitsofe Wheeler 2008-09-23 11:53 ` Ingo Molnar 1 sibling, 1 reply; 17+ messages in thread From: Sitsofe Wheeler @ 2008-09-23 6:33 UTC (permalink / raw) To: Ingo Molnar Cc: Peter Zijlstra, Arjan van de Ven, linux-kernel, Steven Rostedt Ingo Molnar wrote: > well, since they went away after you enabled CONFIG_PREEMPT=y, they are > definitely in-kernel latencies, not any external SMM latencies. > > I.e. they are inherently fixable. Could you enable: > > CONFIG_DYNAMIC_FTRACE=y > CONFIG_FTRACE_MCOUNT_RECORD=y > > that should make the traces a lot more verbose - every kernel function > executed in the latency path will be logged. That way we'll be able to > say which one takes that long. I do not appear to have the CONFIG_FTRACE_MCOUNT_RECORD option in 2.6.27rc7. Is it an option that is only in -tip ? -- Sitsofe | http://sucs.org/~sits/ ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: How how latent should non-preemptive scheduling be? 2008-09-23 6:33 ` Sitsofe Wheeler @ 2008-09-23 11:53 ` Ingo Molnar 2008-09-23 16:30 ` Sitsofe Wheeler 0 siblings, 1 reply; 17+ messages in thread From: Ingo Molnar @ 2008-09-23 11:53 UTC (permalink / raw) To: Sitsofe Wheeler Cc: Peter Zijlstra, Arjan van de Ven, linux-kernel, Steven Rostedt * Sitsofe Wheeler <sitsofe@yahoo.com> wrote: > Ingo Molnar wrote: >> well, since they went away after you enabled CONFIG_PREEMPT=y, they are >> definitely in-kernel latencies, not any external SMM latencies. >> >> I.e. they are inherently fixable. Could you enable: >> >> CONFIG_DYNAMIC_FTRACE=y >> CONFIG_FTRACE_MCOUNT_RECORD=y >> >> that should make the traces a lot more verbose - every kernel function >> executed in the latency path will be logged. That way we'll be able to >> say which one takes that long. > > I do not appear to have the CONFIG_FTRACE_MCOUNT_RECORD option in > 2.6.27rc7. Is it an option that is only in -tip ? yeah - it's a new ftrace feature queued up for v2.6.28. Ingo ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: How how latent should non-preemptive scheduling be? 2008-09-23 11:53 ` Ingo Molnar @ 2008-09-23 16:30 ` Sitsofe Wheeler 2008-09-23 19:39 ` Sitsofe Wheeler 0 siblings, 1 reply; 17+ messages in thread From: Sitsofe Wheeler @ 2008-09-23 16:30 UTC (permalink / raw) To: Ingo Molnar Cc: Peter Zijlstra, Arjan van de Ven, linux-kernel, Steven Rostedt Ingo Molnar wrote: > * Sitsofe Wheeler <sitsofe@yahoo.com> wrote: > >> Ingo Molnar wrote: >>> well, since they went away after you enabled CONFIG_PREEMPT=y, they are >>> definitely in-kernel latencies, not any external SMM latencies. >>> >>> I.e. they are inherently fixable. Could you enable: >>> >>> CONFIG_DYNAMIC_FTRACE=y >>> CONFIG_FTRACE_MCOUNT_RECORD=y >>> >>> that should make the traces a lot more verbose - every kernel function >>> executed in the latency path will be logged. That way we'll be able to >>> say which one takes that long. >> I do not appear to have the CONFIG_FTRACE_MCOUNT_RECORD option in >> 2.6.27rc7. Is it an option that is only in -tip ? > > yeah - it's a new ftrace feature queued up for v2.6.28. I've been struggling to boot -tip/master - currently it blows up just after printing SLUB information saying: BUG: unable to handle kernel NULL pointer dereference at 00000004 IP: [<c0120078>] account_system_time+0x48/0x120 *pde = 00000000 Thread overran stack, or stack corrupted Oops: 0002 [#1] PREEMPT -- Sitsofe | http://sucs.org/~sits/ ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: How how latent should non-preemptive scheduling be? 2008-09-23 16:30 ` Sitsofe Wheeler @ 2008-09-23 19:39 ` Sitsofe Wheeler 2008-09-23 22:01 ` Sitsofe Wheeler 0 siblings, 1 reply; 17+ messages in thread From: Sitsofe Wheeler @ 2008-09-23 19:39 UTC (permalink / raw) To: linux-kernel Cc: Ingo Molnar, Peter Zijlstra, Arjan van de Ven, linux-kernel, Steven Rostedt Sitsofe Wheeler wrote: > I've been struggling to boot -tip/master - currently it blows up just > after printing SLUB information saying: > > BUG: unable to handle kernel NULL pointer dereference at 00000004 > IP: [<c0120078>] account_system_time+0x48/0x120 > *pde = 00000000 > Thread overran stack, or stack corrupted > Oops: 0002 [#1] PREEMPT OK this BUG seems to have gone away in the past few hours. Ingo when you were asking for the ftrace report I presume that would be for a non preempt kernel (as a preemptive one only showed a very worst latency of 19657 us in one exceptional case)? -- Sitsofe | http://sucs.org/~sits/ ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: How how latent should non-preemptive scheduling be? 2008-09-23 19:39 ` Sitsofe Wheeler @ 2008-09-23 22:01 ` Sitsofe Wheeler 2008-09-27 20:48 ` Ingo Molnar 0 siblings, 1 reply; 17+ messages in thread From: Sitsofe Wheeler @ 2008-09-23 22:01 UTC (permalink / raw) Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Arjan van de Ven, Steven Rostedt Sitsofe Wheeler wrote: > Ingo when you were asking for the ftrace report I presume that would be > for a non preempt kernel (as a preemptive one only showed a very worst > latency of 19657 us in one exceptional case)? With a non preempt kernel I found the logs would simply become too big without filtering. On Peter's suggestion I used the following to remove the most frequently called functions: echo acpi_os_release_object > set_ftrace_notrace && echo kmem_cache_* >> set_ftrace_notrace && echo acpi_ut_* >> set_ftrace_notrace By doing counts across multiple runs I would say that the most frequently called functions are the following (in most frequently called order). The counts are definitely approximate but are reasonable relative to each other. 475325 acpi_os_release_object (acpi_ut_delete_generic_state) 406895 kmem_cache_free (acpi_os_release_object) 402838 kmem_cache_alloc (acpi_ut_create_generic_state) 132968 acpi_ut_update_ref_count (acpi_ut_update_object_reference) 131041 acpi_ut_pop_generic_state (acpi_ut_update_object_reference) 131036 acpi_ut_delete_generic_state (acpi_ut_update_object_reference) 131025 acpi_ut_create_generic_state (acpi_ut_create_update_state) 131023 acpi_ut_create_update_state_and_push (acpi_ut_update_object_reference) 131020 acpi_ut_create_update_state (acpi_ut_create_update_state_and_push) 131018 acpi_ut_push_generic_state (acpi_ut_create_update_state_and_push) 60147 acpi_ns_get_next_node (acpi_ns_delete_namespace_by_owner) 28974 acpi_ns_get_next_valid_node (acpi_ns_get_next_node) Logs with the filtering on can be seen here (15Mbytes decompressed each): http://sucs.org/~sits/test/eeepc-debug/20080923/latency_trace.gz http://sucs.org/~sits/test/eeepc-debug/20080923/trace.txt.gz It looks like lots of acpi state is created and deleted... -- Sitsofe | http://sucs.org/~sits/ ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: How how latent should non-preemptive scheduling be? 2008-09-23 22:01 ` Sitsofe Wheeler @ 2008-09-27 20:48 ` Ingo Molnar 2008-09-28 20:56 ` Sitsofe Wheeler 0 siblings, 1 reply; 17+ messages in thread From: Ingo Molnar @ 2008-09-27 20:48 UTC (permalink / raw) To: Sitsofe Wheeler Cc: linux-kernel, Peter Zijlstra, Arjan van de Ven, Steven Rostedt * Sitsofe Wheeler <sitsofe@yahoo.com> wrote: > Logs with the filtering on can be seen here (15Mbytes decompressed each): > http://sucs.org/~sits/test/eeepc-debug/20080923/latency_trace.gz > http://sucs.org/~sits/test/eeepc-debug/20080923/trace.txt.gz > > It looks like lots of acpi state is created and deleted... yeah. The latency starts here: cat-5901 0dNh. 1us : default_wake_function (__wake_up_common) cat-5901 0.Nh. 2us : kill_fasync (snd_pcm_period_elapsed) [...] and ends here: [...] cat-5901 0.N.. 270501us+: mutex_lock (acpi_ec_transaction) cat-5901 0d... 270507us : __cond_resched (_cond_resched) 270 _milliseconds_ later. That's excessive. The main overhead is starting here: cat-5901 0.N.. 167us : acpi_ds_result_push (acpi_ds_exec_end_op) lots of ACPI code executed ... does it get better if you have CONFIG_PREEMPT_VOLUNTARY=y enabled? That _should_ break up this section neatly. If it doesnt then please add a might_sleep() check to kernel/kernel/semaphore.c's down_timeout() function - that is called a number of times in this trace. Ingo ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: How how latent should non-preemptive scheduling be? 2008-09-27 20:48 ` Ingo Molnar @ 2008-09-28 20:56 ` Sitsofe Wheeler 2008-09-29 8:37 ` Ingo Molnar 0 siblings, 1 reply; 17+ messages in thread From: Sitsofe Wheeler @ 2008-09-28 20:56 UTC (permalink / raw) To: Ingo Molnar Cc: linux-kernel, Peter Zijlstra, Arjan van de Ven, Steven Rostedt Ingo Molnar wrote: > does it get better if you have CONFIG_PREEMPT_VOLUNTARY=y enabled? That CONFIG_PREEMPT_VOLUNTARY=y has always been enabled in these results. > _should_ break up this section neatly. If it doesnt then please add a > might_sleep() check to kernel/kernel/semaphore.c's down_timeout() > function - that is called a number of times in this trace. I added might_sleep() to the start of down_timeout() but it neither printed anything to dmesg nor changed the latency issue... Strangely stalling only seems to turn up in linux-tip kernels with very little debugging options set within them. I have a linux-tip that has lots of extra debugging options set and this problem doesn't show up... I've also asked about this on the ACPI mailing list and had a huge amount of help from finding a good point for a cond_resched - http://marc.info/?l=linux-acpi&m=122236450105747&w=2 . Certain things (like games) still stutter when the battery is read but it's enough to play back sound in rhythmbox without stuttering. Any ideas why the issue would go away with a debug kernel though? -- Sitsofe | http://sucs.org/~sits/ ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: How how latent should non-preemptive scheduling be? 2008-09-28 20:56 ` Sitsofe Wheeler @ 2008-09-29 8:37 ` Ingo Molnar 2008-09-29 23:11 ` Sitsofe Wheeler 0 siblings, 1 reply; 17+ messages in thread From: Ingo Molnar @ 2008-09-29 8:37 UTC (permalink / raw) To: Sitsofe Wheeler Cc: linux-kernel, Peter Zijlstra, Arjan van de Ven, Steven Rostedt * Sitsofe Wheeler <sitsofe@yahoo.com> wrote: > Ingo Molnar wrote: >> does it get better if you have CONFIG_PREEMPT_VOLUNTARY=y enabled? That >> > > CONFIG_PREEMPT_VOLUNTARY=y has always been enabled in these results. okay - then this definitely needs fixed. >> _should_ break up this section neatly. If it doesnt then please add a >> might_sleep() check to kernel/kernel/semaphore.c's down_timeout() >> function - that is called a number of times in this trace. > > I added might_sleep() to the start of down_timeout() but it neither > printed anything to dmesg nor changed the latency issue... > > Strangely stalling only seems to turn up in linux-tip kernels with > very little debugging options set within them. I have a linux-tip that > has lots of extra debugging options set and this problem doesn't show > up... > > I've also asked about this on the ACPI mailing list and had a huge > amount of help from finding a good point for a cond_resched - > http://marc.info/?l=linux-acpi&m=122236450105747&w=2 . Certain things > (like games) still stutter when the battery is read but it's enough to > play back sound in rhythmbox without stuttering. > > Any ideas why the issue would go away with a debug kernel though? hm, that's weird indeed. You could try to trace the full battery readout itself, via a script like this: cat /debug/tracing/trace > /dev/null # drain trace cat /proc/acpi/whatever cat /debug/tracing/trace > trace.txt this way you should be seeing a full trace of the whole thing. (with the 'ftrace' tracer plugin set in current_tracer) then, assuming the trace.txt is complete, you can check where it reschedules and why. Search for 'schedule' calls. Compare the debug versus non-debug traces. You can even turn on stack backtrace tracing feature, via: echo stacktrace > /debug/tracing/iter_ctrl this adds extra trace entries that show the call path of every reschedule. [this attribute is default-disabled.] Ingo ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: How how latent should non-preemptive scheduling be? 2008-09-29 8:37 ` Ingo Molnar @ 2008-09-29 23:11 ` Sitsofe Wheeler 2008-09-30 11:22 ` Ingo Molnar 0 siblings, 1 reply; 17+ messages in thread From: Sitsofe Wheeler @ 2008-09-29 23:11 UTC (permalink / raw) To: Ingo Molnar Cc: linux-kernel, Peter Zijlstra, Arjan van de Ven, Steven Rostedt Ingo Molnar wrote: > * Sitsofe Wheeler <sitsofe@yahoo.com> wrote: > >> Any ideas why the issue would go away with a debug kernel though? > > hm, that's weird indeed. You could try to trace the full battery readout > itself, via a script like this: > > cat /debug/tracing/trace > /dev/null # drain trace > cat /proc/acpi/whatever > cat /debug/tracing/trace > trace.txt > > You can even turn on stack backtrace tracing feature, via: > > echo stacktrace > /debug/tracing/iter_ctrl > > this adds extra trace entries that show the call path of every > reschedule. [this attribute is default-disabled.] (Something bad seems to be happening with my kernels as I have been finding strange problems like network-manager detecting the wifi as a wired interface unless I did a make clean on my kernel sources before compiling). The stalling issue seems to keep coming and going and is currently showing up in both debug and non debug kernels. There are traces when looking at battery information on both AC (where the problem is never there) and battery (where the problem always is there) here: http://sucs.org/~sits/test/eeepc-debug/20080929/trace-on-ac.txt.bz2 http://sucs.org/~sits/test/eeepc-debug/20080929/trace-on-battery.txt.bz2 (15Mbytes uncompressed) I wasn't quite sure how to tell when a reschedule was done. I simply did a grep schedule on the file and most of it seemed reasonable. One part that caught my eye was the following: speaker-test-3891 [000] 1392.751699: __switch_to <-schedule <idle>-0 [000] 1392.752070: account_scheduler_latency <-enqueue_task_fair <idle>-0 [000] 1392.752091: __switch_to <-schedule cat-7717 [000] 1392.752092: del_timer <-schedule_timeout cat-7717 [000] 1392.772263: account_scheduler_latency <-enqueue_task_fair cat-7717 [000] 1392.773224: __tasklet_schedule <-kbd_event cat-7717 [000] 1392.773225: schedule_console_callback <-kbd_event cat-7717 [000] 1392.773225: schedule_work <-schedule_console_callback cat-7717 [000] 1392.773226: queue_work <-schedule_work cat-7717 [000] 1392.773232: account_scheduler_latency <-enqueue_task_fair cat-7717 [000] 1392.773240: account_scheduler_latency <-enqueue_task_fair cat-7717 [000] 1392.773251: schedule_delayed_work <-put_queue cat-7717 [000] 1392.773251: queue_delayed_work <-schedule_delayed_work cat-7717 [000] 1392.773258: schedule_console_callback <-kbd_event cat-7717 [000] 1392.773258: schedule_work <-schedule_console_callback cat-7717 [000] 1392.773259: queue_work <-schedule_work cat-7717 [000] 1392.773267: schedule_console_callback <-kbd_event cat-7717 [000] 1392.773268: schedule_work <-schedule_console_callback cat-7717 [000] 1392.773268: queue_work <-schedule_work cat-7717 [000] 1392.807931: account_scheduler_latency <-enqueue_task_fair cat-7717 [000] 1392.839464: __tasklet_schedule <-ath5k_intr cat-7717 [000] 1392.839506: account_scheduler_latency <-enqueue_task_fair cat-7717 [000] 1392.941867: __tasklet_schedule <-ath5k_intr cat-7717 [000] 1393.005963: __tasklet_schedule <-ath5k_intr cat-7717 [000] 1393.033222: __switch_to <-schedule Here a schedule seemingly doesn't happen for a few hundredths of a second... -- Sitsofe | http://sucs.org/~sits/ ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: How how latent should non-preemptive scheduling be? 2008-09-29 23:11 ` Sitsofe Wheeler @ 2008-09-30 11:22 ` Ingo Molnar 2008-09-30 13:18 ` Sitsofe Wheeler 0 siblings, 1 reply; 17+ messages in thread From: Ingo Molnar @ 2008-09-30 11:22 UTC (permalink / raw) To: Sitsofe Wheeler Cc: linux-kernel, Peter Zijlstra, Arjan van de Ven, Steven Rostedt * Sitsofe Wheeler <sitsofe@yahoo.com> wrote: > cat-7717 [000] 1392.941867: __tasklet_schedule <-ath5k_intr > cat-7717 [000] 1393.005963: __tasklet_schedule <-ath5k_intr > cat-7717 [000] 1393.033222: __switch_to <-schedule > > Here a schedule seemingly doesn't happen for a few hundredths of a second... that's about 60 msecs. ath5k_intr IRQ handling overhead? IRQ handlers are non-preemptible. (even under CONFIG_PREEMPT=y) Ingo ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: How how latent should non-preemptive scheduling be? 2008-09-30 11:22 ` Ingo Molnar @ 2008-09-30 13:18 ` Sitsofe Wheeler 2008-10-04 10:50 ` Reading EeePC900 battery info causes stalls when using SLUB (was Re: How how latent should non-preemptive scheduling be?) Sitsofe Wheeler 0 siblings, 1 reply; 17+ messages in thread From: Sitsofe Wheeler @ 2008-09-30 13:18 UTC (permalink / raw) To: Ingo Molnar Cc: linux-kernel, Peter Zijlstra, Arjan van de Ven, Steven Rostedt Ingo Molnar wrote: > * Sitsofe Wheeler <sitsofe@yahoo.com> wrote: > >> cat-7717 [000] 1392.941867: __tasklet_schedule <-ath5k_intr >> cat-7717 [000] 1393.005963: __tasklet_schedule <-ath5k_intr >> cat-7717 [000] 1393.033222: __switch_to <-schedule >> >> Here a schedule seemingly doesn't happen for a few hundredths of a second... > > that's about 60 msecs. ath5k_intr IRQ handling overhead? IRQ handlers > are non-preemptible. (even under CONFIG_PREEMPT=y) What about the piece leading up to that: cat-7717 [000] 1392.752092: del_timer <-schedule_timeout cat-7717 [000] 1392.772263: account_scheduler_latency <-enqueue_task_fair cat-7717 [000] 1392.773224: __tasklet_schedule <-kbd_event cat-7717 [000] 1392.773225: schedule_console_callback <-kbd_event cat-7717 [000] 1392.773225: schedule_work <-schedule_console_callback cat-7717 [000] 1392.773226: queue_work <-schedule_work cat-7717 [000] 1392.773232: account_scheduler_latency <-enqueue_task_fair cat-7717 [000] 1392.773240: account_scheduler_latency <-enqueue_task_fair cat-7717 [000] 1392.773251: schedule_delayed_work <-put_queue cat-7717 [000] 1392.773251: queue_delayed_work <-schedule_delayed_work cat-7717 [000] 1392.773258: schedule_console_callback <-kbd_event cat-7717 [000] 1392.773258: schedule_work <-schedule_console_callback cat-7717 [000] 1392.773259: queue_work <-schedule_work cat-7717 [000] 1392.773267: schedule_console_callback <-kbd_event cat-7717 [000] 1392.773268: schedule_work <-schedule_console_callback cat-7717 [000] 1392.773268: queue_work <-schedule_work cat-7717 [000] 1392.807931: account_scheduler_latency <-enqueue_task_fair cat-7717 [000] 1392.839464: __tasklet_schedule <-ath5k_intr Here's a snippet from http://sucs.org/~sits/test/eeepc-debug/20080929/trace-on-battery.txt.bz2 : cat-7717 [000] 1392.752092: del_timer <-schedule_timeout cat-7717 [000] 1392.752092: acpi_ec_check_status <-acpi_ec_wait cat-7717 [000] 1392.752093: constant_test_bit <-acpi_ec_check_status cat-7717 [000] 1392.752095: mutex_unlock <-acpi_ec_transaction cat-7717 [000] 1392.752096: acpi_ex_enter_interpreter <-acpi_ev_address_space_dispatch cat-7717 [000] 1392.752097: acpi_os_wait_semaphore <-acpi_ut_acquire_mutex cat-7717 [000] 1392.752097: down_timeout <-acpi_os_wait_semaphore cat-7717 [000] 1392.752098: acpi_ex_release_global_lock <-acpi_ex_read_data_from_field cat-7717 [000] 1392.752099: acpi_ex_release_mutex_object <-acpi_ex_release_global_lock cat-7717 [000] 1392.752100: acpi_ev_release_global_lock <-acpi_ex_release_mutex_object cat-7717 [000] 1392.752100: __acpi_release_global_lock <-acpi_ev_release_global_lock cat-7717 [000] 1392.752101: acpi_os_signal_semaphore <-acpi_ev_release_global_lock cat-7717 [000] 1392.752102: up <-acpi_os_signal_semaphore ... cat-7717 [000] 1392.839447: acpi_ex_truncate_for32bit_table <-acpi_ds_exec_end_op cat-7717 [000] 1392.839447: acpi_ps_next_parse_state <-acpi_ps_parse_loop cat-7717 [000] 1392.839448: acpi_ps_complete_op <-acpi_ps_parse_loop cat-7717 [000] 1392.839448: acpi_ps_complete_this_op <-acpi_ps_complete_op cat-7717 [000] 1392.839449: acpi_ps_delete_parse_tree <-acpi_ps_complete_this_op cat-7717 [000] 1392.839449: acpi_ps_get_arg <-acpi_ps_delete_parse_tree cat-7717 [000] 1392.839450: acpi_ps_get_opcode_info <-acpi_ps_get_arg cat-7717 [000] 1392.839450: acpi_ps_get_arg <-acpi_ps_delete_parse_tree cat-7717 [000] 1392.839451: acpi_ps_get_opcode_info <-acpi_ps_get_arg cat-7717 [000] 1392.839452: do_IRQ <-common_interrupt cat-7717 [000] 1392.839453: irq_enter <-do_IRQ cat-7717 [000] 1392.839453: irq_to_desc <-do_IRQ cat-7717 [000] 1392.839454: handle_fasteoi_irq <-do_IRQ cat-7717 [000] 1392.839455: handle_IRQ_event <-handle_fasteoi_irq cat-7717 [000] 1392.839455: usb_hcd_irq <-handle_IRQ_event cat-7717 [000] 1392.839456: uhci_irq <-usb_hcd_irq cat-7717 [000] 1392.839458: ath5k_intr <-handle_IRQ_event cat-7717 [000] 1392.839459: ath5k_hw_is_intr_pending <-ath5k_intr cat-7717 [000] 1392.839461: ath5k_hw_get_isr <-ath5k_intr cat-7717 [000] 1392.839464: __tasklet_schedule <-ath5k_intr Why didn't another process cut in from 1392.752092 to 1392.839464? In the schedule grep normally runs for cat are half as long e.g.: <idle>-0 [000] 1392.747102: __switch_to <-schedule cat-7717 [000] 1392.747103: del_timer <-schedule_timeout cat-7717 [000] 1392.747107: schedule_timeout_uninterruptible <-msleep cat-7717 [000] 1392.747107: schedule_timeout <-schedule_timeout_uninterruptible cat-7717 [000] 1392.747108: init_timer <-schedule_timeout cat-7717 [000] 1392.747108: __mod_timer <-schedule_timeout cat-7717 [000] 1392.747111: __switch_to <-schedule <idle>-0 [000] 1392.749067: account_scheduler_latency <-enqueue_task_fair Further why (after taking a break to process the ath5k interrupt) did it not schedule another process other than cat until 1393.033226: cat-7717 [000] 1393.033215: set_normalized_timespec <-ktime_get_ts cat-7717 [000] 1393.033216: ktime_add_safe <-hrtimer_start_range_ns cat-7717 [000] 1393.033217: ktime_add_safe <-hrtimer_start_range_ns cat-7717 [000] 1393.033217: enqueue_hrtimer <-hrtimer_start_range_ns cat-7717 [000] 1393.033222: __switch_to <-schedule hald-addon-inpu-3237 [000] 1393.033226: fget_light <-do_sys_poll -- Sitsofe | http://sucs.org/~sits/ ^ permalink raw reply [flat|nested] 17+ messages in thread
* Reading EeePC900 battery info causes stalls when using SLUB (was Re: How how latent should non-preemptive scheduling be?) 2008-09-30 13:18 ` Sitsofe Wheeler @ 2008-10-04 10:50 ` Sitsofe Wheeler 0 siblings, 0 replies; 17+ messages in thread From: Sitsofe Wheeler @ 2008-10-04 10:50 UTC (permalink / raw) To: Ingo Molnar Cc: linux-kernel, Peter Zijlstra, Arjan van de Ven, Steven Rostedt, Christoph Lameter Sitsofe Wheeler wrote: > I have an EeePC 900 (Intel Celeron 900Mhz) and it seems to be > skipping while playing sound through various desktop apps with a > 2.6.27rc6 kernel. It is running off an SD card which really shows up After weeks of head scratching as to why these latencies didn't occur using the xandros 2.6.21.4 kernel (but keeping the same userspace) when my own kernels would always show this problem I finally found the answer after reading http://tservice.net.ru/~s0mbre/blog/devel/other/2008_10_01 on kernel planet - SLUB can cause regressions compared to SLAB. Switching from SLUB to SLAB made the problem more or less disappear (which I guess makes sense given the large number of kmem_* calls that are made when reading the battery). My understanding with the memory allocators is that SLOB is the smallest and simplest. Its forté is that it uses very little memory which makes it ideal for embedded systems and is the easiest allocator to understand. The downside is that it might tend towards memory fragmentation the longer a system runs and is apparently a little bit slower than SLAB. The SLAB allocator is something that the linux kernel has had for many years and was a good performer until the pressures to perform with large SMP systems started to come in to play (at which point lots of memory would be used up on fixed structures - http://lwn.net/Articles/229984/ ). SLUB is the newest allocator, scales up well and has finer grained diagnostics that SLAB. Prior to today my understanding was that SLUB would be able to replace SLAB in all scenarios and perform just as well or better. However now I'm not so sure (SLAB appears to be less latent than SLUB). Other than SLUB's newness are there cases where SLAB should be chosen instead of SLUB (e.g. uniprocessor desktop systems with hundreds of megabytes of RAM)? Will SLAB exist as an alternative to SLUB for certain setups? -- Sitsofe | http://sucs.org/~sits ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2008-10-04 10:50 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <fa.ERZTl/6uH+mhNoef5fPJKTRjJag@ifi.uio.no>
[not found] ` <fa.PtPFzP5kIJVCCov6YCewrh+o4z4@ifi.uio.no>
[not found] ` <fa.C6WSm5Rh2Nb+Qho7b0qDOZ9RPV8@ifi.uio.no>
[not found] ` <fa.ch6j4qXs/2sFpLkHz5fXrtjTR8c@ifi.uio.no>
[not found] ` <fa.Jx/Ygtm46CVRawlA6OnfYNn6cN0@ifi.uio.no>
2008-09-18 7:26 ` How how latent should non-preemptive scheduling be? Sitsofe Wheeler
[not found] ` <fa.iIHgL48F3T5VAqFw3mqaf9Pzrs4@ifi.uio.no>
[not found] ` <fa.Td8xkKZKMSMghlJmEYefTRVF2kc@ifi.uio.no>
2008-09-19 11:54 ` Sitsofe Wheeler
2008-09-19 14:20 ` Peter Zijlstra
2008-09-22 11:57 ` Ingo Molnar
2008-09-22 12:07 ` Steven Rostedt
2008-09-23 6:33 ` Sitsofe Wheeler
2008-09-23 11:53 ` Ingo Molnar
2008-09-23 16:30 ` Sitsofe Wheeler
2008-09-23 19:39 ` Sitsofe Wheeler
2008-09-23 22:01 ` Sitsofe Wheeler
2008-09-27 20:48 ` Ingo Molnar
2008-09-28 20:56 ` Sitsofe Wheeler
2008-09-29 8:37 ` Ingo Molnar
2008-09-29 23:11 ` Sitsofe Wheeler
2008-09-30 11:22 ` Ingo Molnar
2008-09-30 13:18 ` Sitsofe Wheeler
2008-10-04 10:50 ` Reading EeePC900 battery info causes stalls when using SLUB (was Re: How how latent should non-preemptive scheduling be?) Sitsofe Wheeler
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox