public inbox for linux-kernel@vger.kernel.org
* Scheduler regression: Too frequent timer interrupts(?)
@ 2009-04-16 19:53 Christoph Lameter
  2009-04-17  7:00 ` Peter Zijlstra
  0 siblings, 1 reply; 27+ messages in thread
From: Christoph Lameter @ 2009-04-16 19:53 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

Since 2.6.22 the Linux scheduler interrupts programs with increasing
frequency. The interrupts cause run time variances that are affecting HPC
jobs. Various low latency jobs show increasing runtimes because of these
additional interrupts by the scheduler.

In the following test a simple program was run that continually retrieves
TSC and measures the times between the TSC retrievals. A run time variance
is noted whenever the time between two TSC retrievals is larger than 1
usec (on a 3.3GHz dual-processor, quad-core Xeon box). The numbers given
are the interrupts occurring in a 10 second measurement period. The tests
can be downloaded from http://gentwo.org/ll .


Kernel		Test 1	Test 2	Test 3	Variances(SUM)
2.6.22		383	540	667	1590
2.6.23		2738	2019	2303	7060
2.6.24		2503	573	583	3659
2.6.25		302	359	241	902
2.6.26		2503	2501	2503	7507
2.6.27		2502	2503	2478	7483
2.6.28		2502	2504	2502	7508
2.6.29		2502	2490	2503	7495
2.6.30-rc2	2504	2503	2502	7509

The kernel was compiled with high res timer support and a HZ of 250.

The 2.6.22 kernel has only about 38.3 disruptions per second. That likely
means that HRTIMER is able to eliminate the timer interrupt.

In .23 this is no longer possible and we get the full number of 250 timer
interrupts per second. This is fixed again in .25 but .26 loses it again.

Can we restore the .22 situation where a running task without any
competitor on the same cpu does not need the timer interrupt?


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Scheduler regression: Too frequent timer interrupts(?)
  2009-04-16 19:53 Scheduler regression: Too frequent timer interrupts(?) Christoph Lameter
@ 2009-04-17  7:00 ` Peter Zijlstra
  2009-04-17 13:42   ` Christoph Lameter
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2009-04-17  7:00 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Ingo Molnar, linux-kernel

On Thu, 2009-04-16 at 15:53 -0400, Christoph Lameter wrote:
> Since 2.6.22 the Linux scheduler interrupts programs with increasing
> frequency. The interrupts cause run time variances that are affecting HPC
> jobs. Various low latency jobs show increasing runtimes because of these
> additional interrupts by the scheduler.
> 
> In the following test a simple program was run that continually retrieves
> TSC and measures the times between the TSC retrievals. A run time variance
> is noted whenever the time between two TSC retrievals is larger than 1
> usec (on a 3.3GHz dual-processor, quad-core Xeon box). The numbers given
> are the interrupts occurring in a 10 second measurement period. The tests
> can be downloaded from http://gentwo.org/ll .
> 
> 
> Kernel		Test 1	Test 2	Test 3	Variances(SUM)
> 2.6.22		383	540	667	1590
> 2.6.23		2738	2019	2303	7060
> 2.6.24		2503	573	583	3659
> 2.6.25		302	359	241	902
> 2.6.26		2503	2501	2503	7507
> 2.6.27		2502	2503	2478	7483
> 2.6.28		2502	2504	2502	7508
> 2.6.29		2502	2490	2503	7495
> 2.6.30-rc2	2504	2503	2502	7509
> 
> The kernel was compiled with high res timer support and a HZ of 250.
> 
> The 2.6.22 kernel has only about 38.3 disruptions per second. That likely
> means that HRTIMER is able to eliminate the timer interrupt.

This has never been true AFAICT; as long as we have a task running, we
take the interrupt. I just looked at the .22 code and that certainly
expects scheduler_tick() to be called when there is a running
process.

Also, look at /proc/interrupts if you want to determine interrupt
frequency.



* Re: Scheduler regression: Too frequent timer interrupts(?)
  2009-04-17  7:00 ` Peter Zijlstra
@ 2009-04-17 13:42   ` Christoph Lameter
  2009-04-17 14:16     ` Peter Zijlstra
  0 siblings, 1 reply; 27+ messages in thread
From: Christoph Lameter @ 2009-04-17 13:42 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, linux-kernel

On Fri, 17 Apr 2009, Peter Zijlstra wrote:

> This has never been true AFAICT; as long as we have a task running, we
> take the interrupt. I just looked at the .22 code and that certainly
> expects scheduler_tick() to be called when there is a running
> process.
>
> Also, look at /proc/interrupts if you want to determine interrupt
> frequency.

I am mainly interested in limiting the number and length of cpu holdoffs.

If the timer interrupt is always taken then it must take less than 1 usec
in certain versions of the kernel in order to not show up in the
statistics gathered.




* Re: Scheduler regression: Too frequent timer interrupts(?)
  2009-04-17 13:42   ` Christoph Lameter
@ 2009-04-17 14:16     ` Peter Zijlstra
  2009-04-17 14:29       ` Christoph Lameter
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2009-04-17 14:16 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Ingo Molnar, linux-kernel

On Fri, 2009-04-17 at 09:42 -0400, Christoph Lameter wrote:
> On Fri, 17 Apr 2009, Peter Zijlstra wrote:
> 
> > This has never been true AFAICT; as long as we have a task running, we
> > take the interrupt. I just looked at the .22 code and that certainly
> > expects scheduler_tick() to be called when there is a running
> > process.
> >
> > Also, look at /proc/interrupts if you want to determine interrupt
> > frequency.
> 
> I am mainly interested in limiting the number and length of cpu holdoffs.
> 
> If the timer interrupt is always taken then it must take less than 1 usec
> in certain versions of the kernel in order to not show up in the
> statistics gathered.

A test like the one you ran is pretty useless for anything.

If you want to measure something I'd suggest making a histogram of tsc
values in 10ns buckets or something, and seeing if there are a few
predominant spikes above the noise.

With something like that you could say, the jiffy tick went from 0.8+-.1
to 1.1+-.1 us or somesuch.

After that, you could possibly use oprofile or readprofile or
perf-counters to get an idea where the time is spent. I did a quick
profile on one of my machines, and about half the kernel time spent in a
while(1) loop comes from __do_softirq().

Really, I should not have to tell you this...



* Re: Scheduler regression: Too frequent timer interrupts(?)
  2009-04-17 14:16     ` Peter Zijlstra
@ 2009-04-17 14:29       ` Christoph Lameter
  2009-04-17 14:51         ` Peter Zijlstra
  0 siblings, 1 reply; 27+ messages in thread
From: Christoph Lameter @ 2009-04-17 14:29 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, linux-kernel

On Fri, 17 Apr 2009, Peter Zijlstra wrote:

> If you want to measure something I'd suggest making a histogram of tsc
> values in 10ns buckets or something, and seeing if there are a few
> predominant spikes above the noise.

See
http://www.kernel.org/pub/linux/kernel/people/christoph/collab-spring-2009/Latencharts.ods
for more details. The tool we are using can produce histograms but they
were too voluminous to be included.

> With something like that you could say, the jiffy tick went from 0.8+-.1
> to 1.1+-.1 us or somesuch.

Well yeah we can look at this but there seem to be regressions in a lot of
other subsystems as well. Rescheduling is another thing that we tracked.
It's interesting that the holdoffs varied a lot during the scheduler
transition to CFS and then stayed high after that was complete.

> After that, you could possibly use oprofile or readprofile or
> perf-counters to get an idea where the time is spent. I did a quick
> profile on one of my machines, and about half the kernel time spent in a
> while(1) loop comes from __do_softirq().
>
> Really, I should not have to tell you this...

I can get down there but do you really want me to start hacking on the
scheduler again? This seems to be a regression from what we had working
fine before.


* Re: Scheduler regression: Too frequent timer interrupts(?)
  2009-04-17 14:29       ` Christoph Lameter
@ 2009-04-17 14:51         ` Peter Zijlstra
  2009-04-17 15:04           ` Christoph Lameter
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2009-04-17 14:51 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Ingo Molnar, linux-kernel

On Fri, 2009-04-17 at 10:29 -0400, Christoph Lameter wrote:

> > With something like that you could say, the jiffy tick went from 0.8+-.1
> > to 1.1+-.1 us or somesuch.
> 
> Well yeah we can look at this but there seem to be regressions in a lot of
> other subsystems as well. Rescheduling is another thing that we tracked.
> It's interesting that the holdoffs varied a lot during the scheduler
> transition to CFS and then stayed high after that was complete.
> 
> > After that, you could possibly use oprofile or readprofile or
> > perf-counters to get an idea where the time is spent. I did a quick
> > profile on one of my machines, and about half the kernel time spent in a
> > while(1) loop comes from __do_softirq().
> >
> > Really, I should not have to tell you this...
> 
> I can get down there but do you really want me to start hacking on the
> scheduler again? This seems to be a regression from what we had working
> fine before.

I won't mind you sending patches. But really, the first thing to do is
figuring out what is taking time.

And a random 1us cutoff is, well, random.

If you want to reduce interrupts, that's fine, but not counting an
interrupt because its below the magic 1us marker sounds a bit, well,
magic -- might work for you, might not for me on another machine, might
even be compiler dependent.

So 5 <1us interruptions are not accounted at all, whereas a single >1us
interruption is. I'd rather get rid of those 5 than try and shave a bit
off the one, if you get what I mean.

I'm pretty sure if we run the current kernel on a 5GHz machine all
interrupts are under 1us again :-), problem fixed? I don't think so.

Furthermore, yes the scheduler is one of those jiffy tick users, but
there are more. We can do ntp/gtod things in there, there is process
accounting, there is some RCU machinery, timers etc..

Like I said, I did a profile on current -tip and __do_softirq was about
half the time spent in the kernel. I'm not sure why that would be, maybe we're
doing tons of cache misses there for some reason, I dunno.





* Re: Scheduler regression: Too frequent timer interrupts(?)
  2009-04-17 14:51         ` Peter Zijlstra
@ 2009-04-17 15:04           ` Christoph Lameter
  2009-04-17 15:28             ` Peter Zijlstra
  2009-04-17 15:35             ` Ingo Molnar
  0 siblings, 2 replies; 27+ messages in thread
From: Christoph Lameter @ 2009-04-17 15:04 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, linux-kernel

On Fri, 17 Apr 2009, Peter Zijlstra wrote:

> And a random 1us cutoff is, well, random.

It's got to be somewhere.

> If you want to reduce interrupts, that's fine, but not counting an
> interrupt because its below the magic 1us marker sounds a bit, well,
> magic -- might work for you, might not for me on another machine, might
> even be compiler dependent.

The point is to reduce the interruptions of the user space application.
Hardware interrupts etc. are something else. Maybe I should not use the
term interrupt. CPU unavailability? Cache pollution?

> So 5 <1us interruptions are not accounted at all, whereas a single >1us
> interruption is. I'd rather get rid of those 5 than try and shave a bit
> off the one, if you get what I mean.

Ok. We can set the threshold lower and see how that goes.

> Furthermore, yes the scheduler is one of those jiffy tick users, but
> there are more. We can do ntp/gtod things in there, there is process
> accounting, there is some RCU machinery, timers etc..

Again things were fine before the scheduler changed.

Could we defer interrupts if a single process is running on a cpu and
there is no other process on the runqueue and no other use of the timer
interrupt?



* Re: Scheduler regression: Too frequent timer interrupts(?)
  2009-04-17 15:04           ` Christoph Lameter
@ 2009-04-17 15:28             ` Peter Zijlstra
  2009-04-23  4:42               ` Pavel Machek
  2009-04-17 15:35             ` Ingo Molnar
  1 sibling, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2009-04-17 15:28 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Ingo Molnar, linux-kernel

On Fri, 2009-04-17 at 11:04 -0400, Christoph Lameter wrote:
> On Fri, 17 Apr 2009, Peter Zijlstra wrote:
> 
> > And a random 1us cutoff is, well, random.
> 
> It's got to be somewhere.

No it doesn't, a cutoff is useless. If I steal cputime at units below
the cutoff I can, in the limit, steal up to 100% cpu time and you'd not
notice.

> > If you want to reduce interrupts, that's fine, but not counting an
> > interrupt because its below the magic 1us marker sounds a bit, well,
> > magic -- might work for you, might not for me on another machine, might
> > even be compiler dependent.
> 
> The point is to reduce the interruptions of the user space application.
> Hardware interrupts etc. are something else. Maybe I should not use the
> term interrupt. CPU unavailability? Cache pollution?

The thing is, a 1us cut-off, or any other cut-off, doesn't tell you
_anything_ about availability whatsoever.

> > So 5 <1us interruptions are not accounted at all, whereas a single >1us
> > interruption is. I'd rather get rid of those 5 than try and shave a bit
> > off the one, if you get what I mean.
> 
> Ok. We can set the threshold lower and see how that goes.

Pointless. See above.

> > Furthermore, yes the scheduler is one of those jiffy tick users, but
> > there are more. We can do ntp/gtod things in there, there is process
> > accounting, there is some RCU machinery, timers etc..
> 
> Again things were fine before the scheduler changed.

Sure, it might be the scheduler, it might not be. I'm perfectly fine
with it being the scheduler, but show me where you're spending your time
and we can try and do something about it.

> Could we defer interrupts if a single process is running on a cpu and
> there is no other process on the runqueue and no other use of the timer
> interrupt?

If you can resolve all dependencies on the jiffy tick, maybe. Last time
I looked there were too many.



* Re: Scheduler regression: Too frequent timer interrupts(?)
  2009-04-17 15:04           ` Christoph Lameter
  2009-04-17 15:28             ` Peter Zijlstra
@ 2009-04-17 15:35             ` Ingo Molnar
  2009-04-17 15:55               ` Christoph Lameter
  1 sibling, 1 reply; 27+ messages in thread
From: Ingo Molnar @ 2009-04-17 15:35 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Peter Zijlstra, linux-kernel


* Christoph Lameter <cl@linux.com> wrote:

> On Fri, 17 Apr 2009, Peter Zijlstra wrote:
> 
> > And a random 1us cutoff is, well, random.
> 
> It's got to be somewhere.

Sorry, that's not a rational answer that makes any sense.

Peter's point is statistics 101: please show absolute values not an 
event distribution cutoff - how much total time do we spend in the 
kernel in that workload?

Is the overhead 1%? 2%? 0.5%? And how did it change from 2.6.22 
onwards? Did it go up by 0.1%, from 1% to 1.1%? Or did the average 
go down by 0.05%, while increasing the spread of events (thus 
fooling your cutoff)?

These are very simple, very basic, very straightforward questions - 
and no straight answer was forthcoming from you. Are you not 
interested in that answer?

	Ingo


* Re: Scheduler regression: Too frequent timer interrupts(?)
  2009-04-17 15:35             ` Ingo Molnar
@ 2009-04-17 15:55               ` Christoph Lameter
  2009-04-17 16:23                 ` Peter Zijlstra
  0 siblings, 1 reply; 27+ messages in thread
From: Christoph Lameter @ 2009-04-17 15:55 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Peter Zijlstra, linux-kernel

On Fri, 17 Apr 2009, Ingo Molnar wrote:

>
> * Christoph Lameter <cl@linux.com> wrote:
>
> > On Fri, 17 Apr 2009, Peter Zijlstra wrote:
> >
> > > And a random 1us cutoff is, well, random.
> >
> > It's got to be somewhere.
>
> Sorry, that's not a rational answer that makes any sense.

There must be criteria for what constitutes an interruption of the user space
application that we care about. There are numerous effects (like cacheline
fetches etc.) that also cause holdoffs, so it's not only the OS. There are
also other randomizing influences due to the exact location in the
measurement code path where the interrupt occurred. The criterion used
here was that the CPU was not executing application code for longer than 1 usec. We
can discuss if this makes sense or what makes sense. But the measurements
here are entirely from the perspective of the user space program
experiencing OS noise.

> Peter's point is statistics 101: please show absolute values not an
> event distribution cutoff - how much total time do we spend in the
> kernel in that workload?

Further details are included in the document that I pointed him to. The
"workload" is a synthetic test of a busy loop continually reading the TSC
register.

http://www.kernel.org/pub/linux/kernel/people/christoph/collab-spring-2009/Latencharts.ods

> Is the overhead 1%? 2%? 0.5%? And how did it change from 2.6.22
> onwards? Did it go up by 0.1%, from 1% to 1.1%? Or did the average
> go down by 0.05%, while increasing the spread of events (thus
> fooling your cutoff)?

As you see in the diagrams provided there is a 4-fold increase in the
number of interrupts >1usec when going from 2.6.22 to 2.6.23. How would
you measure the overhead? Time spent in the OS? Disturbance of the caches
by the OS that causes the application to have to refetch data from RAM?

> These are very simple, very basic, very straightforward questions -
> and no straight answer was forthcoming from you. Are you not
> interested in that answer?

Is this goofiness going to continue? I provided measurements from a user
space perspective and you keep on demanding that I take other
measurements. I can dig deeper into the reasons for these regressions and
we can discuss what would be useful to measure but that won't happen
overnight.


* Re: Scheduler regression: Too frequent timer interrupts(?)
  2009-04-17 15:55               ` Christoph Lameter
@ 2009-04-17 16:23                 ` Peter Zijlstra
  2009-04-17 16:33                   ` Christoph Lameter
  2009-04-17 16:49                   ` Ingo Molnar
  0 siblings, 2 replies; 27+ messages in thread
From: Peter Zijlstra @ 2009-04-17 16:23 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Ingo Molnar, linux-kernel

On Fri, 2009-04-17 at 11:55 -0400, Christoph Lameter wrote:

> Further details are included in the document that I pointed him to. The
> "workload" is a synthetic test of a busy loop continually reading the TSC
> register.
> 
> http://www.kernel.org/pub/linux/kernel/people/christoph/collab-spring-2009/Latencharts.ods

You're really going to make me open that thing eh..

As we've established, graph 1 is useless.

I'm still not quite sure why you couldn't provide the data for the other
graphs in email. It is not all that much data:

Graph 2: Noise Length

Kernel          Test 1  Test 2  Test 3  Interruption(AVG)
2.6.22          2.55    2.61    1.92    2.36
2.6.23          1.33    1.38    1.34    1.35
2.6.24          1.97    1.86    1.87    1.90
2.6.25          2.09    2.29    2.09    2.16
2.6.26          1.49    1.22    1.22    1.31
2.6.27          1.67    1.28    1.18    1.38
2.6.28          1.27    1.21    1.14    1.21
2.6.29          1.44    1.33    1.54    1.44
2.6.30-rc2      2.06    1.49    1.24    1.60

This is pretty useless too, since it only counts >1us events. Hence it will
always be biased.

Much better would have been a graph constructed from a histogram that
plots all lengths (say in 10ns) buckets. There you would have had a few
(at least 2) peaks, one around the time it takes to complete the
userspace loop, and one around the time it takes to do this interrupt
thing -- and maybe some other smaller ones. But now we're clueless.

Combined with graph 1, we can only compare 26+, and there we can see
there is some variance in how long a tick takes between kernels -- but a
std-dev along with this avg, would have been even better.

For the past few releases I cannot remember anything that would
immediately affect the tick length. Nothing structural was changed
around that code.

> > Is the overhead 1%? 2%? 0.5%? And how did it change from 2.6.22
> > onwards? Did it go up by 0.1%, from 1% to 1.1%? Or did the average
> > go down by 0.05%, while increasing the spread of events (thus
> > fooling your cutoff)?
> 
> As you see in the diagrams provided there is a 4-fold increase in the
> number of interrupts >1usec when going from 2.6.22 to 2.6.23. How would
> you measure the overhead? Time spent in the OS? Disturbance of the caches
> by the OS that causes the application to have to refetch data from RAM?

You could for example run an NMI profiler at 10000 Hz and collect
samples. Or use PMU hardware to collect numbers



* Re: Scheduler regression: Too frequent timer interrupts(?)
  2009-04-17 16:23                 ` Peter Zijlstra
@ 2009-04-17 16:33                   ` Christoph Lameter
  2009-04-17 16:49                   ` Ingo Molnar
  1 sibling, 0 replies; 27+ messages in thread
From: Christoph Lameter @ 2009-04-17 16:33 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, linux-kernel

On Fri, 17 Apr 2009, Peter Zijlstra wrote:

> As we've established, graph 1 is useless.

It certainly shows that the number of >1usec interrupts increases.

> I'm still not quite sure why you couldn't provide the data for the other
> graphs in email. They are not at all that much:

I could have, but I thought I'd better focus on one.

> Graph 2: Noise Length
>
> Kernel          Test 1  Test 2  Test 3  Interruption(AVG)
> 2.6.22          2.55    2.61    1.92    2.36
> 2.6.23          1.33    1.38    1.34    1.35
> 2.6.24          1.97    1.86    1.87    1.90
> 2.6.25          2.09    2.29    2.09    2.16
> 2.6.26          1.49    1.22    1.22    1.31
> 2.6.27          1.67    1.28    1.18    1.38
> 2.6.28          1.27    1.21    1.14    1.21
> 2.6.29          1.44    1.33    1.54    1.44
> 2.6.30-rc2      2.06    1.49    1.24    1.60
>
> This is pretty useless too, since it only counts >1us events. Hence it will
> always be biased.

So what would be useful is to have all the differences in terms of
nanoseconds between measurements? I can just remove the cutoff and then
measure in tenths of usecs. Would that be okay?

> You could for example run an NMI profiler at 10000 Hz and collect
> samples. Or use PMU hardware to collect numbers

The point is to have a test app that can be used to measure OS noise
that may have a variety of causes.

Such a test app needs to be as simple as possible. The TSC loop should be
fine as far as I can tell. Histogram is already there. I can just remove
the 1usec cutoff to get you full details.



* Re: Scheduler regression: Too frequent timer interrupts(?)
  2009-04-17 16:23                 ` Peter Zijlstra
  2009-04-17 16:33                   ` Christoph Lameter
@ 2009-04-17 16:49                   ` Ingo Molnar
  2009-04-17 17:19                     ` Chris Friesen
  2009-04-17 17:45                     ` Christoph Lameter
  1 sibling, 2 replies; 27+ messages in thread
From: Ingo Molnar @ 2009-04-17 16:49 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Christoph Lameter, linux-kernel


* Peter Zijlstra <peterz@infradead.org> wrote:

> > > Is the overhead 1%? 2%? 0.5%? And how did it change from 
> > > 2.6.22 onwards? Did it go up by 0.1%, from 1% to 1.1%? Or did 
> > > the average go down by 0.05%, while increasing the spread of 
> > > events (thus fooling your cutoff)?
> > 
> > As you see in the diagrams provided there is a 4-fold increase
> > in the number of interrupts >1usec when going from 2.6.22 to
> > 2.6.23. How would you measure the overhead? Time spent in the
> > OS? Disturbance of the caches by the OS that causes the
> > application to have to refetch data from RAM?
> 
> You could for example run an NMI profiler at 10000 Hz and collect 
> samples. Or use PMU hardware to collect numbers

it's even simpler than that: use a user-space loop and an alarm()
based signal-handler triggered measurement to count the number of
loops user-space can execute per second.

Such measurements can very easily be made much more precise than 
0.1%, and in a tight loop there's basically no caching issues - it 
measures pure 'cycles left for user-space' performance.

	Ingo


* Re: Scheduler regression: Too frequent timer interrupts(?)
  2009-04-17 16:49                   ` Ingo Molnar
@ 2009-04-17 17:19                     ` Chris Friesen
  2009-04-17 17:45                     ` Christoph Lameter
  1 sibling, 0 replies; 27+ messages in thread
From: Chris Friesen @ 2009-04-17 17:19 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Peter Zijlstra, Christoph Lameter, linux-kernel

Ingo Molnar wrote:

> it's even simpler than that: use a user-space loop and an alarm()
> based signal-handler triggered measurement to count the number of
> loops user-space can execute per second.

Alternatively, just measure how long it takes to execute a given (large) 
number of loops.  This avoids the signal handler.

Chris



* Re: Scheduler regression: Too frequent timer interrupts(?)
  2009-04-17 16:49                   ` Ingo Molnar
  2009-04-17 17:19                     ` Chris Friesen
@ 2009-04-17 17:45                     ` Christoph Lameter
  2009-04-17 18:11                       ` Peter Zijlstra
  1 sibling, 1 reply; 27+ messages in thread
From: Christoph Lameter @ 2009-04-17 17:45 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Peter Zijlstra, linux-kernel

Ok I made some changes to the latencytest program (gentwo.org/ll). I hope
you can tell me what exactly you would consider meaningful numbers.

Here is a sample execution with a granularity of 1usec and a bucket size
of 1usec and the minimum set to zero (2.6.29 HZ=1000):

latencytest -d -m0 -b1000

CPUs: Freq=3.16Ghz Processors=8 Cores=4 cacheline_size=64 Intel(R) Xeon(R) CPU           X5460  @ 3.16GHz
522 involuntary context switches
207231729 (100.00000000%) variances in 10.00 seconds: minimum 0.04us
maximum 109.58us average 0.05us
usecs percent samples
-------------------------
 0.0   100.0 207221180
 2.0     0.0       35
 3.0     0.0     3691
 4.0     0.0     4637
 5.0     0.0      891
 6.0     0.0      464
 7.0     0.0       90
 8.0     0.0       83
 9.0     0.0       44
10.0     0.0       42
11.0     0.0       42
12.0     0.0      171
13.0     0.0      150
14.0     0.0       44
15.0     0.0       19
16.0     0.0       23
17.0     0.0       16
18.0     0.0        9
19.0     0.0       15
20.0     0.0        2
21.0     0.0        6
22.0     0.0        4
23.0     0.0        4
24.0     0.0       13
25.0     0.0        2
27.0     0.0        2
29.0     0.0        2
30.0     0.0        3
31.0     0.0        1
32.0     0.0        3
33.0     0.0        2
34.0     0.0        1
36.0     0.0        2
37.0     0.0        1
38.0     0.0        1
39.0     0.0        7
40.0     0.0        7
41.0     0.0        9
42.0     0.0        3
43.0     0.0        1
44.0     0.0        1
46.0     0.0        1
49.0     0.0        1
50.0     0.0        1
57.0     0.0        1
61.0     0.0        1
109.0     0.0        1

Setting the minimum to 1usec yields:

christoph@JTCHITW00139:~/lldiag-0.2$ ./latencytest -d -m1000 -b1000
CPUs: Freq=3.16Ghz Processors=8 Cores=4 cacheline_size=64 Intel(R) Xeon(R) CPU           X5460  @ 3.16GHz
247597652 samples below 1000 nsec
28 involuntary context switches
10011 (0.00404309%) variances in 10.00 seconds: minimum 3.20us maximum 336.82us average 5.35us
usecs percent samples
-------------------------
 3.0     6.4      645
 4.0    28.6     2866
 5.0    57.7     5781
 6.0     4.5      448
 7.0     1.1      113
 8.0     0.4       39
 9.0     0.6       59
10.0     0.2       19
11.0     0.0        1
12.0     0.0        2
13.0     0.0        4
14.0     0.0        1
15.0     0.0        1
16.0     0.0        2
20.0     0.0        3
26.0     0.0        1
28.0     0.0        1
30.0     0.0        1
37.0     0.0        1
38.0     0.0        1
43.0     0.0        1
45.0     0.0        3
46.0     0.0        1
47.0     0.0        2
48.0     0.0        1
53.0     0.0        1
55.0     0.0        1
56.0     0.0        4
57.0     0.0        4
61.0     0.0        1
76.0     0.0        1
93.0     0.0        1
336.0     0.0        1

More fine-grained resolution (100ns buckets; measurements below 100ns are not counted):

latencytest -d -m100 -b100

CPUs: Freq=3.16Ghz Processors=8 Cores=4 cacheline_size=64 Intel(R) Xeon(R) CPU           X5460  @ 3.16GHz
266619352 samples below 100 nsec
19 involuntary context switches
13254 (0.00497088%) variances in 10.00 seconds: minimum 0.10us maximum
143.63us average 3.63us
usecs percent samples
-------------------------
 0.1    23.7     3141
 0.2     0.7       87
 0.3     0.1       12
 0.5     0.0        1
 0.6     0.0        1
 1.9     0.1       17
 2.0     0.4       47
 2.1     1.0      127
 2.2     2.3      309
 2.3     0.3       35
 2.4     0.2       25
 2.5     0.2       22
 2.6     0.2       23
 2.7     2.0      267
 2.8     4.2      559
 2.9     0.4       52
 3.0     1.1      143
 3.1     1.5      198
 3.2     1.1      149
 3.3     0.7       97
 3.4     0.6       78
 3.5     0.6       83
 3.6     0.9      121
 3.7     0.8      112
 3.8     4.1      547
 3.9     1.5      194
 4.0     0.3       42
 4.1     0.1       15
 4.2     0.2       23
 4.3     0.8      108
 4.4     0.5       63
 4.5     0.2       20
 4.6     0.2       25
 4.7     0.2       28
 4.8     0.3       44
 4.9     0.2       24
 5.0     1.0      126
 5.1     3.1      414
 5.2    18.0     2385
 5.3     6.5      857
 5.4     4.9      656
 5.5     2.5      335
 5.6     3.4      448
 5.7     3.4      445
 5.8     2.0      268
 5.9     0.9      116
 6.0     0.4       54
 6.1     0.2       22
 6.2     0.3       36
 6.3     0.1       17
 6.4     0.1       10
 6.5     0.1       11
 6.6     0.1        9
 6.7     0.1        9
 6.8     0.1        8
 6.9     0.0        4
 7.0     0.1        8
 7.1     0.0        6
 7.2     0.1        7
 7.3     0.1       14
 7.4     0.1        7
 7.5     0.0        4
 7.6     0.0        3
 7.7     0.0        6
 7.8     0.1        7
 7.9     0.0        6
 8.0     0.0        5
 8.1     0.0        4
 8.2     0.0        3
 8.3     0.0        2
 8.4     0.0        2

The group around 5usec is likely the timer interrupt.


* Re: Scheduler regression: Too frequent timer interrupts(?)
  2009-04-17 17:45                     ` Christoph Lameter
@ 2009-04-17 18:11                       ` Peter Zijlstra
  2009-04-17 18:20                         ` Christoph Lameter
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2009-04-17 18:11 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Ingo Molnar, linux-kernel

On Fri, 2009-04-17 at 13:45 -0400, Christoph Lameter wrote:
> Ok I made some changes to the latencytest program (gentwo.org/ll). I hope
> you can tell me what exactly you would consider meaningful numbers.
> 

> More fine grained resolution (100ns do not count measurements below 100ns):
> 
> latencytest -d -m100 -b100
> 
> CPUs: Freq=3.16Ghz Processors=8 Cores=4 cacheline_size=64 Intel(R) Xeon(R) CPU           X5460  @ 3.16GHz
> 266619352 samples below 100 nsec
> 19 involuntary context switches
> 13254 (0.00497088%) variances in 10.00 seconds: minimum 0.10us maximum
> 143.63us average 3.63us
> usecs percent samples
> -------------------------
>  0.1    23.7     3141
>  0.2     0.7       87
>  0.3     0.1       12
>  0.5     0.0        1
>  0.6     0.0        1
>  1.9     0.1       17
>  2.0     0.4       47
>  2.1     1.0      127
>  2.2     2.3      309
>  2.3     0.3       35
>  2.4     0.2       25
>  2.5     0.2       22
>  2.6     0.2       23
>  2.7     2.0      267
>  2.8     4.2      559
>  2.9     0.4       52
>  3.0     1.1      143
>  3.1     1.5      198
>  3.2     1.1      149
>  3.3     0.7       97
>  3.4     0.6       78
>  3.5     0.6       83
>  3.6     0.9      121
>  3.7     0.8      112
>  3.8     4.1      547
>  3.9     1.5      194
>  4.0     0.3       42
>  4.1     0.1       15
>  4.2     0.2       23
>  4.3     0.8      108
>  4.4     0.5       63
>  4.5     0.2       20
>  4.6     0.2       25
>  4.7     0.2       28
>  4.8     0.3       44
>  4.9     0.2       24
>  5.0     1.0      126
>  5.1     3.1      414
>  5.2    18.0     2385
>  5.3     6.5      857
>  5.4     4.9      656
>  5.5     2.5      335
>  5.6     3.4      448
>  5.7     3.4      445
>  5.8     2.0      268
>  5.9     0.9      116
>  6.0     0.4       54
>  6.1     0.2       22
>  6.2     0.3       36
>  6.3     0.1       17
>  6.4     0.1       10
>  6.5     0.1       11
>  6.6     0.1        9
>  6.7     0.1        9
>  6.8     0.1        8
>  6.9     0.0        4
>  7.0     0.1        8
>  7.1     0.0        6
>  7.2     0.1        7
>  7.3     0.1       14
>  7.4     0.1        7
>  7.5     0.0        4
>  7.6     0.0        3
>  7.7     0.0        6
>  7.8     0.1        7
>  7.9     0.0        6
>  8.0     0.0        5
>  8.1     0.0        4
>  8.2     0.0        3
>  8.3     0.0        2
>  8.4     0.0        2
> 
> The group around 5usec is likely the timer interrupt.

Right, this is nice. 

There are a number of peaks (aside from 0), 2.2, 2.8, 3.1, 3.8, 4.3 and
5.2 -- and I suspect your guess about 5.2 is right.

Something like this is nice to compare between kernels. Chris'
suggestion of timing a simple fixed loop:

$ time (let i=1000000; while [ $i -gt 0 ]; do let i--; done)

real    0m14.389s
user    0m13.787s
sys     0m0.498s

is also useful, since it gives an absolute measure of time available to
user-space.

Although I suspect a simple C while(i--); might be better due to less
code involved.




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Scheduler regression: Too frequent timer interrupts(?)
  2009-04-17 18:11                       ` Peter Zijlstra
@ 2009-04-17 18:20                         ` Christoph Lameter
  2009-04-17 18:58                           ` Peter Zijlstra
  2009-04-17 23:24                           ` Chris Friesen
  0 siblings, 2 replies; 27+ messages in thread
From: Christoph Lameter @ 2009-04-17 18:20 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, linux-kernel

On Fri, 17 Apr 2009, Peter Zijlstra wrote:

> Something like this is nice to compare between kernels. Chris'
> suggestion of timing a simple fixed loop:
>
> $ time (let i=1000000; while [ $i -gt 0 ]; do let i--; done)
>
> real    0m14.389s
> user    0m13.787s
> sys     0m0.498s
>
> is also useful, since it gives an absolute measure of time available to
> user-space.
>
> Although I suspect a simple C while(i--); might be better due to less
> code involved.

The absolute time available to user space is not that important. It is
important that the processor stays available during latency critical
operations and is not taken away by the OS. The lengths of the intervals
for which the OS takes the processor away determine the minimum latency
with which the application can react to events (e.g. RDMA transfers via
InfiniBand,
or operations on requests coming in via shared memory). These operations
often must occur in parallel on multiple cores. Processing is delayed if
any of the cores encounters a delay due to OS noise.

The latencytest code simulates a busy processor (no system calls, all
memory is prefaulted). For some reason Linux is increasingly taking time
away from such processes (that intentionally run uncontended on a
dedicated processor). This causes regressions so that current upstream is
not usable for these applications.

It would be best for these applications if the processor were left
undisturbed. There is likely not much that the OS needs to do on a busy
processor if there are no competing threads and if there is no I/O taking
place.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Scheduler regression: Too frequent timer interrupts(?)
  2009-04-17 18:20                         ` Christoph Lameter
@ 2009-04-17 18:58                           ` Peter Zijlstra
  2009-04-17 20:34                             ` Christoph Lameter
  2009-04-17 23:24                           ` Chris Friesen
  1 sibling, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2009-04-17 18:58 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Ingo Molnar, linux-kernel

On Fri, 2009-04-17 at 14:20 -0400, Christoph Lameter wrote:
> On Fri, 17 Apr 2009, Peter Zijlstra wrote:
> 
> > Something like this is nice to compare between kernels. Chris'
> > suggestion of timing a simple fixed loop:
> >
> > $ time (let i=1000000; while [ $i -gt 0 ]; do let i--; done)
> >
> > real    0m14.389s
> > user    0m13.787s
> > sys     0m0.498s
> >
> > is also useful, since it gives an absolute measure of time available to
> > user-space.
> >
> > Although I suspect a simple C while(i--); might be better due to less
> > code involved.
> 
> The absolute time available to user space is not that important. It is
> important that the processor stays available during latency critical
> operations and is not taken away by the OS. The intervals that the OS
> takes the processor away determine the minimum interval that the
> application has to react to events (e.g. RDMA transfers via InfiniBand,
> or operations on requests coming in via shared memory). These operations
> often must occur in parallel on multiple cores. Processing is delayed if
> any of the cores encounters a delay due to OS noise.

So you have hard deadlines in the order of us? Are these SCHED_FIFO
tasks or SCHED_OTHER?

> The latencytest code simulates a busy processor (no system calls, all
> memory is prefaulted). For some reason Linux is increasingly taking time
> away from such processes (that intentionally run uncontended on a
> dedicated processor). This causes regressions so that current upstream is
> not usable for these applications.
> 
> It would be best for these applications if the processor would be left
> undisturbed. There is likely not much that the OS needs to do on a busy
> processor if there are no competing threads and if there is no I/O taking
> place.

Agreed -- that would be nice.

I can't really match the pattern to anything. The one thing that worried
me was sched_clock_tick(), which does a GTOD to sync the clocks between
cpus.

Your Xeon is a core2 class machine and should have a relatively stable
TSC; however, it's also dual socket, which I think defeats the
stableness.

What clocksource do you have?

cat /sys/devices/system/clocksource/clocksource0/current_clocksource

Thing is, that doesn't really match .23 being expensive and .25 not.

Also, looking over the rest of the scheduler tick code, I can't really
see what would be so expensive.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Scheduler regression: Too frequent timer interrupts(?)
  2009-04-17 18:58                           ` Peter Zijlstra
@ 2009-04-17 20:34                             ` Christoph Lameter
  2009-04-17 20:53                               ` Peter Zijlstra
  0 siblings, 1 reply; 27+ messages in thread
From: Christoph Lameter @ 2009-04-17 20:34 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, linux-kernel

On Fri, 17 Apr 2009, Peter Zijlstra wrote:

> > often must occur in parallel on multiple cores. Processing is delayed if
> > any of the cores encounters a delay due to OS noise.
>
> So you have hard deadlines in the order of us? Are these SCHED_FIFO
> tasks or SCHED_OTHER?

SCHED_FIFO has the effect of removing all the involuntary context switches,
but it does not affect the other interruptions.

> Your Xeon is a core2 class machine and should have a relatively stable
> TSC; however, it's also dual socket, which I think defeats the
> stableness.

> What clocksource do you have?
>
> cat /sys/devices/system/clocksource/clocksource0/current_clocksource

cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc

> Also, looking over the rest of the scheduler tick code, I can't really
> see what would be so expensive.

The expensiveness may be fine if we can limit the number of occurrences.
Maybe the histograms for those releases give more insight.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Scheduler regression: Too frequent timer interrupts(?)
  2009-04-17 20:34                             ` Christoph Lameter
@ 2009-04-17 20:53                               ` Peter Zijlstra
  0 siblings, 0 replies; 27+ messages in thread
From: Peter Zijlstra @ 2009-04-17 20:53 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Ingo Molnar, linux-kernel

On Fri, 2009-04-17 at 16:34 -0400, Christoph Lameter wrote:
> On Fri, 17 Apr 2009, Peter Zijlstra wrote:
> 
> > > often must occur in parallel on multiple cores. Processing is delayed if
> > > any of the cores encounters a delay due to OS noise.
> >
> > So you have hard deadlines in the order of us? Are these SCHED_FIFO
> > tasks or SCHED_OTHER?
> 
> SCHED_FIFO has the effect of removing all the involuntary context switches,
> but it does not affect the other interruptions.

OK, that's good to know.

> > Your Xeon is a core2 class machine and should have a relatively stable
> > TSC; however, it's also dual socket, which I think defeats the
> > stableness.
> 
> > What clocksource do you have?
> >
> > cat /sys/devices/system/clocksource/clocksource0/current_clocksource
> 
> cat /sys/devices/system/clocksource/clocksource0/current_clocksource
> tsc

Ah, good. I could measure a significant difference on my testbox between
tsc and acpi_pm.

> > Also, looking over the rest of the scheduler tick code, I can't really
> > see what would be so expensive.
> 
> The expensiveness may be fine if we can limit the number of occurrences.
> Maybe the histograms for those releases give more insight.

Yeah, curious to see what .22 looks like -- readprofile/oprofile runs of
the kernel for those might be interesting as well.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Scheduler regression: Too frequent timer interrupts(?)
  2009-04-17 18:20                         ` Christoph Lameter
  2009-04-17 18:58                           ` Peter Zijlstra
@ 2009-04-17 23:24                           ` Chris Friesen
  2009-04-18  7:35                             ` Ingo Molnar
  2009-04-18  7:59                             ` Andi Kleen
  1 sibling, 2 replies; 27+ messages in thread
From: Chris Friesen @ 2009-04-17 23:24 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Peter Zijlstra, Ingo Molnar, linux-kernel

Christoph Lameter wrote:

> The latencytest code simulates a busy processor (no system calls, all
> memory is prefaulted). For some reason Linux is increasingly taking time
> away from such processes (that intentionally run uncontended on a
> dedicated processor). This causes regressions so that current upstream is
> not usable for these applications.
> 
> It would be best for these applications if the processor would be left
> undisturbed. There is likely not much that the OS needs to do on a busy
> processor if there are no competing threads and if there is no I/O taking
> place.

Peter/Ingo, could ftrace be used to determine where time is being spent 
in the kernel with suitable accuracy? (This may be a dumb question, I 
haven't played with ftrace much.)

Given that we're talking about tens of usecs of duration, statistical 
sampling may not work all that well.

Chris

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Scheduler regression: Too frequent timer interrupts(?)
  2009-04-17 23:24                           ` Chris Friesen
@ 2009-04-18  7:35                             ` Ingo Molnar
  2009-04-18  7:59                             ` Andi Kleen
  1 sibling, 0 replies; 27+ messages in thread
From: Ingo Molnar @ 2009-04-18  7:35 UTC (permalink / raw)
  To: Chris Friesen; +Cc: Christoph Lameter, Peter Zijlstra, linux-kernel


* Chris Friesen <cfriesen@nortel.com> wrote:

> Christoph Lameter wrote:
>
>> The latencytest code simulates a busy processor (no system calls, all
>> memory is prefaulted). For some reason Linux is increasingly taking time
>> away from such processes (that intentionally run uncontended on a
>> dedicated processor). This causes regressions so that current upstream is
>> not usable for these applications.
>>
>> It would be best for these applications if the processor would be 
>> left undisturbed. There is likely not much that the OS needs to 
>> do on a busy processor if there are no competing threads and if 
>> there is no I/O taking place.
>
> Peter/Ingo, could ftrace be used to determine where time is being 
> spent in the kernel with suitable accuracy?  (This may be a dumb 
> question, I haven't played with ftrace much.)

Yes, both ftrace and perfcounters can be used to directly trace or 
sample any overhead/delay - no matter how sporadic it might be.

> Given that we're talking about tens of usecs of duration, 
> statistical sampling may not work all that well.

Peter did rather convincing perfcounters profiles. Of course, if 
only maximum latencies are concerned, tracing works better. But 
mainline is pretty inadequate for that purpose anyway - there's a 
whole PREEMPT_RT world with many stories to tell there ;-)

	Ingo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Scheduler regression: Too frequent timer interrupts(?)
  2009-04-17 23:24                           ` Chris Friesen
  2009-04-18  7:35                             ` Ingo Molnar
@ 2009-04-18  7:59                             ` Andi Kleen
  1 sibling, 0 replies; 27+ messages in thread
From: Andi Kleen @ 2009-04-18  7:59 UTC (permalink / raw)
  To: Chris Friesen
  Cc: Christoph Lameter, Peter Zijlstra, Ingo Molnar, linux-kernel

"Chris Friesen" <cfriesen@nortel.com> writes:
>
> Given that we're talking about tens of usecs of duration, statistical
> sampling may not work all that well.

He could just use oprofile with a small period. The minimum period for
CPU_CLK_UNHALTED is 6000 (and it could probably be lowered by tweaking
the events). oprofile counters on x86 can be configured to only count
in ring 0, so you would get an entry every 6000 ring-0 cycles. That
should give a pretty good picture of where the kernel is spending time.

-Andi


-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Scheduler regression: Too frequent timer interrupts(?)
  2009-04-17 15:28             ` Peter Zijlstra
@ 2009-04-23  4:42               ` Pavel Machek
  2009-04-28 21:02                 ` Christoph Lameter
  0 siblings, 1 reply; 27+ messages in thread
From: Pavel Machek @ 2009-04-23  4:42 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Christoph Lameter, Ingo Molnar, linux-kernel

On Fri 2009-04-17 17:28:57, Peter Zijlstra wrote:
> On Fri, 2009-04-17 at 11:04 -0400, Christoph Lameter wrote:
> > On Fri, 17 Apr 2009, Peter Zijlstra wrote:
> > 
> > > And a random 1us cutoff, is well, random.
> > 
> > Its got to be somewhere.
> 
> No it doesn't, a cutoff is useless. If I steal cputime at units below
> the cutoff I can, in the limit, steal up to 100% cpu time and you'd not
> notice.

Are you sure cutoff is useless?

Irq entry/exit is sure to take > 1usec, right?

> > > So 5 <1us interruption are not at all accounted, whereas a single 1>us
> > > interruption is. I'd rather get rid of those 5 than try and shave a bit
> > > of the one, if you get what I mean.
> > 
> > Ok. We can set the threshold lower and see how that goes.
> 
> Pointless. See above.

I'd say that taking an interrupt and returning in below 1000 nsec is
impossible, even on very fast hw.

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Scheduler regression: Too frequent timer interrupts(?)
  2009-04-23  4:42               ` Pavel Machek
@ 2009-04-28 21:02                 ` Christoph Lameter
  2009-04-28 21:23                   ` Peter Zijlstra
  0 siblings, 1 reply; 27+ messages in thread
From: Christoph Lameter @ 2009-04-28 21:02 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Peter Zijlstra, Ingo Molnar, linux-kernel

On Thu, 23 Apr 2009, Pavel Machek wrote:

> I'd say that taking an interrupt and returning in below 1000 nsec is
> impossible, even on very fast hw.

Hmmm.... Could it be that CONFIG_NO_HZ caused the timer interrupt to be
avoided in earlier releases? The NO_HZ theory, at least, is to only schedule
interrupts when necessary. So an interrupt is needed at the end of the
scheduling interval and not at HZ frequency.



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Scheduler regression: Too frequent timer interrupts(?)
  2009-04-28 21:23                   ` Peter Zijlstra
@ 2009-04-28 21:21                     ` Christoph Lameter
  0 siblings, 0 replies; 27+ messages in thread
From: Christoph Lameter @ 2009-04-28 21:21 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Pavel Machek, Ingo Molnar, linux-kernel

On Tue, 28 Apr 2009, Peter Zijlstra wrote:

> On Tue, 2009-04-28 at 17:02 -0400, Christoph Lameter wrote:
> > On Thu, 23 Apr 2009, Pavel Machek wrote:
> >
> > I'd say that taking an interrupt and returning in below 1000 nsec is
> > impossible, even on very fast hw.
> >
> > Hmmm.... Could it be that CONFIG_NO_HZ caused the timer interrupt to be
> > avoided in earlier releases? NO_HZ theory at least is to only schedule
> interrupts when necessary. So an interrupt is needed at the end of the
> > scheduling intervals and not at HZ frequency.
>
> No, NOHZ only works on idle cpus.

It is advertised to do more:

config NO_HZ
        bool "Tickless System (Dynamic Ticks)"
        depends on GENERIC_TIME && GENERIC_CLOCKEVENTS
        select TICK_ONESHOT
        help
          This option enables a tickless system: timer interrupts will
          only trigger on an as-needed basis both when the system is
          busy and when the system is idle.


So NO_HZ is really IDLE_NO_HZ. The system is not tickless.
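[Editor's sketch] One way to observe how many timer ticks a CPU actually receives is to sample the local APIC timer row of /proc/interrupts one second apart. This is a rough sketch; the "LOC" label is x86-specific:

```shell
#!/bin/sh
# Sample the local-timer interrupt counts twice, one second apart.
# On a genuinely tickless kernel a busy but uncontended CPU would
# accumulate far fewer than HZ of these per second; with idle-only
# NO_HZ (as described above) it stays at roughly HZ.
before=$(grep 'LOC:' /proc/interrupts)
sleep 1
after=$(grep 'LOC:' /proc/interrupts)
echo "before: $before"
echo "after:  $after"
```

Subtracting the per-CPU columns gives the effective tick rate; with HZ=250 and idle-only NO_HZ, a busy CPU should show about 250 per second.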



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Scheduler regression: Too frequent timer interrupts(?)
  2009-04-28 21:02                 ` Christoph Lameter
@ 2009-04-28 21:23                   ` Peter Zijlstra
  2009-04-28 21:21                     ` Christoph Lameter
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2009-04-28 21:23 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Pavel Machek, Ingo Molnar, linux-kernel

On Tue, 2009-04-28 at 17:02 -0400, Christoph Lameter wrote:
> On Thu, 23 Apr 2009, Pavel Machek wrote:
> 
> > I'd say that taking an interrupt and returning in below 1000 nsec is
> > impossible, even on very fast hw.
> 
> Hmmm.... Could it be that CONFIG_NO_HZ caused the timer interrupt to be
> avoided in earlier releases? NO_HZ theory at least is to only schedule
> interrupts when necessary. So an interrupt is needed at the end of the
> scheduling intervals and not at HZ frequency.

No, NOHZ only works on idle cpus.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2009-04-28 21:30 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-04-16 19:53 Scheduler regression: Too frequent timer interrupts(?) Christoph Lameter
2009-04-17  7:00 ` Peter Zijlstra
2009-04-17 13:42   ` Christoph Lameter
2009-04-17 14:16     ` Peter Zijlstra
2009-04-17 14:29       ` Christoph Lameter
2009-04-17 14:51         ` Peter Zijlstra
2009-04-17 15:04           ` Christoph Lameter
2009-04-17 15:28             ` Peter Zijlstra
2009-04-23  4:42               ` Pavel Machek
2009-04-28 21:02                 ` Christoph Lameter
2009-04-28 21:23                   ` Peter Zijlstra
2009-04-28 21:21                     ` Christoph Lameter
2009-04-17 15:35             ` Ingo Molnar
2009-04-17 15:55               ` Christoph Lameter
2009-04-17 16:23                 ` Peter Zijlstra
2009-04-17 16:33                   ` Christoph Lameter
2009-04-17 16:49                   ` Ingo Molnar
2009-04-17 17:19                     ` Chris Friesen
2009-04-17 17:45                     ` Christoph Lameter
2009-04-17 18:11                       ` Peter Zijlstra
2009-04-17 18:20                         ` Christoph Lameter
2009-04-17 18:58                           ` Peter Zijlstra
2009-04-17 20:34                             ` Christoph Lameter
2009-04-17 20:53                               ` Peter Zijlstra
2009-04-17 23:24                           ` Chris Friesen
2009-04-18  7:35                             ` Ingo Molnar
2009-04-18  7:59                             ` Andi Kleen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox