public inbox for linux-kernel@vger.kernel.org
* 2.6.14-rt13
@ 2005-11-15  9:08 Ingo Molnar
  2005-11-15 16:36 ` 2.6.14-rt13 Mark Knecht
                   ` (3 more replies)
  0 siblings, 4 replies; 56+ messages in thread
From: Ingo Molnar @ 2005-11-15  9:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: Paul E. McKenney, K.R. Foley, Steven Rostedt, Thomas Gleixner,
	pluto, john cooper, Benedikt Spranger, Daniel Walker, Tom Rini,
	George Anzinger

i have released the 2.6.14-rt13 tree, which can be downloaded from the 
usual place:

   http://redhat.com/~mingo/realtime-preempt/

lots of fixes in this release affecting all supported architectures, all 
across the board. Big MIPS update from John Cooper.

Changes since 2.6.14-rt1:

 - lots of RCU fixes and updates in signal handling and related areas
   (Paul E. McKenney)

 - big RCU torture-test update (Paul E. McKenney)

 - fix netfilter/conntrack crash reported by Paweł Sikora

 - big MIPS update (John Cooper)

 - ARM updates (Daniel Walker)

 - PPC updates (Benedikt Spranger)

 - ktimers rounding fix (Thomas Gleixner)

 - off by one fix in timespec normalization (George Anzinger)

 - lpptest Kconfig dependency fix (Tom Rini)

 - clean up get_cpu_tick() -> get_cycles() in blocker, lpptest and 
   latency.c. (Tom Rini)

 - fix ppc32 bootwrapper code for new zlib (Tom Rini)

 - rtc histogram fixes merged for real :-) (K.R. Foley)

 - fix NMI watchdog false positive (Steven Rostedt, me)

 - added the nsleep() kernel API, which uses high-resolution sleeps

 - build fix on !PREEMPT_RT

 - cleanup of the PER_CPU_LOCKED infrastructure

 - fix softlockup false positives triggered by the RCU torture-test.

 - do not send a false -ERESTART_RESTARTBLOCK to userspace if the
   HRT timer hardware wakes us up early.

to build a 2.6.14-rt13 tree, the following patches should be applied:

  http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.14.tar.bz2
  http://redhat.com/~mingo/realtime-preempt/patch-2.6.14-rt13

	Ingo

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: 2.6.14-rt13
  2005-11-15  9:08 2.6.14-rt13 Ingo Molnar
@ 2005-11-15 16:36 ` Mark Knecht
  2005-11-15 19:57   ` 2.6.14-rt13 Paul E. McKenney
  2005-11-16  3:48 ` 2.6.14-rt13 K.R. Foley
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 56+ messages in thread
From: Mark Knecht @ 2005-11-15 16:36 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

On 11/15/05, Ingo Molnar <mingo@elte.hu> wrote:
> i have released the 2.6.14-rt13 tree, which can be downloaded from the
> usual place:
>
>    http://redhat.com/~mingo/realtime-preempt/
>
> lots of fixes in this release affecting all supported architectures, all
> across the board. Big MIPS update from John Cooper.
<SNIP>

2.6.14-rt13 is up and running here. Everything looks fine in the first
couple of hours. Nothing negative to report.

Please let me know if there are any particular features that you'd
like me to look at on an AMD64 machine.

Cheers,
Mark


* Re: 2.6.14-rt13
  2005-11-15 16:36 ` 2.6.14-rt13 Mark Knecht
@ 2005-11-15 19:57   ` Paul E. McKenney
  0 siblings, 0 replies; 56+ messages in thread
From: Paul E. McKenney @ 2005-11-15 19:57 UTC (permalink / raw)
  To: Mark Knecht; +Cc: Ingo Molnar, linux-kernel

On Tue, Nov 15, 2005 at 08:36:40AM -0800, Mark Knecht wrote:
> On 11/15/05, Ingo Molnar <mingo@elte.hu> wrote:
> > i have released the 2.6.14-rt13 tree, which can be downloaded from the
> > usual place:
> >
> >    http://redhat.com/~mingo/realtime-preempt/
> >
> > lots of fixes in this release affecting all supported architectures, all
> > across the board. Big MIPS update from John Cooper.
> <SNIP>
> 
> 2.6.14-rt13 is up and running here. Everything looks fine in the first
> couple of hours. Nothing negative to report.

Ditto on an old x86 Netfinity box.

							Thanx, Paul


* Re: 2.6.14-rt13
  2005-11-15  9:08 2.6.14-rt13 Ingo Molnar
  2005-11-15 16:36 ` 2.6.14-rt13 Mark Knecht
@ 2005-11-16  3:48 ` K.R. Foley
  2005-11-16  8:40   ` 2.6.14-rt13 Ingo Molnar
  2005-11-18 18:02 ` 2.6.14-rt13 Fernando Lopez-Lezcano
  2005-11-21 21:32 ` 2.6.14-rt13 Fernando Lopez-Lezcano
  3 siblings, 1 reply; 56+ messages in thread
From: K.R. Foley @ 2005-11-16  3:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Paul E. McKenney, Steven Rostedt, Thomas Gleixner,
	pluto, john cooper, Benedikt Spranger, Daniel Walker, Tom Rini,
	George Anzinger

Ingo Molnar wrote:
> i have released the 2.6.14-rt13 tree, which can be downloaded from the 
> usual place:
> 
>    http://redhat.com/~mingo/realtime-preempt/
> 
> lots of fixes in this release affecting all supported architectures, all 
> across the board. Big MIPS update from John Cooper.
> 
> Changes since 2.6.14-rt1:
> 
>  - lots of RCU fixes and updates in signal handling and related areas
>    (Paul E. McKenney)
> 
>  - big RCU torture-test update (Paul E. McKenney)
>

In case anyone else makes the same mistake I did: if you are using the
same config from a previous build, you may have RCU_TORTURE_TEST=y (built
in, not a module) and not even know it when running RT patches. You will,
however, definitely notice it if you use the config to build a non-RT
kernel like 2.6.15-rc1. The previous RT patch defaulted RCU_TORTURE_TEST=y. By the
way, the fact that I didn't even notice that the torture test was
running with the RT kernel is a true measure of how well things have
progressed. :-)

-- 
   kr


* Re: 2.6.14-rt13
  2005-11-16  3:48 ` 2.6.14-rt13 K.R. Foley
@ 2005-11-16  8:40   ` Ingo Molnar
  2005-11-16 17:02     ` 2.6.14-rt13 Paul E. McKenney
  0 siblings, 1 reply; 56+ messages in thread
From: Ingo Molnar @ 2005-11-16  8:40 UTC (permalink / raw)
  To: K.R. Foley
  Cc: linux-kernel, Paul E. McKenney, Steven Rostedt, Thomas Gleixner,
	pluto, john cooper, Benedikt Spranger, Daniel Walker, Tom Rini,
	George Anzinger


* K.R. Foley <kr@cybsft.com> wrote:

> >  - big RCU torture-test update (Paul E. McKenney)
> 
> In case anyone else makes the same mistake I did. If you are using the 
> same config from a previous build, you may have RCU_TORTURE_TEST=Y 
> (not module) and not even know it when running RT patches. You will 
> however definitely notice it if you use the config to build a non RT 
> kernel like 2.6.15-rc1. The previous RT patch defaulted 
> RCU_TORTURE_TEST=y. By the way, the fact that I didn't even notice 
> that the torture test was running with the RT kernel is a true measure 
> of how well things have progressed. :-)

yeah - i left it on by default, i usually do that with new debugging 
features, to give new code more exposure. In other words, mass 
distributed RCU stress-testing by stealth ;-)

I'll make it default-off once the RCU related changes have calmed down.  
The rcutorture kernel threads run at nice +19 so they should be barely 
noticeable. (except for a sudden and unexplained spike in the world's 
power consumption, and the resulting energy crisis ;-)

	Ingo


* Re: 2.6.14-rt13
  2005-11-16  8:40   ` 2.6.14-rt13 Ingo Molnar
@ 2005-11-16 17:02     ` Paul E. McKenney
  0 siblings, 0 replies; 56+ messages in thread
From: Paul E. McKenney @ 2005-11-16 17:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: K.R. Foley, linux-kernel, Steven Rostedt, Thomas Gleixner, pluto,
	john cooper, Benedikt Spranger, Daniel Walker, Tom Rini,
	George Anzinger

On Wed, Nov 16, 2005 at 09:40:37AM +0100, Ingo Molnar wrote:
> 
> * K.R. Foley <kr@cybsft.com> wrote:
> 
> > >  - big RCU torture-test update (Paul E. McKenney)
> > 
> > In case anyone else makes the same mistake I did. If you are using the 
> > same config from a previous build, you may have RCU_TORTURE_TEST=Y 
> > (not module) and not even know it when running RT patches. You will 
> > however definitely notice it if you use the config to build a non RT 
> > kernel like 2.6.15-rc1. The previous RT patch defaulted 
> > RCU_TORTURE_TEST=y. By the way, the fact that I didn't even notice 
> > that the torture test was running with the RT kernel is a true measure 
> > of how well things have progressed. :-)
> 
> yeah - i left it on by default, i usually do that with new debugging 
> features, to give new code more exposure. In other words, mass 
> distributed RCU stress-testing by stealth ;-)

Cool!!!  If anyone sees a printk line starting with "rcutorture:" 
that includes the string "!!!", please pass it along accompanied by
your config and what your workload was doing at the time.

						Thanx, Paul

> I'll make it default-off once the RCU related changes have calmed down.  
> The rcutorture kernel threads run at nice +19 so they should be barely 
> noticeable. (except for a sudden and unexplained spike in the world's 
> power consumption, and the resulting energy crisis ;-)
> 
> 	Ingo
> 


* Re: 2.6.14-rt13
  2005-11-15  9:08 2.6.14-rt13 Ingo Molnar
  2005-11-15 16:36 ` 2.6.14-rt13 Mark Knecht
  2005-11-16  3:48 ` 2.6.14-rt13 K.R. Foley
@ 2005-11-18 18:02 ` Fernando Lopez-Lezcano
  2005-11-18 21:54   ` 2.6.14-rt13 Lee Revell
  2005-11-21 21:32 ` 2.6.14-rt13 Fernando Lopez-Lezcano
  3 siblings, 1 reply; 56+ messages in thread
From: Fernando Lopez-Lezcano @ 2005-11-18 18:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: nando, linux-kernel, Paul E. McKenney, K.R. Foley, Steven Rostedt,
	Thomas Gleixner, pluto, john cooper, Benedikt Spranger,
	Daniel Walker, Tom Rini, George Anzinger

On Tue, 2005-11-15 at 10:08 +0100, Ingo Molnar wrote:
> i have released the 2.6.14-rt13 tree, which can be downloaded from the 
> usual place:
> 
>    http://redhat.com/~mingo/realtime-preempt/
> 
> lots of fixes in this release affecting all supported architectures, all 
> across the board. Big MIPS update from John Cooper.

Hi Ingo, I'm back from the trip and built -rt13 to test on my dual core
Athlons. As I emailed you yesterday off the list it looked good, but I
guess it took longer than usual for things to degrade. This morning I'm
seeing the usual warnings from Jack. And, for the first time in a while,
actual xruns. I'll try your suggestion of booting with idle=poll. 

[begin speculation]
You mentioned before that the TSCs of the two cpus could drift from each
other over time. Assuming that is the source of timing (I have no idea),
that could explain the behavior of Jack: it gets a reference time from
one of the cpus and then compares that with what it gets from either cpu,
depending on where it is running at a given time. If it is the same cpu
all is fine; if it is the other and it has drifted, then the warning is
printed. 

-- Fernando




* Re: 2.6.14-rt13
  2005-11-18 18:02 ` 2.6.14-rt13 Fernando Lopez-Lezcano
@ 2005-11-18 21:54   ` Lee Revell
  2005-11-18 22:05     ` 2.6.14-rt13 Fernando Lopez-Lezcano
  0 siblings, 1 reply; 56+ messages in thread
From: Lee Revell @ 2005-11-18 21:54 UTC (permalink / raw)
  To: Fernando Lopez-Lezcano
  Cc: Ingo Molnar, linux-kernel, Paul E. McKenney, K.R. Foley,
	Steven Rostedt, Thomas Gleixner, pluto, john cooper,
	Benedikt Spranger, Daniel Walker, Tom Rini, George Anzinger

On Fri, 2005-11-18 at 10:02 -0800, Fernando Lopez-Lezcano wrote:
> You mentioned before that the TSC's from both cpus could drift from
> each other over time. Assuming that is the source of timing (I have no
> idea) that could explain the behavior of Jack, it gets a reference
> time from one of the cpus and then compares that with what it gets
> from either cpu depending on where it is running at a given time. If
> it is the same cpu all is fine, if it is the other and it has drifted
> then the warning is printed.  

Yes, JACK uses rdtsc() for microsecond resolution timing and assumes
that the TSCs are in sync.

I've asked on this list what a better time source could be and didn't
get any useful responses, people just told me "use gettimeofday()" which
is WAY too slow.
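
For reference, the kind of TSC-based timing described above can be
sketched roughly as follows. This is not JACK's actual code: read_tsc()
and tsc_delta_to_usecs() are names made up for illustration, the TSC
frequency is assumed to be known and constant, and __rdtsc() is the
GCC/Clang intrinsic on x86. The result is only meaningful if both
readings come from the same cpu or the TSCs are in sync, which is
exactly the assumption that breaks here.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>

/* Read the time stamp counter of whichever cpu we happen to run on. */
static inline uint64_t read_tsc(void)
{
	return __rdtsc();
}
#endif

/*
 * Convert the distance between two TSC readings to microseconds.
 * tsc_hz is an assumed, constant TSC frequency in Hz (hypothetical
 * input; a real program would have to calibrate it).
 * The subtraction is unsigned, so a second reading that is *behind*
 * the first (drifted TSC on another cpu) wraps to a huge value.
 */
double tsc_delta_to_usecs(uint64_t start, uint64_t end, double tsc_hz)
{
	return (double)(end - start) * 1e6 / tsc_hz;
}
```

A caller would take read_tsc() before and after the work and pass both
values in, together with its calibrated frequency.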

Lee



* Re: 2.6.14-rt13
  2005-11-18 21:54   ` 2.6.14-rt13 Lee Revell
@ 2005-11-18 22:05     ` Fernando Lopez-Lezcano
  2005-11-18 22:07       ` 2.6.14-rt13 Ingo Molnar
  2005-11-18 22:13       ` 2.6.14-rt13 Lee Revell
  0 siblings, 2 replies; 56+ messages in thread
From: Fernando Lopez-Lezcano @ 2005-11-18 22:05 UTC (permalink / raw)
  To: Lee Revell
  Cc: nando, Ingo Molnar, linux-kernel, Paul E. McKenney, K.R. Foley,
	Steven Rostedt, Thomas Gleixner, pluto, john cooper,
	Benedikt Spranger, Daniel Walker, Tom Rini, George Anzinger

On Fri, 2005-11-18 at 16:54 -0500, Lee Revell wrote:
> On Fri, 2005-11-18 at 10:02 -0800, Fernando Lopez-Lezcano wrote:
> > You mentioned before that the TSC's from both cpus could drift from
> > each other over time. Assuming that is the source of timing (I have no
> > idea) that could explain the behavior of Jack, it gets a reference
> > time from one of the cpus and then compares that with what it gets
> > from either cpu depending on where it is running at a given time. If
> > it is the same cpu all is fine, if it is the other and it has drifted
> > then the warning is printed.  
> 
> Yes, JACK uses rdtsc() for microsecond resolution timing and assumes
> that the TSCs are in sync.
> 
> I've asked on this list what a better time source could be and didn't
> get any useful responses, people just told me "use gettimeofday()" which
> is WAY too slow.

Arghhh, at least I take this as a confirmation that the TSCs do drift
and there is no workaround. It currently makes the -rt/Jack combination
not very useful, at least in my tests. 

Is there a way to resync the TSCs?
-- Fernando




* Re: 2.6.14-rt13
  2005-11-18 22:05     ` 2.6.14-rt13 Fernando Lopez-Lezcano
@ 2005-11-18 22:07       ` Ingo Molnar
  2005-11-18 22:15         ` 2.6.14-rt13 Lee Revell
  2005-11-18 22:41         ` 2.6.14-rt13 Fernando Lopez-Lezcano
  2005-11-18 22:13       ` 2.6.14-rt13 Lee Revell
  1 sibling, 2 replies; 56+ messages in thread
From: Ingo Molnar @ 2005-11-18 22:07 UTC (permalink / raw)
  To: Fernando Lopez-Lezcano
  Cc: Lee Revell, linux-kernel, Paul E. McKenney, K.R. Foley,
	Steven Rostedt, Thomas Gleixner, pluto, john cooper,
	Benedikt Spranger, Daniel Walker, Tom Rini, George Anzinger


* Fernando Lopez-Lezcano <nando@ccrma.Stanford.EDU> wrote:

> Arghhh, at least I take this as a confirmation that the TSCs do drift 
> and there is no workaround. It currently makes the -rt/Jack 
> combination not very useful, at least in my tests.
> 
> Is there a way to resync the TSCs?

no reasonable way. Does idle=poll make any difference?

	Ingo


* Re: 2.6.14-rt13
  2005-11-18 22:05     ` 2.6.14-rt13 Fernando Lopez-Lezcano
  2005-11-18 22:07       ` 2.6.14-rt13 Ingo Molnar
@ 2005-11-18 22:13       ` Lee Revell
  2005-11-18 22:32         ` 2.6.14-rt13 Vojtech Pavlik
  1 sibling, 1 reply; 56+ messages in thread
From: Lee Revell @ 2005-11-18 22:13 UTC (permalink / raw)
  To: Fernando Lopez-Lezcano
  Cc: Ingo Molnar, linux-kernel, Paul E. McKenney, K.R. Foley,
	Steven Rostedt, Thomas Gleixner, pluto, john cooper,
	Benedikt Spranger, Daniel Walker, Tom Rini, George Anzinger

On Fri, 2005-11-18 at 14:05 -0800, Fernando Lopez-Lezcano wrote:
> On Fri, 2005-11-18 at 16:54 -0500, Lee Revell wrote:
> > On Fri, 2005-11-18 at 10:02 -0800, Fernando Lopez-Lezcano wrote:
> > > You mentioned before that the TSC's from both cpus could drift from
> > > each other over time. Assuming that is the source of timing (I have no
> > > idea) that could explain the behavior of Jack, it gets a reference
> > > time from one of the cpus and then compares that with what it gets
> > > from either cpu depending on where it is running at a given time. If
> > > it is the same cpu all is fine, if it is the other and it has drifted
> > > then the warning is printed.  
> > 
> > Yes, JACK uses rdtsc() for microsecond resolution timing and assumes
> > that the TSCs are in sync.
> > 
> > I've asked on this list what a better time source could be and didn't
> > get any useful responses, people just told me "use gettimeofday()" which
> > is WAY too slow.
> 
> Arghhh, at least I take this as a confirmation that the TSCs do drift
> and there is no workaround. It currently makes the -rt/Jack combination
> not very useful, at least in my tests. 
> 
> Is there a way to resync the TSCs?

I don't think so.  A better question is what mechanism have the hardware
vendors provided to replace the apparently-no-longer-reliable TSC for
cheap high res timing on modern machines.  Unfortunately I suspect the
answer at this point is "nothing, you're screwed".

I've read that gettimeofday() does not have to enter the kernel on
x86-64, maybe it's fast enough, though almost certainly orders of
magnitude slower than rdtsc().  It seems like a huge step backwards for
any apps with high res timing requirements.

Lee



* Re: 2.6.14-rt13
  2005-11-18 22:07       ` 2.6.14-rt13 Ingo Molnar
@ 2005-11-18 22:15         ` Lee Revell
  2005-11-18 22:25           ` 2.6.14-rt13 Steven Rostedt
  2005-11-18 22:41         ` 2.6.14-rt13 Fernando Lopez-Lezcano
  1 sibling, 1 reply; 56+ messages in thread
From: Lee Revell @ 2005-11-18 22:15 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Fernando Lopez-Lezcano, linux-kernel, Paul E. McKenney,
	K.R. Foley, Steven Rostedt, Thomas Gleixner, pluto, john cooper,
	Benedikt Spranger, Daniel Walker, Tom Rini, George Anzinger

On Fri, 2005-11-18 at 23:07 +0100, Ingo Molnar wrote:
> * Fernando Lopez-Lezcano <nando@ccrma.Stanford.EDU> wrote:
> 
> > Arghhh, at least I take this as a confirmation that the TSCs do drift 
> > and there is no workaround. It currently makes the -rt/Jack 
> > combination not very useful, at least in my tests.
> > 
> > Is there a way to resync the TSCs?
> 
> no reasonable way. Does idle=poll make any difference?

But JACK itself uses rdtsc() for timing calculations so TSC drift is
invariably fatal.

Lee



* Re: 2.6.14-rt13
  2005-11-18 22:15         ` 2.6.14-rt13 Lee Revell
@ 2005-11-18 22:25           ` Steven Rostedt
  2005-11-18 23:36             ` 2.6.14-rt13 Fernando Lopez-Lezcano
  0 siblings, 1 reply; 56+ messages in thread
From: Steven Rostedt @ 2005-11-18 22:25 UTC (permalink / raw)
  To: Lee Revell
  Cc: Ingo Molnar, Fernando Lopez-Lezcano, linux-kernel,
	Paul E. McKenney, K.R. Foley, Thomas Gleixner, pluto, john cooper,
	Benedikt Spranger, Daniel Walker, Tom Rini, George Anzinger


On Fri, 18 Nov 2005, Lee Revell wrote:

> On Fri, 2005-11-18 at 23:07 +0100, Ingo Molnar wrote:
> > * Fernando Lopez-Lezcano <nando@ccrma.Stanford.EDU> wrote:
> >
> > > Arghhh, at least I take this as a confirmation that the TSCs do drift
> > > and there is no workaround. It currently makes the -rt/Jack
> > > combination not very useful, at least in my tests.
> > >
> > > Is there a way to resync the TSCs?
> >
> > no reasonable way. Does idle=poll make any difference?
>
> But JACK itself uses rdtsc() for timing calculations so TSC drift is
> invariably fatal.

Can it simply be pinned to a cpu?

-- Steve



* Re: 2.6.14-rt13
  2005-11-18 22:13       ` 2.6.14-rt13 Lee Revell
@ 2005-11-18 22:32         ` Vojtech Pavlik
  2005-11-19  2:28           ` 2.6.14-rt13 George Anzinger
  0 siblings, 1 reply; 56+ messages in thread
From: Vojtech Pavlik @ 2005-11-18 22:32 UTC (permalink / raw)
  To: Lee Revell
  Cc: Fernando Lopez-Lezcano, Ingo Molnar, linux-kernel,
	Paul E. McKenney, K.R. Foley, Steven Rostedt, Thomas Gleixner,
	pluto, john cooper, Benedikt Spranger, Daniel Walker, Tom Rini,
	George Anzinger

On Fri, Nov 18, 2005 at 05:13:03PM -0500, Lee Revell wrote:
> On Fri, 2005-11-18 at 14:05 -0800, Fernando Lopez-Lezcano wrote:
> > On Fri, 2005-11-18 at 16:54 -0500, Lee Revell wrote:
> > > On Fri, 2005-11-18 at 10:02 -0800, Fernando Lopez-Lezcano wrote:
> > > > You mentioned before that the TSC's from both cpus could drift from
> > > > each other over time. Assuming that is the source of timing (I have no
> > > > idea) that could explain the behavior of Jack, it gets a reference
> > > > time from one of the cpus and then compares that with what it gets
> > > > from either cpu depending on where it is running at a given time. If
> > > > it is the same cpu all is fine, if it is the other and it has drifted
> > > > then the warning is printed.  
> > > 
> > > Yes, JACK uses rdtsc() for microsecond resolution timing and assumes
> > > that the TSCs are in sync.
> > > 
> > > I've asked on this list what a better time source could be and didn't
> > > get any useful responses, people just told me "use gettimeofday()" which
> > > is WAY too slow.
> > 
> > Arghhh, at least I take this as a confirmation that the TSCs do drift
> > and there is no workaround. It currently makes the -rt/Jack combination
> > not very useful, at least in my tests. 
> > 
> > Is there a way to resync the TSCs?
> 
> I don't think so.  A better question is what mechanism have the hardware
> vendors provided to replace the apparently-no-longer-reliable TSC for
> cheap high res timing on modern machines.  Unfortunately I suspect the
> answer at this point is "nothing, you're screwed".

There are many mechanisms to keep time:

1) RTC: 0.5 sec resolution, interrupts
2) PIT: takes ages to read, overflows at each timer interrupt
3) PMTMR: takes ages to read, overflows in approx 4 seconds, no interrupt
4) HPET: slow to read, overflows in 5 minutes. Nice, but usually not present.
5) TSC: fast, completely unreliable. Frequency changes, CPUs diverge over time.
6) LAPIC: reasonably fast, unreliable, per-cpu

Userspace can only use 1), 4) and 5). mplayer uses the RTC to
synchronize, using it as a 1 kHz interrupt source.

The kernel does quite a lot of magic and jumps through many hoops to
make a reliable and fast time source combining these.
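
The overflow figures in the list above follow from counter width and
clock rate. A rough sketch of the arithmetic, assuming a 24-bit PMTMR
clocked at 3.579545 MHz and a 32-bit HPET counter at 14.31818 MHz
(typical values, not stated in this thread; counter_wrap_seconds() is a
made-up helper):

```c
#include <stdint.h>

/*
 * Seconds until a free-running counter with the given number of bits
 * wraps around when clocked at freq_hz. bits must be below 64.
 */
double counter_wrap_seconds(unsigned bits, double freq_hz)
{
	return (double)((uint64_t)1 << bits) / freq_hz;
}
```

With those assumed inputs, counter_wrap_seconds(24, 3579545.0) comes out
a little under 5 seconds and counter_wrap_seconds(32, 14318180.0) at
roughly 300 seconds, in line with the "approx 4 seconds" and "5 minutes"
quoted for PMTMR and HPET.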

> I've read that gettimeofday() does not have to enter the kernel on
> x86-64, maybe it's fast enough, though almost certainly orders of
> magnitude slower than rdtsc(). 

It depends on the hardware config, and kernel version. With my latest
patch it takes approximately 175 ns on a reasonably fast CPU to do
gettimeofday() from userspace. And much better results will be possible
(~5x better) when RDTSCP enabled CPUs become available.

This patch still has problems, and as such I'll still have to rewrite
significant portions before I release it.

Anyway, current gettimeofday() on SMP AMD x86-64 can be as bad as 1500ns.

> It seems like a huge step backwards for
> any apps with high res timing requirements.

gettimeofday() is the only guaranteed working mechanism. And it's as
fast as the hardware allows.
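
Anyone who wants to see where their own machine falls between those
numbers can time the call itself. A minimal sketch, assuming nothing
beyond POSIX gettimeofday(); gettimeofday_cost_ns() is a name made up
for illustration, and the result includes loop overhead, so treat it as
an upper bound rather than an exact figure:

```c
#include <stddef.h>
#include <sys/time.h>

/*
 * Average cost of one gettimeofday() call in nanoseconds, measured
 * over `calls` back-to-back invocations (calls must be > 0).
 */
double gettimeofday_cost_ns(long calls)
{
	struct timeval start, end, scratch;
	double elapsed_us;
	long i;

	gettimeofday(&start, NULL);
	for (i = 0; i < calls; i++)
		gettimeofday(&scratch, NULL);
	gettimeofday(&end, NULL);

	/* Difference of the two timestamps, in microseconds. */
	elapsed_us = (end.tv_sec - start.tv_sec) * 1e6 +
		     (end.tv_usec - start.tv_usec);

	return elapsed_us * 1000.0 / calls;
}
```

Calling it with something like a million iterations should smooth out
the microsecond granularity of the timestamps themselves.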

-- 
Vojtech Pavlik
SuSE Labs, SuSE CR


* Re: 2.6.14-rt13
  2005-11-18 22:07       ` 2.6.14-rt13 Ingo Molnar
  2005-11-18 22:15         ` 2.6.14-rt13 Lee Revell
@ 2005-11-18 22:41         ` Fernando Lopez-Lezcano
  2005-11-19  2:39           ` 2.6.14-rt13 Steven Rostedt
  1 sibling, 1 reply; 56+ messages in thread
From: Fernando Lopez-Lezcano @ 2005-11-18 22:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Lee Revell, linux-kernel, Paul E. McKenney, K.R. Foley,
	Steven Rostedt, Thomas Gleixner, pluto, john cooper,
	Benedikt Spranger, Daniel Walker, Tom Rini, George Anzinger

On Fri, 2005-11-18 at 23:07 +0100, Ingo Molnar wrote:
> * Fernando Lopez-Lezcano <nando@ccrma.Stanford.EDU> wrote:
> 
> > Arghhh, at least I take this as a confirmation that the TSCs do drift 
> > and there is no workaround. It currently makes the -rt/Jack 
> > combination not very useful, at least in my tests.
> > 
> > Is there a way to resync the TSCs?
> 
> no reasonable way. Does idle=poll make any difference?

I don't know yet, and I may never know :-) I've been running it for a
while and so far works but that's what I thought yesterday of -rt13. It
is not practical for normal use, it just heats the cpu unnecessarily and
there's no way to control it other than a reboot. I'll keep my machine
running like this till I go home later. 

-- Fernando




* Re: 2.6.14-rt13
  2005-11-18 22:25           ` 2.6.14-rt13 Steven Rostedt
@ 2005-11-18 23:36             ` Fernando Lopez-Lezcano
  2005-11-18 23:57               ` 2.6.14-rt13 Steven Rostedt
  0 siblings, 1 reply; 56+ messages in thread
From: Fernando Lopez-Lezcano @ 2005-11-18 23:36 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: nando, Lee Revell, Ingo Molnar, linux-kernel, Paul E. McKenney,
	K.R. Foley, Thomas Gleixner, pluto, john cooper,
	Benedikt Spranger, Daniel Walker, Tom Rini, George Anzinger

On Fri, 2005-11-18 at 17:25 -0500, Steven Rostedt wrote:
> On Fri, 18 Nov 2005, Lee Revell wrote:
> 
> > On Fri, 2005-11-18 at 23:07 +0100, Ingo Molnar wrote:
> > > * Fernando Lopez-Lezcano <nando@ccrma.Stanford.EDU> wrote:
> > >
> > > > Arghhh, at least I take this as a confirmation that the TSCs do drift
> > > > and there is no workaround. It currently makes the -rt/Jack
> > > > combination not very useful, at least in my tests.
> > > >
> > > > Is there a way to resync the TSCs?
> > >
> > > no reasonable way. Does idle=poll make any difference?
> >
> > But JACK itself uses rdtsc() for timing calculations so TSC drift is
> > invariably fatal.
> 
> Can it simply be pinned to a cpu?

Is there a way to know in which cpu a process is running? At least Jack
could ignore timing issues if the measurement is going to happen in a
different cpu than the one where the original timestamp was collected. 

-- Fernando




* Re: 2.6.14-rt13
  2005-11-18 23:36             ` 2.6.14-rt13 Fernando Lopez-Lezcano
@ 2005-11-18 23:57               ` Steven Rostedt
  0 siblings, 0 replies; 56+ messages in thread
From: Steven Rostedt @ 2005-11-18 23:57 UTC (permalink / raw)
  To: Fernando Lopez-Lezcano
  Cc: Lee Revell, Ingo Molnar, linux-kernel, Paul E. McKenney,
	K.R. Foley, Thomas Gleixner, pluto, john cooper,
	Benedikt Spranger, Daniel Walker, Tom Rini, George Anzinger



On Fri, 18 Nov 2005, Fernando Lopez-Lezcano wrote:

> > Can it simply be pinned to a cpu?
>
> Is there a way to know in which cpu a process is running? At least Jack
> could ignore timing issues if the measurement is going to happen in a
> different cpu than the one where the original timestamp was collected.
>

Simple answer? No. At least not meaningfully.

If you do:

cpu = fictitious_get_my_cpu();
if (cpu == last_cpu()) {
	rdtsc(oldtime);
	...
}

There's no guarantee that jack doesn't switch cpu's from when it found out
what CPU it was on to doing the calculation.  So it would be easier to pin
it.

(apt-get schedutils)

man 1 taskset

or if you modify the code:

man 2 sched_setaffinity
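
For the "modify the code" route, a minimal sketch of pinning with
sched_setaffinity() might look like the following. It assumes glibc on
Linux; pin_to_cpu(), first_allowed_cpu() and allowed_cpu_count() are
helper names made up for illustration and are not taken from JACK. Note
that the affinity applies to the calling thread only, so a
multithreaded program would have to pin before creating its threads
(they inherit the mask) or pin each thread itself.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* exposes cpu_set_t macros and sched_setaffinity() */
#endif
#include <sched.h>

/* Lowest-numbered cpu the caller is allowed to run on, or -1 on error. */
int first_allowed_cpu(void)
{
	cpu_set_t set;
	int cpu;

	if (sched_getaffinity(0, sizeof(set), &set) != 0)
		return -1;
	for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
		if (CPU_ISSET(cpu, &set))
			return cpu;
	return -1;
}

/* Number of cpus in the caller's affinity mask, or -1 on error. */
int allowed_cpu_count(void)
{
	cpu_set_t set;

	if (sched_getaffinity(0, sizeof(set), &set) != 0)
		return -1;
	return CPU_COUNT(&set);
}

/* Pin the calling thread to one cpu. Returns 0, or -1 with errno set. */
int pin_to_cpu(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	return sched_setaffinity(0, sizeof(set), &set);	/* pid 0 = caller */
}
```

Once pinned, every rdtsc reading comes from the same counter, so the
race described above goes away, at the cost of no longer being able to
migrate to an idle cpu.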

-- Steve





* Re: 2.6.14-rt13
  2005-11-18 22:32         ` 2.6.14-rt13 Vojtech Pavlik
@ 2005-11-19  2:28           ` George Anzinger
  2005-11-19  7:45             ` 2.6.14-rt13 Vojtech Pavlik
  0 siblings, 1 reply; 56+ messages in thread
From: George Anzinger @ 2005-11-19  2:28 UTC (permalink / raw)
  To: Vojtech Pavlik
  Cc: Lee Revell, Fernando Lopez-Lezcano, Ingo Molnar, linux-kernel,
	Paul E. McKenney, K.R. Foley, Steven Rostedt, Thomas Gleixner,
	pluto, john cooper, Benedikt Spranger, Daniel Walker, Tom Rini

Vojtech Pavlik wrote:
> On Fri, Nov 18, 2005 at 05:13:03PM -0500, Lee Revell wrote:
> 
>>On Fri, 2005-11-18 at 14:05 -0800, Fernando Lopez-Lezcano wrote:
>>
>>>On Fri, 2005-11-18 at 16:54 -0500, Lee Revell wrote:
>>>
>>>>On Fri, 2005-11-18 at 10:02 -0800, Fernando Lopez-Lezcano wrote:
>>>>
>>>>>You mentioned before that the TSC's from both cpus could drift from
>>>>>each other over time. Assuming that is the source of timing (I have no
>>>>>idea) that could explain the behavior of Jack, it gets a reference
>>>>>time from one of the cpus and then compares that with what it gets
>>>>>from either cpu depending on where it is running at a given time. If
>>>>>it is the same cpu all is fine, if it is the other and it has drifted
>>>>>then the warning is printed.  
>>>>
>>>>Yes, JACK uses rdtsc() for microsecond resolution timing and assumes
>>>>that the TSCs are in sync.
>>>>
>>>>I've asked on this list what a better time source could be and didn't
>>>>get any useful responses, people just told me "use gettimeofday()" which
>>>>is WAY too slow.
>>>
>>>Arghhh, at least I take this as a confirmation that the TSCs do drift
>>>and there is no workaround. It currently makes the -rt/Jack combination
>>>not very useful, at least in my tests. 
>>>
>>>Is there a way to resync the TSCs?
>>
>>I don't think so.  A better question is what mechanism have the hardware
>>vendors provided to replace the apparently-no-longer-reliable TSC for
>>cheap high res timing on modern machines.  Unfortunately I suspect the
>>answer at this point is "nothing, you're screwed".
> 
> 
> There are many mechanisms to keep time:
> 
> 1) RTC: 0.5 sec resolution, interrupts
> 2) PIT: takes ages to read, overflows at each timer interrupt
> 3) PMTMR: takes ages to read, overflows in approx 4 seconds, no interrupt

The PMTMR can be read from user space (if you can find it).  See the 
"iopl" man page.  It is an I/O access and so is slow, but you can read 
it.

Finding it is another matter.  It does not have a fixed address (i.e. 
it differs from machine to machine, but is constant on any given 
machine).  The boot code roots it out of an info block put in memory 
by the BIOS.  I suppose one could put a printk in the boot code to 
disclose it...

George
-- 


> 4) HPET: slow to read, overflows in 5 minutes. Nice, but usually not present.
> 5) TSC: fast, completely unreliable. Frequency changes, CPUs diverge over time.
> 6) LAPIC: reasonably fast, unreliable, per-cpu
> 
> Userspace can only use 1), 4) and 5). mplayer uses the RTC to
> synchronize, using it as a 1 kHz interrupt source.
> 
> The kernel does quite a lot of magic and jumps through many hoops to
> make a reliable and fast time source combining these.
> 
> 
>>I've read that gettimeofday() does not have to enter the kernel on
>>x86-64, maybe it's fast enough, though almost certainly orders of
>>magnitude slower than rdtsc(). 
> 
> 
> It depends on the hardware config, and kernel version. With my latest
> patch it takes approximately 175 ns on a reasonably fast CPU to do
> gettimeofday() from userspace. And much better results will be possible
> (~5x better) when RDTSCP enabled CPUs become available.
> 
> This patch still has problems, and as such I'll still have to rewrite
> significant portions before I release it.
> 
> Anyway, current gettimeofday() on SMP AMD x86-64 can be as bad as 1500ns.
> 
> 
>>It seems like a huge step backwards for
>>any apps with high res timing requirements.
> 
> 
> gettimeofday() is the only guaranteed working mechanism. And it's as
> fast as the hardware allows.
> 

-- 
George Anzinger   george@mvista.com
HRT (High-res-timers):  http://sourceforge.net/projects/high-res-timers/


* Re: 2.6.14-rt13
  2005-11-18 22:41         ` 2.6.14-rt13 Fernando Lopez-Lezcano
@ 2005-11-19  2:39           ` Steven Rostedt
  2005-11-24 15:07             ` 2.6.14-rt13 Ingo Molnar
  0 siblings, 1 reply; 56+ messages in thread
From: Steven Rostedt @ 2005-11-19  2:39 UTC (permalink / raw)
  To: Fernando Lopez-Lezcano
  Cc: Ingo Molnar, Lee Revell, linux-kernel, Paul E. McKenney,
	K.R. Foley, Thomas Gleixner, pluto, john cooper,
	Benedikt Spranger, Daniel Walker, Tom Rini, George Anzinger

On Fri, 2005-11-18 at 14:41 -0800, Fernando Lopez-Lezcano wrote:
> On Fri, 2005-11-18 at 23:07 +0100, Ingo Molnar wrote:
> > * Fernando Lopez-Lezcano <nando@ccrma.Stanford.EDU> wrote:
> > 
> > > Arghhh, at least I take this as a confirmation that the TSCs do drift 
> > > and there is no workaround. It currently makes the -rt/Jack 
> > > combination not very useful, at least in my tests.
> > > 
> > > Is there a way to resync the TSCs?
> > 
> > no reasonable way. Does idle=poll make any difference?
> 
> I don't know yet, and I may never know :-) I've been running it for a
> while and so far works but that's what I thought yesterday of -rt13. It
> is not practical for normal use, it just heats the cpu unnecessarily and
> there's no way to control it other than a reboot.

Not anymore! 

OK, I used this as an exercise to learn how kobject and sysfs work (I've
been putting this off for too long). So if this isn't exactly proper,
let me know :-)

Ingo,  This could be a temporary patch until we come up with a better
solution.  This adds /sys/kernel/idle/idle_poll. If idle=poll is _not_
set at boot, it lets you switch the machine to idle=poll on the fly, as
well as turn it off again. If you booted with idle=poll, the file
doesn't even show up.

So for example (I'm currently running it):

# cat /sys/kernel/idle/idle_poll
off
# echo 1 > /sys/kernel/idle/idle_poll
# cat /sys/kernel/idle/idle_poll
on
# echo 0 > /sys/kernel/idle/idle_poll
# cat /sys/kernel/idle/idle_poll
off

# echo on > /sys/kernel/idle/idle_poll
and 
# echo off > /sys/kernel/idle/idle_poll
also work.

So like I said, this could be used by those who need idle=poll for
running benchmarks but don't want to reboot when they are done.

-- Steve

PS. I haven't tested to see if the idle actually changes, but it looks
pretty obvious in the code in cpu_idle:

			idle = pm_idle;
			if (!idle)
				idle = default_idle;
			if (cpu_is_offline(smp_processor_id()))
				play_dead();
			stop_critical_timing();
			propagate_preempt_locks_value();
			idle();



Index: linux-2.6.14-rt13/arch/x86_64/kernel/process.c
===================================================================
--- linux-2.6.14-rt13.orig/arch/x86_64/kernel/process.c	2005-11-15 11:12:37.000000000 -0500
+++ linux-2.6.14-rt13/arch/x86_64/kernel/process.c	2005-11-18 21:12:53.000000000 -0500
@@ -822,3 +822,104 @@
 		sp -= get_random_int() % 8192;
 	return sp & ~0xf;
 }
+
+#ifdef CONFIG_SYSFS
+#include <linux/kobject.h>
+#include <linux/sysfs.h>
+#include <linux/spinlock.h>
+
+#define KERNEL_ATTR_RW(_name) \
+static struct subsys_attribute _name##_attr = \
+	__ATTR(_name, 0644, _name##_show, _name##_store)
+
+static spinlock_t idle_switch_lock = SPIN_LOCK_UNLOCKED(idle_switch_lock);
+
+static struct idlep_kobject
+{
+	struct kobject kobj;
+	int is_poll;
+	void (*idle)(void);
+} idle_kobj;
+
+static ssize_t idle_poll_show(struct subsystem *subsys, char *page)
+{
+	return sprintf(page, "%s\n", (idle_kobj.is_poll ? "on" : "off"));
+}
+
+static ssize_t idle_poll_store(struct subsystem *subsys,
+			       const char *buf, size_t len)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&idle_switch_lock, flags);
+
+	if (strncmp(buf,"1",1)==0 ||
+	    (len >=2 && strncmp(buf,"on",2)==0)) {
+		if (idle_kobj.is_poll != 1) {
+			idle_kobj.is_poll = 1;
+			pm_idle = poll_idle;
+		}
+	} else if (strncmp(buf,"0",1)==0 ||
+		   (len >= 3 && strncmp(buf,"off",3)==0)) {
+		if (idle_kobj.is_poll != 0) {
+			idle_kobj.is_poll = 0;
+			pm_idle = idle_kobj.idle;
+		}
+	}
+
+	spin_unlock_irqrestore(&idle_switch_lock, flags);
+
+	return len;
+}
+
+
+KERNEL_ATTR_RW(idle_poll);
+
+static struct attribute * idle_attrs[] = {
+	&idle_poll_attr.attr,
+	NULL
+};
+
+static struct attribute_group idle_attr_group = {
+	.attrs = idle_attrs,
+};
+
+static int __init idle_poll_set_init(void)
+{
+	int err;
+
+	/*
+	 * If the default is already poll_idle then
+	 * don't even bother with this.
+	 */
+	if (pm_idle == poll_idle)
+		return 0;
+
+	memset(&idle_kobj, 0, sizeof(idle_kobj));
+
+	idle_kobj.is_poll = 0;
+	idle_kobj.idle = pm_idle;
+
+	err = kobject_set_name(&idle_kobj.kobj, "%s", "idle");
+	if (err)
+		goto out;
+
+	idle_kobj.kobj.parent = &kernel_subsys.kset.kobj;
+	err = kobject_register(&idle_kobj.kobj);
+	if (err)
+		goto out;
+
+	err = sysfs_create_group(&idle_kobj.kobj,
+				 &idle_attr_group);
+	if (err)
+		goto out;
+
+	return 0;
+out:
+	printk(KERN_INFO "Problem setting up sysfs idle_poll\n");
+	return 0;
+}
+
+late_initcall(idle_poll_set_init);
+#endif /* CONFIG_SYSFS */
+



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: 2.6.14-rt13
  2005-11-19  2:28           ` 2.6.14-rt13 George Anzinger
@ 2005-11-19  7:45             ` Vojtech Pavlik
  2005-11-19 18:27               ` 2.6.14-rt13 Lee Revell
  0 siblings, 1 reply; 56+ messages in thread
From: Vojtech Pavlik @ 2005-11-19  7:45 UTC (permalink / raw)
  To: George Anzinger
  Cc: Lee Revell, Fernando Lopez-Lezcano, Ingo Molnar, linux-kernel,
	Paul E. McKenney, K.R. Foley, Steven Rostedt, Thomas Gleixner,
	pluto, john cooper, Benedikt Spranger, Daniel Walker, Tom Rini

On Fri, Nov 18, 2005 at 06:28:24PM -0800, George Anzinger wrote:

> >There are many mechanisms to keep time:
> >
> >1) RTC: 0.5 sec resolution, interrupts
> >2) PIT: takes ages to read, overflows at each timer interrupt
> >3) PMTMR: takes ages to read, overflows in approx 4 seconds, no interrupt
> 
> The PMTMR can be read from user space (if you can find it).  See the 
> "iopl" man page.  It is an I/O access and so is slow, but you can read 
> it.

Yes, however this must be limited to a small number of privileged
applications - iopl() is only available to CAP_SYS_RAWIO IIRC,
and thus it's not suitable for general use.

> Finding it is another matter.  It does not have a fixed address (i.e. 
> it differs from machine to machine, but is constant on any given 
> machine).  The boot code roots it out of an info block put in memory 
> by the BIOS.  I suppose one could put a printk in the boot code to 
> disclose it...

There is really no reason to do that, since the time to read it (~1200
ns) is much more than the time to enter the kernel (less than 200 ns),
so gettimeofday() is definitely easier to use and also doesn't overflow.

-- 
Vojtech Pavlik
SuSE Labs, SuSE CR

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: 2.6.14-rt13
  2005-11-19  7:45             ` 2.6.14-rt13 Vojtech Pavlik
@ 2005-11-19 18:27               ` Lee Revell
  0 siblings, 0 replies; 56+ messages in thread
From: Lee Revell @ 2005-11-19 18:27 UTC (permalink / raw)
  To: Vojtech Pavlik
  Cc: George Anzinger, Fernando Lopez-Lezcano, Ingo Molnar,
	linux-kernel, Paul E. McKenney, K.R. Foley, Steven Rostedt,
	Thomas Gleixner, pluto, john cooper, Benedikt Spranger,
	Daniel Walker, Tom Rini

On Sat, 2005-11-19 at 08:45 +0100, Vojtech Pavlik wrote:
> On Fri, Nov 18, 2005 at 06:28:24PM -0800, George Anzinger wrote:
> > Finding it is another matter.  It does not have a fixed address (i.e. 
> > it differs from machine to machine, but is constant on any given 
> > machine).  The boot code roots it out of an info block put in memory 
> > by the BIOS.  I suppose one could put a printk in the boot code to 
> > disclose it...
> 
> There is really no reason to do that, since the time to read it (~1200
> ns) is much more than the time to enter the kernel (less than 200 ns),
> so gettimeofday() is definitely easier to use and also doesn't overflow.
> 

Thanks very much, you have answered my question.  We would prefer
gettimeofday() anyway for portability, so if the plan is to make it
faster then we can deal with losing the TSC.

Lee


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: 2.6.14-rt13
  2005-11-15  9:08 2.6.14-rt13 Ingo Molnar
                   ` (2 preceding siblings ...)
  2005-11-18 18:02 ` 2.6.14-rt13 Fernando Lopez-Lezcano
@ 2005-11-21 21:32 ` Fernando Lopez-Lezcano
  2005-11-21 21:41   ` 2.6.14-rt13 john stultz
                     ` (2 more replies)
  3 siblings, 3 replies; 56+ messages in thread
From: Fernando Lopez-Lezcano @ 2005-11-21 21:32 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: nando, linux-kernel, Paul E. McKenney, K.R. Foley, Steven Rostedt,
	Thomas Gleixner, pluto, john cooper, Benedikt Spranger,
	Daniel Walker, Tom Rini, George Anzinger

On Tue, 2005-11-15 at 10:08 +0100, Ingo Molnar wrote:
> i have released the 2.6.14-rt13 tree, which can be downloaded from the 
> usual place:
> 
>    http://redhat.com/~mingo/realtime-preempt/
> 
> lots of fixes in this release affecting all supported architectures, all 
> across the board. Big MIPS update from John Cooper.

Can someone tell me if 2.6.14-rt13 is supposed to be fixed re: the
problems I was having with random screensaver triggering and keyboard
repeats?

It is apparently not fixed. 

I just had a short burst of key repeats and saw one random screen blank.
Right now everything seems normal but I was not hallucinating :-)

-- Fernando



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: 2.6.14-rt13
  2005-11-21 21:32 ` 2.6.14-rt13 Fernando Lopez-Lezcano
@ 2005-11-21 21:41   ` john stultz
       [not found]   ` <20051121221511.GA7255@elte.hu>
  2005-11-22 11:19   ` 2.6.14-rt13 Ingo Molnar
  2 siblings, 0 replies; 56+ messages in thread
From: john stultz @ 2005-11-21 21:41 UTC (permalink / raw)
  To: Fernando Lopez-Lezcano
  Cc: Ingo Molnar, linux-kernel, Paul E. McKenney, K.R. Foley,
	Steven Rostedt, Thomas Gleixner, pluto, john cooper,
	Benedikt Spranger, Daniel Walker, Tom Rini, George Anzinger

On Mon, 2005-11-21 at 13:32 -0800, Fernando Lopez-Lezcano wrote:
> On Tue, 2005-11-15 at 10:08 +0100, Ingo Molnar wrote:
> > i have released the 2.6.14-rt13 tree, which can be downloaded from the 
> > usual place:
> > 
> >    http://redhat.com/~mingo/realtime-preempt/
> > 
> > lots of fixes in this release affecting all supported architectures, all 
> > across the board. Big MIPS update from John Cooper.
> 
> Can someone tell me if 2.6.14-rt13 is supposed to be fixed re: the
> problems I was having with random screensaver triggering and keyboard
> repeats?
> 
> It is apparently not fixed. 
> 
> I just had a short burst of key repeats and saw one random screen blank.
> Right now everything seems normal but I was not hallucinating :-)

Hmm. Sounds like timekeeping issues, could you send me dmesg output?

thanks
-john


^ permalink raw reply	[flat|nested] 56+ messages in thread

* test time-warps [was: Re: 2.6.14-rt13]
       [not found]   ` <20051121221511.GA7255@elte.hu>
@ 2005-11-21 22:19     ` Ingo Molnar
  2005-11-21 23:08       ` Fernando Lopez-Lezcano
                         ` (3 more replies)
  0 siblings, 4 replies; 56+ messages in thread
From: Ingo Molnar @ 2005-11-21 22:19 UTC (permalink / raw)
  To: Fernando Lopez-Lezcano
  Cc: linux-kernel, Paul E. McKenney, K.R. Foley, Steven Rostedt,
	Thomas Gleixner, pluto, john cooper, Benedikt Spranger,
	Daniel Walker, Tom Rini, George Anzinger

* Fernando Lopez-Lezcano <nando@ccrma.Stanford.EDU> wrote:

> On Tue, 2005-11-15 at 10:08 +0100, Ingo Molnar wrote:
> > i have released the 2.6.14-rt13 tree, which can be downloaded from the 
> > usual place:
> > 
> >    http://redhat.com/~mingo/realtime-preempt/
> > 
> > lots of fixes in this release affecting all supported architectures, all 
> > across the board. Big MIPS update from John Cooper.
> 
> Can someone tell me if 2.6.14-rt13 is supposed to be fixed re: the 
> problems I was having with random screensaver triggering and keyboard 
> repeats?
> 
> It is apparently not fixed.
> 
> I just had a short burst of key repeats and saw one random screen 
> blank. Right now everything seems normal but I was not hallucinating 
> :-)

is this on the dual-core X2 box, running 32-bit code? Did it happen with 
idle=poll? Without idle=poll the TSCs run apart and a number of 
artifacts may happen. With idle=poll specified the TSC _should_ be fully 
synchronized.

To make sure could you run the attached time-warp-test utility i wrote 
today? Compile it with:

  gcc -Wall -O2 -o time-warp-test time-warp-test.c

it detects and reports time-warps (and does a maximum search for them 
over time, that way you can see systematic drifts too). (It auto-detects 
the # of CPUs and runs the appropriate number of tasks.)

running this tool on a X2 with idle=poll and an -rt kernel should give a 
silent test-output.

running a vanilla kernel should give TSC level time warps:

 #CPUs: 2
 running 2 tasks to check for time-warps.
 warp ..        -1 cycles, ... 00000277ed9520c6 -> 00000277ed9520c5 ?
 warp ..       -18 cycles, ... 00000277ed97ac77 -> 00000277ed97ac65 ?
 warp ..       -19 cycles, ... 00000277edaedd54 -> 00000277edaedd41 ?
 warp ..       -84 cycles, ... 00000277ede0558a -> 00000277ede05536 ?
 warp ..       -97 cycles, ... 00000278035328a5 -> 0000027803532844 ?
 warp ..      -224 cycles, ... 000002781ed2db04 -> 000002781ed2da24 ?

(because the vanilla kernel doesnt do TSC synchronization accurately)

running it without idle=poll should give some really big time warps:

 neptune:~> ./time-warp-test
 #CPUs: 2
 running 2 tasks to check for time-warps.
 warp ..   -435934 cycles, ... 00000101a2db4a8f -> 00000101a2d4a3b1 ?
 WARP ..      -123 usecs, .... 0003e96c2f3bb579 -> 0003e96c2f3bb4fe ?
 WARP ..      -198 usecs, .... 0003e96c2f3bb625 -> 0003e96c2f3bb55f ?
 WARP ..      -199 usecs, .... 0003e96c2f3bb659 -> 0003e96c2f3bb592 ?
 warp ..   -436117 cycles, ... 00000101a2e5aaf0 -> 00000101a2df035b ?
 warp ..   -437143 cycles, ... 00000101a2e84590 -> 00000101a2e199f9 ?
 warp ..   -437314 cycles, ... 00000101a2ead1b1 -> 00000101a2e4256f ?
 warp ..   -437363 cycles, ... 00000101a2ed9b19 -> 00000101a2e6eea6 ?
 WARP ..  -1951680 usecs, .... 0003e96c2f597f70 -> 0003e96c2f3bb7b0 ?
 WARP ..  -1951879 usecs, .... 0003e96c2f598016 -> 0003e96c2f3bb78f ?
 WARP ..  -1951681 usecs, .... 0003e96c2f598014 -> 0003e96c2f3bb853 ?
 warp ..   -437365 cycles, ... 00000101a4c5be7b -> 00000101a4bf1206 ?
 warp ..   -437366 cycles, ... 00000101a8f4af76 -> 00000101a8ee0300 ?
 warp ..   -437367 cycles, ... 00000101a968a34a -> 00000101a961f6d3 ?

these time warps will get worse over time - as the two cores drift 
apart. (note that they wont drift during the test itself, because the 
test makes all cores artificially busy and the X2 TSC drifting depends 
on the core being idle)

but in any case, -rt13 should be silent and there should be no time 
warps. If there are any then those could cause the keyboard repeat 
problems.

	Ingo

-------{ CUT HERE time-warp-test.c }-------------->

/*
 * Copyright (C) 2005, Ingo Molnar
 *
 * time-warp-test.c: check TSC synchronization on x86 CPUs. Also detects
 *                   gettimeofday()-level time warps.
 */
#include <stdio.h>
#include <stdarg.h>
#include <stdlib.h>
#include <signal.h>
#include <sys/wait.h>
#include <linux/unistd.h>
#include <unistd.h>
#include <string.h>
#include <pwd.h>
#include <grp.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <regex.h>
#include <fcntl.h>
#include <time.h>
#include <sys/mman.h>
#include <dlfcn.h>
#include <popt.h>
#include <sys/socket.h>
#include <ctype.h>
#include <assert.h>
#include <sched.h>

#define TEST_TSC
#define TEST_TOD

#define MAX_TASKS 128

#if DEBUG
# define Printf(x...) printf(x)
#else
# define Printf(x...) do { } while (0)
#endif

enum {
	SHARED_TSC = 0,
	SHARED_LOCK = 2,
	SHARED_TOD = 3,
	SHARED_WORST_TSC = 5,
	SHARED_WORST_TOD = 7,
	SHARED_LOCK2 = 200,
};

#define BUG_ON(c) assert(!(c))

typedef unsigned long long cycles_t;
typedef unsigned long long usecs_t;

#define rdtscll(val) \
	__asm__ __volatile__("rdtsc" : "=A" (val))

#define rdtod(val)					\
do {							\
	struct timeval tv;				\
							\
	gettimeofday(&tv, NULL);			\
	(val) = tv.tv_sec * 1000000LL + tv.tv_usec;	\
} while (0)

#define mb() \
	__asm__ __volatile__("lock; addl $0, (%esp)")

static unsigned long *setup_shared_var(void)
{
	char zerobuff [4096] = { 0, };
	int ret, fd;
	unsigned long *buf;

	fd = creat(".tmp_mmap", 0700);
	BUG_ON(fd == -1);
	close(fd);

	fd = open(".tmp_mmap", O_RDWR|O_CREAT|O_TRUNC, 0700);
	BUG_ON(fd == -1);
	ret = write(fd, zerobuff, 4096);
	BUG_ON(ret != 4096);

	buf = (void *)mmap(0, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
	BUG_ON(buf == (void *)-1);

	close(fd);

	return buf;
}

#define LOOPS 1000000

static inline unsigned long
cmpxchg(volatile unsigned long *ptr, unsigned long old, unsigned long new)
{
	unsigned long prev;

	__asm__ __volatile__("lock; cmpxchg %b1,%2"
			     : "=a"(prev)
			     : "q"(new), "m"(*(ptr)), "0"(old)
			     : "memory");
	return prev;
}

static inline void lock(unsigned long *flag)
{
	while (cmpxchg(flag, 0, 1) != 0)
		/* nothing */;
}

static inline void unlock(unsigned long *flag)
{
	*flag = 0;
	mb();
}

static void print_status(void)
{
	const char progress[] = "\\|/-";
	static usecs_t prev_tod;
	static int count;

	usecs_t tod;

	rdtod(tod);
	if (tod - prev_tod < 100000ULL)
		return;
	prev_tod = tod;
	count++;
	printf("%c\r", progress[count & 3]);
	fflush(stdout);
}

int main(int argc, char **argv)
{
	int i, parent, me;
	unsigned long *shared;
	unsigned long cpus, tasks;

	cpus = system("exit `grep processor /proc/cpuinfo  | wc -l`");
	cpus = WEXITSTATUS(cpus);

	if (argc > 2) {
usage:
		fprintf(stderr,
			"usage: time-warp-test <threads>\n");
		exit(-1);
	}
	if (argc == 2) {
		tasks = atol(argv[1]);
		if (!tasks)
			goto usage;
	} else
		tasks = cpus;

	printf("#CPUs: %ld\n", cpus);
	printf("running %ld tasks to check for time-warps.\n", tasks);
	shared = setup_shared_var();

	parent = getpid();

	for (i = 1; i < tasks; i++)
		if (!fork())
			break;
	me = getpid();

	while (1) {
		cycles_t t0, t1;
		usecs_t T0, T1;
		long long delta;

#ifdef TEST_TSC
		lock(shared + SHARED_LOCK);
		rdtscll(t1);
		t0 = *(cycles_t *)(shared + SHARED_TSC);
		*(cycles_t *)(shared + SHARED_TSC) = t1;
		unlock(shared + SHARED_LOCK);

		delta = t1-t0;
		if (delta < *(long long *)(shared + SHARED_WORST_TSC)) {
			*(long long *)(shared + SHARED_WORST_TSC) = delta;
			printf("\rwarp .. %9Ld cycles, ... %016Lx -> %016Lx ?\n",
				delta, t0, t1);
		}

		// occasionally disturb things a bit
		if (!(t0 & 7)) {
			lock(shared + SHARED_LOCK2);
			unlock(shared + SHARED_LOCK2);
		}
#endif

#ifdef TEST_TOD
		lock(shared + SHARED_LOCK);
		rdtod(T1);
		T0 = *(usecs_t *)(shared + SHARED_TOD);
		*(usecs_t *)(shared + SHARED_TOD) = T1;
		unlock(shared + SHARED_LOCK);

		delta = T1-T0;
		if (delta < *(long long *)(shared + SHARED_WORST_TOD)) {
			*(long long *)(shared + SHARED_WORST_TOD) = delta;
			printf("\rWARP .. %9Ld usecs, .... %016Lx -> %016Lx ?\n",
				delta, T0, T1);
		}
		if (!(T0 & 7)) {
			lock(shared + SHARED_LOCK2);
			unlock(shared + SHARED_LOCK2);
		}
#endif

		if (me == parent)
			print_status();
	}

	return 0;
}

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: test time-warps [was: Re: 2.6.14-rt13]
  2005-11-21 22:19     ` test time-warps [was: Re: 2.6.14-rt13] Ingo Molnar
@ 2005-11-21 23:08       ` Fernando Lopez-Lezcano
  2005-11-21 23:38       ` Fernando Lopez-Lezcano
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 56+ messages in thread
From: Fernando Lopez-Lezcano @ 2005-11-21 23:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: nando, linux-kernel, Paul E. McKenney, K.R. Foley, Steven Rostedt,
	Thomas Gleixner, pluto, john cooper, Benedikt Spranger,
	Daniel Walker, Tom Rini, George Anzinger

On Mon, 2005-11-21 at 23:19 +0100, Ingo Molnar wrote:
> * Fernando Lopez-Lezcano <nando@ccrma.Stanford.EDU> wrote:
> 
> > On Tue, 2005-11-15 at 10:08 +0100, Ingo Molnar wrote:
> > > i have released the 2.6.14-rt13 tree, which can be downloaded from the 
> > > usual place:
> > > 
> > >    http://redhat.com/~mingo/realtime-preempt/
> > > 
> > > lots of fixes in this release affecting all supported architectures, all 
> > > across the board. Big MIPS update from John Cooper.
> > 
> > Can someone tell me if 2.6.14-rt13 is supposed to be fixed re: the 
> > problems I was having with random screensaver triggering and keyboard 
> > repeats?
> > 
> > It is apparently not fixed.
> > 
> > I just had a short burst of key repeats and saw one random screen 
> > > blank. Right now everything seems normal but I was not hallucinating 
> > :-)
> 
> is this on the dual-core X2 box, running 32-bit code? 

That's correct. 

> Did it happen with idle=poll? 

No, I'm not running with idle=poll right now.

> Without idle=poll the TSCs run apart and a number of 
> artifacts may happen. With idle=poll specified the TSC _should_ be fully 
> synchronized.

Well, I could try but it is not a solution I could use. It would turn
all my machines into space heaters 24x7, no sense in doing that :-)

I got an answer off the list from John (Stultz) in response to the dmesg
output I sent him and he suggested I try idle=poll (which I briefly did
last week) and also changing:
  /sys/devices/system/clocksource/clocksource0/clocksource
to acpi_pm, which I just did. It is too early to tell re: keyboard
repeats and screensaver false triggers, but it did fix the problems I
was seeing with a hacked Jack that is using gettimeofday instead of tsc
reads. Meaning, Jack with gettimeofday + tsc timing source has problems,
Jack with gettimeofday + acpi_pm does not. It would seem gettimeofday is
not working correctly with tsc. 

> To make sure could you run the attached time-warp-test utility i wrote 
> today? 

I will and report back.
Thanks.
-- Fernando


> Compile it with:
> 
>   gcc -Wall -O2 -o time-warp-test time-warp-test.c
> 
> it detects and reports time-warps (and does a maximum search for them 
> over time, that way you can see systematic drifts too). (It auto-detects 
> the # of CPUs and runs the appropriate number of tasks.)
> 
> running this tool on a X2 with idle=poll and an -rt kernel should give a 
> silent test-output.
> 
> running a vanilla kernel should give TSC level time warps:
> 
>  #CPUs: 2
>  running 2 tasks to check for time-warps.
>  warp ..        -1 cycles, ... 00000277ed9520c6 -> 00000277ed9520c5 ?
>  warp ..       -18 cycles, ... 00000277ed97ac77 -> 00000277ed97ac65 ?
>  warp ..       -19 cycles, ... 00000277edaedd54 -> 00000277edaedd41 ?
>  warp ..       -84 cycles, ... 00000277ede0558a -> 00000277ede05536 ?
>  warp ..       -97 cycles, ... 00000278035328a5 -> 0000027803532844 ?
>  warp ..      -224 cycles, ... 000002781ed2db04 -> 000002781ed2da24 ?
> 
> (because the vanilla kernel doesnt do TSC synchronization accurately)
> 
> running it without idle=poll should give some really big time warps:
> 
>  neptune:~> ./time-warp-test
>  #CPUs: 2
>  running 2 tasks to check for time-warps.
>  warp ..   -435934 cycles, ... 00000101a2db4a8f -> 00000101a2d4a3b1 ?
>  WARP ..      -123 usecs, .... 0003e96c2f3bb579 -> 0003e96c2f3bb4fe ?
>  WARP ..      -198 usecs, .... 0003e96c2f3bb625 -> 0003e96c2f3bb55f ?
>  WARP ..      -199 usecs, .... 0003e96c2f3bb659 -> 0003e96c2f3bb592 ?
>  warp ..   -436117 cycles, ... 00000101a2e5aaf0 -> 00000101a2df035b ?
>  warp ..   -437143 cycles, ... 00000101a2e84590 -> 00000101a2e199f9 ?
>  warp ..   -437314 cycles, ... 00000101a2ead1b1 -> 00000101a2e4256f ?
>  warp ..   -437363 cycles, ... 00000101a2ed9b19 -> 00000101a2e6eea6 ?
>  WARP ..  -1951680 usecs, .... 0003e96c2f597f70 -> 0003e96c2f3bb7b0 ?
>  WARP ..  -1951879 usecs, .... 0003e96c2f598016 -> 0003e96c2f3bb78f ?
>  WARP ..  -1951681 usecs, .... 0003e96c2f598014 -> 0003e96c2f3bb853 ?
>  warp ..   -437365 cycles, ... 00000101a4c5be7b -> 00000101a4bf1206 ?
>  warp ..   -437366 cycles, ... 00000101a8f4af76 -> 00000101a8ee0300 ?
>  warp ..   -437367 cycles, ... 00000101a968a34a -> 00000101a961f6d3 ?
> 
> these time warps will get worse over time - as the two cores drift 
> apart. (note that they wont drift during the test itself, because the 
> test makes all cores artificially busy and the X2 TSC drifting depends 
> on the core being idle)
> 
> but in any case, -rt13 should be silent and there should be no time 
> warps. If there are any then those could cause the keyboard repeat 
> problems.
> 
> 	Ingo
> 
> -------{ CUT HERE time-warp-test.c }-------------->
[MUNCH]



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: test time-warps [was: Re: 2.6.14-rt13]
  2005-11-21 22:19     ` test time-warps [was: Re: 2.6.14-rt13] Ingo Molnar
  2005-11-21 23:08       ` Fernando Lopez-Lezcano
@ 2005-11-21 23:38       ` Fernando Lopez-Lezcano
  2005-11-21 23:41       ` john stultz
  2005-11-22  1:15       ` Steven Rostedt
  3 siblings, 0 replies; 56+ messages in thread
From: Fernando Lopez-Lezcano @ 2005-11-21 23:38 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Paul E. McKenney, K.R. Foley, Steven Rostedt,
	Thomas Gleixner, pluto, john cooper, Benedikt Spranger,
	Daniel Walker, Tom Rini, George Anzinger

On Mon, 2005-11-21 at 23:19 +0100, Ingo Molnar wrote:
> * Fernando Lopez-Lezcano <nando@ccrma.Stanford.EDU> wrote:
> > I just had a short burst of key repeats and saw one random screen 
> > blank. Right now everything seems normal but I was not hallucinating 
> > :-)
> 
> is this on the dual-core X2 box, running 32-bit code? Did it happen with 
> idle=poll? Without idle=poll the TSCs run apart and a number of 
> artifacts may happen. With idle=poll specified the TSC _should_ be fully 
> synchronized.
> 
> To make sure could you run the attached time-warp-test utility i wrote 
> today? Compile it with:
> 
>   gcc -Wall -O2 -o time-warp-test time-warp-test.c
> 
> it detects and reports time-warps (and does a maximum search for them 
> over time, that way you can see systematic drifts too). (It auto-detects 
> the # of CPUs and runs the appropriate number of tasks.)

Ok, here are some test runs:

Athlon X2, 2.6.14-rt13, __not__ booting idle=poll
cat /sys/devices/system/clocksource/clocksource0/clocksource 
  acpi_pm jiffies *tsc pit

[hacked Jack with gettimeofday fails with "delay exceeded..." messages]

# ./time-warp-test
#CPUs: 2
running 2 tasks to check for time-warps.
warp ..  -2735313 cycles, ... 000014b9f770036f -> 000014b9f746469e ?
WARP ..     -1224 usecs, .... 0004061b6acd7dc6 -> 0004061b6acd78fe ?
WARP ..     -1237 usecs, .... 0004061b6acd7e07 -> 0004061b6acd7932 ?
warp ..  -2735317 cycles, ... 000014b9f7773a97 -> 000014b9f74d7dc2 ?
WARP ..     -1238 usecs, .... 0004061b6acd7e65 -> 0004061b6acd798f ?
warp ..  -2736775 cycles, ... 000014b9f77a9bd0 -> 000014b9f750d949 ?
warp ..  -2736848 cycles, ... 000014b9f77c83aa -> 000014b9f752c0da ?
warp ..  -2736953 cycles, ... 000014b9f77e82a6 -> 000014b9f754bf6d ?
warp ..  -2737060 cycles, ... 000014b9f7831875 -> 000014b9f75954d1 ?
warp ..  -2737090 cycles, ... 000014b9f792d70b -> 000014b9f7691349 ?
warp ..  -2737265 cycles, ... 000014b9f79c9509 -> 000014b9f772d098 ?
warp ..  -2737387 cycles, ... 000014ba0129c8e7 -> 000014ba010003fc ?
warp ..  -2737405 cycles, ... 000014ba0b696ad1 -> 000014ba0b3fa5d4 ?
WARP .. -4398045268 usecs, .... 0004061c70fdbd6e -> 0004061b6ad8e51a ?
WARP .. -4398045269 usecs, .... 0004061c70fdbe56 -> 0004061b6ad8e601 ?
warp ..  -2737407 cycles, ... 000014c0f4960dfd -> 000014c0f46c48fe ?
warp ..  -2737435 cycles, ... 000014c100f929b5 -> 000014c100cf649a ?
warp ..  -2737450 cycles, ... 000014ef1eff0250 -> 000014ef1ed53d26 ?
warp ..  -2737470 cycles, ... 000014ef2a976748 -> 000014ef2a6da20a ?
warp ..  -2737472 cycles, ... 000014ef98ee8f62 -> 000014ef98c4ca22 ?
warp ..  -2737494 cycles, ... 000014efac5b0d44 -> 000014efac3147ee ?
warp ..  -2737506 cycles, ... 000014f42d48833f -> 000014f42d1ebddd ?
WARP .. -4398046507 usecs, .... 0004061c788544c5 -> 0004061b7260679a ?
warp ..  -2737535 cycles, ... 000014ffb2b84ca9 -> 000014ffb28e872a ?
warp ..  -2737678 cycles, ... 0000150b8cae9ad3 -> 0000150b8c84d4c5 ?
warp ..  -2737847 cycles, ... 0000153e388bc05d -> 0000153e3861f9a6 ?
warp ..  -2737851 cycles, ... 0000153e3b472185 -> 0000153e3b1d5aca ?
warp ..  -2737871 cycles, ... 0000153e3b94270d -> 0000153e3b6a603e ?
warp ..  -2737872 cycles, ... 0000153e3c3d4034 -> 0000153e3c137964 ?
warp ..  -2737891 cycles, ... 0000153e51313527 -> 0000153e51076e44 ?
warp ..  -2737935 cycles, ... 0000153e55df386a -> 0000153e55b5715b ?
warp ..  -2737987 cycles, ... 0000153ec3280132 -> 0000153ec2fe39ef ?
warp ..  -2738044 cycles, ... 00001542b6d5c7bd -> 00001542b6ac0041 ?
warp ..  -2738056 cycles, ... 0000154332e5f8dd -> 0000154332bc3155 ?
warp ..  -2738059 cycles, ... 000015433aa0e85b -> 000015433a7720d0 ?
warp ..  -2738087 cycles, ... 0000154363eb9eb5 -> 0000154363c1d70e ?
warp ..  -2738100 cycles, ... 00001547a3407554 -> 00001547a316ada0 ?
warp ..  -2738101 cycles, ... 00001547a342315e -> 00001547a31869a9 ?
warp ..  -2738131 cycles, ... 00001547a36dca74 -> 00001547a34402a1 ?
warp ..  -2738251 cycles, ... 00001547a67672fd -> 00001547a64caab2 ?
warp ..  -2738253 cycles, ... 0000154811d20a22 -> 0000154811a841d5 ?
warp ..  -2738261 cycles, ... 00001548bd4fe888 -> 00001548bd262033 ?
warp ..  -2738270 cycles, ... 00001549e8ba9459 -> 00001549e890cbfb ?
warp ..  -2738284 cycles, ... 0000154bca42c59f -> 0000154bca18fd33 ?
warp ..  -2738287 cycles, ... 0000154c15d10b04 -> 0000154c15a74295 ?
warp ..  -2738393 cycles, ... 00001559054f8a3b -> 000015590525c162 ?
warp ..  -2738445 cycles, ... 00001559055cd294 -> 0000155905330987 ?
warp ..  -2738462 cycles, ... 00001559057d79e3 -> 000015590553b0c5 ?
warp ..  -2738482 cycles, ... 00001559221f9b08 -> 0000155921f5d1d6 ?
warp ..  -2738486 cycles, ... 000015593f6a2298 -> 000015593f405962 ?
warp ..  -2738602 cycles, ... 000015594da97b42 -> 000015594d7fb198 ?
warp ..  -2738607 cycles, ... 0000155a41e90e62 -> 0000155a41bf44b3 ?
warp ..  -2738621 cycles, ... 0000155e0f15910d -> 0000155e0eebc750 ?
warp ..  -2738650 cycles, ... 0000155f746123f6 -> 0000155f74375a1c ?
warp ..  -2738653 cycles, ... 000015610cbc0276 -> 000015610c923899 ?
warp ..  -2738655 cycles, ... 0000156241a4f73a -> 00001562417b2d5b ?

Now with 
cat /sys/devices/system/clocksource/clocksource0/clocksource 
  *acpi_pm jiffies tsc pit

[hacked Jack with gettimeofday works fine]

# ./time-warp-test
#CPUs: 2
running 2 tasks to check for time-warps.
warp ..  -2709892 cycles, ... 000015870e3c5333 -> 000015870e12f9af ?
warp ..  -2709931 cycles, ... 000015870e611d33 -> 000015870e37c388 ?
warp ..  -2714592 cycles, ... 000015871b20ef38 -> 000015871af78358 ?
warp ..  -2714599 cycles, ... 0000158727b08141 -> 000015872787155a ?
warp ..  -2714610 cycles, ... 00001587341f8c9c -> 0000158733f620aa ?
warp ..  -2714611 cycles, ... 0000158740a746a4 -> 00001587407ddab1 ?
warp ..  -2714632 cycles, ... 000015874d202559 -> 000015874cf6b951 ?
warp ..  -2714672 cycles, ... 000015875aa36481 -> 000015875a79f851 ?
warp ..  -2714674 cycles, ... 000015876eabae9b -> 000015876e824269 ?
warp ..  -2714676 cycles, ... 0000158c00b9eec1 -> 0000158c0090828d ?
warp ..  -2714851 cycles, ... 000015a87d87fdf7 -> 000015a87d5e9114 ?
warp ..  -2714868 cycles, ... 000015a91f8611c6 -> 000015a91f5ca4d2 ?
warp ..  -2714900 cycles, ... 000015d4abcac875 -> 000015d4aba15b61 ?
warp ..  -2714932 cycles, ... 000016722ed1bafe -> 000016722ea84dca ?
warp ..  -2714933 cycles, ... 000016722edb5d24 -> 000016722eb1efef ?
warp ..  -2714960 cycles, ... 000016722edf16d0 -> 000016722eb5a980 ?
warp ..  -2715093 cycles, ... 0000167241711403 -> 000016724147a62e ?
warp ..  -2715369 cycles, ... 0000167254f44d20 -> 0000167254cade37 ?
warp ..  -2715372 cycles, ... 000016727c056ff2 -> 000016727bdc0106 ?
warp ..  -2715382 cycles, ... 0000167294580d33 -> 00001672942e9e3d ?
warp ..  -2715386 cycles, ... 00001672acf231c5 -> 00001672acc8c2cb ?
warp ..  -2715394 cycles, ... 00001672c5a30efc -> 00001672c5799ffa ?
warp ..  -2715397 cycles, ... 00001672f3946ebc -> 00001672f36affb7 ?
warp ..  -2715417 cycles, ... 000016733b4806b8 -> 000016733b1e979f ?
warp ..  -2715464 cycles, ... 00001675810adae0 -> 0000167580e16b98 ?
warp ..  -2715471 cycles, ... 0000174825657d7a -> 00001748253c0e2b ?

In both cases messages seem to come in bunches. I get 5 to 15 on startup
of the test no matter what. After that it is more sporadic. 

-- Fernando




* Re: test time-warps [was: Re: 2.6.14-rt13]
  2005-11-21 22:19     ` test time-warps [was: Re: 2.6.14-rt13] Ingo Molnar
  2005-11-21 23:08       ` Fernando Lopez-Lezcano
  2005-11-21 23:38       ` Fernando Lopez-Lezcano
@ 2005-11-21 23:41       ` john stultz
  2005-11-22  1:31         ` Lee Revell
  2005-11-22  1:15       ` Steven Rostedt
  3 siblings, 1 reply; 56+ messages in thread
From: john stultz @ 2005-11-21 23:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Fernando Lopez-Lezcano, linux-kernel, Paul E. McKenney,
	K.R. Foley, Steven Rostedt, Thomas Gleixner, pluto, john cooper,
	Benedikt Spranger, Daniel Walker, Tom Rini, George Anzinger

On Mon, 2005-11-21 at 23:19 +0100, Ingo Molnar wrote:
> * Fernando Lopez-Lezcano <nando@ccrma.Stanford.EDU> wrote:
> 
> > On Tue, 2005-11-15 at 10:08 +0100, Ingo Molnar wrote:
> > > i have released the 2.6.14-rt13 tree, which can be downloaded from the 
> > > usual place:
> > > 
> > >    http://redhat.com/~mingo/realtime-preempt/
> > > 
> > > lots of fixes in this release affecting all supported architectures, all 
> > > across the board. Big MIPS update from John Cooper.
> > 
> > Can someone tell me if 2.6.14-rt13 is supposed to be fixed re: the 
> > problems I was having with random screensaver triggering and keyboard 
> > repeats?
> > 
> > It is apparently not fixed.
> > 
> > I just had a short burst of key repeats and saw one random screen 
> > > blank. Right now everything seems normal but I was not hallucinating 
> > :-)
> 
> is this on the dual-core X2 box, running 32-bit code? Did it happen with 
> idle=poll? Without idle=poll the TSCs run apart and a number of 
> artifacts may happen. With idle=poll specified the TSC _should_ be fully 
> synchronized.
> 
> To make sure could you run the attached time-warp-test utility i wrote 
> today? Compile it with:
> 
>   gcc -Wall -O2 -o time-warp-test time-warp-test.c
> 
> it detects and reports time-warps (and does a maximum search for them 
> over time, that way you can see systematic drifts too). (It auto-detects 
> the # of CPUs and runs the appropriate number of tasks.)
> 
> running this tool on a X2 with idle=poll and an -rt kernel should give a 
> silent test-output.
> 
> running a vanilla kernel should give TSC level time warps:
> 
>  #CPUs: 2
>  running 2 tasks to check for time-warps.
>  warp ..        -1 cycles, ... 00000277ed9520c6 -> 00000277ed9520c5 ?
>  warp ..       -18 cycles, ... 00000277ed97ac77 -> 00000277ed97ac65 ?
>  warp ..       -19 cycles, ... 00000277edaedd54 -> 00000277edaedd41 ?
>  warp ..       -84 cycles, ... 00000277ede0558a -> 00000277ede05536 ?
>  warp ..       -97 cycles, ... 00000278035328a5 -> 0000027803532844 ?
>  warp ..      -224 cycles, ... 000002781ed2db04 -> 000002781ed2da24 ?
> 
> (because the vanilla kernel doesnt do TSC synchronization accurately)
> 
> running it without idle=poll should give some really big time warps:
> 
>  neptune:~> ./time-warp-test
>  #CPUs: 2
>  running 2 tasks to check for time-warps.
>  warp ..   -435934 cycles, ... 00000101a2db4a8f -> 00000101a2d4a3b1 ?
>  WARP ..      -123 usecs, .... 0003e96c2f3bb579 -> 0003e96c2f3bb4fe ?
>  WARP ..      -198 usecs, .... 0003e96c2f3bb625 -> 0003e96c2f3bb55f ?
>  WARP ..      -199 usecs, .... 0003e96c2f3bb659 -> 0003e96c2f3bb592 ?
>  warp ..   -436117 cycles, ... 00000101a2e5aaf0 -> 00000101a2df035b ?
>  warp ..   -437143 cycles, ... 00000101a2e84590 -> 00000101a2e199f9 ?
>  warp ..   -437314 cycles, ... 00000101a2ead1b1 -> 00000101a2e4256f ?
>  warp ..   -437363 cycles, ... 00000101a2ed9b19 -> 00000101a2e6eea6 ?
>  WARP ..  -1951680 usecs, .... 0003e96c2f597f70 -> 0003e96c2f3bb7b0 ?
>  WARP ..  -1951879 usecs, .... 0003e96c2f598016 -> 0003e96c2f3bb78f ?
>  WARP ..  -1951681 usecs, .... 0003e96c2f598014 -> 0003e96c2f3bb853 ?
>  warp ..   -437365 cycles, ... 00000101a4c5be7b -> 00000101a4bf1206 ?
>  warp ..   -437366 cycles, ... 00000101a8f4af76 -> 00000101a8ee0300 ?
>  warp ..   -437367 cycles, ... 00000101a968a34a -> 00000101a961f6d3 ?
> 
> these time warps will get worse over time - as the two cores drift 
> apart. (note that they wont drift during the test itself, because the 
> test makes all cores artificially busy and the X2 TSC drifting depends 
> on the core being idle)

I believe this is the same dual-core TSC drift that has been seen w/
x86-64. I have just added some similar logic to the TSC clocksource that
mimics what x86-64 does so an alternative clocksource will be selected
automatically.

I should be sending out another release later tonight with these
updates.

thanks
-john




* Re: test time-warps [was: Re: 2.6.14-rt13]
  2005-11-21 22:19     ` test time-warps [was: Re: 2.6.14-rt13] Ingo Molnar
                         ` (2 preceding siblings ...)
  2005-11-21 23:41       ` john stultz
@ 2005-11-22  1:15       ` Steven Rostedt
  2005-11-22 11:16         ` Ingo Molnar
  3 siblings, 1 reply; 56+ messages in thread
From: Steven Rostedt @ 2005-11-22  1:15 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Fernando Lopez-Lezcano, linux-kernel, Paul E. McKenney,
	K.R. Foley, Thomas Gleixner, pluto, john cooper,
	Benedikt Spranger, Daniel Walker, Tom Rini, George Anzinger



On Mon, 21 Nov 2005, Ingo Molnar wrote:

>
> but in any case, -rt13 should be silent and there should be no time
> warps. If there are any then those could cause the keyboard repeat
> problems.
>

Hi Ingo,

I'm running -rt13 with the following command line:

root=/dev/md0 ro console=ttyS0,115200 console=tty0 nmi_watchdog=2 lapic
earlyprintk=ttyS0,115200 idle=poll

I just got the following output:

$ ./time-warp-test
#CPUs: 2
running 2 tasks to check for time-warps.
warp ..        -5 cycles, ... 0000004fc2ab2b7f -> 0000004fc2ab2b7a ?
warp ..       -12 cycles, ... 000000506d1d558c -> 000000506d1d5580 ?
warp ..       -97 cycles, ... 000000536c8868d3 -> 000000536c886872 ?
warp ..       -99 cycles, ... 00000059ae9d49a1 -> 00000059ae9d493e ?
warp ..      -110 cycles, ... 00000059ed0f05d6 -> 00000059ed0f0568 ?
warp ..      -118 cycles, ... 0000007392963142 -> 00000073929630cc ?
warp ..      -122 cycles, ... 0000007d6a94bc76 -> 0000007d6a94bbfc ?
warp ..      -346 cycles, ... 0000008acf28a18e -> 0000008acf28a034 ?
warp ..      -390 cycles, ... 0000008b2fc61fef -> 0000008b2fc61e69 ?

-- Steve




* Re: test time-warps [was: Re: 2.6.14-rt13]
  2005-11-21 23:41       ` john stultz
@ 2005-11-22  1:31         ` Lee Revell
  0 siblings, 0 replies; 56+ messages in thread
From: Lee Revell @ 2005-11-22  1:31 UTC (permalink / raw)
  To: john stultz
  Cc: Ingo Molnar, Fernando Lopez-Lezcano, linux-kernel,
	Paul E. McKenney, K.R. Foley, Steven Rostedt, Thomas Gleixner,
	pluto, john cooper, Benedikt Spranger, Daniel Walker, Tom Rini,
	George Anzinger

On Mon, 2005-11-21 at 15:41 -0800, john stultz wrote:
> I believe this is the same dual-core TSC drift that has been seen w/
> x86-64. I have just added some similar logic to the TSC clocksource
> that mimics what x86-64 does so an alternative clocksource will be
> selected automatically.
> 
> I should be sending out another release later tonight with these
> updates.
> 

It is really unfortunate that the TSC cannot be used for timekeeping on
these machines.  I wrote a simple benchmark that shows rdtsc on
Fernando's box to be insanely fast - 10000 iterations in 68
microseconds.  This was an order of magnitude faster than any other
machine we tested.  Why would they bother making it so fast if it's
useless for timekeeping?

Lee



* Re: test time-warps [was: Re: 2.6.14-rt13]
  2005-11-22  1:15       ` Steven Rostedt
@ 2005-11-22 11:16         ` Ingo Molnar
  2005-11-22 17:49           ` Fernando Lopez-Lezcano
  0 siblings, 1 reply; 56+ messages in thread
From: Ingo Molnar @ 2005-11-22 11:16 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Fernando Lopez-Lezcano, linux-kernel, Paul E. McKenney,
	K.R. Foley, Thomas Gleixner, pluto, john cooper,
	Benedikt Spranger, Daniel Walker, Tom Rini, George Anzinger


* Steven Rostedt <rostedt@goodmis.org> wrote:

> Hi Ingo,
> 
> I'm running -rt13 with the following command line:
> 
> root=/dev/md0 ro console=ttyS0,115200 console=tty0 nmi_watchdog=2 lapic
> earlyprintk=ttyS0,115200 idle=poll
> 
> I just got the following output:
> 
> $ ./time-warp-test
> #CPUs: 2
> running 2 tasks to check for time-warps.
> warp ..        -5 cycles, ... 0000004fc2ab2b7f -> 0000004fc2ab2b7a ?
> warp ..       -12 cycles, ... 000000506d1d558c -> 000000506d1d5580 ?
> warp ..       -97 cycles, ... 000000536c8868d3 -> 000000536c886872 ?
> warp ..       -99 cycles, ... 00000059ae9d49a1 -> 00000059ae9d493e ?
> warp ..      -110 cycles, ... 00000059ed0f05d6 -> 00000059ed0f0568 ?
> warp ..      -118 cycles, ... 0000007392963142 -> 00000073929630cc ?
> warp ..      -122 cycles, ... 0000007d6a94bc76 -> 0000007d6a94bbfc ?
> warp ..      -346 cycles, ... 0000008acf28a18e -> 0000008acf28a034 ?
> warp ..      -390 cycles, ... 0000008b2fc61fef -> 0000008b2fc61e69 ?

i've attached an updated utility below. But i too can see similar output 
on an X2. A TSC-warp of 390 cycles _might_ be OK, but there are no 
guarantees. It wont show up as a usecs-level (i.e. gettimeofday()) warp, 
because 390 cycles is still much lower than the ~2000 cycles one 
microsecond takes, but it could cause problems for other TSC users.
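
To make that arithmetic concrete, here is a minimal sketch (it is not part
of the utility below) that converts a TSC delta into microseconds. The
2000 cycles/usec figure is an assumption matching the rough ~2 GHz estimate
above, and both function names are made up for illustration:

```c
/*
 * Convert a TSC delta (in cycles) to microseconds.
 * cycles_per_usec is an assumption supplied by the caller,
 * e.g. roughly 2000 for a ~2 GHz core.
 */
double cycles_to_usecs(long long cycles, double cycles_per_usec)
{
	return (double)cycles / cycles_per_usec;
}

/*
 * Return 1 if the magnitude of the warp is at least one microsecond
 * at the assumed clock rate, 0 otherwise.
 */
int warp_exceeds_one_usec(long long warp_cycles, double cycles_per_usec)
{
	double usecs = cycles_to_usecs(warp_cycles, cycles_per_usec);

	if (usecs < 0)
		usecs = -usecs;
	return usecs >= 1.0;
}
```

Under that assumption, cycles_to_usecs(-390, 2000.0) is about -0.195, well
below gettimeofday()'s resolution, while a warp in the millions of cycles
(as in the runs without idle=poll) is on the order of a millisecond.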
 
Basically if there is an observable and provable warp in the TSC output 
then it must not be used for any purpose that is not strictly 
per-CPU-ified (such as userspace threads bound to a single CPU, and the 
TSC never used between threads).

	Ingo

---------{ time-warp-test.c }--------->
/*
 * Copyright (C) 2005, Ingo Molnar
 *
 * time-warp-test.c: check TSC synchronization on x86 CPUs. Also detects
 *                   gettimeofday()-level time warps.
 */
#include <stdio.h>
#include <stdarg.h>
#include <stdlib.h>
#include <signal.h>
#include <sys/wait.h>
#include <linux/unistd.h>
#include <unistd.h>
#include <string.h>
#include <pwd.h>
#include <grp.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/time.h>
#include <regex.h>
#include <fcntl.h>
#include <time.h>
#include <sys/mman.h>
#include <dlfcn.h>
#include <popt.h>
#include <sys/socket.h>
#include <ctype.h>
#include <assert.h>
#include <sched.h>

#define TEST_TSC 1
#define TEST_TOD 1

#if !TEST_TSC && !TEST_TOD
# error this makes no sense ...
#endif

#if DEBUG
# define Printf(x...) printf(x)
#else
# define Printf(x...) do { } while (0)
#endif

/*
 * Shared locks and variables between the test tasks:
 */
enum {
	SHARED_TSC = 0,
	SHARED_LOCK = 2,
	SHARED_TOD = 3,
	SHARED_WORST_TSC = 5,
	SHARED_WORST_TOD = 7,
	SHARED_NR_TSC_WARPS = 9,
	SHARED_NR_TOD_WARPS = 10,
};

#define SHARED(x)	(*(shared + SHARED_##x))
#define SHARED_LL(x)	(*(long long *)(shared + SHARED_##x))

#define BUG_ON(c) assert(!(c))

typedef unsigned long long cycles_t;
typedef unsigned long long usecs_t;

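/*
 * Note: the "=A" constraint returns the edx:eax pair only when built as
 * 32-bit x86 code; a 64-bit build needs separate "=a"/"=d" outputs.
 */
#define rdtscll(val)					\
do {							\
	__asm__ __volatile__("rdtsc" : "=A" (val));	\
} while (0)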

#define rdtod(val)					\
do {							\
	struct timeval tv;				\
							\
	gettimeofday(&tv, NULL);			\
	(val) = tv.tv_sec * 1000000LL + tv.tv_usec;	\
} while (0)

static unsigned long *setup_shared_var(void)
{
	char zerobuff [4096] = { 0, };
	int ret, fd;
	unsigned long *buf;

	fd = creat(".tmp_mmap", 0700);
	BUG_ON(fd == -1);
	close(fd);

	/* O_CREAT requires an explicit mode argument */
	fd = open(".tmp_mmap", O_RDWR|O_CREAT|O_TRUNC, 0700);
	BUG_ON(fd == -1);
	ret = write(fd, zerobuff, 4096);
	BUG_ON(ret != 4096);

	buf = (void *)mmap(0, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
	BUG_ON(buf == (void *)-1);

	close(fd);

	return buf;
}

#define LOOPS 1000000

static inline unsigned long
cmpxchg(volatile unsigned long *ptr, unsigned long old, unsigned long new)
{
	unsigned long prev;

	__asm__ __volatile__("lock; cmpxchg %b1,%2"
			     : "=a"(prev)
			     : "q"(new), "m"(*(ptr)), "0"(old)
			     : "memory");
	return prev;
}

static inline void lock(unsigned long *flag)
{
	while (cmpxchg(flag, 0, 1) != 0)
		/* nothing */;
}

static inline void unlock(unsigned long *flag)
{
	*flag = 0;
}

static void print_status(unsigned long *shared)
{
	const char progress[] = "\\|/-";
	static usecs_t prev_tod;
	static int count1, count2;

	usecs_t tod;

	count1++;
	if (count1 < 1000)
		return;
	count1 = 0;

	rdtod(tod);
	if (tod - prev_tod < 100000ULL)
		return;
	prev_tod = tod;
	count2++;

	if (TEST_TSC)
		printf("| # of TSC-warps:%ld", SHARED(NR_TSC_WARPS));

	if (TEST_TOD)
		printf(" | # of TOD-warps:%ld", SHARED(NR_TOD_WARPS));

	printf(" %c\r", progress[count2 & 3]);
	fflush(stdout);
}

static inline void test_TSC(unsigned long *shared)
{
#if TEST_TSC
	cycles_t t0, t1;
	long long delta;

	lock(&SHARED(LOCK));
	rdtscll(t1);
	t0 = SHARED_LL(TSC);
	SHARED_LL(TSC) = t1;

	delta = t1-t0;
	if (delta < 0) {
		SHARED(NR_TSC_WARPS)++;
		if (delta < SHARED_LL(WORST_TSC)) {
			SHARED_LL(WORST_TSC) = delta;
			fprintf(stderr, "\rnew TSC-warp maximum: %9Ld cycles, %016Lx -> %016Lx\n",
				delta, t0, t1);
		}
	}
	unlock(&SHARED(LOCK));
#endif
}

static inline void test_TOD(unsigned long *shared)
{
#if TEST_TOD
	usecs_t T0, T1;
	long long delta;

	lock(&SHARED(LOCK));
	rdtod(T1);
	T0 = SHARED_LL(TOD);
	SHARED_LL(TOD) = T1;

	delta = T1-T0;
	if (delta < 0) {
		SHARED(NR_TOD_WARPS)++;
		if (delta < SHARED_LL(WORST_TOD)) {
			SHARED_LL(WORST_TOD) = delta;
			fprintf(stderr, "\rnew TOD-warp maximum: %9Ld usecs,  %016Lx -> %016Lx\n",
				delta, T0, T1);
		}
	}
	unlock(&SHARED(LOCK));
#endif
}

int main(int argc, char **argv)
{
	int i, parent, me;
	unsigned long *shared;
	unsigned long cpus, tasks;

	cpus = system("exit `grep processor /proc/cpuinfo  | wc -l`");
	cpus = WEXITSTATUS(cpus);

	if (argc > 2) {
usage:
		fprintf(stderr,
			"usage: time-warp-test <threads>\n");
		exit(-1);
	}
	if (argc == 2) {
		tasks = atol(argv[1]);
		if (!tasks)
			goto usage;
	} else
		tasks = cpus;

	printf("%ld CPUs, running %ld parallel test-tasks.\n", cpus, tasks);
	printf("checking for time-warps via:\n"
#if TEST_TSC
	"- read time stamp counter (RDTSC) instruction (cycle resolution)\n"
#endif
#if TEST_TOD
	"- gettimeofday (TOD) syscall (usec resolution)\n"
#endif
		"\n"
		);
	shared = setup_shared_var();

	parent = getpid();

	for (i = 1; i < tasks; i++)
		if (!fork())
			break;
	me = getpid();

	while (1) {
		test_TSC(shared);
		test_TOD(shared);

		if (me == parent)
			print_status(shared);
	}

	return 0;
}


* Re: 2.6.14-rt13
  2005-11-21 21:32 ` 2.6.14-rt13 Fernando Lopez-Lezcano
  2005-11-21 21:41   ` 2.6.14-rt13 john stultz
       [not found]   ` <20051121221511.GA7255@elte.hu>
@ 2005-11-22 11:19   ` Ingo Molnar
  2 siblings, 0 replies; 56+ messages in thread
From: Ingo Molnar @ 2005-11-22 11:19 UTC (permalink / raw)
  To: Fernando Lopez-Lezcano
  Cc: linux-kernel, Paul E. McKenney, K.R. Foley, Steven Rostedt,
	Thomas Gleixner, pluto, john cooper, Benedikt Spranger,
	Daniel Walker, Tom Rini, George Anzinger


* Fernando Lopez-Lezcano <nando@ccrma.Stanford.EDU> wrote:

> I just had a short burst of key repeats and saw one random screen 
> blank. Right now everything seems normal but I was not hallucinating 
> :-)

btw., today i have experienced a 'key repeat' event with the stock FC4 
SMP kernel too, on an X2 athlon. That kernel didnt have idle=poll 
specified, so gettimeofday() could time-warp in substantial ways.

so i'd say the 'key repeat' problem is almost certainly caused by TSC 
"time warps" on X2's.

	Ingo


* Re: test time-warps [was: Re: 2.6.14-rt13]
  2005-11-22 11:16         ` Ingo Molnar
@ 2005-11-22 17:49           ` Fernando Lopez-Lezcano
  2005-11-22 18:01             ` Christopher Friesen
  0 siblings, 1 reply; 56+ messages in thread
From: Fernando Lopez-Lezcano @ 2005-11-22 17:49 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Steven Rostedt, linux-kernel, Paul E. McKenney, K.R. Foley,
	Thomas Gleixner, pluto, john cooper, Benedikt Spranger,
	Daniel Walker, Tom Rini, George Anzinger

On Tue, 2005-11-22 at 12:16 +0100, Ingo Molnar wrote:
> * Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > Hi Ingo,
> > 
> > I'm running -rt13 with the following command line:
> > 
> > root=/dev/md0 ro console=ttyS0,115200 console=tty0 nmi_watchdog=2 lapic
> > earlyprintk=ttyS0,115200 idle=poll
> > 
> > I just got the following output:
> > 
> > $ ./time-warp-test
> > #CPUs: 2
> > running 2 tasks to check for time-warps.
> > warp ..        -5 cycles, ... 0000004fc2ab2b7f -> 0000004fc2ab2b7a ?
> > warp ..       -12 cycles, ... 000000506d1d558c -> 000000506d1d5580 ?
> > warp ..       -97 cycles, ... 000000536c8868d3 -> 000000536c886872 ?
> > warp ..       -99 cycles, ... 00000059ae9d49a1 -> 00000059ae9d493e ?
> > warp ..      -110 cycles, ... 00000059ed0f05d6 -> 00000059ed0f0568 ?
> > warp ..      -118 cycles, ... 0000007392963142 -> 00000073929630cc ?
> > warp ..      -122 cycles, ... 0000007d6a94bc76 -> 0000007d6a94bbfc ?
> > warp ..      -346 cycles, ... 0000008acf28a18e -> 0000008acf28a034 ?
> > warp ..      -390 cycles, ... 0000008b2fc61fef -> 0000008b2fc61e69 ?
> 
> i've attached an updated utility below. 

I'm adding a run with:
echo "tsc"> /sys/devices/system/clocksource/clocksource0/clocksource
_not_ booted with idle=poll
at the end of this email. 

> But i too can see similar output 
> on an X2. A TSC-warp of 390 cycles _might_ be OK, but there are no 
> guarantees. 

In my experience the amount seems to be related to how long the system
has been up. Which is to be expected if the two TSCs drift, right?

> It wont show up as a usecs-level (i.e. gettimeofday()) warp, 
> because 390 cycles is still much lower than the ~2000 cycles one 
> microsecond takes, but it could cause problems for other TSC users.
>  
> Basically if there is an observable and provable warp in the TSC output 
> then it must not be used for any purpose that is not strictly 
> per-CPU-ified (such as userspace threads bound to a single CPU, and the 
> TSC never used between threads).

Apparently that's the case. 

John Stultz just released a new version of his patch that takes care of
not using the TSC as a time source on X2's. Hopefully that will make its
way to the -rt patches soon :-) This would take care of the key repeat /
screensaver problems (I just saw a post yesterday on linux-audio-user
about someone else on an X2 processor having the same problems), Jack
will need a patch to use gettimeofday in those cases. 

Is /sys/devices/system/clocksource/clocksource0/clocksource part of the
standard kernel tree? I was thinking of using that for the Jack patch to
decide whether to use tsc or not (ie: if it is good enough for the
kernel it should be good enough for Jack). 
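
A minimal sketch of such a check, assuming the file holds just the name of
the active clocksource followed by a newline (which is what the echo above
suggests). The function name is hypothetical, and the path is passed in
rather than hard-coded because the sysfs layout comes from the clocksource
patches and may not be final:

```c
#include <stdio.h>
#include <string.h>

/*
 * Report whether the clocksource named in the given sysfs-style file
 * is "tsc".  Returns 1 if it is, 0 if the file names something else,
 * and -1 if the file cannot be read.
 */
int clocksource_is_tsc(const char *path)
{
	char name[64];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(name, sizeof(name), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	/* strip the trailing newline before comparing */
	name[strcspn(name, "\n")] = '\0';
	return strcmp(name, "tsc") == 0;
}
```

A caller could treat both 0 and -1 as "do not use the TSC" and fall back to
gettimeofday(), so a missing file errs on the safe side.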

To all involved, a big _THANKS_ for helping track this very annoying
problem!

-- Fernando


# time ./time-warp
2 CPUs, running 2 parallel test-tasks.
checking for time-warps via:
- read time stamp counter (RDTSC) instruction (cycle resolution)
- gettimeofday (TOD) syscall (usec resolution)

new TOD-warp maximum: -4398046507 usecs,  0004062bea76af5b -> 0004062ae451d230
new TSC-warp maximum:  -3122849 cycles, 00009a5f725821a3 -> 00009a5f72287b02
new TSC-warp maximum:  -3123428 cycles, 00009a5f725b26a8 -> 00009a5f722b7dc4
new TSC-warp maximum:  -3123690 cycles, 00009a60ccc01765 -> 00009a60cc906d7b
new TSC-warp maximum:  -3123793 cycles, 00009a61a5897c78 -> 00009a61a559d227
new TSC-warp maximum:  -3123965 cycles, 00009a68b7481924 -> 00009a68b7186e27
new TSC-warp maximum:  -3123966 cycles, 00009a68b754b37b -> 00009a68b725087d
new TSC-warp maximum:  -3124141 cycles, 00009a68c003e8ee -> 00009a68bfd43d41
new TSC-warp maximum:  -3124253 cycles, 00009a68c8b511d9 -> 00009a68c88565bc
new TSC-warp maximum:  -3124268 cycles, 00009a68d2bcaaad -> 00009a68d28cfe81
new TSC-warp maximum:  -3124269 cycles, 00009a68eedd440e -> 00009a68eead97e1
new TSC-warp maximum:  -3124280 cycles, 00009a68eefefe95 -> 00009a68eecf525d
new TSC-warp maximum:  -3124342 cycles, 00009a6907369ac7 -> 00009a690706ee51
new TSC-warp maximum:  -3124592 cycles, 00009a69147b7019 -> 00009a69144bc2a9
new TSC-warp maximum:  -3124609 cycles, 00009a69aa0dd745 -> 00009a69a9de29c4
new TSC-warp maximum:  -3124637 cycles, 00009a69df64a2ff -> 00009a69df34f562
new TSC-warp maximum:  -3124652 cycles, 00009a6a649d4a10 -> 00009a6a646d9c64
new TSC-warp maximum:  -3124663 cycles, 00009a6ad73c29e2 -> 00009a6ad70c7c2b
new TSC-warp maximum:  -3124699 cycles, 00009af351a28fbb -> 00009af35172e1e0
| # of TSC-warps:185478076 | # of TOD-warps:185477650 \

real    6m58.633s
user    5m17.436s
sys     1m27.135s




* Re: test time-warps [was: Re: 2.6.14-rt13]
  2005-11-22 17:49           ` Fernando Lopez-Lezcano
@ 2005-11-22 18:01             ` Christopher Friesen
  2005-11-22 18:22               ` Steven Rostedt
  0 siblings, 1 reply; 56+ messages in thread
From: Christopher Friesen @ 2005-11-22 18:01 UTC (permalink / raw)
  To: Fernando Lopez-Lezcano
  Cc: Ingo Molnar, Steven Rostedt, linux-kernel, Paul E. McKenney,
	K.R. Foley, Thomas Gleixner, pluto, john cooper,
	Benedikt Spranger, Daniel Walker, Tom Rini, George Anzinger

Fernando Lopez-Lezcano wrote:

>>Basically if there is an observable and provable warp in the TSC output 
>>then it must not be used for any purpose that is not strictly 
>>per-CPU-ified (such as userspace threads bound to a single CPU, and the 
>>TSC never used between threads).

> Apparently that's the case.

What about periodically re-syncing the TSCs on the cpus?  Are they 
writeable?

Chris



* Re: test time-warps [was: Re: 2.6.14-rt13]
  2005-11-22 18:01             ` Christopher Friesen
@ 2005-11-22 18:22               ` Steven Rostedt
  2005-11-22 20:52                 ` Ingo Molnar
  0 siblings, 1 reply; 56+ messages in thread
From: Steven Rostedt @ 2005-11-22 18:22 UTC (permalink / raw)
  To: Christopher Friesen
  Cc: Fernando Lopez-Lezcano, Ingo Molnar, linux-kernel,
	Paul E. McKenney, K.R. Foley, Thomas Gleixner, pluto, john cooper,
	Benedikt Spranger, Daniel Walker, Tom Rini, George Anzinger


On Tue, 22 Nov 2005, Christopher Friesen wrote:

> Fernando Lopez-Lezcano wrote:
>
> >>Basically if there is an observable and provable warp in the TSC output
> >>then it must not be used for any purpose that is not strictly
> >>per-CPU-ified (such as userspace threads bound to a single CPU, and the
> >>TSC never used between threads).
>
> > Apparently that's the case.
>
> What about periodically re-syncing the TSCs on the cpus?  Are they
> writeable?
>

I believe you can reset them to zero, but I don't think you can set them
to anything else.  I had to do something similar a few years ago, and I
don't have the specs in front of me, so this is coming straight from
memory.

Even if you could reset them, it would be very difficult to make all CPUs
have the same counter. Not to mention that this would also screw up all
timings elsewhere when the sync happens. Remember, this would have to work
not just on 2 cpus, but 4, 8 and beyond.

-- Steve



* Re: test time-warps [was: Re: 2.6.14-rt13]
  2005-11-22 18:22               ` Steven Rostedt
@ 2005-11-22 20:52                 ` Ingo Molnar
  0 siblings, 0 replies; 56+ messages in thread
From: Ingo Molnar @ 2005-11-22 20:52 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Christopher Friesen, Fernando Lopez-Lezcano, linux-kernel,
	Paul E. McKenney, K.R. Foley, Thomas Gleixner, pluto, john cooper,
	Benedikt Spranger, Daniel Walker, Tom Rini, George Anzinger,
	john stultz


* Steven Rostedt <rostedt@goodmis.org> wrote:

> > > Apparently that's the case.
> >
> > What about periodically re-syncing the TSCs on the cpus?  Are they
> > writeable?
> 
> I believe you can reset them to zero, but I don't think you can set 
> them to anything else.  I had to do something similar a few years ago, 
> and I don't have the specs in front of me, so this is coming straight 
> from memory.

on a reasonably new CPU you ought to be able to set the 64-bit value - 
but that doesnt change the fundamental fact: we have no idea how much 
time has passed while we were in HLT. Especially with things like 
dyntick/noidle we could spend _a lot_ of time in HLT, and the TSC could 
drift significantly. How do we know how much that is?

	Ingo


* Re: 2.6.14-rt13
  2005-11-19  2:39           ` 2.6.14-rt13 Steven Rostedt
@ 2005-11-24 15:07             ` Ingo Molnar
  2005-11-24 15:21               ` 2.6.14-rt13 Steven Rostedt
  2005-11-25 20:56               ` [RFC][PATCH] Runtime switching to idle_poll (was: Re: 2.6.14-rt13) Steven Rostedt
  0 siblings, 2 replies; 56+ messages in thread
From: Ingo Molnar @ 2005-11-24 15:07 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Fernando Lopez-Lezcano, Lee Revell, linux-kernel,
	Paul E. McKenney, K.R. Foley, Thomas Gleixner, pluto, john cooper,
	Benedikt Spranger, Daniel Walker, Tom Rini, George Anzinger


* Steven Rostedt <rostedt@goodmis.org> wrote:

> OK, I used this as an exercise to learn how kobject and sysfs work 
> (I've been putting this off for too long). So if this isn't exactly 
> proper, let me know :-)
> 
> Ingo, This could be a temporary patch until we come up with a better 
> solution.  This adds /sys/kernel/idle/idle_poll, which if idle=poll is 
> _not_ set, it still lets you switch the machine to idle=poll on the 
> fly, as well as turn it off. If you have idle=poll, this doesn't even 
> show up.

ok, i've applied this one too. Could you also submit it upstream (and 
implement it for x86)? It makes sense to enable/disable the 
polling-based idle routine runtime.

	Ingo


* Re: 2.6.14-rt13
  2005-11-24 15:07             ` 2.6.14-rt13 Ingo Molnar
@ 2005-11-24 15:21               ` Steven Rostedt
  2005-11-25 20:56               ` [RFC][PATCH] Runtime switching to idle_poll (was: Re: 2.6.14-rt13) Steven Rostedt
  1 sibling, 0 replies; 56+ messages in thread
From: Steven Rostedt @ 2005-11-24 15:21 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Fernando Lopez-Lezcano, Lee Revell, linux-kernel,
	Paul E. McKenney, K.R. Foley, Thomas Gleixner, pluto, john cooper,
	Benedikt Spranger, Daniel Walker, Tom Rini, George Anzinger

On Thu, 2005-11-24 at 16:07 +0100, Ingo Molnar wrote:
> * Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > OK, I used this as an exercise to learn how kobject and sysfs work 
> > (I've been putting this off for too long). So if this isn't exactly 
> > proper, let me know :-)
> > 
> > Ingo, This could be a temporary patch until we come up with a better 
> > solution.  This adds /sys/kernel/idle/idle_poll, which if idle=poll is 
> > _not_ set, it still lets you switch the machine to idle=poll on the 
> > fly, as well as turn it off. If you have idle=poll, this doesn't even 
> > show up.
> 
> ok, i've applied this one too. Could you also submit it upstream (and 
> implement it for x86)? It makes sense to enable/disable the 
> polling-based idle routine runtime.

OK, it'll have to wait till tomorrow.  As you probably know, it is
Thanksgiving here in the US. And my wife would kill me if I work
today ;-)

-- Steve




* [RFC][PATCH] Runtime switching to idle_poll (was: Re: 2.6.14-rt13)
  2005-11-24 15:07             ` 2.6.14-rt13 Ingo Molnar
  2005-11-24 15:21               ` 2.6.14-rt13 Steven Rostedt
@ 2005-11-25 20:56               ` Steven Rostedt
  2005-11-26 13:05                 ` Ingo Molnar
  1 sibling, 1 reply; 56+ messages in thread
From: Steven Rostedt @ 2005-11-25 20:56 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: acpi-devel, len.brown, Andrew Morton, Fernando Lopez-Lezcano,
	Lee Revell, linux-kernel, Paul E. McKenney, K.R. Foley,
	Thomas Gleixner, pluto, john cooper, Benedikt Spranger,
	Daniel Walker, Tom Rini, George Anzinger

On Thu, 2005-11-24 at 16:07 +0100, Ingo Molnar wrote:
> * Steven Rostedt <rostedt@goodmis.org> wrote:
> > Ingo, This could be a temporary patch until we come up with a better 
> > solution.  This adds /sys/kernel/idle/idle_poll, which if idle=poll is 
> > _not_ set, it still lets you switch the machine to idle=poll on the 
> > fly, as well as turn it off. If you have idle=poll, this doesn't even 
> > show up.
> 
> ok, i've applied this one too. Could you also submit it upstream (and 
> implement it for x86)? It makes sense to enable/disable the 
> polling-based idle routine runtime.

At Ingo's request, I fixed up this patch a little to allow both
x86_64 and i386 to switch to and from idle_poll at runtime.  I noticed
that the ACPI driver in drivers/acpi/processor_idle.c may cause a
race condition with this patch, so I added some protection there.
Basically, if the acpi code changes pm_idle, then you can't change to
idle_poll, and vice-versa.

This patch creates a sysfs entry, /sys/kernel/idle/idle_poll.  Reading
it shows whether or not idle_poll is being used as the runtime idle
routine.  Writing to it sets the runtime idle routine.

with:

# echo 1 > /sys/kernel/idle/idle_poll
  or
# echo on > /sys/kernel/idle/idle_poll

The system will switch to the idle_poll idle routine.

with:

# echo 0 > /sys/kernel/idle/idle_poll
  or
# echo off > /sys/kernel/idle/idle_poll

The system will switch out of idle poll.  Note that if the command line
states "idle=poll" then this entry will not be created.
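
For programs that want to flip the setting without a shell, the same writes
can be done from C.  This is only a userspace sketch of using the entry
described above, not code from the patch; the function name is made up, and
the path is passed in so the caller decides which file to touch:

```c
#include <stdio.h>

/*
 * Write "1" or "0" to an idle_poll-style control file.
 * Returns 0 on success, -1 if the file cannot be opened or written
 * (for example when the entry does not exist because the kernel was
 * booted with idle=poll, or the caller lacks permission).
 */
int set_idle_poll(const char *path, int enable)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs(enable ? "1\n" : "0\n", f) >= 0;
	/* fclose() flushes; a failed flush means the write did not land */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Calling set_idle_poll("/sys/kernel/idle/idle_poll", 1) would then be the
equivalent of the first echo above, and passing 0 the equivalent of the
second.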

This is still a work-in-progress.  Since I only own x86_64 and i386
machines, those are the only architectures I ported the code to and
tested.  Looking for who else exports pm_idle, I see that the following
archs may also need to be updated:

arm, arm26, ia64, sparc.

I also have not yet protected the pm_idle in arch/i386/kernel/apm.c.

I figure that I should get some comments before I spend any more time on
this.

Thanks,

-- Steve

Index: linux-2.6.15-rc2-git5/arch/i386/kernel/Makefile
===================================================================
--- linux-2.6.15-rc2-git5.orig/arch/i386/kernel/Makefile	2005-10-27 20:02:08.000000000 -0400
+++ linux-2.6.15-rc2-git5/arch/i386/kernel/Makefile	2005-11-25 11:56:25.000000000 -0500
@@ -34,6 +34,7 @@
 obj-$(CONFIG_HPET_TIMER) 	+= time_hpet.o
 obj-$(CONFIG_EFI) 		+= efi.o efi_stub.o
 obj-$(CONFIG_EARLY_PRINTK)	+= early_printk.o
+obj-$(CONFIG_SYSFS)		+= switch2poll.o
 
 EXTRA_AFLAGS   := -traditional
 
Index: linux-2.6.15-rc2-git5/arch/i386/kernel/process.c
===================================================================
--- linux-2.6.15-rc2-git5.orig/arch/i386/kernel/process.c	2005-11-25 10:58:53.000000000 -0500
+++ linux-2.6.15-rc2-git5/arch/i386/kernel/process.c	2005-11-25 12:18:12.000000000 -0500
@@ -39,6 +39,7 @@
 #include <linux/ptrace.h>
 #include <linux/random.h>
 #include <linux/kprobes.h>
+#include <linux/spinlock.h>
 
 #include <asm/uaccess.h>
 #include <asm/pgtable.h>
@@ -64,6 +65,12 @@
 unsigned long boot_option_idle_override = 0;
 EXPORT_SYMBOL(boot_option_idle_override);
 
+spinlock_t pm_idle_switch_lock = SPIN_LOCK_UNLOCKED;
+EXPORT_SYMBOL(pm_idle_switch_lock);
+
+int pm_idle_locked = 0;
+EXPORT_SYMBOL(pm_idle_locked);
+
 /*
  * Return saved PC of a blocked thread.
  */
@@ -126,7 +133,7 @@
  * to poll the ->work.need_resched flag instead of waiting for the
  * cross-CPU IPI to arrive. Use this option with caution.
  */
-static void poll_idle (void)
+void poll_idle (void)
 {
 	local_irq_enable();
 
Index: linux-2.6.15-rc2-git5/arch/i386/kernel/switch2poll.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.15-rc2-git5/arch/i386/kernel/switch2poll.c	2005-11-25 11:55:19.000000000 -0500
@@ -0,0 +1,5 @@
+/*
+ * Same type of hack used for early_printk.  This keeps the code
+ * in one place.
+ */
+#include "../../x86_64/kernel/switch2poll.c"
Index: linux-2.6.15-rc2-git5/arch/x86_64/kernel/Makefile
===================================================================
--- linux-2.6.15-rc2-git5.orig/arch/x86_64/kernel/Makefile	2005-11-22 12:13:24.000000000 -0500
+++ linux-2.6.15-rc2-git5/arch/x86_64/kernel/Makefile	2005-11-25 11:56:40.000000000 -0500
@@ -30,6 +30,7 @@
 obj-$(CONFIG_DUMMY_IOMMU)	+= pci-nommu.o pci-dma.o
 obj-$(CONFIG_KPROBES)		+= kprobes.o
 obj-$(CONFIG_X86_PM_TIMER)	+= pmtimer.o
+obj-$(CONFIG_SYSFS)		+= switch2poll.o
 
 obj-$(CONFIG_MODULES)		+= module.o
 
Index: linux-2.6.15-rc2-git5/arch/x86_64/kernel/process.c
===================================================================
--- linux-2.6.15-rc2-git5.orig/arch/x86_64/kernel/process.c	2005-11-25 10:58:53.000000000 -0500
+++ linux-2.6.15-rc2-git5/arch/x86_64/kernel/process.c	2005-11-25 12:17:53.000000000 -0500
@@ -36,6 +36,7 @@
 #include <linux/utsname.h>
 #include <linux/random.h>
 #include <linux/kprobes.h>
+#include <linux/spinlock.h>
 
 #include <asm/uaccess.h>
 #include <asm/pgtable.h>
@@ -60,6 +61,12 @@
 unsigned long boot_option_idle_override = 0;
 EXPORT_SYMBOL(boot_option_idle_override);
 
+spinlock_t pm_idle_switch_lock = SPIN_LOCK_UNLOCKED;
+EXPORT_SYMBOL(pm_idle_switch_lock);
+
+int pm_idle_locked = 0;
+EXPORT_SYMBOL(pm_idle_locked);
+
 /*
  * Powermanagement idle function, if any..
  */
@@ -110,7 +117,7 @@
  * to poll the ->need_resched flag instead of waiting for the
  * cross-CPU IPI to arrive. Use this option with caution.
  */
-static void poll_idle (void)
+void poll_idle (void)
 {
 	local_irq_enable();
 
Index: linux-2.6.15-rc2-git5/arch/x86_64/kernel/switch2poll.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.15-rc2-git5/arch/x86_64/kernel/switch2poll.c	2005-11-25 12:23:22.000000000 -0500
@@ -0,0 +1,112 @@
+#include <linux/module.h>
+#include <linux/kobject.h>
+#include <linux/sysfs.h>
+#include <linux/spinlock.h>
+#include <linux/pm.h>
+
+extern void poll_idle (void);
+
+#define KERNEL_ATTR_RW(_name) \
+static struct subsys_attribute _name##_attr = \
+	__ATTR(_name, 0644, _name##_show, _name##_store)
+
+static struct idlep_kobject
+{
+	struct kobject kobj;
+	int is_poll;
+	void (*idle)(void);
+} idle_kobj;
+
+static ssize_t idle_poll_show(struct subsystem *subsys, char *page)
+{
+	return sprintf(page, "%s\n", (idle_kobj.is_poll ? "on" : "off"));
+}
+
+static ssize_t idle_poll_store(struct subsystem *subsys,
+			       const char *buf, size_t len)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&pm_idle_switch_lock, flags);
+
+	/*
+	 * If power management is handling the idle function,
+	 * then leave it be.
+	 */
+	if (pm_idle_locked) {
+		len = -EBUSY;
+		goto out;
+	}
+
+	if (strncmp(buf,"1",1)==0 ||
+	    (len >=2 && strncmp(buf,"on",2)==0)) {
+		if (idle_kobj.is_poll != 1) {
+			idle_kobj.is_poll = 1;
+			boot_option_idle_override = 1;
+			idle_kobj.idle = pm_idle;
+			pm_idle = poll_idle;
+		}
+	} else if (strncmp(buf,"0",1)==0 ||
+		   (len >= 3 && strncmp(buf,"off",3)==0)) {
+		if (idle_kobj.is_poll != 0) {
+			boot_option_idle_override = 0;
+			idle_kobj.is_poll = 0;
+			pm_idle = idle_kobj.idle;
+		}
+	}
+
+out:
+	spin_unlock_irqrestore(&pm_idle_switch_lock, flags);
+
+	return len;
+}
+
+
+KERNEL_ATTR_RW(idle_poll);
+
+static struct attribute * idle_attrs[] = {
+	&idle_poll_attr.attr,
+	NULL
+};
+
+static struct attribute_group idle_attr_group = {
+	.attrs = idle_attrs,
+};
+
+static int __init idle_poll_set_init(void)
+{
+	int err;
+
+	/*
+	 * If the default is already poll_idle then
+	 * don't even bother with this.
+	 */
+	if (pm_idle == poll_idle)
+		return 0;
+
+	memset(&idle_kobj, 0, sizeof(idle_kobj));
+
+	idle_kobj.is_poll = 0;
+	idle_kobj.idle = pm_idle;
+
+	err = kobject_set_name(&idle_kobj.kobj, "%s", "idle");
+	if (err)
+		goto out;
+
+	idle_kobj.kobj.parent = &kernel_subsys.kset.kobj;
+	err = kobject_register(&idle_kobj.kobj);
+	if (err)
+		goto out;
+
+	err = sysfs_create_group(&idle_kobj.kobj,
+				 &idle_attr_group);
+	if (err)
+		goto out;
+
+	return 0;
+out:
+	printk(KERN_INFO "Problem setting up sysfs idle_poll\n");
+	return 0;
+}
+
+late_initcall(idle_poll_set_init);
Index: linux-2.6.15-rc2-git5/drivers/acpi/processor_idle.c
===================================================================
--- linux-2.6.15-rc2-git5.orig/drivers/acpi/processor_idle.c	2005-11-22 12:13:24.000000000 -0500
+++ linux-2.6.15-rc2-git5/drivers/acpi/processor_idle.c	2005-11-25 13:15:59.000000000 -0500
@@ -38,6 +38,7 @@
 #include <linux/dmi.h>
 #include <linux/moduleparam.h>
 #include <linux/sched.h>	/* need_resched() */
+#include <linux/spinlock.h>
 
 #include <asm/io.h>
 #include <asm/uaccess.h>
@@ -990,6 +991,7 @@
 	static int first_run = 0;
 	struct proc_dir_entry *entry = NULL;
 	unsigned int i;
+	unsigned long flags;
 
 	ACPI_FUNCTION_TRACE("acpi_processor_power_init");
 
@@ -1023,6 +1025,7 @@
 	 * Note that we use previously set idle handler will be used on
 	 * platforms that only support C1.
 	 */
+	spin_lock_irqsave(&pm_idle_switch_lock, flags);
 	if ((pr->flags.power) && (!boot_option_idle_override)) {
 		printk(KERN_INFO PREFIX "CPU%d (power states:", pr->id);
 		for (i = 1; i <= pr->power.count; i++)
@@ -1034,8 +1037,13 @@
 		if (pr->id == 0) {
 			pm_idle_save = pm_idle;
 			pm_idle = acpi_processor_idle;
+			/*
+			 * Don't allow switching of the pm_idle to poll.
+			 */
+			pm_idle_locked = 1;
 		}
 	}
+	spin_unlock_irqrestore(&pm_idle_switch_lock, flags);
 
 	/* 'power' [R] */
 	entry = create_proc_entry(ACPI_PROCESSOR_FILE_POWER,
@@ -1078,5 +1086,7 @@
 		cpu_idle_wait();
 	}
 
+	pm_idle_locked = 0;
+
 	return_VALUE(0);
 }
Index: linux-2.6.15-rc2-git5/include/linux/pm.h
===================================================================
--- linux-2.6.15-rc2-git5.orig/include/linux/pm.h	2005-11-25 12:05:33.000000000 -0500
+++ linux-2.6.15-rc2-git5/include/linux/pm.h	2005-11-25 12:17:17.000000000 -0500
@@ -25,6 +25,7 @@
 
 #include <linux/config.h>
 #include <linux/list.h>
+#include <linux/spinlock.h>
 #include <asm/atomic.h>
 
 /*
@@ -102,6 +103,8 @@
  */
 extern void (*pm_idle)(void);
 extern void (*pm_power_off)(void);
+extern spinlock_t pm_idle_switch_lock;
+extern int pm_idle_locked;
 
 typedef int __bitwise suspend_state_t;
 



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH] Runtime switching to idle_poll (was: Re: 2.6.14-rt13)
  2005-11-25 20:56               ` [RFC][PATCH] Runtime switching to idle_poll (was: Re: 2.6.14-rt13) Steven Rostedt
@ 2005-11-26 13:05                 ` Ingo Molnar
  2005-11-29  2:48                   ` [RFC][PATCH] Runtime switching of the idle function [take 2] Steven Rostedt
  0 siblings, 1 reply; 56+ messages in thread
From: Ingo Molnar @ 2005-11-26 13:05 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: acpi-devel, len.brown, Andrew Morton, Fernando Lopez-Lezcano,
	Lee Revell, linux-kernel, Paul E. McKenney, K.R. Foley,
	Thomas Gleixner, pluto, john cooper, Benedikt Spranger,
	Daniel Walker, Tom Rini, George Anzinger


* Steven Rostedt <rostedt@goodmis.org> wrote:

> As a request from Ingo, I fixed up this patch a little to allow both 
> x86_64 and i386 to switch to and from idle_poll at runtime.  I noticed 
> that the ACPI driver in drivers/acpi/processor_idle.c may cause some 
> race condition with this patch so I added some protection there. 
> Basically, if the acpi code changes pm_idle, then you can't change to 
> idle_poll, and vice-versa.
> 
> What this patch does is creates an entry into 
> /sys/kernel/idle/idle_poll.  It will show whether or not the idle_poll 
> is being used as a runtime idle routine.  It is also used to set the 
> runtime idle.
> 
> with:
> 
> # echo 1 > /sys/kernel/idle/idle_poll
>   or
> # echo on > /sys/kernel/idle/idle_poll

find some minor cleanups below.

a more general question is, shouldn't the configuration method rather be 
something like:

   echo idle > /sys/kernel/idle

and there could also be a /sys/kernel/idle_methods which would enumerate 
all the strings that are possible? This way we'd not hardcode 
'idle-poll' in any way.
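
To make the idea concrete, here is a minimal userspace sketch of how a
string written to such a file could be matched against an enumerated
list of method names.  The table contents and the function name
match_idle_method() are hypothetical, chosen only for illustration; the
sketch drops the trailing newline that echo appends and requires an
exact name match.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table of selectable idle methods (illustration only). */
static const char *const idle_method_names[] = { "default", "poll", "mwait" };

/*
 * Return the index of the method named by the first len bytes of buf,
 * or -1 if no method matches exactly.
 */
int match_idle_method(const char *buf, size_t len)
{
	size_t count = sizeof(idle_method_names) / sizeof(idle_method_names[0]);
	size_t i;

	/* "echo poll > file" delivers "poll\n": ignore one trailing newline. */
	if (len > 0 && buf[len - 1] == '\n')
		len--;

	for (i = 0; i < count; i++) {
		const char *name = idle_method_names[i];

		/* Compare lengths first so "pollx" does not match "poll". */
		if (strlen(name) == len && !memcmp(name, buf, len))
			return (int)i;
	}
	return -1;
}
```

The same table could back the read side, so the file that enumerates
the methods and the file that selects one can never disagree.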

	Ingo

Signed-off-by: Ingo Molnar <mingo@elte.hu>

 arch/i386/kernel/process.c   |    6 +++---
 arch/x86_64/kernel/process.c |    6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

Index: linux/arch/i386/kernel/process.c
===================================================================
--- linux.orig/arch/i386/kernel/process.c
+++ linux/arch/i386/kernel/process.c
@@ -65,11 +65,11 @@ static int hlt_counter;
 unsigned long boot_option_idle_override = 0;
 EXPORT_SYMBOL(boot_option_idle_override);
 
-spinlock_t pm_idle_switch_lock = SPIN_LOCK_UNLOCKED;
-EXPORT_SYMBOL(pm_idle_switch_lock);
+DEFINE_SPINLOCK(pm_idle_switch_lock);
+EXPORT_SYMBOL_GPL(pm_idle_switch_lock);
 
 int pm_idle_locked = 0;
-EXPORT_SYMBOL(pm_idle_locked);
+EXPORT_SYMBOL_GPL(pm_idle_locked);
 
 /*
  * Return saved PC of a blocked thread.
Index: linux/arch/x86_64/kernel/process.c
===================================================================
--- linux.orig/arch/x86_64/kernel/process.c
+++ linux/arch/x86_64/kernel/process.c
@@ -61,11 +61,11 @@ static atomic_t hlt_counter = ATOMIC_INI
 unsigned long boot_option_idle_override = 0;
 EXPORT_SYMBOL(boot_option_idle_override);
 
-spinlock_t pm_idle_switch_lock = SPIN_LOCK_UNLOCKED;
-EXPORT_SYMBOL(pm_idle_switch_lock);
+DEFINE_SPINLOCK(pm_idle_switch_lock);
+EXPORT_SYMBOL_GPL(pm_idle_switch_lock);
 
 int pm_idle_locked = 0;
-EXPORT_SYMBOL(pm_idle_locked);
+EXPORT_SYMBOL_GPL(pm_idle_locked);
 
 /*
  * Powermanagement idle function, if any..

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [RFC][PATCH] Runtime switching of the idle function [take 2]
  2005-11-26 13:05                 ` Ingo Molnar
@ 2005-11-29  2:48                   ` Steven Rostedt
  2005-11-29  3:02                     ` Andrew Morton
  2005-11-29 13:08                     ` Pavel Machek
  0 siblings, 2 replies; 56+ messages in thread
From: Steven Rostedt @ 2005-11-29  2:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: acpi-devel, len.brown, Andrew Morton, Fernando Lopez-Lezcano,
	Lee Revell, linux-kernel, Paul E. McKenney, K.R. Foley,
	Thomas Gleixner, pluto, john cooper, Benedikt Spranger,
	Daniel Walker, Tom Rini, George Anzinger

Here's an update on the switching of the idle function.

As Ingo has suggested, I removed this from being specific to the
poll_idle function.

Description:

This patch creates a directory in /sys/kernel called idle.  This
directory contains two files: idle_ctrl and idle_methods.  Reading
idle_ctrl shows the function that is currently being used for idle, and
idle_methods lists the methods that the user can write into idle_ctrl
to change which function is used for idle.

If the freeze attribute is set for an idle function (defined in the
idle_info struct explained below), then the user cannot switch to or
away from that function through idle_ctrl.  This is used by the ACPI
code, since I wasn't sure how it would handle having its function
switched in or out dynamically.  Functions that are frozen are shown in
idle_methods (and in idle_ctrl when in use) with an asterisk (*) in
front of the name.
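
As an illustration of the listing format and the freeze rule described
above, here is a small userspace sketch.  It is not the patch code:
struct idle_method, format_idle_methods() and user_may_switch() are
hypothetical names, and a plain array stands in for the kernel list.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, reduced view of a registered idle method. */
struct idle_method {
	const char *name;
	int freeze;	/* nonzero: users may not switch to or away from it */
};

/*
 * Write "name1 name2 *name3\n" into buf, prefixing frozen methods with
 * '*'.  Stops early rather than overflow buf.  Returns the number of
 * bytes written, not counting the terminating NUL.
 */
size_t format_idle_methods(const struct idle_method *m, size_t n,
			   char *buf, size_t size)
{
	size_t len = 0;
	size_t i;

	if (size == 0)
		return 0;
	buf[0] = '\0';

	for (i = 0; i < n; i++) {
		/* separator + optional '*' + name */
		size_t need = (len ? 1 : 0) + (m[i].freeze ? 1 : 0) +
			      strlen(m[i].name);

		/* keep room for the final newline and the NUL */
		if (len + need + 2 > size)
			break;
		len += (size_t)snprintf(buf + len, size - len, "%s%s%s",
					len ? " " : "",
					m[i].freeze ? "*" : "",
					m[i].name);
	}
	if (len + 2 <= size)
		len += (size_t)snprintf(buf + len, size - len, "\n");
	return len;
}

/*
 * Freeze rule for a user-initiated switch: refuse if either the method
 * currently in use (may be NULL for the default) or the target is frozen.
 */
int user_may_switch(const struct idle_method *current,
		    const struct idle_method *target)
{
	if (current && current->freeze)
		return -EBUSY;
	if (target->freeze)
		return -EBUSY;
	return 0;
}
```

With the methods "default", "poll" and a frozen "pm_idle", the listing
reads "default poll *pm_idle", and any user switch that involves
pm_idle is refused with -EBUSY.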

I moved the code out of arch/x86_64 and into the arch-independent
kernel directory.  The file is called idle.c.  It implements the
functions to register and unregister idle functions, the functions to
set which idle function to use, and the entries in the sysfs
directory.  Currently it is only compiled for i386, x86_64, and ia64.

Since I only have i386 and x86_64, I was only able to test the changes
in those two archs. I modified ia64, but haven't even tried to compile
it.  If someone with that arch would like to do me the favor, please
do ;-)

I've created an idle_info structure that is used to register the idle
functions.  This is now how acpi adds its functions.

struct idle_info {
  struct list_head list; /* links all registered idle functions together */
  const char *name; /* name to be used to add as well as to show */
  idlefunc_t func; /* the function to be called for idle */
  int freeze;  /* set to disallow the user from adding or removing it */
  int inuse; /* set when being used as the idle function */
};
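
For readers who want to experiment with the bookkeeping outside the
kernel, here is a minimal userspace sketch of the register / set /
unregister semantics.  It is an illustration under stated assumptions,
not the patch code: a singly linked list stands in for struct
list_head, locking is omitted, demo_poll_idle() is a made-up
placeholder, set_idle(NULL) simply clears idle_func, and switching to a
named method also updates idle_func (an assumption about the intended
behaviour).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

typedef void (*idlefunc_t)(void);

/* Userspace stand-in for struct idle_info. */
struct idle_info {
	struct idle_info *next;	/* replaces struct list_head */
	const char *name;
	idlefunc_t func;
	int freeze;
	int inuse;
};

static struct idle_info *idle_elements;	/* all registered methods */
static struct idle_info *curr_idle;	/* NULL means "default" */
idlefunc_t idle_func;			/* what the idle loop would call */

/* Placeholder idle routine, for illustration only. */
void demo_poll_idle(void)
{
}

static struct idle_info *find_idle_info(const char *name)
{
	struct idle_info *p;

	for (p = idle_elements; p; p = p->next)
		if (!strcmp(name, p->name))
			return p;
	return NULL;
}

/* Add info to the list; -EEXIST if the name is already taken. */
int register_idle(struct idle_info *info)
{
	assert(info->name != NULL);	/* the patch uses BUG_ON() here */
	if (find_idle_info(info->name))
		return -EEXIST;
	info->next = idle_elements;
	idle_elements = info;
	return 0;
}

/* Remove by name; -EBUSY while in use, -EINVAL if never registered. */
int unregister_idle(const char *name)
{
	struct idle_info **pp;

	for (pp = &idle_elements; *pp; pp = &(*pp)->next) {
		if (strcmp(name, (*pp)->name))
			continue;
		if ((*pp)->inuse)
			return -EBUSY;
		*pp = (*pp)->next;
		return 0;
	}
	return -EINVAL;
}

/* Select a method by name; NULL selects the default. */
int set_idle(const char *name)
{
	struct idle_info *p = NULL;

	if (name) {
		p = find_idle_info(name);
		if (!p)
			return -EINVAL;
	}
	if (curr_idle)
		curr_idle->inuse--;
	if (p)
		p->inuse++;
	curr_idle = p;
	idle_func = p ? p->func : NULL;
	return 0;
}
```

The inuse count is what makes unregister_idle() return -EBUSY, which is
why callers switch back with set_idle(NULL) before unregistering.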

This is a much more robust way of handling changes of the idle function
and can easily be adapted to other archs that would like to also
implement dynamic changes of the idle function.  This would be nice to
add to sparc (hint hint).

Here's the patch:

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Index: linux-2.6.15-rc2-git5/arch/i386/kernel/process.c
===================================================================
--- linux-2.6.15-rc2-git5.orig/arch/i386/kernel/process.c	2005-11-28 19:59:34.000000000 -0500
+++ linux-2.6.15-rc2-git5/arch/i386/kernel/process.c	2005-11-28 20:30:51.000000000 -0500
@@ -39,6 +39,7 @@
 #include <linux/ptrace.h>
 #include <linux/random.h>
 #include <linux/kprobes.h>
+#include <linux/idle.h>
 
 #include <asm/uaccess.h>
 #include <asm/pgtable.h>
@@ -72,11 +73,6 @@
 	return ((unsigned long *)tsk->thread.esp)[3];
 }
 
-/*
- * Powermanagement idle function, if any..
- */
-void (*pm_idle)(void);
-EXPORT_SYMBOL(pm_idle);
 static DEFINE_PER_CPU(unsigned int, cpu_idle_state);
 
 void disable_hlt(void)
@@ -185,7 +181,7 @@
 				__get_cpu_var(cpu_idle_state) = 0;
 
 			rmb();
-			idle = pm_idle;
+			idle = idle_func;
 
 			if (!idle)
 				idle = default_idle;
@@ -230,6 +226,8 @@
 }
 EXPORT_SYMBOL_GPL(cpu_idle_wait);
 
+static struct idle_info idle_mwait;
+
 /*
  * This uses new MONITOR/MWAIT instructions on P4 processors with PNI,
  * which can obviate IPI to trigger checking of need_resched.
@@ -258,25 +256,62 @@
 		 * Skip, if setup has overridden idle.
 		 * One CPU supports mwait => All CPUs supports mwait
 		 */
-		if (!pm_idle) {
+		memset(&idle_mwait, 0, sizeof(idle_mwait));
+		idle_mwait.name = "mwait";
+		idle_mwait.func = mwait_idle;
+		register_idle(&idle_mwait);
+
+		if (!idle_func) {
 			printk("using mwait in idle threads.\n");
-			pm_idle = mwait_idle;
+			set_idle("mwait");
 		}
 	}
 }
 
+static struct idle_info idle_default;
+static struct idle_info idle_poll;
+
+static int __init add_idle(void)
+{
+	static int set;
+
+	if (set)
+		return 0;
+	set = 1;
+
+	memset(&idle_poll, 0, sizeof(idle_poll));
+	idle_poll.name = "poll";
+	idle_poll.func = poll_idle;
+	register_idle(&idle_poll);
+
+	/*
+	 * Allow the user to switch out of poll_idle even
+	 * if it was a boot option.
+	 */
+	memset(&idle_default, 0, sizeof(idle_default));
+	idle_default.name = "default";
+	idle_default.func = default_idle;
+	register_idle(&idle_default);
+
+	return 0;
+}
+
+arch_initcall(add_idle);
+
 static int __init idle_setup (char *str)
 {
+	add_idle();
 	if (!strncmp(str, "poll", 4)) {
 		printk("using polling idle threads.\n");
-		pm_idle = poll_idle;
+		set_idle("poll");
+
 #ifdef CONFIG_X86_SMP
 		if (smp_num_siblings > 1)
 			printk("WARNING: polling idle and HT enabled, performance may degrade.\n");
 #endif
 	} else if (!strncmp(str, "halt", 4)) {
 		printk("using halt in idle threads.\n");
-		pm_idle = default_idle;
+		set_idle("default");
 	}
 
 	boot_option_idle_override = 1;
Index: linux-2.6.15-rc2-git5/arch/x86_64/kernel/process.c
===================================================================
--- linux-2.6.15-rc2-git5.orig/arch/x86_64/kernel/process.c	2005-11-28 19:59:34.000000000 -0500
+++ linux-2.6.15-rc2-git5/arch/x86_64/kernel/process.c	2005-11-28 20:30:21.000000000 -0500
@@ -36,6 +36,8 @@
 #include <linux/utsname.h>
 #include <linux/random.h>
 #include <linux/kprobes.h>
+#include <linux/spinlock.h>
+#include <linux/idle.h>
 
 #include <asm/uaccess.h>
 #include <asm/pgtable.h>
@@ -60,10 +62,6 @@
 unsigned long boot_option_idle_override = 0;
 EXPORT_SYMBOL(boot_option_idle_override);
 
-/*
- * Powermanagement idle function, if any..
- */
-void (*pm_idle)(void);
 static DEFINE_PER_CPU(unsigned int, cpu_idle_state);
 
 void disable_hlt(void)
@@ -195,7 +193,7 @@
 				__get_cpu_var(cpu_idle_state) = 0;
 
 			rmb();
-			idle = pm_idle;
+			idle = idle_func;
 			if (!idle)
 				idle = default_idle;
 			if (cpu_is_offline(smp_processor_id()))
@@ -209,6 +207,8 @@
 	}
 }
 
+struct idle_info idle_mwait;
+
 /*
  * This uses new MONITOR/MWAIT instructions on P4 processors with PNI,
  * which can obviate IPI to trigger checking of need_resched.
@@ -233,25 +233,61 @@
 {
 	static int printed;
 	if (cpu_has(c, X86_FEATURE_MWAIT)) {
+		memset(&idle_mwait, 0, sizeof(idle_mwait));
+		idle_mwait.name = "mwait";
+		idle_mwait.func = mwait_idle;
+		register_idle(&idle_mwait);
+
 		/*
 		 * Skip, if setup has overridden idle.
 		 * One CPU supports mwait => All CPUs supports mwait
 		 */
-		if (!pm_idle) {
+		if (!idle_func) {
 			if (!printed) {
 				printk("using mwait in idle threads.\n");
 				printed = 1;
 			}
-			pm_idle = mwait_idle;
+			set_idle("mwait");
 		}
 	}
 }
 
+static struct idle_info idle_default;
+static struct idle_info idle_poll;
+
+static int __init add_idle(void)
+{
+	static int set;
+
+	if (set)
+		return 0;
+	set = 1;
+
+	memset(&idle_poll, 0, sizeof(idle_poll));
+	idle_poll.name = "poll";
+	idle_poll.func = poll_idle;
+	register_idle(&idle_poll);
+
+	/*
+	 * Allow the user to switch out of poll_idle even
+	 * if it was a boot option.
+	 */
+	memset(&idle_default, 0, sizeof(idle_default));
+	idle_default.name = "default";
+	idle_default.func = default_idle;
+	register_idle(&idle_default);
+
+	return 0;
+}
+arch_initcall(add_idle);
+
 static int __init idle_setup (char *str)
 {
+	add_idle();
+
 	if (!strncmp(str, "poll", 4)) {
 		printk("using polling idle threads.\n");
-		pm_idle = poll_idle;
+		set_idle("poll");
 	}
 
 	boot_option_idle_override = 1;
Index: linux-2.6.15-rc2-git5/drivers/acpi/processor_idle.c
===================================================================
--- linux-2.6.15-rc2-git5.orig/drivers/acpi/processor_idle.c	2005-11-28 19:59:34.000000000 -0500
+++ linux-2.6.15-rc2-git5/drivers/acpi/processor_idle.c	2005-11-28 19:59:42.000000000 -0500
@@ -38,6 +38,8 @@
 #include <linux/dmi.h>
 #include <linux/moduleparam.h>
 #include <linux/sched.h>	/* need_resched() */
+#include <linux/spinlock.h>
+#include <linux/idle.h>
 
 #include <asm/io.h>
 #include <asm/uaccess.h>
@@ -56,6 +58,7 @@
 #define C3_OVERHEAD			4	/* 1us (3.579 ticks per us) */
 static void (*pm_idle_save) (void);
 module_param(max_cstate, uint, 0644);
+#define PM_IDLE_NAME "pm_idle"
 
 static unsigned int nocst = 0;
 module_param(nocst, uint, 0000);
@@ -891,13 +894,13 @@
 		return_VALUE(-ENODEV);
 
 	/* Fall back to the default idle loop */
-	pm_idle = pm_idle_save;
+	set_idle(NULL);
 	synchronize_sched();	/* Relies on interrupts forcing exit from idle. */
 
 	pr->flags.power = 0;
 	result = acpi_processor_get_power_info(pr);
 	if ((pr->flags.power == 1) && (pr->flags.power_setup_done))
-		pm_idle = acpi_processor_idle;
+		set_idle(PM_IDLE_NAME);
 
 	return_VALUE(result);
 }
@@ -983,6 +986,8 @@
 	.release = single_release,
 };
 
+static struct idle_info pm_idle_info;
+
 int acpi_processor_power_init(struct acpi_processor *pr,
 			      struct acpi_device *device)
 {
@@ -1032,8 +1037,17 @@
 		printk(")\n");
 
 		if (pr->id == 0) {
-			pm_idle_save = pm_idle;
-			pm_idle = acpi_processor_idle;
+			memset(&pm_idle_info, 0, sizeof(pm_idle_info));
+			pm_idle_info.name = PM_IDLE_NAME;
+			pm_idle_info.func = acpi_processor_idle;
+			pm_idle_info.freeze = 1;
+
+			register_idle(&pm_idle_info);
+			/*
+			 * Just use the default idle
+			 */
+			pm_idle_save = get_idle(NULL);
+			set_idle(PM_IDLE_NAME);
 		}
 	}
 
@@ -1068,7 +1082,29 @@
 
 	/* Unregister the idle handler when processor #0 is removed. */
 	if (pr->id == 0) {
-		pm_idle = pm_idle_save;
+		int tries = 0;
+		int ret;
+		set_idle(NULL);
+		do {
+			if ((ret = unregister_idle(PM_IDLE_NAME)) == 0)
+				break;
+			/*
+			 * for some reason the idle function is being used.
+			 * Wait a little and then try again.
+			 */
+			if (ret == -EINVAL) {
+				printk(KERN_WARNING
+				       "ACPI idle function never registered?\n");
+				break;
+			}
+			yield();
+		} while (tries++ < 10);
+		if (tries > 10) {
+			printk(KERN_WARNING
+			       "Unable to unregister ACPI idle function\n");
+			/* don't unregister */
+			return_VALUE(ret);
+		}
 
 		/*
 		 * We are about to unload the current idle thread pm callback
Index: linux-2.6.15-rc2-git5/include/linux/pm.h
===================================================================
--- linux-2.6.15-rc2-git5.orig/include/linux/pm.h	2005-11-28 19:59:34.000000000 -0500
+++ linux-2.6.15-rc2-git5/include/linux/pm.h	2005-11-28 19:59:42.000000000 -0500
@@ -25,6 +25,7 @@
 
 #include <linux/config.h>
 #include <linux/list.h>
+#include <linux/spinlock.h>
 #include <asm/atomic.h>
 
 /*
@@ -102,6 +103,8 @@
  */
 extern void (*pm_idle)(void);
 extern void (*pm_power_off)(void);
+extern spinlock_t pm_idle_switch_lock;
+extern int pm_idle_locked;
 
 typedef int __bitwise suspend_state_t;
 
Index: linux-2.6.15-rc2-git5/arch/x86_64/Kconfig
===================================================================
--- linux-2.6.15-rc2-git5.orig/arch/x86_64/Kconfig	2005-11-28 19:59:34.000000000 -0500
+++ linux-2.6.15-rc2-git5/arch/x86_64/Kconfig	2005-11-28 19:59:42.000000000 -0500
@@ -69,6 +69,10 @@
 	bool
 	default y
 
+config DYNAMIC_IDLE
+	bool
+	default y
+
 source "init/Kconfig"
 
 
Index: linux-2.6.15-rc2-git5/arch/x86_64/kernel/x8664_ksyms.c
===================================================================
--- linux-2.6.15-rc2-git5.orig/arch/x86_64/kernel/x8664_ksyms.c	2005-11-28 19:59:34.000000000 -0500
+++ linux-2.6.15-rc2-git5/arch/x86_64/kernel/x8664_ksyms.c	2005-11-28 19:59:42.000000000 -0500
@@ -58,7 +58,6 @@
 EXPORT_SYMBOL(disable_irq_nosync);
 EXPORT_SYMBOL(probe_irq_mask);
 EXPORT_SYMBOL(kernel_thread);
-EXPORT_SYMBOL(pm_idle);
 EXPORT_SYMBOL(pm_power_off);
 EXPORT_SYMBOL(get_cmos_time);
 
Index: linux-2.6.15-rc2-git5/include/linux/idle.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.15-rc2-git5/include/linux/idle.h	2005-11-28 21:36:00.000000000 -0500
@@ -0,0 +1,67 @@
+/*
+ *  idle.h - Registering of the idle function (for supported archs)
+ *
+ *  Copyright (C) 2005 Steven Rostedt <rostedt@goodmis.org>
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, write to the Free Software
+ *  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+#ifndef _LINUX_IDLE_H
+#define _LINUX_IDLE_H
+
+#include <linux/config.h>
+#include <linux/list.h>
+#include <linux/spinlock.h>
+#include <linux/list.h>
+#include <asm/atomic.h>
+
+typedef void (*idlefunc_t)(void);
+
+struct idle_info {
+	struct list_head list;
+	const char *name;	/* Name visible to users */
+	idlefunc_t func;	/* idle function to run  */
+	int freeze;		/* Only allow kernel to add or remove */
+	int inuse;		/* set when being used */
+};
+
+/*
+ * Registering and unregistering functions that may be used
+ * instead of the default idle function.  This only adds
+ * them to the list of functions to be used, it does not
+ * set the idle function.
+ */
+extern int register_idle(struct idle_info *info);
+extern int unregister_idle(const char *name);
+
+/*
+ * This sets the idle function to the registered function
+ * by name.  Use NULL to set the idle function back to
+ * the default.
+ */
+extern int set_idle(const char *name);
+
+/*
+ * Return the function that is registered by name.
+ * Use NULL to get the default function.
+ * NULL may be returned (as that may be what the current
+ * idle function is set to, to use a default). NULL will
+ * also be returned if name is not registered.
+ */
+extern idlefunc_t get_idle(const char *name);
+
+extern idlefunc_t idle_func;
+
+#endif /* _LINUX_IDLE_H */
Index: linux-2.6.15-rc2-git5/kernel/Makefile
===================================================================
--- linux-2.6.15-rc2-git5.orig/kernel/Makefile	2005-11-28 19:59:34.000000000 -0500
+++ linux-2.6.15-rc2-git5/kernel/Makefile	2005-11-28 19:59:42.000000000 -0500
@@ -32,6 +32,7 @@
 obj-$(CONFIG_CRASH_DUMP) += crash_dump.o
 obj-$(CONFIG_SECCOMP) += seccomp.o
 obj-$(CONFIG_RCU_TORTURE_TEST) += rcutorture.o
+obj-$(CONFIG_DYNAMIC_IDLE) += idle.o
 
 ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y)
 # According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is
Index: linux-2.6.15-rc2-git5/kernel/idle.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.15-rc2-git5/kernel/idle.c	2005-11-28 20:29:57.000000000 -0500
@@ -0,0 +1,308 @@
+/*
+ *  kernel/idle.c
+ *
+ *  Setting up of the idle function to be dynamic.
+ *
+ *  Copyright (C) 2005 Steven Rostedt
+ */
+#include <linux/module.h>
+#include <linux/kobject.h>
+#include <linux/sysfs.h>
+#include <linux/spinlock.h>
+#include <linux/idle.h>
+
+idlefunc_t idle_func;
+
+static void (*idle_default)(void);
+static LIST_HEAD(idle_elements);
+static DECLARE_MUTEX(idle_sem);
+static struct idle_info *curr_idle;
+
+#ifdef CONFIG_SYSFS
+int idle_sysfs_init;
+#endif
+
+extern void poll_idle (void);
+
+static struct idle_info *__find_idle_info(const char *name)
+{
+	struct list_head *curr;
+	struct idle_info *p;
+	/*
+	 * A little inefficient, but this isn't called often.
+	 */
+	list_for_each(curr, &idle_elements) {
+		p = list_entry(curr, struct idle_info, list);
+		if (!strcmp(name, p->name))
+			break;
+	}
+	if (curr == &idle_elements)
+		p = NULL;
+
+	return p;
+}
+
+int register_idle(struct idle_info *info)
+{
+	struct idle_info *p;
+	int ret = -EEXIST;
+
+	BUG_ON(!info->name);
+
+	down(&idle_sem);
+
+	p = __find_idle_info(info->name);
+	if (p)
+		goto out;
+	ret = 0;
+
+	list_add(&info->list, &idle_elements);
+
+out:
+	up(&idle_sem);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(register_idle);
+
+int unregister_idle(const char *name)
+{
+	struct idle_info *p;
+	int ret = -EINVAL;
+
+	BUG_ON(!name);
+
+	down(&idle_sem);
+
+	p = __find_idle_info(name);
+	if (!p)
+		goto out;
+	if (p->inuse) {
+		ret = -EBUSY;
+		goto out;
+	}
+
+	ret = 0;
+
+	list_del_init(&p->list);
+
+out:
+	up(&idle_sem);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(unregister_idle);
+
+static int __set_idle(struct idle_info *info)
+{
+	if (curr_idle)
+		curr_idle->inuse--;
+	info->inuse++;
+	curr_idle = info;
+	return 0;
+}
+
+int set_idle(const char *name)
+{
+	struct idle_info *p;
+	int ret = 0;
+
+	down(&idle_sem);
+
+	if (!name) {
+		/* Set to the default function */
+		if (curr_idle) {
+			curr_idle->inuse--;
+			curr_idle = NULL;
+		}
+		idle_func = idle_default;
+		goto out;
+	}
+
+	ret = -EINVAL;
+	p = __find_idle_info(name);
+	if (!p)
+		goto out;
+
+	ret = __set_idle(p);
+out:
+	up(&idle_sem);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(set_idle);
+
+idlefunc_t get_idle(const char *name)
+{
+	struct idle_info *p;
+	idlefunc_t ret = idle_default;
+
+	down(&idle_sem);
+
+	if (!name)
+		goto out;
+
+	p = __find_idle_info(name);
+	if (!p)
+		goto out;
+
+	ret = p->func;
+out:
+	up(&idle_sem);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(get_idle);
+
+#ifdef CONFIG_SYSFS
+#define KERNEL_ATTR_RW(_name) \
+static struct subsys_attribute _name##_attr = \
+	__ATTR(_name, 0644, _name##_show, _name##_store)
+
+static struct idlep_kobject
+{
+	struct kobject kobj;
+} idle_kobj;
+
+static ssize_t idle_ctrl_show(struct subsystem *subsys, char *page)
+{
+	ssize_t ret;
+	char *star = "";
+	const char *name = "default";
+
+	down(&idle_sem);
+	if (curr_idle) {
+		name = curr_idle->name;
+		if (curr_idle->freeze)
+			star = "*";
+	}
+	ret = sprintf(page, "%s%s\n", star, name);
+	up(&idle_sem);
+
+	return ret;
+}
+
+static ssize_t idle_ctrl_store(struct subsystem *subsys,
+			       const char *buf, size_t len)
+{
+	struct list_head *curr;
+	struct idle_info *p;
+	ssize_t ret = -EBUSY;
+
+	down(&idle_sem);
+
+	if (curr_idle && curr_idle->freeze)
+		goto out;
+
+	list_for_each(curr, &idle_elements) {
+		int size;
+		p = list_entry(curr, struct idle_info, list);
+
+		size = strlen(p->name);
+		if (len <= size)
+			continue;
+		if (!strncmp(p->name, buf, size))
+			break;
+	}
+	if (curr == &idle_elements) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	/*
+	 * This idle routine may have been registered to
+	 * not allow users to add or remove this.
+	 */
+	if (p->freeze)
+		goto out;
+
+	__set_idle(p);
+
+	ret = len;
+out:
+	up(&idle_sem);
+
+	return ret;
+}
+
+KERNEL_ATTR_RW(idle_ctrl);
+
+static ssize_t idle_methods_show(struct subsystem *subsys, char *page)
+{
+	struct list_head *curr;
+	struct idle_info *p;
+	ssize_t len = 0;
+
+	down(&idle_sem);
+	list_for_each(curr, &idle_elements) {
+		p = list_entry(curr, struct idle_info, list);
+		if (len + 3 + strlen(p->name) >= PAGE_SIZE) {
+			printk("idle functions overflowed sysfs??\n");
+			break;
+		}
+		len += sprintf(page+len, "%s%s%s",
+			       len ? " " : "",
+			       p->freeze ? "*" : "",
+			       p->name);
+	}
+	if (len + 2 < PAGE_SIZE)
+		len += sprintf(page+len, "\n");
+
+	up(&idle_sem);
+	return len;
+}
+
+static ssize_t idle_methods_store(struct subsystem *subsys,
+				  const char *buf, size_t len)
+{
+	/* do nothing */
+	return len;
+}
+
+KERNEL_ATTR_RW(idle_methods);
+
+static struct attribute * idle_attrs[] = {
+	&idle_ctrl_attr.attr,
+	&idle_methods_attr.attr,
+	NULL
+};
+
+static struct attribute_group idle_attr_group = {
+	.attrs = idle_attrs,
+};
+
+static int __init idle_setup_sysfs(void)
+{
+	int err;
+
+	memset(&idle_kobj, 0, sizeof(idle_kobj));
+	err = kobject_set_name(&idle_kobj.kobj, "%s", "idle");
+	if (err)
+		goto out;
+
+	kobj_set_kset_s(&idle_kobj, kernel_subsys);
+
+	idle_kobj.kobj.parent = &kernel_subsys.kset.kobj;
+	err = kobject_register(&idle_kobj.kobj);
+	if (err)
+		goto out;
+
+	err = sysfs_create_group(&idle_kobj.kobj,
+				 &idle_attr_group);
+	if (err)
+		goto out;
+
+	return 0;
+out:
+	printk(KERN_INFO "Problem setting up sysfs idle_ctrl\n");
+	return 0;
+}
+#endif /* CONFIG_SYSFS */
+
+static int __init idle_setup(void)
+{
+	idle_default = idle_func;
+
+#ifdef CONFIG_SYSFS
+	idle_setup_sysfs();
+#endif
+	return 0;
+}
+
+late_initcall(idle_setup);
Index: linux-2.6.15-rc2-git5/arch/i386/Kconfig
===================================================================
--- linux-2.6.15-rc2-git5.orig/arch/i386/Kconfig	2005-11-28 19:59:34.000000000 -0500
+++ linux-2.6.15-rc2-git5/arch/i386/Kconfig	2005-11-28 19:59:42.000000000 -0500
@@ -45,6 +45,10 @@
 	bool
 	default y
 
+config DYNAMIC_IDLE
+	bool
+	default y
+
 source "init/Kconfig"
 
 menu "Processor type and features"
Index: linux-2.6.15-rc2-git5/arch/i386/kernel/apm.c
===================================================================
--- linux-2.6.15-rc2-git5.orig/arch/i386/kernel/apm.c	2005-11-28 19:59:34.000000000 -0500
+++ linux-2.6.15-rc2-git5/arch/i386/kernel/apm.c	2005-11-28 19:59:42.000000000 -0500
@@ -225,6 +225,7 @@
 #include <linux/smp_lock.h>
 #include <linux/dmi.h>
 #include <linux/suspend.h>
+#include <linux/idle.h>
 
 #include <asm/system.h>
 #include <asm/uaccess.h>
@@ -2220,6 +2221,9 @@
 	{ }
 };
 
+static struct idle_info apm_idle;
+#define APM_IDLE_NAME "apm"
+
 /*
  * Just start the APM thread. We do NOT want to do APM BIOS
  * calls from anything but the APM thread, if for no other reason
@@ -2373,8 +2377,14 @@
 	if (HZ != 100)
 		idle_period = (idle_period * HZ) / 100;
 	if (idle_threshold < 100) {
-		original_pm_idle = pm_idle;
-		pm_idle  = apm_cpu_idle;
+		memset(&apm_idle, 0, sizeof(apm_idle));
+		apm_idle.name = APM_IDLE_NAME;
+		apm_idle.func = apm_cpu_idle;
+		apm_idle.freeze = 1;
+		register_idle(&apm_idle);
+
+		original_pm_idle = get_idle(NULL);
+		set_idle(APM_IDLE_NAME);
 		set_pm_idle = 1;
 	}
 
@@ -2386,7 +2396,26 @@
 	int	error;
 
 	if (set_pm_idle) {
-		pm_idle = original_pm_idle;
+		int tries = 0;
+		int ret;
+		set_idle(NULL);
+		do {
+			if ((ret = unregister_idle(APM_IDLE_NAME)) == 0)
+				break;
+			/*
+			 * for some reason the idle function is being used.
+			 * Wait a little and then try again.
+			 */
+			if (ret == -EINVAL) {
+				printk(KERN_WARNING
+				       "APM idle function never registered?\n");
+				break;
+			}
+			yield();
+		} while (tries++ < 10);
+		if (tries > 10)
+			printk(KERN_WARNING
+			       "Unable to unregister APM idle function\n");
 		/*
 		 * We are about to unload the current idle thread pm callback
 		 * (pm_idle), Wait for all processors to update cached/local
Index: linux-2.6.15-rc2-git5/arch/ia64/Kconfig
===================================================================
--- linux-2.6.15-rc2-git5.orig/arch/ia64/Kconfig	2005-11-22 12:13:22.000000000 -0500
+++ linux-2.6.15-rc2-git5/arch/ia64/Kconfig	2005-11-28 20:17:30.000000000 -0500
@@ -62,6 +62,10 @@
 	bool
 	default y
 
+config DYNAMIC_IDLE
+	bool
+	default y
+
 choice
 	prompt "System type"
 	default IA64_GENERIC
Index: linux-2.6.15-rc2-git5/arch/ia64/kernel/acpi.c
===================================================================
--- linux-2.6.15-rc2-git5.orig/arch/ia64/kernel/acpi.c	2005-11-22 12:13:22.000000000 -0500
+++ linux-2.6.15-rc2-git5/arch/ia64/kernel/acpi.c	2005-11-28 20:23:41.000000000 -0500
@@ -60,8 +60,6 @@
 
 #define PREFIX			"ACPI: "
 
-void (*pm_idle) (void);
-EXPORT_SYMBOL(pm_idle);
 void (*pm_power_off) (void);
 EXPORT_SYMBOL(pm_power_off);
 
Index: linux-2.6.15-rc2-git5/arch/ia64/kernel/process.c
===================================================================
--- linux-2.6.15-rc2-git5.orig/arch/ia64/kernel/process.c	2005-11-25 10:58:53.000000000 -0500
+++ linux-2.6.15-rc2-git5/arch/ia64/kernel/process.c	2005-11-28 20:29:33.000000000 -0500
@@ -31,6 +31,7 @@
 #include <linux/interrupt.h>
 #include <linux/delay.h>
 #include <linux/kprobes.h>
+#include <linux/idle.h>
 
 #include <asm/cpu.h>
 #include <asm/delay.h>
@@ -289,7 +290,7 @@
 			if (mark_idle)
 				(*mark_idle)(1);
 
-			idle = pm_idle;
+			idle = idle_func;
 			if (!idle)
 				idle = default_idle;
 			(*idle)();
Index: linux-2.6.15-rc2-git5/arch/ia64/kernel/setup.c
===================================================================
--- linux-2.6.15-rc2-git5.orig/arch/ia64/kernel/setup.c	2005-11-22 12:13:22.000000000 -0500
+++ linux-2.6.15-rc2-git5/arch/ia64/kernel/setup.c	2005-11-28 20:23:09.000000000 -0500
@@ -43,6 +43,7 @@
 #include <linux/initrd.h>
 #include <linux/platform.h>
 #include <linux/pm.h>
+#include <linux/idle.h>
 
 #include <asm/ia32.h>
 #include <asm/machvec.h>
@@ -738,6 +739,8 @@
 		ia64_max_cacheline_size = max;
 }
 
+struct idle_info idle_default;
+
 /*
  * cpu_init() initializes state that is per-CPU.  This function acts
  * as a 'CPU state barrier', nothing should get across.
@@ -861,7 +864,13 @@
 	/* size of physical stacked register partition plus 8 bytes: */
 	__get_cpu_var(ia64_phys_stacked_size_p8) = num_phys_stacked*8 + 8;
 	platform_cpu_init();
-	pm_idle = default_idle;
+
+	memset(&idle_default, 0, sizeof(idle_default));
+	idle_default.name = "default";
+	idle_default.func = default_idle;
+	register_idle(&idle_default);
+
+	set_idle("default");
 }
 
 void



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH] Runtime switching of the idle function [take 2]
  2005-11-29  2:48                   ` [RFC][PATCH] Runtime switching of the idle function [take 2] Steven Rostedt
@ 2005-11-29  3:02                     ` Andrew Morton
  2005-11-29  3:42                       ` Steven Rostedt
  2005-11-29 13:08                     ` Pavel Machek
  1 sibling, 1 reply; 56+ messages in thread
From: Andrew Morton @ 2005-11-29  3:02 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: mingo, acpi-devel, len.brown, nando, rlrevell, linux-kernel,
	paulmck, kr, tglx, pluto, john.cooper, bene, dwalker, trini,
	george

Steven Rostedt <rostedt@goodmis.org> wrote:
>
> This patch creates a directory in /sys/kernel called idle.
>

At no point do you appear to explain _why_ the kernel needs this feature?

> ...
> -		pm_idle = pm_idle_save;
> +		int tries = 0;
> +		int ret;
> +		set_idle(NULL);
> +		do {
> +			if ((ret = unregister_idle(PM_IDLE_NAME)) == 0)
> +				break;
> +			/*
> +			 * for some reason the idle function is being used.
> +			 * Wait a little and then try again.
> +			 */
> +			if (ret == -EINVAL) {
> +				printk(KERN_WARNING
> +				       "ACPI idle function never registered?\n");
> +				break;
> +			}
> +			yield();
> +		} while (tries++ < 10);

The use of yield() could be problematic - its semantics are rather
ill-defined.  Maybe msleep(1) or something?

What's this loop here for anyway?  Looks kludgy.
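
For illustration, here is roughly how a bounded retry that sleeps briefly
instead of calling yield() might look.  This is a user-space sketch, not code
from the patch: retry_unregister(), struct fake_entry and fake_unregister()
are hypothetical names, and nanosleep() stands in for the kernel's msleep(1).
The return codes follow the quoted hunk, where -EINVAL means the entry was
never registered.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for unregister_idle(): returns 0, -EINVAL or -EBUSY. */
typedef int (*unregister_fn)(void *ctx);

/*
 * Call fn up to max_tries times, sleeping 1 ms between attempts.
 * Returns 0 on success, -EINVAL if the entry was never registered,
 * or -EBUSY if it was still in use after the last attempt.
 */
static int retry_unregister(unregister_fn fn, void *ctx, int max_tries)
{
	struct timespec one_ms = { .tv_sec = 0, .tv_nsec = 1000000 };
	int tries;

	assert(fn != NULL);
	for (tries = 0; tries < max_tries; tries++) {
		int ret = fn(ctx);

		if (ret == 0 || ret == -EINVAL)
			return ret;
		nanosleep(&one_ms, NULL);	/* user-space stand-in for msleep(1) */
	}
	return -EBUSY;
}

/* Hypothetical entry that reports "busy" a fixed number of times. */
struct fake_entry {
	int busy_left;
};

static int fake_unregister(void *ctx)
{
	struct fake_entry *e = ctx;

	if (e->busy_left > 0) {
		e->busy_left--;
		return -EBUSY;
	}
	return 0;
}
```

Returning a distinct error code once the attempts run out also avoids having
to re-derive the outcome from the loop counter afterwards.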

> +		if (tries > 10) {
> +			printk(KERN_WARNING
> +			       "Unable to unresgister ACPI idle function\n");

tpyo

> +	memset(&idle_kobj, 0, sizeof(idle_kobj));

There are several memsets of statically allocated structures which are
already all-zero.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH] Runtime switching of the idle function [take 2]
  2005-11-29  3:02                     ` Andrew Morton
@ 2005-11-29  3:42                       ` Steven Rostedt
  2005-11-29  4:01                         ` Andrew Morton
  2005-11-29  4:22                         ` john stultz
  0 siblings, 2 replies; 56+ messages in thread
From: Steven Rostedt @ 2005-11-29  3:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: mingo, acpi-devel, len.brown, nando, rlrevell, linux-kernel,
	paulmck, kr, tglx, pluto, john.cooper, bene, dwalker, trini,
	george

On Mon, 2005-11-28 at 19:02 -0800, Andrew Morton wrote:
> Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > This patch creates a directory in /sys/kernel called idle.
> >
> 
> At no point do you appear to explain _why_ the kernel needs this feature?

Sorry about that.  This originally came up when we had problems with the
AMD64 x2 in the -rt patch.  It was noted that the TSCs would get very
far out of sync and cause problems.  The way to solve this was to set
idle=poll.  The original patch I sent was to allow the user to change to
idle=poll dynamically.  This way they could switch to the poll_idle and
run their tests (requiring tsc not to drift) and then switch back to the
default idle to save on electricity.

Note: It's been stated that the tsc drift can cause problems with the
vanilla kernel too.

Ingo asked if I could make this more robust and not dependent on
idle_poll.

Maybe Ingo can give a better explanation?

> 
> > ...
> > -		pm_idle = pm_idle_save;
> > +		int tries = 0;
> > +		int ret;
> > +		set_idle(NULL);
> > +		do {
> > +			if ((ret = unregister_idle(PM_IDLE_NAME)) == 0)
> > +				break;
> > +			/*
> > +			 * for some reason the idle function is being used.
> > +			 * Wait a little and then try again.
> > +			 */
> > +			if (ret == -EINVAL) {
> > +				printk(KERN_WARNING
> > +				       "ACPI idle function never registered?\n");
> > +				break;
> > +			}
> > +			yield();
> > +		} while (tries++ < 10);
> 
> The use of yield() could be problematic - its semantics are rather
> ill-defined.  Maybe msleep(1) or something?
> 
> What's this loop here for anyway?  Looks kludgy.

Oops! That was required by some other garbage that I had earlier. I
cleaned up the patch some more, and this is no longer required. (will
remove).

> 
> > +		if (tries > 10) {
> > +			printk(KERN_WARNING
> > +			       "Unable to unresgister ACPI idle function\n");
> 
> tpyo

Will fix.

> 
> > +	memset(&idle_kobj, 0, sizeof(idle_kobj));
> 
> There are several memsets of statically allocated structures which are
> already all-zero.
> 

:) I'm really paranoid!  OK, I always like to do a memset even when it's
not needed.  I'll purge them too.

Thanks for having a look.

-- Steve



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH] Runtime switching of the idle function [take 2]
  2005-11-29  3:42                       ` Steven Rostedt
@ 2005-11-29  4:01                         ` Andrew Morton
  2005-11-29  6:44                           ` Ingo Molnar
  2005-11-29  4:22                         ` john stultz
  1 sibling, 1 reply; 56+ messages in thread
From: Andrew Morton @ 2005-11-29  4:01 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: mingo, acpi-devel, len.brown, nando, rlrevell, linux-kernel,
	paulmck, kr, tglx, pluto, john.cooper, bene, dwalker, trini,
	george

Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Mon, 2005-11-28 at 19:02 -0800, Andrew Morton wrote:
>  > Steven Rostedt <rostedt@goodmis.org> wrote:
>  > >
>  > > This patch creates a directory in /sys/kernel called idle.
>  > >
>  > 
>  > At no point do you appear to explain _why_ the kernel needs this feature?
> 
>  Sorry about that.  This originally came up when we had problems with the
>  AMD64 x2 in the -rt patch.  It was noted that the TSCs would get very
>  far out of sync and cause problems.

Unsynced TSCs are rare, but they happen.  I guess even if we were to resync
them, these measurements would screw up.


> The way to solve this was to set
>  idle=poll.  The original patch I sent was to allow the user to change to
>  idle=poll dynamically.  This way they could switch to the poll_idle and
>  run their tests (requiring tsc not to drift) and then switch back to the
>  default idle to save on electricity.

Use gettimeofday()?
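
A minimal sketch of timing an interval with gettimeofday() rather than raw
rdtsc reads.  The helper names elapsed_usec() and time_call_usec() are made up
for this example.  Note that gettimeofday() follows wall-clock time, so a
clock adjustment during the measurement would distort the result.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Microseconds elapsed between two gettimeofday() samples. */
static long long elapsed_usec(const struct timeval *start,
			      const struct timeval *end)
{
	return (long long)(end->tv_sec - start->tv_sec) * 1000000LL +
	       (long long)(end->tv_usec - start->tv_usec);
}

/* Time one call of work(arg); returns microseconds, or -1 on error. */
static long long time_call_usec(void (*work)(void *), void *arg)
{
	struct timeval start, end;

	assert(work != NULL);
	if (gettimeofday(&start, NULL) != 0)
		return -1;
	work(arg);
	if (gettimeofday(&end, NULL) != 0)
		return -1;
	return elapsed_usec(&start, &end);
}
```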

If it's just for some sort of instrumentation, run NR_CPUS instances of a
niced-down busyloop, pin each one to a different CPU?  That way the idle
function doesn't get called at all..
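
Something along these lines, as a rough user-space sketch.  It assumes Linux
with glibc (sched_setaffinity() and the CPU_* macros need _GNU_SOURCE), and
start_busyloops()/stop_busyloops() are hypothetical names.  It starts one
spinner per CPU that the calling process is allowed to run on.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork one niced-down busy loop per allowed CPU and pin each child to
 * its CPU.  Child PIDs are stored in pids[] (capacity max).  Returns
 * the number of children started, or -1 if the affinity mask cannot
 * be read.
 */
static int start_busyloops(pid_t *pids, int max)
{
	cpu_set_t allowed;
	int started = 0;
	int cpu;

	assert(pids != NULL);
	if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
		return -1;

	for (cpu = 0; cpu < CPU_SETSIZE && started < max; cpu++) {
		pid_t pid;

		if (!CPU_ISSET(cpu, &allowed))
			continue;
		pid = fork();
		if (pid < 0)
			break;
		if (pid == 0) {
			volatile unsigned long spin = 0;
			cpu_set_t one;
			int prio;

			CPU_ZERO(&one);
			CPU_SET(cpu, &one);
			if (sched_setaffinity(0, sizeof(one), &one) != 0)
				_exit(1);
			prio = nice(19);	/* lowest priority; result not needed */
			(void)prio;
			for (;;)
				spin++;		/* keeps this CPU out of the idle loop */
		}
		pids[started++] = pid;
	}
	return started;
}

/* Kill and reap the children started by start_busyloops(). */
static void stop_busyloops(const pid_t *pids, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		kill(pids[i], SIGKILL);
		waitpid(pids[i], NULL, 0);
	}
}
```

Call start_busyloops() before the measurement and stop_busyloops() after it.
Because the spinners run at nice 19, normal tasks should still preempt them.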


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH] Runtime switching of the idle function [take 2]
  2005-11-29  3:42                       ` Steven Rostedt
  2005-11-29  4:01                         ` Andrew Morton
@ 2005-11-29  4:22                         ` john stultz
  2005-11-29 14:22                           ` Steven Rostedt
  1 sibling, 1 reply; 56+ messages in thread
From: john stultz @ 2005-11-29  4:22 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andrew Morton, mingo, acpi-devel, len.brown, nando, rlrevell,
	linux-kernel, paulmck, kr, tglx, pluto, john.cooper, bene,
	dwalker, trini, george

On Mon, 2005-11-28 at 22:42 -0500, Steven Rostedt wrote:
> On Mon, 2005-11-28 at 19:02 -0800, Andrew Morton wrote:
> > Steven Rostedt <rostedt@goodmis.org> wrote:
> > >
> > > This patch creates a directory in /sys/kernel called idle.
> > >
> > 
> > At no point do you appear to explain _why_ the kernel needs this feature?
> 
> Sorry about that.  This originally came up when we had problems with the
> AMD64 x2 in the -rt patch.  It was noted that the TSCs would get very
> far out of sync and cause problems.  The way to solve this was to set
> idle=poll.  The original patch I sent was to allow the user to change to
> idle=poll dynamically.  This way they could switch to the poll_idle and
> run their tests (requiring tsc not to drift) and then switch back to the
> default idle to save on electricity.

The problem with this is that this must be a one way transition. That
is, once the TSCs have become unsynchronized, there is no use going back
to using the polling idle unless you add some code to re-sync the TSCs
which would be ugly to do after the system has booted.

Using idle=poll (for anything other than debugging) is really a worst
case workaround for systems that do not have alternative clocksources
like ACPI PM or HPET.

It's an interesting bit of code, but I'm not really sure I understand its
usefulness.

thanks
-john




^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH] Runtime switching of the idle function [take 2]
  2005-11-29  4:01                         ` Andrew Morton
@ 2005-11-29  6:44                           ` Ingo Molnar
  2005-11-29  6:55                             ` Nick Piggin
  2005-11-29 18:05                             ` Andi Kleen
  0 siblings, 2 replies; 56+ messages in thread
From: Ingo Molnar @ 2005-11-29  6:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Steven Rostedt, acpi-devel, len.brown, nando, rlrevell,
	linux-kernel, paulmck, kr, tglx, pluto, john.cooper, bene,
	dwalker, trini, george


* Andrew Morton <akpm@osdl.org> wrote:

> > The way to solve this was to set
> >  idle=poll.  The original patch I sent was to allow the user to change to
> >  idle=poll dynamically.  This way they could switch to the poll_idle and
> >  run their tests (requiring tsc not to drift) and then switch back to the
> >  default idle to save on electricity.
> 
> Use gettimeofday()?
> 
> If it's just for some sort of instrumentation, run NR_CPUS instances 
> of a niced-down busyloop, pin each one to a different CPU?  That way 
> the idle function doesn't get called at all..

idle=poll is also frequently done for performance reasons [it reduces 
idle wakeup latency by 10 usecs] - while it could be turned off if the 
system has been idle for some time. E.g. cpufreqd could sample idle time 
and turn on/off idle=poll. High-performance setups could enable it all 
the time.

as long as it can be done with zero-cost, i don't see why Steven's patch 
wouldn't be a plus for us. It's a performance thing, and having runtime 
switches for seamless performance features cannot be bad.
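
a rough sketch of the sampling side of such a policy, in user space. it 
assumes the aggregate "cpu" line of /proc/stat (user, nice, system, idle, 
then further counters) and sums at most the first eight fields. the names 
parse_cpu_line(), idle_fraction() and pick_idle_method() are made up for 
this example; "poll" and "default" are the method names that Steven's 
patch registers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct cpu_sample {
	unsigned long long idle;	/* 4th counter on the "cpu" line */
	unsigned long long total;	/* sum of the first eight counters */
};

/* Parse the aggregate "cpu ..." line of /proc/stat.  0 on success, -1 on error. */
static int parse_cpu_line(const char *line, struct cpu_sample *s)
{
	const char *p;
	char *end;
	int field = 0;

	assert(line != NULL && s != NULL);
	if (strncmp(line, "cpu ", 4) != 0)
		return -1;
	p = line + 4;
	s->idle = 0;
	s->total = 0;
	while (field < 8) {
		unsigned long long v = strtoull(p, &end, 10);

		if (end == p)
			break;		/* no more numbers on this line */
		if (field == 3)
			s->idle = v;
		s->total += v;
		field++;
		p = end;
	}
	return field >= 4 ? 0 : -1;
}

/* Fraction of time spent idle between two samples (0.0 if no time passed). */
static double idle_fraction(const struct cpu_sample *a,
			    const struct cpu_sample *b)
{
	unsigned long long dt = b->total - a->total;

	return dt ? (double)(b->idle - a->idle) / (double)dt : 0.0;
}

/* Hypothetical policy: poll while busy, default idle when mostly idle. */
static const char *pick_idle_method(double idle_frac, double threshold)
{
	return idle_frac > threshold ? "default" : "poll";
}
```

a daemon would take two samples some interval apart and write the chosen 
name into the control file only when it changes.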

	Ingo

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH] Runtime switching of the idle function [take 2]
  2005-11-29  6:44                           ` Ingo Molnar
@ 2005-11-29  6:55                             ` Nick Piggin
  2005-11-29 18:05                             ` Andi Kleen
  1 sibling, 0 replies; 56+ messages in thread
From: Nick Piggin @ 2005-11-29  6:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, Steven Rostedt, acpi-devel, len.brown, nando,
	rlrevell, linux-kernel, paulmck, kr, tglx, pluto, john.cooper,
	bene, dwalker, trini, george

Ingo Molnar wrote:
> * Andrew Morton <akpm@osdl.org> wrote:
> 
> 
>>>The way to solve this was to set
>>> idle=poll.  The original patch I sent was to allow the user to change to
>>> idle=poll dynamically.  This way they could switch to the poll_idle and
>>> run their tests (requiring tsc not to drift) and then switch back to the
>>> default idle to save on electricity.
>>
>>Use gettimeofday()?
>>
>>If it's just for some sort of instrumentation, run NR_CPUS instances 
>>of a niced-down busyloop, pin each one to a different CPU?  That way 
>>the idle function doesn't get called at all..
> 
> 
> idle=poll is also frequently done for performance reasons [it reduces 
> idle wakeup latency by 10 usecs] - while it could be turned off if the 
> system has been idle for some time. E.g. cpufreqd could sample idle time 
> and turn on/off idle=poll. High-performance setups could enable it all 
> the time.
> 
> as long as it can be done with zero-cost, i don't see why Steven's patch 
> wouldn't be a plus for us. It's a performance thing, and having runtime 
> switches for seamless performance features cannot be bad.
> 

Why not just slightly cleanup and extend (eg. to ACPI) the
hlt_counter thingy that many architectures already have?

Nick

-- 
SUSE Labs, Novell Inc.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH] Runtime switching of the idle function [take 2]
  2005-11-29  2:48                   ` [RFC][PATCH] Runtime switching of the idle function [take 2] Steven Rostedt
  2005-11-29  3:02                     ` Andrew Morton
@ 2005-11-29 13:08                     ` Pavel Machek
  2005-12-18 15:26                       ` Steven Rostedt
  1 sibling, 1 reply; 56+ messages in thread
From: Pavel Machek @ 2005-11-29 13:08 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, acpi-devel, len.brown, Andrew Morton,
	Fernando Lopez-Lezcano, Lee Revell, linux-kernel,
	Paul E. McKenney, K.R. Foley, Thomas Gleixner, pluto, john cooper,
	Benedikt Spranger, Daniel Walker, Tom Rini, George Anzinger

Hi!

> Description:
> 
> This patch creates a directory in /sys/kernel called idle.  This
> directory contains two files: idle_ctrl and idle_methods.  Reading
> idle_ctrl will show the function that is currently being used for idle,
> and idle_methods shows the available methods that the user can write
> into idle_ctrl to change which function to use for idle.

Pretty ugly interface, I'd say... is listing the functions really necessary?
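
For reference, a user-space sketch of how the two files described above would
be used.  The directory is passed in as a parameter (it would be
/sys/kernel/idle with the patch applied).  The helper names idle_attr_read()
and idle_ctrl_write() are hypothetical, and the exact text format of
idle_methods is an assumption, since it is not shown in the description.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Read the attribute file `attr` under `dir` into buf as a
 * NUL-terminated string, dropping one trailing newline.
 * Returns 0 on success, -1 on error.
 */
static int idle_attr_read(const char *dir, const char *attr,
			  char *buf, size_t len)
{
	char path[256];
	size_t n;
	FILE *f;
	int r;

	assert(buf != NULL && len > 0);
	r = snprintf(path, sizeof(path), "%s/%s", dir, attr);
	if (r < 0 || (size_t)r >= sizeof(path))
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	n = fread(buf, 1, len - 1, f);
	fclose(f);
	buf[n] = '\0';
	if (n > 0 && buf[n - 1] == '\n')
		buf[n - 1] = '\0';
	return 0;
}

/* Select an idle method by writing its name into idle_ctrl under `dir`. */
static int idle_ctrl_write(const char *dir, const char *method)
{
	char path[256];
	FILE *f;
	int ok;
	int r;

	r = snprintf(path, sizeof(path), "%s/idle_ctrl", dir);
	if (r < 0 || (size_t)r >= sizeof(path))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(method, f) >= 0;
	if (fclose(f) != 0)
		ok = 0;		/* buffered write errors surface at close */
	return ok ? 0 : -1;
}
```

A caller would read idle_methods to see what is on offer, write one of those
names with idle_ctrl_write(), then read idle_ctrl back to confirm the switch.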

				Pavel

-- 
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms         


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [RFC][PATCH] Runtime switching of the idle function [take 2]
  2005-11-29 18:05                             ` Andi Kleen
@ 2005-11-29 14:19                               ` Steven Rostedt
  2005-11-29 14:50                                 ` Andi Kleen
  2005-12-02  1:27                               ` Max Krasnyansky
  1 sibling, 1 reply; 56+ messages in thread
From: Steven Rostedt @ 2005-11-29 14:19 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Ingo Molnar, acpi-devel, len.brown, nando, rlrevell, linux-kernel,
	paulmck, kr, tglx, pluto, john.cooper, bene, dwalker, trini,
	george, akpm

On Tue, 2005-11-29 at 11:05 -0700, Andi Kleen so nicely wrote:
> > idle=poll is also frequently done for performance reasons [it reduces 
> > idle wakeup latency by 10 usecs] 
> 
> And it's obsolete on CPUs with monitor/mwait.

And I wish my system supported it.

> And in practice the CPU will run so hot that only benchmarkers like it.

Why would it run hot?  What's the difference between polling and doing
other things?  How many transistors does it take to poll?

> 
> I think switching idle is the wrong way to do. We should rather
> fix the various problems.
> 
> For fixing the TSC issue it is 100% the wrong approach Imho.

I would only say 80% the wrong approach, but that's me ;-)

> Basically software has to live with TSCs being unsynchronized
> and gettimeofday should do the right thing (and if not it should be fixed)

I guess the biggest complaint most have is that the rdtsc _is_ the
fastest way to read a clock.  If it isn't reliable, then what good is
it?  It's unfortunate that Intel didn't solidify the clock usage. Yes,
use HPET, or something else, but those are slower, and may not be on all
systems.  Every system that I owned had a tsc but for critical systems
it isn't up to par (what a shame).

> 
> - while it could be turned off if the 
> > system has been idle for some time. E.g. cpufreqd could sample idle time 
> > and turn on/off idle=poll. High-performance setups could enable it all 
> > the time.
> 
> And upgrade their server air condition or issue additional ear protection
> to the desktop user? Most likely you will just drive the CPUs into
> thermal throttle at some point with that, not get more performance anyways.

Again, what would make it so hot?  It is a waste of CPU cycles, and does
waste energy that way, but does it really heat up the CPU that much?
It's just a loop.  I've run much more complex algorithms for days
without any problems.  I only once overheated a CPU and that was doing
some brute force calculations of prime numbers.

>   
> > as long as it can be done with zero-cost, i don't see why Steven's patch 
> > wouldn't be a plus for us. It's a performance thing, and having runtime 
> > switches for seamless performance features cannot be bad.
> 
> The interface is ugly and I suspect fixing the various obscure race this
> obscure feature would undoubtedly add will be a long term maintenance
> issue. And it's the wrong thing to do anyways because it just papers
> over other problems that should be fixed in the right way.

Oh come now, it's not that ugly.  And it would not produce any more
obscure race conditions than the current method of changing idle with
the acpi processor_idle module has.

But I'll agree that this is more of a paper over than a solution.  Too
bad I wasted a day writing and testing it (mostly just to learn about
kobjects and sysfs which I still feel is very clumsy).

But since I did clean up the patch, and it is still useful for those
debugging problems with timers, I'm supplying this cleaned-up version
(Thank you Andrew for the comments).

-- Steve

Ingo, would you like this for -rt? Even if it will never be accepted
into mainline.


[take 3]:

Index: linux-2.6.15-rc2-git5/arch/i386/kernel/process.c
===================================================================
--- linux-2.6.15-rc2-git5.orig/arch/i386/kernel/process.c	2005-11-28 20:31:24.000000000 -0500
+++ linux-2.6.15-rc2-git5/arch/i386/kernel/process.c	2005-11-29 07:43:52.000000000 -0500
@@ -39,6 +39,7 @@
 #include <linux/ptrace.h>
 #include <linux/random.h>
 #include <linux/kprobes.h>
+#include <linux/idle.h>
 
 #include <asm/uaccess.h>
 #include <asm/pgtable.h>
@@ -72,11 +73,6 @@
 	return ((unsigned long *)tsk->thread.esp)[3];
 }
 
-/*
- * Powermanagement idle function, if any..
- */
-void (*pm_idle)(void);
-EXPORT_SYMBOL(pm_idle);
 static DEFINE_PER_CPU(unsigned int, cpu_idle_state);
 
 void disable_hlt(void)
@@ -185,7 +181,7 @@
 				__get_cpu_var(cpu_idle_state) = 0;
 
 			rmb();
-			idle = pm_idle;
+			idle = idle_func;
 
 			if (!idle)
 				idle = default_idle;
@@ -250,6 +246,11 @@
 	}
 }
 
+static struct idle_info idle_mwait = {
+	.name = "mwait",
+	.func = mwait_idle
+};
+
 void __devinit select_idle_routine(const struct cpuinfo_x86 *c)
 {
 	if (cpu_has(c, X86_FEATURE_MWAIT)) {
@@ -258,25 +259,60 @@
 		 * Skip, if setup has overridden idle.
 		 * One CPU supports mwait => All CPUs supports mwait
 		 */
-		if (!pm_idle) {
+		register_idle(&idle_mwait);
+
+		if (!idle_func) {
 			printk("using mwait in idle threads.\n");
-			pm_idle = mwait_idle;
+			set_idle("mwait");
 		}
 	}
 }
 
+static struct idle_info idle_default = {
+	.name = "default",
+	.func = default_idle
+};
+
+static struct idle_info idle_poll = {
+	.name = "poll",
+	.func = poll_idle
+};
+
+static int __init add_idle(void)
+{
+	static int set;
+
+	if (set)
+		return 0;
+	set = 1;
+
+	register_idle(&idle_poll);
+
+	/*
+	 * Allow the user to switch out of poll_idle even
+	 * if it was a boot option.
+	 */
+	register_idle(&idle_default);
+
+	return 0;
+}
+
+arch_initcall(add_idle);
+
 static int __init idle_setup (char *str)
 {
+	add_idle();
 	if (!strncmp(str, "poll", 4)) {
 		printk("using polling idle threads.\n");
-		pm_idle = poll_idle;
+		set_idle("poll");
+
 #ifdef CONFIG_X86_SMP
 		if (smp_num_siblings > 1)
 			printk("WARNING: polling idle and HT enabled, performance may degrade.\n");
 #endif
 	} else if (!strncmp(str, "halt", 4)) {
 		printk("using halt in idle threads.\n");
-		pm_idle = default_idle;
+		set_idle("default");
 	}
 
 	boot_option_idle_override = 1;
Index: linux-2.6.15-rc2-git5/arch/x86_64/kernel/process.c
===================================================================
--- linux-2.6.15-rc2-git5.orig/arch/x86_64/kernel/process.c	2005-11-28 20:31:24.000000000 -0500
+++ linux-2.6.15-rc2-git5/arch/x86_64/kernel/process.c	2005-11-29 07:45:44.000000000 -0500
@@ -36,6 +36,8 @@
 #include <linux/utsname.h>
 #include <linux/random.h>
 #include <linux/kprobes.h>
+#include <linux/spinlock.h>
+#include <linux/idle.h>
 
 #include <asm/uaccess.h>
 #include <asm/pgtable.h>
@@ -60,10 +62,6 @@
 unsigned long boot_option_idle_override = 0;
 EXPORT_SYMBOL(boot_option_idle_override);
 
-/*
- * Powermanagement idle function, if any..
- */
-void (*pm_idle)(void);
 static DEFINE_PER_CPU(unsigned int, cpu_idle_state);
 
 void disable_hlt(void)
@@ -195,7 +193,7 @@
 				__get_cpu_var(cpu_idle_state) = 0;
 
 			rmb();
-			idle = pm_idle;
+			idle = idle_func;
 			if (!idle)
 				idle = default_idle;
 			if (cpu_is_offline(smp_processor_id()))
@@ -229,29 +227,68 @@
 	}
 }
 
+static struct idle_info idle_mwait = {
+	.name = "mwait",
+	.func = mwait_idle
+};
+
 void __cpuinit select_idle_routine(const struct cpuinfo_x86 *c)
 {
 	static int printed;
 	if (cpu_has(c, X86_FEATURE_MWAIT)) {
+		register_idle(&idle_mwait);
+
 		/*
 		 * Skip, if setup has overridden idle.
 		 * One CPU supports mwait => All CPUs supports mwait
 		 */
-		if (!pm_idle) {
+		if (!idle_func) {
 			if (!printed) {
 				printk("using mwait in idle threads.\n");
 				printed = 1;
 			}
-			pm_idle = mwait_idle;
+			set_idle("mwait");
 		}
 	}
 }
 
+static struct idle_info idle_default = {
+	.name = "default",
+	.func = default_idle
+};
+
+static struct idle_info idle_poll = {
+	.name = "poll",
+	.func = poll_idle
+};
+
+static int __init add_idle(void)
+{
+	static int set;
+
+	if (set)
+		return 0;
+	set = 1;
+
+	register_idle(&idle_poll);
+
+	/*
+	 * Allow the user to switch out of poll_idle even
+	 * if it was a boot option.
+	 */
+	register_idle(&idle_default);
+
+	return 0;
+}
+arch_initcall(add_idle);
+
 static int __init idle_setup (char *str)
 {
+	add_idle();
+
 	if (!strncmp(str, "poll", 4)) {
 		printk("using polling idle threads.\n");
-		pm_idle = poll_idle;
+		set_idle("poll");
 	}
 
 	boot_option_idle_override = 1;
Index: linux-2.6.15-rc2-git5/drivers/acpi/processor_idle.c
===================================================================
--- linux-2.6.15-rc2-git5.orig/drivers/acpi/processor_idle.c	2005-11-28 20:31:24.000000000 -0500
+++ linux-2.6.15-rc2-git5/drivers/acpi/processor_idle.c	2005-11-29 07:47:52.000000000 -0500
@@ -38,6 +38,8 @@
 #include <linux/dmi.h>
 #include <linux/moduleparam.h>
 #include <linux/sched.h>	/* need_resched() */
+#include <linux/spinlock.h>
+#include <linux/idle.h>
 
 #include <asm/io.h>
 #include <asm/uaccess.h>
@@ -56,6 +58,7 @@
 #define C3_OVERHEAD			4	/* 1us (3.579 ticks per us) */
 static void (*pm_idle_save) (void);
 module_param(max_cstate, uint, 0644);
+#define PM_IDLE_NAME "pm_idle"
 
 static unsigned int nocst = 0;
 module_param(nocst, uint, 0000);
@@ -891,13 +894,13 @@
 		return_VALUE(-ENODEV);
 
 	/* Fall back to the default idle loop */
-	pm_idle = pm_idle_save;
+	set_idle(NULL);
 	synchronize_sched();	/* Relies on interrupts forcing exit from idle. */
 
 	pr->flags.power = 0;
 	result = acpi_processor_get_power_info(pr);
 	if ((pr->flags.power == 1) && (pr->flags.power_setup_done))
-		pm_idle = acpi_processor_idle;
+		set_idle(PM_IDLE_NAME);
 
 	return_VALUE(result);
 }
@@ -983,6 +986,12 @@
 	.release = single_release,
 };
 
+static struct idle_info pm_idle_info = {
+	.name = PM_IDLE_NAME,
+	.func = acpi_processor_idle,
+	.freeze = 1
+};
+
 int acpi_processor_power_init(struct acpi_processor *pr,
 			      struct acpi_device *device)
 {
@@ -1032,8 +1041,12 @@
 		printk(")\n");
 
 		if (pr->id == 0) {
-			pm_idle_save = pm_idle;
-			pm_idle = acpi_processor_idle;
+			register_idle(&pm_idle_info);
+			/*
+			 * Just use the default idle
+			 */
+			pm_idle_save = get_idle(NULL);
+			set_idle(PM_IDLE_NAME);
 		}
 	}
 
@@ -1068,8 +1081,8 @@
 
 	/* Unregister the idle handler when processor #0 is removed. */
 	if (pr->id == 0) {
-		pm_idle = pm_idle_save;
-
+		set_idle(NULL);
+		unregister_idle(PM_IDLE_NAME);
 		/*
 		 * We are about to unload the current idle thread pm callback
 		 * (pm_idle), Wait for all processors to update cached/local
Index: linux-2.6.15-rc2-git5/include/linux/pm.h
===================================================================
--- linux-2.6.15-rc2-git5.orig/include/linux/pm.h	2005-11-28 20:31:24.000000000 -0500
+++ linux-2.6.15-rc2-git5/include/linux/pm.h	2005-11-28 20:31:47.000000000 -0500
@@ -25,6 +25,7 @@
 
 #include <linux/config.h>
 #include <linux/list.h>
+#include <linux/spinlock.h>
 #include <asm/atomic.h>
 
 /*
@@ -102,6 +103,8 @@
  */
 extern void (*pm_idle)(void);
 extern void (*pm_power_off)(void);
+extern spinlock_t pm_idle_switch_lock;
+extern int pm_idle_locked;
 
 typedef int __bitwise suspend_state_t;
 
Index: linux-2.6.15-rc2-git5/arch/x86_64/Kconfig
===================================================================
--- linux-2.6.15-rc2-git5.orig/arch/x86_64/Kconfig	2005-11-28 20:31:24.000000000 -0500
+++ linux-2.6.15-rc2-git5/arch/x86_64/Kconfig	2005-11-28 20:31:47.000000000 -0500
@@ -69,6 +69,10 @@
 	bool
 	default y
 
+config DYNAMIC_IDLE
+	bool
+	default y
+
 source "init/Kconfig"
 
 
Index: linux-2.6.15-rc2-git5/arch/x86_64/kernel/x8664_ksyms.c
===================================================================
--- linux-2.6.15-rc2-git5.orig/arch/x86_64/kernel/x8664_ksyms.c	2005-11-28 20:31:24.000000000 -0500
+++ linux-2.6.15-rc2-git5/arch/x86_64/kernel/x8664_ksyms.c	2005-11-28 20:31:47.000000000 -0500
@@ -58,7 +58,6 @@
 EXPORT_SYMBOL(disable_irq_nosync);
 EXPORT_SYMBOL(probe_irq_mask);
 EXPORT_SYMBOL(kernel_thread);
-EXPORT_SYMBOL(pm_idle);
 EXPORT_SYMBOL(pm_power_off);
 EXPORT_SYMBOL(get_cmos_time);
 
Index: linux-2.6.15-rc2-git5/include/linux/idle.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.15-rc2-git5/include/linux/idle.h	2005-11-28 20:31:47.000000000 -0500
@@ -0,0 +1,71 @@
+/*
+ *  idle.h - Registering of the idle function (for supported archs)
+ *
+ *  Copyright (C) 2005 Steven Rostedt <rostedt@goodmis.org>
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, write to the Free Software
+ *  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+#ifndef _LINUX_IDLE_H
+#define _LINUX_IDLE_H
+
+#include <linux/config.h>
+#include <linux/list.h>
+#include <linux/spinlock.h>
+#include <linux/list.h>
+#include <linux/kobject.h>
+#include <asm/atomic.h>
+
+typedef void (*idlefunc_t)(void);
+
+struct idle_info {
+	struct list_head list;
+	const char *name;	/* Name visible to users */
+	idlefunc_t func;	/* idle function to run  */
+	int freeze;		/* Only allow kernel to add or remove */
+	int inuse;		/* set when being used */
+#ifdef CONFIG_SYSFS
+	struct kobject kobj;
+#endif
+};
+
+/*
+ * Registering and unregistering functions that may be used
+ * instead of the default idle function.  This only adds
+ * them to the list of functions to be used, it does not
+ * set the idle function that is currently in use.
+ */
+extern int register_idle(struct idle_info *info);
+extern int unregister_idle(const char *name);
+
+/*
+ * This sets the idle function to the registered function
+ * by name.  Use NULL to set the idle function back to
+ * the default.
+ */
+extern int set_idle(const char *name);
+
+/*
+ * Return the function that is registered by name.
+ * Use NULL to get the default function.
+ * NULL may be returned (as that may be what the current
+ * idle function is set to, to use a default). NULL will
+ * also be returned if name is not registered.
+ */
+extern idlefunc_t get_idle(const char *name);
+
+extern idlefunc_t idle_func;
+
+#endif /* _LINUX_IDLE_H */
Index: linux-2.6.15-rc2-git5/kernel/Makefile
===================================================================
--- linux-2.6.15-rc2-git5.orig/kernel/Makefile	2005-11-28 20:31:24.000000000 -0500
+++ linux-2.6.15-rc2-git5/kernel/Makefile	2005-11-28 20:31:47.000000000 -0500
@@ -32,6 +32,7 @@
 obj-$(CONFIG_CRASH_DUMP) += crash_dump.o
 obj-$(CONFIG_SECCOMP) += seccomp.o
 obj-$(CONFIG_RCU_TORTURE_TEST) += rcutorture.o
+obj-$(CONFIG_DYNAMIC_IDLE) += idle.o
 
 ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y)
 # According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is
Index: linux-2.6.15-rc2-git5/kernel/idle.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.15-rc2-git5/kernel/idle.c	2005-11-28 20:31:47.000000000 -0500
@@ -0,0 +1,308 @@
+/*
+ *  kernel/idle.c
+ *
+ *  Setting up of the idle function to be dynamic.
+ *
+ *  Copyright (C) 2005 Steven Rostedt
+ */
+#include <linux/module.h>
+#include <linux/kobject.h>
+#include <linux/sysfs.h>
+#include <linux/spinlock.h>
+#include <linux/idle.h>
+
+idlefunc_t idle_func;
+
+static void (*idle_default)(void);
+static LIST_HEAD(idle_elements);
+static DECLARE_MUTEX(idle_sem);
+static struct idle_info *curr_idle;
+
+#ifdef CONFIG_SYSFS
+int idle_sysfs_init;
+#endif
+
+extern void poll_idle (void);
+
+static struct idle_info *__find_idle_info(const char *name)
+{
+	struct list_head *curr;
+	struct idle_info *p;
+	/*
+	 * A little inefficient, but this isn't called often.
+	 */
+	list_for_each(curr, &idle_elements) {
+		p = list_entry(curr, struct idle_info, list);
+		if (!strcmp(name, p->name))
+			break;
+	}
+	if (curr == &idle_elements)
+		p = NULL;
+
+	return p;
+}
+
+int register_idle(struct idle_info *info)
+{
+	struct idle_info *p;
+	int ret = -EEXIST;
+
+	BUG_ON(!info->name);
+
+	down(&idle_sem);
+
+	p = __find_idle_info(info->name);
+	if (p)
+		goto out;
+	ret = 0;
+
+	list_add(&info->list, &idle_elements);
+
+out:
+	up(&idle_sem);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(register_idle);
+
+int unregister_idle(const char *name)
+{
+	struct idle_info *p;
+	int ret = -EINVAL;
+
+	BUG_ON(!name);
+
+	down(&idle_sem);
+
+	p = __find_idle_info(name);
+	if (!p)
+		goto out;
+	if (p->inuse) {
+		ret = -EBUSY;
+		goto out;
+	}
+
+	ret = 0;
+
+	list_del_init(&p->list);
+
+out:
+	up(&idle_sem);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(unregister_idle);
+
+static void __set_idle(struct idle_info *info)
+{
+	if (curr_idle)
+		curr_idle->inuse--;
+	info->inuse++;
+	curr_idle = info;
+	idle_func = info->func;	/* actually switch the idle routine */
+}
+
+int set_idle(const char *name)
+{
+	struct idle_info *p;
+	int ret = 0;
+
+	down(&idle_sem);
+
+	if (!name) {
+		/* Set to the default function */
+		if (curr_idle) {
+			curr_idle->inuse--;
+			curr_idle = NULL;
+		}
+		idle_func = idle_default;
+		goto out;
+	}
+
+	ret = -EINVAL;
+	p = __find_idle_info(name);
+	if (!p)
+		goto out;
+	__set_idle(p);
+	ret = 0;	/* found and switched; do not leak -EINVAL */
+out:
+	up(&idle_sem);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(set_idle);
+
+idlefunc_t get_idle(const char *name)
+{
+	struct idle_info *p;
+	idlefunc_t ret = idle_default;
+
+	down(&idle_sem);
+
+	if (!name)
+		goto out;
+
+	p = __find_idle_info(name);
+	if (!p)
+		goto out;
+
+	ret = p->func;
+out:
+	up(&idle_sem);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(get_idle);
+
+#ifdef CONFIG_SYSFS
+#define KERNEL_ATTR_RW(_name) \
+static struct subsys_attribute _name##_attr = \
+	__ATTR(_name, 0644, _name##_show, _name##_store)
+
+static struct idlep_kobject
+{
+	struct kobject kobj;
+} idle_kobj;
+
+static ssize_t idle_ctrl_show(struct subsystem *subsys, char *page)
+{
+	ssize_t ret;
+	char *star = "";
+	const char *name = "default";
+
+	down(&idle_sem);
+	if (curr_idle) {
+		name = curr_idle->name;
+		if (curr_idle->freeze)
+			star = "*";
+	}
+	ret = sprintf(page, "%s%s\n", star, name);
+	up(&idle_sem);
+
+	return ret;
+}
+
+static ssize_t idle_ctrl_store(struct subsystem *subsys,
+			       const char *buf, size_t len)
+{
+	struct list_head *curr;
+	struct idle_info *p;
+	ssize_t ret = -EBUSY;
+
+	down(&idle_sem);
+
+	if (curr_idle && curr_idle->freeze)
+		goto out;
+
+	list_for_each(curr, &idle_elements) {
+		int size;
+		p = list_entry(curr, struct idle_info, list);
+
+		size = strlen(p->name);
+		if (len <= size)
+			continue;
+		if (!strncmp(p->name, buf, size))
+			break;
+	}
+	if (curr == &idle_elements) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	/*
+	 * This idle routine may have been registered to
+	 * not allow users to add or remove this.
+	 */
+	if (p->freeze)
+		goto out;
+
+	__set_idle(p);
+
+	ret = len;
+out:
+	up(&idle_sem);
+
+	return ret;
+}
+
+KERNEL_ATTR_RW(idle_ctrl);
+
+static ssize_t idle_methods_show(struct subsystem *subsys, char *page)
+{
+	struct list_head *curr;
+	struct idle_info *p;
+	ssize_t len = 0;
+
+	down(&idle_sem);
+	list_for_each(curr, &idle_elements) {
+		p = list_entry(curr, struct idle_info, list);
+		if (len + 3 + strlen(p->name) >= PAGE_SIZE) {
+			printk("idle functions overflowed sysfs??\n");
+			break;
+		}
+		len += sprintf(page+len, "%s%s%s",
+			       len ? " " : "",
+			       p->freeze ? "*" : "",
+			       p->name);
+	}
+	if (len + 2 < PAGE_SIZE)
+		len += sprintf(page+len, "\n");
+
+	up(&idle_sem);
+	return len;
+}
+
+static ssize_t idle_methods_store(struct subsystem *subsys,
+				  const char *buf, size_t len)
+{
+	/* do nothing */
+	return len;
+}
+
+KERNEL_ATTR_RW(idle_methods);
+
+static struct attribute * idle_attrs[] = {
+	&idle_ctrl_attr.attr,
+	&idle_methods_attr.attr,
+	NULL
+};
+
+static struct attribute_group idle_attr_group = {
+	.attrs = idle_attrs,
+};
+
+static int __init idle_setup_sysfs(void)
+{
+	int err;
+
+	memset(&idle_kobj, 0, sizeof(idle_kobj));
+	err = kobject_set_name(&idle_kobj.kobj, "%s", "idle");
+	if (err)
+		goto out;
+
+	kobj_set_kset_s(&idle_kobj, kernel_subsys);
+
+	idle_kobj.kobj.parent = &kernel_subsys.kset.kobj;
+	err = kobject_register(&idle_kobj.kobj);
+	if (err)
+		goto out;
+
+	err = sysfs_create_group(&idle_kobj.kobj,
+				 &idle_attr_group);
+	if (err)
+		goto out;
+
+	return 0;
+out:
+	printk(KERN_INFO "Problem setting up sysfs idle_ctrl\n");
+	return 0;
+}
+#endif /* CONFIG_SYSFS */
+
+static int __init idle_setup(void)
+{
+	idle_default = idle_func;
+
+#ifdef CONFIG_SYSFS
+	idle_setup_sysfs();
+#endif
+	return 0;
+}
+
+late_initcall(idle_setup);
Index: linux-2.6.15-rc2-git5/arch/i386/Kconfig
===================================================================
--- linux-2.6.15-rc2-git5.orig/arch/i386/Kconfig	2005-11-28 20:31:24.000000000 -0500
+++ linux-2.6.15-rc2-git5/arch/i386/Kconfig	2005-11-28 20:31:47.000000000 -0500
@@ -45,6 +45,10 @@
 	bool
 	default y
 
+config DYNAMIC_IDLE
+	bool
+	default y
+
 source "init/Kconfig"
 
 menu "Processor type and features"
Index: linux-2.6.15-rc2-git5/arch/i386/kernel/apm.c
===================================================================
--- linux-2.6.15-rc2-git5.orig/arch/i386/kernel/apm.c	2005-11-28 20:31:24.000000000 -0500
+++ linux-2.6.15-rc2-git5/arch/i386/kernel/apm.c	2005-11-28 20:31:47.000000000 -0500
@@ -225,6 +225,7 @@
 #include <linux/smp_lock.h>
 #include <linux/dmi.h>
 #include <linux/suspend.h>
+#include <linux/idle.h>
 
 #include <asm/system.h>
 #include <asm/uaccess.h>
@@ -2220,6 +2221,9 @@
 	{ }
 };
 
+static struct idle_info apm_idle;
+#define APM_IDLE_NAME "apm"
+
 /*
  * Just start the APM thread. We do NOT want to do APM BIOS
  * calls from anything but the APM thread, if for no other reason
@@ -2373,8 +2377,14 @@
 	if (HZ != 100)
 		idle_period = (idle_period * HZ) / 100;
 	if (idle_threshold < 100) {
-		original_pm_idle = pm_idle;
-		pm_idle  = apm_cpu_idle;
+		memset(&apm_idle, 0, sizeof(apm_idle));
+		apm_idle.name = APM_IDLE_NAME;
+		apm_idle.func = apm_cpu_idle;
+		apm_idle.freeze = 1;
+		register_idle(&apm_idle);
+
+		original_pm_idle = get_idle(NULL);
+		set_idle(APM_IDLE_NAME);
 		set_pm_idle = 1;
 	}
 
@@ -2386,7 +2396,26 @@
 	int	error;
 
 	if (set_pm_idle) {
-		pm_idle = original_pm_idle;
+		int tries = 0;
+		int ret;
+		set_idle(NULL);
+		do {
+			if ((ret = unregister_idle(APM_IDLE_NAME)) == 0)
+				break;
+			/*
+			 * for some reason the idle function is being used.
+			 * Wait a little and then try again.
+			 */
+			if (ret == -EINVAL) {
+				printk(KERN_WARNING
+				       "APM idle function never registered?\n");
+				break;
+			}
+			yield();
+		} while (tries++ < 10);
+		if (tries > 10)
+			printk(KERN_WARNING
+			       "Unable to unregister APM idle function\n");
 		/*
 		 * We are about to unload the current idle thread pm callback
 		 * (pm_idle), Wait for all processors to update cached/local
Index: linux-2.6.15-rc2-git5/arch/ia64/Kconfig
===================================================================
--- linux-2.6.15-rc2-git5.orig/arch/ia64/Kconfig	2005-11-28 20:31:24.000000000 -0500
+++ linux-2.6.15-rc2-git5/arch/ia64/Kconfig	2005-11-28 20:31:47.000000000 -0500
@@ -62,6 +62,10 @@
 	bool
 	default y
 
+config DYNAMIC_IDLE
+	bool
+	default y
+
 choice
 	prompt "System type"
 	default IA64_GENERIC
Index: linux-2.6.15-rc2-git5/arch/ia64/kernel/acpi.c
===================================================================
--- linux-2.6.15-rc2-git5.orig/arch/ia64/kernel/acpi.c	2005-11-28 20:31:24.000000000 -0500
+++ linux-2.6.15-rc2-git5/arch/ia64/kernel/acpi.c	2005-11-28 20:31:47.000000000 -0500
@@ -60,8 +60,6 @@
 
 #define PREFIX			"ACPI: "
 
-void (*pm_idle) (void);
-EXPORT_SYMBOL(pm_idle);
 void (*pm_power_off) (void);
 EXPORT_SYMBOL(pm_power_off);
 
Index: linux-2.6.15-rc2-git5/arch/ia64/kernel/process.c
===================================================================
--- linux-2.6.15-rc2-git5.orig/arch/ia64/kernel/process.c	2005-11-28 20:31:24.000000000 -0500
+++ linux-2.6.15-rc2-git5/arch/ia64/kernel/process.c	2005-11-28 20:31:47.000000000 -0500
@@ -31,6 +31,7 @@
 #include <linux/interrupt.h>
 #include <linux/delay.h>
 #include <linux/kprobes.h>
+#include <linux/idle.h>
 
 #include <asm/cpu.h>
 #include <asm/delay.h>
@@ -289,7 +290,7 @@
 			if (mark_idle)
 				(*mark_idle)(1);
 
-			idle = pm_idle;
+			idle = idle_func;
 			if (!idle)
 				idle = default_idle;
 			(*idle)();
Index: linux-2.6.15-rc2-git5/arch/ia64/kernel/setup.c
===================================================================
--- linux-2.6.15-rc2-git5.orig/arch/ia64/kernel/setup.c	2005-11-28 20:31:24.000000000 -0500
+++ linux-2.6.15-rc2-git5/arch/ia64/kernel/setup.c	2005-11-29 07:46:59.000000000 -0500
@@ -43,6 +43,7 @@
 #include <linux/initrd.h>
 #include <linux/platform.h>
 #include <linux/pm.h>
+#include <linux/idle.h>
 
 #include <asm/ia32.h>
 #include <asm/machvec.h>
@@ -738,6 +739,11 @@
 		ia64_max_cacheline_size = max;
 }
 
+struct idle_info idle_default = {
+	.name = "default",
+	.func = default_idle
+};
+
 /*
  * cpu_init() initializes state that is per-CPU.  This function acts
  * as a 'CPU state barrier', nothing should get across.
@@ -861,7 +867,10 @@
 	/* size of physical stacked register partition plus 8 bytes: */
 	__get_cpu_var(ia64_phys_stacked_size_p8) = num_phys_stacked*8 + 8;
 	platform_cpu_init();
-	pm_idle = default_idle;
+
+	register_idle(&idle_default);
+
+	set_idle("default");
 }
 
 void




* Re: [RFC][PATCH] Runtime switching of the idle function [take 2]
  2005-11-29  4:22                         ` john stultz
@ 2005-11-29 14:22                           ` Steven Rostedt
  0 siblings, 0 replies; 56+ messages in thread
From: Steven Rostedt @ 2005-11-29 14:22 UTC (permalink / raw)
  To: john stultz
  Cc: Andrew Morton, mingo, acpi-devel, len.brown, nando, rlrevell,
	linux-kernel, paulmck, kr, tglx, pluto, john.cooper, bene,
	dwalker, trini, george

On Mon, 2005-11-28 at 20:22 -0800, john stultz wrote:
> On Mon, 2005-11-28 at 22:42 -0500, Steven Rostedt wrote:
> > On Mon, 2005-11-28 at 19:02 -0800, Andrew Morton wrote:
> > > Steven Rostedt <rostedt@goodmis.org> wrote:
> > > >
> > > > This patch creates a directory in /sys/kernel called idle.
> > > >
> > > 
> > > At no point do you appear to explain _why_ the kernel needs this feature?
> > 
> > Sorry about that.  This originally came up when we had problems with the
> > AMD64 x2 in the -rt patch.  It was noted that the TSCs would get very
> > far out of sync and cause problems.  The way to solve this was to set
> > idle=poll.  The original patch I sent was to allow the user to change to
> > idle=poll dynamically.  This way they could switch to the poll_idle and
> > run their tests (requiring tsc not to drift) and then switch back to the
> > default idle to save on electricity.
> 
> The problem with this is that this must be a one way transition. That
> is, once the TSCs have become unsynchronized, there is no use going back
> to using the polling idle unless you add some code to re-sync the TSCs
> which would be ugly to do after the system has booted.
> 

I've thought about that too.  But this patch does allow you to start
with idle=poll and then switch back.  Also, if you do lock to a cpu,
you don't need to worry about the tsc slipping if you switch to
idle=poll.
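
For illustration only, a minimal userspace sketch of "locking to a cpu"
as mentioned above, assuming Linux with glibc.  The helper name
pin_to_cpu() is made up for this example; sched_setaffinity() is the
standard call it wraps.

```c
#define _GNU_SOURCE
#include <sched.h>

/*
 * Pin the calling thread to a single CPU so that successive TSC reads
 * all come from the same core.  pin_to_cpu() is a hypothetical helper
 * name, not something from the patch.
 * Returns 0 on success, -1 on a bad CPU number or a failed syscall.
 */
static int pin_to_cpu(int cpu)
{
	cpu_set_t set;

	if (cpu < 0 || cpu >= CPU_SETSIZE)
		return -1;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);

	/* pid 0 means "the calling thread" */
	return sched_setaffinity(0, sizeof(set), &set);
}
```

A test program would call pin_to_cpu(0) (or whichever cpu it wants)
before taking its first timestamp and bail out if it returns -1.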

-- Steve

> Using idle=poll (for anything other than debugging) is really a worst
> case workaround for systems that do not have alternative clocksources
> like ACPI PM or HPET.
> 
> It's an interesting bit of code, but I'm not really sure I understand its
> usefulness.
> 
> thanks
> -john
> 
> 
> 



* Re: [RFC][PATCH] Runtime switching of the idle function [take 2]
  2005-11-29 14:19                               ` Steven Rostedt
@ 2005-11-29 14:50                                 ` Andi Kleen
  2005-11-29 15:42                                   ` Steven Rostedt
  0 siblings, 1 reply; 56+ messages in thread
From: Andi Kleen @ 2005-11-29 14:50 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Andi Kleen, Ingo Molnar, acpi-devel, len.brown, nando, rlrevell,
	linux-kernel, paulmck, kr, tglx, pluto, john.cooper, bene,
	dwalker, trini, george, akpm

On Tue, Nov 29, 2005 at 09:19:31AM -0500, Steven Rostedt wrote:
> > And in practice the CPU will run so hot that only benchmarkers like it.
> 
> Why would it run hot?  What's the difference between polling and doing
> other things.  How many transistors does it take to poll?

It will prevent the CPU from going into sleep states and essentially
keep most of it enabled.  

> 
> > 
> > I think switching idle is the wrong thing to do. We should rather
> > fix the various problems.
> > 
> > For fixing the TSC issue it is 100% the wrong approach Imho.
> 
> I would only say 80% the wrong approach, but that's me ;-)
> 
> > Basically software has to live with TSCs being unsynchronized
> > and gettimeofday should do the right thing (and if not it should be fixed)
> 
> I guess the biggest complaint most have is that the rdtsc _is_ the
> fastest way to read a clock.  If it isn't reliable, then what good is

It's the fastest way to read something which needs quite complex
knowledge to turn into a reliable clock value. In general only
the kernel has this knowledge. 

And gettimeofday is optimized to give you the fastest reliable
clock. 

> it?  It's unfortunate that Intel didn't solidify the clock usage. Yes,
> use HPET, or something else, but those are slower, and may not be on all
> systems.  Every system that I owned had a tsc but for critical systems
> it isn't up to par (what a shame).

Just use gettimeofday. It shields you from all that and when
the hardware supports it, it is quite fast too.

> > > system has been idle for some time. E.g. cpufreqd could sample idle time 
> > > and turn on/off idle=poll. High-performance setups could enable it all 
> > > the time.
> > 
> > And upgrade their server air conditioning or issue additional ear protection
> > to the desktop user? Most likely you will just drive the CPUs into
> > thermal throttle at some point with that, not get more performance anyways.
> 
> Again, what would make it so hot?  It is a waste of CPU cycles, and does
> waste energy that way, but does it really heat up the CPU that much?

Yes it does.

-Andi


* Re: [RFC][PATCH] Runtime switching of the idle function [take 2]
  2005-11-29 14:50                                 ` Andi Kleen
@ 2005-11-29 15:42                                   ` Steven Rostedt
  0 siblings, 0 replies; 56+ messages in thread
From: Steven Rostedt @ 2005-11-29 15:42 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Ingo Molnar, acpi-devel, len.brown, nando, rlrevell, linux-kernel,
	paulmck, kr, tglx, pluto, john.cooper, bene, dwalker, trini,
	george, akpm

On Tue, 2005-11-29 at 15:50 +0100, Andi Kleen wrote:
> On Tue, Nov 29, 2005 at 09:19:31AM -0500, Steven Rostedt wrote:
> > > And in practice the CPU will run so hot that only benchmarkers like it.
> > 
> > Why would it run hot?  What's the difference between polling and doing
> > other things.  How many transistors does it take to poll?
> 
> It will prevent the CPU from going into sleep states and essentially
> keep most of it enabled.  

Well, there's one thing that my patch _does_ help with.  (And it has
just helped me now).  If you boot up with idle=poll and forget about it,
you can check what idle routine is being used and switch out of poll
without rebooting. (like I'm doing right now :-)
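
A small sketch of that check from a C program, assuming the patch is
applied and the file lives at /sys/kernel/idle/idle_ctrl as described
in the patch posting.  The helper names parse_idle_ctrl() and
read_idle_ctrl() are made up here; the line format (optional '*' for a
frozen routine, then the name, then a newline) follows
idle_ctrl_show() in the patch.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one line in the format printed by idle_ctrl_show():
 * an optional '*' (routine is frozen), the name, a newline.
 * Copies the name into 'name'; returns 1 if frozen, 0 otherwise.
 */
static int parse_idle_ctrl(const char *line, char *name, size_t len)
{
	int frozen = (line[0] == '*');
	size_t n;

	if (len == 0)
		return frozen;
	if (frozen)
		line++;

	n = strcspn(line, "\n");
	if (n >= len)
		n = len - 1;
	memcpy(name, line, n);
	name[n] = '\0';

	return frozen;
}

/*
 * Read the current idle routine from 'path'.  Returns -1 if the file
 * is missing (e.g. a kernel without this patch), else the frozen flag.
 */
static int read_idle_ctrl(const char *path, char *name, size_t len)
{
	char line[128];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(line, sizeof(line), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	return parse_idle_ctrl(line, name, len);
}
```

Switching is then a write of one of the registered names to the same
file.  As idle_ctrl_store() is written in the patch, the written length
must exceed the name length, so the trailing newline that echo sends
is needed.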

-- Steve



* Re: [RFC][PATCH] Runtime switching of the idle function [take 2]
  2005-11-29  6:44                           ` Ingo Molnar
  2005-11-29  6:55                             ` Nick Piggin
@ 2005-11-29 18:05                             ` Andi Kleen
  2005-11-29 14:19                               ` Steven Rostedt
  2005-12-02  1:27                               ` Max Krasnyansky
  1 sibling, 2 replies; 56+ messages in thread
From: Andi Kleen @ 2005-11-29 18:05 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Steven Rostedt, acpi-devel, len.brown, nando, rlrevell,
	linux-kernel, paulmck, kr, tglx, pluto, john.cooper, bene,
	dwalker, trini, george, akpm

Ingo Molnar <mingo@elte.hu> writes:

> * Andrew Morton <akpm@osdl.org> wrote:
> 
> > > The way to solve this was to set
> > >  idle=poll.  The original patch I sent was to allow the user to change to
> > >  idle=poll dynamically.  This way they could switch to the poll_idle and
> > >  run their tests (requiring tsc not to drift) and then switch back to the
> > >  default idle to save on electricity.
> > 
> > Use gettimeofday()?
> > 
> > If it's just for some sort of instrumentation, run NR_CPUS instances 
> > of a niced-down busyloop, pin each one to a different CPU?  That way 
> > the idle function doesn't get called at all..
> 
> idle=poll is also frequently done for performance reasons [it reduces 
> idle wakeup latency by 10 usecs] 

And it's obsolete on CPUs with monitor/mwait.
And in practice the CPU will run so hot that only benchmarkers like it.

I think switching idle is the wrong thing to do. We should rather
fix the various problems.

For fixing the TSC issue it is 100% the wrong approach Imho.
Basically software has to live with TSCs being unsynchronized
and gettimeofday should do the right thing (and if not it should be fixed)

> - while it could be turned off if the 
> system has been idle for some time. E.g. cpufreqd could sample idle time 
> and turn on/off idle=poll. High-performance setups could enable it all 
> the time.

And upgrade their server air conditioning or issue additional ear protection
to the desktop user? Most likely you will just drive the CPUs into
thermal throttle at some point with that, not get more performance anyways.
  
> as long as it can be done with zero-cost, i dont see why Steven's patch 
> wouldnt be a plus for us. It's a performance thing, and having runtime 
> switches for seamless performance features cannot be bad.

The interface is ugly and I suspect fixing the various obscure races this
obscure feature would undoubtedly add will be a long term maintenance
issue. And it's the wrong thing to do anyways because it just papers
over other problems that should be fixed in the right way.

-Andi


* Re: [RFC][PATCH] Runtime switching of the idle function [take 2]
  2005-11-29 18:05                             ` Andi Kleen
  2005-11-29 14:19                               ` Steven Rostedt
@ 2005-12-02  1:27                               ` Max Krasnyansky
  2005-12-02  1:45                                 ` Andi Kleen
  1 sibling, 1 reply; 56+ messages in thread
From: Max Krasnyansky @ 2005-12-02  1:27 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Ingo Molnar, Steven Rostedt, acpi-devel, len.brown, nando,
	rlrevell, linux-kernel, paulmck, kr, tglx, pluto, john.cooper,
	bene, dwalker, trini, george, akpm

Andi Kleen wrote:
> Ingo Molnar <mingo@elte.hu> writes:
>>> If it's just for some sort of instrumentation, run NR_CPUS instances 
>>> of a niced-down busyloop, pin each one to a different CPU?  That way 
>>> the idle function doesn't get called at all..
>> idle=poll is also frequently done for performance reasons [it reduces 
>> idle wakeup latency by 10 usecs] 
> 
> And it's obsolete on CPUs with monitor/mwait.
There are some platforms, for example IBM ZPro Xeon based machines, where
monitor/mwait seems to trigger some kind of SMM and introduce horrible latencies.
With idle=poll ZPros show pretty good worst case latencies, on the order of 10usec
(tested with RTAI/Fusion). With default idle (ie mwait) even average latency is in
hundreds of milliseconds.
You might argue that it's a bug in their HW design or something but as it stands
today I wouldn't say that monitor/mwait obsoletes idle=poll.

Also IMO saying that CPU will run too hot with idle=poll is basically saying that those
CPUs cannot be used for simulations and stuff which run flat out for days (months actually).
Which is obviously not true (again speaking from experience :)).

Max


* Re: [RFC][PATCH] Runtime switching of the idle function [take 2]
  2005-12-02  1:27                               ` Max Krasnyansky
@ 2005-12-02  1:45                                 ` Andi Kleen
  2005-12-03  2:17                                   ` Max Krasnyansky
  0 siblings, 1 reply; 56+ messages in thread
From: Andi Kleen @ 2005-12-02  1:45 UTC (permalink / raw)
  To: Max Krasnyansky
  Cc: Andi Kleen, Ingo Molnar, Steven Rostedt, acpi-devel, len.brown,
	nando, rlrevell, linux-kernel, paulmck, kr, tglx, pluto,
	john.cooper, bene, dwalker, trini, george, akpm

> Also IMO saying that CPU will run too hot with idle=poll is basically 
> saying that those
> CPUs cannot be used for simulations and stuff which run flat out for days 
> (months actually).
> Which is obviously not true (again speaking from experience :)).

The CPUs can be used, but in many setups
(AirCon in complete data centers, cooling in Blade Racks, laptops)
the cooling is now often designed not to handle
the maximum thermal output of all systems in parallel, but instead
to throttle the systems when things get too hot. This usually
works because in most workloads systems are more often idle
than busy, so no throttling is needed.

On desktops it probably won't throttle, but just become noisy
when all the fans spin up.

All things you don't really want.

Super computing is different of course, but even there maximum
capacity of the air conditioning often limits how many CPUs you can buy.
And you need all the help you can get.

That said you're right that there is still a small niche 
where idle=poll makes sense, but it's definitely nothing
that should be encouraged to be used regularly like that
original patch would.

-Andi



* Re: [RFC][PATCH] Runtime switching of the idle function [take 2]
  2005-12-02  1:45                                 ` Andi Kleen
@ 2005-12-03  2:17                                   ` Max Krasnyansky
  0 siblings, 0 replies; 56+ messages in thread
From: Max Krasnyansky @ 2005-12-03  2:17 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Ingo Molnar, Steven Rostedt, acpi-devel, len.brown, nando,
	rlrevell, linux-kernel, paulmck, kr, tglx, pluto, john.cooper,
	bene, dwalker, trini, george, akpm

Andi Kleen wrote:
>> Also IMO saying that CPU will run too hot with idle=poll is basically 
>> saying that those
>> CPUs cannot be used for simulations and stuff which run flat out for days 
>> (months actually).
>> Which is obviously not true (again speaking from experience :)).
> 
> The CPUs can be used, but in many setups
> (AirCon in complete data centers, cooling in Blade Racks, laptops)
> the cooling is now often designed not to handle
> the maximum thermal output of all systems in parallel, but instead
> to throttle the systems when things get too hot. This usually
> works because in most workloads systems are more often idle
> than busy, so no throttling is needed.
> 
> On desktops it probably won't throttle, but just become noisy
> when all the fans spin up.
> 
> All things you don't really want.
We do it (simulations that is) on normal 1U and desktop machines. No special
cooling and stuff. And it does not cause any problems. Granted we don't use
cheap/crappy machines but still it's unmodified off-the-shelf HW.

btw That ZPro machine that I mentioned used to run with idle=poll for weeks
and fans would never spin up unless you put real load on it.

> Super computing is different of course, but even there maximum
> capacity of the air condition often limits how many CPUs you can buy.
> And you need all the help you can get.
> 
> That said you're right that there is still a small niche 
> where idle=poll makes sense, but it's definitely nothing
> that should be encouraged to be used regularly like that
> original patch would.
Agreed.

Max


* Re: [RFC][PATCH] Runtime switching of the idle function [take 2]
  2005-11-29 13:08                     ` Pavel Machek
@ 2005-12-18 15:26                       ` Steven Rostedt
  0 siblings, 0 replies; 56+ messages in thread
From: Steven Rostedt @ 2005-12-18 15:26 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Ingo Molnar, acpi-devel, len.brown, Andrew Morton,
	Fernando Lopez-Lezcano, Lee Revell, linux-kernel,
	Paul E. McKenney, K.R. Foley, Thomas Gleixner, pluto, john cooper,
	Benedikt Spranger, Daniel Walker, Tom Rini, George Anzinger


On Tue, 29 Nov 2005, Pavel Machek wrote:

> Hi!
>
> > Description:
> >
> > This patch creates a directory in /sys/kernel called idle.  This
> > directory contains two files: idle_ctrl and idle_methods.  Reading
> > idle_ctrl will show the function that is currently being used for idle,
> > and idle_methods shows the available methods for the user to write
> > into idle_ctrl to change which function to use for idle.
>
> Pretty ugly interface, I'd say... is listing function really necessary?
>

What interface would you prefer?  And the listing was a feature request
made by Ingo.
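
For reference, the listing format is simple enough to consume from a
program.  A sketch, assuming the format produced by
idle_methods_show() in the patch (names separated by single spaces,
'*' in front of frozen entries); count_selectable() is a made-up
helper name and any input line used with it here is invented.

```c
#include <stdio.h>
#include <string.h>

/*
 * Walk one idle_methods line, print each entry, and return how many
 * entries are not frozen (i.e. could be written to idle_ctrl by a
 * user).  Lines longer than the local buffer are truncated.
 */
static int count_selectable(const char *line)
{
	char copy[256];
	char *tok;
	int selectable = 0;

	strncpy(copy, line, sizeof(copy) - 1);
	copy[sizeof(copy) - 1] = '\0';

	for (tok = strtok(copy, " \n"); tok; tok = strtok(NULL, " \n")) {
		if (tok[0] == '*') {
			/* frozen: only the kernel may select or drop it */
			printf("%s (frozen)\n", tok + 1);
		} else {
			printf("%s\n", tok);
			selectable++;
		}
	}

	return selectable;
}
```

Even for a non-frozen entry, a write to idle_ctrl is still refused with
-EBUSY while the currently active routine is itself frozen.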

But this is pretty much moot, since the patch is not going any further
than the RT patch. And even then, it probably is only temporary, if it is
even still in there (I haven't checked).

--Steve



end of thread, other threads:[~2005-12-18 15:26 UTC | newest]

Thread overview: 56+ messages
2005-11-15  9:08 2.6.14-rt13 Ingo Molnar
2005-11-15 16:36 ` 2.6.14-rt13 Mark Knecht
2005-11-15 19:57   ` 2.6.14-rt13 Paul E. McKenney
2005-11-16  3:48 ` 2.6.14-rt13 K.R. Foley
2005-11-16  8:40   ` 2.6.14-rt13 Ingo Molnar
2005-11-16 17:02     ` 2.6.14-rt13 Paul E. McKenney
2005-11-18 18:02 ` 2.6.14-rt13 Fernando Lopez-Lezcano
2005-11-18 21:54   ` 2.6.14-rt13 Lee Revell
2005-11-18 22:05     ` 2.6.14-rt13 Fernando Lopez-Lezcano
2005-11-18 22:07       ` 2.6.14-rt13 Ingo Molnar
2005-11-18 22:15         ` 2.6.14-rt13 Lee Revell
2005-11-18 22:25           ` 2.6.14-rt13 Steven Rostedt
2005-11-18 23:36             ` 2.6.14-rt13 Fernando Lopez-Lezcano
2005-11-18 23:57               ` 2.6.14-rt13 Steven Rostedt
2005-11-18 22:41         ` 2.6.14-rt13 Fernando Lopez-Lezcano
2005-11-19  2:39           ` 2.6.14-rt13 Steven Rostedt
2005-11-24 15:07             ` 2.6.14-rt13 Ingo Molnar
2005-11-24 15:21               ` 2.6.14-rt13 Steven Rostedt
2005-11-25 20:56               ` [RFC][PATCH] Runtime switching to idle_poll (was: Re: 2.6.14-rt13) Steven Rostedt
2005-11-26 13:05                 ` Ingo Molnar
2005-11-29  2:48                   ` [RFC][PATCH] Runtime switching of the idle function [take 2] Steven Rostedt
2005-11-29  3:02                     ` Andrew Morton
2005-11-29  3:42                       ` Steven Rostedt
2005-11-29  4:01                         ` Andrew Morton
2005-11-29  6:44                           ` Ingo Molnar
2005-11-29  6:55                             ` Nick Piggin
2005-11-29 18:05                             ` Andi Kleen
2005-11-29 14:19                               ` Steven Rostedt
2005-11-29 14:50                                 ` Andi Kleen
2005-11-29 15:42                                   ` Steven Rostedt
2005-12-02  1:27                               ` Max Krasnyansky
2005-12-02  1:45                                 ` Andi Kleen
2005-12-03  2:17                                   ` Max Krasnyansky
2005-11-29  4:22                         ` john stultz
2005-11-29 14:22                           ` Steven Rostedt
2005-11-29 13:08                     ` Pavel Machek
2005-12-18 15:26                       ` Steven Rostedt
2005-11-18 22:13       ` 2.6.14-rt13 Lee Revell
2005-11-18 22:32         ` 2.6.14-rt13 Vojtech Pavlik
2005-11-19  2:28           ` 2.6.14-rt13 George Anzinger
2005-11-19  7:45             ` 2.6.14-rt13 Vojtech Pavlik
2005-11-19 18:27               ` 2.6.14-rt13 Lee Revell
2005-11-21 21:32 ` 2.6.14-rt13 Fernando Lopez-Lezcano
2005-11-21 21:41   ` 2.6.14-rt13 john stultz
     [not found]   ` <20051121221511.GA7255@elte.hu>
2005-11-21 22:19     ` test time-warps [was: Re: 2.6.14-rt13] Ingo Molnar
2005-11-21 23:08       ` Fernando Lopez-Lezcano
2005-11-21 23:38       ` Fernando Lopez-Lezcano
2005-11-21 23:41       ` john stultz
2005-11-22  1:31         ` Lee Revell
2005-11-22  1:15       ` Steven Rostedt
2005-11-22 11:16         ` Ingo Molnar
2005-11-22 17:49           ` Fernando Lopez-Lezcano
2005-11-22 18:01             ` Christopher Friesen
2005-11-22 18:22               ` Steven Rostedt
2005-11-22 20:52                 ` Ingo Molnar
2005-11-22 11:19   ` 2.6.14-rt13 Ingo Molnar
