* gradual timeofday overhaul
From: Tim Schmielau @ 2004-10-20 3:05 UTC (permalink / raw)
To: john stultz; +Cc: lkml, george anzinger
On Tue, 19 Oct 2004, john stultz wrote:
> As for the timeofday overhaul, I've had zero time to work on it
> recently. I hate that I dropped code and then went missing for weeks.
> I'll have to see if I can get a few cycles at home to sync up my current
> tree and send it out.
I still haven't looked at your code and its discussion. From what I
remember, I liked your proposal very much. It's surely where we want to
end up someday. But from the above mail it strikes me that we just don't
have enough manpower to get there all at once, so we should have a plan
for the time code to gradually evolve into what we finally want. I think
we could do it in the following steps (a rough sketch of step 1 follows
the list):
1. Sync up jiffies with the monotonic clock, very much like we
already handle lost ticks. This would immediately remove the
hassles with incompatible time sources.
Judging from the jiffies wrap experience, there are probably
some drivers that need fixing (mostly because they wait until
jiffies == something), but those are already bugs right now
in the case of lost ticks.
2. Decouple jiffies from the actual interrupt counter. We could
then e.g. set HZ to 10000, also increasing the resolution of
timers, without increasing the interrupt frequency.
We'd then need to identify the places where this might lead to
overflows and promote them to use jiffies_64 instead of jiffies
(where this hasn't been done already).
3. Increase HZ all the way up to 1e9. jiffies_64 would then be the
same as your plain 64 bit nanoseconds value.
This would require an optimization to the timer code to be able
to increment jiffies in steps larger than 1.
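Roughly, step 1 might look like the following (untested sketch, assuming
a monotonic_clock() that returns nanoseconds since boot; names and the
plain 64-bit division are illustrative, not patch-ready code):

        /*
         * Derive jiffies from the monotonic clock instead of counting
         * interrupts, in the spirit of the existing lost-tick logic.
         */
        static u64 last_update_ns;              /* time of last jiffies update */

        static void sync_jiffies(void)
        {
                u64 now = monotonic_clock();    /* ns since boot (assumed) */
                u64 ticks = (now - last_update_ns) / (NSEC_PER_SEC / HZ);

                if (ticks) {                    /* may be > 1 if ticks were lost */
                        jiffies_64 += ticks;
                        last_update_ns += ticks * (NSEC_PER_SEC / HZ);
                }
        }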
Thoughts?
* Re: gradual timeofday overhaul
From: Len Brown @ 2004-10-20 7:47 UTC (permalink / raw)
To: Tim Schmielau; +Cc: john stultz, lkml, george anzinger
On Tue, 2004-10-19 at 23:05, Tim Schmielau wrote:
> I think we could do it in the following steps:
>
> 1. Sync up jiffies with the monotonic clock,...
> 2. Decouple jiffies from the actual interrupt counter...
> 3. Increase HZ all the way up to 1e9....
> Thoughts?
Yes, for long periods of idle, I'd like to see the periodic clock tick
disabled entirely. Clock ticks cause the hardware to exit power-saving
idle states.
The current design with HZ=1000 gives us 1ms = 1000usec between clock
ticks. But some platforms take nearly that long just to enter/exit low
power states, which means that on Linux the hardware pays a long idle
state exit latency (performance hit) but gets little or no power savings
from the time it resides in that idle state.
thanks,
-Len
* Re: gradual timeofday overhaul
From: George Anzinger @ 2004-10-20 15:09 UTC (permalink / raw)
To: Len Brown; +Cc: Tim Schmielau, john stultz, lkml
Len Brown wrote:
> On Tue, 2004-10-19 at 23:05, Tim Schmielau wrote:
>
>>I think we could do it in the following steps:
>>
>> 1. Sync up jiffies with the monotonic clock,...
>> 2. Decouple jiffies from the actual interrupt counter...
>> 3. Increase HZ all the way up to 1e9....
Before we do any of the above, I think we need to stop and ponder just what a
"jiffie" is. Currently it is, by default (or historically), the "basic tick" of
the system clock. On top of this a lot of interpolation code has been "grafted"
on to allow the system to resolve time to finer levels, down to the nanosecond.
But none of this interpolation code actually changes the tick, i.e. the
interrupt still happens at the same periodic rate.
As the "basic tick", it is used to do a lot of accounting and scheduling
housekeeping AND as a driver of the system timers.
So, by this definition, it REQUIRES a system interrupt.
I have built a "tickless" system and have evidence from it that such systems
are prone to overload. The faster the context switch rate, the more accounting
needs to be done. On the other hand, the ticked system has flat accounting
overhead WRT load.
Regardless of what definitions we settle on, the system needs an interrupt
source to drive the system timers and, as I indicate above, the accounting and
scheduling work. It is a MUST that these interrupts occur at the required
times or the system timers will be off. This is why we have a jiffies value
that is "rather odd" on x86 today.
George
>
>
>>Thoughts?
>
>
> Yes, for long periods of idle, I'd like to see the periodic clock tick
> disabled entirely. Clock ticks cause the hardware to exit power-saving
> idle states.
>
> The current design with HZ=1000 gives us 1ms = 1000usec between clock
> ticks. But some platforms take nearly that long just to enter/exit low
> power states, which means that on Linux the hardware pays a long idle
> state exit latency (performance hit) but gets little or no power savings
> from the time it resides in that idle state.
>
> thanks,
> -Len
>
>
--
George Anzinger george@mvista.com
High-res-timers: http://sourceforge.net/projects/high-res-timers/
* Re: gradual timeofday overhaul
From: George Anzinger @ 2004-10-20 15:17 UTC (permalink / raw)
To: Len Brown; +Cc: Tim Schmielau, john stultz, lkml
Len Brown wrote:
> On Tue, 2004-10-19 at 23:05, Tim Schmielau wrote:
>
>>I think we could do it in the following steps:
>>
>> 1. Sync up jiffies with the monotonic clock,...
>> 2. Decouple jiffies from the actual interrupt counter...
>> 3. Increase HZ all the way up to 1e9....
>
>
>>Thoughts?
>
>
> Yes, for long periods of idle, I'd like to see the periodic clock tick
> disabled entirely. Clock ticks cause the hardware to exit power-saving
> idle states.
>
> The current design with HZ=1000 gives us 1ms = 1000usec between clock
> ticks. But some platforms take nearly that long just to enter/exit low
> power states, which means that on Linux the hardware pays a long idle
> state exit latency (performance hit) but gets little or no power savings
> from the time it resides in that idle state.
I (and MontaVista) will be expanding on the VST patches. There are, currently,
two levels of VST. VST-I, when entering the idle state (task), looks ahead in
the timer list, finds the next event, and shuts down the "tick" until that
time. Any interrupt resets things, be it from the end of the time counter or
another source.
VST-II adds a callback list to idle entry and exit. This allows one to add
code to change (or even remove) timers on idle entry and restore them on exit.
We are doing this work to support deeply embedded applications that often
run on small batteries (think cell phone if you like).
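As a rough sketch of what VST-I does on idle entry (next_timer_expiry()
and set_next_tick_event() are placeholder names here, not the actual
patch interfaces):

        static void vst_enter_idle(void)
        {
                unsigned long next = next_timer_expiry();  /* jiffies of next event */
                unsigned long delta = next - jiffies;

                if (delta > 1)
                        set_next_tick_event(delta);  /* skip the intervening ticks */
                /* any interrupt re-arms the periodic tick and resyncs jiffies */
        }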
--
George Anzinger george@mvista.com
High-res-timers: http://sourceforge.net/projects/high-res-timers/
* Re: gradual timeofday overhaul
From: Richard B. Johnson @ 2004-10-20 15:59 UTC (permalink / raw)
To: George Anzinger; +Cc: Len Brown, Tim Schmielau, john stultz, lkml
On Wed, 20 Oct 2004, George Anzinger wrote:
> Len Brown wrote:
>> On Tue, 2004-10-19 at 23:05, Tim Schmielau wrote:
>>
>>> I think we could do it in the following steps:
>>>
>>> 1. Sync up jiffies with the monotonic clock,...
>>> 2. Decouple jiffies from the actual interrupt counter...
>>> 3. Increase HZ all the way up to 1e9....
>
> Before we do any of the above, I think we need to stop and ponder just what a
> "jiffie" is. Currently it is, by default (or historically), the "basic tick"
> of the system clock. On top of this a lot of interpolation code has been
> "grafted" on to allow the system to resolve time to finer levels, down to the
> nanosecond. But none of this interpolation code actually changes the tick,
> i.e. the interrupt still happens at the same periodic rate.
>
> As the "basic tick", it is used to do a lot of accounting and scheduling
> housekeeping AND as a driver of the system timers.
>
> So, by this definition, it REQUIRES a system interrupt.
>
> I have built a "tickless" system and have evidence from it that such
> systems are prone to overload. The faster the context switch rate, the more
> accounting needs to be done. On the other hand, the ticked system has flat
> accounting overhead WRT load.
>
> Regardless of what definitions we settle on, the system needs an interrupt
> source to drive the system timers and, as I indicate above, the accounting
> and scheduling work. It is a MUST that these interrupts occur at the
> required times or the system timers will be off. This is why we have a
> jiffies value that is "rather odd" on x86 today.
>
> George
>
>
You need that hardware interrupt for more than time-keeping.
Without a hardware interrupt to force a new time-slice,

        for(;;)
                ;

... would allow a user to grab the CPU forever ...
So, getting rid of the hardware interrupt can't be done.
Also, much effort has gone into obtaining high resolution
timing without any high resolution hardware to back it
up. This means that users can get numbers like 987,654
microseconds where the last 654 are as valuable as teats
on a bull. With an HZ timer tick, you get 1/HZ resolution,
pure and simple. The rest of the "interpolation" is just
guesswork, which leads to lots of problems, especially
when one attempts to read a spinning down-count value
from a hardware device accessed off some ports!
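For reference, this is the kind of port-banging interpolation I mean:
latch and read the i8254 PIT down-count to guess how far we are into
the current tick (sketch only, assuming the kernel's outb()/inb()
helpers):

        static unsigned int pit_read_count(void)
        {
                unsigned int lo, hi;

                outb(0x00, 0x43);       /* latch counter 0 */
                lo = inb(0x40);         /* low byte of the down-count */
                hi = inb(0x40);         /* high byte */
                return (hi << 8) | lo;  /* counts left in this tick */
        }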
If the ix86 CMOS timer were used you could get better
accuracy than at present, but accuracy is something one
can accommodate with automatic adjustment of time,
traceable to some appropriate standard.
The top-level schedule-code could contain some flag that
says: "are we in a power-down mode". If so, it could
execute minimal in-cache code, i.e.:

        for(;;)
        {
                hlt();          // Sleep until next tick
                if (mode != power_down)
                        schedule();
        }
The timer-tick ISR or any other ISR wakes us up from the halt.
This keeps the system sleeping, not wasting power grabbing
code/data from RAM and crunching numbers that are
not going to be used.
>>
>>
>>> Thoughts?
>>
>>
>> Yes, for long periods of idle, I'd like to see the periodic clock tick
>> disabled entirely. Clock ticks cause the hardware to exit power-saving
>> idle states.
>>
>> The current design with HZ=1000 gives us 1ms = 1000usec between clock
>> ticks. But some platforms take nearly that long just to enter/exit low
>> power states, which means that on Linux the hardware pays a long idle
>> state exit latency (performance hit) but gets little or no power savings
>> from the time it resides in that idle state.
>>
>> thanks,
>> -Len
>>
>>
>
> --
> George Anzinger george@mvista.com
> High-res-timers: http://sourceforge.net/projects/high-res-timers/
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
Cheers,
Dick Johnson
Penguin : Linux version 2.6.9 on an i686 machine (5537.79 GrumpyMips).
98.36% of all statistics are fiction.
* Re: gradual timeofday overhaul
From: Lee Revell @ 2004-10-20 17:09 UTC (permalink / raw)
To: Len Brown; +Cc: Tim Schmielau, john stultz, lkml, george anzinger
On Wed, 2004-10-20 at 03:47, Len Brown wrote:
> The current design with HZ=1000 gives us 1ms = 1000usec between clock
> ticks. But some platforms take nearly that long just to enter/exit low
> power states, which means that on Linux the hardware pays a long idle
> state exit latency (performance hit) but gets little or no power savings
> from the time it resides in that idle state.
My testing shows that the timer interrupt runs for about 21 usec. At
HZ=1000 that's 2.1% of the machine's time (21 usec * 1000 ticks/sec)
spent just running the timer ISR! No wonder this causes PM issues; 2.1%
CPU load is not exactly an idle machine. This is a 600MHz C3, so on a
slower embedded system this might be 5%.
So, any solution that would allow high res timers with HZ=100 would be
welcome.
Lee
* Re: gradual timeofday overhaul
From: john stultz @ 2004-10-20 18:13 UTC (permalink / raw)
To: Tim Schmielau; +Cc: lkml, george anzinger
On Tue, 2004-10-19 at 20:05, Tim Schmielau wrote:
> On Tue, 19 Oct 2004, john stultz wrote:
>
> > As for the timeofday overhaul, I've had zero time to work on it
> > recently. I hate that I dropped code and then went missing for weeks.
> > I'll have to see if I can get a few cycles at home to sync up my current
> > tree and send it out.
>
> I still haven't looked at your code and its discussion. From what I
> remember, I liked your proposal very much. It's surely where we want to
> end up someday. But from the above mail it strikes me that we just don't
> have enough manpower to get there all at once, so we should have a plan
> for the time code to gradually evolve into what we finally want. I think
> we could do it in the following steps:
>
> 1. Sync up jiffies with the monotonic clock...
>
> 2. Decouple jiffies from the actual interrupt counter...
>
> 3. Increase HZ all the way up to 1e9....
> Thoughts?
They all sound good. I like the notion of basing jiffies off of system
time, rather than interrupt counts. However, I'm a little cautious of
changing the meaning of jiffies too drastically.
Right now jiffies has two core meanings:
1. Count of the number of timer ticks that have passed.
2. Accurate system uptime, measured in units of 1/HZ
(Let me know if I forgot any others)
The problem being, neither of those meanings is 100% true.
#1 isn't true because when we lose timer ticks, we try to compensate
for them (i386 specifically). But at the same time #2 isn't true because
the timer interrupts don't necessarily run at exactly HZ (again, i386
specifically).
Basically due to our hardware constraints, we need to break one of these
two assumptions. The problem is which do we choose?
Do we base jiffies off of monotonic_clock(), guaranteeing #2 and
possibly breaking anyone who is assuming #1? Or do we change all users
of jiffies-as-time to use monotonic_clock, guaranteeing #1, which would
require quite a bit of work?
And which choice makes it harder for folks to create tickless systems?
It's a tough call.
On top of that, we still have the issue that the current interpolation
used in the time of day subsystem is broken (in my opinion), and we need
to fix that before we can have a reliable monotonic_clock.
The joke being of course that I'll need to set my /etc/ntp/ntp.drift
file to 500 to find the time to work on any of this. And really, anyone
who really found that funny needs to go home.
thanks
-john
* Re: gradual timeofday overhaul
From: Len Brown @ 2004-10-20 21:42 UTC (permalink / raw)
To: Lee Revell; +Cc: Tim Schmielau, john stultz, lkml, george anzinger
On Wed, 2004-10-20 at 13:09, Lee Revell wrote:
> On Wed, 2004-10-20 at 03:47, Len Brown wrote:
> > The current design with HZ=1000 gives us 1ms = 1000usec between
> > clock ticks. But some platforms take nearly that long just
> > to enter/exit low power states, which means that on Linux
> > the hardware pays a long idle state exit latency
> > (performance hit) but gets little or no power savings
> > from the time it resides in that idle state.
>
> My testing shows that the timer interrupt runs for about 21 usec.
> That's 2.1% of its time just running the timer ISR! No wonder this
> causes PM issues, 2.1% CPU load is not exactly an idle machine. This
> is a 600MHz C3, so on a slower embedded system this might be 5%.
>
> So, any solution that would allow high res timers with HZ=100 would
> be welcome.
5% residency in the clock tick handler is likely more of a problem when
we're _not_ idle -- a 5% performance hit. When we're idle we've got
nothing better to do with the processor than run these instructions for
5% of the time and run no instructions 95% of the time -- so tick
handler residency isn't the problem in idle, tick frequency is the
problem.
When an interrupt occurs, the hardware needs to ramp up its voltages,
resume its clocks, and do all the stuff it needs to do to get out of the
power-saving state to run the code that services the interrupt. This
"exit latency" can take a long time. On a volume Centrino system today
it is up to 185usec. On other hardware it is as high as 1000 usec.
Time spent in this exit latency is a double penalty -- we're not saving
power and we're delaying before the processor starts executing
instructions -- so we want to pay this price only when necessary.
-Len
* RE: gradual timeofday overhaul
From: Perez-Gonzalez, Inaky @ 2004-10-21 8:32 UTC (permalink / raw)
To: root, George Anzinger; +Cc: Brown, Len, Tim Schmielau, john stultz, lkml
> From: Richard B. Johnson
>
> You need that hardware interrupt for more than time-keeping.
> Without a hardware-interrupt, to force a new time-slice,
>
> for(;;)
> ;
>
> ... would allow a user to grab the CPU forever ...
But you can also schedule, before switching to the new task,
a local interrupt on the running processor to mark the end
of the timeslice. When you enter the scheduler, you just need
to remove that; devil is in the details, but it should be possible
to do in a way that doesn't take too much overhead.
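Something along these lines, where arm_local_timer() and
cancel_local_timer() are hypothetical helpers (not existing kernel API)
and time_slice is the incoming task's remaining slice in ticks:

        static void slice_timer_switch_to(struct task_struct *next)
        {
                cancel_local_timer();           /* drop old task's slice event */
                if (next->time_slice)           /* one-shot: fires at slice end */
                        arm_local_timer(next->time_slice);
        }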
Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own (and my fault)
* Re: gradual timeofday overhaul
From: George Anzinger @ 2004-10-21 21:17 UTC (permalink / raw)
To: Perez-Gonzalez, Inaky; +Cc: root, Brown, Len, Tim Schmielau, john stultz, lkml
Perez-Gonzalez, Inaky wrote:
>>From: Richard B. Johnson
>>
>>You need that hardware interrupt for more than time-keeping.
>>Without a hardware-interrupt, to force a new time-slice,
>>
>> for(;;)
>> ;
>>
>>... would allow a user to grab the CPU forever ...
>
>
> But you can also schedule, before switching to the new task,
> a local interrupt on the running processor to mark the end
> of the timeslice. When you enter the scheduler, you just need
> to remove that; devil is in the details, but it should be possible
> to do in a way that doesn't take too much overhead.
Well, that is part of the accounting overhead that increases with context
switch rate. You also need to include the time it takes to figure out which
of the time limits is closest (run time limit, profile time, slice time, etc).
Then, you also need to remove the timer when switching away. No, it is not a
lot, but it is way more than the nothing we do when we can turn it all over to
the periodic tick. The choice is load-sensitive overhead vs flat overhead.
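The per-switch "closest limit" computation would look roughly like this
(field names are illustrative, not the real task_struct layout):

        static unsigned long next_deadline(struct task_struct *t)
        {
                unsigned long d = t->time_slice;        /* slice expiry */

                if (t->runtime_limit && t->runtime_limit < d)
                        d = t->runtime_limit;           /* CPU-time limit */
                if (t->profile_interval && t->profile_interval < d)
                        d = t->profile_interval;        /* profiling timer */
                return d;       /* arm one one-shot timer for this */
        }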
--
George Anzinger george@mvista.com
High-res-timers: http://sourceforge.net/projects/high-res-timers/
* RE: gradual timeofday overhaul
From: Perez-Gonzalez, Inaky @ 2004-10-21 21:32 UTC (permalink / raw)
To: george; +Cc: root, Brown, Len, Tim Schmielau, john stultz, lkml
> From: George Anzinger [mailto:george@mvista.com]
>
> Perez-Gonzalez, Inaky wrote:
>
> > But you can also schedule, before switching to the new task,
> > a local interrupt on the running processor to mark the end
> > of the timeslice. When you enter the scheduler, you just need
> > to remove that; devil is in the details, but it should be possible
> > to do in a way that doesn't take too much overhead.
>
> Well, that is part of the accounting overhead that increases with context switch
> rate. You also need to include the time it takes to figure out which of the
> time limits is closest (run time limit, profile time, slice time, etc). Then,
I know these are specific examples, but:
- profile time is a periodic thingie, so if you have it, forget about
having a tickless system. Periodic interrupt for this guy, get it
out of the equation.
- slice time vs runtime limit. I don't remember what the granularity of
the runtime limit is, but it could be expressed in slice terms. If not,
we are talking (along with any other times) about min() operations, which
are just a few cycles each [granted, they add up].
> you also need to remove the timer when switching away. No, it is not a lot, but
> it is way more than the nothing we do when we can turn it all over to the
> periodic tick. The choice is load sensitive overhead vs flat overhead.
This is just talking out of my ass, but I guess that for each invocation
they will have more or less the same overhead in execution time, let's
say T. For the periodic tick, the total overhead (in a second) is T*HZ;
with tickless, it'd be T*number_of_context_switches_per_second, right?
Now, the ugly case would be if number_of_context_switches_per_second > HZ.
At HZ=100 this could be happening, but at HZ=1000, on a single CPU
...well, that would be TOO weird [of course, a real-time app with a
1ms period would do that, but it'd require at least an HZ of 10000 to
work more or less OK, and we'd be below the watermark].
So in most cases, and given the assumptions, we'd end up winning,
because number_of_context..., even if variable, is going to be bounded
on the upper side by HZ.
Well, you know way more than I do about this, so here is the question:
what is the error in that line of reasoning?
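To make the guesstimate concrete, here is a toy userspace model of the
two overhead curves under exactly these assumptions, reusing the 21 usec
per-event cost Lee measured earlier in the thread (illustrative only):

        #include <stdio.h>

        int main(void)
        {
                double T = 21e-6;               /* seconds per event */
                int HZ = 1000;                  /* periodic tick rate */
                int cs[] = { 100, 1000, 20000 };/* context switches/sec */

                printf("ticked:  %.2f%%\n", T * HZ * 100);
                for (int i = 0; i < 3; i++)     /* tickless grows with load */
                        printf("tickless @%5d cs/s: %.2f%%\n",
                               cs[i], T * cs[i] * 100);
                return 0;
        }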
Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own (and my fault)
* Re: gradual timeofday overhaul
From: Chris Friesen @ 2004-10-21 22:36 UTC (permalink / raw)
To: Perez-Gonzalez, Inaky
Cc: george, root, Brown, Len, Tim Schmielau, john stultz, lkml
Perez-Gonzalez, Inaky wrote:
> Now, the ugly case would be if number_of_context_switches_per_second > HZ.
> At HZ=100 this could be happening, but at HZ=1000, on a single CPU
> ...well, that would be TOO weird [of course, a real-time app with a
> 1ms period would do that, but it'd require at least an HZ of 10000 to
> work more or less OK, and we'd be below the watermark].
It's easy to have >>1000 context switches per second on a server. Consider a
web server that receives a network packet, issues a request to a database, hands
some work off to a thread so the main app doesn't block, then sends a response.
That could be a half dozen context switches per packet. If you have 20000
packets/sec coming in....
Chris
* Re: gradual timeofday overhaul
From: Chris Friesen @ 2004-10-21 22:40 UTC (permalink / raw)
To: george
Cc: Perez-Gonzalez, Inaky, root, Brown, Len, Tim Schmielau,
john stultz, lkml
George Anzinger wrote:
>> Well, that is part of the accounting overhead that increases with context
>> switch rate. You also need to include the time it takes to figure out
>> which of the time limits is closest (run time limit, profile time, slice
>> time, etc). Then, you also need to remove the timer when switching
>> away. No, it is not a lot, but it is way more than the nothing we do
>> when we can turn it all over to the periodic tick. The choice is
>> load-sensitive overhead vs flat overhead.
It should be possible to be clever about this. Most processes don't use
their full timeslice, so if we have a previous timer running, just keep
track of how much beyond that timer our timeslice will be. If we context
switch before the timer expiry, well and good. If the timer expires, set
it for what's left of our timeslice.
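A sketch of that lazy re-arming, with arm_local_timer() again a
hypothetical helper and the bookkeeping deliberately simplified:

        static unsigned long slice_remaining;   /* slice left beyond armed timer */

        static void arm_slice_lazy(unsigned long slice, unsigned long armed_delta)
        {
                if (armed_delta && armed_delta <= slice) {
                        slice_remaining = slice - armed_delta;  /* piggyback */
                } else {
                        slice_remaining = 0;
                        arm_local_timer(slice);
                }
        }

        static void slice_timer_expired(void)
        {
                if (slice_remaining) {
                        arm_local_timer(slice_remaining);       /* finish the slice */
                        slice_remaining = 0;
                }
        }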
Chris
* Re: gradual timeofday overhaul
From: George Anzinger @ 2004-10-22 0:21 UTC (permalink / raw)
To: Perez-Gonzalez, Inaky; +Cc: root, Brown, Len, Tim Schmielau, john stultz, lkml
Perez-Gonzalez, Inaky wrote:
>>From: George Anzinger [mailto:george@mvista.com]
>>
>>Perez-Gonzalez, Inaky wrote:
>>
>>
>>>But you can also schedule, before switching to the new task,
>>>a local interrupt on the running processor to mark the end
>>>of the timeslice. When you enter the scheduler, you just need
>>>to remove that; devil is in the details, but it should be possible
>>>to do in a way that doesn't take too much overhead.
>>
>>Well, that is part of the accounting overhead that increases with context switch
>>rate. You also need to include the time it takes to figure out which of the
>>time limits is closest (run time limit, profile time, slice time, etc). Then,
>
>
> I know these are specific examples, but:
>
> - profile time is a periodic thingie, so if you have it, forget about
> having a tickless system. Periodic interrupt for this guy, get it
> out of the equation.
Not really. It is only active if the task is running. At the very least the
scheduler needs to check to see if it is on and, if so, set up a timer for it.
>
> - slice time vs runtime limit. I don't remember what is the granularity of
> the runtime limit, but it could be expressed in slice terms. If not,
> we are talking (along with any other times) of min() operations, which
> are just a few cycles each [granted, they add up].
The main issue here is accumulating the run time, which is accounting work
that needs to happen on context switch (on the switch out, in this case).
>
>
>>you also need to remove the timer when switching away. No, it is not a lot, but
>>it is way more than the nothing we do when we can turn it all over to the
>>periodic tick. The choice is load sensitive overhead vs flat overhead.
>
>
> This is just talking out of my ass, but I guess that for each invocation
> they will have more or less the same overhead in execution time, let's
> say T. For the periodic tick, the total overhead (in a second) is T*HZ;
> with tickless, it'd be T*number_of_context_switches_per_second, right?
>
> Now, the ugly case would be if number_of_context_switches_per_second > HZ.
> At HZ=100 this could be happening, but at HZ=1000, on a single CPU
> ...well, that would be TOO weird [of course, a real-time app with a
> 1ms period would do that, but it'd require at least an HZ of 10000 to
> work more or less OK, and we'd be below the watermark].
??? Better look again. Context switches can and do happen as often as every
10 or so microseconds (depending a lot on the CPU speed). I admit this is with
code that is just trying to measure the context switch time, but often the
system will change its mind just that fast.
>
> So in most cases, and given the assumptions, we'd end up winning,
> beause number_of_context..., even if variable, is going to be bound
> on the upper side by HZ.
>
> Well, you know way more than I do about this, so here is the question:
> what is the error in that line of reasoning?
The expected number of context switches. In some real-world apps it gets
rather high. The crossover of your two curves _might_ be of interest to some
(it is rather low by my measurements, done with the tickless patch that is
still on sourceforge). On the other hand, where I come from, a system which
has increasing overhead with load is one that is going to overload. We are
always better off if we can figure out a way to have fixed overhead.
As for the idle system ticks, I think the VST stuff we are working on is the
right answer.
--
George Anzinger george@mvista.com
High-res-timers: http://sourceforge.net/projects/high-res-timers/
* RE: gradual timeofday overhaul
From: Perez-Gonzalez, Inaky @ 2004-10-22 0:29 UTC (permalink / raw)
To: george; +Cc: root, Brown, Len, Tim Schmielau, john stultz, lkml
> From: George Anzinger [mailto:george@mvista.com]
>
> > This is just talking out of my ass, but I guess that for each invocation
> > they will have more or less the same overhead in execution time, let's
> > say T. For the periodic tick, the total overhead (in a second) is T*HZ;
> > with tickless, it'd be T*number_of_context_switches_per_second, right?
>
> ??? Better look again. Context switches can and do happen as often as every
> 10 or so microseconds (depending a lot on the CPU speed). I admit this is
> with code that is just trying to measure the context switch time, but often
> the system will change its mind just that fast.
As I said, I was talking out of my ass [aka, I didn't know and was just
guesstimating for the heck of it], so I am happily proven wrong--thanks to
Chris and you. I guess I didn't take into account voluntary yielding of
the CPU by a task; I was thinking more of a task being kicked out by a timer
making another task runnable, or a timeslice expiring, etc., which are now
more or less driven by the tick [and then of course, we have IRQs,
but that's another matter].
> ...
> sourceforge). On the other hand, where I come from, a system which has
> increasing overhead with load is one that is going to overload. We are always
> better off if we can figure out a way to have fixed overhead.
>
> As for the idle system ticks, I think the VST stuff we are working on is the
> right answer.
Once my logic is proven wrong, then it makes full sense :]
Thanks for the heads up.
Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own (and my fault)
* Re: gradual timeofday overhaul
From: George Anzinger @ 2004-10-25 23:12 UTC (permalink / raw)
To: Chris Friesen
Cc: Perez-Gonzalez, Inaky, root, Brown, Len, Tim Schmielau,
john stultz, lkml
Chris Friesen wrote:
> George Anzinger wrote:
>
>> Well, that is part of the accounting overhead that increases with
>> context switch rate. You also need to include the time it takes to
>> figure out which of the time limits is closest (run time limit, profile
>> time, slice time, etc). Then, you also need to remove the timer when
>> switching away. No, it is not a lot, but it is way more than the
>> nothing we do when we can turn it all over to the periodic tick. The
>> choice is load-sensitive overhead vs flat overhead.
>
>
> It should be possible to be clever about this. Most processes don't use
> their full timeslice, so if we have a previous timer running, just keep track
> of how much beyond that timer our timeslice will be. If we context
> switch before the timer expiry, well and good. If the timer expires,
> set it for what's left of our timeslice.
Me thinks that rather quickly devolves to a periodic tick.
--
George Anzinger george@mvista.com
High-res-timers: http://sourceforge.net/projects/high-res-timers/
* Re: gradual timeofday overhaul
From: Chris Friesen @ 2004-10-25 23:51 UTC (permalink / raw)
To: george
Cc: Perez-Gonzalez, Inaky, root, Brown, Len, Tim Schmielau,
john stultz, lkml
George Anzinger wrote:
> Chris Friesen wrote:
>> It should be possible to be clever about this. Most processes don't
>> use their full timeslice, so if we have a previous timer running, just
>> keep track of how much beyond that timer our timeslice will be. If we
>> context switch before the timer expiry, well and good. If the timer
>> expires, set it for what's left of our timeslice.
>
>
> Me thinks that rather quickly devolves to a periodic tick.
In the busy case, yes. But on an idle system you get tickless behaviour.
It's still going to be load-sensitive, since you are doing additional work to
keep track of the timer/timeout values. But it saves work if reprogramming the
timer is time-consuming compared to simply reading it. On something like the
ppc, it probably doesn't buy you much since the decrementer is cheap to program.
Chris