[PATCH 0/5] clocksource patches

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/5] clocksource patches
@ 2006-04-03 19:54 Roman Zippel
  2006-04-03 20:08 ` Roman Zippel
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Roman Zippel @ 2006-04-03 19:54 UTC (permalink / raw)
  To: johnstul; +Cc: linux-kernel, Andrew Morton

Hi,

It unfortunately took a bit longer when I planned, but here is now an 
update to the clocksource patches.
Patch 1 depends on the generic clocksource patches.
Patches 2-4 are a replacement for these patches in -mm:

time-use-clocksource-infrastructure-for-update_wall_time.patch
time-use-clocksource-infrastructure-for-update_wall_time-mark-few-functions-as-__init.patch
time-let-user-request-precision-from-current_tick_length.patch
time-use-clocksource-abstraction-for-ntp-adjustments.patch
time-introduce-arch-generic-time-accessors.patch

Patch 5 depends on the clocksource drivers.

Some more notes about the patches:

1. generic clocksource updates:
The most important change is probably clocksource_get_nsec_offset(), it 
returns the nsec part of the realtime clock and is adjusted once a second. 
Other time services can be built on top of this and also only have to be 
updated once a second via second_overflow(). I changed the return value to 
an unsigned long which gives us a 3 second window, which should be more 
than enough (anyone not allowing the timer interrupt for that long gets 
what he is asking for). This function should become clocksource specific 
so arch/clocks can optimize this function themselves (e.g. changing 
resolution, making some parameters constant). The generic clocksource 
currently still deals a bit too much with cycles, more of this should be 
moved into the clocksource drivers (using library functions).

I'm also still interested in opinions about different possibilities to 
organize the clocksource infrastructure (although I'd more appreciate 
pro/con arguments than outright rejections). One possibility here would be 
to also shorten the function names a bit (clocksource_ is a lot to type :) 
), cs_... or clock_... would IMO work too.

2. periodic clocksource update
I updated the error adjustment algorithm to be more robust and optimized 
it a bit for the more common cases. It does everything in 64bit right now, 
which is fine for 64bit archs and it allows resolutions of less then 
1nsec, but as long as e.g. gettimeofday() takes longer than 1nsec that's 
a little wasteful and nobody will see a difference if we "restrict" the 
resolution to 1nsec, which would allow to keep parts of it in 32bit.

I also kept it separate from the do_timer() callback and simply created a 
new callback function, which archs can use. This makes it less likely to 
break any existing setup and even allows to coexists both methods until 
archs have been completely converted.

The userspace version I used to test this can be found at
http://www.xs4all.nl/~zippel/ntp/clocktest.c

Another general comment from an arch/driver specific perspective: right 
now everything is rather concentrated on getting the time, but AFAICT it's 
more common that the subsystem which is used to read the time is also used 
as timer (i.e. for the scheduler tick), this means the clocksource driver 
should also include the interrupt setup. Consequently this means the 
update callback isn't sufficient anymore and we need at least something 
like start/stop callbacks. This also means the jiffie clocksource becomes 
basically obsolete, since every arch already needs at least a basic 
clocksource to provide the scheduler tick (which is setup via 
time_init()).
i386 is currently rather hardcoded to use the i8253 timer, but AFAIK it 
would be desirable to e.g. use HPET for that (especially for hrtimer). 
Something like TSC should internally use another clocksource to provide 
the timer interrupt. Anyway, i386 is quite a mess here right now and I can 
understand that you wanted to stay away from it with the generic 
gettimeofday infrastructure. :-)
This is not important right now, but I wanted to mention it, it's IMO
something to keep in mind for the next steps and what at least I'll 
look out for.

bye, Roman

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/5] clocksource patches
  2006-04-03 19:54 [PATCH 0/5] clocksource patches Roman Zippel
@ 2006-04-03 20:08 ` Roman Zippel
  2006-04-04  4:53 ` Thomas Gleixner
  2006-04-07 17:57 ` john stultz
  2 siblings, 0 replies; 16+ messages in thread
From: Roman Zippel @ 2006-04-03 20:08 UTC (permalink / raw)
  To: johnstul; +Cc: linux-kernel, Andrew Morton

Hi,

Below is a first draft of some documentation which hopefully helps to
better understand the clock update function. It tries to introduce the 
various concepts in small logical steps from the basic xtime updates to 
the continuous clocks with full error adjustment.
It's still rather raw, but it contains all the information one has to 
understand to adjust and customize the clock update code.

Feedback welcome. :)

bye, Roman

There are three basic time variables related to time management:

- cycles: managed by the hardware clock
- ntp_time: managed by the NTP system and is updated in regular
  intervals (ticks)
- xtime: local base time as used by gettimeofday

Each variable is regularly updated:

- cycle_update: defines the basic tick frequency and is initialized with
  (frequency / HZ)
- ntp_update: is calculated by the NTP system
- xtime_update: is derived from cycle_update using a conversion factor
  (cycle_update*mult)

gettimeofday uses xtime and the cycle counter to calculate the current
time:

	timeofday = xtime + cycle_offset * mult;

cycle_offset is the offset to the last timer tick and mult is a fixed
point conversion factor from cycles to seconds. mult is initialized
based on the equivalence of 1 sec to frequency cycles:

	1 sec = frequency * mult
	mult = 10^9 nsec / frequency

Time is updated at every tick (after cycle_update cycles), which for a
simple system system looks like this:

	xtime = ntp_time += ntp_update;
	if (xtime.tv_nsec >= NSEC_PER_SEC) {
		ntp_time -= NSEC_PER_SEC;
		xtime.tv_nsec -= NSEC_PER_SEC;
		xtime.tv_sec++;
		second_overflow();
	}

xtime and ntp_time are basically identical, which means at every update
step gettimeofday can jump by a small error value (ntp_update -
xtime_update). To avoid this jump we have to update ntp_time and xtime
separately:

	ntp_time += ntp_update;
	xtime += cycle_update * mult;
	if (xtime.tv_nsec >= NSEC_PER_SEC) {
		ntp_time -= NSEC_PER_SEC;
		xtime.tv_nsec -= NSEC_PER_SEC;
		xtime.tv_sec++;
		second_overflow();
	}
	error = ntp_time - xtime;
	mult_adj = error / cycle_update;
	mult += mult_adj;

This produces an error between ntp_time and xtime and to keep both
synchronized, we have to modify mult (the only other variable in the
timeofday calculation).  mult_adj is the difference between the new and
old multiplier and is derived from:

	cycle_update * mult_new = cycle_update * mult + error

One big problem with this is that we only look at the current error,
which causes the multiplier to oscillate if ntp_update changes (the
adjustments are applied too late), so it's better to look ahead to the
next update:

	ntp_time += ntp_update;
	xtime += cycle_update * mult;
	if (xtime.tv_nsec >= NSEC_PER_SEC) {
		ntp_time -= NSEC_PER_SEC;
		xtime.tv_nsec -= NSEC_PER_SEC;
		xtime.tv_sec++;
		second_overflow();
	}
	error = (ntp_time + ntp_update) - (xtime + cycle_update * mult);
	mult_adj = error / cycle_update;
	mult += mult_adj;

The important thing here is that the error calculation happens after
calling second_overflow(), which is the main callback for the NTP system
to update ntp_update.
Instead of calculating the complete error every time, we can also update
the error incrementally and avoid maintaining ntp_time:

	xtime += cycle_update * mult;
	if (xtime.tv_nsec >= NSEC_PER_SEC) {
		xtime.tv_nsec -= NSEC_PER_SEC;
		xtime.tv_sec++;
		second_overflow();
	}
	error += ntp_update - cycle_update * mult;
	mult_adj = error / cycle_update;
	mult += mult_adj;
	error -= cycle_update * mult_adj;

This is probably the step, which makes the calculation less obvious as
there is no explicit ntp_time anymore and is instead maintained via the
difference to xtime. It's very important to keep in mind that the error
is for the next update step, so if we change the multiplier for the next
tick, we also have to adjust the error:

	cycle_update * mult - cycle_update * mult_new

as at the next step xtime is not advanced by (cycle_update * mult)
anymore but (cycle_update * mult_new) instead.

Note that although we don't maintain the ntp_time value anymore, the
complexity has only shifted to the error calculation, but since under
normal circumstances (clock is synchronized and stable) adjustments to
the multiplier are less common events, the calculation becomes cheaper
with a simple error update and occasional error adjustments.

This is already sufficient for simple systems, but interrupts can delay
the update step, so that we can see small time jumps due to the
multiplier adjustments. To avoid this we have to include the current
cycle_offset into the calculation:

	xtime += cycle_update * mult;
	if (xtime.tv_nsec >= NSEC_PER_SEC) {
		xtime.tv_nsec -= NSEC_PER_SEC;
		xtime.tv_sec++;
		second_overflow();
	}
	error += ntp_update - cycle_update * mult;
	mult_adj = error / cycle_update;
	mult += mult_adj;
	xtime -= cycle_offset * mult_adj;
	error -= (cycle_update - cycle_offset) * mult_adj;

We adjust xtime to keep the current time the same after the multiplier
change:

	xtime_new + cycle_offset * mult_new = xtime + cycle_offset * mult

Since we change xtime, we don't have to forget to also adjust the error
by the same amount.

If the time is now updated in very irregular intervals, it can happen
that we skipped an update step, where we can either just repeat
everything or skip a few error adjustment steps and do them at the end
instead:

	while (cycle_offset >= cycle_update) {
		cycles_last += cycle_update;
		cycle_offset -= cycle_update;
		xtime += cycle_update * mult;
		if (xtime.tv_nsec >= NSEC_PER_SEC) {
			xtime.tv_nsec -= NSEC_PER_SEC;
			xtime.tv_sec++;
			second_overflow();
		}
		error += ntp_update - cycle_update * mult;
	}
	mult_adj = error / cycle_update;
	mult += mult_adj;
	xtime -= cycle_offset * mult_adj;
	error -= (cycle_update - cycle_offset) * mult_adj;

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/5] clocksource patches
  2006-04-03 19:54 [PATCH 0/5] clocksource patches Roman Zippel
  2006-04-03 20:08 ` Roman Zippel
@ 2006-04-04  4:53 ` Thomas Gleixner
  2006-04-04 19:06   ` Roman Zippel
  2006-04-07 17:57 ` john stultz
  2 siblings, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2006-04-04  4:53 UTC (permalink / raw)
  To: Roman Zippel; +Cc: johnstul, linux-kernel, Andrew Morton, Ingo Molnar

On Mon, 2006-04-03 at 21:54 +0200, Roman Zippel wrote:
> Another general comment from an arch/driver specific perspective: right 
> now everything is rather concentrated on getting the time, but AFAICT it's 
> more common that the subsystem which is used to read the time is also used 
> as timer (i.e. for the scheduler tick), this means the clocksource driver 
> should also include the interrupt setup.

I don't think so. Coupling the clock sources to clock event sources is
wrong. In many systems the clock event source delivering the next event
interrupt - either periodic or per event programmed - is independend
from the clock source which is used to read the time.

Also we want to distribute multiple clock event sources for various
services. Hardwiring those into combos is counter productive.

> i386 is currently rather hardcoded to use the i8253 timer, but AFAIK it 
> would be desirable to e.g. use HPET for that (especially for hrtimer). 
> Something like TSC should internally use another clocksource to provide 
> the timer interrupt.

This is exactly the result of such an artificial combo. "Use internally
something else." Thats fundamentally wrong and violates every basic rule
of abstraction.

>  Anyway, i386 is quite a mess here right now and I can 
> understand that you wanted to stay away from it with the generic 
> gettimeofday infrastructure. :-)

He ? John addressed the clock source (timeofday) related problem in
x386. He never claimed that his timeofday code is solving the clock
event source problem. gettimeofday() exactly does what it says: it gets
the time of day. And it does it independend of any interrupt source in
the first place.

Clock event sources need their own independent abstraction layer, as one
can be found in my high resolution timer patch queue. There is
interaction between the timekeeping and the next event interrupt
programming, but it's important to keep them seperate.

How should a combo solution allow to add special hardware, which
provides only one of the services ? By using "something else
internally" ? No, thanks.

	tglx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/5] clocksource patches
  2006-04-04  4:53 ` Thomas Gleixner
@ 2006-04-04 19:06   ` Roman Zippel
  2006-04-05 11:22     ` Thomas Gleixner
  0 siblings, 1 reply; 16+ messages in thread
From: Roman Zippel @ 2006-04-04 19:06 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: johnstul, linux-kernel, Andrew Morton, Ingo Molnar

Hi,

On Tue, 4 Apr 2006, Thomas Gleixner wrote:

> On Mon, 2006-04-03 at 21:54 +0200, Roman Zippel wrote:
> > Another general comment from an arch/driver specific perspective: right 
> > now everything is rather concentrated on getting the time, but AFAICT it's 
> > more common that the subsystem which is used to read the time is also used 
> > as timer (i.e. for the scheduler tick), this means the clocksource driver 
> > should also include the interrupt setup.
> 
> I don't think so. Coupling the clock sources to clock event sources is
> wrong. In many systems the clock event source delivering the next event
> interrupt - either periodic or per event programmed - is independend
> from the clock source which is used to read the time.

Why? In order to program a clock event you have to be able to read the 
clock somehow. In many systems it's the same hardware.

> Also we want to distribute multiple clock event sources for various
> services. Hardwiring those into combos is counter productive.

What suggest I want to hardwire everything?

> > i386 is currently rather hardcoded to use the i8253 timer, but AFAIK it 
> > would be desirable to e.g. use HPET for that (especially for hrtimer). 
> > Something like TSC should internally use another clocksource to provide 
> > the timer interrupt.
> 
> This is exactly the result of such an artificial combo. "Use internally
> something else." Thats fundamentally wrong and violates every basic rule
> of abstraction.

Why and what "basic rule of abstraction"?

> Clock event sources need their own independent abstraction layer, as one
> can be found in my high resolution timer patch queue. There is
> interaction between the timekeeping and the next event interrupt
> programming, but it's important to keep them seperate.
> 
> How should a combo solution allow to add special hardware, which
> provides only one of the services ? By using "something else
> internally" ? No, thanks.

Again, why? Please explain.

Thomas, I would really appreciate if you actually started to argue your 
point instead of just disagreeing with me every time. I have no problem to 
admit that I'm wrong, but it takes a bit more than you just saying it is 
so.
This is not the first time and I have a really hard time to actually get 
you to explain your position and then it's even harder to discuss this 
with you. This is very frustrating and I can't lose the feeling that you 
think I'm a complete idiot, who should know better and isn't worth the 
time to answer, which makes me feel disrespected.
I don't mind really that you mainly just disagree with me and if you 
argued your point, it would make for some interesting and challenging 
conversations, but this isn't possible if you only voice just your 
disagreement.
For example above you bascially only state that your clock event source 
is superior and the correct way of doing this without any explanation why 
(and the "No, thanks." doesn't exactly imply that you're even interested 
in alternatives). This may be true, but it still would be interesting to 
know the advantages of keeping this separate. Why is it such a bad idea to 
have a single base abstraction for reading and setting time(r)?
Do you really expect me to believe you just because you say so?

bye, Roman

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/5] clocksource patches
  2006-04-04 19:06   ` Roman Zippel
@ 2006-04-05 11:22     ` Thomas Gleixner
  2006-04-05 13:40       ` Ingo Molnar
  2006-04-05 20:44       ` Roman Zippel
  0 siblings, 2 replies; 16+ messages in thread
From: Thomas Gleixner @ 2006-04-05 11:22 UTC (permalink / raw)
  To: Roman Zippel; +Cc: johnstul, linux-kernel, Andrew Morton, Ingo Molnar

Roman,

> Thomas, I would really appreciate if you actually started to argue your 
> point instead of just disagreeing with me every time. I have no problem to 
> admit that I'm wrong, but it takes a bit more than you just saying it is 
> so.

I on my side would appreciate, if you would stop to take every line I
write as a personal offense.

> For example above you bascially only state that your clock event source 
> is superior and the correct way of doing this without any explanation why 
> (and the "No, thanks." doesn't exactly imply that you're even interested 
> in alternatives). 

The question arises, who is not interested in alternatives. You are well
aware about the efforts others made, but you don't even think about
working together with them. Do you really expect people to jump on your
train, when you entirely ignore their work and efforts and just propose
your own view of the world? 

I did nowhere say that I'm not interested in alternative solutions. You
interpret it into my words for whatever reason.

Your way to rip out a single statement of its context and making your
argument on that is simply annoying.

Thats the original quote:
> How should a combo solution allow to add special hardware, which
> provides only one of the services ? By using "something else
> internally" ? No, thanks.

It is entirely clear, that "No thanks" is related to "something else
internally".

Thats the point I made further up in my first reply. It was your
proposal to combine clock sources and clock events. This works nice
where both are in the same hardware device, but its not a compulsory
argument for a combined abstraction layer. You said that in case there
is one element missing (e.g. TSC has no clock event source) it should
internally use "something else".

That's excatly the point I'm referring to. Do you think that "Use
something else" is a real good and convincing technical argument? No,
such a statement is simply the evidence that your design is flawed.

I assume we agree that we want abstraction of clock soures (read time)
and clock event sources (periodic / next event interrupts).

Why are those seperate items ?

Looking at the various hardware implementations there are three types of
devices:

1. Read only devices (e.g. TSC)
2. Next interrupt only devices (e.g. various timer implementations on
SoC CPUs)
3. Devices which combine both functions

Building an abstraction layer which handles all device types requires
either 

- that e.g. a read only device needs to be combined with a next
interrupt device in some way. This introduces artifical combos of
functionality which can and should be handled completely seperate.

or 

- to handle all the corner cases where a device has to be handled which
only provides one of the functionalies. Also the selection of different
combinations of devices will introduce extra handling code.

Futhermore the functionalites of clock sources and clock event source
can be used completely independent. In case of a periodic event there is
no interaction with the clock source at all. The readout of the clock
source is independent of any kind of event.

So we have two functional domains, which provide a high level interface:

The clock source abstraction provides:
- gettimeofday()
- getmonotonic()
- settimeofday()
- clock skew adjustment functions()

The clock event source abstraction provides:
- setup periodic events()
- setup one shot events()
- associate services (function to call on an event) to a clock event
source

This is a clear seperation and there is no combined functionality.

The availability of devices which combine both functions is not changing
the clear functional seperation of clock sources and clock event sources
at all.

Programming one shot events might need information from the clock source
layer e.g. to calculate the delta to the next event, but it does not
need this information on a low level base. The information is read from
the high level interface. Moving this to the low level interfaces would
enforce particular combinations of clock sources and clock event sources
and remove the flexibility to use arbitrary combinations on a best fit
selection.

Back to your questions:

> Why? In order to program a clock event you have to be able to read the
> clock somehow. In many systems it's the same hardware.

Yes, it is the same hardware. It is not the same level of abstraction.

The next event is defined by a absolute time in human units (i.e.
nanoseconds). The delta to the next event is calculated by

	delta  = next_event - now

Then delta is converted to the time units of the underlying hardware
device.

You might argue, that you also can convert the next_event time to the
time units of the underlying hardware first and then caclulate the delta
for the next event interrupt hardware.

This is only true, when those devices are combined and it results in
different handling of the same problem for different combinations of
hardware or even restricts the possible combinations depending on the
implementation details. Thats why I said it violates the basic rule of
abstraction.

> Why and what "basic rule of abstraction"?

Whats abstraction ? The generic way to access a functional domain to
allow users of the functional domain to be hardware agnostic.

As I showed above there are two functional domains:
clock sources and clock event sources

The high level interfaces of the functional domains use an hardware
independent representation of time, i.e. human time units (nano
seconds). Any interaction between the functional domains also uses the
hardware independent representation. 

This is the basic guarantee, that such abstraction layers are flexible
and of general use. 

Interaction or interdependencies of functional domains are not a reason
to combine them into one "handle it all" domain.

	tglx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/5] clocksource patches
  2006-04-05 11:22     ` Thomas Gleixner
@ 2006-04-05 13:40       ` Ingo Molnar
  2006-04-05 20:44       ` Roman Zippel
  1 sibling, 0 replies; 16+ messages in thread
From: Ingo Molnar @ 2006-04-05 13:40 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Roman Zippel, johnstul, linux-kernel, Andrew Morton

* Thomas Gleixner <tglx@linutronix.de> wrote:

> > For example above you bascially only state that your clock event source 
> > is superior and the correct way of doing this without any explanation why 
> > (and the "No, thanks." doesn't exactly imply that you're even interested 
> > in alternatives). 
> 
> The question arises, who is not interested in alternatives. You are 
> well aware about the efforts others made, but you don't even think 
> about working together with them. Do you really expect people to jump 
> on your train, when you entirely ignore their work and efforts and 
> just propose your own view of the world?
> 
> I did nowhere say that I'm not interested in alternative solutions.  
> You interpret it into my words for whatever reason.

just to explain it to everyone: the code Thomas refers to and which we 
are working on is John's GTOD patchset with Thomas' high-resolution 
timers patches ontop of it. [all of that (and more) is glued together in 
the -rt tree as well].

Thomas' hrtimers queue (ontop of 2.6.16) is a practical, working 
implementation of the clock-event design Thomas is talking about, 
resulting in a working high-resolution timers solution that spans all 
the relevant Linux APIs: nanosleep() and POSIX timers. So Thomas' 
arguments derive straight from that experience.

for more details, the latest hrtimers code can be found at:

  http://tglx.de/projects/hrtimers

the merge of the hrtimers subsystem into 2.6.16 was just the first step, 
and the next steps are expressed in the patches above.

	Ingo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/5] clocksource patches
  2006-04-05 11:22     ` Thomas Gleixner
  2006-04-05 13:40       ` Ingo Molnar
@ 2006-04-05 20:44       ` Roman Zippel
  2006-04-05 22:18         ` Thomas Gleixner
  1 sibling, 1 reply; 16+ messages in thread
From: Roman Zippel @ 2006-04-05 20:44 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: johnstul, linux-kernel, Andrew Morton, Ingo Molnar

Hi,

On Wed, 5 Apr 2006, Thomas Gleixner wrote:

> > Thomas, I would really appreciate if you actually started to argue your 
> > point instead of just disagreeing with me every time. I have no problem to 
> > admit that I'm wrong, but it takes a bit more than you just saying it is 
> > so.
> 
> I on my side would appreciate, if you would stop to take every line I
> write as a personal offense.

Believe me, I'm trying very hard and I'm not easily offended, but you 
don't exactly make it easy.
For example I get very little positive feedback from you and I'd be happy 
if you gave me at least constructive criticism, but you mainly just 
contradict me. An important indicator here is asking questions, you barely 
ask me any questions, which would show you're interested in my opinion and 
would help to avoid misunderstandings as shown further below.

> > For example above you bascially only state that your clock event source 
> > is superior and the correct way of doing this without any explanation why 
> > (and the "No, thanks." doesn't exactly imply that you're even interested 
> > in alternatives). 
> 
> The question arises, who is not interested in alternatives. You are well
> aware about the efforts others made, but you don't even think about
> working together with them. Do you really expect people to jump on your
> train, when you entirely ignore their work and efforts and just propose
> your own view of the world? 

First off please leave other people out of this, I get along with other 
people quite well, it's you I continuously have these problems with.
Secondly considering my contributions to John's work, it's fucking rude of 
you to accuse me of "not interested in alternatives" and "don't even 
think about working together". There is a simple reason that e.g. I'm not 
working on the -rt tree - lack of time. For weeks now I already had no 
time to work on the m68k tree, for months now I already sit on a batch of 
kconfig patches I'm trying to make ready. As long as your stuff doesn't 
appear on lkml, I consider as in development and I spent my time on more 
important things (which may be not so important to you).

> I did nowhere say that I'm not interested in alternative solutions. You
> interpret it into my words for whatever reason.
> 
> Your way to rip out a single statement of its context and making your
> argument on that is simply annoying.
> 
> Thats the original quote:
> > How should a combo solution allow to add special hardware, which
> > provides only one of the services ? By using "something else
> > internally" ? No, thanks.
> 
> It is entirely clear, that "No thanks" is related to "something else
> internally".

The interesting part is that I never said "something else internally", 
more specifically I said "internally use another clocksource". It's you 
who made the generalization and then only talks about clock events, so 
it's easy to come to the conclusion that the "No, thanks." is not related 
to something specific I said, but to the generic "something else (than 
clock events)". I had nothing "to rip out".

> Thats the point I made further up in my first reply. It was your
> proposal to combine clock sources and clock events.

This is the point were you could have saved us a lot of grief by simply 
asking me instead of jumping to conclusions. Nowhere in my initial mail 
did I make such a proposal, I'm not even mentioning clock events.
The original quote:

> > AFAICT it's
> > more common that the subsystem which is used to read the time is also used
> > as timer (i.e. for the scheduler tick), this means the clocksource driver
> > should also include the interrupt setup.
> 
> I don't think so. Coupling the clock sources to clock event sources is
> wrong.

What I had in mind is the _current_ setup, that e.g. the scheduler tick 
can be generated by a clock event is a possibility, but since your 
clockevent patch is not even in -mm, it wasn't something I had in mind 
when I wrote this comment. This comment was specifically about the 
_current_ way time.c sets up the timer interrupt and i8253.c sets up the 
timer hardware.
I also specifically said "This is not important right now, but ... 
something to keep in mind" to indicate this needs further discussion, 
nowhere did I ever exclude the possibility to do this via clock events, 
but you just jumped at me with "this is wrong".

> Why are those seperate items ?
> 
> Looking at the various hardware implementations there are three types of
> devices:
> 
> 1. Read only devices (e.g. TSC)
> 2. Next interrupt only devices (e.g. various timer implementations on
> SoC CPUs)
> 3. Devices which combine both functions
> 
> Building an abstraction layer which handles all device types requires
> either 
> 
> - that e.g. a read only device needs to be combined with a next
> interrupt device in some way. This introduces artifical combos of
> functionality which can and should be handled completely seperate.
> 
> or 
> 
> - to handle all the corner cases where a device has to be handled which
> only provides one of the functionalies. Also the selection of different
> combinations of devices will introduce extra handling code.

Neither excludes a basic clock source abstraction, which can create any 
kind of clock events. The basic question is should any clock source be 
able to create every kind of clock events?
The problem here is that we may have to synchronize reading time with the 
timer interrupt. To make an extreme example let's assume the TSC clock is 
updated every 1msec and the PIT generates an interrupt every 1.01msec, 
this means jiffies is updated every 100th interrupt by 2. Also the NTP 
error adjustment algorithm works better with small offsets, my latest 
version compensates better for that, but a smaller offset still means a 
smaller jitter.
These kinds of problems I had in mind when I wrote "Something like TSC 
should internally use another clocksource" and if you had asked I could 
have explained this, but I had no chance as you simply jumped to your own 
conclusions. :-(

> So we have two functional domains, which provide a high level interface:
> 
> The clock source abstraction provides:
> - gettimeofday()
> - getmonotonic()
> - settimeofday()
> - clock skew adjustment functions()
> 
> The clock event source abstraction provides:
> - setup periodic events()
> - setup one shot events()
> - associate services (function to call on an event) to a clock event
> source
> 
> This is a clear seperation and there is no combined functionality.

Unfortunately it's not that simple, e.g. generating gettimeofday() 
requires a periodic event. At some point you have to combine 
functionality, whether it's done inside the clocksource or by something 
else is a different discussion. My proposal to do it inside the clock 
source was because we don't have that "something else" yet (which may be 
your clock events, I don't know).

I have to skip the rest of your mail, as it's so generic and so out of 
context that I don't really disagree with it, if there was something 
important, please put back it into some context.

bye, Roman

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/5] clocksource patches
  2006-04-05 20:44       ` Roman Zippel
@ 2006-04-05 22:18         ` Thomas Gleixner
  0 siblings, 0 replies; 16+ messages in thread
From: Thomas Gleixner @ 2006-04-05 22:18 UTC (permalink / raw)
  To: Roman Zippel; +Cc: johnstul, linux-kernel, Andrew Morton, Ingo Molnar

Roman,

On Wed, 2006-04-05 at 22:44 +0200, Roman Zippel wrote:
> What I had in mind is the _current_ setup, that e.g. the scheduler tick 
> can be generated by a clock event is a possibility, but since your 
> clockevent patch is not even in -mm, it wasn't something I had in mind 
> when I wrote this comment. This comment was specifically about the 
> _current_ way time.c sets up the timer interrupt and i8253.c sets up the 
> timer hardware.
> I also specifically said "This is not important right now, but ... 
> something to keep in mind" to indicate this needs further discussion, 
> nowhere did I ever exclude the possibility to do this via clock events, 
> but you just jumped at me with "this is wrong".

Can we stop missquoting on both sides please ? I did not jump in. I
said:

> I don't think so. Coupling the clock sources to clock event sources is
> wrong.

I unfortunately forgot to add an IMHO to every single sentence, but "I
don't think so." should have made it clear, that this is my opinion.
Maybe my knowledge of the English language is worse than I thought.

> > Why are those seperate items ?
> > 
> > Looking at the various hardware implementations there are three types of
> > devices:
> > 
> > 1. Read only devices (e.g. TSC)
> > 2. Next interrupt only devices (e.g. various timer implementations on
> > SoC CPUs)
> > 3. Devices which combine both functions
> > 
> > Building an abstraction layer which handles all device types requires
> > either 
> > 
> > - that e.g. a read only device needs to be combined with a next
> > interrupt device in some way. This introduces artifical combos of
> > functionality which can and should be handled completely seperate.
> > 
> > or 
> > 
> > - to handle all the corner cases where a device has to be handled which
> > only provides one of the functionalies. Also the selection of different
> > combinations of devices will introduce extra handling code.
> 
> Neither excludes a basic clock source abstraction, which can create any 
> kind of clock events. The basic question is should any clock source be 
> able to create every kind of clock events?

At least should the design allow that. Making restrictions in such a
layer upfront due to requirements of some known hardware devices would
be bad.

> The problem here is that we may have to synchronize reading time with the 
> timer interrupt. To make an extreme example let's assume the TSC clock is 
> updated every 1msec and the PIT generates an interrupt every 1.01msec, 
> this means jiffies is updated every 100th interrupt by 2. Also the NTP 
> error adjustment algorithm works better with small offsets, my latest 
> version compensates better for that, but a smaller offset still means a 
> smaller jitter.

Why does this require a hard coupled design ? You want a periodic event
for X with as much precision as available. Let the system choose one
which fits this best.

> > This is a clear seperation and there is no combined functionality.
> 
> Unfortunately it's not that simple, e.g. generating gettimeofday() 
> requires a periodic event.

It does not necessarily require a periodic event. It depends on the type
of clock source you are using and the requirements of synchronization.
The DCF-77 card in one of my boxen gives me auto sychronized time with
1nsec resolution without any periodic event. Also embedded systems which
do not use NTP do not need a periodic event for gettimeofday() when
there is a continous clock source available.

If the NTP mechanisms needs a precise periodic event, then it can
request it from the layer which manages those events.

>  At some point you have to combine 
> functionality, whether it's done inside the clocksource or by something 
> else is a different discussion. My proposal to do it inside the clock 
> source was because we don't have that "something else" yet (which may be 
> your clock events, I don't know).

So we move it into clock source because there is nothing else right
now ? That sounds like moving the mess you identified in x386 from there
to a another place.

IMO this is well worth a discussion how we want to solve this in a solid
and generic way.

> I have to skip the rest of your mail, as it's so generic and so out of 
> context that I don't really disagree with it, if there was something 
> important, please put back it into some context.

Sorry Roman, your way of discussion is not understandable to me. On one
hand you ask for explanations. When you get them, you just brush them
aside. Thanks for reminding me that my efforts to communicate with you
just produce unimportant blurb.

	tglx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/5] clocksource patches
  2006-04-03 19:54 [PATCH 0/5] clocksource patches Roman Zippel
  2006-04-03 20:08 ` Roman Zippel
  2006-04-04  4:53 ` Thomas Gleixner
@ 2006-04-07 17:57 ` john stultz
  2006-04-27 20:33   ` Roman Zippel
  2 siblings, 1 reply; 16+ messages in thread
From: john stultz @ 2006-04-07 17:57 UTC (permalink / raw)
  To: Roman Zippel; +Cc: linux-kernel, Andrew Morton

On Mon, 2006-04-03 at 21:54 +0200, Roman Zippel wrote:
> Some more notes about the patches:

Hey Roman,
	I'll reply to more specifics to the patches themselves, but I'll try to
get some general clarifications here.

> 1. generic clocksource updates:
> The most important change is probably clocksource_get_nsec_offset(), it 
> returns the nsec part of the realtime clock and is adjusted once a second. 
> Other time services can be built on top of this and also only have to be 
> updated once a second via second_overflow(). I changed the return value to 
> an unsigned long which gives us a 3 second window, which should be more 
> than enough (anyone not allowing the timer interrupt for that long gets 
> what he is asking for). This function should become clocksource specific 
> so arch/clocks can optimize this function themselves (e.g. changing 
> resolution, making some parameters constant). The generic clocksource 
> currently still deals a bit too much with cycles, more of this should be 
> moved into the clocksource drivers (using library functions).

I'm still not convinced for the need of clocksource specific
get_nsec_offset() functions. I really want to limit the clocksource
structure to be as stateless as possible and to only provide an
abstraction to the hardware counter below. 

But clearly you are pretty concerned about it, so maybe could you share
the case you have in mind where it would be necessary?

> I'm also still interested in opinions about different possibilities to 
> organize the clocksource infrastructure (although I'd more appreciate 
> pro/con arguments than outright rejections). One possibility here would be 
> to also shorten the function names a bit (clocksource_ is a lot to type :) 
> ), cs_... or clock_... would IMO work too.

I *much* prefer the clarity of clocksource over the wear and tear typing
it might take on my fingers. 

> 2. periodic clocksource update
> I updated the error adjustment algorithm to be more robust and optimized 
> it a bit for the more common cases. It does everything in 64bit right now, 
> which is fine for 64bit archs and it allows resolutions of less then 
> 1nsec, but as long as e.g. gettimeofday() takes longer than 1nsec that's 
> a little wasteful and nobody will see a difference if we "restrict" the 
> resolution to 1nsec, which would allow to keep parts of it in 32bit.
> 
> I also kept it separate from the do_timer() callback and simply created a 
> new callback function, which archs can use. This makes it less likely to 
> break any existing setup and even allows to coexists both methods until 
> archs have been completely converted.

One of the reasons I did the significant rework of the TOD patches was
so that we could generically introduce the clocksource abstraction first
(using jiffies) for all arches. I feel this provides a much smoother
transition to the generic timekeeping code (and greatly reduces the
amount of CONFIG_GENERIC_TIME specific code), so I'm not sure I
understand why you want to go back to separate do_timer() functions. 

I guess what I'm asking for is what was wrong with the way my code did
it? What breakage are you concerned about?

> Another general comment from an arch/driver specific perspective: right 
> now everything is rather concentrated on getting the time, but AFAICT it's 
> more common that the subsystem which is used to read the time is also used 
> as timer (i.e. for the scheduler tick), this means the clocksource driver 
> should also include the interrupt setup. Consequently this means the 
> update callback isn't sufficient anymore and we need at least something 
> like start/stop callbacks. This also means the jiffie clocksource becomes 
> basically obsolete, since every arch already needs at least a basic 
> clocksource to provide the scheduler tick (which is setup via 
> time_init()).
> i386 is currently rather hardcoded to use the i8253 timer, but AFAIK it 
> would be desirable to e.g. use HPET for that (especially for hrtimer). 
> Something like TSC should internally use another clocksource to provide 
> the timer interrupt. Anyway, i386 is quite a mess here right now and I can 
> understand that you wanted to stay away from it with the generic 
> gettimeofday infrastructure. :-)
> This is not important right now, but I wanted to mention it, it's IMO
> something to keep in mind for the next steps and what at least I'll 
> look out for.

Thomas has already commented on this, and maybe we both misunderstand
what your suggesting here, but I am pretty hesitant to bind the clock
source / hardware interrupt timer code together. The reason for that is
in my experience w/ i386 (and I'll take my lumps for my part in the i386
timekeeping mess) the number of combinations of clock/timers is
something like: PIT/PIT, TSC/PIT, CYCLONE/PIT, ACPIPM/PIT, HPET/PIT,
TSC/HPET, HPET/HPET. And with some of Andi's patches x86-64 patches you
can do them all over /APIC.

My goal with the clocksources is to take just the counter aspect of any
hardware and export it generically so we can use that inside generic
code. I also would support a similar timer interrupt source abstraction,
like what Thomas has started in his clockevent patches in his hrt
patchset. The hope being we can later freely mix and match
clocksources/clockevents, allowing for some of the features needed for
dynamic ticks and virtualization.

Do these concerns make sense to you? Or was it something else you were
suggesting?

thanks
-john

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/5] clocksource patches
  2006-04-07 17:57 ` john stultz
@ 2006-04-27 20:33   ` Roman Zippel
  2006-05-06  2:04     ` john stultz
  0 siblings, 1 reply; 16+ messages in thread
From: Roman Zippel @ 2006-04-27 20:33 UTC (permalink / raw)
  To: john stultz; +Cc: linux-kernel, Andrew Morton

Hi,

On Fri, 7 Apr 2006, john stultz wrote:

Sorry about the long delay, I was engrossed in other problems and I had to 
check a few things for a proper reply, which I've put off. Anyway, sorry. :)

> I'm still not convinced for the need of clocksource specific
> get_nsec_offset() functions. I really want to limit the clocksource
> structure to be as stateless as possible and to only provide an
> abstraction to the hardware counter below. 
> 
> But clearly you are pretty concerned about it, so maybe could you share
> the case you have in mind where it would be necessary?

The main point is performance. This is a very important function and your 
approach requires to assume the worst case, e.g. it has to be done in 
64bit, which OTOH can be easily optimized in the driver.
I can understand that you want to keep the clock driver as simple as 
possible, but that's not how Linux works - we usually don't abstract away 
every detail. It's always a compromise between simplicity, flexibility and 
performance. This means sometimes we do leave things to the drivers which 
can simply use some generic functions if they don't have any special 
requirements.
By pushing all the complexity into a middle layer you're running the risk 
of creating a bloated middle layer with a big runtime complexity, which 
has to deal with every little special case. Pushing at least part of this 
complexity into the driver allows us to make some decisions already at 
compile time and thus reducing runtime complexity.
Simply look at the various kernel APIs, middle layers tend to be rather 
light weight, provide _common_ services and leave the real work to the 
drivers, e.g. the fs layer does the path translation and the basic 
locking, but we have no special handling for block vs net fs drivers, 
instead we provide library functions which also deal with such special 
cases as support for holes in files.

I really don't understand your problem with a clocksource specific 
nsec_offset(), the author already has to provide most of the basic 
parameters, it's not that difficult to put them together and for the 
really lazy we can provide a:

my_clock_nsec_offset(struct clocksource *cs)
{
	return generic_nsec_offset(cs, my_get_cycles(), MASK, SHIFT);
}

this is _very_ simple and if some of the parameters are constant you 
already saved a few cycles.

John, we have to find some compromise here, but I think sacrificing 
everything to driver simplicity is IMO a huge mistake. A "simple" driver 
gives you a lot of flexibility on the generic side, but you remove this 
flexibility from the driver side, i.e. if a driver doesn't fit into this 
cycle model (e.g. tick based drivers) it has to do extra work to provide 
this abstraction.
A good abstraction should concentrate on the _common_ properties and I 
don't think that the continous cycle model is a good general abstraction 
for all types of clocks. Tick based clocks are already more complex due 
the extra work needed to emulate the cycle counter. Some clocks may 
already provide a nsec value (e.g. virtual clocks in a virtualized 
environment), where your generic nsec calculation is a complete waste of 
time. A common property of all clocks is that we want a nsec value from 
them, so why not simply ask the clock for some kind of nsec value and 
provide the clock driver with the necessary library routines to convert 
the cycle value to a nsec value, where you actually have the necessary 
information to create efficient code. As long as you try to pull the cycle 
model into the generic model it will seriously suck from a performance 
perspective, as you separate the cycle value from the information how to 
deal with it efficiently.

> > I'm also still interested in opinions about different possibilities to 
> > organize the clocksource infrastructure (although I'd more appreciate 
> > pro/con arguments than outright rejections). One possibility here would be 
> > to also shorten the function names a bit (clocksource_ is a lot to type :) 
> > ), cs_... or clock_... would IMO work too.
> 
> I *much* prefer the clarity of clocksource over the wear and tear typing
> it might take on my fingers. 

What is the special meaning of "clocksource" vs e.g. just "clock"?

> > I also kept it separate from the do_timer() callback and simply created a 
> > new callback function, which archs can use. This makes it less likely to 
> > break any existing setup and even allows to coexists both methods until 
> > archs have been completely converted.
> 
> One of the reasons I did the significant rework of the TOD patches was
> so that we could generically introduce the clocksource abstraction first
> (using jiffies) for all arches. I feel this provides a much smoother
> transition to the generic timekeeping code (and greatly reduces the
> amount of CONFIG_GENERIC_TIME specific code), so I'm not sure I
> understand why you want to go back to separate do_timer() functions. 

Apropos code size: I checked the generated code size of timer.o (default 
i386 config), before it was 3248 bytes, with your patches it goes to 5907 
bytes and by disabling CONFIG_GENERIC_TIME it still is 4554. With my 
patches it's 4767/4347 bytes (removing the old time code saves another 
315 bytes).
In my version I was very careful to keep the common interrupt paths as 
short as possible, so that most of this code is not even executed most of 
the time. You can't just "hope" that gcc does the work for you (especially 
with 64bit math), you actually have to check the code and you would have 
the noticed the bloated code it generates.

Keeping the old time code separate makes it easier to optimize the new 
code and is more flexible, e.g. for a tick based clock source we can 
provide an update function which does something like this:

	clocksource_update_tick();
	clocksource_adjust(0);

This means the loop is gone and the adjustment code becomes simpler due to 
the constant offset.

> Thomas has already commented on this, and maybe we both misunderstand
> what your suggesting here, but I am pretty hesitant to bind the clock
> source / hardware interrupt timer code together. The reason for that is
> in my experience w/ i386 (and I'll take my lumps for my part in the i386
> timekeeping mess) the number of combinations of clock/timers is
> something like: PIT/PIT, TSC/PIT, CYCLONE/PIT, ACPIPM/PIT, HPET/PIT,
> TSC/HPET, HPET/HPET. And with some of Andi's patches x86-64 patches you
> can do them all over /APIC.

I really don't want to hardcode anything, it's not my intention to 
recreate a mess similiar to timer_tsc.c.
My basic idea of clock abstraction would be something you can use to read 
the time and which you can set to generate an interrupt. This doesn't mean 
every clock driver has to provide everything, it should be not that 
difficult to _dynamically_ bind two drivers to provide this functionality 
(where and how this is done is not important here).
In the simple cases clock and timer functionality are provided by the same 
hardware and I really don't understand where the advantage is to 
artifically separate this functionality, _if_ this comes from the same 
hardware. If I want to write a driver for a _simple_ timer chip, I want to 
export its functionality via one interface, so e.g. I can use it as the 
system clock, this means the driver simply registers itself as a clock 
driver and the generic system sets it up as a system clock. This is what I 
need from an arch perspective and I want to keep this as simple as 
possible and splitting this simple thing into different abstractions runs 
contrary to that.

> My goal with the clocksources is to take just the counter aspect of any
> hardware and export it generically so we can use that inside generic
> code. I also would support a similar timer interrupt source abstraction,
> like what Thomas has started in his clockevent patches in his hrt
> patchset. The hope being we can later freely mix and match
> clocksources/clockevents, allowing for some of the features needed for
> dynamic ticks and virtualization.

I have looked at the clockevent code, I can see what I does, but I don't 
really understand why it does it this way, it simply doesn't match my 
expectations/needs from a clock. The code is barely documented and 
there is no explanation of the basic design ideas and requirements it's 
based on. As long as this is missing it's useless to me and my attempts so 
far at getting Thomas to explain his code have been rather fruitless. :(
It's possible it's some ingenious piece of code, but if I have trouble 
understanding the code, I don't think I'm going to be the only one.

Maybe I have a completely different idea of this, which prevents me from 
understanding this, so it would really help if someone would explain the 
advantages of the clocksource/clockevent abstraction and connect the dots 
of how it fits into the general model between clock hardware/driver and 
the generic kernel clock/timer services.

bye, Roman

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/5] clocksource patches
  2006-04-27 20:33   ` Roman Zippel
@ 2006-05-06  2:04     ` john stultz
  2006-05-06 16:25       ` Roman Zippel
  0 siblings, 1 reply; 16+ messages in thread
From: john stultz @ 2006-05-06  2:04 UTC (permalink / raw)
  To: Roman Zippel; +Cc: linux-kernel, Andrew Morton

On Thu, 2006-04-27 at 22:33 +0200, Roman Zippel wrote:
> I really don't understand your problem with a clocksource specific 
> nsec_offset(), the author already has to provide most of the basic 
> parameters, it's not that difficult to put them together and for the 
> really lazy we can provide a:
> 
> my_clock_nsec_offset(struct clocksource *cs)
> {
> 	return generic_nsec_offset(cs, my_get_cycles(), MASK, SHIFT);
> }
> 
> this is _very_ simple and if some of the parameters are constant you 
> already saved a few cycles.

We already have this, its the per-arch gettimeofday(). 

My issue here is that by pushing this functionality off into the
clocksource driver, not only do you complicate the driver, you also are
implicitly binding the driver to a form of tick based timekeeping
(question: from what point is the nsec_offset measuring from?). Also,
this has implications w/ NTP, as the nsec_offset() function now must do
its own scaling internally. 

I think this sort of get_nsec_offset() interface is needed, but it
doesn't belong in the clocksource driver, because that fouls the
abstraction.

Instead, why don't we have two implementations of __get_nsec_offset()?
One which uses the clean clocksource abstraction, and one that can call
your arch_get_nsec_offset()? 

Here's what I'm proposing for generic code:

do_get(ns)timeofday():
	xtime + __get_nsec_offset()

update_wall_time():
	now = clocksource_read() /* <- that's jiffies in tick-clock case */
	while (now - last > interval_cycles):
		last += interval_cycles
		xtime += interval_nsecs
		error += interval_ntp - interval_nsecs
		ntp_adjust(...)
	...

#ifdef CONTINUOUS_CLOCK
__get_nsec_offset():
	now = clock_read
	return ((now - last)*mult)>>shift
#else /*TICK_CLOCK*/
__get_nsec_offset():
	return arch_get_nsec_offset()
#endif

How's that sound?

> John, we have to find some compromise here, but I think sacrificing 
> everything to driver simplicity is IMO a huge mistake. 

We're coming up on the one year mark for this discussion. Every pass
I've taken your feedback, worked to understand it (which, i admit can
take me awhile :), and implemented parts of your ideas into the code. I
really do appreciate this cycle, so I hope you don't find me
thick-headed or stubborn. I believe I've been happy to compromise every
step of the way.

I do want to find a solution that we both like, but I am feeling some
fatigue and we need to make some progress.  The existing tick based
assumptions in mainline is blocking the ability to sanely implement both
bug fixes and new features.

> A "simple" driver 
> gives you a lot of flexibility on the generic side, but you remove this 
> flexibility from the driver side, i.e. if a driver doesn't fit into this 
> cycle model (e.g. tick based drivers) it has to do extra work to provide 
> this abstraction.

I'm not trying to fit tick based clocks into the continuous model. I'm
trying to allow the currently jiffies based timekeeping code to possibly
use something else, cleaning up a lot of code in the process.

> A good abstraction should concentrate on the _common_ properties and I 
> don't think that the continous cycle model is a good general abstraction 
> for all types of clocks. Tick based clocks are already more complex due 
> the extra work needed to emulate the cycle counter. Some clocks may 
> already provide a nsec value (e.g. virtual clocks in a virtualized 
> environment), where your generic nsec calculation is a complete waste of 
> time. A common property of all clocks is that we want a nsec value from 
> them, so why not simply ask the clock for some kind of nsec value and 
> provide the clock driver with the necessary library routines to convert 
> the cycle value to a nsec value, where you actually have the necessary 
> information to create efficient code. As long as you try to pull the cycle 
> model into the generic model it will seriously suck from a performance 
> perspective, as you separate the cycle value from the information how to 
> deal with it efficiently.

For features like robust timekeeping in the face of lost ticks (needed
for virtualization, and realtime), as well as high-res timers and
dynamic/no-idle ticks, we *NEED* a continuous clock. 

I made a quick audit of the arches to see what the breakdown was for
continuous vs tick clocks: 

Continuous:
i386,x86-64, ia64, powerpc, ppc, sparc, alpha, mips, parisc, s390,
xtensa, and arm (pxa, sa1100, plat-omap) 

Tick:
arm (other), cris, m32, m68k, sh

xtime/no intertick resolution:
frv, h8300, v850

I'll admit, the continuos cycle model doesn't fit tick based clocks, but
we shouldn't limit the abstraction to the lowest common denominator. By
providing what I suggested above (w/ the two __get_nsec_offset()
implementations), we reduce code for both models, and allow progress for
systems w/ continuous clocks in a generic fashion.

> > > I'm also still interested in opinions about different possibilities to 
> > > organize the clocksource infrastructure (although I'd more appreciate 
> > > pro/con arguments than outright rejections). One possibility here would be 
> > > to also shorten the function names a bit (clocksource_ is a lot to type :) 
> > > ), cs_... or clock_... would IMO work too.
> > 
> > I *much* prefer the clarity of clocksource over the wear and tear typing
> > it might take on my fingers. 
> 
> What is the special meaning of "clocksource" vs e.g. just "clock"?

"clock" is already overloaded. We have the
CLOCK_MONOTONIC/CLOCK_REALTIME clocks, we have the RTC clocks... Its a
cycle counter we're using to accumulate time, thus clocksource seemed
understandable and unique.

> > > I also kept it separate from the do_timer() callback and simply created a 
> > > new callback function, which archs can use. This makes it less likely to 
> > > break any existing setup and even allows to coexists both methods until 
> > > archs have been completely converted.
> > 
> > One of the reasons I did the significant rework of the TOD patches was
> > so that we could generically introduce the clocksource abstraction first
> > (using jiffies) for all arches. I feel this provides a much smoother
> > transition to the generic timekeeping code (and greatly reduces the
> > amount of CONFIG_GENERIC_TIME specific code), so I'm not sure I
> > understand why you want to go back to separate do_timer() functions. 
> 
> Apropos code size: I checked the generated code size of timer.o (default 
> i386 config), before it was 3248 bytes, with your patches it goes to 5907 
> bytes and by disabling CONFIG_GENERIC_TIME it still is 4554. With my 
> patches it's 4767/4347 bytes (removing the old time code saves another 
> 315 bytes).

This isn't really a fair comparison (yet atleast), as your patches don't
appear to handle suspend/resume correctly. Nor did your patches even
boot on my laptop. :(

I'll admit that my code could use more low-level optimization, and I
welcome patches against my code to improve it. My code has already
gotten a good amount of testing in both -mm and -rt, so I know it works.
Lets make it fast too, but just in steps.

> In my version I was very careful to keep the common interrupt paths as 
> short as possible, so that most of this code is not even executed most of 
> the time. You can't just "hope" that gcc does the work for you (especially 
> with 64bit math), you actually have to check the code and you would have 
> the noticed the bloated code it generates.

Good point. I just sent a patch to Andrew fixing the assumption that gcc
could avoid a few of the mults. And I'm working on more patches to
reduce the text size as well.

Once again, I do appreciate the feedback.

thanks
-john

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/5] clocksource patches
  2006-05-06  2:04     ` john stultz
@ 2006-05-06 16:25       ` Roman Zippel
  2006-05-08 18:33         ` john stultz
  0 siblings, 1 reply; 16+ messages in thread
From: Roman Zippel @ 2006-05-06 16:25 UTC (permalink / raw)
  To: john stultz; +Cc: linux-kernel, Andrew Morton

Hi,

On Fri, 5 May 2006, john stultz wrote:

> > A good abstraction should concentrate on the _common_ properties and I 
> > don't think that the continous cycle model is a good general abstraction 
> > for all types of clocks. Tick based clocks are already more complex due 
> > the extra work needed to emulate the cycle counter. Some clocks may 
> > already provide a nsec value (e.g. virtual clocks in a virtualized 
> > environment), where your generic nsec calculation is a complete waste of 
> > time. A common property of all clocks is that we want a nsec value from 
> > them, so why not simply ask the clock for some kind of nsec value and 
> > provide the clock driver with the necessary library routines to convert 
> > the cycle value to a nsec value, where you actually have the necessary 
> > information to create efficient code. As long as you try to pull the cycle 
> > model into the generic model it will seriously suck from a performance 
> > perspective, as you separate the cycle value from the information how to 
> > deal with it efficiently.
> 
> 
> For features like robust timekeeping in the face of lost ticks (needed
> for virtualization, and realtime), as well as high-res timers and
> dynamic/no-idle ticks, we *NEED* a continuous clock. 

Let's concentrate on the core issue.
I do agree that we need a continuous clock, but why has this clock to be 
cycle based? As I tried to explain above the cycle based abstraction hurts 
performance, as it cuts us off from further optimizations.
What we want in the end is continous _nanosecond_ value, so why not let 
the abstraction base on this? Why is the cycle value so important?

> I made a quick audit of the arches to see what the breakdown was for
> continuous vs tick clocks: 
> 
> Continuous:
> i386,x86-64, ia64, powerpc, ppc, sparc, alpha, mips, parisc, s390,
> xtensa, and arm (pxa, sa1100, plat-omap) 
> 
> Tick:
> arm (other), cris, m32, m68k, sh
> 
> xtime/no intertick resolution:
> frv, h8300, v850
> 
> I'll admit, the continuos cycle model doesn't fit tick based clocks, but
> we shouldn't limit the abstraction to the lowest common denominator. By
> providing what I suggested above (w/ the two __get_nsec_offset()
> implementations), we reduce code for both models, and allow progress for
> systems w/ continuous clocks in a generic fashion.

Why can't we reach the same goal by providing library functions to the 
drivers, which would allow far better optimizations?
Of these archs ppc has a higly optimized lock and condition free 
gettimeofday implementation, which you simply throw away. I'm afraid that 
archs which care about performance have to work around your slow generic 
implementation. I have a big problem seeing progress in this.

John, I see three options:
1. I'm missing the point in your design which allows for further 
optimizations in a generic way.
2. Something is wrong with your cycle based design.
3. Performance isn't important anymore.

> > > I *much* prefer the clarity of clocksource over the wear and tear typing
> > > it might take on my fingers. 
> > 
> > What is the special meaning of "clocksource" vs e.g. just "clock"?
> 
> 
> "clock" is already overloaded. We have the
> CLOCK_MONOTONIC/CLOCK_REALTIME clocks, we have the RTC clocks... Its a
> cycle counter we're using to accumulate time, thus clocksource seemed
> understandable and unique.

The first would actually be my reason for calling it "clock", i.e. if this 
is going to be _the_ central abstraction for further time based services, 
it deserves the shorter and concise "clock" name.

> This isn't really a fair comparison (yet atleast), as your patches don't
> appear to handle suspend/resume correctly. Nor did your patches even
> boot on my laptop. :(

Why didn't you mentioned this earlier? :(

> I'll admit that my code could use more low-level optimization, and I
> welcome patches against my code to improve it. My code has already
> gotten a good amount of testing in both -mm and -rt, so I know it works.

Andrew immediately drops my patches, so they don't even get a chance to 
get any testing. This is a complete unfair argument, I don't even get the 
chance to prove that my code might be better. :-(

> Lets make it fast too, but just in steps.

The first step would be to keep it separate from the current 
update_wall_time() code. I just got rid of clock read in the hrtimer 
interrupt code you are about to introduce it again here. Many clocks don't 
need that much precision, and especially if it's an expensive operation 
it's a complete waste of time.

bye, Roman

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/5] clocksource patches
  2006-05-06 16:25       ` Roman Zippel
@ 2006-05-08 18:33         ` john stultz
  2006-05-08 21:15           ` Roman Zippel
  0 siblings, 1 reply; 16+ messages in thread
From: john stultz @ 2006-05-08 18:33 UTC (permalink / raw)
  To: Roman Zippel; +Cc: linux-kernel, Andrew Morton

On Sat, 2006-05-06 at 18:25 +0200, Roman Zippel wrote:
> On Fri, 5 May 2006, john stultz wrote:
> 
> > > A good abstraction should concentrate on the _common_ properties and I 
> > > don't think that the continous cycle model is a good general abstraction 
> > > for all types of clocks. Tick based clocks are already more complex due 
> > > the extra work needed to emulate the cycle counter. Some clocks may 
> > > already provide a nsec value (e.g. virtual clocks in a virtualized 
> > > environment), where your generic nsec calculation is a complete waste of 
> > > time. A common property of all clocks is that we want a nsec value from 
> > > them, so why not simply ask the clock for some kind of nsec value and 
> > > provide the clock driver with the necessary library routines to convert 
> > > the cycle value to a nsec value, where you actually have the necessary 
> > > information to create efficient code. As long as you try to pull the cycle 
> > > model into the generic model it will seriously suck from a performance 
> > > perspective, as you separate the cycle value from the information how to 
> > > deal with it efficiently.
> > 
> > 
> > For features like robust timekeeping in the face of lost ticks (needed
> > for virtualization, and realtime), as well as high-res timers and
> > dynamic/no-idle ticks, we *NEED* a continuous clock. 
> 
> Let's concentrate on the core issue.
> I do agree that we need a continuous clock, but why has this clock to be 
> cycle based? As I tried to explain above the cycle based abstraction hurts 
> performance, as it cuts us off from further optimizations.

First of all, you didn't reply specifically, but I hope we're in
agreement w/ the tick based clocks being not part of this specific
discussion. I'm fine letting systems w/ tick based clocks have an
get_nsec_offset() that is fully arch specific. And I don't love it, but
I can deal w/ having two update_wall_time() paths so tick based systems
can get some extra constant based optimizations.

Now, on to the continuous clocksource discussion :)

What arch specific optimizations for continuous clocks do you have in
mind? In other words, what would be an example of an architecture
specific optimization for generating time from a continuous counter?

For the sake of this discussion, I claim that optimizations made on
converting a continuous cycle based clock to an NTP adjusted time can be
made to all arches, and pushing the nanosecond conversion into the
driver is messy and needless.  What are examples contrary to this claim?

> What we want in the end is continous _nanosecond_ value, so why not let 
> the abstraction base on this? Why is the cycle value so important?

Because doing the NTP adjustment correctly on a cycle based clock is
difficult with the current code (I think ppc is the only one that does
it correctly, in my mind, by changing the frequency multiplier).

It can be done generically, and I do not see what sort of optimizations
you're imagining, so why keep it arch specific?

> Of these archs ppc has a higly optimized lock and condition free 
> gettimeofday implementation, which you simply throw away. I'm afraid that 
> archs which care about performance have to work around your slow generic 
> implementation. I have a big problem seeing progress in this.

The ppc's lockfree implementation is interesting (putting aside for a
moment the fact that the current ppc vsyscall-gtod added locks back to
the code).

However I don't see how its an arch specific optimization! Its simply
doing atomic updates via pointer switches between two structures. This
doesn't need to be ppc specific, yet because of the current mess it is.
This is a great example of why the generic code would be useful.

> > This isn't really a fair comparison (yet atleast), as your patches don't
> > appear to handle suspend/resume correctly. Nor did your patches even
> > boot on my laptop. :(
> 
> Why didn't you mentioned this earlier? :(

I apologize for not trying to run your patches earlier, but being time
constrained as well, I have been trying to focus understand the
algorithmic differences.

> > Lets make it fast too, but just in steps.
> 
> The first step would be to keep it separate from the current 
> update_wall_time() code. I just got rid of clock read in the hrtimer 
> interrupt code you are about to introduce it again here. Many clocks don't 
> need that much precision, and especially if it's an expensive operation 
> it's a complete waste of time.

With continuous cycle based counters, the clock read is *necessary* when
updating xtime for robust timekeeping. We can move update_wall_time so
we don't run it every timer interrupt, but we cannot keep correct time
by just guessing how much time has passed and adding it in.

On tick based systems, the code in -mm, would just be reading jiffies
which is equivalent to how its done in mainline. But I'll grant you we
probably miss out on some of the optimizations where we could use
constants, so I'll add in a tick based update_wall_time path soon.

thanks
-john

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/5] clocksource patches
  2006-05-08 18:33         ` john stultz
@ 2006-05-08 21:15           ` Roman Zippel
  2006-05-09  0:29             ` john stultz
  0 siblings, 1 reply; 16+ messages in thread
From: Roman Zippel @ 2006-05-08 21:15 UTC (permalink / raw)
  To: john stultz; +Cc: linux-kernel, Andrew Morton

Hi,

On Mon, 8 May 2006, john stultz wrote:

> > Let's concentrate on the core issue.
> > I do agree that we need a continuous clock, but why has this clock to be 
> > cycle based? As I tried to explain above the cycle based abstraction hurts 
> > performance, as it cuts us off from further optimizations.
> 
> 
> First of all, you didn't reply specifically, but I hope we're in
> agreement w/ the tick based clocks being not part of this specific
> discussion. I'm fine letting systems w/ tick based clocks have an
> get_nsec_offset() that is fully arch specific. And I don't love it, but
> I can deal w/ having two update_wall_time() paths so tick based systems
> can get some extra constant based optimizations.

Please leave the tick based stuff out of this completely. We want to get 
rid of the arch specific hooks and not add them back.
The point is to give the _clock_ control over this kind of stuff, only the 
clock driver knows how to deal with this efficiently, so as long as you 
try to create a "dumb" clock driver you're going to make things only 
worse. Now the archs have most control over it, but creating a single 
bloated and slow blop instead will not be an improvement.
It's not about moving everything into the clock driver here, it's about 
creating a _powerful_ API, which leaves control in the hands of the clock 
driver, but at the same time keeps them as _simple_ (and not as dumb) as 
possible.

> What arch specific optimizations for continuous clocks do you have in
> mind? In other words, what would be an example of an architecture
> specific optimization for generating time from a continuous counter?

The best example is the powerpc gettimeofday.

> For the sake of this discussion, I claim that optimizations made on
> converting a continuous cycle based clock to an NTP adjusted time can be
> made to all arches, and pushing the nanosecond conversion into the
> driver is messy and needless.  What are examples contrary to this claim?

What kind of NTP adjustments are you talking about? A nsec_offset function 
could look like this:

unsigned long nsec_offset(cs)
{
	return ((cs->xtime_nsec + get_cycles() - cs->last_offset) * cs->mult) >> SHIFT;
}

This is fucking simple, what about this is "messy"? There is no NTP 
adjustment here, this is all happening somewhere else. Keeping it in the 
driver allows to make parameter constant, skip unnecessary steps and 
allows to do it within 32bit. This is something you can _never_ do in a 
generic centralized way without making it truly messy. I'd be happy to be 
proven otherwise, but I simply don't see it.

> > What we want in the end is continous _nanosecond_ value, so why not let 
> > the abstraction base on this? Why is the cycle value so important?
> 
> Because doing the NTP adjustment correctly on a cycle based clock is
> difficult with the current code (I think ppc is the only one that does
> it correctly, in my mind, by changing the frequency multiplier).

I was talking about the nsec offset (see above), which is outside of the 
NTP business. What is your fascination with the cycle counter, that you 
want to make it so generic?

> It can be done generically, and I do not see what sort of optimizations
> you're imagining, so why keep it arch specific?

John, it's _clock_ specific.

> > Of these archs ppc has a higly optimized lock and condition free 
> > gettimeofday implementation, which you simply throw away. I'm afraid that 
> > archs which care about performance have to work around your slow generic 
> > implementation. I have a big problem seeing progress in this.
> 
> The ppc's lockfree implementation is interesting (putting aside for a
> moment the fact that the current ppc vsyscall-gtod added locks back to
> the code).
> 
> However I don't see how its an arch specific optimization! Its simply
> doing atomic updates via pointer switches between two structures. This
> doesn't need to be ppc specific, yet because of the current mess it is.
> This is a great example of why the generic code would be useful.

Part of this can be done generically and I'd like to see it implemented, 
but not all the optimizations can be done generically, e.g. not every arch 
has a convenient mulhdu instruction like ppc64.
The arch can provide some base optimizations, but in the end it's the 
clock which knows best (e.g. 32bit vs. 64bit resolution). Only the clock 
driver knows the basic clock parameters at _compile_ time and can generate 
an efficient implementation. With the help of some library functions we 
can keep the amount of source needed in the driver to a minimum and 
simple. Where is the problem with this?

> > The first step would be to keep it separate from the current 
> > update_wall_time() code. I just got rid of clock read in the hrtimer 
> > interrupt code you are about to introduce it again here. Many clocks don't 
> > need that much precision, and especially if it's an expensive operation 
> > it's a complete waste of time.
> 
> With continuous cycle based counters, the clock read is *necessary* when
> updating xtime for robust timekeeping. We can move update_wall_time so
> we don't run it every timer interrupt, but we cannot keep correct time
> by just guessing how much time has passed and adding it in.

It has almost nothing to do with continuous cycles. On an UP system only 
anything running with a higher priority than the timer interrupt or if the 
cycle adjustment happens asynchron to the timer interrupt (e.g. TSC) can 
see the adjustment. Again it depends on the clock whether the common 
adjustment is significant bigger than the time needed to read the clock, 
otherwise it's just a waste of time.

> On tick based systems, the code in -mm, would just be reading jiffies
> which is equivalent to how its done in mainline. But I'll grant you we
> probably miss out on some of the optimizations where we could use
> constants, so I'll add in a tick based update_wall_time path soon.

John, again, it's not about tick based systems, you'll get it wrong as 
long as you think it's about special exceptions for tick based systems. 
It's about creating a flexible system, which can handle about anything and 
only the clock driver really knows what this is, otherwise you create a 
bloated and slow generic system.

bye, Roman

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/5] clocksource patches
  2006-05-08 21:15           ` Roman Zippel
@ 2006-05-09  0:29             ` john stultz
  2006-05-09 22:48               ` Roman Zippel
  0 siblings, 1 reply; 16+ messages in thread
From: john stultz @ 2006-05-09  0:29 UTC (permalink / raw)
  To: Roman Zippel; +Cc: linux-kernel, Andrew Morton

On Mon, 2006-05-08 at 23:15 +0200, Roman Zippel wrote:
> The point is to give the _clock_ control over this kind of stuff, only the 
> clock driver knows how to deal with this efficiently, so as long as you 
> try to create a "dumb" clock driver you're going to make things only 
> worse. Now the archs have most control over it, but creating a single 
> bloated and slow blop instead will not be an improvement.
> It's not about moving everything into the clock driver here, it's about 
> creating a _powerful_ API, which leaves control in the hands of the clock 
> driver, but at the same time keeps them as _simple_ (and not as dumb) as 
> possible.

Part of my concern here is keeping the code manageable and hackable.
What you're suggesting sounds very similar to what we have for the i386
timer_opts code, which I don't want to duplicate.

Maybe it would help here if you would better define the API for this
abstraction. As it stands, it appears incomplete. Define the state
values stored, and list what interfaces modify which state values, etc.

	ie: clock->get_nsec_offset():
		What exactly does this measure? Is this strictly defined by the state
stored in the clocksource? If not, how will this interact w/ dynamic
ticks? What is the maximum time we can delay interrupts before the clock
wraps? How do we know if its tick based? 

Another issue: if get_nsec_offset() is clock specific in its
implementation, but the state values it uses in at least the common case
are modified by generic code, how can we implement something like the
ppc lock free read? That will affect both the generic code and the clock
specific code.

> > What arch specific optimizations for continuous clocks do you have in
> > mind? In other words, what would be an example of an architecture
> > specific optimization for generating time from a continuous counter?
> 
> The best example is the powerpc gettimeofday.
> 
> > For the sake of this discussion, I claim that optimizations made on
> > converting a continuous cycle based clock to an NTP adjusted time can be
> > made to all arches, and pushing the nanosecond conversion into the
> > driver is messy and needless.  What are examples contrary to this claim?
> 
> What kind of NTP adjustments are you talking about? A nsec_offset function 
> could look like this:
> 
> unsigned long nsec_offset(cs)
> {
> 	return ((cs->xtime_nsec + get_cycles() - cs->last_offset) * cs->mult) >> SHIFT;
> }
> 
> This is fucking simple, what about this is "messy"? There is no NTP 
> adjustment here, this is all happening somewhere else. 

That's my point. if nsec_offset is opaque, then what is the interface
for making NTP adjustments to that function? Are all nsec_offset
functions required to use the xtime_nsec, last_offset, and mult values?


> Keeping it in the 
> driver allows to make parameter constant, skip unnecessary steps and 
> allows to do it within 32bit. This is something you can _never_ do in a 
> generic centralized way without making it truly messy. I'd be happy to be 
> proven otherwise, but I simply don't see it.

Well, my issue w/ you desire to have a 32bit continuous clock is that I
don't know how useful it would be. 

For a 1Mhz clock, 
	You've got 1,000 ns per cycle and the algorithm looks something like
say:
	get_nsec_offset: cycles * 1,000 >> 0
which gives you ~4 seconds of leeway, which is pretty good.
However, the short term jitter is 1000ppm.

So lets bump up the SHIFT value,
	get_nsec_offset: cycles * 1,024,000 >> 10

The short term jitter: <1ppm, so that's good.
But the max cycles is then ~4,000 (4ms), which is too short.

You can wiggle around here and maybe come up with something that works,
but it only gets worse for clocks that are faster. 

For robust timekeeping using continuous cycles, I think the 64bit mult
in gettimeofday is going to be necessary in the majority of cases (we
can wrap it in some sort of arch optimized macro for a mul_lxl_ll or
something if possible).


> > > The first step would be to keep it separate from the current 
> > > update_wall_time() code. I just got rid of clock read in the hrtimer 
> > > interrupt code you are about to introduce it again here. Many clocks don't 
> > > need that much precision, and especially if it's an expensive operation 
> > > it's a complete waste of time.
> > 
> > With continuous cycle based counters, the clock read is *necessary* when
> > updating xtime for robust timekeeping. We can move update_wall_time so
> > we don't run it every timer interrupt, but we cannot keep correct time
> > by just guessing how much time has passed and adding it in.
> 
> It has almost nothing to do with continuous cycles. On an UP system only 
> anything running with a higher priority than the timer interrupt or if the 
> cycle adjustment happens asynchron to the timer interrupt (e.g. TSC) can 
> see the adjustment. Again it depends on the clock whether the common 
> adjustment is significant bigger than the time needed to read the clock, 
> otherwise it's just a waste of time.

Eh? I didn't understand that. Mind clarifying/expanding here?

thanks
-john


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/5] clocksource patches
  2006-05-09  0:29             ` john stultz
@ 2006-05-09 22:48               ` Roman Zippel
  0 siblings, 0 replies; 16+ messages in thread
From: Roman Zippel @ 2006-05-09 22:48 UTC (permalink / raw)
  To: john stultz; +Cc: linux-kernel, Andrew Morton

Hi,

On Mon, 8 May 2006, john stultz wrote:

> On Mon, 2006-05-08 at 23:15 +0200, Roman Zippel wrote:
> > The point is to give the _clock_ control over this kind of stuff, only the 
> > clock driver knows how to deal with this efficiently, so as long as you 
> > try to create a "dumb" clock driver you're going to make things only 
> > worse. Now the archs have most control over it, but creating a single 
> > bloated and slow blop instead will not be an improvement.
> > It's not about moving everything into the clock driver here, it's about 
> > creating a _powerful_ API, which leaves control in the hands of the clock 
> > driver, but at the same time keeps them as _simple_ (and not as dumb) as 
> > possible.
> 
> Part of my concern here is keeping the code manageable and hackable.
> What you're suggesting sounds very similar to what we have for the i386
> timer_opts code, which I don't want to duplicate.

What are you talking about???

Let's take a simple example: 

static cycle_t read_tsc(void)
{       
	cycle_t ret;

	rdtscll(ret);

	return ret;
}

which could become:

static unsigned long tsc_nsec_offset(struct clocksource *cs)
{
	cycle_t cyc;

	rdtscll(cyc);

	return generic_cont_cycle2nsec_offset(cs, cyc, 22, (cycle_t)-1);
}

Please tell me what kind of problem you have with this? Even in this 
simple case it will already be faster due to the constant parameters.
What is wrong with a little extra performance?

> Maybe it would help here if you would better define the API for this
> abstraction. As it stands, it appears incomplete. Define the state
> values stored, and list what interfaces modify which state values, etc.
> 
> 	ie: clock->get_nsec_offset():
> 		What exactly does this measure? Is this strictly defined by the state
> stored in the clocksource? If not, how will this interact w/ dynamic
> ticks? What is the maximum time we can delay interrupts before the clock
> wraps? How do we know if its tick based? 

I explained this already at the start of this thread and please read the 
patch. I'm truly baffled, that you actually have to ask these questions.

> Another issue: if get_nsec_offset() is clock specific in its
> implementation, but the state values it uses in at least the common case
> are modified by generic code, how can we implement something like the
> ppc lock free read? That will affect both the generic code and the clock
> specific code.

You basically have two copies of the time parameters, for 
get_nsec_offset() this only means it gets a different structure argument, 
the rest is the same. The pointer switch between these two copies can be 
done generically.

> > What kind of NTP adjustments are you talking about? A nsec_offset function 
> > could look like this:
> > 
> > unsigned long nsec_offset(cs)
> > {
> > 	return ((cs->xtime_nsec + get_cycles() - cs->last_offset) * cs->mult) >> SHIFT;
> > }
> > 
> > This is fucking simple, what about this is "messy"? There is no NTP 
> > adjustment here, this is all happening somewhere else. 
> 
> That's my point. if nsec_offset is opaque, then what is the interface
> for making NTP adjustments to that function? Are all nsec_offset
> functions required to use the xtime_nsec, last_offset, and mult values?

These values are not opaque, from an average kernel hacker I would at 
least expect to understand what these values mean, but he doesn't has to 
know how they are adjusted (or even calculated) and for this function it's 
also not really important.

> > Keeping it in the 
> > driver allows to make parameter constant, skip unnecessary steps and 
> > allows to do it within 32bit. This is something you can _never_ do in a 
> > generic centralized way without making it truly messy. I'd be happy to be 
> > proven otherwise, but I simply don't see it.
> 
> Well, my issue w/ you desire to have a 32bit continuous clock is that I
> don't know how useful it would be. 
> 
> For a 1Mhz clock, 
> 	You've got 1,000 ns per cycle and the algorithm looks something like
> say:
> 	get_nsec_offset: cycles * 1,000 >> 0
> which gives you ~4 seconds of leeway, which is pretty good.
> However, the short term jitter is 1000ppm.

I think you mean 1000ppb or 1ppm. 
Anyway, it's still not entirely correct. The short term jitter (for a 
continuous clock) is defined by two parameters. First you have the base 
jitter defined by the clock frequency of 1/freq seconds (which is also 
independent of the used shift), so even with a perfectly synchronized 
signal there is a jitter of +-0.5ppm. The second parameter is defined by 
the size of the adjustment steps, which depends on the shift and HZ: 
freq/2^shift/HZ nsec. This means in this case with HZ=1000 adjustment adds 
a jitter of +-0.5ppm or with HZ=100 it's +-5ppm.

> So lets bump up the SHIFT value,
> 	get_nsec_offset: cycles * 1,024,000 >> 10
> 
> The short term jitter: <1ppm, so that's good.
> But the max cycles is then ~4,000 (4ms), which is too short.
> 
> You can wiggle around here and maybe come up with something that works,
> but it only gets worse for clocks that are faster. 

You don't have to "wiggle" around anything. Using log2(freq^2/10^9/HZ) you 
can calculate a shift value with an adjustment step which matches the 
clock resolution, so with HZ=100 and a shift of 4 the adjustment step is 
+-0.3125ppm and with a shift of 5 it's +-0.15625ppm, which is good enough 
and wraps around after 134msec (If you lose that many update ticks you 
have other problems already and with a tick based clock you wouldn't 
notice it anyway). So if the clock resolution is only around 1usec anyway,  
a 32bit calculation of the offset is perfectly usable.

I don't really expect to know these details from many people, but if you 
want to maintain the clock system, you have to know this, you have to know 
what is possible and what isn't and why. I'm really scared if you create 
your code by "wiggling" around with it until it somehow fits.

> For robust timekeeping using continuous cycles, I think the 64bit mult
> in gettimeofday is going to be necessary in the majority of cases (we
> can wrap it in some sort of arch optimized macro for a mul_lxl_ll or
> something if possible).

Actually currently you use a mul_llxl_ll, which is considerably more 
expensive on many archs. With current GHz clocks and the possible delays 
of the timer interrupt in the rt kernel we are currently pushing the 
limits and a mul_llxl_ll may be needed, which is perfectly fine. What is 
not ok, is forcing it onto everyone by design. If something is not 
generally true, you cannot make it generic!
What makes this worse that you don't optimize for the majority of cases, 
but that you have to do it for the worst case, this is the reason it's a
mul_llxl_ll not mul_lxl_ll. Giving the clock control over this you could 
register two different clock and have a choice whether you want the slow, 
low priority clock, which can go a few seconds without updates, or the 
fast clock, which needs updates a few times a second. With your cycle 
based design you'll never have this kind of flexibility, are you really 
sure nobody ever needs that?

> > > With continuous cycle based counters, the clock read is *necessary* when
> > > updating xtime for robust timekeeping. We can move update_wall_time so
> > > we don't run it every timer interrupt, but we cannot keep correct time
> > > by just guessing how much time has passed and adding it in.
> > 
> > It has almost nothing to do with continuous cycles. On an UP system only 
> > anything running with a higher priority than the timer interrupt or if the 
> > cycle adjustment happens asynchron to the timer interrupt (e.g. TSC) can 
> > see the adjustment. Again it depends on the clock whether the common 
> > adjustment is significant bigger than the time needed to read the clock, 
> > otherwise it's just a waste of time.
> 
> Eh? I didn't understand that. Mind clarifying/expanding here?

If the clock is updated at exactly every freq/HZ cycles, you won't see a 
jump due to the clock adjustment. This means only if you delay the timer 
interrupt can you cause a time jump and this delay has to be large enough 
to produce a large enough jump to be noticable (without correcting for 
it).
The nsec offset is adjusted by (offset >> (shift - mult_adj)) nsec, this 
means IOW the correction is adjustment_offset*delay, e.g. if the clock is 
adjusted by 1msec/sec and the delay is 1msec (one tick!), the correction 
is 1usec. IOW you already need quite large adjustments and delay to 
produce a significant adjustment for the delay, so for slow clocks it's 
pretty much guaranteed to be a complete waste of time.

bye, Roman

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2006-05-09 22:48 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-04-03 19:54 [PATCH 0/5] clocksource patches Roman Zippel
2006-04-03 20:08 ` Roman Zippel
2006-04-04  4:53 ` Thomas Gleixner
2006-04-04 19:06   ` Roman Zippel
2006-04-05 11:22     ` Thomas Gleixner
2006-04-05 13:40       ` Ingo Molnar
2006-04-05 20:44       ` Roman Zippel
2006-04-05 22:18         ` Thomas Gleixner
2006-04-07 17:57 ` john stultz
2006-04-27 20:33   ` Roman Zippel
2006-05-06  2:04     ` john stultz
2006-05-06 16:25       ` Roman Zippel
2006-05-08 18:33         ` john stultz
2006-05-08 21:15           ` Roman Zippel
2006-05-09  0:29             ` john stultz
2006-05-09 22:48               ` Roman Zippel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox