[PATCH] sched: Support current clocksource handling in fallback sched

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH] sched: Support current clocksource handling in fallback sched_clock().
@ 2009-05-26  6:15 Paul Mundt
  2009-05-26 14:31 ` Linus Walleij
  0 siblings, 1 reply; 57+ messages in thread
From: Paul Mundt @ 2009-05-26  6:15 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Victor, Haavard Skinnemoen, Andrew Morton, linux-kernel,
	linux-sh

There are presently a number of issues and limitations with how the
clocksource and sched_clock() interaction works today. Configurations
tend to be grouped in to one of the following:

	- Platform provides a low rated clocksource (< 100) and prefers
	  to use jiffies for sched_clock() due to reliability concerns.

	- Platform provides its own clocksource and sched_clock() that
	  wraps in to it.

	- Platform uses a generic clocksource (ie, drivers/clocksource/)
	  combined with the generic jiffies-backed sched_clock().

	- Platform supports a generic highly-rated clocksource but ends
	  up having to use the jiffies sched_clock() anyways.

	- Platform supports multiple highly-rated clocksources.

In the first case, simply using the rating information is sufficient to
figure out the proper course of action. In the second case, very few of
these do anything outside of the regular cyc2ns() work on the preferred
clocksource, so it tends to be more about having access to the reference
clocksource data structures more than really wanting to do any special
calculations in sched_clock().

The last few cases are presently what we are faced with on sh, and which
also impacts other drivers/clocksource drivers (while acpi_pm seems to
have alternate recourse for sched_clock(), ARM/AVR32/SH do not). In these
cases multiple clocksources can be provided, and the availability of
these will often depend on runtime constraints (pinmux and so forth), in
which case link time determination is simply not sufficient. While these
clocksources can be highly rated and can offer excellent granularity, the
jiffies clocksource is still used as a fallback given the inability to
sprinkle sched_clock() wrappers in the drivers themselves. Also, while
sched_clock() could be moved in to struct clocksource itself, this does
not help the case where sched_clock() is called in to repeatedly well
before a preferred clocksource has been determined and made available
(printk times and so on), so extra logic is needed regardless.

This patch does the only thing I could think of to address most of
these in one shot, abusing the current clocksource pointer and forcing
sched_clock() to read from it directly as soon as it becomes available
(and assuming that is is rated highly enough). This does add the cost of
the rating test on systems that only have the jiffies clocksource, but I
think this is acceptable collateral damage given that jiffies are not
very granular to begin with.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>

---

 kernel/sched_clock.c |   10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/kernel/sched_clock.c b/kernel/sched_clock.c
index e1d16c9..59bbeeb 100644
--- a/kernel/sched_clock.c
+++ b/kernel/sched_clock.c
@@ -30,6 +30,7 @@
 #include <linux/percpu.h>
 #include <linux/ktime.h>
 #include <linux/sched.h>
+#include <linux/clocksource.h>

 /*
  * Scheduler clock - returns current time in nanosec units.
@@ -38,6 +39,15 @@
  */
 unsigned long long __attribute__((weak)) sched_clock(void)
 {
+	/*
+	 * Use the current clocksource when it becomes available later in
+	 * the boot process, and ensure that it has a high enough rating
+	 * to make it suitable for general use.
+	 */
+	if (clock && clock->rating >= 100)
+		return cyc2ns(clock, clocksource_read(clock));
+
+	/* Otherwise just fall back on jiffies */
 	return (unsigned long long)(jiffies - INITIAL_JIFFIES)
 					* (NSEC_PER_SEC / HZ);
 }

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback  sched_clock().
  2009-05-26  6:15 [PATCH] sched: Support current clocksource handling in fallback sched_clock() Paul Mundt
@ 2009-05-26 14:31 ` Linus Walleij
  2009-05-26 14:38   ` Peter Zijlstra
                     ` (2 more replies)
  0 siblings, 3 replies; 57+ messages in thread
From: Linus Walleij @ 2009-05-26 14:31 UTC (permalink / raw)
  To: Paul Mundt, Ingo Molnar, Andrew Victor, Haavard Skinnemoen,
	Andrew Morton, linux-kernel, linux-sh, peterz
  Cc: linux-arm-kernel

2009/5/26 Paul Mundt <lethal@linux-sh.org>:

>  */
>  unsigned long long __attribute__((weak)) sched_clock(void)
>  {
> +       /*
> +        * Use the current clocksource when it becomes available later in
> +        * the boot process, and ensure that it has a high enough rating
> +        * to make it suitable for general use.
> +        */
> +       if (clock && clock->rating >= 100)
> +               return cyc2ns(clock, clocksource_read(clock));
> +
> +       /* Otherwise just fall back on jiffies */
>        return (unsigned long long)(jiffies - INITIAL_JIFFIES)
>                                        * (NSEC_PER_SEC / HZ);
>  }

This seems like it would make the patch I sent the other day
unnecessary (subject u300 sched_clock() implementation).

It would also trim off this solution found in all OMAP platforms in
arch/arm/plat-omap/common.c

BUT Peter Zijlstra replied to my question about why this wasn't
generic with:

[peterz]:
> But that is the reason this isn't generic, non of the 'stable'
> clocksources on x86 are fast enough to use as sched_clock.

Does that mean clock->rating for these clocksources is
for certain < 100?

The definition of "rating" from the kerneldoc does not
seem to imply that, it's a subjective measure AFAICT.

Else you might want an additional criteria, like
cyc2ns(1) (much less than) jiffies_to_usecs(1)*1000
(however you do that the best way)
so you don't pick something
that isn't substantially faster than the jiffy counter atleast?

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-26 14:31 ` Linus Walleij
@ 2009-05-26 14:38   ` Peter Zijlstra
  2009-05-26 20:17     ` Thomas Gleixner
  2009-05-26 20:23     ` john stultz
  2009-05-26 14:43   ` Paul Mundt
  2009-05-26 15:02   ` Matthieu CASTET
  2 siblings, 2 replies; 57+ messages in thread
From: Peter Zijlstra @ 2009-05-26 14:38 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Paul Mundt, Ingo Molnar, Andrew Victor, Haavard Skinnemoen,
	Andrew Morton, linux-kernel, linux-sh, linux-arm-kernel,
	John Stultz, Thomas Gleixner

Added the generic clock and timer folks to CC.

On Tue, 2009-05-26 at 16:31 +0200, Linus Walleij wrote:
> 2009/5/26 Paul Mundt <lethal@linux-sh.org>:
> 
> >  */
> >  unsigned long long __attribute__((weak)) sched_clock(void)
> >  {
> > +       /*
> > +        * Use the current clocksource when it becomes available later in
> > +        * the boot process, and ensure that it has a high enough rating
> > +        * to make it suitable for general use.
> > +        */
> > +       if (clock && clock->rating >= 100)
> > +               return cyc2ns(clock, clocksource_read(clock));
> > +
> > +       /* Otherwise just fall back on jiffies */
> >        return (unsigned long long)(jiffies - INITIAL_JIFFIES)
> >                                        * (NSEC_PER_SEC / HZ);
> >  }
> 
> This seems like it would make the patch I sent the other day
> unnecessary (subject u300 sched_clock() implementation).
> 
> It would also trim off this solution found in all OMAP platforms in
> arch/arm/plat-omap/common.c
> 
> BUT Peter Zijlstra replied to my question about why this wasn't
> generic with:
> 
> [peterz]:
> > But that is the reason this isn't generic, non of the 'stable'
> > clocksources on x86 are fast enough to use as sched_clock.
> 
> Does that mean clock->rating for these clocksources is
> for certain < 100?
> 
> The definition of "rating" from the kerneldoc does not
> seem to imply that, it's a subjective measure AFAICT.
> 
> Else you might want an additional criteria, like
> cyc2ns(1) (much less than) jiffies_to_usecs(1)*1000
> (however you do that the best way)
> so you don't pick something
> that isn't substantially faster than the jiffy counter atleast?





^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-26 14:38   ` Peter Zijlstra
@ 2009-05-26 20:17     ` Thomas Gleixner
  2009-05-26 23:08       ` Paul Mundt
  2009-05-26 20:23     ` john stultz
  1 sibling, 1 reply; 57+ messages in thread
From: Thomas Gleixner @ 2009-05-26 20:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Walleij, Paul Mundt, Ingo Molnar, Andrew Victor,
	Haavard Skinnemoen, Andrew Morton, linux-kernel, linux-sh,
	linux-arm-kernel, John Stultz

On Tue, 26 May 2009, Peter Zijlstra wrote:
> Added the generic clock and timer folks to CC.
> 
> On Tue, 2009-05-26 at 16:31 +0200, Linus Walleij wrote:
> > 2009/5/26 Paul Mundt <lethal@linux-sh.org>:
> > 
> > >  */
> > >  unsigned long long __attribute__((weak)) sched_clock(void)
> > >  {
> > > +       /*
> > > +        * Use the current clocksource when it becomes available later in
> > > +        * the boot process, and ensure that it has a high enough rating
> > > +        * to make it suitable for general use.
> > > +        */
> > > +       if (clock && clock->rating >= 100)
> > > +               return cyc2ns(clock, clocksource_read(clock));
> > > +
> > > +       /* Otherwise just fall back on jiffies */
> > >        return (unsigned long long)(jiffies - INITIAL_JIFFIES)
> > >                                        * (NSEC_PER_SEC / HZ);
> > >  }
> > 
> > This seems like it would make the patch I sent the other day
> > unnecessary (subject u300 sched_clock() implementation).
> > 
> > It would also trim off this solution found in all OMAP platforms in
> > arch/arm/plat-omap/common.c
> > 
> > BUT Peter Zijlstra replied to my question about why this wasn't
> > generic with:
> > 
> > [peterz]:
> > > But that is the reason this isn't generic, non of the 'stable'
> > > clocksources on x86 are fast enough to use as sched_clock.
> > 
> > Does that mean clock->rating for these clocksources is
> > for certain < 100?
> > 
> > The definition of "rating" from the kerneldoc does not
> > seem to imply that, it's a subjective measure AFAICT.

  Right, there is no rating threshold defined, which allows to deduce
  that. The TSC on x86 which might be unreliable, but usable as
  sched_clock has an initial rating of 300 which can be changed later
  on to 0 when the TSC is unusable as a time of day source. In that
  case clock is replaced by HPET which has a rating > 100 but is
  definitely not a good choice for sched_clock

> > Else you might want an additional criteria, like
> > cyc2ns(1) (much less than) jiffies_to_usecs(1)*1000
> > (however you do that the best way)
> > so you don't pick something
> > that isn't substantially faster than the jiffy counter atleast?

  What we can do is add another flag to the clocksource e.g.
  CLOCK_SOURCE_USE_FOR_SCHED_CLOCK and check this instead of the
  rating.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-26 20:17     ` Thomas Gleixner
@ 2009-05-26 23:08       ` Paul Mundt
  2009-05-26 23:13         ` Paul Mundt
                           ` (2 more replies)
  0 siblings, 3 replies; 57+ messages in thread
From: Paul Mundt @ 2009-05-26 23:08 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, Linus Walleij, Ingo Molnar, Andrew Victor,
	Haavard Skinnemoen, Andrew Morton, linux-kernel, linux-sh,
	linux-arm-kernel, John Stultz

On Tue, May 26, 2009 at 10:17:02PM +0200, Thomas Gleixner wrote:
> On Tue, 26 May 2009, Peter Zijlstra wrote:
> > On Tue, 2009-05-26 at 16:31 +0200, Linus Walleij wrote:
> > > The definition of "rating" from the kerneldoc does not
> > > seem to imply that, it's a subjective measure AFAICT.
> 
>   Right, there is no rating threshold defined, which allows to deduce
>   that. The TSC on x86 which might be unreliable, but usable as
>   sched_clock has an initial rating of 300 which can be changed later
>   on to 0 when the TSC is unusable as a time of day source. In that
>   case clock is replaced by HPET which has a rating > 100 but is
>   definitely not a good choice for sched_clock
> 
> > > Else you might want an additional criteria, like
> > > cyc2ns(1) (much less than) jiffies_to_usecs(1)*1000
> > > (however you do that the best way)
> > > so you don't pick something
> > > that isn't substantially faster than the jiffy counter atleast?
> 
>   What we can do is add another flag to the clocksource e.g.
>   CLOCK_SOURCE_USE_FOR_SCHED_CLOCK and check this instead of the
>   rating.
> 
Ok, so based on this and John's locking concerns, how about something
like this? It doesn't handle the wrapping cases, but I wonder if we
really want to add that amount of logic to sched_clock() in the first
place. Clocksources that wrap frequently could either leave the flag
unset, or do something similar to the TSC code where the cyc2ns shift is
used. If this is something we want to handle generically, then I'll have
a go at generalizing the TSC cyc2ns scaling bits for the next spin.

---

 include/linux/clocksource.h |    2 ++
 kernel/sched_clock.c        |   22 ++++++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index c56457c..cfd873e 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -203,6 +203,7 @@ struct clocksource {
 };
 
 extern struct clocksource *clock;	/* current clocksource */
+extern spinlock_t clocksource_lock;
 
 /*
  * Clock source flags bits::
@@ -212,6 +213,7 @@ extern struct clocksource *clock;	/* current clocksource */
 
 #define CLOCK_SOURCE_WATCHDOG			0x10
 #define CLOCK_SOURCE_VALID_FOR_HRES		0x20
+#define CLOCK_SOURCE_USE_FOR_SCHED_CLOCK	0x40
 
 /* simplify initialization of mask field */
 #define CLOCKSOURCE_MASK(bits) (cycle_t)((bits) < 64 ? ((1ULL<<(bits))-1) : -1)
diff --git a/kernel/sched_clock.c b/kernel/sched_clock.c
index e1d16c9..c7027cd 100644
--- a/kernel/sched_clock.c
+++ b/kernel/sched_clock.c
@@ -30,6 +30,7 @@
 #include <linux/percpu.h>
 #include <linux/ktime.h>
 #include <linux/sched.h>
+#include <linux/clocksource.h>
 
 /*
  * Scheduler clock - returns current time in nanosec units.
@@ -38,6 +39,27 @@
  */
 unsigned long long __attribute__((weak)) sched_clock(void)
 {
+	/*
+	 * Use the current clocksource when it becomes available later in
+	 * the boot process. As this needs to be fast, we only make a
+	 * single pass at grabbing the spinlock. If the clock is changing
+	 * out from underneath us, fall back on jiffies and try it again
+	 * the next time around.
+	 */
+	if (clock && _raw_spin_trylock(&clocksource_lock)) {
+		/*
+		 * Only use clocksources suitable for sched_clock()
+		 */
+		if (clock->flags & CLOCK_SOURCE_USE_FOR_SCHED_CLOCK) {
+			cycle_t now = cyc2ns(clock, clocksource_read(clock));
+			_raw_spin_unlock(&clocksource_lock);
+			return now;
+		}
+
+		_raw_spin_unlock(&clocksource_lock);
+	}
+
+	/* If all else fails, fall back on jiffies */
 	return (unsigned long long)(jiffies - INITIAL_JIFFIES)
 					* (NSEC_PER_SEC / HZ);
 }

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-26 23:08       ` Paul Mundt
@ 2009-05-26 23:13         ` Paul Mundt
  2009-05-26 23:25         ` john stultz
  2009-05-26 23:49         ` Thomas Gleixner
  2 siblings, 0 replies; 57+ messages in thread
From: Paul Mundt @ 2009-05-26 23:13 UTC (permalink / raw)
  To: Thomas Gleixner, Peter Zijlstra, Linus Walleij, Ingo Molnar,
	Andrew Victor, Haavard Skinnemoen, Andrew Morton, linux-kernel,
	linux-sh, linux-arm-kernel, John Stultz

On Wed, May 27, 2009 at 08:08:55AM +0900, Paul Mundt wrote:
> On Tue, May 26, 2009 at 10:17:02PM +0200, Thomas Gleixner wrote:
> > On Tue, 26 May 2009, Peter Zijlstra wrote:
> > > On Tue, 2009-05-26 at 16:31 +0200, Linus Walleij wrote:
> > > > The definition of "rating" from the kerneldoc does not
> > > > seem to imply that, it's a subjective measure AFAICT.
> > 
> >   Right, there is no rating threshold defined, which allows to deduce
> >   that. The TSC on x86 which might be unreliable, but usable as
> >   sched_clock has an initial rating of 300 which can be changed later
> >   on to 0 when the TSC is unusable as a time of day source. In that
> >   case clock is replaced by HPET which has a rating > 100 but is
> >   definitely not a good choice for sched_clock
> > 
> > > > Else you might want an additional criteria, like
> > > > cyc2ns(1) (much less than) jiffies_to_usecs(1)*1000
> > > > (however you do that the best way)
> > > > so you don't pick something
> > > > that isn't substantially faster than the jiffy counter atleast?
> > 
> >   What we can do is add another flag to the clocksource e.g.
> >   CLOCK_SOURCE_USE_FOR_SCHED_CLOCK and check this instead of the
> >   rating.
> > 
> Ok, so based on this and John's locking concerns, how about something
> like this? It doesn't handle the wrapping cases, but I wonder if we
> really want to add that amount of logic to sched_clock() in the first
> place. Clocksources that wrap frequently could either leave the flag
> unset, or do something similar to the TSC code where the cyc2ns shift is
> used. If this is something we want to handle generically, then I'll have
> a go at generalizing the TSC cyc2ns scaling bits for the next spin.
> 
Lets try that again..

---

 include/linux/clocksource.h |    2 ++
 kernel/sched_clock.c        |   22 ++++++++++++++++++++++
 kernel/time/clocksource.c   |    2 +-
 3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index c56457c..cfd873e 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -203,6 +203,7 @@ struct clocksource {
 };
 
 extern struct clocksource *clock;	/* current clocksource */
+extern spinlock_t clocksource_lock;
 
 /*
  * Clock source flags bits::
@@ -212,6 +213,7 @@ extern struct clocksource *clock;	/* current clocksource */
 
 #define CLOCK_SOURCE_WATCHDOG			0x10
 #define CLOCK_SOURCE_VALID_FOR_HRES		0x20
+#define CLOCK_SOURCE_USE_FOR_SCHED_CLOCK	0x40
 
 /* simplify initialization of mask field */
 #define CLOCKSOURCE_MASK(bits) (cycle_t)((bits) < 64 ? ((1ULL<<(bits))-1) : -1)
diff --git a/kernel/sched_clock.c b/kernel/sched_clock.c
index e1d16c9..c7027cd 100644
--- a/kernel/sched_clock.c
+++ b/kernel/sched_clock.c
@@ -30,6 +30,7 @@
 #include <linux/percpu.h>
 #include <linux/ktime.h>
 #include <linux/sched.h>
+#include <linux/clocksource.h>
 
 /*
  * Scheduler clock - returns current time in nanosec units.
@@ -38,6 +39,27 @@
  */
 unsigned long long __attribute__((weak)) sched_clock(void)
 {
+	/*
+	 * Use the current clocksource when it becomes available later in
+	 * the boot process. As this needs to be fast, we only make a
+	 * single pass at grabbing the spinlock. If the clock is changing
+	 * out from underneath us, fall back on jiffies and try it again
+	 * the next time around.
+	 */
+	if (clock && _raw_spin_trylock(&clocksource_lock)) {
+		/*
+		 * Only use clocksources suitable for sched_clock()
+		 */
+		if (clock->flags & CLOCK_SOURCE_USE_FOR_SCHED_CLOCK) {
+			cycle_t now = cyc2ns(clock, clocksource_read(clock));
+			_raw_spin_unlock(&clocksource_lock);
+			return now;
+		}
+
+		_raw_spin_unlock(&clocksource_lock);
+	}
+
+	/* If all else fails, fall back on jiffies */
 	return (unsigned long long)(jiffies - INITIAL_JIFFIES)
 					* (NSEC_PER_SEC / HZ);
 }
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 80189f6..437a6cf 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -127,7 +127,7 @@ static struct clocksource *curr_clocksource = &clocksource_jiffies;
 static struct clocksource *next_clocksource;
 static struct clocksource *clocksource_override;
 static LIST_HEAD(clocksource_list);
-static DEFINE_SPINLOCK(clocksource_lock);
+DEFINE_SPINLOCK(clocksource_lock);
 static char override_name[32];
 static int finished_booting;
 

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-26 23:08       ` Paul Mundt
  2009-05-26 23:13         ` Paul Mundt
@ 2009-05-26 23:25         ` john stultz
  2009-05-26 23:44           ` Paul Mundt
  2009-05-26 23:49         ` Thomas Gleixner
  2 siblings, 1 reply; 57+ messages in thread
From: john stultz @ 2009-05-26 23:25 UTC (permalink / raw)
  To: Paul Mundt
  Cc: Thomas Gleixner, Peter Zijlstra, Linus Walleij, Ingo Molnar,
	Andrew Victor, Haavard Skinnemoen, Andrew Morton, linux-kernel,
	linux-sh, linux-arm-kernel, John Stultz

On Wed, 2009-05-27 at 08:08 +0900, Paul Mundt wrote:
> On Tue, May 26, 2009 at 10:17:02PM +0200, Thomas Gleixner wrote:
> > On Tue, 26 May 2009, Peter Zijlstra wrote:
> > > On Tue, 2009-05-26 at 16:31 +0200, Linus Walleij wrote:
> > > > The definition of "rating" from the kerneldoc does not
> > > > seem to imply that, it's a subjective measure AFAICT.
> > 
> >   Right, there is no rating threshold defined, which allows to deduce
> >   that. The TSC on x86 which might be unreliable, but usable as
> >   sched_clock has an initial rating of 300 which can be changed later
> >   on to 0 when the TSC is unusable as a time of day source. In that
> >   case clock is replaced by HPET which has a rating > 100 but is
> >   definitely not a good choice for sched_clock
> > 
> > > > Else you might want an additional criteria, like
> > > > cyc2ns(1) (much less than) jiffies_to_usecs(1)*1000
> > > > (however you do that the best way)
> > > > so you don't pick something
> > > > that isn't substantially faster than the jiffy counter atleast?
> > 
> >   What we can do is add another flag to the clocksource e.g.
> >   CLOCK_SOURCE_USE_FOR_SCHED_CLOCK and check this instead of the
> >   rating.
> > 
> Ok, so based on this and John's locking concerns, how about something
> like this? It doesn't handle the wrapping cases, but I wonder if we
> really want to add that amount of logic to sched_clock() in the first
> place. Clocksources that wrap frequently could either leave the flag
> unset, or do something similar to the TSC code where the cyc2ns shift is
> used. If this is something we want to handle generically, then I'll have
> a go at generalizing the TSC cyc2ns scaling bits for the next spin.


Yea. So this is a little better. There's still a few other issues to
consider:

1) What if a clocksource is registered that has the _SCHED_CLOCK bit
set, but is not selected for timekeeping due it being unstable like the
TSC?

2) Conditionally returning jiffies if the lock is held seems troubling.
Might get some crazy values that way.

thanks
-john


> ---
> 
>  include/linux/clocksource.h |    2 ++
>  kernel/sched_clock.c        |   22 ++++++++++++++++++++++
>  2 files changed, 24 insertions(+)
> 
> diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
> index c56457c..cfd873e 100644
> --- a/include/linux/clocksource.h
> +++ b/include/linux/clocksource.h
> @@ -203,6 +203,7 @@ struct clocksource {
>  };
> 
>  extern struct clocksource *clock;	/* current clocksource */
> +extern spinlock_t clocksource_lock;
> 
>  /*
>   * Clock source flags bits::
> @@ -212,6 +213,7 @@ extern struct clocksource *clock;	/* current clocksource */
> 
>  #define CLOCK_SOURCE_WATCHDOG			0x10
>  #define CLOCK_SOURCE_VALID_FOR_HRES		0x20
> +#define CLOCK_SOURCE_USE_FOR_SCHED_CLOCK	0x40
> 
>  /* simplify initialization of mask field */
>  #define CLOCKSOURCE_MASK(bits) (cycle_t)((bits) < 64 ? ((1ULL<<(bits))-1) : -1)
> diff --git a/kernel/sched_clock.c b/kernel/sched_clock.c
> index e1d16c9..c7027cd 100644
> --- a/kernel/sched_clock.c
> +++ b/kernel/sched_clock.c
> @@ -30,6 +30,7 @@
>  #include <linux/percpu.h>
>  #include <linux/ktime.h>
>  #include <linux/sched.h>
> +#include <linux/clocksource.h>
> 
>  /*
>   * Scheduler clock - returns current time in nanosec units.
> @@ -38,6 +39,27 @@
>   */
>  unsigned long long __attribute__((weak)) sched_clock(void)
>  {
> +	/*
> +	 * Use the current clocksource when it becomes available later in
> +	 * the boot process. As this needs to be fast, we only make a
> +	 * single pass at grabbing the spinlock. If the clock is changing
> +	 * out from underneath us, fall back on jiffies and try it again
> +	 * the next time around.
> +	 */
> +	if (clock && _raw_spin_trylock(&clocksource_lock)) {
> +		/*
> +		 * Only use clocksources suitable for sched_clock()
> +		 */
> +		if (clock->flags & CLOCK_SOURCE_USE_FOR_SCHED_CLOCK) {
> +			cycle_t now = cyc2ns(clock, clocksource_read(clock));
> +			_raw_spin_unlock(&clocksource_lock);
> +			return now;
> +		}
> +
> +		_raw_spin_unlock(&clocksource_lock);
> +	}
> +
> +	/* If all else fails, fall back on jiffies */
>  	return (unsigned long long)(jiffies - INITIAL_JIFFIES)
>  					* (NSEC_PER_SEC / HZ);
>  }


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-26 23:25         ` john stultz
@ 2009-05-26 23:44           ` Paul Mundt
  2009-05-27  0:18             ` Thomas Gleixner
  2009-05-27  0:22             ` john stultz
  0 siblings, 2 replies; 57+ messages in thread
From: Paul Mundt @ 2009-05-26 23:44 UTC (permalink / raw)
  To: john stultz
  Cc: Thomas Gleixner, Peter Zijlstra, Linus Walleij, Ingo Molnar,
	Andrew Victor, Haavard Skinnemoen, Andrew Morton, linux-kernel,
	linux-sh, linux-arm-kernel, John Stultz

On Tue, May 26, 2009 at 04:25:03PM -0700, john stultz wrote:
> On Wed, 2009-05-27 at 08:08 +0900, Paul Mundt wrote:
> > On Tue, May 26, 2009 at 10:17:02PM +0200, Thomas Gleixner wrote:
> > > On Tue, 26 May 2009, Peter Zijlstra wrote:
> > > > On Tue, 2009-05-26 at 16:31 +0200, Linus Walleij wrote:
> > > > > The definition of "rating" from the kerneldoc does not
> > > > > seem to imply that, it's a subjective measure AFAICT.
> > > 
> > >   Right, there is no rating threshold defined, which allows to deduce
> > >   that. The TSC on x86 which might be unreliable, but usable as
> > >   sched_clock has an initial rating of 300 which can be changed later
> > >   on to 0 when the TSC is unusable as a time of day source. In that
> > >   case clock is replaced by HPET which has a rating > 100 but is
> > >   definitely not a good choice for sched_clock
> > > 
> > > > > Else you might want an additional criteria, like
> > > > > cyc2ns(1) (much less than) jiffies_to_usecs(1)*1000
> > > > > (however you do that the best way)
> > > > > so you don't pick something
> > > > > that isn't substantially faster than the jiffy counter atleast?
> > > 
> > >   What we can do is add another flag to the clocksource e.g.
> > >   CLOCK_SOURCE_USE_FOR_SCHED_CLOCK and check this instead of the
> > >   rating.
> > > 
> > Ok, so based on this and John's locking concerns, how about something
> > like this? It doesn't handle the wrapping cases, but I wonder if we
> > really want to add that amount of logic to sched_clock() in the first
> > place. Clocksources that wrap frequently could either leave the flag
> > unset, or do something similar to the TSC code where the cyc2ns shift is
> > used. If this is something we want to handle generically, then I'll have
> > a go at generalizing the TSC cyc2ns scaling bits for the next spin.
> 
> 
> Yea. So this is a little better. There's still a few other issues to
> consider:
> 
> 1) What if a clocksource is registered that has the _SCHED_CLOCK bit
> set, but is not selected for timekeeping due it being unstable like the
> TSC?
> 
See, this is what I thought the rating information was useful for, as the
rating is subsequently dropped if it is not usable. But perhaps it makes
more sense to just clear the bit at the same time that the rating is
lowered once it turns out to be unstable.

> 2) Conditionally returning jiffies if the lock is held seems troubling.
> Might get some crazy values that way.
> 
What would you recommend instead? We do not want to spin here, and if we
are in the middle of changing clocksources and returning jiffies anyways,
then this same issue pops up in the current sched_clock() implementation
regardless of whether we are testing for lock contention or not.
Likewise, even if we were to spin, the same situation exists if the new
clocksource does not have the _SCHED_CLOCK bit set and we have to fall
back on jiffies anyways, doesn't it?

Put another way, and unless I'm missing something obvious, if we ignore
my changes to sched_clock(), how is your concern not applicable to case
where we are changing clocksources and using generic sched_clock() as it
is today?

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-26 23:44           ` Paul Mundt
@ 2009-05-27  0:18             ` Thomas Gleixner
  2009-05-27  0:22             ` john stultz
  1 sibling, 0 replies; 57+ messages in thread
From: Thomas Gleixner @ 2009-05-27  0:18 UTC (permalink / raw)
  To: Paul Mundt
  Cc: john stultz, Peter Zijlstra, Linus Walleij, Ingo Molnar,
	Andrew Victor, Haavard Skinnemoen, Andrew Morton, linux-kernel,
	linux-sh, linux-arm-kernel, John Stultz

On Wed, 27 May 2009, Paul Mundt wrote:
> > Yea. So this is a little better. There's still a few other issues to
> > consider:
> > 
> > 1) What if a clocksource is registered that has the _SCHED_CLOCK bit
> > set, but is not selected for timekeeping due it being unstable like the
> > TSC?
> > 
> See, this is what I thought the rating information was useful for, as the
> rating is subsequently dropped if it is not usable. But perhaps it makes
> more sense to just clear the bit at the same time that the rating is
> lowered once it turns out to be unstable.

Stop worrying about TSC please. The x86 f*cked up timers need special
handling which is definitely not required for most of arch/*. x86
overrides that anyway and handles the TSC f*ckup in the
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK section of sched_clock.c which is
completely irrelevant to any architecture which has a sane set of
timers.

The only extra magic which is required to avoid that (sub)arch
maintainers need to specify a sched_clock() implementation to override
the weak generic one is really the simple

    if (clock && (clock->flags & CLOCKSOURCE_USE_FOR_SCHED_CLOCK))
       return ....

We need no locking there at all.

We have a workaround in place, which overrides the weak sched_clock()
implementation, to make x86 efficient, so why do we want to impose all
that x86 crap on folks which deal with architectures which got the
timers right ?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-26 23:44           ` Paul Mundt
  2009-05-27  0:18             ` Thomas Gleixner
@ 2009-05-27  0:22             ` john stultz
  2009-05-27  0:26               ` Paul Mundt
  2009-05-27  0:27               ` Thomas Gleixner
  1 sibling, 2 replies; 57+ messages in thread
From: john stultz @ 2009-05-27  0:22 UTC (permalink / raw)
  To: Paul Mundt
  Cc: Thomas Gleixner, Peter Zijlstra, Linus Walleij, Ingo Molnar,
	Andrew Victor, Haavard Skinnemoen, Andrew Morton, linux-kernel,
	linux-sh, linux-arm-kernel

On Wed, 2009-05-27 at 08:44 +0900, Paul Mundt wrote:
> On Tue, May 26, 2009 at 04:25:03PM -0700, john stultz wrote:
> > On Wed, 2009-05-27 at 08:08 +0900, Paul Mundt wrote:
> > > On Tue, May 26, 2009 at 10:17:02PM +0200, Thomas Gleixner wrote:
> > > > On Tue, 26 May 2009, Peter Zijlstra wrote:
> > > > > On Tue, 2009-05-26 at 16:31 +0200, Linus Walleij wrote:
> > > > > > The definition of "rating" from the kerneldoc does not
> > > > > > seem to imply that, it's a subjective measure AFAICT.
> > > > 
> > > >   Right, there is no rating threshold defined, which allows to deduce
> > > >   that. The TSC on x86 which might be unreliable, but usable as
> > > >   sched_clock has an initial rating of 300 which can be changed later
> > > >   on to 0 when the TSC is unusable as a time of day source. In that
> > > >   case clock is replaced by HPET which has a rating > 100 but is
> > > >   definitely not a good choice for sched_clock
> > > > 
> > > > > > Else you might want an additional criteria, like
> > > > > > cyc2ns(1) (much less than) jiffies_to_usecs(1)*1000
> > > > > > (however you do that the best way)
> > > > > > so you don't pick something
> > > > > > that isn't substantially faster than the jiffy counter atleast?
> > > > 
> > > >   What we can do is add another flag to the clocksource e.g.
> > > >   CLOCK_SOURCE_USE_FOR_SCHED_CLOCK and check this instead of the
> > > >   rating.
> > > > 
> > > Ok, so based on this and John's locking concerns, how about something
> > > like this? It doesn't handle the wrapping cases, but I wonder if we
> > > really want to add that amount of logic to sched_clock() in the first
> > > place. Clocksources that wrap frequently could either leave the flag
> > > unset, or do something similar to the TSC code where the cyc2ns shift is
> > > used. If this is something we want to handle generically, then I'll have
> > > a go at generalizing the TSC cyc2ns scaling bits for the next spin.
> > 
> > 
> > Yea. So this is a little better. There's still a few other issues to
> > consider:
> > 
> > 1) What if a clocksource is registered that has the _SCHED_CLOCK bit
> > set, but is not selected for timekeeping due it being unstable like the
> > TSC?
> > 
> See, this is what I thought the rating information was useful for, as the
> rating is subsequently dropped if it is not usable. But perhaps it makes
> more sense to just clear the bit at the same time that the rating is
> lowered once it turns out to be unstable.

Yes, if we're dropping a clocksource we should also drop the bit. That
shouldn't be a problem.

The point I was making, is that multiple clocksources may be registered
at one time (TSC, ACPI_PM, etc). But only one is being managed by the
timekeeping code (clock). So there may be the case where the
sched_clock() is different then the timekeeping clock (which is common
on x86). 

So I suspect we need a special hook that grabs the best _SCHED_CLOCK
clocksource (as computed at clocksource registration time) and provides
it to the generic sched_clock() interface.


> > 2) Conditionally returning jiffies if the lock is held seems troubling.
> > Might get some crazy values that way.
> > 
> What would you recommend instead? We do not want to spin here, and if we
> are in the middle of changing clocksources and returning jiffies anyways,
> then this same issue pops up in the current sched_clock() implementation
> regardless of whether we are testing for lock contention or not.
> Likewise, even if we were to spin, the same situation exists if the new
> clocksource does not have the _SCHED_CLOCK bit set and we have to fall
> back on jiffies anyways, doesn't it?
> 
> Put another way, and unless I'm missing something obvious, if we ignore
> my changes to sched_clock(), how is your concern not applicable to case
> where we are changing clocksources and using generic sched_clock() as it
> is today?

Well, Thomas' point that locking isn't necessary, as sched_clock()
doesn't have to be correct, is probably right. 

So, I think a get_sched_clocksource() interface would be ideal (if we
want to get academic at a later date, the pointer could be atomically
updated, and we'd keep it valid for some time via an rcu like method).

Additionally, you can set the jiffies clocksource as a _SCHED_CLOCK
clocksource and drop the jiffies fallback code completely.

thanks
-john



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-27  0:22             ` john stultz
@ 2009-05-27  0:26               ` Paul Mundt
  2009-05-27  1:09                 ` john stultz
  2009-05-27  0:27               ` Thomas Gleixner
  1 sibling, 1 reply; 57+ messages in thread
From: Paul Mundt @ 2009-05-27  0:26 UTC (permalink / raw)
  To: john stultz
  Cc: Thomas Gleixner, Peter Zijlstra, Linus Walleij, Ingo Molnar,
	Andrew Victor, Haavard Skinnemoen, Andrew Morton, linux-kernel,
	linux-sh, linux-arm-kernel

On Tue, May 26, 2009 at 05:22:10PM -0700, john stultz wrote:
> On Wed, 2009-05-27 at 08:44 +0900, Paul Mundt wrote:
> > What would you recommend instead? We do not want to spin here, and if we
> > are in the middle of changing clocksources and returning jiffies anyways,
> > then this same issue pops up in the current sched_clock() implementation
> > regardless of whether we are testing for lock contention or not.
> > Likewise, even if we were to spin, the same situation exists if the new
> > clocksource does not have the _SCHED_CLOCK bit set and we have to fall
> > back on jiffies anyways, doesn't it?
> > 
> > Put another way, and unless I'm missing something obvious, if we ignore
> > my changes to sched_clock(), how is your concern not applicable to case
> > where we are changing clocksources and using generic sched_clock() as it
> > is today?
> 
> Well, Thomas' point that locking isn't necessary, as sched_clock()
> doesn't have to be correct, is probably right. 
> 
> So, I think a get_sched_clocksource() interface would be ideal (if we
> want to get academic at a later date, the pointer could be atomically
> updated, and we'd keep it valid for some time via an rcu like method).
> 
> Additionally, you can set the jiffies clocksource as a _SCHED_CLOCK
> clocksource and drop the jiffies fallback code completely.
> 
I thought about that initially as well, but in the case of the jiffies
clocksource, that won't handle INITIAL_JIFFIES, which we want to subtract
to make printk times sane.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-27  0:26               ` Paul Mundt
@ 2009-05-27  1:09                 ` john stultz
  0 siblings, 0 replies; 57+ messages in thread
From: john stultz @ 2009-05-27  1:09 UTC (permalink / raw)
  To: Paul Mundt
  Cc: Thomas Gleixner, Peter Zijlstra, Linus Walleij, Ingo Molnar,
	Andrew Victor, Haavard Skinnemoen, Andrew Morton, linux-kernel,
	linux-sh, linux-arm-kernel

On Wed, 2009-05-27 at 09:26 +0900, Paul Mundt wrote:
> On Tue, May 26, 2009 at 05:22:10PM -0700, john stultz wrote:
> > On Wed, 2009-05-27 at 08:44 +0900, Paul Mundt wrote:
> > > What would you recommend instead? We do not want to spin here, and if we
> > > are in the middle of changing clocksources and returning jiffies anyways,
> > > then this same issue pops up in the current sched_clock() implementation
> > > regardless of whether we are testing for lock contention or not.
> > > Likewise, even if we were to spin, the same situation exists if the new
> > > clocksource does not have the _SCHED_CLOCK bit set and we have to fall
> > > back on jiffies anyways, doesn't it?
> > > 
> > > Put another way, and unless I'm missing something obvious, if we ignore
> > > my changes to sched_clock(), how is your concern not applicable to case
> > > where we are changing clocksources and using generic sched_clock() as it
> > > is today?
> > 
> > Well, Thomas' point that locking isn't necessary, as sched_clock()
> > doesn't have to be correct, is probably right. 
> > 
> > So, I think a get_sched_clocksource() interface would be ideal (if we
> > want to get academic at a later date, the pointer could be atomically
> > updated, and we'd keep it valid for some time via an rcu like method).
> > 
> > Additionally, you can set the jiffies clocksource as a _SCHED_CLOCK
> > clocksource and drop the jiffies fallback code completely.
> > 
> I thought about that initially as well, but in the case of the jiffies
> clocksource, that won't handle INITIAL_JIFFIES, which we want to subtract
> to make printk times sane.

Fair point, but that shouldn't be a big issue, we can fix that in the
jiffies clocksource read() implementation.

thanks
-john



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-27  0:22             ` john stultz
  2009-05-27  0:26               ` Paul Mundt
@ 2009-05-27  0:27               ` Thomas Gleixner
  1 sibling, 0 replies; 57+ messages in thread
From: Thomas Gleixner @ 2009-05-27  0:27 UTC (permalink / raw)
  To: john stultz
  Cc: Paul Mundt, Peter Zijlstra, Linus Walleij, Ingo Molnar,
	Andrew Victor, Haavard Skinnemoen, Andrew Morton, linux-kernel,
	linux-sh, linux-arm-kernel

John,

On Tue, 26 May 2009, john stultz wrote:
> > See, this is what I thought the rating information was useful for, as the
> > rating is subsequently dropped if it is not usable. But perhaps it makes
> > more sense to just clear the bit at the same time that the rating is
> > lowered once it turns out to be unstable.
> 
> Yes, if we're dropping a clocksource we should also drop the bit. That
> shouldn't be a problem.
> 
> The point I was making, is that multiple clocksources may be registered
> at one time (TSC, ACPI_PM, etc). But only one is being managed by the
> timekeeping code (clock). So there may be the case where the
> sched_clock() is different then the timekeeping clock (which is common
> on x86). 
> 
> So I suspect we need a special hook that grabs the best _SCHED_CLOCK
> clocksource (as computed at clocksource registration time) and provides
> it to the generic sched_clock() interface.

this is not about x86 and its inferiour timer hardware
implementation. We talk about sane architectures which do not have
that problems at all. x86 takes a different code path and overrides
the generic weak sched_clock implememtation anyway. So what ?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-26 23:08       ` Paul Mundt
  2009-05-26 23:13         ` Paul Mundt
  2009-05-26 23:25         ` john stultz
@ 2009-05-26 23:49         ` Thomas Gleixner
  2009-05-27  0:15           ` Paul Mundt
  2 siblings, 1 reply; 57+ messages in thread
From: Thomas Gleixner @ 2009-05-26 23:49 UTC (permalink / raw)
  To: Paul Mundt
  Cc: Peter Zijlstra, Linus Walleij, Ingo Molnar, Andrew Victor,
	Haavard Skinnemoen, Andrew Morton, linux-kernel, linux-sh,
	linux-arm-kernel, John Stultz

On Wed, 27 May 2009, Paul Mundt wrote:

> On Tue, May 26, 2009 at 10:17:02PM +0200, Thomas Gleixner wrote:
> > On Tue, 26 May 2009, Peter Zijlstra wrote:
> > > On Tue, 2009-05-26 at 16:31 +0200, Linus Walleij wrote:
> > > > The definition of "rating" from the kerneldoc does not
> > > > seem to imply that, it's a subjective measure AFAICT.
> > 
> >   Right, there is no rating threshold defined, which allows to deduce
> >   that. The TSC on x86 which might be unreliable, but usable as
> >   sched_clock has an initial rating of 300 which can be changed later
> >   on to 0 when the TSC is unusable as a time of day source. In that
> >   case clock is replaced by HPET which has a rating > 100 but is
> >   definitely not a good choice for sched_clock
> > 
> > > > Else you might want an additional criteria, like
> > > > cyc2ns(1) (much less than) jiffies_to_usecs(1)*1000
> > > > (however you do that the best way)
> > > > so you don't pick something
> > > > that isn't substantially faster than the jiffy counter atleast?
> > 
> >   What we can do is add another flag to the clocksource e.g.
> >   CLOCK_SOURCE_USE_FOR_SCHED_CLOCK and check this instead of the
> >   rating.
> > 
> Ok, so based on this and John's locking concerns, how about something
> like this? It doesn't handle the wrapping cases, but I wonder if we
> really want to add that amount of logic to sched_clock() in the first
> place. Clocksources that wrap frequently could either leave the flag
> unset, or do something similar to the TSC code where the cyc2ns shift is
> used. If this is something we want to handle generically, then I'll have
> a go at generalizing the TSC cyc2ns scaling bits for the next spin.

Gah. There is no locking issue. As Peter explained before the
scheduler code can cope with some inaccurate value.

The wrap issue is completly academic. If the current clock source has
a wrap issue then it needs to be addressed anyway by frequent enough
wakeups to assure correctness of timekeeping and that makes it
suitable for the sched clock domain as well. Also the scheduler can
not hit a value which has not gone through the irq_enter() based
update after a long idle sleep.

So changing your previous patch from

   if (clock && clock->rating > 100)

to

   if (clock && (clock->flags & CLOCK_SOURCE_USE_FOR_SCHED_CLOCK))

is sufficient.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-26 23:49         ` Thomas Gleixner
@ 2009-05-27  0:15           ` Paul Mundt
  2009-05-27 16:25             ` Daniel Walker
  0 siblings, 1 reply; 57+ messages in thread
From: Paul Mundt @ 2009-05-27  0:15 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, Linus Walleij, Ingo Molnar, Andrew Victor,
	Haavard Skinnemoen, Andrew Morton, linux-kernel, linux-sh,
	linux-arm-kernel, John Stultz

On Wed, May 27, 2009 at 01:49:33AM +0200, Thomas Gleixner wrote:
> On Wed, 27 May 2009, Paul Mundt wrote:
> > Ok, so based on this and John's locking concerns, how about something
> > like this? It doesn't handle the wrapping cases, but I wonder if we
> > really want to add that amount of logic to sched_clock() in the first
> > place. Clocksources that wrap frequently could either leave the flag
> > unset, or do something similar to the TSC code where the cyc2ns shift is
> > used. If this is something we want to handle generically, then I'll have
> > a go at generalizing the TSC cyc2ns scaling bits for the next spin.
> 
> Gah. There is no locking issue. As Peter explained before the
> scheduler code can cope with some inaccurate value.
> 
> The wrap issue is completly academic. If the current clock source has
> a wrap issue then it needs to be addressed anyway by frequent enough
> wakeups to assure correctness of timekeeping and that makes it
> suitable for the sched clock domain as well. Also the scheduler can
> not hit a value which has not gone through the irq_enter() based
> update after a long idle sleep.
> 
> So changing your previous patch from
> 
>    if (clock && clock->rating > 100)
> 
> to
> 
>    if (clock && (clock->flags & CLOCK_SOURCE_USE_FOR_SCHED_CLOCK))
> 
> is sufficient.
> 
Works for me.. here's v3 with an updated changelog.

--

sched: Support current clocksource handling in fallback sched_clock(), v3.

There are presently a number of issues and limitations with how the
clocksource and sched_clock() interaction works today. Configurations
tend to be grouped in to one of the following:

	- Platform provides a clocksource unsuitable for sched_clock()
	  and prefers to use the generic jiffies-backed implementation.

	- Platform provides its own clocksource and sched_clock() that
	  wraps in to it.

	- Platform uses a generic clocksource (ie, drivers/clocksource/)
	  combined with the generic jiffies-backed sched_clock().

	- Platform supports multiple sched_clock()-capable clocksources.

This patch adds a new CLOCK_SOURCE_USE_FOR_SCHED_CLOCK flag to address
these issues. The first case simply doesn't set the flag at all (or
clears it, if a clocksource is unstable), while the second case is made
redundant for any sched_clock() implementation that just does the
cyc2ns() case, which tends to be the vast majority of the embedded
platforms.

The remaining cases are handled transparently, in that sched_clock() will
always read from whatever the current clocksource is, as long as it is
has the flag set. This permits switching between multiple clocksources
which may or may not support being used by sched_clock(), and while some
inaccuracy may occur in these corner cases, the scheduler can live with
this.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>

---

 include/linux/clocksource.h |    1 +
 kernel/sched_clock.c        |    9 +++++++++
 2 files changed, 10 insertions(+)

diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index c56457c..70d156f 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -212,6 +212,7 @@ extern struct clocksource *clock;	/* current clocksource */
 
 #define CLOCK_SOURCE_WATCHDOG			0x10
 #define CLOCK_SOURCE_VALID_FOR_HRES		0x20
+#define CLOCK_SOURCE_USE_FOR_SCHED_CLOCK	0x40
 
 /* simplify initialization of mask field */
 #define CLOCKSOURCE_MASK(bits) (cycle_t)((bits) < 64 ? ((1ULL<<(bits))-1) : -1)
diff --git a/kernel/sched_clock.c b/kernel/sched_clock.c
index e1d16c9..a0c18da 100644
--- a/kernel/sched_clock.c
+++ b/kernel/sched_clock.c
@@ -30,6 +30,7 @@
 #include <linux/percpu.h>
 #include <linux/ktime.h>
 #include <linux/sched.h>
+#include <linux/clocksource.h>
 
 /*
  * Scheduler clock - returns current time in nanosec units.
@@ -38,6 +39,14 @@
  */
 unsigned long long __attribute__((weak)) sched_clock(void)
 {
+	/*
+	 * Use the current clocksource when it becomes available later in
+	 * the boot process, and ensure that it is usable for sched_clock().
+	 */
+	if (clock && (clock->flags & CLOCK_SOURCE_USE_FOR_SCHED_CLOCK))
+		return cyc2ns(clock, clocksource_read(clock));
+
+	/* Otherwise just fall back on jiffies */
 	return (unsigned long long)(jiffies - INITIAL_JIFFIES)
 					* (NSEC_PER_SEC / HZ);
 }

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-27  0:15           ` Paul Mundt
@ 2009-05-27 16:25             ` Daniel Walker
  2009-05-28  8:44               ` Paul Mundt
  2009-05-28  9:19               ` Paul Mundt
  0 siblings, 2 replies; 57+ messages in thread
From: Daniel Walker @ 2009-05-27 16:25 UTC (permalink / raw)
  To: Paul Mundt
  Cc: Thomas Gleixner, Peter Zijlstra, Linus Walleij, Ingo Molnar,
	Andrew Victor, Haavard Skinnemoen, Andrew Morton, linux-kernel,
	linux-sh, linux-arm-kernel, John Stultz

On Wed, 2009-05-27 at 09:15 +0900, Paul Mundt wrote:

>  unsigned long long __attribute__((weak)) sched_clock(void)
>  {
> +	/*
> +	 * Use the current clocksource when it becomes available later in
> +	 * the boot process, and ensure that it is usable for sched_clock().
> +	 */
> +	if (clock && (clock->flags & CLOCK_SOURCE_USE_FOR_SCHED_CLOCK))
> +		return cyc2ns(clock, clocksource_read(clock));
> +
> +	/* Otherwise just fall back on jiffies */
>  	return (unsigned long long)(jiffies - INITIAL_JIFFIES)

Do we really need all this complexity in a fast path? the jiffies
clocksource is static and always ready. Could we instead remove the
usage of "clock" and create a new pointer called "sched_clocksource" and
initialize it to the jiffies clock, and allow the clocksource management
code to update that new pointer based on the
CLOCK_SOURCE_USE_FOR_SCHED_CLOCK flag when new clocksources are
registered/removed/marked unstable etc..

that would eliminate all the code in sched_clock except one line,

unsigned long long __attribute__((weak)) sched_clock(void)
{
	return cyc2ns(sched_clocksource, clocksource_read(sched_clocksource));
}

Daniel


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-27 16:25             ` Daniel Walker
@ 2009-05-28  8:44               ` Paul Mundt
  2009-05-28  9:19               ` Paul Mundt
  1 sibling, 0 replies; 57+ messages in thread
From: Paul Mundt @ 2009-05-28  8:44 UTC (permalink / raw)
  To: Daniel Walker
  Cc: Thomas Gleixner, Peter Zijlstra, Linus Walleij, Ingo Molnar,
	Andrew Victor, Haavard Skinnemoen, Andrew Morton, linux-kernel,
	linux-sh, linux-arm-kernel, John Stultz

On Wed, May 27, 2009 at 09:25:25AM -0700, Daniel Walker wrote:
> On Wed, 2009-05-27 at 09:15 +0900, Paul Mundt wrote:
> 
> >  unsigned long long __attribute__((weak)) sched_clock(void)
> >  {
> > +	/*
> > +	 * Use the current clocksource when it becomes available later in
> > +	 * the boot process, and ensure that it is usable for sched_clock().
> > +	 */
> > +	if (clock && (clock->flags & CLOCK_SOURCE_USE_FOR_SCHED_CLOCK))
> > +		return cyc2ns(clock, clocksource_read(clock));
> > +
> > +	/* Otherwise just fall back on jiffies */
> >  	return (unsigned long long)(jiffies - INITIAL_JIFFIES)
> 
> Do we really need all this complexity in a fast path? the jiffies
> clocksource is static and always ready. Could we instead remove the
> usage of "clock" and create a new pointer called "sched_clocksource" and
> initialize it to the jiffies clock, and allow the clocksource management
> code to update that new pointer based on the
> CLOCK_SOURCE_USE_FOR_SCHED_CLOCK flag when new clocksources are
> registered/removed/marked unstable etc..
> 
> that would eliminate all the code in sched_clock except one line,
> 
> unsigned long long __attribute__((weak)) sched_clock(void)
> {
> 	return cyc2ns(sched_clocksource, clocksource_read(sched_clocksource));
> }
> 
It would, but then what would that do to the sched_clock() value? It will
reset once a CLOCK_SOURCE_USE_FOR_SCHED_CLOCK clocksource comes along,
which basically means that sched_clock() then rolls back, which is worse
than simply being delayed a bit. So even if we were to use a special
sched_clocksource, it still could not be used until it was set up at
registration time.

Now we could of course add a sched_clocksource assignment in
select_clocksource(), but that's really no different than just testing
'clock' as we do today. All it does is move the flag test in to a
different path.

So, I think the current v3 is still the best solution.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-27 16:25             ` Daniel Walker
  2009-05-28  8:44               ` Paul Mundt
@ 2009-05-28  9:19               ` Paul Mundt
  2009-05-28  9:34                 ` Peter Zijlstra
  1 sibling, 1 reply; 57+ messages in thread
From: Paul Mundt @ 2009-05-28  9:19 UTC (permalink / raw)
  To: Daniel Walker
  Cc: Thomas Gleixner, Peter Zijlstra, Linus Walleij, Ingo Molnar,
	Andrew Victor, Haavard Skinnemoen, Andrew Morton, linux-kernel,
	linux-sh, linux-arm-kernel, John Stultz

On Wed, May 27, 2009 at 09:25:25AM -0700, Daniel Walker wrote:
> On Wed, 2009-05-27 at 09:15 +0900, Paul Mundt wrote:
> 
> >  unsigned long long __attribute__((weak)) sched_clock(void)
> >  {
> > +	/*
> > +	 * Use the current clocksource when it becomes available later in
> > +	 * the boot process, and ensure that it is usable for sched_clock().
> > +	 */
> > +	if (clock && (clock->flags & CLOCK_SOURCE_USE_FOR_SCHED_CLOCK))
> > +		return cyc2ns(clock, clocksource_read(clock));
> > +
> > +	/* Otherwise just fall back on jiffies */
> >  	return (unsigned long long)(jiffies - INITIAL_JIFFIES)
> 
> Do we really need all this complexity in a fast path? the jiffies
> clocksource is static and always ready. Could we instead remove the
> usage of "clock" and create a new pointer called "sched_clocksource" and
> initialize it to the jiffies clock, and allow the clocksource management
> code to update that new pointer based on the
> CLOCK_SOURCE_USE_FOR_SCHED_CLOCK flag when new clocksources are
> registered/removed/marked unstable etc..
> 
> that would eliminate all the code in sched_clock except one line,
> 
> unsigned long long __attribute__((weak)) sched_clock(void)
> {
> 	return cyc2ns(sched_clocksource, clocksource_read(sched_clocksource));
> }
> 
Ok, there were some ordering problems with the early platform code, but
I've played with this a bit more and got it to the point where this now
also works. I can live with this over the v3 version if people prefer
this approach instead.

--

 include/linux/clocksource.h |    4 +++-
 kernel/sched_clock.c        |    4 ++--
 kernel/time/clocksource.c   |    4 ++++
 kernel/time/jiffies.c       |    2 +-
 4 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index c56457c..2109940 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -202,7 +202,8 @@ struct clocksource {
 #endif
 };
 
-extern struct clocksource *clock;	/* current clocksource */
+extern struct clocksource *clock;		/* current clocksource */
+extern struct clocksource *sched_clocksource;	/* sched_clock() clocksource */
 
 /*
  * Clock source flags bits::
@@ -212,6 +213,7 @@ extern struct clocksource *clock;	/* current clocksource */
 
 #define CLOCK_SOURCE_WATCHDOG			0x10
 #define CLOCK_SOURCE_VALID_FOR_HRES		0x20
+#define CLOCK_SOURCE_USE_FOR_SCHED_CLOCK	0x40
 
 /* simplify initialization of mask field */
 #define CLOCKSOURCE_MASK(bits) (cycle_t)((bits) < 64 ? ((1ULL<<(bits))-1) : -1)
diff --git a/kernel/sched_clock.c b/kernel/sched_clock.c
index e1d16c9..c06c285 100644
--- a/kernel/sched_clock.c
+++ b/kernel/sched_clock.c
@@ -30,6 +30,7 @@
 #include <linux/percpu.h>
 #include <linux/ktime.h>
 #include <linux/sched.h>
+#include <linux/clocksource.h>
 
 /*
  * Scheduler clock - returns current time in nanosec units.
@@ -38,8 +39,7 @@
  */
 unsigned long long __attribute__((weak)) sched_clock(void)
 {
-	return (unsigned long long)(jiffies - INITIAL_JIFFIES)
-					* (NSEC_PER_SEC / HZ);
+	return cyc2ns(sched_clocksource, clocksource_read(sched_clocksource));
 }
 
 static __read_mostly int sched_clock_running;
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 80189f6..d148a75 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -109,6 +109,7 @@ EXPORT_SYMBOL(timecounter_cyc2time);
 
 /* XXX - Would like a better way for initializing curr_clocksource */
 extern struct clocksource clocksource_jiffies;
+struct clocksource *sched_clocksource = &clocksource_jiffies;
 
 /*[Clocksource internal variables]---------
  * curr_clocksource:
@@ -362,6 +363,9 @@ static struct clocksource *select_clocksource(void)
 	if (next == curr_clocksource)
 		return NULL;
 
+	if (next->flags & CLOCK_SOURCE_USE_FOR_SCHED_CLOCK)
+		sched_clocksource = next;
+
 	return next;
 }
 
diff --git a/kernel/time/jiffies.c b/kernel/time/jiffies.c
index c3f6c30..727d881 100644
--- a/kernel/time/jiffies.c
+++ b/kernel/time/jiffies.c
@@ -52,7 +52,7 @@
 
 static cycle_t jiffies_read(struct clocksource *cs)
 {
-	return (cycle_t) jiffies;
+	return (cycle_t) (jiffies - INITIAL_JIFFIES);
 }
 
 struct clocksource clocksource_jiffies = {

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-28  9:19               ` Paul Mundt
@ 2009-05-28  9:34                 ` Peter Zijlstra
  2009-05-28 11:09                   ` Paul Mundt
  0 siblings, 1 reply; 57+ messages in thread
From: Peter Zijlstra @ 2009-05-28  9:34 UTC (permalink / raw)
  To: Paul Mundt
  Cc: Daniel Walker, Thomas Gleixner, Linus Walleij, Ingo Molnar,
	Andrew Victor, Haavard Skinnemoen, Andrew Morton, linux-kernel,
	linux-sh, linux-arm-kernel, John Stultz

On Thu, 2009-05-28 at 18:19 +0900, Paul Mundt wrote:

> Ok, there were some ordering problems with the early platform code, but
> I've played with this a bit more and got it to the point where this now
> also works. I can live with this over the v3 version if people prefer
> this approach instead.
> 
> --

> @@ -38,8 +39,7 @@
>   */
>  unsigned long long __attribute__((weak)) sched_clock(void)
>  {
> -	return (unsigned long long)(jiffies - INITIAL_JIFFIES)
> -					* (NSEC_PER_SEC / HZ);
> +	return cyc2ns(sched_clocksource, clocksource_read(sched_clocksource));
>  }
>  
>  static __read_mostly int sched_clock_running;

> @@ -362,6 +363,9 @@ static struct clocksource *select_clocksource(void)
>  	if (next == curr_clocksource)
>  		return NULL;
>  
> +	if (next->flags & CLOCK_SOURCE_USE_FOR_SCHED_CLOCK)
> +		sched_clocksource = next;
> +
>  	return next;
>  }
>  

That's a single assignment, vs two reads on use. Should we be worried
about the SMP race where we observe two different sched_clocksource
pointers on read?


I would suggest we write it as:

u64 __weak sched_clock(void)
{
	struct clocksource *clock = ACCESS_ONCE(sched_clocksource);

	return cyc2ns(clock, clocksource_read(clock));
}

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-28  9:34                 ` Peter Zijlstra
@ 2009-05-28 11:09                   ` Paul Mundt
  2009-05-28 12:22                     ` Thomas Gleixner
  0 siblings, 1 reply; 57+ messages in thread
From: Paul Mundt @ 2009-05-28 11:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Daniel Walker, Thomas Gleixner, Linus Walleij, Ingo Molnar,
	Andrew Victor, Haavard Skinnemoen, Andrew Morton, linux-kernel,
	linux-sh, linux-arm-kernel, John Stultz

On Thu, May 28, 2009 at 11:34:41AM +0200, Peter Zijlstra wrote:
> That's a single assignment, vs two reads on use. Should we be worried
> about the SMP race where we observe two different sched_clocksource
> pointers on read?
> 
> 
> I would suggest we write it as:
> 
> u64 __weak sched_clock(void)
> {
> 	struct clocksource *clock = ACCESS_ONCE(sched_clocksource);
> 
> 	return cyc2ns(clock, clocksource_read(clock));
> }

Here's an updated version, also with an added sanity check in the
unregister path..

--

 include/linux/clocksource.h |    4 +++-
 kernel/sched_clock.c        |    6 ++++--
 kernel/time/clocksource.c   |   13 +++++++++++++
 kernel/time/jiffies.c       |    2 +-
 4 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index c56457c..2109940 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -202,7 +202,8 @@ struct clocksource {
 #endif
 };
 
-extern struct clocksource *clock;	/* current clocksource */
+extern struct clocksource *clock;		/* current clocksource */
+extern struct clocksource *sched_clocksource;	/* sched_clock() clocksource */
 
 /*
  * Clock source flags bits::
@@ -212,6 +213,7 @@ extern struct clocksource *clock;	/* current clocksource */
 
 #define CLOCK_SOURCE_WATCHDOG			0x10
 #define CLOCK_SOURCE_VALID_FOR_HRES		0x20
+#define CLOCK_SOURCE_USE_FOR_SCHED_CLOCK	0x40
 
 /* simplify initialization of mask field */
 #define CLOCKSOURCE_MASK(bits) (cycle_t)((bits) < 64 ? ((1ULL<<(bits))-1) : -1)
diff --git a/kernel/sched_clock.c b/kernel/sched_clock.c
index e1d16c9..b91190e 100644
--- a/kernel/sched_clock.c
+++ b/kernel/sched_clock.c
@@ -30,6 +30,7 @@
 #include <linux/percpu.h>
 #include <linux/ktime.h>
 #include <linux/sched.h>
+#include <linux/clocksource.h>
 
 /*
  * Scheduler clock - returns current time in nanosec units.
@@ -38,8 +39,9 @@
  */
 unsigned long long __attribute__((weak)) sched_clock(void)
 {
-	return (unsigned long long)(jiffies - INITIAL_JIFFIES)
-					* (NSEC_PER_SEC / HZ);
+	struct clocksource *clock = ACCESS_ONCE(sched_clocksource);
+
+	return cyc2ns(clock, clocksource_read(clock));
 }
 
 static __read_mostly int sched_clock_running;
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 80189f6..57b011a 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -109,6 +109,7 @@ EXPORT_SYMBOL(timecounter_cyc2time);
 
 /* XXX - Would like a better way for initializing curr_clocksource */
 extern struct clocksource clocksource_jiffies;
+struct clocksource *sched_clocksource = &clocksource_jiffies;
 
 /*[Clocksource internal variables]---------
  * curr_clocksource:
@@ -362,6 +363,9 @@ static struct clocksource *select_clocksource(void)
 	if (next == curr_clocksource)
 		return NULL;
 
+	if (next->flags & CLOCK_SOURCE_USE_FOR_SCHED_CLOCK)
+		sched_clocksource = next;
+
 	return next;
 }
 
@@ -437,10 +441,19 @@ void clocksource_unregister(struct clocksource *cs)
 	unsigned long flags;
 
 	spin_lock_irqsave(&clocksource_lock, flags);
+
+	if (sched_clocksource == cs) {
+		printk(KERN_WARNING "Refusing to unregister "
+		       "scheduler clocksource %s", cs->name);
+		goto out;
+	}
+
 	list_del(&cs->list);
 	if (clocksource_override == cs)
 		clocksource_override = NULL;
 	next_clocksource = select_clocksource();
+
+out:
 	spin_unlock_irqrestore(&clocksource_lock, flags);
 }
 
diff --git a/kernel/time/jiffies.c b/kernel/time/jiffies.c
index c3f6c30..727d881 100644
--- a/kernel/time/jiffies.c
+++ b/kernel/time/jiffies.c
@@ -52,7 +52,7 @@
 
 static cycle_t jiffies_read(struct clocksource *cs)
 {
-	return (cycle_t) jiffies;
+	return (cycle_t) (jiffies - INITIAL_JIFFIES);
 }
 
 struct clocksource clocksource_jiffies = {

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-28 11:09                   ` Paul Mundt
@ 2009-05-28 12:22                     ` Thomas Gleixner
  2009-05-28 12:40                       ` Peter Zijlstra
  2009-05-28 12:42                       ` Paul Mundt
  0 siblings, 2 replies; 57+ messages in thread
From: Thomas Gleixner @ 2009-05-28 12:22 UTC (permalink / raw)
  To: Paul Mundt
  Cc: Peter Zijlstra, Daniel Walker, Linus Walleij, Ingo Molnar,
	Andrew Victor, Haavard Skinnemoen, Andrew Morton, linux-kernel,
	linux-sh, linux-arm-kernel, John Stultz

On Thu, 28 May 2009, Paul Mundt wrote:
> @@ -437,10 +441,19 @@ void clocksource_unregister(struct clocksource *cs)
>  	unsigned long flags;
>  
>  	spin_lock_irqsave(&clocksource_lock, flags);
> +
> +	if (sched_clocksource == cs) {
> +		printk(KERN_WARNING "Refusing to unregister "
> +		       "scheduler clocksource %s", cs->name);

  Hmm, isn't that dangerous ? The clocksource might be in a module or
  in kmalloced memory which is going to be freed.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-28 12:22                     ` Thomas Gleixner
@ 2009-05-28 12:40                       ` Peter Zijlstra
  2009-05-28 12:42                       ` Paul Mundt
  1 sibling, 0 replies; 57+ messages in thread
From: Peter Zijlstra @ 2009-05-28 12:40 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Paul Mundt, Daniel Walker, Linus Walleij, Ingo Molnar,
	Andrew Victor, Haavard Skinnemoen, Andrew Morton, linux-kernel,
	linux-sh, linux-arm-kernel, John Stultz

On Thu, 2009-05-28 at 14:22 +0200, Thomas Gleixner wrote:
> On Thu, 28 May 2009, Paul Mundt wrote:
> > @@ -437,10 +441,19 @@ void clocksource_unregister(struct clocksource *cs)
> >  	unsigned long flags;
> >  
> >  	spin_lock_irqsave(&clocksource_lock, flags);
> > +
> > +	if (sched_clocksource == cs) {
> > +		printk(KERN_WARNING "Refusing to unregister "
> > +		       "scheduler clocksource %s", cs->name);
> 
>   Hmm, isn't that dangerous ? The clocksource might be in a module or
>   in kmalloced memory which is going to be freed.

One could play dirty tricks with 'leaking' module ref counts and such so
as to avoid that. The alternative is making sched_clock() look something
like:

u64 __weak sched_clock(void)
{
  struct clocksource *clock;
  u64 time;

  rcu_read_lock();
  clock = rcu_dereference(sched_clocksource);
  time = cyc2ns(clock, read_clocksource(clock));
  rcu_read_unlock;

  return time;
}

and make the module unload do a sync_rcu -- but this might be a little
overkill


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-28 12:22                     ` Thomas Gleixner
  2009-05-28 12:40                       ` Peter Zijlstra
@ 2009-05-28 12:42                       ` Paul Mundt
  2009-05-28 12:53                         ` Thomas Gleixner
  2009-05-28 12:59                         ` Peter Zijlstra
  1 sibling, 2 replies; 57+ messages in thread
From: Paul Mundt @ 2009-05-28 12:42 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, Daniel Walker, Linus Walleij, Ingo Molnar,
	Andrew Victor, Haavard Skinnemoen, Andrew Morton, linux-kernel,
	linux-sh, linux-arm-kernel, John Stultz

On Thu, May 28, 2009 at 02:22:17PM +0200, Thomas Gleixner wrote:
> On Thu, 28 May 2009, Paul Mundt wrote:
> > @@ -437,10 +441,19 @@ void clocksource_unregister(struct clocksource *cs)
> >  	unsigned long flags;
> >  
> >  	spin_lock_irqsave(&clocksource_lock, flags);
> > +
> > +	if (sched_clocksource == cs) {
> > +		printk(KERN_WARNING "Refusing to unregister "
> > +		       "scheduler clocksource %s", cs->name);
> 
>   Hmm, isn't that dangerous ? The clocksource might be in a module or
>   in kmalloced memory which is going to be freed.
> 
Perhaps the saner thing to do is see if select_clocksource() manages to
find something suitable, otherwise switch back to jiffies..

--

 include/linux/clocksource.h |    4 +++-
 kernel/sched_clock.c        |    6 ++++--
 kernel/time/clocksource.c   |   14 ++++++++++++++
 kernel/time/jiffies.c       |    2 +-
 4 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index c56457c..2109940 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -202,7 +202,8 @@ struct clocksource {
 #endif
 };
 
-extern struct clocksource *clock;	/* current clocksource */
+extern struct clocksource *clock;		/* current clocksource */
+extern struct clocksource *sched_clocksource;	/* sched_clock() clocksource */
 
 /*
  * Clock source flags bits::
@@ -212,6 +213,7 @@ extern struct clocksource *clock;	/* current clocksource */
 
 #define CLOCK_SOURCE_WATCHDOG			0x10
 #define CLOCK_SOURCE_VALID_FOR_HRES		0x20
+#define CLOCK_SOURCE_USE_FOR_SCHED_CLOCK	0x40
 
 /* simplify initialization of mask field */
 #define CLOCKSOURCE_MASK(bits) (cycle_t)((bits) < 64 ? ((1ULL<<(bits))-1) : -1)
diff --git a/kernel/sched_clock.c b/kernel/sched_clock.c
index e1d16c9..b91190e 100644
--- a/kernel/sched_clock.c
+++ b/kernel/sched_clock.c
@@ -30,6 +30,7 @@
 #include <linux/percpu.h>
 #include <linux/ktime.h>
 #include <linux/sched.h>
+#include <linux/clocksource.h>
 
 /*
  * Scheduler clock - returns current time in nanosec units.
@@ -38,8 +39,9 @@
  */
 unsigned long long __attribute__((weak)) sched_clock(void)
 {
-	return (unsigned long long)(jiffies - INITIAL_JIFFIES)
-					* (NSEC_PER_SEC / HZ);
+	struct clocksource *clock = ACCESS_ONCE(sched_clocksource);
+
+	return cyc2ns(clock, clocksource_read(clock));
 }
 
 static __read_mostly int sched_clock_running;
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 80189f6..19c72d0 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -109,6 +109,7 @@ EXPORT_SYMBOL(timecounter_cyc2time);
 
 /* XXX - Would like a better way for initializing curr_clocksource */
 extern struct clocksource clocksource_jiffies;
+struct clocksource *sched_clocksource = &clocksource_jiffies;
 
 /*[Clocksource internal variables]---------
  * curr_clocksource:
@@ -362,6 +363,9 @@ static struct clocksource *select_clocksource(void)
 	if (next == curr_clocksource)
 		return NULL;
 
+	if (next->flags & CLOCK_SOURCE_USE_FOR_SCHED_CLOCK)
+		sched_clocksource = next;
+
 	return next;
 }
 
@@ -440,7 +444,17 @@ void clocksource_unregister(struct clocksource *cs)
 	list_del(&cs->list);
 	if (clocksource_override == cs)
 		clocksource_override = NULL;
+
 	next_clocksource = select_clocksource();
+
+	/*
+	 * If select_clocksource() fails to find another suitable
+	 * clocksource for sched_clocksource and we are unregistering
+	 * it, switch back to jiffies.
+	 */
+	if (sched_clocksource == cs)
+		sched_clocksource = &clocksource_jiffies;
+
 	spin_unlock_irqrestore(&clocksource_lock, flags);
 }
 
diff --git a/kernel/time/jiffies.c b/kernel/time/jiffies.c
index c3f6c30..727d881 100644
--- a/kernel/time/jiffies.c
+++ b/kernel/time/jiffies.c
@@ -52,7 +52,7 @@
 
 static cycle_t jiffies_read(struct clocksource *cs)
 {
-	return (cycle_t) jiffies;
+	return (cycle_t) (jiffies - INITIAL_JIFFIES);
 }
 
 struct clocksource clocksource_jiffies = {

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-28 12:42                       ` Paul Mundt
@ 2009-05-28 12:53                         ` Thomas Gleixner
  2009-05-28 12:59                         ` Peter Zijlstra
  1 sibling, 0 replies; 57+ messages in thread
From: Thomas Gleixner @ 2009-05-28 12:53 UTC (permalink / raw)
  To: Paul Mundt
  Cc: Peter Zijlstra, Daniel Walker, Linus Walleij, Ingo Molnar,
	Andrew Victor, Haavard Skinnemoen, Andrew Morton, linux-kernel,
	linux-sh, linux-arm-kernel, John Stultz

On Thu, 28 May 2009, Paul Mundt wrote:

> On Thu, May 28, 2009 at 02:22:17PM +0200, Thomas Gleixner wrote:
> > On Thu, 28 May 2009, Paul Mundt wrote:
> > > @@ -437,10 +441,19 @@ void clocksource_unregister(struct clocksource *cs)
> > >  	unsigned long flags;
> > >  
> > >  	spin_lock_irqsave(&clocksource_lock, flags);
> > > +
> > > +	if (sched_clocksource == cs) {
> > > +		printk(KERN_WARNING "Refusing to unregister "
> > > +		       "scheduler clocksource %s", cs->name);
> > 
> >   Hmm, isn't that dangerous ? The clocksource might be in a module or
> >   in kmalloced memory which is going to be freed.
> > 
> Perhaps the saner thing to do is see if select_clocksource() manages to
> find something suitable, otherwise switch back to jiffies..

That makes sense. The scheduler should be able handle that, if not ... :)
 
Thanks,

	tglx

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-28 12:42                       ` Paul Mundt
  2009-05-28 12:53                         ` Thomas Gleixner
@ 2009-05-28 12:59                         ` Peter Zijlstra
  2009-05-28 13:20                           ` Paul Mundt
  2009-05-28 16:13                           ` Daniel Walker
  1 sibling, 2 replies; 57+ messages in thread
From: Peter Zijlstra @ 2009-05-28 12:59 UTC (permalink / raw)
  To: Paul Mundt
  Cc: Thomas Gleixner, Daniel Walker, Linus Walleij, Ingo Molnar,
	Andrew Victor, Haavard Skinnemoen, Andrew Morton, linux-kernel,
	linux-sh, linux-arm-kernel, John Stultz

On Thu, 2009-05-28 at 21:42 +0900, Paul Mundt wrote:

>  unsigned long long __attribute__((weak)) sched_clock(void)
>  {
> -	return (unsigned long long)(jiffies - INITIAL_JIFFIES)
> -					* (NSEC_PER_SEC / HZ);
> +	struct clocksource *clock = ACCESS_ONCE(sched_clocksource);
> +
> +	return cyc2ns(clock, clocksource_read(clock));
>  }
>  

> @@ -440,7 +444,17 @@ void clocksource_unregister(struct clocksource *cs)
>  	list_del(&cs->list);
>  	if (clocksource_override == cs)
>  		clocksource_override = NULL;
> +
>  	next_clocksource = select_clocksource();
> +
> +	/*
> +	 * If select_clocksource() fails to find another suitable
> +	 * clocksource for sched_clocksource and we are unregistering
> +	 * it, switch back to jiffies.
> +	 */
> +	if (sched_clocksource == cs)
> +		sched_clocksource = &clocksource_jiffies;
> +
>  	spin_unlock_irqrestore(&clocksource_lock, flags);
>  }


CPU0                           CPU1

clock = ACCESS_ONCE(sched_clocksource);

                               unload module
                                 clocksource_unregister()
                                   sched_clocksource = jiffies
                                 unmap data/text

cyc2ns(clock, clocksource_read(clock)) <--- fireworks




^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-28 12:59                         ` Peter Zijlstra
@ 2009-05-28 13:20                           ` Paul Mundt
  2009-05-28 16:13                           ` Daniel Walker
  1 sibling, 0 replies; 57+ messages in thread
From: Paul Mundt @ 2009-05-28 13:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Daniel Walker, Linus Walleij, Ingo Molnar,
	Andrew Victor, Haavard Skinnemoen, Andrew Morton, linux-kernel,
	linux-sh, linux-arm-kernel, John Stultz

On Thu, May 28, 2009 at 02:59:30PM +0200, Peter Zijlstra wrote:
> On Thu, 2009-05-28 at 21:42 +0900, Paul Mundt wrote:
> 
> >  unsigned long long __attribute__((weak)) sched_clock(void)
> >  {
> > -	return (unsigned long long)(jiffies - INITIAL_JIFFIES)
> > -					* (NSEC_PER_SEC / HZ);
> > +	struct clocksource *clock = ACCESS_ONCE(sched_clocksource);
> > +
> > +	return cyc2ns(clock, clocksource_read(clock));
> >  }
> >  
> 
> > @@ -440,7 +444,17 @@ void clocksource_unregister(struct clocksource *cs)
> >  	list_del(&cs->list);
> >  	if (clocksource_override == cs)
> >  		clocksource_override = NULL;
> > +
> >  	next_clocksource = select_clocksource();
> > +
> > +	/*
> > +	 * If select_clocksource() fails to find another suitable
> > +	 * clocksource for sched_clocksource and we are unregistering
> > +	 * it, switch back to jiffies.
> > +	 */
> > +	if (sched_clocksource == cs)
> > +		sched_clocksource = &clocksource_jiffies;
> > +
> >  	spin_unlock_irqrestore(&clocksource_lock, flags);
> >  }
> 
> 
> CPU0                           CPU1
> 
> clock = ACCESS_ONCE(sched_clocksource);
> 
>                                unload module
>                                  clocksource_unregister()
>                                    sched_clocksource = jiffies
>                                  unmap data/text
> 
> cyc2ns(clock, clocksource_read(clock)) <--- fireworks
> 

Right, lets try that again..

--

 include/linux/clocksource.h |    4 +++-
 kernel/sched_clock.c        |   13 +++++++++++--
 kernel/time/clocksource.c   |   19 +++++++++++++++++++
 kernel/time/jiffies.c       |    2 +-
 4 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index c56457c..2109940 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -202,7 +202,8 @@ struct clocksource {
 #endif
 };
 
-extern struct clocksource *clock;	/* current clocksource */
+extern struct clocksource *clock;		/* current clocksource */
+extern struct clocksource *sched_clocksource;	/* sched_clock() clocksource */
 
 /*
  * Clock source flags bits::
@@ -212,6 +213,7 @@ extern struct clocksource *clock;	/* current clocksource */
 
 #define CLOCK_SOURCE_WATCHDOG			0x10
 #define CLOCK_SOURCE_VALID_FOR_HRES		0x20
+#define CLOCK_SOURCE_USE_FOR_SCHED_CLOCK	0x40
 
 /* simplify initialization of mask field */
 #define CLOCKSOURCE_MASK(bits) (cycle_t)((bits) < 64 ? ((1ULL<<(bits))-1) : -1)
diff --git a/kernel/sched_clock.c b/kernel/sched_clock.c
index e1d16c9..b51d48d 100644
--- a/kernel/sched_clock.c
+++ b/kernel/sched_clock.c
@@ -30,6 +30,8 @@
 #include <linux/percpu.h>
 #include <linux/ktime.h>
 #include <linux/sched.h>
+#include <linux/clocksource.h>
+#include <linux/rcupdate.h>
 
 /*
  * Scheduler clock - returns current time in nanosec units.
@@ -38,8 +40,15 @@
  */
 unsigned long long __attribute__((weak)) sched_clock(void)
 {
-	return (unsigned long long)(jiffies - INITIAL_JIFFIES)
-					* (NSEC_PER_SEC / HZ);
+	unsigned long long time;
+	struct clocksource *clock;
+
+	rcu_read_lock();
+	clock = rcu_dereference(sched_clocksource);
+	time = cyc2ns(clock, clocksource_read(clock));
+	rcu_read_unlock();
+
+	return time;
 }
 
 static __read_mostly int sched_clock_running;
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 80189f6..3795954 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -25,6 +25,7 @@
  */
 
 #include <linux/clocksource.h>
+#include <linux/rcupdate.h>
 #include <linux/sysdev.h>
 #include <linux/init.h>
 #include <linux/module.h>
@@ -109,6 +110,7 @@ EXPORT_SYMBOL(timecounter_cyc2time);
 
 /* XXX - Would like a better way for initializing curr_clocksource */
 extern struct clocksource clocksource_jiffies;
+struct clocksource *sched_clocksource = &clocksource_jiffies;
 
 /*[Clocksource internal variables]---------
  * curr_clocksource:
@@ -362,6 +364,9 @@ static struct clocksource *select_clocksource(void)
 	if (next == curr_clocksource)
 		return NULL;
 
+	if (next->flags & CLOCK_SOURCE_USE_FOR_SCHED_CLOCK)
+		sched_clocksource = next;
+
 	return next;
 }
 
@@ -440,7 +445,21 @@ void clocksource_unregister(struct clocksource *cs)
 	list_del(&cs->list);
 	if (clocksource_override == cs)
 		clocksource_override = NULL;
+
 	next_clocksource = select_clocksource();
+
+	/*
+	 * If select_clocksource() fails to find another suitable
+	 * clocksource for sched_clocksource and we are unregistering
+	 * it, switch back to jiffies.
+	 */
+	if (sched_clocksource == cs) {
+		rcu_assign_pointer(sched_clocksource, &clocksource_jiffies);
+		spin_unlock_irqrestore(&clocksource_lock, flags);
+		synchronize_rcu();
+		return;
+	}
+
 	spin_unlock_irqrestore(&clocksource_lock, flags);
 }
 
diff --git a/kernel/time/jiffies.c b/kernel/time/jiffies.c
index c3f6c30..727d881 100644
--- a/kernel/time/jiffies.c
+++ b/kernel/time/jiffies.c
@@ -52,7 +52,7 @@
 
 static cycle_t jiffies_read(struct clocksource *cs)
 {
-	return (cycle_t) jiffies;
+	return (cycle_t) (jiffies - INITIAL_JIFFIES);
 }
 
 struct clocksource clocksource_jiffies = {

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-28 12:59                         ` Peter Zijlstra
  2009-05-28 13:20                           ` Paul Mundt
@ 2009-05-28 16:13                           ` Daniel Walker
  2009-05-28 16:32                             ` Peter Zijlstra
  1 sibling, 1 reply; 57+ messages in thread
From: Daniel Walker @ 2009-05-28 16:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul Mundt, Thomas Gleixner, Linus Walleij, Ingo Molnar,
	Andrew Victor, Haavard Skinnemoen, Andrew Morton, linux-kernel,
	linux-sh, linux-arm-kernel, John Stultz

On Thu, 2009-05-28 at 14:59 +0200, Peter Zijlstra wrote:

> 
> 
> CPU0                           CPU1
> 
> clock = ACCESS_ONCE(sched_clocksource);
> 
>                                unload module
>                                  clocksource_unregister()
>                                    sched_clocksource = jiffies
>                                  unmap data/text
> 
> cyc2ns(clock, clocksource_read(clock)) <--- fireworks
> 
> 

Do any module based clocksources even exist right now?
clocksource_unregister only seems to be used 3 times..

Daniel


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-28 16:13                           ` Daniel Walker
@ 2009-05-28 16:32                             ` Peter Zijlstra
  2009-05-28 16:40                               ` Paul Mundt
  0 siblings, 1 reply; 57+ messages in thread
From: Peter Zijlstra @ 2009-05-28 16:32 UTC (permalink / raw)
  To: Daniel Walker
  Cc: Paul Mundt, Thomas Gleixner, Linus Walleij, Ingo Molnar,
	Andrew Victor, Haavard Skinnemoen, Andrew Morton, linux-kernel,
	linux-sh, linux-arm-kernel, John Stultz

On Thu, 2009-05-28 at 09:13 -0700, Daniel Walker wrote:
> On Thu, 2009-05-28 at 14:59 +0200, Peter Zijlstra wrote:
> 
> > 
> > 
> > CPU0                           CPU1
> > 
> > clock = ACCESS_ONCE(sched_clocksource);
> > 
> >                                unload module
> >                                  clocksource_unregister()
> >                                    sched_clocksource = jiffies
> >                                  unmap data/text
> > 
> > cyc2ns(clock, clocksource_read(clock)) <--- fireworks
> > 
> > 
> 
> Do any module based clocksources even exist right now?
> clocksource_unregister only seems to be used 3 times..

Good point, it appears its not even exported.

Thomas mentioned modules, I assumed.


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-28 16:32                             ` Peter Zijlstra
@ 2009-05-28 16:40                               ` Paul Mundt
  2009-05-28 16:52                                 ` Daniel Walker
  2009-05-28 17:07                                 ` John Stultz
  0 siblings, 2 replies; 57+ messages in thread
From: Paul Mundt @ 2009-05-28 16:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Daniel Walker, Thomas Gleixner, Linus Walleij, Ingo Molnar,
	Andrew Victor, Haavard Skinnemoen, Andrew Morton, linux-kernel,
	linux-sh, linux-arm-kernel, John Stultz

On Thu, May 28, 2009 at 06:32:09PM +0200, Peter Zijlstra wrote:
> On Thu, 2009-05-28 at 09:13 -0700, Daniel Walker wrote:
> > On Thu, 2009-05-28 at 14:59 +0200, Peter Zijlstra wrote:
> > > CPU0                           CPU1
> > > 
> > > clock = ACCESS_ONCE(sched_clocksource);
> > > 
> > >                                unload module
> > >                                  clocksource_unregister()
> > >                                    sched_clocksource = jiffies
> > >                                  unmap data/text
> > > 
> > > cyc2ns(clock, clocksource_read(clock)) <--- fireworks
> > > 
> > > 
> > 
> > Do any module based clocksources even exist right now?
> > clocksource_unregister only seems to be used 3 times..
> 
> Good point, it appears its not even exported.
> 
> Thomas mentioned modules, I assumed.
> 
The drivers/clocksource/ drivers in theory could be modular anyways.
Handling this transition properly is at least one less barrier to modular
clocksources, so I think it's progress regardless. I don't remember what
all of the other issues were though, John probably remembers.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-28 16:40                               ` Paul Mundt
@ 2009-05-28 16:52                                 ` Daniel Walker
  2009-05-28 16:58                                   ` Paul Mundt
  2009-05-28 17:00                                   ` Thomas Gleixner
  2009-05-28 17:07                                 ` John Stultz
  1 sibling, 2 replies; 57+ messages in thread
From: Daniel Walker @ 2009-05-28 16:52 UTC (permalink / raw)
  To: Paul Mundt
  Cc: Peter Zijlstra, Thomas Gleixner, Linus Walleij, Ingo Molnar,
	Andrew Victor, Haavard Skinnemoen, Andrew Morton, linux-kernel,
	linux-sh, linux-arm-kernel, John Stultz

On Fri, 2009-05-29 at 01:40 +0900, Paul Mundt wrote:
> On Thu, May 28, 2009 at 06:32:09PM +0200, Peter Zijlstra wrote:
> > On Thu, 2009-05-28 at 09:13 -0700, Daniel Walker wrote:
> > > On Thu, 2009-05-28 at 14:59 +0200, Peter Zijlstra wrote:
> > > > CPU0                           CPU1
> > > > 
> > > > clock = ACCESS_ONCE(sched_clocksource);
> > > > 
> > > >                                unload module
> > > >                                  clocksource_unregister()
> > > >                                    sched_clocksource = jiffies
> > > >                                  unmap data/text
> > > > 
> > > > cyc2ns(clock, clocksource_read(clock)) <--- fireworks
> > > > 
> > > > 
> > > 
> > > Do any module based clocksources even exist right now?
> > > clocksource_unregister only seems to be used 3 times..
> > 
> > Good point, it appears its not even exported.
> > 
> > Thomas mentioned modules, I assumed.
> > 
> The drivers/clocksource/ drivers in theory could be modular anyways.
> Handling this transition properly is at least one less barrier to modular
> clocksources, so I think it's progress regardless. I don't remember what
> all of the other issues were though, John probably remembers.

I don't think it's an important case to consider right now ..
clocksources are usually so integral to the system putting one in a
module seems counterintuitive. 

Daniel


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-28 16:52                                 ` Daniel Walker
@ 2009-05-28 16:58                                   ` Paul Mundt
  2009-05-28 17:38                                     ` Daniel Walker
  2009-05-28 17:00                                   ` Thomas Gleixner
  1 sibling, 1 reply; 57+ messages in thread
From: Paul Mundt @ 2009-05-28 16:58 UTC (permalink / raw)
  To: Daniel Walker
  Cc: Peter Zijlstra, Thomas Gleixner, Linus Walleij, Ingo Molnar,
	Andrew Victor, Haavard Skinnemoen, Andrew Morton, linux-kernel,
	linux-sh, linux-arm-kernel, John Stultz

On Thu, May 28, 2009 at 09:52:27AM -0700, Daniel Walker wrote:
> On Fri, 2009-05-29 at 01:40 +0900, Paul Mundt wrote:
> > On Thu, May 28, 2009 at 06:32:09PM +0200, Peter Zijlstra wrote:
> > > On Thu, 2009-05-28 at 09:13 -0700, Daniel Walker wrote:
> > > > On Thu, 2009-05-28 at 14:59 +0200, Peter Zijlstra wrote:
> > > > > CPU0                           CPU1
> > > > > 
> > > > > clock = ACCESS_ONCE(sched_clocksource);
> > > > > 
> > > > >                                unload module
> > > > >                                  clocksource_unregister()
> > > > >                                    sched_clocksource = jiffies
> > > > >                                  unmap data/text
> > > > > 
> > > > > cyc2ns(clock, clocksource_read(clock)) <--- fireworks
> > > > > 
> > > > > 
> > > > 
> > > > Do any module based clocksources even exist right now?
> > > > clocksource_unregister only seems to be used 3 times..
> > > 
> > > Good point, it appears its not even exported.
> > > 
> > > Thomas mentioned modules, I assumed.
> > > 
> > The drivers/clocksource/ drivers in theory could be modular anyways.
> > Handling this transition properly is at least one less barrier to modular
> > clocksources, so I think it's progress regardless. I don't remember what
> > all of the other issues were though, John probably remembers.
> 
> I don't think it's an important case to consider right now ..
> clocksources are usually so integral to the system putting one in a
> module seems counterintuitive. 
> 
I would not have mentioned it if it weren't something we already had use
cases for. For the SH timers alone we have 3 that can be used as
clocksources in any combination, excluding the differences in timer
channels per block. These tend to have different implications for
performance, power management, etc.

The only reason they are not modular today is because more work needs to
be done to handle clocksources going away, or at least there was the last
time we tried it.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-28 16:58                                   ` Paul Mundt
@ 2009-05-28 17:38                                     ` Daniel Walker
  2009-05-28 17:46                                       ` Thomas Gleixner
  2009-05-28 17:53                                       ` Paul Mundt
  0 siblings, 2 replies; 57+ messages in thread
From: Daniel Walker @ 2009-05-28 17:38 UTC (permalink / raw)
  To: Paul Mundt
  Cc: Peter Zijlstra, Thomas Gleixner, Linus Walleij, Ingo Molnar,
	Andrew Victor, Haavard Skinnemoen, Andrew Morton, linux-kernel,
	linux-sh, linux-arm-kernel, John Stultz

On Fri, 2009-05-29 at 01:58 +0900, Paul Mundt wrote:
> On Thu, May 28, 2009 at 09:52:27AM -0700, Daniel Walker wrote:
> > On Fri, 2009-05-29 at 01:40 +0900, Paul Mundt wrote:
> > > On Thu, May 28, 2009 at 06:32:09PM +0200, Peter Zijlstra wrote:
> > > > On Thu, 2009-05-28 at 09:13 -0700, Daniel Walker wrote:
> > > > > On Thu, 2009-05-28 at 14:59 +0200, Peter Zijlstra wrote:
> > > > > > CPU0                           CPU1
> > > > > > 
> > > > > > clock = ACCESS_ONCE(sched_clocksource);
> > > > > > 
> > > > > >                                unload module
> > > > > >                                  clocksource_unregister()
> > > > > >                                    sched_clocksource = jiffies
> > > > > >                                  unmap data/text
> > > > > > 
> > > > > > cyc2ns(clock, clocksource_read(clock)) <--- fireworks
> > > > > > 
> > > > > > 
> > > > > 
> > > > > Do any module based clocksources even exist right now?
> > > > > clocksource_unregister only seems to be used 3 times..
> > > > 
> > > > Good point, it appears its not even exported.
> > > > 
> > > > Thomas mentioned modules, I assumed.
> > > > 
> > > The drivers/clocksource/ drivers in theory could be modular anyways.
> > > Handling this transition properly is at least one less barrier to modular
> > > clocksources, so I think it's progress regardless. I don't remember what
> > > all of the other issues were though, John probably remembers.
> > 
> > I don't think it's an important case to consider right now ..
> > clocksources are usually so integral to the system putting one in a
> > module seems counterintuitive. 
> > 
> I would not have mentioned it if it weren't something we already had use
> cases for. For the SH timers alone we have 3 that can be used as
> clocksources in any combination, excluding the differences in timer
> channels per block. These tend to have different implications for
> performance, power management, etc.
> 
> The only reason they are not modular today is because more work needs to
> be done to handle clocksources going away, or at least there was the last
> time we tried it.

I don't know the details of SH so I can't speak specifically to that ..
My experience is that usually one clock gets selected as the clocksource
for a given system , and it rarely changes.. We have a sysfs facility to
allow a user to switch clocksources, but I doubt that's used for more
than debugging..

Can you imagine a general case on SH where the users know enough about
the different clocksources that they can switch between them optimally
without an SH expert sitting next to them telling them what to do?

Daniel




^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-28 17:38                                     ` Daniel Walker
@ 2009-05-28 17:46                                       ` Thomas Gleixner
  2009-05-28 17:53                                       ` Paul Mundt
  1 sibling, 0 replies; 57+ messages in thread
From: Thomas Gleixner @ 2009-05-28 17:46 UTC (permalink / raw)
  To: Daniel Walker
  Cc: Paul Mundt, Peter Zijlstra, Linus Walleij, Ingo Molnar,
	Andrew Victor, Haavard Skinnemoen, Andrew Morton, linux-kernel,
	linux-sh, linux-arm-kernel, John Stultz

On Thu, 28 May 2009, Daniel Walker wrote:
> > I would not have mentioned it if it weren't something we already had use
> > cases for. For the SH timers alone we have 3 that can be used as
> > clocksources in any combination, excluding the differences in timer
> > channels per block. These tend to have different implications for
> > performance, power management, etc.
> > 
> > The only reason they are not modular today is because more work needs to
> > be done to handle clocksources going away, or at least there was the last
> > time we tried it.
> 
> I don't know the details of SH so I can't speak specifically to that ..
> My experience is that usually one clock gets selected as the clocksource
> for a given system , and it rarely changes.. We have a sysfs facility to
> allow a user to switch clocksources, but I doubt that's used for more
> than debugging..
> 
> Can you imagine a general case on SH where the users know enough about
> the different clocksources that they can switch between them optimally
> without an SH expert sitting next to them telling them what to do?

Paul explained already that:

> > ...  These tend to have different implications for
> > performance, power management, etc.

So the use case for this is pretty obvious. In operational state you
want something fast and easy to access (which might use more power
e.g. because it runs at a higher clock frequency). When you go into
power saving modes you switch over to a clocksource which might be
slower to access but uses less power.

And there is no user interaction at all, it's selected as part of a
power state transition from the kernel.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-28 17:38                                     ` Daniel Walker
  2009-05-28 17:46                                       ` Thomas Gleixner
@ 2009-05-28 17:53                                       ` Paul Mundt
  2009-05-28 18:10                                         ` Daniel Walker
  1 sibling, 1 reply; 57+ messages in thread
From: Paul Mundt @ 2009-05-28 17:53 UTC (permalink / raw)
  To: Daniel Walker
  Cc: Peter Zijlstra, Thomas Gleixner, Linus Walleij, Ingo Molnar,
	Andrew Victor, Haavard Skinnemoen, Andrew Morton, linux-kernel,
	linux-sh, linux-arm-kernel, John Stultz

On Thu, May 28, 2009 at 10:38:44AM -0700, Daniel Walker wrote:
> On Fri, 2009-05-29 at 01:58 +0900, Paul Mundt wrote:
> > On Thu, May 28, 2009 at 09:52:27AM -0700, Daniel Walker wrote:
> > > I don't think it's an important case to consider right now ..
> > > clocksources are usually so integral to the system putting one in a
> > > module seems counterintuitive. 
> > > 
> > I would not have mentioned it if it weren't something we already had use
> > cases for. For the SH timers alone we have 3 that can be used as
> > clocksources in any combination, excluding the differences in timer
> > channels per block. These tend to have different implications for
> > performance, power management, etc.
> > 
> > The only reason they are not modular today is because more work needs to
> > be done to handle clocksources going away, or at least there was the last
> > time we tried it.
> 
> I don't know the details of SH so I can't speak specifically to that ..
> My experience is that usually one clock gets selected as the clocksource
> for a given system , and it rarely changes.. We have a sysfs facility to
> allow a user to switch clocksources, but I doubt that's used for more
> than debugging..
> 
> Can you imagine a general case on SH where the users know enough about
> the different clocksources that they can switch between them optimally
> without an SH expert sitting next to them telling them what to do?
> 
As I already stated, yes.

We have multiple clock sources for most CPUs. These can be set up in any
sort of configuration, and there are pros and cons to using different
ones. The ones that are available can in turn be cycled between. I don't
know what exactly is difficult to understand about this.

Yes, we want to be able to use modular clocksources. The only reason we
don't right now is because some more preparatory work is needed first.
Any attempt to remove support for modular clocksources means we will just
have to add it in back later.

That and the fact there are already in-tree users using the
unregistration path suggests that there is no benefit in trying to
prevent modular clocksources in the first place. If you have no intention
to use modular clocksources, then don't.

If you have any technical concerns, then raise them, otherwise this is
pointless.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-28 17:53                                       ` Paul Mundt
@ 2009-05-28 18:10                                         ` Daniel Walker
  2009-05-28 18:27                                           ` Paul Mundt
  2009-05-28 18:44                                           ` Thomas Gleixner
  0 siblings, 2 replies; 57+ messages in thread
From: Daniel Walker @ 2009-05-28 18:10 UTC (permalink / raw)
  To: Paul Mundt
  Cc: Peter Zijlstra, Thomas Gleixner, Linus Walleij, Ingo Molnar,
	Andrew Victor, Haavard Skinnemoen, Andrew Morton, linux-kernel,
	linux-sh, linux-arm-kernel, John Stultz

On Fri, 2009-05-29 at 02:53 +0900, Paul Mundt wrote:

> As I already stated, yes.
> 
> We have multiple clock sources for most CPUs. These can be set up in any
> sort of configuration, and there are pros and cons to using different
> ones. The ones that are available can in turn be cycled between. I don't
> know what exactly is difficult to understand about this.

I understand that cpu's can have multiple clocks, that's not a hard concept to grasp.

> Yes, we want to be able to use modular clocksources. The only reason we
> don't right now is because some more preparatory work is needed first.
> Any attempt to remove support for modular clocksources means we will just
> have to add it in back later.

This is what's difficult to understand.. You have multiple clocks ok,
fine.. You have multiple clocks that you want the kernel to switch
between, ok that's fine too.. What's missing is the case where
clocksource modules being loaded/unload via the user becomes a valuable
use case..

If you have a valuable use case for that, fine, I won't stand in the
way ..

Daniel


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-28 18:10                                         ` Daniel Walker
@ 2009-05-28 18:27                                           ` Paul Mundt
  2009-05-28 19:04                                             ` Daniel Walker
  2009-05-28 18:44                                           ` Thomas Gleixner
  1 sibling, 1 reply; 57+ messages in thread
From: Paul Mundt @ 2009-05-28 18:27 UTC (permalink / raw)
  To: Daniel Walker
  Cc: Peter Zijlstra, Thomas Gleixner, Linus Walleij, Ingo Molnar,
	Andrew Victor, Haavard Skinnemoen, Andrew Morton, linux-kernel,
	linux-sh, linux-arm-kernel, John Stultz

On Thu, May 28, 2009 at 11:10:52AM -0700, Daniel Walker wrote:
> On Fri, 2009-05-29 at 02:53 +0900, Paul Mundt wrote:
> > Yes, we want to be able to use modular clocksources. The only reason we
> > don't right now is because some more preparatory work is needed first.
> > Any attempt to remove support for modular clocksources means we will just
> > have to add it in back later.
> 
> This is what's difficult to understand.. You have multiple clocks ok,
> fine.. You have multiple clocks that you want the kernel to switch
> between, ok that's fine too.. What's missing is the case where
> clocksource modules being loaded/unload via the user becomes a valuable
> use case..
> 
> If you have a valuable use case for that, fine, I won't stand in the
> way ..
> 
Ah, ok, I suppose I could have explained that better. There are a couple
of different considerations really. The timer blocks are often in
different clock and power domains, meaning that only certain ones can be
used for wakeups. These tend not to be ideal for general use, and so we
only switch to them when we have to.

To make matters more convoluted, the availability of different timer
channels on different CPUs will depend on current pin state, and more
importantly, what sort of states we are able to achieve. It is not
uncommon to have some of the pins required by these channels locked down
by other drivers during regular operation, which we in turn need to
unload before the pin state can be reconfigured and the timer can
succeed, all which needs to happen before certain power state transitions
can take place. We implement dynamic pinmux for most of the SH CPUs
precisely to deal with these sorts of problems. In the case of some of
the microcontrollers that are timer heavy and low on pincount, it is not
uncommon to have upwards of 16 different functions per pin.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-28 18:27                                           ` Paul Mundt
@ 2009-05-28 19:04                                             ` Daniel Walker
  2009-05-28 19:34                                               ` Paul Mundt
  0 siblings, 1 reply; 57+ messages in thread
From: Daniel Walker @ 2009-05-28 19:04 UTC (permalink / raw)
  To: Paul Mundt
  Cc: Peter Zijlstra, Thomas Gleixner, Linus Walleij, Ingo Molnar,
	Andrew Victor, Haavard Skinnemoen, Andrew Morton, linux-kernel,
	linux-sh, linux-arm-kernel, John Stultz

On Fri, 2009-05-29 at 03:27 +0900, Paul Mundt wrote:

> Ah, ok, I suppose I could have explained that better. There are a couple
> of different considerations really. The timer blocks are often in
> different clock and power domains, meaning that only certain ones can be
> used for wakeups. These tend not to be ideal for general use, and so we
> only switch to them when we have to.
> 
> To make matters more convoluted, the availability of different timer
> channels on different CPUs will depend on current pin state, and more
> importantly, what sort of states we are able to achieve. It is not
> uncommon to have some of the pins required by these channels locked down
> by other drivers during regular operation, which we in turn need to
> unload before the pin state can be reconfigured and the timer can
> succeed, all which needs to happen before certain power state transitions
> can take place. We implement dynamic pinmux for most of the SH CPUs
> precisely to deal with these sorts of problems. In the case of some of
> the microcontrollers that are timer heavy and low on pincount, it is not
> uncommon to have upwards of 16 different functions per pin.

I'm still a little confused how kernel modules fit in here.. Are you
saying a user would unload some certain driver which has a pin locked
down and prevents the clocksource from working. Then the user would load
the clocksource module which would now function, and that all would have
to happen in order to enter a certain power state?

Daniel 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-28 19:04                                             ` Daniel Walker
@ 2009-05-28 19:34                                               ` Paul Mundt
  2009-05-28 19:41                                                 ` Daniel Walker
  0 siblings, 1 reply; 57+ messages in thread
From: Paul Mundt @ 2009-05-28 19:34 UTC (permalink / raw)
  To: Daniel Walker
  Cc: Peter Zijlstra, Thomas Gleixner, Linus Walleij, Ingo Molnar,
	Andrew Victor, Haavard Skinnemoen, Andrew Morton, linux-kernel,
	linux-sh, linux-arm-kernel, John Stultz

On Thu, May 28, 2009 at 12:04:23PM -0700, Daniel Walker wrote:
> On Fri, 2009-05-29 at 03:27 +0900, Paul Mundt wrote:
> 
> > Ah, ok, I suppose I could have explained that better. There are a couple
> > of different considerations really. The timer blocks are often in
> > different clock and power domains, meaning that only certain ones can be
> > used for wakeups. These tend not to be ideal for general use, and so we
> > only switch to them when we have to.
> > 
> > To make matters more convoluted, the availability of different timer
> > channels on different CPUs will depend on current pin state, and more
> > importantly, what sort of states we are able to achieve. It is not
> > uncommon to have some of the pins required by these channels locked down
> > by other drivers during regular operation, which we in turn need to
> > unload before the pin state can be reconfigured and the timer can
> > succeed, all which needs to happen before certain power state transitions
> > can take place. We implement dynamic pinmux for most of the SH CPUs
> > precisely to deal with these sorts of problems. In the case of some of
> > the microcontrollers that are timer heavy and low on pincount, it is not
> > uncommon to have upwards of 16 different functions per pin.
> 
> I'm still a little confused how kernel modules fit in here.. Are you
> saying a user would unload some certain driver which has a pin locked
> down and prevents the clocksource from working. Then the user would load
> the clocksource module which would now function, and that all would have
> to happen in order to enter a certain power state?
> 
Yes.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-28 19:34                                               ` Paul Mundt
@ 2009-05-28 19:41                                                 ` Daniel Walker
  2009-05-28 23:37                                                   ` Paul Mundt
  0 siblings, 1 reply; 57+ messages in thread
From: Daniel Walker @ 2009-05-28 19:41 UTC (permalink / raw)
  To: Paul Mundt
  Cc: Peter Zijlstra, Thomas Gleixner, Linus Walleij, Ingo Molnar,
	Andrew Victor, Haavard Skinnemoen, Andrew Morton, linux-kernel,
	linux-sh, linux-arm-kernel, John Stultz

On Fri, 2009-05-29 at 04:34 +0900, Paul Mundt wrote:
> > I'm still a little confused how kernel modules fit in here.. Are you
> > saying a user would unload some certain driver which has a pin locked
> > down and prevents the clocksource from working. Then the user would load
> > the clocksource module which would now function, and that all would have
> > to happen in order to enter a certain power state?
> > 
> Yes.

I'm assuming this isn't a low power state, this would be something more
like suspend or hibernate right?

Daniel


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-28 19:41                                                 ` Daniel Walker
@ 2009-05-28 23:37                                                   ` Paul Mundt
  0 siblings, 0 replies; 57+ messages in thread
From: Paul Mundt @ 2009-05-28 23:37 UTC (permalink / raw)
  To: Daniel Walker
  Cc: Peter Zijlstra, Thomas Gleixner, Linus Walleij, Ingo Molnar,
	Andrew Victor, Haavard Skinnemoen, Andrew Morton, linux-kernel,
	linux-sh, linux-arm-kernel, John Stultz

On Thu, May 28, 2009 at 12:41:03PM -0700, Daniel Walker wrote:
> On Fri, 2009-05-29 at 04:34 +0900, Paul Mundt wrote:
> > > I'm still a little confused how kernel modules fit in here.. Are you
> > > saying a user would unload some certain driver which has a pin locked
> > > down and prevents the clocksource from working. Then the user would load
> > > the clocksource module which would now function, and that all would have
> > > to happen in order to enter a certain power state?
> > > 
> > Yes.
> 
> I'm assuming this isn't a low power state, this would be something more
> like suspend or hibernate right?
> 
Right, with the amount of hassle involved, it makes no sense for run-time
PM at least. So the suspend/hibernate use cases are mostly where it makes
sense.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-28 18:10                                         ` Daniel Walker
  2009-05-28 18:27                                           ` Paul Mundt
@ 2009-05-28 18:44                                           ` Thomas Gleixner
  1 sibling, 0 replies; 57+ messages in thread
From: Thomas Gleixner @ 2009-05-28 18:44 UTC (permalink / raw)
  To: Daniel Walker
  Cc: Paul Mundt, Peter Zijlstra, Linus Walleij, Ingo Molnar,
	Andrew Victor, Haavard Skinnemoen, Andrew Morton, linux-kernel,
	linux-sh, linux-arm-kernel, John Stultz

On Thu, 28 May 2009, Daniel Walker wrote:
> On Fri, 2009-05-29 at 02:53 +0900, Paul Mundt wrote:
> 
> > As I already stated, yes.
> > 
> > We have multiple clock sources for most CPUs. These can be set up in any
> > sort of configuration, and there are pros and cons to using different
> > ones. The ones that are available can in turn be cycled between. I don't
> > know what exactly is difficult to understand about this.
> 
> I understand that cpu's can have multiple clocks, that's not a hard concept to grasp.
> 
> > Yes, we want to be able to use modular clocksources. The only reason we
> > don't right now is because some more preparatory work is needed first.
> > Any attempt to remove support for modular clocksources means we will just
> > have to add it in back later.
> 
> This is what's difficult to understand.. You have multiple clocks ok,
> fine.. You have multiple clocks that you want the kernel to switch
> between, ok that's fine too.. What's missing is the case where
> clocksource modules being loaded/unload via the user becomes a valuable
> use case..

Did you actually read what people wrote ? This is not about modules or
not, it's about safe switching of clocksources when they are used for
sched_clock as well.
 
> If you have a valuable use case for that, fine, I won't stand in the
> way ..

Sigh,

	tglx

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-28 16:52                                 ` Daniel Walker
  2009-05-28 16:58                                   ` Paul Mundt
@ 2009-05-28 17:00                                   ` Thomas Gleixner
  1 sibling, 0 replies; 57+ messages in thread
From: Thomas Gleixner @ 2009-05-28 17:00 UTC (permalink / raw)
  To: Daniel Walker
  Cc: Paul Mundt, Peter Zijlstra, Linus Walleij, Ingo Molnar,
	Andrew Victor, Haavard Skinnemoen, Andrew Morton, linux-kernel,
	linux-sh, linux-arm-kernel, John Stultz

On Thu, 28 May 2009, Daniel Walker wrote:
> > > > Do any module based clocksources even exist right now?
> > > > clocksource_unregister only seems to be used 3 times..
> > > 
> > > Good point, it appears its not even exported.
> > > 
> > > Thomas mentioned modules, I assumed.
> > > 
> > The drivers/clocksource/ drivers in theory could be modular anyways.
> > Handling this transition properly is at least one less barrier to modular
> > clocksources, so I think it's progress regardless. I don't remember what
> > all of the other issues were though, John probably remembers.
> 
> I don't think it's an important case to consider right now ..
> clocksources are usually so integral to the system putting one in a
> module seems counterintuitive. 

That does not matter at all. If we have an unregister call then we
have to be prepared for 

     - the memory which contained the clocksource is freed 
     - the underlying hardware is going away. 

It does not matter at all if that happens in a compiled in or in a
modular driver.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-28 16:40                               ` Paul Mundt
  2009-05-28 16:52                                 ` Daniel Walker
@ 2009-05-28 17:07                                 ` John Stultz
  1 sibling, 0 replies; 57+ messages in thread
From: John Stultz @ 2009-05-28 17:07 UTC (permalink / raw)
  To: Paul Mundt
  Cc: Peter Zijlstra, Daniel Walker, Thomas Gleixner, Linus Walleij,
	Ingo Molnar, Andrew Victor, Haavard Skinnemoen, Andrew Morton,
	linux-kernel, linux-sh, linux-arm-kernel, John Stultz

On Fri, 2009-05-29 at 01:40 +0900, Paul Mundt wrote:
> On Thu, May 28, 2009 at 06:32:09PM +0200, Peter Zijlstra wrote:
> > On Thu, 2009-05-28 at 09:13 -0700, Daniel Walker wrote:
> > > On Thu, 2009-05-28 at 14:59 +0200, Peter Zijlstra wrote:
> > > > CPU0                           CPU1
> > > > 
> > > > clock = ACCESS_ONCE(sched_clocksource);
> > > > 
> > > >                                unload module
> > > >                                  clocksource_unregister()
> > > >                                    sched_clocksource = jiffies
> > > >                                  unmap data/text
> > > > 
> > > > cyc2ns(clock, clocksource_read(clock)) <--- fireworks
> > > > 
> > > > 
> > > 
> > > Do any module based clocksources even exist right now?
> > > clocksource_unregister only seems to be used 3 times..
> > 
> > Good point, it appears its not even exported.
> > 
> > Thomas mentioned modules, I assumed.
> > 
> The drivers/clocksource/ drivers in theory could be modular anyways.
> Handling this transition properly is at least one less barrier to modular
> clocksources, so I think it's progress regardless. I don't remember what
> all of the other issues were though, John probably remembers.

Yea. Its always been something I'd want to have as a capability, but we
haven't actually had a user of. I still think its something we should
allow, even if there aren't upstream users, as if you only have to
backport a module to an old distro kernel to get some oddball bit of
hardware working, that can be really useful.

So I think the locking fix is the way to go.

thanks
-john




^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-26 14:38   ` Peter Zijlstra
  2009-05-26 20:17     ` Thomas Gleixner
@ 2009-05-26 20:23     ` john stultz
  2009-05-26 20:30       ` Peter Zijlstra
  2009-05-26 20:39       ` Thomas Gleixner
  1 sibling, 2 replies; 57+ messages in thread
From: john stultz @ 2009-05-26 20:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Walleij, Paul Mundt, Ingo Molnar, Andrew Victor,
	Haavard Skinnemoen, Andrew Morton, linux-kernel, linux-sh,
	linux-arm-kernel, John Stultz, Thomas Gleixner

On Tue, 2009-05-26 at 16:38 +0200, Peter Zijlstra wrote:
> Added the generic clock and timer folks to CC.
> 
> On Tue, 2009-05-26 at 16:31 +0200, Linus Walleij wrote:
> > 2009/5/26 Paul Mundt <lethal@linux-sh.org>:
> > 
> > >  */
> > >  unsigned long long __attribute__((weak)) sched_clock(void)
> > >  {
> > > +       /*
> > > +        * Use the current clocksource when it becomes available later in
> > > +        * the boot process, and ensure that it has a high enough rating
> > > +        * to make it suitable for general use.
> > > +        */
> > > +       if (clock && clock->rating >= 100)
> > > +               return cyc2ns(clock, clocksource_read(clock));

I'm not super familiar with the recent sched_clock changes, but how will
this work if the clocksource wraps (ACPI PM wraps every 2-5 seconds).

Also there's no locking here, so the clocksource could change under you.

Further, checking for rating being greater then 100 really doesn't mean
anything. Probably need to check if the clocksource is continuous
instead.

Overall, I'd probably suggest thinking this through a bit more. At some
point doing this right will cause sched_clock() to be basically the same
as ktime_get(). So why not just use that instead of remaking it?

thanks
-john



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-26 20:23     ` john stultz
@ 2009-05-26 20:30       ` Peter Zijlstra
  2009-05-26 20:40         ` john stultz
  2009-05-26 20:39       ` Thomas Gleixner
  1 sibling, 1 reply; 57+ messages in thread
From: Peter Zijlstra @ 2009-05-26 20:30 UTC (permalink / raw)
  To: john stultz
  Cc: Linus Walleij, Paul Mundt, Ingo Molnar, Andrew Victor,
	Haavard Skinnemoen, Andrew Morton, linux-kernel, linux-sh,
	linux-arm-kernel, John Stultz, Thomas Gleixner

On Tue, 2009-05-26 at 13:23 -0700, john stultz wrote:
> Overall, I'd probably suggest thinking this through a bit more. At some
> point doing this right will cause sched_clock() to be basically the same
> as ktime_get(). So why not just use that instead of remaking it?

simply because we don't require the strict global monotonicy for
scheduling as we do from a regular time source (its nice to have
though).

That means that on x86 we can always use TSC for sched_clock(), even
when its quite unsuitable for ktime.




^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-26 20:30       ` Peter Zijlstra
@ 2009-05-26 20:40         ` john stultz
  2009-05-26 20:55           ` Peter Zijlstra
  0 siblings, 1 reply; 57+ messages in thread
From: john stultz @ 2009-05-26 20:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Walleij, Paul Mundt, Ingo Molnar, Andrew Victor,
	Haavard Skinnemoen, Andrew Morton, linux-kernel, linux-sh,
	linux-arm-kernel, John Stultz, Thomas Gleixner

On Tue, 2009-05-26 at 22:30 +0200, Peter Zijlstra wrote:
> On Tue, 2009-05-26 at 13:23 -0700, john stultz wrote:
> > Overall, I'd probably suggest thinking this through a bit more. At some
> > point doing this right will cause sched_clock() to be basically the same
> > as ktime_get(). So why not just use that instead of remaking it?
> 
> simply because we don't require the strict global monotonicy for
> scheduling as we do from a regular time source (its nice to have
> though).
> 
> That means that on x86 we can always use TSC for sched_clock(), even
> when its quite unsuitable for ktime.

Right, but I guess what I'm asking is can this be a bit better defined? 

If we are going to use clocksources (or cyclecounters - an area I need
to clean up soon), it would be good to get an idea of what is expected
of the sched_clock() interface.

So TSC good, HPET bad. Why? Is latency all we care about? How bad would
the TSC have to be before we wouldn't want to use it?

thanks
-john



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-26 20:40         ` john stultz
@ 2009-05-26 20:55           ` Peter Zijlstra
  2009-05-26 23:00             ` john stultz
  0 siblings, 1 reply; 57+ messages in thread
From: Peter Zijlstra @ 2009-05-26 20:55 UTC (permalink / raw)
  To: john stultz
  Cc: Linus Walleij, Paul Mundt, Ingo Molnar, Andrew Victor,
	Haavard Skinnemoen, Andrew Morton, linux-kernel, linux-sh,
	linux-arm-kernel, John Stultz, Thomas Gleixner

On Tue, 2009-05-26 at 13:40 -0700, john stultz wrote:
> On Tue, 2009-05-26 at 22:30 +0200, Peter Zijlstra wrote:
> > On Tue, 2009-05-26 at 13:23 -0700, john stultz wrote:
> > > Overall, I'd probably suggest thinking this through a bit more. At some
> > > point doing this right will cause sched_clock() to be basically the same
> > > as ktime_get(). So why not just use that instead of remaking it?
> > 
> > simply because we don't require the strict global monotonicy for
> > scheduling as we do from a regular time source (its nice to have
> > though).
> > 
> > That means that on x86 we can always use TSC for sched_clock(), even
> > when its quite unsuitable for ktime.
> 
> Right, but I guess what I'm asking is can this be a bit better defined? 
> 
> If we are going to use clocksources (or cyclecounters - an area I need
> to clean up soon), it would be good to get an idea of what is expected
> of the sched_clock() interface.
> 
> So TSC good, HPET bad. Why? 

Because TSC is a few cycles to read, and you can factorize a largish
prime while doing an HPET read :-)

> Is latency all we care about? How bad would
> the TSC have to be before we wouldn't want to use it?

Anything better than jiffies ;-)

For sched_clock() we want something high-res that is monotonic per cpu
and has a bounded drift between cpus in the order of jiffies.

Look at kernel/sched_clock.c for what we do to make really shitty TSC
conform to the above requirements.


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-26 20:55           ` Peter Zijlstra
@ 2009-05-26 23:00             ` john stultz
  2009-05-26 23:24               ` Mangalampalli, JayantX
                                 ` (2 more replies)
  0 siblings, 3 replies; 57+ messages in thread
From: john stultz @ 2009-05-26 23:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Walleij, Paul Mundt, Ingo Molnar, Andrew Victor,
	Haavard Skinnemoen, Andrew Morton, linux-kernel, linux-sh,
	linux-arm-kernel, John Stultz, Thomas Gleixner

On Tue, 2009-05-26 at 22:55 +0200, Peter Zijlstra wrote:
> On Tue, 2009-05-26 at 13:40 -0700, john stultz wrote:
> > On Tue, 2009-05-26 at 22:30 +0200, Peter Zijlstra wrote:
> > > On Tue, 2009-05-26 at 13:23 -0700, john stultz wrote:
> > > > Overall, I'd probably suggest thinking this through a bit more. At some
> > > > point doing this right will cause sched_clock() to be basically the same
> > > > as ktime_get(). So why not just use that instead of remaking it?
> > > 
> > > simply because we don't require the strict global monotonicy for
> > > scheduling as we do from a regular time source (its nice to have
> > > though).
> > > 
> > > That means that on x86 we can always use TSC for sched_clock(), even
> > > when its quite unsuitable for ktime.
> > 
> > Right, but I guess what I'm asking is can this be a bit better defined? 
> > 
> > If we are going to use clocksources (or cyclecounters - an area I need
> > to clean up soon), it would be good to get an idea of what is expected
> > of the sched_clock() interface.
> > 
> > So TSC good, HPET bad. Why? 
> 
> Because TSC is a few cycles to read, and you can factorize a largish
> prime while doing an HPET read :-)
> 
> > Is latency all we care about? How bad would
> > the TSC have to be before we wouldn't want to use it?
> 
> Anything better than jiffies ;-)

Except HPET thought, right? :)

> For sched_clock() we want something high-res that is monotonic per cpu
> and has a bounded drift between cpus in the order of jiffies.
> 
> Look at kernel/sched_clock.c for what we do to make really shitty TSC
> conform to the above requirements.

Sure, I guess what I'm trying to pull out here is that should we try to
create some OK_FOR_SCHED_CLOCK flag for clocksources, and then we try to
make this generic so other arches can add that flag and be done, what is
the guidance we want to give to arch maintainers for setting that flag?

1) Has to be very very fast. Can we put a number on this? 50ns to read?

2) How long does it have to be monotonic for? Is it ok if it wraps every
few seconds?

If get_cycles() || jiffies is what we want, then lets leave it there. I
just want to avoid mixing the clocksource code into the sched clock code
until we really get this sort of definition sorted.

thanks
-john



^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-26 23:00             ` john stultz
@ 2009-05-26 23:24               ` Mangalampalli, JayantX
  2009-05-27  0:04                 ` Thomas Gleixner
  2009-05-26 23:39               ` Thomas Gleixner
  2009-05-27  6:58               ` Peter Zijlstra
  2 siblings, 1 reply; 57+ messages in thread
From: Mangalampalli, JayantX @ 2009-05-26 23:24 UTC (permalink / raw)
  To: john stultz, Peter Zijlstra
  Cc: Linus Walleij, Paul Mundt, Ingo Molnar, Andrew Victor,
	Haavard Skinnemoen, Andrew Morton, linux-kernel@vger.kernel.org,
	linux-sh@vger.kernel.org, linux-arm-kernel@lists.arm.linux.org.uk,
	John Stultz, Thomas Gleixner

Isn't rdtsc inherently dependent on CPU tick rate, which is somewhat variable? Shouldn't HPET be more reliable? Or can constant_tsc be relied on for important things like scheduler?

Thanks
Jayant

-----Original Message-----
From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of john stultz
Sent: Tuesday, May 26, 2009 4:01 PM
To: Peter Zijlstra
Cc: Linus Walleij; Paul Mundt; Ingo Molnar; Andrew Victor; Haavard Skinnemoen; Andrew Morton; linux-kernel@vger.kernel.org; linux-sh@vger.kernel.org; linux-arm-kernel@lists.arm.linux.org.uk; John Stultz; Thomas Gleixner
Subject: Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().

On Tue, 2009-05-26 at 22:55 +0200, Peter Zijlstra wrote:
> On Tue, 2009-05-26 at 13:40 -0700, john stultz wrote:
> > On Tue, 2009-05-26 at 22:30 +0200, Peter Zijlstra wrote:
> > > On Tue, 2009-05-26 at 13:23 -0700, john stultz wrote:
> > > > Overall, I'd probably suggest thinking this through a bit more. At some
> > > > point doing this right will cause sched_clock() to be basically the same
> > > > as ktime_get(). So why not just use that instead of remaking it?
> > > 
> > > simply because we don't require the strict global monotonicy for
> > > scheduling as we do from a regular time source (its nice to have
> > > though).
> > > 
> > > That means that on x86 we can always use TSC for sched_clock(), even
> > > when its quite unsuitable for ktime.
> > 
> > Right, but I guess what I'm asking is can this be a bit better defined? 
> > 
> > If we are going to use clocksources (or cyclecounters - an area I need
> > to clean up soon), it would be good to get an idea of what is expected
> > of the sched_clock() interface.
> > 
> > So TSC good, HPET bad. Why? 
> 
> Because TSC is a few cycles to read, and you can factorize a largish
> prime while doing an HPET read :-)
> 
> > Is latency all we care about? How bad would
> > the TSC have to be before we wouldn't want to use it?
> 
> Anything better than jiffies ;-)

Except HPET thought, right? :)

> For sched_clock() we want something high-res that is monotonic per cpu
> and has a bounded drift between cpus in the order of jiffies.
> 
> Look at kernel/sched_clock.c for what we do to make really shitty TSC
> conform to the above requirements.

Sure, I guess what I'm trying to pull out here is that should we try to
create some OK_FOR_SCHED_CLOCK flag for clocksources, and then we try to
make this generic so other arches can add that flag and be done, what is
the guidance we want to give to arch maintainers for setting that flag?

1) Has to be very very fast. Can we put a number on this? 50ns to read?

2) How long does it have to be monotonic for? Is it ok if it wraps every
few seconds?

If get_cycles() || jiffies is what we want, then lets leave it there. I
just want to avoid mixing the clocksource code into the sched clock code
until we really get this sort of definition sorted.

thanks
-john


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-26 23:24               ` Mangalampalli, JayantX
@ 2009-05-27  0:04                 ` Thomas Gleixner
  0 siblings, 0 replies; 57+ messages in thread
From: Thomas Gleixner @ 2009-05-27  0:04 UTC (permalink / raw)
  To: Mangalampalli, JayantX
  Cc: john stultz, Peter Zijlstra, Linus Walleij, Paul Mundt,
	Ingo Molnar, Andrew Victor, Haavard Skinnemoen, Andrew Morton,
	linux-kernel@vger.kernel.org, linux-sh@vger.kernel.org,
	linux-arm-kernel@lists.arm.linux.org.uk, John Stultz

On Tue, 26 May 2009, Mangalampalli, JayantX wrote:

1) please do not top post.
2) please fix your mail client to do proper line breaks at ~78 chars

(see http://www.tux.org/lkml/)

> Isn't rdtsc inherently dependent on CPU tick rate, which is somewhat
> variable? Shouldn't HPET be more reliable? Or can constant_tsc be
> relied on for important things like scheduler?

1) Please understand that there is an universe which has useful timer
implementations. i.e. almost everything except arch/x86

2) rdtsc is not inherently dependent on CPU frequency. It just depends
on the CPU frequency for most of the CPU implementations which have
the ability to change the CPU core frequency. Newer CPUs have core
frequency invariant TSC implementations, but they are not perfect yet
as the TSC stops in deeper C-states.

3) HPET is reliable but the hardware access to HPET is way more
expensive than the access to TSC. Also it does not scale on SMP
machines as the HPET access is serialized and worse than a global
lock. So there is a damned good reason why we went there to add the
horror of sched_clock.c to utilize TSC for hot code pathes.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-26 23:00             ` john stultz
  2009-05-26 23:24               ` Mangalampalli, JayantX
@ 2009-05-26 23:39               ` Thomas Gleixner
  2009-05-27  6:58               ` Peter Zijlstra
  2 siblings, 0 replies; 57+ messages in thread
From: Thomas Gleixner @ 2009-05-26 23:39 UTC (permalink / raw)
  To: john stultz
  Cc: Peter Zijlstra, Linus Walleij, Paul Mundt, Ingo Molnar,
	Andrew Victor, Haavard Skinnemoen, Andrew Morton, linux-kernel,
	linux-sh, linux-arm-kernel, John Stultz

On Tue, 26 May 2009, john stultz wrote:
w> > > Is latency all we care about? How bad would
> > > the TSC have to be before we wouldn't want to use it?
> > 
> > Anything better than jiffies ;-)
> 
> Except HPET thought, right? :)

Eeek. HPET access serializes across cores. That's orders of magnitudes
worse than the granularity loss of jiffies.

> > For sched_clock() we want something high-res that is monotonic per cpu
> > and has a bounded drift between cpus in the order of jiffies.
> > 
> > Look at kernel/sched_clock.c for what we do to make really shitty TSC
> > conform to the above requirements.
> 
> Sure, I guess what I'm trying to pull out here is that should we try to
> create some OK_FOR_SCHED_CLOCK flag for clocksources, and then we try to
> make this generic so other arches can add that flag and be done, what is
> the guidance we want to give to arch maintainers for setting that flag?
> 
> 1) Has to be very very fast. Can we put a number on this? 50ns to read?
> 
> 2) How long does it have to be monotonic for? Is it ok if it wraps every
> few seconds?
> 
> If get_cycles() || jiffies is what we want, then lets leave it there. I
> just want to avoid mixing the clocksource code into the sched clock code
> until we really get this sort of definition sorted.

The criterion is simple. If arch maintainer decides that the access to
the particular clock source is the best compromise between granularity
and access speed vs. jiffies then he can express that by setting the
flag.

Where is the difference between that flag and an arch specific
sched_clock() implementation which overrides the weak generic one ?

That arch specific function will probably do the same thing:

     return cyc2ns(whathever_clocksource_the_maintainer_thinks_is_the_best);

So let's get rid of those sched_clock() overrrides which do nothing
else than the generic version.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-26 23:00             ` john stultz
  2009-05-26 23:24               ` Mangalampalli, JayantX
  2009-05-26 23:39               ` Thomas Gleixner
@ 2009-05-27  6:58               ` Peter Zijlstra
  2 siblings, 0 replies; 57+ messages in thread
From: Peter Zijlstra @ 2009-05-27  6:58 UTC (permalink / raw)
  To: john stultz
  Cc: Linus Walleij, Paul Mundt, Ingo Molnar, Andrew Victor,
	Haavard Skinnemoen, Andrew Morton, linux-kernel, linux-sh,
	linux-arm-kernel, John Stultz, Thomas Gleixner

On Tue, 2009-05-26 at 16:00 -0700, john stultz wrote:

> Sure, I guess what I'm trying to pull out here is that should we try to
> create some OK_FOR_SCHED_CLOCK flag for clocksources, and then we try to
> make this generic so other arches can add that flag and be done, what is
> the guidance we want to give to arch maintainers for setting that flag?
> 
> 1) Has to be very very fast. Can we put a number on this? 50ns to read?

I'd express it in cpu cycles, but not sure, the very fastest the arch
has :-) Just tell people this is in their scheduling hot-path and they
might understand.

> 2) How long does it have to be monotonic for? 

Forever? (per cpu)

> Is it ok if it wraps every few seconds?

No, if it wraps it needs to wrap on u64.


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-26 20:23     ` john stultz
  2009-05-26 20:30       ` Peter Zijlstra
@ 2009-05-26 20:39       ` Thomas Gleixner
  1 sibling, 0 replies; 57+ messages in thread
From: Thomas Gleixner @ 2009-05-26 20:39 UTC (permalink / raw)
  To: john stultz
  Cc: Peter Zijlstra, Linus Walleij, Paul Mundt, Ingo Molnar,
	Andrew Victor, Haavard Skinnemoen, Andrew Morton, linux-kernel,
	linux-sh, linux-arm-kernel, John Stultz

On Tue, 26 May 2009, john stultz wrote:

> On Tue, 2009-05-26 at 16:38 +0200, Peter Zijlstra wrote:
> > Added the generic clock and timer folks to CC.
> > 
> > On Tue, 2009-05-26 at 16:31 +0200, Linus Walleij wrote:
> > > 2009/5/26 Paul Mundt <lethal@linux-sh.org>:
> > > 
> > > >  */
> > > >  unsigned long long __attribute__((weak)) sched_clock(void)
> > > >  {
> > > > +       /*
> > > > +        * Use the current clocksource when it becomes available later in
> > > > +        * the boot process, and ensure that it has a high enough rating
> > > > +        * to make it suitable for general use.
> > > > +        */
> > > > +       if (clock && clock->rating >= 100)
> > > > +               return cyc2ns(clock, clocksource_read(clock));
> 
> I'm not super familiar with the recent sched_clock changes, but how will
> this work if the clocksource wraps (ACPI PM wraps every 2-5 seconds).

You don't want to use ACPI PM for sched_clock, never ever.
 
> Also there's no locking here, so the clocksource could change under you.
> 
> Further, checking for rating being greater then 100 really doesn't mean
> anything. Probably need to check if the clocksource is continuous
> instead.

I'd like to have an explicit flag for this, so we can avoid that stuff
like pmtimer and other slow access clock sources are used.
 
> Overall, I'd probably suggest thinking this through a bit more. At some
> point doing this right will cause sched_clock() to be basically the same
> as ktime_get(). So why not just use that instead of remaking it?

ktime_get() involves xtime lock and the scheduler does not care about
a slightly wrong value. sched_clock does not have the accuracy
requirements of time keeping,

If we have an explicit flag we can replace lots of arch/embedded
sched_clock implementations with a generic one which is a Good Thing.

There is a world beside the broken x86 timers :)

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-26 14:31 ` Linus Walleij
  2009-05-26 14:38   ` Peter Zijlstra
@ 2009-05-26 14:43   ` Paul Mundt
  2009-05-26 14:50     ` Peter Zijlstra
  2009-05-26 15:02   ` Matthieu CASTET
  2 siblings, 1 reply; 57+ messages in thread
From: Paul Mundt @ 2009-05-26 14:43 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Ingo Molnar, Andrew Victor, Haavard Skinnemoen, Andrew Morton,
	linux-kernel, linux-sh, peterz, linux-arm-kernel

On Tue, May 26, 2009 at 04:31:58PM +0200, Linus Walleij wrote:
> 2009/5/26 Paul Mundt <lethal@linux-sh.org>:
> 
> >  */
> >  unsigned long long __attribute__((weak)) sched_clock(void)
> >  {
> > +       /*
> > +        * Use the current clocksource when it becomes available later in
> > +        * the boot process, and ensure that it has a high enough rating
> > +        * to make it suitable for general use.
> > +        */
> > +       if (clock && clock->rating >= 100)
> > +               return cyc2ns(clock, clocksource_read(clock));
> > +
> > +       /* Otherwise just fall back on jiffies */
> >        return (unsigned long long)(jiffies - INITIAL_JIFFIES)
> >                                        * (NSEC_PER_SEC / HZ);
> >  }
> 
> This seems like it would make the patch I sent the other day
> unnecessary (subject u300 sched_clock() implementation).
> 
> It would also trim off this solution found in all OMAP platforms in
> arch/arm/plat-omap/common.c
> 
> BUT Peter Zijlstra replied to my question about why this wasn't
> generic with:
> 
Hum, you trimmed out my changelog which explains all the rationale for
precisely why this needs to be generic. Hopefully people that care will
go back and read that.

> [peterz]:
> > But that is the reason this isn't generic, non of the 'stable'
> > clocksources on x86 are fast enough to use as sched_clock.
> 
> Does that mean clock->rating for these clocksources is
> for certain < 100?
> 
The '100' thing is a bit arbitrary, this is what defines base level
usability. If we want to set a mandate that sched_clock() sources need to
start at 300 or 400 or whatever, that is fine with me, too. In the case
of x86 there are several < 100 ratings, but I don't know if those cover
all of the cases Peter is concerned about.

Regardless, the above cyc2ns() logic does make most of the
architecture-specific sched_clock() implementations redundant, so they
can of course be killed off incrementally. This might not be the case for
x86, but in those cases I expect a different sched_clock() to be
implemented anyways.

> Else you might want an additional criteria, like
> cyc2ns(1) (much less than) jiffies_to_usecs(1)*1000
> (however you do that the best way)
> so you don't pick something
> that isn't substantially faster than the jiffy counter atleast?
> 
This rather defeats the purpose of sched_clock() being fast. If we want
to add a flag that means this in to the clocksource instead of consulting
the rating, that is fine with me too. I know which clocksources I prefer
to use for a sched_clock() and they are all better than jiffies. The
semantics of how we tell sched_clock() that are not so important. Rating
seemed like a good choice from the documentation in struct clocksource at
least.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-26 14:43   ` Paul Mundt
@ 2009-05-26 14:50     ` Peter Zijlstra
  2009-05-26 14:53       ` Paul Mundt
  0 siblings, 1 reply; 57+ messages in thread
From: Peter Zijlstra @ 2009-05-26 14:50 UTC (permalink / raw)
  To: Paul Mundt
  Cc: Linus Walleij, Ingo Molnar, Andrew Victor, Haavard Skinnemoen,
	Andrew Morton, linux-kernel, linux-sh, linux-arm-kernel

On Tue, 2009-05-26 at 23:43 +0900, Paul Mundt wrote:
> > Else you might want an additional criteria, like
> > cyc2ns(1) (much less than) jiffies_to_usecs(1)*1000
> > (however you do that the best way)
> > so you don't pick something
> > that isn't substantially faster than the jiffy counter atleast?
> > 
> This rather defeats the purpose of sched_clock() being fast. If we want
> to add a flag that means this in to the clocksource instead of consulting
> the rating, that is fine with me too. I know which clocksources I prefer
> to use for a sched_clock() and they are all better than jiffies. The
> semantics of how we tell sched_clock() that are not so important. Rating
> seemed like a good choice from the documentation in struct clocksource at
> least.

Am I confused or are we talking about fast HZ vs fast cycles?

sched_clock() should be fast cycles, that is, we don't want to read a
clock that takes about 1000 cycles.

sched_clock() is about providing a high resolution clock that is fast
(low cycle count) to acquire, and need not be strictly monotonic on smp.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-26 14:50     ` Peter Zijlstra
@ 2009-05-26 14:53       ` Paul Mundt
  0 siblings, 0 replies; 57+ messages in thread
From: Paul Mundt @ 2009-05-26 14:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Walleij, Ingo Molnar, Andrew Victor, Haavard Skinnemoen,
	Andrew Morton, linux-kernel, linux-sh, linux-arm-kernel

On Tue, May 26, 2009 at 04:50:09PM +0200, Peter Zijlstra wrote:
> On Tue, 2009-05-26 at 23:43 +0900, Paul Mundt wrote:
> > > Else you might want an additional criteria, like
> > > cyc2ns(1) (much less than) jiffies_to_usecs(1)*1000
> > > (however you do that the best way)
> > > so you don't pick something
> > > that isn't substantially faster than the jiffy counter atleast?
> > > 
> > This rather defeats the purpose of sched_clock() being fast. If we want
> > to add a flag that means this in to the clocksource instead of consulting
> > the rating, that is fine with me too. I know which clocksources I prefer
> > to use for a sched_clock() and they are all better than jiffies. The
> > semantics of how we tell sched_clock() that are not so important. Rating
> > seemed like a good choice from the documentation in struct clocksource at
> > least.
> 
> Am I confused or are we talking about fast HZ vs fast cycles?
> 
> sched_clock() should be fast cycles, that is, we don't want to read a
> clock that takes about 1000 cycles.
> 
> sched_clock() is about providing a high resolution clock that is fast
> (low cycle count) to acquire, and need not be strictly monotonic on smp.

I don't think there's any confusion here. My point is that I didn't want
to add too much logic in to sched_clock() given that it is supposed to be
fast. So if the rating test by itself is not sufficient, then we need
another way to flag a clocksource as being usable for sched_clock().

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH] sched: Support current clocksource handling in fallback sched_clock().
  2009-05-26 14:31 ` Linus Walleij
  2009-05-26 14:38   ` Peter Zijlstra
  2009-05-26 14:43   ` Paul Mundt
@ 2009-05-26 15:02   ` Matthieu CASTET
  2 siblings, 0 replies; 57+ messages in thread
From: Matthieu CASTET @ 2009-05-26 15:02 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Paul Mundt, Ingo Molnar, Andrew Victor, Haavard Skinnemoen,
	Andrew Morton, linux-kernel@vger.kernel.org,
	linux-sh@vger.kernel.org, peterz@infradead.org,
	linux-arm-kernel@lists.arm.linux.org.uk

Linus Walleij a écrit :
> 2009/5/26 Paul Mundt <lethal@linux-sh.org>:
> 
>>  */
>>  unsigned long long __attribute__((weak)) sched_clock(void)
>>  {
>> +       /*
>> +        * Use the current clocksource when it becomes available later in
>> +        * the boot process, and ensure that it has a high enough rating
>> +        * to make it suitable for general use.
>> +        */
>> +       if (clock && clock->rating >= 100)
>> +               return cyc2ns(clock, clocksource_read(clock));
>> +
>> +       /* Otherwise just fall back on jiffies */
>>        return (unsigned long long)(jiffies - INITIAL_JIFFIES)
>>                                        * (NSEC_PER_SEC / HZ);
>>  }
This is buggy for fast clocksource that wrap often, because sched_clock
is not supposed to wrap.
That why custom implementation use cnt32_to_63 and a smaller shift
(around 8 -10) for conversion for timer unit to ns.

Look for example to arch/arm/mach-pxa/time.c implementation [1]

But we can image you make it generic.

Matthieu

[1]
/*
 * This is PXA's sched_clock implementation. This has a resolution
 * of at least 308 ns and a maximum value of 208 days.
 *
 * The return value is guaranteed to be monotonic in that range as
 * long as there is always less than 582 seconds between successive
 * calls to sched_clock() which should always be the case in practice.
 */
#define OSCR2NS_SCALE_FACTOR 10

static unsigned long oscr2ns_scale;

static void __init set_oscr2ns_scale(unsigned long oscr_rate)
{
    unsigned long long v = 1000000000ULL << OSCR2NS_SCALE_FACTOR;
    do_div(v, oscr_rate);
    oscr2ns_scale = v;
    /*
     * We want an even value to automatically clear the top bit
     * returned by cnt32_to_63() without an additional run time
     * instruction. So if the LSB is 1 then round it up.
     */
    if (oscr2ns_scale & 1)
        oscr2ns_scale++;
}

unsigned long long sched_clock(void)
{
    unsigned long long v = cnt32_to_63(OSCR);
    return (v * oscr2ns_scale) >> OSCR2NS_SCALE_FACTOR;
}

^ permalink raw reply	[flat|nested] 57+ messages in thread

end of thread, other threads:[~2009-05-28 23:37 UTC | newest]

Thread overview: 57+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-05-26  6:15 [PATCH] sched: Support current clocksource handling in fallback sched_clock() Paul Mundt
2009-05-26 14:31 ` Linus Walleij
2009-05-26 14:38   ` Peter Zijlstra
2009-05-26 20:17     ` Thomas Gleixner
2009-05-26 23:08       ` Paul Mundt
2009-05-26 23:13         ` Paul Mundt
2009-05-26 23:25         ` john stultz
2009-05-26 23:44           ` Paul Mundt
2009-05-27  0:18             ` Thomas Gleixner
2009-05-27  0:22             ` john stultz
2009-05-27  0:26               ` Paul Mundt
2009-05-27  1:09                 ` john stultz
2009-05-27  0:27               ` Thomas Gleixner
2009-05-26 23:49         ` Thomas Gleixner
2009-05-27  0:15           ` Paul Mundt
2009-05-27 16:25             ` Daniel Walker
2009-05-28  8:44               ` Paul Mundt
2009-05-28  9:19               ` Paul Mundt
2009-05-28  9:34                 ` Peter Zijlstra
2009-05-28 11:09                   ` Paul Mundt
2009-05-28 12:22                     ` Thomas Gleixner
2009-05-28 12:40                       ` Peter Zijlstra
2009-05-28 12:42                       ` Paul Mundt
2009-05-28 12:53                         ` Thomas Gleixner
2009-05-28 12:59                         ` Peter Zijlstra
2009-05-28 13:20                           ` Paul Mundt
2009-05-28 16:13                           ` Daniel Walker
2009-05-28 16:32                             ` Peter Zijlstra
2009-05-28 16:40                               ` Paul Mundt
2009-05-28 16:52                                 ` Daniel Walker
2009-05-28 16:58                                   ` Paul Mundt
2009-05-28 17:38                                     ` Daniel Walker
2009-05-28 17:46                                       ` Thomas Gleixner
2009-05-28 17:53                                       ` Paul Mundt
2009-05-28 18:10                                         ` Daniel Walker
2009-05-28 18:27                                           ` Paul Mundt
2009-05-28 19:04                                             ` Daniel Walker
2009-05-28 19:34                                               ` Paul Mundt
2009-05-28 19:41                                                 ` Daniel Walker
2009-05-28 23:37                                                   ` Paul Mundt
2009-05-28 18:44                                           ` Thomas Gleixner
2009-05-28 17:00                                   ` Thomas Gleixner
2009-05-28 17:07                                 ` John Stultz
2009-05-26 20:23     ` john stultz
2009-05-26 20:30       ` Peter Zijlstra
2009-05-26 20:40         ` john stultz
2009-05-26 20:55           ` Peter Zijlstra
2009-05-26 23:00             ` john stultz
2009-05-26 23:24               ` Mangalampalli, JayantX
2009-05-27  0:04                 ` Thomas Gleixner
2009-05-26 23:39               ` Thomas Gleixner
2009-05-27  6:58               ` Peter Zijlstra
2009-05-26 20:39       ` Thomas Gleixner
2009-05-26 14:43   ` Paul Mundt
2009-05-26 14:50     ` Peter Zijlstra
2009-05-26 14:53       ` Paul Mundt
2009-05-26 15:02   ` Matthieu CASTET

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox