* [PATCH] Add support for deferrable timers (respun)
[not found] ` <20070322161355.GA160@tv-sign.ru>
@ 2007-03-27 20:43 ` Venki Pallipadi
2007-03-27 21:11 ` Oleg Nesterov
2007-03-28 11:06 ` [PATCH] Add support for deferrable timers (respun) Andi Kleen
0 siblings, 2 replies; 14+ messages in thread
From: Venki Pallipadi @ 2007-03-27 20:43 UTC (permalink / raw)
To: Oleg Nesterov; +Cc: linux-kernel, akpm, davej, johnstul, mingo, tglx
On Thu, Mar 22, 2007 at 07:13:55PM +0300, Oleg Nesterov wrote:
>
> Perhaps we can do something like this,
>
> struct deferrable_timer {
> struct timer_list timer;
> void (*real_function)(struct deferrable_timer *self);
> };
>
> static void deferrable_timer_fn(unsigned long data)
> {
> struct deferrable_timer dtimer = (void*)data;
> dtimer->real_function(dtimer);
> }
>
> void setup_deferrable_timer(struct deferrable_timer *dtimer,
> void (*function)(struct deferrable_timer *self))
> {
> setup_timer(&dtimer->timer, deferrable_timer_fn, dtimer);
> dtimer->real_function = function;
> }
>
> Now next_timer_interrupt() can check "nte->function == deferrable_timer_fn"
> to detect a deferrable_timer. I can't say I like this idea very much, though.
>
> However, it would be nice to avoid timer->base manipulation and minimize the
> changes in timer.c, if possible.
>
I looked at the above option. It seems more complicated than the patch below,
mainly because now we have to have a new set of interfaces for this
deferrable_timer exported, which will include init_deferrable_timer,
setup_timer, mod_timer etc (in order to do it cleanly) and workqueue needs to
change as well to include this deferrable_timer.
So, that got me back to my original patch. I tried to reduce the amount of
changes and overhead and hopefully I have gooten it right this time.
Patch below.
Thanks,
Venki
---
Introduce a new flag for timers - deferrable:
Timers that work normally when system is busy. But, will not cause CPU to
come out of idle (just to service this timer), when CPU is idle. Instead,
this timer will be serviced when CPU eventually wakes up with a subsequent
non-deferrable timer.
The main advantage of this is to avoid unnecessary timer interrupts when
CPU is idle. If the routine currently called by a timer can wait until next
event without any issues, this new timer can be used to setup timer event
for that routine. This, with dynticks, allows CPUs to be lazy, allowing them
to stay in idle for extended period of time by reducing unnecesary wakeup and
thereby reducing the power consumption.
This patch:
Builds this new timer on top of existing timer infrastructure. It uses
last bit in 'base' pointer of timer_list structure to store this
deferrable timer flag. __next_timer_interrupt() function
skips over these deferrable timers when CPU looks for
next timer event for which it has to wake up.
This is exported by a new interface init_timer_deferrable() that can
be called in place of regular init_timer().
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Index: new/kernel/timer.c
===================================================================
--- new.orig/kernel/timer.c 2007-03-22 16:27:44.000000000 -0800
+++ new/kernel/timer.c 2007-03-26 15:19:35.000000000 -0800
@@ -74,7 +74,7 @@
tvec_t tv3;
tvec_t tv4;
tvec_t tv5;
-} ____cacheline_aligned_in_smp;
+} ____cacheline_aligned;
typedef struct tvec_t_base_s tvec_base_t;
@@ -82,6 +82,41 @@
EXPORT_SYMBOL(boot_tvec_bases);
static DEFINE_PER_CPU(tvec_base_t *, tvec_bases) = &boot_tvec_bases;
+/*
+ * The lowest bit of base ptr in timer is used as a flag to indicate
+ * 'deferrable' nature of the timer. Functions below help us manage that flag.
+ */
+static inline unsigned int tbase_get_deferrable(struct tvec_t_base_s *base)
+{
+ return ((unsigned int)base & TBASE_DEFERRABLE_FLAG);
+}
+
+static inline unsigned int timer_get_deferrable(struct timer_list *timer)
+{
+ return tbase_get_deferrable(timer->base);
+}
+
+static inline struct tvec_t_base_s *timer_get_base(struct timer_list *timer)
+{
+ return ((struct tvec_t_base_s *)((unsigned long)(timer->base) &
+ ~TBASE_DEFERRABLE_FLAG));
+}
+
+static inline void timer_set_deferrable(struct timer_list *timer)
+{
+ timer->base = ((struct tvec_t_base_s *)((unsigned long)(timer->base) |
+ TBASE_DEFERRABLE_FLAG));
+}
+
+/* new_base is guaranteed to have last bit not set, in all callers below */
+static inline void timer_set_base(struct timer_list *timer,
+ struct tvec_t_base_s *old_base,
+ struct tvec_t_base_s *new_base)
+{
+ timer->base = (struct tvec_t_base_s *)((unsigned long)(new_base) |
+ tbase_get_deferrable(old_base));
+}
+
/**
* __round_jiffies - function to round jiffies to a full second
* @j: the time in (absolute) jiffies that should be rounded
@@ -295,6 +330,13 @@
}
EXPORT_SYMBOL(init_timer);
+void fastcall init_timer_deferrable(struct timer_list *timer)
+{
+ init_timer(timer);
+ timer_set_deferrable(timer);
+}
+EXPORT_SYMBOL(init_timer_deferrable);
+
static inline void detach_timer(struct timer_list *timer,
int clear_pending)
{
@@ -325,10 +367,11 @@
tvec_base_t *base;
for (;;) {
- base = timer->base;
+ tvec_base_t *prelock_base = timer->base;
+ base = timer_get_base(timer);
if (likely(base != NULL)) {
spin_lock_irqsave(&base->lock, *flags);
- if (likely(base == timer->base))
+ if (likely(prelock_base == timer->base))
return base;
/* The timer has migrated to another CPU */
spin_unlock_irqrestore(&base->lock, *flags);
@@ -364,12 +407,13 @@
* the timer is serialized wrt itself.
*/
if (likely(base->running_timer != timer)) {
+ tvec_base_t *old_base = timer->base;
/* See the comment in lock_timer_base() */
timer->base = NULL;
spin_unlock(&base->lock);
base = new_base;
spin_lock(&base->lock);
- timer->base = base;
+ timer_set_base(timer, old_base, base);
}
}
@@ -397,7 +441,7 @@
timer_stats_timer_set_start_info(timer);
BUG_ON(timer_pending(timer) || !timer->function);
spin_lock_irqsave(&base->lock, flags);
- timer->base = base;
+ timer_set_base(timer, timer->base, base);
internal_add_timer(base, timer);
spin_unlock_irqrestore(&base->lock, flags);
}
@@ -548,7 +592,7 @@
* don't have to detach them individually.
*/
list_for_each_entry_safe(timer, tmp, &tv_list, entry) {
- BUG_ON(timer->base != base);
+ BUG_ON(timer_get_base(timer) != base);
internal_add_timer(base, timer);
}
@@ -634,6 +678,9 @@
index = slot = timer_jiffies & TVR_MASK;
do {
list_for_each_entry(nte, base->tv1.vec + slot, entry) {
+ if (timer_get_deferrable(nte))
+ continue;
+
found = 1;
expires = nte->expires;
/* Look at the cascade bucket(s)? */
@@ -1602,6 +1649,13 @@
cpu_to_node(cpu));
if (!base)
return -ENOMEM;
+
+ /* Make sure that tvec_base is 2 byte aligned */
+ if (tbase_get_deferrable(base)) {
+ WARN_ON(1);
+ kfree(base);
+ return -ENOMEM;
+ }
memset(base, 0, sizeof(*base));
per_cpu(tvec_bases, cpu) = base;
} else {
@@ -1643,7 +1697,7 @@
while (!list_empty(head)) {
timer = list_entry(head->next, struct timer_list, entry);
detach_timer(timer, 0);
- timer->base = new_base;
+ timer_set_base(timer, timer->base, new_base);
internal_add_timer(new_base, timer);
}
}
Index: new/include/linux/timer.h
===================================================================
--- new.orig/include/linux/timer.h 2007-03-22 16:27:44.000000000 -0800
+++ new/include/linux/timer.h 2007-03-23 10:08:17.000000000 -0800
@@ -8,6 +8,14 @@
struct tvec_t_base_s;
+extern struct tvec_t_base_s boot_tvec_bases;
+/*
+ * Note that all tvec_bases is 2 byte aligned and lower bit of
+ * base in timer_list is guaranteed to be zero. Use the LSB for
+ * the new flag to indicate whether the timer is deferrable
+ */
+#define TBASE_DEFERRABLE_FLAG (0x1)
+
struct timer_list {
struct list_head entry;
unsigned long expires;
@@ -37,6 +45,7 @@
TIMER_INITIALIZER(_function, _expires, _data)
void fastcall init_timer(struct timer_list * timer);
+void fastcall init_timer_deferrable(struct timer_list *timer);
static inline void setup_timer(struct timer_list * timer,
void (*function)(unsigned long),
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] Add support for deferrable timers (respun)
2007-03-27 20:43 ` [PATCH] Add support for deferrable timers (respun) Venki Pallipadi
@ 2007-03-27 21:11 ` Oleg Nesterov
2007-03-27 21:55 ` Venki Pallipadi
2007-03-28 11:06 ` [PATCH] Add support for deferrable timers (respun) Andi Kleen
1 sibling, 1 reply; 14+ messages in thread
From: Oleg Nesterov @ 2007-03-27 21:11 UTC (permalink / raw)
To: Venki Pallipadi; +Cc: linux-kernel, akpm, davej, johnstul, mingo, tglx
On 03/27, Venki Pallipadi wrote:
>
> for (;;) {
> - base = timer->base;
> + tvec_base_t *prelock_base = timer->base;
> + base = timer_get_base(timer);
> if (likely(base != NULL)) {
> spin_lock_irqsave(&base->lock, *flags);
> - if (likely(base == timer->base))
> + if (likely(prelock_base == timer->base))
> return base;
I don't think this is correct, at least in theory.
Suppose that
tvec_base_t *prelock_base = timer->base;
base = timer_get_base(timer);
are re-ordered (the second LOAD happens after the first one), and the timer
changes its base in between. Now, we lock the old base, and return it because
"prelock_base == timer->base" == true.
Oleg.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] Add support for deferrable timers (respun)
2007-03-27 21:11 ` Oleg Nesterov
@ 2007-03-27 21:55 ` Venki Pallipadi
2007-03-27 22:22 ` Oleg Nesterov
0 siblings, 1 reply; 14+ messages in thread
From: Venki Pallipadi @ 2007-03-27 21:55 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Venki Pallipadi, linux-kernel, akpm, davej, johnstul, mingo, tglx
On Wed, Mar 28, 2007 at 01:11:45AM +0400, Oleg Nesterov wrote:
> On 03/27, Venki Pallipadi wrote:
> >
> > for (;;) {
> > - base = timer->base;
> > + tvec_base_t *prelock_base = timer->base;
> > + base = timer_get_base(timer);
> > if (likely(base != NULL)) {
> > spin_lock_irqsave(&base->lock, *flags);
> > - if (likely(base == timer->base))
> > + if (likely(prelock_base == timer->base))
> > return base;
>
> I don't think this is correct, at least in theory.
>
> Suppose that
>
> tvec_base_t *prelock_base = timer->base;
> base = timer_get_base(timer);
>
> are re-ordered (the second LOAD happens after the first one), and the timer
> changes its base in between. Now, we lock the old base, and return it because
> "prelock_base == timer->base" == true.
>
Great catch. Yes. this is a theoritical possibility, even though most compilers
would load base only once and use it for prelock_base and 'and' it for
base. Atleast that is what I see on i386/gcc.
Incremental patch below eliminates this race.
Index: new/kernel/timer.c
===================================================================
--- new.orig/kernel/timer.c 2007-03-26 15:19:35.000000000 -0800
+++ new/kernel/timer.c 2007-03-27 13:00:33.000000000 -0800
@@ -96,9 +96,9 @@
return tbase_get_deferrable(timer->base);
}
-static inline struct tvec_t_base_s *timer_get_base(struct timer_list *timer)
+static inline struct tvec_t_base_s *tbase_get_base(struct tvec_t_base_s *base)
{
- return ((struct tvec_t_base_s *)((unsigned long)(timer->base) &
+ return ((struct tvec_t_base_s *)((unsigned long)base &
~TBASE_DEFERRABLE_FLAG));
}
@@ -368,7 +368,7 @@
for (;;) {
tvec_base_t *prelock_base = timer->base;
- base = timer_get_base(timer);
+ base = tbase_get_base(prelock_base);
if (likely(base != NULL)) {
spin_lock_irqsave(&base->lock, *flags);
if (likely(prelock_base == timer->base))
@@ -592,7 +592,7 @@
* don't have to detach them individually.
*/
list_for_each_entry_safe(timer, tmp, &tv_list, entry) {
- BUG_ON(timer_get_base(timer) != base);
+ BUG_ON(tbase_get_base(timer->base) != base);
internal_add_timer(base, timer);
}
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] Add support for deferrable timers (respun)
2007-03-27 21:55 ` Venki Pallipadi
@ 2007-03-27 22:22 ` Oleg Nesterov
2007-03-27 22:28 ` Oleg Nesterov
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Oleg Nesterov @ 2007-03-27 22:22 UTC (permalink / raw)
To: Venki Pallipadi; +Cc: linux-kernel, akpm, davej, johnstul, mingo, tglx
On 03/27, Venki Pallipadi wrote:
>
> Incremental patch below eliminates this race.
>
> Index: new/kernel/timer.c
> ===================================================================
> --- new.orig/kernel/timer.c 2007-03-26 15:19:35.000000000 -0800
> +++ new/kernel/timer.c 2007-03-27 13:00:33.000000000 -0800
> @@ -96,9 +96,9 @@
> return tbase_get_deferrable(timer->base);
> }
>
> -static inline struct tvec_t_base_s *timer_get_base(struct timer_list *timer)
> +static inline struct tvec_t_base_s *tbase_get_base(struct tvec_t_base_s *base)
> {
> - return ((struct tvec_t_base_s *)((unsigned long)(timer->base) &
> + return ((struct tvec_t_base_s *)((unsigned long)base &
> ~TBASE_DEFERRABLE_FLAG));
> }
>
> @@ -368,7 +368,7 @@
>
> for (;;) {
> tvec_base_t *prelock_base = timer->base;
> - base = timer_get_base(timer);
> + base = tbase_get_base(prelock_base);
> if (likely(base != NULL)) {
> spin_lock_irqsave(&base->lock, *flags);
> if (likely(prelock_base == timer->base))
Looks correct to me... Personally, I'd prefer
static tvec_base_t *lock_timer_base(struct timer_list *timer,
unsigned long *flags)
__acquires(timer->base->lock)
{
tvec_base_t *base;
for (;;) {
base = timer_get_base(timer);
if (likely(base != NULL)) {
spin_lock_irqsave(&base->lock, *flags);
if (likely(base == timer_get_base(timer))
return base;
/* The timer has migrated to another CPU */
spin_unlock_irqrestore(&base->lock, *flags);
}
cpu_relax();
}
}
but this is a matter of taste.
A minor nitpick,
> +/* new_base is guaranteed to have last bit not set, in all callers below */
> +static inline void timer_set_base(struct timer_list *timer,
> + struct tvec_t_base_s *old_base,
> + struct tvec_t_base_s *new_base)
> +{
> + timer->base = (struct tvec_t_base_s *)((unsigned long)(new_base) |
> + tbase_get_deferrable(old_base));
> +}
looks a little bit ugly, but may be this is just me. How about
void timer_set_base(struct timer_list *timer, struct tvec_t_base_s *new_base)
{
timer->base = (struct tvec_t_base_s *)
((unsigned long)(new_base) | tbase_get_deferrable(timer->base));
}
__mod_timer:
- tvec_base_t *old_base = timer->base;
- timer->base = NULL;
+ timer_set_base(timer, NULL);
?
> + /* Make sure that tvec_base is 2 byte aligned */
> + if (tbase_get_deferrable(base)) {
> + WARN_ON(1);
> + kfree(base);
> + return -ENOMEM;
> + }
Not a comment, but a question: do we really need this?
Oleg.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] Add support for deferrable timers (respun)
2007-03-27 22:22 ` Oleg Nesterov
@ 2007-03-27 22:28 ` Oleg Nesterov
2007-03-27 22:36 ` Venki Pallipadi
2007-03-28 23:00 ` [PATCH] Add support for deferrable timers (respun-Mar28) Venki Pallipadi
2 siblings, 0 replies; 14+ messages in thread
From: Oleg Nesterov @ 2007-03-27 22:28 UTC (permalink / raw)
To: Venki Pallipadi; +Cc: linux-kernel, akpm, davej, johnstul, mingo, tglx
On 03/28, Oleg Nesterov wrote:
>
> looks a little bit ugly, but may be this is just me. How about
>
> void timer_set_base(struct timer_list *timer, struct tvec_t_base_s *new_base)
> {
> timer->base = (struct tvec_t_base_s *)
> ((unsigned long)(new_base) | tbase_get_deferrable(timer->base));
> }
>
> __mod_timer:
> - tvec_base_t *old_base = timer->base;
> - timer->base = NULL;
> + timer_set_base(timer, NULL);
>
> ?
Damn, forgot to say. With this change timer_get_deferrable(timer) always works,
even if changing the timer's base in progress.
Oleg.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] Add support for deferrable timers (respun)
2007-03-27 22:22 ` Oleg Nesterov
2007-03-27 22:28 ` Oleg Nesterov
@ 2007-03-27 22:36 ` Venki Pallipadi
2007-03-28 23:00 ` [PATCH] Add support for deferrable timers (respun-Mar28) Venki Pallipadi
2 siblings, 0 replies; 14+ messages in thread
From: Venki Pallipadi @ 2007-03-27 22:36 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Venki Pallipadi, linux-kernel, akpm, davej, johnstul, mingo, tglx
On Wed, Mar 28, 2007 at 02:22:27AM +0400, Oleg Nesterov wrote:
> On 03/27, Venki Pallipadi wrote:
> >
> > @@ -368,7 +368,7 @@
> >
> > for (;;) {
> > tvec_base_t *prelock_base = timer->base;
> > - base = timer_get_base(timer);
> > + base = tbase_get_base(prelock_base);
> > if (likely(base != NULL)) {
> > spin_lock_irqsave(&base->lock, *flags);
> > if (likely(prelock_base == timer->base))
>
> Looks correct to me... Personally, I'd prefer
>
> static tvec_base_t *lock_timer_base(struct timer_list *timer,
> unsigned long *flags)
> __acquires(timer->base->lock)
> {
> tvec_base_t *base;
>
> for (;;) {
> base = timer_get_base(timer);
> if (likely(base != NULL)) {
> spin_lock_irqsave(&base->lock, *flags);
> if (likely(base == timer_get_base(timer))
> return base;
> /* The timer has migrated to another CPU */
> spin_unlock_irqrestore(&base->lock, *flags);
> }
> cpu_relax();
> }
> }
>
> but this is a matter of taste.
I thought about this. But, chose the other one just to save one additional
'and' overhead.
>
> A minor nitpick,
>
> > +/* new_base is guaranteed to have last bit not set, in all callers below */
> > +static inline void timer_set_base(struct timer_list *timer,
> > + struct tvec_t_base_s *old_base,
> > + struct tvec_t_base_s *new_base)
> > +{
> > + timer->base = (struct tvec_t_base_s *)((unsigned long)(new_base) |
> > + tbase_get_deferrable(old_base));
> > +}
>
> looks a little bit ugly, but may be this is just me. How about
>
> void timer_set_base(struct timer_list *timer, struct tvec_t_base_s *new_base)
> {
> timer->base = (struct tvec_t_base_s *)
> ((unsigned long)(new_base) | tbase_get_deferrable(timer->base));
> }
>
> __mod_timer:
> - tvec_base_t *old_base = timer->base;
> - timer->base = NULL;
> + timer_set_base(timer, NULL);
>
> ?
I agree the above suggestion is clean. But, it will have one additional 'and'
operation when we set NULL. I saw some concern from Andrew earlier on overhead
this patch was adding.
>
> > + /* Make sure that tvec_base is 2 byte aligned */
> > + if (tbase_get_deferrable(base)) {
> > + WARN_ON(1);
> > + kfree(base);
> > + return -ENOMEM;
> > + }
>
> Not a comment, but a question: do we really need this?
AFAIK, kmalloc_node should return an even address always. I was just being
paranoid and wanted to assert it here as otherwise some normal timer may end up
being deferred timer.
Thanks,
Venki
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] Add support for deferrable timers (respun)
2007-03-27 20:43 ` [PATCH] Add support for deferrable timers (respun) Venki Pallipadi
2007-03-27 21:11 ` Oleg Nesterov
@ 2007-03-28 11:06 ` Andi Kleen
2007-03-28 16:42 ` Venki Pallipadi
1 sibling, 1 reply; 14+ messages in thread
From: Andi Kleen @ 2007-03-28 11:06 UTC (permalink / raw)
To: Venki Pallipadi
Cc: Oleg Nesterov, linux-kernel, akpm, davej, johnstul, mingo, tglx
Venki Pallipadi <venkatesh.pallipadi@intel.com> writes:
> +++ new/kernel/timer.c 2007-03-26 15:19:35.000000000 -0800
> @@ -74,7 +74,7 @@
> tvec_t tv3;
> tvec_t tv4;
> tvec_t tv5;
> -} ____cacheline_aligned_in_smp;
> +} ____cacheline_aligned;
Why this change? It should be aligned to 2 bytes anyways.
> static DEFINE_PER_CPU(tvec_base_t *, tvec_bases) = &boot_tvec_bases;
>
> +/*
> + * The lowest bit of base ptr in timer is used as a flag to indicate
> + * 'deferrable' nature of the timer. Functions below help us manage that flag.
> + */
> +static inline unsigned int tbase_get_deferrable(struct tvec_t_base_s *base)
> +{
> + return ((unsigned int)base & TBASE_DEFERRABLE_FLAG);
This will warn on 64bit for cast ptr -> int
-Andi
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] Add support for deferrable timers (respun)
2007-03-28 11:06 ` [PATCH] Add support for deferrable timers (respun) Andi Kleen
@ 2007-03-28 16:42 ` Venki Pallipadi
0 siblings, 0 replies; 14+ messages in thread
From: Venki Pallipadi @ 2007-03-28 16:42 UTC (permalink / raw)
To: Andi Kleen
Cc: Venki Pallipadi, Oleg Nesterov, linux-kernel, akpm, davej,
johnstul, mingo, tglx
On Wed, Mar 28, 2007 at 01:06:35PM +0200, Andi Kleen wrote:
> Venki Pallipadi <venkatesh.pallipadi@intel.com> writes:
> > +++ new/kernel/timer.c 2007-03-26 15:19:35.000000000 -0800
> > @@ -74,7 +74,7 @@
> > tvec_t tv3;
> > tvec_t tv4;
> > tvec_t tv5;
> > -} ____cacheline_aligned_in_smp;
> > +} ____cacheline_aligned;
>
> Why this change? It should be aligned to 2 bytes anyways.
Just wanted to be cautious of all archs and all compilers.
> > static DEFINE_PER_CPU(tvec_base_t *, tvec_bases) = &boot_tvec_bases;
> >
> > +/*
> > + * The lowest bit of base ptr in timer is used as a flag to indicate
> > + * 'deferrable' nature of the timer. Functions below help us manage that flag.
> > + */
> > +static inline unsigned int tbase_get_deferrable(struct tvec_t_base_s *base)
> > +{
> > + return ((unsigned int)base & TBASE_DEFERRABLE_FLAG);
>
> This will warn on 64bit for cast ptr -> int
OK. Will change this to ptr->long->int in the update patch.
Thanks,
Venki
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] Add support for deferrable timers (respun-Mar28)
2007-03-27 22:22 ` Oleg Nesterov
2007-03-27 22:28 ` Oleg Nesterov
2007-03-27 22:36 ` Venki Pallipadi
@ 2007-03-28 23:00 ` Venki Pallipadi
2007-03-29 0:01 ` Andrew Morton
2 siblings, 1 reply; 14+ messages in thread
From: Venki Pallipadi @ 2007-03-28 23:00 UTC (permalink / raw)
To: Oleg Nesterov
Cc: linux-kernel, akpm, davej, johnstul, mingo, tglx, Andi Kleen
Andrew,
Please drop the patch you included yesterday and two incremental patches and
use the patch below.
This patch is - yesterday's patch + Your tidy cleanup +
minor changes based on comments from Oleg and Andi. This is a lot
cleaner (and smaller) than earlier patches.
Thanks,
Venki
Introduce a new flag for timers - deferrable:
Timers that work normally when system is busy. But, will not cause CPU to
come out of idle (just to service this timer), when CPU is idle. Instead,
this timer will be serviced when CPU eventually wakes up with a subsequent
non-deferrable timer.
The main advantage of this is to avoid unnecessary timer interrupts when
CPU is idle. If the routine currently called by a timer can wait until next
event without any issues, this new timer can be used to setup timer event
for that routine. This, with dynticks, allows CPUs to be lazy, allowing them
to stay in idle for extended period of time by reducing unnecesary wakeup and
thereby reducing the power consumption.
This patch:
Builds this new timer on top of existing timer infrastructure. It uses
last bit in 'base' pointer of timer_list structure to store this
deferrable timer flag. __next_timer_interrupt() function
skips over these deferrable timers when CPU looks for
next timer event for which it has to wake up.
This is exported by a new interface init_timer_deferrable() that can
be called in place of regular init_timer().
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Index: new/kernel/timer.c
===================================================================
--- new.orig/kernel/timer.c 2007-03-22 16:27:44.000000000 -0800
+++ new/kernel/timer.c 2007-03-28 10:05:38.000000000 -0800
@@ -74,7 +74,7 @@
tvec_t tv3;
tvec_t tv4;
tvec_t tv5;
-} ____cacheline_aligned_in_smp;
+} ____cacheline_aligned;
typedef struct tvec_t_base_s tvec_base_t;
@@ -82,6 +82,37 @@
EXPORT_SYMBOL(boot_tvec_bases);
static DEFINE_PER_CPU(tvec_base_t *, tvec_bases) = &boot_tvec_bases;
+/*
+ * Note that all tvec_bases is 2 byte aligned and lower bit of
+ * base in timer_list is guaranteed to be zero. Use the LSB for
+ * the new flag to indicate whether the timer is deferrable
+ */
+#define TBASE_DEFERRABLE_FLAG (0x1)
+
+/* Functions below help us manage 'deferrable' flag */
+static inline unsigned int tbase_get_deferrable(tvec_base_t *base)
+{
+ return ((unsigned int)(unsigned long)base & TBASE_DEFERRABLE_FLAG);
+}
+
+static inline tvec_base_t *tbase_get_base(tvec_base_t *base)
+{
+ return ((tvec_base_t *)((unsigned long)base & ~TBASE_DEFERRABLE_FLAG));
+}
+
+static inline void timer_set_deferrable(struct timer_list *timer)
+{
+ timer->base = ((tvec_base_t *)((unsigned long)(timer->base) |
+ TBASE_DEFERRABLE_FLAG));
+}
+
+static inline void
+timer_set_base(struct timer_list *timer, tvec_base_t *new_base)
+{
+ timer->base = (tvec_base_t *)((unsigned long)(new_base) |
+ tbase_get_deferrable(timer->base));
+}
+
/**
* __round_jiffies - function to round jiffies to a full second
* @j: the time in (absolute) jiffies that should be rounded
@@ -295,6 +326,13 @@
}
EXPORT_SYMBOL(init_timer);
+void fastcall init_timer_deferrable(struct timer_list *timer)
+{
+ init_timer(timer);
+ timer_set_deferrable(timer);
+}
+EXPORT_SYMBOL(init_timer_deferrable);
+
static inline void detach_timer(struct timer_list *timer,
int clear_pending)
{
@@ -325,10 +363,11 @@
tvec_base_t *base;
for (;;) {
- base = timer->base;
+ tvec_base_t *prelock_base = timer->base;
+ base = tbase_get_base(prelock_base);
if (likely(base != NULL)) {
spin_lock_irqsave(&base->lock, *flags);
- if (likely(base == timer->base))
+ if (likely(prelock_base == timer->base))
return base;
/* The timer has migrated to another CPU */
spin_unlock_irqrestore(&base->lock, *flags);
@@ -365,11 +404,11 @@
*/
if (likely(base->running_timer != timer)) {
/* See the comment in lock_timer_base() */
- timer->base = NULL;
+ timer_set_base(timer, NULL);
spin_unlock(&base->lock);
base = new_base;
spin_lock(&base->lock);
- timer->base = base;
+ timer_set_base(timer, base);
}
}
@@ -397,7 +436,7 @@
timer_stats_timer_set_start_info(timer);
BUG_ON(timer_pending(timer) || !timer->function);
spin_lock_irqsave(&base->lock, flags);
- timer->base = base;
+ timer_set_base(timer, base);
internal_add_timer(base, timer);
spin_unlock_irqrestore(&base->lock, flags);
}
@@ -548,7 +587,7 @@
* don't have to detach them individually.
*/
list_for_each_entry_safe(timer, tmp, &tv_list, entry) {
- BUG_ON(timer->base != base);
+ BUG_ON(tbase_get_base(timer->base) != base);
internal_add_timer(base, timer);
}
@@ -634,6 +673,9 @@
index = slot = timer_jiffies & TVR_MASK;
do {
list_for_each_entry(nte, base->tv1.vec + slot, entry) {
+ if (tbase_get_deferrable(nte->base))
+ continue;
+
found = 1;
expires = nte->expires;
/* Look at the cascade bucket(s)? */
@@ -1602,6 +1644,13 @@
cpu_to_node(cpu));
if (!base)
return -ENOMEM;
+
+ /* Make sure that tvec_base is 2 byte aligned */
+ if (tbase_get_deferrable(base)) {
+ WARN_ON(1);
+ kfree(base);
+ return -ENOMEM;
+ }
memset(base, 0, sizeof(*base));
per_cpu(tvec_bases, cpu) = base;
} else {
@@ -1643,7 +1692,7 @@
while (!list_empty(head)) {
timer = list_entry(head->next, struct timer_list, entry);
detach_timer(timer, 0);
- timer->base = new_base;
+ timer_set_base(timer, new_base);
internal_add_timer(new_base, timer);
}
}
Index: new/include/linux/timer.h
===================================================================
--- new.orig/include/linux/timer.h 2007-03-22 16:27:44.000000000 -0800
+++ new/include/linux/timer.h 2007-03-28 10:03:14.000000000 -0800
@@ -37,6 +37,7 @@
TIMER_INITIALIZER(_function, _expires, _data)
void fastcall init_timer(struct timer_list * timer);
+void fastcall init_timer_deferrable(struct timer_list *timer);
static inline void setup_timer(struct timer_list * timer,
void (*function)(unsigned long),
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] Add support for deferrable timers (respun-Mar28)
2007-03-28 23:00 ` [PATCH] Add support for deferrable timers (respun-Mar28) Venki Pallipadi
@ 2007-03-29 0:01 ` Andrew Morton
2007-03-29 0:59 ` Venki Pallipadi
0 siblings, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2007-03-29 0:01 UTC (permalink / raw)
To: Venki Pallipadi
Cc: Oleg Nesterov, linux-kernel, davej, johnstul, mingo, tglx,
Andi Kleen
On Wed, 28 Mar 2007 16:00:21 -0700
Venki Pallipadi <venkatesh.pallipadi@intel.com> wrote:
> Please drop the patch you included yesterday and two incremental patches and
> use the patch below.
As you saw, I went and turned it into an incremental patch again. It makes
it easier to see what changed, but harder to see the whole thing.
> Introduce a new flag for timers - deferrable:
OK, but there's nothing in-kernel whcih actually uses this.
It would be good to identify some timer users which can be switched over (as
many as possible, really) so this thing actually gets some runtime testing.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] Add support for deferrable timers (respun-Mar28)
2007-03-29 0:01 ` Andrew Morton
@ 2007-03-29 0:59 ` Venki Pallipadi
2007-03-29 11:41 ` Andi Kleen
0 siblings, 1 reply; 14+ messages in thread
From: Venki Pallipadi @ 2007-03-29 0:59 UTC (permalink / raw)
To: Andrew Morton
Cc: Venki Pallipadi, Oleg Nesterov, linux-kernel, davej, johnstul,
mingo, tglx, Andi Kleen
On Wed, Mar 28, 2007 at 05:01:59PM -0700, Andrew Morton wrote:
> On Wed, 28 Mar 2007 16:00:21 -0700
> Venki Pallipadi <venkatesh.pallipadi@intel.com> wrote:
>
> > Please drop the patch you included yesterday and two incremental patches and
> > use the patch below.
>
> As you saw, I went and turned it into an incremental patch again. It makes
> it easier to see what changed, but harder to see the whole thing.
>
> > Introduce a new flag for timers - deferrable:
>
> OK, but there's nothing in-kernel whcih actually uses this.
>
> It would be good to identify some timer users which can be switched over (as
> many as possible, really) so this thing actually gets some runtime testing.
ondemand is the biggest offender and the patch below reduces the number of
interrupts by 50% or more (depending on HZ) on different test systems here.
Yes. There are quite a few other timers inside kernel that can be
migrated. I will use timer_stats and track others and send in the patches
soon.
Thanks,
Venki
------
Add a new deferrable delayed work init. This can be used to schedule work
that are 'unimportant' when CPU is idle and can be called later, when CPU
eventually comes out of idle.
Use this init in cpufreq ondemand governor.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Index: new/drivers/cpufreq/cpufreq_ondemand.c
===================================================================
--- new.orig/drivers/cpufreq/cpufreq_ondemand.c 2007-03-28 10:03:21.000000000 -0800
+++ new/drivers/cpufreq/cpufreq_ondemand.c 2007-03-28 10:05:44.000000000 -0800
@@ -470,7 +470,7 @@
dbs_info->enable = 1;
ondemand_powersave_bias_init();
dbs_info->sample_type = DBS_NORMAL_SAMPLE;
- INIT_DELAYED_WORK(&dbs_info->work, do_dbs_timer);
+ INIT_DELAYED_WORK_DEFERRABLE(&dbs_info->work, do_dbs_timer);
queue_delayed_work_on(dbs_info->cpu, kondemand_wq, &dbs_info->work,
delay);
}
Index: new/include/linux/workqueue.h
===================================================================
--- new.orig/include/linux/workqueue.h 2007-03-28 10:03:21.000000000 -0800
+++ new/include/linux/workqueue.h 2007-03-28 10:05:44.000000000 -0800
@@ -89,6 +89,12 @@
init_timer(&(_work)->timer); \
} while (0)
+#define INIT_DELAYED_WORK_DEFERRABLE(_work, _func) \
+ do { \
+ INIT_WORK(&(_work)->work, (_func)); \
+ init_timer_deferrable(&(_work)->timer); \
+ } while (0)
+
/**
* work_pending - Find out whether a work item is currently pending
* @work: The work item in question
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] Add support for deferrable timers (respun-Mar28)
2007-03-29 0:59 ` Venki Pallipadi
@ 2007-03-29 11:41 ` Andi Kleen
2007-03-29 11:51 ` Kyle Moffett
0 siblings, 1 reply; 14+ messages in thread
From: Andi Kleen @ 2007-03-29 11:41 UTC (permalink / raw)
To: Venki Pallipadi
Cc: Andrew Morton, Oleg Nesterov, linux-kernel, davej, johnstul,
mingo, tglx
> ondemand is the biggest offender and the patch below reduces the number of
> interrupts by 50% or more (depending on HZ) on different test systems here.
Cool!
> Yes. There are quite a few other timers inside kernel that can be
> migrated. I will use timer_stats and track others and send in the patches
> soon.
Longer term it might make sense to even expose this as a option to user space.
Maybe as a new timer in setitimer()? This might safe power with "wiggling desktop
applets" too.
-Andi
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] Add support for deferrable timers (respun-Mar28)
2007-03-29 11:41 ` Andi Kleen
@ 2007-03-29 11:51 ` Kyle Moffett
2007-03-29 11:58 ` Andi Kleen
0 siblings, 1 reply; 14+ messages in thread
From: Kyle Moffett @ 2007-03-29 11:51 UTC (permalink / raw)
To: Andi Kleen
Cc: Venki Pallipadi, Andrew Morton, Oleg Nesterov, linux-kernel,
davej, johnstul, mingo, tglx
On Mar 29, 2007, at 07:41:12, Andi Kleen wrote:
>> ondemand is the biggest offender and the patch below reduces the
>> number of interrupts by 50% or more (depending on HZ) on different
>> test systems here.
>
> Cool!
>
>> Yes. There are quite a few other timers inside kernel that can be
>> migrated. I will use timer_stats and track others and send in the
>> patches soon.
>
> Longer term it might make sense to even expose this as a option to
> user space. Maybe as a new timer in setitimer()? This might safe
> power with "wiggling desktop applets" too.
Might also be useful to add an extra option to "top" to reduce the
polling frequency if the system is otherwise idle. A fixed 30-sec
timer and a deferrable 1-sec timer or somesuch?
Cheers,
Kyle Moffett
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] Add support for deferrable timers (respun-Mar28)
2007-03-29 11:51 ` Kyle Moffett
@ 2007-03-29 11:58 ` Andi Kleen
0 siblings, 0 replies; 14+ messages in thread
From: Andi Kleen @ 2007-03-29 11:58 UTC (permalink / raw)
To: Kyle Moffett
Cc: Venki Pallipadi, Andrew Morton, Oleg Nesterov, linux-kernel,
davej, johnstul, mingo, tglx
> Might also be useful to add an extra option to "top" to reduce the
> polling frequency if the system is otherwise idle. A fixed 30-sec
> timer and a deferrable 1-sec timer or somesuch?
Hmm, i think the current implementation is per CPU. top really would
like to have one that applies to all CPUs though.
Thinking about it for sane user space semantics it would probably need
a global implementation anyways.
Perhaps it could use the same infrastructure as RCU does to handle this?
-Andi
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2007-03-29 11:59 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <200703212353.l2LNrNOj007453@shell0.pdx.osdl.net>
[not found] ` <20070322140532.GA120@tv-sign.ru>
[not found] ` <20070322151817.GA29840@linux-os.sc.intel.com>
[not found] ` <20070322161355.GA160@tv-sign.ru>
2007-03-27 20:43 ` [PATCH] Add support for deferrable timers (respun) Venki Pallipadi
2007-03-27 21:11 ` Oleg Nesterov
2007-03-27 21:55 ` Venki Pallipadi
2007-03-27 22:22 ` Oleg Nesterov
2007-03-27 22:28 ` Oleg Nesterov
2007-03-27 22:36 ` Venki Pallipadi
2007-03-28 23:00 ` [PATCH] Add support for deferrable timers (respun-Mar28) Venki Pallipadi
2007-03-29 0:01 ` Andrew Morton
2007-03-29 0:59 ` Venki Pallipadi
2007-03-29 11:41 ` Andi Kleen
2007-03-29 11:51 ` Kyle Moffett
2007-03-29 11:58 ` Andi Kleen
2007-03-28 11:06 ` [PATCH] Add support for deferrable timers (respun) Andi Kleen
2007-03-28 16:42 ` Venki Pallipadi
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox