* [v4 RFC PATCH 0/4] timers: Framework for migration of timers
@ 2009-04-01 11:31 Arun R Bharadwaj
2009-04-01 11:32 ` [v4 RFC PATCH 1/4] timers: Framework for identifying pinned timers Arun R Bharadwaj
` (3 more replies)
0 siblings, 4 replies; 16+ messages in thread
From: Arun R Bharadwaj @ 2009-04-01 11:31 UTC (permalink / raw)
To: linux-kernel, linux-pm
Cc: a.p.zijlstra, ego, tglx, mingo, andi, venkatesh.pallipadi, vatsa,
arjan, svaidy, arun
Hi,
In an SMP system, tasks are scheduled on different CPUs by the
scheduler and interrupts are spread by the irqbalance daemon, but
timers remain stuck on the CPUs on which they were initialised. Timers
queued by tasks get re-queued on the CPU where the task gets to run
next, but timers started from IRQ context, such as the ones in device
drivers, stay on the CPU where they were initialised. This framework
helps move all 'movable timers', controlled through a sysctl interface.
Iteration v3 of this patch can be found at
http://lkml.org/lkml/2009/3/16/162
Changelog: v3->v4
-An hrtimer is migrated *only* if its expiry occurs after the next timer
interrupt on the CPU to which it is being migrated. This prevents any
latency which might otherwise have arisen due to migration.
A few helper functions have been added to check whether migration would
result in latency.
Thanks to Thomas for pointing out this issue.
The following patches are included:
PATCH 1/4 - framework to identify pinned timers.
PATCH 2/4 - identifying the existing pinned hrtimers.
PATCH 3/4 - /proc/sys sysctl hook to enable timer migration.
PATCH 4/4 - logic to enable timer migration.
The patchset is based on the latest tip/master.
Timer migration is enabled by default as suggested by Ingo.
It can be turned off when CONFIG_SCHED_DEBUG=y by
echo 0 > /proc/sys/kernel/timer_migration
Please let me know your comments.
--arun
^ permalink raw reply [flat|nested] 16+ messages in thread
* [v4 RFC PATCH 1/4] timers: Framework for identifying pinned timers
2009-04-01 11:31 [v4 RFC PATCH 0/4] timers: Framework for migration of timers Arun R Bharadwaj
@ 2009-04-01 11:32 ` Arun R Bharadwaj
2009-04-01 11:41 ` Andi Kleen
2009-04-01 11:34 ` [v4 RFC PATCH 2/4] timers: Identifying the existing " Arun R Bharadwaj
` (2 subsequent siblings)
3 siblings, 1 reply; 16+ messages in thread
From: Arun R Bharadwaj @ 2009-04-01 11:32 UTC (permalink / raw)
To: linux-kernel, linux-pm
Cc: a.p.zijlstra, ego, tglx, mingo, andi, venkatesh.pallipadi, vatsa,
arjan, svaidy, Arun Bharadwaj
* Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-04-01 17:01:28]:
This patch creates a new framework for identifying cpu-pinned timers
and hrtimers.
This framework is needed because pinned timers are expected to fire on
the same CPU on which they are queued, so they must be identified and
excluded from migration, in case there are any.
For regular timers, the currently existing add_timer_on() can be used
to queue pinned timers, and mod_timer_pinned() can subsequently be used
to modify the 'expires' field.
For hrtimers, a new interface hrtimer_start_pinned() is created,
which can be used to queue a cpu-pinned hrtimer.
Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
---
include/linux/hrtimer.h | 24 ++++++++++++++++++++----
include/linux/timer.h | 3 +++
kernel/hrtimer.c | 34 ++++++++++++++++++++++++++++------
kernel/timer.c | 31 +++++++++++++++++++++++++++----
4 files changed, 78 insertions(+), 14 deletions(-)
Index: linux.trees.git/include/linux/hrtimer.h
===================================================================
--- linux.trees.git.orig/include/linux/hrtimer.h
+++ linux.trees.git/include/linux/hrtimer.h
@@ -331,23 +331,39 @@ static inline void hrtimer_init_on_stack
static inline void destroy_hrtimer_on_stack(struct hrtimer *timer) { }
#endif
+#define HRTIMER_NOT_PINNED 0
+#define HRTIMER_PINNED 1
/* Basic timer operations: */
extern int hrtimer_start(struct hrtimer *timer, ktime_t tim,
const enum hrtimer_mode mode);
+extern int hrtimer_start_pinned(struct hrtimer *timer, ktime_t tim,
+ const enum hrtimer_mode mode);
extern int hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
- unsigned long range_ns, const enum hrtimer_mode mode);
+ unsigned long range_ns, const enum hrtimer_mode mode, int pinned);
extern int hrtimer_cancel(struct hrtimer *timer);
extern int hrtimer_try_to_cancel(struct hrtimer *timer);
-static inline int hrtimer_start_expires(struct hrtimer *timer,
- enum hrtimer_mode mode)
+static inline int __hrtimer_start_expires(struct hrtimer *timer,
+ enum hrtimer_mode mode, int pinned)
{
unsigned long delta;
ktime_t soft, hard;
soft = hrtimer_get_softexpires(timer);
hard = hrtimer_get_expires(timer);
delta = ktime_to_ns(ktime_sub(hard, soft));
- return hrtimer_start_range_ns(timer, soft, delta, mode);
+ return hrtimer_start_range_ns(timer, soft, delta, mode, pinned);
+}
+
+static inline int hrtimer_start_expires(struct hrtimer *timer,
+ enum hrtimer_mode mode)
+{
+ return __hrtimer_start_expires(timer, mode, HRTIMER_NOT_PINNED);
+}
+
+static inline int hrtimer_start_expires_pinned(struct hrtimer *timer,
+ enum hrtimer_mode mode)
+{
+ return __hrtimer_start_expires(timer, mode, HRTIMER_PINNED);
}
static inline int hrtimer_restart(struct hrtimer *timer)
Index: linux.trees.git/kernel/hrtimer.c
===================================================================
--- linux.trees.git.orig/kernel/hrtimer.c
+++ linux.trees.git/kernel/hrtimer.c
@@ -193,7 +193,8 @@ struct hrtimer_clock_base *lock_hrtimer_
* Switch the timer base to the current CPU when possible.
*/
static inline struct hrtimer_clock_base *
-switch_hrtimer_base(struct hrtimer *timer, struct hrtimer_clock_base *base)
+switch_hrtimer_base(struct hrtimer *timer, struct hrtimer_clock_base *base,
+ int pinned)
{
struct hrtimer_clock_base *new_base;
struct hrtimer_cpu_base *new_cpu_base;
@@ -897,9 +898,8 @@ remove_hrtimer(struct hrtimer *timer, st
* 0 on success
* 1 when the timer was active
*/
-int
-hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim, unsigned long delta_ns,
- const enum hrtimer_mode mode)
+int hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
+ unsigned long delta_ns, const enum hrtimer_mode mode, int pinned)
{
struct hrtimer_clock_base *base, *new_base;
unsigned long flags;
@@ -911,7 +911,7 @@ hrtimer_start_range_ns(struct hrtimer *t
ret = remove_hrtimer(timer, base);
/* Switch the timer base, if necessary: */
- new_base = switch_hrtimer_base(timer, base);
+ new_base = switch_hrtimer_base(timer, base, pinned);
if (mode == HRTIMER_MODE_REL) {
tim = ktime_add_safe(tim, new_base->get_time());
@@ -948,6 +948,12 @@ hrtimer_start_range_ns(struct hrtimer *t
}
EXPORT_SYMBOL_GPL(hrtimer_start_range_ns);
+int __hrtimer_start(struct hrtimer *timer, ktime_t tim,
+ const enum hrtimer_mode mode, int pinned)
+{
+ return hrtimer_start_range_ns(timer, tim, 0, mode, pinned);
+}
+
/**
* hrtimer_start - (re)start an hrtimer on the current CPU
* @timer: the timer to be added
@@ -961,10 +967,26 @@ EXPORT_SYMBOL_GPL(hrtimer_start_range_ns
int
hrtimer_start(struct hrtimer *timer, ktime_t tim, const enum hrtimer_mode mode)
{
- return hrtimer_start_range_ns(timer, tim, 0, mode);
+ return __hrtimer_start(timer, tim, mode, HRTIMER_NOT_PINNED);
}
EXPORT_SYMBOL_GPL(hrtimer_start);
+/**
+ * hrtimer_start_pinned - start a CPU-pinned hrtimer
+ * @timer: the timer to be added
+ * @tim: expiry time
+ * @mode: expiry mode: absolute (HRTIMER_ABS) or relative (HRTIMER_REL)
+ *
+ * Returns:
+ * 0 on success
+ * 1 when the timer was active
+ */
+int hrtimer_start_pinned(struct hrtimer *timer,
+ ktime_t tim, const enum hrtimer_mode mode)
+{
+ return __hrtimer_start(timer, tim, mode, HRTIMER_PINNED);
+}
+EXPORT_SYMBOL_GPL(hrtimer_start_pinned);
/**
* hrtimer_try_to_cancel - try to deactivate a timer
Index: linux.trees.git/include/linux/timer.h
===================================================================
--- linux.trees.git.orig/include/linux/timer.h
+++ linux.trees.git/include/linux/timer.h
@@ -163,7 +163,10 @@ extern void add_timer_on(struct timer_li
extern int del_timer(struct timer_list * timer);
extern int mod_timer(struct timer_list *timer, unsigned long expires);
extern int mod_timer_pending(struct timer_list *timer, unsigned long expires);
+extern int mod_timer_pinned(struct timer_list *timer, unsigned long expires);
+#define TIMER_NOT_PINNED 0
+#define TIMER_PINNED 1
/*
* The jiffies value which is added to now, when there is no timer
* in the timer wheel:
Index: linux.trees.git/kernel/timer.c
===================================================================
--- linux.trees.git.orig/kernel/timer.c
+++ linux.trees.git/kernel/timer.c
@@ -601,7 +601,8 @@ static struct tvec_base *lock_timer_base
}
static inline int
-__mod_timer(struct timer_list *timer, unsigned long expires, bool pending_only)
+__mod_timer(struct timer_list *timer, unsigned long expires,
+ bool pending_only, int pinned)
{
struct tvec_base *base, *new_base;
unsigned long flags;
@@ -665,7 +666,7 @@ out_unlock:
*/
int mod_timer_pending(struct timer_list *timer, unsigned long expires)
{
- return __mod_timer(timer, expires, true);
+ return __mod_timer(timer, expires, true, TIMER_NOT_PINNED);
}
EXPORT_SYMBOL(mod_timer_pending);
@@ -699,11 +700,33 @@ int mod_timer(struct timer_list *timer,
if (timer->expires == expires && timer_pending(timer))
return 1;
- return __mod_timer(timer, expires, false);
+ return __mod_timer(timer, expires, false, TIMER_NOT_PINNED);
}
EXPORT_SYMBOL(mod_timer);
/**
+ * mod_timer_pinned - modify a timer's timeout
+ * @timer: the timer to be modified
+ * @expires: new timeout in jiffies
+ *
+ * mod_timer_pinned() is a way to update the expire field of an
+ * active timer (if the timer is inactive it will be activated)
+ * and not allow the timer to be migrated to a different CPU.
+ *
+ * mod_timer_pinned(timer, expires) is equivalent to:
+ *
+ * del_timer(timer); timer->expires = expires; add_timer(timer);
+ */
+int mod_timer_pinned(struct timer_list *timer, unsigned long expires)
+{
+ if (timer->expires == expires && timer_pending(timer))
+ return 1;
+
+ return __mod_timer(timer, expires, false, TIMER_PINNED);
+}
+EXPORT_SYMBOL(mod_timer_pinned);
+
+/**
* add_timer - start a timer
* @timer: the timer to be added
*
@@ -1350,7 +1373,7 @@ signed long __sched schedule_timeout(sig
expire = timeout + jiffies;
setup_timer_on_stack(&timer, process_timeout, (unsigned long)current);
- __mod_timer(&timer, expire, false);
+ __mod_timer(&timer, expire, false, TIMER_NOT_PINNED);
schedule();
del_singleshot_timer_sync(&timer);
* [v4 RFC PATCH 2/4] timers: Identifying the existing pinned timers
2009-04-01 11:31 [v4 RFC PATCH 0/4] timers: Framework for migration of timers Arun R Bharadwaj
2009-04-01 11:32 ` [v4 RFC PATCH 1/4] timers: Framework for identifying pinned timers Arun R Bharadwaj
@ 2009-04-01 11:34 ` Arun R Bharadwaj
2009-04-01 11:36 ` [v4 RFC PATCH 3/4] timers: /proc/sys sysctl hook to enable timer migration Arun R Bharadwaj
2009-04-01 11:37 ` [v4 RFC PATCH 4/4] timers: logic to move non pinned timers Arun R Bharadwaj
3 siblings, 0 replies; 16+ messages in thread
From: Arun R Bharadwaj @ 2009-04-01 11:34 UTC (permalink / raw)
To: linux-kernel, linux-pm
Cc: a.p.zijlstra, ego, tglx, mingo, andi, venkatesh.pallipadi, vatsa,
arjan, svaidy, Arun Bharadwaj
* Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-04-01 17:01:28]:
The following pinned timers have been identified and marked:
1) sched_rt_period_timer and the hrtick timer (kernel/sched.c)
2) tick_sched_timer
3) stack_trace_timer_fn
4) the UV heartbeat timer (via mod_timer_pinned())
Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
---
arch/x86/kernel/apic/x2apic_uv_x.c | 2 +-
kernel/sched.c | 5 +++--
kernel/time/tick-sched.c | 7 ++++---
kernel/trace/trace_sysprof.c | 3 ++-
4 files changed, 10 insertions(+), 7 deletions(-)
Index: linux.trees.git/kernel/sched.c
===================================================================
--- linux.trees.git.orig/kernel/sched.c
+++ linux.trees.git/kernel/sched.c
@@ -236,7 +236,7 @@ static void start_rt_bandwidth(struct rt
now = hrtimer_cb_get_time(&rt_b->rt_period_timer);
hrtimer_forward(&rt_b->rt_period_timer, now, rt_b->rt_period);
- hrtimer_start_expires(&rt_b->rt_period_timer,
+ hrtimer_start_expires_pinned(&rt_b->rt_period_timer,
HRTIMER_MODE_ABS);
}
spin_unlock(&rt_b->rt_runtime_lock);
@@ -1156,7 +1156,8 @@ static __init void init_hrtick(void)
*/
static void hrtick_start(struct rq *rq, u64 delay)
{
- hrtimer_start(&rq->hrtick_timer, ns_to_ktime(delay), HRTIMER_MODE_REL);
+ hrtimer_start_pinned(&rq->hrtick_timer, ns_to_ktime(delay),
+ HRTIMER_MODE_REL);
}
static inline void init_hrtick(void)
Index: linux.trees.git/kernel/time/tick-sched.c
===================================================================
--- linux.trees.git.orig/kernel/time/tick-sched.c
+++ linux.trees.git/kernel/time/tick-sched.c
@@ -348,7 +348,7 @@ void tick_nohz_stop_sched_tick(int inidl
ts->idle_expires = expires;
if (ts->nohz_mode == NOHZ_MODE_HIGHRES) {
- hrtimer_start(&ts->sched_timer, expires,
+ hrtimer_start_pinned(&ts->sched_timer, expires,
HRTIMER_MODE_ABS);
/* Check, if the timer was already in the past */
if (hrtimer_active(&ts->sched_timer))
@@ -394,7 +394,7 @@ static void tick_nohz_restart(struct tic
hrtimer_forward(&ts->sched_timer, now, tick_period);
if (ts->nohz_mode == NOHZ_MODE_HIGHRES) {
- hrtimer_start_expires(&ts->sched_timer,
+ hrtimer_start_expires_pinned(&ts->sched_timer,
HRTIMER_MODE_ABS);
/* Check, if the timer was already in the past */
if (hrtimer_active(&ts->sched_timer))
@@ -698,7 +698,8 @@ void tick_setup_sched_timer(void)
for (;;) {
hrtimer_forward(&ts->sched_timer, now, tick_period);
- hrtimer_start_expires(&ts->sched_timer, HRTIMER_MODE_ABS);
+ hrtimer_start_expires_pinned(&ts->sched_timer,
+ HRTIMER_MODE_ABS);
/* Check, if the timer was already in the past */
if (hrtimer_active(&ts->sched_timer))
break;
Index: linux.trees.git/kernel/trace/trace_sysprof.c
===================================================================
--- linux.trees.git.orig/kernel/trace/trace_sysprof.c
+++ linux.trees.git/kernel/trace/trace_sysprof.c
@@ -203,7 +203,8 @@ static void start_stack_timer(void *unus
hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
hrtimer->function = stack_trace_timer_fn;
- hrtimer_start(hrtimer, ns_to_ktime(sample_period), HRTIMER_MODE_REL);
+ hrtimer_start_pinned(hrtimer, ns_to_ktime(sample_period),
+ HRTIMER_MODE_REL);
}
static void start_stack_timers(void)
Index: linux.trees.git/arch/x86/kernel/apic/x2apic_uv_x.c
===================================================================
--- linux.trees.git.orig/arch/x86/kernel/apic/x2apic_uv_x.c
+++ linux.trees.git/arch/x86/kernel/apic/x2apic_uv_x.c
@@ -455,7 +455,7 @@ static void uv_heartbeat(unsigned long i
uv_set_scir_bits(bits);
/* enable next timer period */
- mod_timer(timer, jiffies + SCIR_CPU_HB_INTERVAL);
+ mod_timer_pinned(timer, jiffies + SCIR_CPU_HB_INTERVAL);
}
static void __cpuinit uv_heartbeat_enable(int cpu)
* [v4 RFC PATCH 3/4] timers: /proc/sys sysctl hook to enable timer migration
2009-04-01 11:31 [v4 RFC PATCH 0/4] timers: Framework for migration of timers Arun R Bharadwaj
2009-04-01 11:32 ` [v4 RFC PATCH 1/4] timers: Framework for identifying pinned timers Arun R Bharadwaj
2009-04-01 11:34 ` [v4 RFC PATCH 2/4] timers: Identifying the existing " Arun R Bharadwaj
@ 2009-04-01 11:36 ` Arun R Bharadwaj
2009-04-01 11:37 ` [v4 RFC PATCH 4/4] timers: logic to move non pinned timers Arun R Bharadwaj
3 siblings, 0 replies; 16+ messages in thread
From: Arun R Bharadwaj @ 2009-04-01 11:36 UTC (permalink / raw)
To: linux-kernel, linux-pm
Cc: a.p.zijlstra, ego, tglx, mingo, andi, venkatesh.pallipadi, vatsa,
arjan, svaidy, Arun Bharadwaj
* Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-04-01 17:01:28]:
This patch creates the /proc/sys sysctl interface at
/proc/sys/kernel/timer_migration
Timer migration is enabled by default.
To disable timer migration when CONFIG_SCHED_DEBUG=y:
echo 0 > /proc/sys/kernel/timer_migration
Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
---
include/linux/sched.h | 1 +
kernel/sched.c | 1 +
kernel/sysctl.c | 8 ++++++++
3 files changed, 10 insertions(+)
Index: linux.trees.git/include/linux/sched.h
===================================================================
--- linux.trees.git.orig/include/linux/sched.h
+++ linux.trees.git/include/linux/sched.h
@@ -1763,6 +1763,7 @@ extern unsigned int sysctl_sched_child_r
extern unsigned int sysctl_sched_features;
extern unsigned int sysctl_sched_migration_cost;
extern unsigned int sysctl_sched_nr_migrate;
+extern unsigned int sysctl_timer_migration;
int sched_nr_latency_handler(struct ctl_table *table, int write,
struct file *file, void __user *buffer, size_t *length,
Index: linux.trees.git/kernel/sysctl.c
===================================================================
--- linux.trees.git.orig/kernel/sysctl.c
+++ linux.trees.git/kernel/sysctl.c
@@ -328,6 +328,14 @@ static struct ctl_table kern_table[] = {
.mode = 0644,
.proc_handler = &proc_dointvec,
},
+ {
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "timer_migration",
+ .data = &sysctl_timer_migration,
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec,
+ },
#endif
{
.ctl_name = CTL_UNNUMBERED,
Index: linux.trees.git/kernel/sched.c
===================================================================
--- linux.trees.git.orig/kernel/sched.c
+++ linux.trees.git/kernel/sched.c
@@ -7445,6 +7445,7 @@ static void sched_domain_node_span(int n
#endif /* CONFIG_NUMA */
int sched_smt_power_savings = 0, sched_mc_power_savings = 0;
+const_debug unsigned int sysctl_timer_migration = 1;
/*
* The cpus mask in sched_group and sched_domain hangs off the end.
* [v4 RFC PATCH 4/4] timers: logic to move non pinned timers
2009-04-01 11:31 [v4 RFC PATCH 0/4] timers: Framework for migration of timers Arun R Bharadwaj
` (2 preceding siblings ...)
2009-04-01 11:36 ` [v4 RFC PATCH 3/4] timers: /proc/sys sysctl hook to enable timer migration Arun R Bharadwaj
@ 2009-04-01 11:37 ` Arun R Bharadwaj
2009-04-01 11:46 ` Arun R Bharadwaj
2009-04-03 21:52 ` Thomas Gleixner
3 siblings, 2 replies; 16+ messages in thread
From: Arun R Bharadwaj @ 2009-04-01 11:37 UTC (permalink / raw)
To: linux-kernel, linux-pm
Cc: a.p.zijlstra, ego, tglx, mingo, andi, venkatesh.pallipadi, vatsa,
arjan, svaidy, Arun Bharadwaj
* Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-04-01 17:01:28]:
This patch migrates all non-pinned timers and hrtimers from idle CPUs
to the current idle load balancer. Timers firing on busy CPUs are not
migrated.
While migrating hrtimers, care must be taken to check whether migrating
an hrtimer would introduce latency. So a check_hrtimer_latency() helper
has been added to provide this check. It compares the expiry of the
hrtimer with the next timer interrupt on the target cpu and migrates the
hrtimer only if it expires *after* the next interrupt on the target cpu.
The existing get_next_timer_interrupt() only returns the next timer
interrupt on the current cpu, so a get_next_timer_interrupt_on() has
been added to get the next timer interrupt on the cpu of choice. A few
other helper functions have been added to facilitate this.
Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
---
include/linux/hrtimer.h | 19 ++++++++++++++-
include/linux/sched.h | 12 +++++++++
include/linux/timer.h | 2 -
kernel/hrtimer.c | 42 +++++++++++++++++++++++++--------
kernel/sched.c | 5 ++++
kernel/timer.c | 60 ++++++++++++++++++++++++++++++++++++++++--------
6 files changed, 119 insertions(+), 21 deletions(-)
Index: linux.trees.git/kernel/timer.c
===================================================================
--- linux.trees.git.orig/kernel/timer.c
+++ linux.trees.git/kernel/timer.c
@@ -37,6 +37,7 @@
#include <linux/delay.h>
#include <linux/tick.h>
#include <linux/kallsyms.h>
+#include <linux/sched.h>
#include <asm/uaccess.h>
#include <asm/unistd.h>
@@ -606,7 +607,7 @@ __mod_timer(struct timer_list *timer, un
{
struct tvec_base *base, *new_base;
unsigned long flags;
- int ret;
+ int ret, current_cpu, preferred_cpu;
ret = 0;
@@ -627,6 +628,12 @@ __mod_timer(struct timer_list *timer, un
new_base = __get_cpu_var(tvec_bases);
+ current_cpu = smp_processor_id();
+ preferred_cpu = get_nohz_load_balancer();
+ if (get_sysctl_timer_migration() && idle_cpu(current_cpu)
+ && !pinned && preferred_cpu != -1)
+ new_base = per_cpu(tvec_bases, preferred_cpu);
+
if (base != new_base) {
/*
* We are trying to schedule the timer on the local CPU.
@@ -635,7 +642,8 @@ __mod_timer(struct timer_list *timer, un
* handler yet has not finished. This also guarantees that
* the timer is serialized wrt itself.
*/
- if (likely(base->running_timer != timer)) {
+ if (likely(base->running_timer != timer) ||
+ get_sysctl_timer_migration()) {
/* See the comment in lock_timer_base() */
timer_set_base(timer, NULL);
spin_unlock(&base->lock);
@@ -1063,10 +1071,9 @@ cascade:
* Check, if the next hrtimer event is before the next timer wheel
* event:
*/
-static unsigned long cmp_next_hrtimer_event(unsigned long now,
- unsigned long expires)
+static unsigned long __cmp_next_hrtimer_event(unsigned long now,
+ unsigned long expires, ktime_t hr_delta)
{
- ktime_t hr_delta = hrtimer_get_next_event();
struct timespec tsdelta;
unsigned long delta;
@@ -1103,24 +1110,59 @@ static unsigned long cmp_next_hrtimer_ev
return expires;
}
+static unsigned long cmp_next_hrtimer_event(unsigned long now,
+ unsigned long expires)
+{
+ ktime_t hr_delta = hrtimer_get_next_event();
+ return __cmp_next_hrtimer_event(now, expires, hr_delta);
+}
+
+static unsigned long cmp_next_hrtimer_event_on(unsigned long now,
+ unsigned long expires, int cpu)
+{
+ ktime_t hr_delta = hrtimer_get_next_event_on(cpu);
+ return __cmp_next_hrtimer_event(now, expires, hr_delta);
+}
+
+unsigned long __get_next_timer_interrupt(unsigned long now, int cpu)
+{
+ struct tvec_base *base = per_cpu(tvec_bases, cpu);
+ unsigned long expires;
+
+ spin_lock(&base->lock);
+ expires = __next_timer_interrupt(base);
+ spin_unlock(&base->lock);
+ return expires;
+}
+
/**
* get_next_timer_interrupt - return the jiffy of the next pending timer
* @now: current time (in jiffies)
*/
unsigned long get_next_timer_interrupt(unsigned long now)
{
- struct tvec_base *base = __get_cpu_var(tvec_bases);
unsigned long expires;
+ int cpu = smp_processor_id();
- spin_lock(&base->lock);
- expires = __next_timer_interrupt(base);
- spin_unlock(&base->lock);
+ expires = __get_next_timer_interrupt(now, cpu);
if (time_before_eq(expires, now))
return now;
return cmp_next_hrtimer_event(now, expires);
}
+
+unsigned long get_next_timer_interrupt_on(unsigned long now, int cpu)
+{
+ unsigned long expires;
+
+ expires = __get_next_timer_interrupt(now, cpu);
+
+ if (time_before_eq(expires, now))
+ return now;
+
+ return cmp_next_hrtimer_event_on(now, expires, cpu);
+}
#endif
/*
Index: linux.trees.git/kernel/hrtimer.c
===================================================================
--- linux.trees.git.orig/kernel/hrtimer.c
+++ linux.trees.git/kernel/hrtimer.c
@@ -43,6 +43,8 @@
#include <linux/seq_file.h>
#include <linux/err.h>
#include <linux/debugobjects.h>
+#include <linux/sched.h>
+#include <linux/timer.h>
#include <asm/uaccess.h>
@@ -198,8 +200,16 @@ switch_hrtimer_base(struct hrtimer *time
{
struct hrtimer_clock_base *new_base;
struct hrtimer_cpu_base *new_cpu_base;
+ int current_cpu, preferred_cpu;
+
+ current_cpu = smp_processor_id();
+ preferred_cpu = get_nohz_load_balancer();
+ if (get_sysctl_timer_migration() && !pinned && preferred_cpu != -1 &&
+ check_hrtimer_latency(timer, preferred_cpu))
+ new_cpu_base = &per_cpu(hrtimer_bases, preferred_cpu);
+ else
+ new_cpu_base = &__get_cpu_var(hrtimer_bases);
- new_cpu_base = &__get_cpu_var(hrtimer_bases);
new_base = &new_cpu_base->clock_base[base->index];
if (base != new_base) {
@@ -1056,21 +1066,16 @@ EXPORT_SYMBOL_GPL(hrtimer_get_remaining)
#ifdef CONFIG_NO_HZ
/**
- * hrtimer_get_next_event - get the time until next expiry event
+ * __hrtimer_get_next_event - get the time until next expiry event
*
* Returns the delta to the next expiry event or KTIME_MAX if no timer
* is pending.
*/
-ktime_t hrtimer_get_next_event(void)
+ktime_t __hrtimer_get_next_event(struct hrtimer_clock_base *base)
{
- struct hrtimer_cpu_base *cpu_base = &__get_cpu_var(hrtimer_bases);
- struct hrtimer_clock_base *base = cpu_base->clock_base;
ktime_t delta, mindelta = { .tv64 = KTIME_MAX };
- unsigned long flags;
int i;
- spin_lock_irqsave(&cpu_base->lock, flags);
-
if (!hrtimer_hres_active()) {
for (i = 0; i < HRTIMER_MAX_CLOCK_BASES; i++, base++) {
struct hrtimer *timer;
@@ -1086,12 +1091,29 @@ ktime_t hrtimer_get_next_event(void)
}
}
- spin_unlock_irqrestore(&cpu_base->lock, flags);
-
if (mindelta.tv64 < 0)
mindelta.tv64 = 0;
return mindelta;
}
+
+ktime_t hrtimer_get_next_event(void)
+{
+ ktime_t mindelta;
+ unsigned long flags;
+ struct hrtimer_cpu_base *cpu_base = &__get_cpu_var(hrtimer_bases);
+ struct hrtimer_clock_base *base = cpu_base->clock_base;
+ spin_lock_irqsave(&cpu_base->lock, flags);
+ mindelta = __hrtimer_get_next_event(base);
+ spin_unlock_irqrestore(&cpu_base->lock, flags);
+ return mindelta;
+}
+
+ktime_t hrtimer_get_next_event_on(int cpu)
+{
+ struct hrtimer_cpu_base *cpu_base = &per_cpu(hrtimer_bases, cpu);
+ struct hrtimer_clock_base *base = cpu_base->clock_base;
+ return __hrtimer_get_next_event(base);
+}
#endif
static void __hrtimer_init(struct hrtimer *timer, clockid_t clock_id,
Index: linux.trees.git/include/linux/sched.h
===================================================================
--- linux.trees.git.orig/include/linux/sched.h
+++ linux.trees.git/include/linux/sched.h
@@ -265,6 +265,7 @@ static inline int select_nohz_load_balan
}
#endif
+extern int get_nohz_load_balancer(void);
/*
* Only dump TASK_* tasks. (0 for all tasks)
*/
@@ -1769,6 +1770,17 @@ int sched_nr_latency_handler(struct ctl_
struct file *file, void __user *buffer, size_t *length,
loff_t *ppos);
#endif
+#ifdef CONFIG_SCHED_DEBUG
+static inline int get_sysctl_timer_migration(void)
+{
+ return sysctl_timer_migration;
+}
+#else
+static inline int get_sysctl_timer_migration(void)
+{
+ return 1;
+}
+#endif
extern unsigned int sysctl_sched_rt_period;
extern int sysctl_sched_rt_runtime;
Index: linux.trees.git/kernel/sched.c
===================================================================
--- linux.trees.git.orig/kernel/sched.c
+++ linux.trees.git/kernel/sched.c
@@ -4009,6 +4009,11 @@ static struct {
.load_balancer = ATOMIC_INIT(-1),
};
+int get_nohz_load_balancer(void)
+{
+ return atomic_read(&nohz.load_balancer);
+}
+
/*
* This routine will try to nominate the ilb (idle load balancing)
* owner among the cpus whose ticks are stopped. ilb owner will do the idle
Index: linux.trees.git/include/linux/hrtimer.h
===================================================================
--- linux.trees.git.orig/include/linux/hrtimer.h
+++ linux.trees.git/include/linux/hrtimer.h
@@ -22,7 +22,6 @@
#include <linux/wait.h>
#include <linux/percpu.h>
-
struct hrtimer_clock_base;
struct hrtimer_cpu_base;
@@ -376,6 +375,24 @@ extern ktime_t hrtimer_get_remaining(con
extern int hrtimer_get_res(const clockid_t which_clock, struct timespec *tp);
extern ktime_t hrtimer_get_next_event(void);
+extern ktime_t hrtimer_get_next_event_on(int cpu);
+
+#define TICK_PERIOD_IN_NS NSEC_PER_SEC / HZ
+/*
+ * Helper function to check before migrating a hrtimer if it expires
+ * before the next timer interrupt on the target cpu.
+ */
+static inline int check_hrtimer_latency(struct hrtimer *timer, int cpu)
+{
+ unsigned long next_hrtimeout, next_jiffies;
+ next_jiffies = get_next_timer_interrupt_on(jiffies, cpu);
+ next_jiffies = next_jiffies * TICK_PERIOD_IN_NS;
+ next_hrtimeout = hrtimer_get_expires_ns(timer);
+
+ if (next_hrtimeout > next_jiffies)
+ return 1;
+ return 0;
+}
/*
* A timer is active, when it is enqueued into the rbtree or the callback
Index: linux.trees.git/include/linux/timer.h
===================================================================
--- linux.trees.git.orig/include/linux/timer.h
+++ linux.trees.git/include/linux/timer.h
@@ -184,7 +184,7 @@ extern unsigned long next_timer_interrup
* jiffie.
*/
extern unsigned long get_next_timer_interrupt(unsigned long now);
-
+extern unsigned long get_next_timer_interrupt_on(unsigned long now, int cpu);
/*
* Timer-statistics info:
*/
* Re: [v4 RFC PATCH 1/4] timers: Framework for identifying pinned timers
2009-04-01 11:32 ` [v4 RFC PATCH 1/4] timers: Framework for identifying pinned timers Arun R Bharadwaj
@ 2009-04-01 11:41 ` Andi Kleen
2009-04-02 5:09 ` Arun R Bharadwaj
0 siblings, 1 reply; 16+ messages in thread
From: Andi Kleen @ 2009-04-01 11:41 UTC (permalink / raw)
To: Arun R Bharadwaj
Cc: linux-kernel, linux-pm, a.p.zijlstra, ego, tglx, mingo, andi,
venkatesh.pallipadi, vatsa, arjan, svaidy
On Wed, Apr 01, 2009 at 05:02:58PM +0530, Arun R Bharadwaj wrote:
> * Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-04-01 17:01:28]:
>
> This patch creates a new framework for identifying cpu-pinned timers
> and hrtimers.
>
>
> This framework is needed because pinned timers are expected to fire on
> the same CPU on which they are queued. So it is essential to identify
> these and not migrate them, in case there are any.
How would that interact with add_timer_on()? You currently only
support the current CPU, don't you?
e.g. the new tip x86 machine check polling code relies on add_timer_on
staying on that CPU.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
* Re: [v4 RFC PATCH 4/4] timers: logic to move non pinned timers
2009-04-01 11:37 ` [v4 RFC PATCH 4/4] timers: logic to move non pinned timers Arun R Bharadwaj
@ 2009-04-01 11:46 ` Arun R Bharadwaj
2009-04-03 21:52 ` Thomas Gleixner
1 sibling, 0 replies; 16+ messages in thread
From: Arun R Bharadwaj @ 2009-04-01 11:46 UTC (permalink / raw)
To: linux-kernel, linux-pm
Cc: a.p.zijlstra, ego, tglx, mingo, andi, venkatesh.pallipadi, vatsa,
arjan, svaidy, Arun Bharadwaj
* Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-04-01 17:07:38]:
--------snip--------------
> Index: linux.trees.git/include/linux/hrtimer.h
> ===================================================================
> --- linux.trees.git.orig/include/linux/hrtimer.h
> +++ linux.trees.git/include/linux/hrtimer.h
> @@ -22,7 +22,6 @@
> #include <linux/wait.h>
> #include <linux/percpu.h>
>
> -
> struct hrtimer_clock_base;
> struct hrtimer_cpu_base;
>
> @@ -376,6 +375,24 @@ extern ktime_t hrtimer_get_remaining(con
> extern int hrtimer_get_res(const clockid_t which_clock, struct timespec *tp);
>
> extern ktime_t hrtimer_get_next_event(void);
> +extern ktime_t hrtimer_get_next_event_on(int cpu);
> +
> +#define TICK_PERIOD_IN_NS NSEC_PER_SEC / HZ
> +/*
> + * Helper function to check before migrating a hrtimer if it expires
> + * before the next timer interrupt on the target cpu.
> + */
> +static inline int check_hrtimer_latency(struct hrtimer *timer, int cpu)
> +{
> + unsigned long next_hrtimeout, next_jiffies;
> + next_jiffies = get_next_timer_interrupt_on(jiffies, cpu);
> + next_jiffies = next_jiffies * TICK_PERIOD_IN_NS;
> + next_hrtimeout = hrtimer_get_expires_ns(timer);
> +
> + if (next_hrtimeout > next_jiffies)
> + return 1;
> + return 0;
> +}
>
This function can be greatly simplified.
Here I am getting the next timer interrupt on target_cpu, which is the
idle load balancer. So we can simply check whether the hrtimer expiry
falls before the next tick on the ilb or not.
By doing this we can eliminate a lot of extra code which would
otherwise be needed in the form of helper functions for
get_next_timer_interrupt_on().
After simplification the code would simply look like
static inline int check_hrtimer_latency(struct hrtimer *timer, int cpu)
{
	unsigned long next_hrtimeout, next_jiffies;

	next_jiffies = (jiffies + 1) * TICK_PERIOD_IN_NS;
	next_hrtimeout = hrtimer_get_expires_ns(timer);
	if (.....)
}
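Filled out, the simplified check might look like the following user-space sketch (stand-in types and helpers, not the kernel ones; TICK_PERIOD_IN_NS is parenthesized so the multiplication expands safely):

```c
#define NSEC_PER_SEC 1000000000LL
#define HZ 1000
/* parenthesized so (jiffies + 1) * TICK_PERIOD_IN_NS expands correctly */
#define TICK_PERIOD_IN_NS (NSEC_PER_SEC / HZ)

/* hypothetical stand-in for the kernel's struct hrtimer */
struct hrtimer {
	long long expires_ns;
};

static long long hrtimer_get_expires_ns(const struct hrtimer *t)
{
	return t->expires_ns;
}

/*
 * Returns nonzero when the hrtimer expires after the next tick on the
 * ilb, i.e. migrating it cannot add latency on the target cpu.
 */
static int check_hrtimer_latency(const struct hrtimer *timer,
				 unsigned long long jiffies)
{
	long long next_tick_ns = (jiffies + 1) * TICK_PERIOD_IN_NS;

	return hrtimer_get_expires_ns(timer) > next_tick_ns;
}
```

With HZ=1000, at jiffies=10 the next tick is at 11,000,000 ns, so a timer expiring at 12 ms is migratable and one expiring at 5 ms is not.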
--arun
> /*
> * A timer is active, when it is enqueued into the rbtree or the callback
> Index: linux.trees.git/include/linux/timer.h
> ===================================================================
> --- linux.trees.git.orig/include/linux/timer.h
> +++ linux.trees.git/include/linux/timer.h
> @@ -184,7 +184,7 @@ extern unsigned long next_timer_interrup
> * jiffie.
> */
> extern unsigned long get_next_timer_interrupt(unsigned long now);
> -
> +extern unsigned long get_next_timer_interrupt_on(unsigned long now, int cpu);
> /*
> * Timer-statistics info:
> */
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [v4 RFC PATCH 1/4] timers: Framework for identifying pinned timers
2009-04-01 11:41 ` Andi Kleen
@ 2009-04-02 5:09 ` Arun R Bharadwaj
0 siblings, 0 replies; 16+ messages in thread
From: Arun R Bharadwaj @ 2009-04-02 5:09 UTC (permalink / raw)
To: Andi Kleen
Cc: linux-kernel, linux-pm, a.p.zijlstra, ego, tglx, mingo,
venkatesh.pallipadi, vatsa, arjan, svaidy, Arun Bharadwaj
* Andi Kleen <andi@firstfloor.org> [2009-04-01 13:41:46]:
> On Wed, Apr 01, 2009 at 05:02:58PM +0530, Arun R Bharadwaj wrote:
> > * Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-04-01 17:01:28]:
> >
> > This patch creates a new framework for identifying cpu-pinned timers
> > and hrtimers.
> >
> >
> > This framework is needed because pinned timers are expected to fire on
> > the same CPU on which they are queued. So it is essential to identify
> > these and not migrate them, in case there are any.
>
> How would that interact with add_timer_on()? You currently only
> support the current CPU, don't you?
>
> e.g. the new tip x86 machine check polling code relies on add_timer_on
> staying on that CPU.
>
Pinned timers are directly related to add_timer_on(): any timer
queued using add_timer_on() is treated as a pinned timer.
And add_timer_on() is not restricted to the current CPU; we can stay
on one CPU and still queue a pinned timer on some other CPU, marking
that timer as 'pinned' to that particular CPU.
--arun
> -Andi
>
> --
> ak@linux.intel.com -- Speaking for myself only.
* Re: [v4 RFC PATCH 4/4] timers: logic to move non pinned timers
2009-04-01 11:37 ` [v4 RFC PATCH 4/4] timers: logic to move non pinned timers Arun R Bharadwaj
2009-04-01 11:46 ` Arun R Bharadwaj
@ 2009-04-03 21:52 ` Thomas Gleixner
2009-04-06 5:16 ` Arun R Bharadwaj
1 sibling, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2009-04-03 21:52 UTC (permalink / raw)
To: Arun R Bharadwaj
Cc: linux-kernel, linux-pm, a.p.zijlstra, ego, mingo, andi,
venkatesh.pallipadi, vatsa, arjan, svaidy
Arun,
On Wed, 1 Apr 2009, Arun R Bharadwaj wrote:
> @@ -627,6 +628,12 @@ __mod_timer(struct timer_list *timer, un
>
> new_base = __get_cpu_var(tvec_bases);
>
> + current_cpu = smp_processor_id();
> + preferred_cpu = get_nohz_load_balancer();
> + if (get_sysctl_timer_migration() && idle_cpu(current_cpu)
> + && !pinned && preferred_cpu != -1)
Can we please check the preliminaries first to avoid the atomic_read ?
cpu = smp_processor_id();
if (!pinned && idle_cpu() && get_sysctl_timer_migration()) {
newcpu = get_nohz_load_balancer();
if (newcpu >= 0)
cpu = newcpu;
}
new_base = per_cpu(tvec_bases, cpu);
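The point of the reordering is that the cheap boolean checks short-circuit before the atomic_read() inside get_nohz_load_balancer() ever runs. A user-space model (instrumented stand-ins, not the kernel functions) shows the effect:

```c
/* count how often the "expensive" ilb lookup actually runs */
static int lookups;

static int get_nohz_load_balancer(void)	/* models the atomic_read() */
{
	lookups++;
	return 2;			/* pretend cpu 2 is the ilb */
}

static int pick_timer_cpu(int this_cpu, int pinned, int idle,
			  int migration_on)
{
	int cpu = this_cpu;

	/*
	 * Cheap checks first: the ilb lookup only happens when
	 * migration is actually possible for this timer.
	 */
	if (!pinned && idle && migration_on) {
		int newcpu = get_nohz_load_balancer();

		if (newcpu >= 0)
			cpu = newcpu;
	}
	return cpu;
}
```

A pinned timer never triggers the lookup; an unpinned timer on an idle cpu does, and lands on the ilb.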
> + new_base = per_cpu(tvec_bases, preferred_cpu);
> +
> if (base != new_base) {
> /*
> * We are trying to schedule the timer on the local CPU.
> @@ -635,7 +642,8 @@ __mod_timer(struct timer_list *timer, un
> * handler yet has not finished. This also guarantees that
> * the timer is serialized wrt itself.
> */
> - if (likely(base->running_timer != timer)) {
> + if (likely(base->running_timer != timer) ||
> + get_sysctl_timer_migration()) {
No, that's wrong. We can not migrate a running timer ever.
> /* See the comment in lock_timer_base() */
> timer_set_base(timer, NULL);
> spin_unlock(&base->lock);
> @@ -1063,10 +1071,9 @@ cascade:
> * Check, if the next hrtimer event is before the next timer wheel
> * event:
> */
> -static unsigned long cmp_next_hrtimer_event(unsigned long now,
> - unsigned long expires)
> +static unsigned long __cmp_next_hrtimer_event(unsigned long now,
> + unsigned long expires, ktime_t hr_delta)
> {
> - ktime_t hr_delta = hrtimer_get_next_event();
> struct timespec tsdelta;
> unsigned long delta;
>
> @@ -1103,24 +1110,59 @@ static unsigned long cmp_next_hrtimer_ev
> return expires;
> }
>
> +static unsigned long cmp_next_hrtimer_event(unsigned long now,
> + unsigned long expires)
> +{
> + ktime_t hr_delta = hrtimer_get_next_event();
> + return __cmp_next_hrtimer_event(now, expires, hr_delta);
> +}
> +
> +static unsigned long cmp_next_hrtimer_event_on(unsigned long now,
> + unsigned long expires, int cpu)
> +{
> + ktime_t hr_delta = hrtimer_get_next_event_on(cpu);
> + return __cmp_next_hrtimer_event(now, expires, hr_delta);
> +}
> +
> +unsigned long __get_next_timer_interrupt(unsigned long now, int cpu)
> +{
> + struct tvec_base *base = per_cpu(tvec_bases, cpu);
> + unsigned long expires;
> +
> + spin_lock(&base->lock);
> + expires = __next_timer_interrupt(base);
> + spin_unlock(&base->lock);
> + return expires;
> +}
> +
> /**
> * get_next_timer_interrupt - return the jiffy of the next pending timer
> * @now: current time (in jiffies)
> */
> unsigned long get_next_timer_interrupt(unsigned long now)
> {
> - struct tvec_base *base = __get_cpu_var(tvec_bases);
> unsigned long expires;
> + int cpu = smp_processor_id();
>
> - spin_lock(&base->lock);
> - expires = __next_timer_interrupt(base);
> - spin_unlock(&base->lock);
> + expires = __get_next_timer_interrupt(now, cpu);
>
> if (time_before_eq(expires, now))
> return now;
>
> return cmp_next_hrtimer_event(now, expires);
> }
> +
> +unsigned long get_next_timer_interrupt_on(unsigned long now, int cpu)
> +{
> + unsigned long expires;
> +
> + expires = __get_next_timer_interrupt(now, cpu);
> +
> + if (time_before_eq(expires, now))
> + return now;
> +
> + return cmp_next_hrtimer_event_on(now, expires, cpu);
> +}
> #endif
What's the purpose of all those changes ? Just for the latency check ?
See below.
> /*
> Index: linux.trees.git/kernel/hrtimer.c
> ===================================================================
> --- linux.trees.git.orig/kernel/hrtimer.c
> +++ linux.trees.git/kernel/hrtimer.c
> @@ -43,6 +43,8 @@
> #include <linux/seq_file.h>
> #include <linux/err.h>
> #include <linux/debugobjects.h>
> +#include <linux/sched.h>
> +#include <linux/timer.h>
>
> #include <asm/uaccess.h>
>
> @@ -198,8 +200,16 @@ switch_hrtimer_base(struct hrtimer *time
> {
> struct hrtimer_clock_base *new_base;
> struct hrtimer_cpu_base *new_cpu_base;
> + int current_cpu, preferred_cpu;
> +
> + current_cpu = smp_processor_id();
> + preferred_cpu = get_nohz_load_balancer();
> + if (get_sysctl_timer_migration() && !pinned && preferred_cpu != -1 &&
> + check_hrtimer_latency(timer, preferred_cpu))
Comments from timer.c __mod_timer() apply here as well.
You have a cpu_idle check in timer.c, why not here ?
This check_hrtimer_latency business is ugly and I think we should
try to solve this different.
...
int cpu, tocpu = -1;
cpu = smp_processor_id();
if (preliminaries_for_migration) {
tocpu = get_nohz_load_balancer();
if (tocpu >= 0)
cpu = tocpu;
}
again:
new_cpu_base = &per_cpu(hrtimer_bases, cpu);
new_base = &new_cpu_base->clock_base[base->index];
if (base != new_base) {
if (unlikely(hrtimer_callback_running(timer)))
return base;
timer->base = NULL;
spin_unlock(&base->cpu_base->lock);
spin_lock(&new_base->cpu_base->lock);
timer->base = new_base;
if (cpu == tocpu) {
/* Calc clock monotonic expiry time */
ktime_t expires = ktime_sub(hrtimer_get_expires(timer), new_base->offset);
/*
* Get the next event on the target cpu from the clock events layer. This
* covers the highres=off nohz=on case as well.
*/
ktime_t next = clockevents_get_next_event(cpu);
ktime_t delta = ktime_sub(expires, next);
/*
* We do not migrate the timer when it is expiring before the next
* event on the target cpu because we can not reprogram the target
* cpu timer hardware and we would cause it to fire late.
*/
if (delta.tv64 < 0) {
cpu = smp_processor_id();
goto again;
}
/*
* We might add a check here which does not migrate the timer
* when it's expiry is very close, but that needs to be evaluated
* if it's really a problem. Again we can ask the clock events layer
* here when the next tick timer is due and compare against it to
* avoid an extra ktime_get() call.
* Probably it's not a problem as a possible wakeup of some task will
* push that task anyway to the preferred cpu, but we'll see.
*/
}
}
That way we avoid the whole poking in the timer wheel and adding /
modifying functions all over the place for no real value.
The information we are looking for is already there.
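The core decision in this sketch can be modeled in plain user-space C (hypothetical names; ktime_t reduced to a signed 64-bit nanosecond value): migrate to the target cpu only when the timer expires at or after the target's next programmed event, otherwise fall back to the local cpu.

```c
/* minimal stand-in for the kernel's ktime_t */
typedef struct {
	long long tv64;
} ktime_t;

static ktime_t ktime_sub(ktime_t a, ktime_t b)
{
	ktime_t r = { a.tv64 - b.tv64 };
	return r;
}

/*
 * Decide the final base cpu: keep the timer local when it would
 * expire before the target cpu's next event, since we cannot
 * reprogram the target cpu's timer hardware.
 */
static int resolve_target_cpu(ktime_t expires, ktime_t next_event,
			      int this_cpu, int tocpu)
{
	ktime_t delta = ktime_sub(expires, next_event);

	if (delta.tv64 < 0)
		return this_cpu;	/* would fire late on tocpu */
	return tocpu;
}
```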
Thanks,
tglx
* Re: [v4 RFC PATCH 4/4] timers: logic to move non pinned timers
2009-04-03 21:52 ` Thomas Gleixner
@ 2009-04-06 5:16 ` Arun R Bharadwaj
2009-04-06 10:42 ` Arun R Bharadwaj
0 siblings, 1 reply; 16+ messages in thread
From: Arun R Bharadwaj @ 2009-04-06 5:16 UTC (permalink / raw)
To: Thomas Gleixner
Cc: linux-kernel, linux-pm, a.p.zijlstra, ego, mingo, andi,
venkatesh.pallipadi, vatsa, arjan, svaidy, Arun Bharadwaj
* Thomas Gleixner <tglx@linutronix.de> [2009-04-03 23:52:51]:
Hi Thomas,
Thanks for the really detailed review of my patch.
I'll work on the things that you have suggested and get back with
the changes.
--arun
* Re: [v4 RFC PATCH 4/4] timers: logic to move non pinned timers
2009-04-06 5:16 ` Arun R Bharadwaj
@ 2009-04-06 10:42 ` Arun R Bharadwaj
2009-04-06 10:56 ` Thomas Gleixner
0 siblings, 1 reply; 16+ messages in thread
From: Arun R Bharadwaj @ 2009-04-06 10:42 UTC (permalink / raw)
To: Thomas Gleixner
Cc: linux-kernel, linux-pm, a.p.zijlstra, ego, mingo, andi,
venkatesh.pallipadi, vatsa, arjan, svaidy, Arun Bharadwaj
* Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-04-06 10:46:56]:
This is a re-post of the patch after implementing the changes
suggested by Thomas.
---
This patch migrates all non-pinned timers and hrtimers from all the
idle CPUs to the current idle load balancer. Timers firing on busy
CPUs are not migrated.
While migrating an hrtimer, care must be taken that the migration does
not introduce latency. So we compare the expiry of the hrtimer with
the next timer interrupt on the target cpu and migrate the hrtimer
only if it expires *after* the next interrupt on the target cpu.
To support this, a clockevents_get_next_event() helper function is
added to return the next_event of the target cpu's clock_event_device.
Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
---
include/linux/sched.h | 12 ++++++++++++
kernel/hrtimer.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++++-
kernel/sched.c | 5 +++++
kernel/timer.c | 12 +++++++++++-
4 files changed, 77 insertions(+), 2 deletions(-)
Index: linux.trees.git/kernel/timer.c
===================================================================
--- linux.trees.git.orig/kernel/timer.c
+++ linux.trees.git/kernel/timer.c
@@ -37,6 +37,7 @@
#include <linux/delay.h>
#include <linux/tick.h>
#include <linux/kallsyms.h>
+#include <linux/sched.h>
#include <asm/uaccess.h>
#include <asm/unistd.h>
@@ -606,7 +607,7 @@ __mod_timer(struct timer_list *timer, un
{
struct tvec_base *base, *new_base;
unsigned long flags;
- int ret;
+ int ret, preferred_cpu, cpu;
ret = 0;
@@ -627,6 +628,15 @@ __mod_timer(struct timer_list *timer, un
new_base = __get_cpu_var(tvec_bases);
+ cpu = smp_processor_id();
+ if (get_sysctl_timer_migration() && idle_cpu(cpu) && !pinned) {
+ preferred_cpu = get_nohz_load_balancer();
+ if (preferred_cpu >= 0)
+ cpu = preferred_cpu;
+ }
+
+ new_base = per_cpu(tvec_bases, cpu);
+
if (base != new_base) {
/*
* We are trying to schedule the timer on the local CPU.
Index: linux.trees.git/kernel/hrtimer.c
===================================================================
--- linux.trees.git.orig/kernel/hrtimer.c
+++ linux.trees.git/kernel/hrtimer.c
@@ -43,6 +43,8 @@
#include <linux/seq_file.h>
#include <linux/err.h>
#include <linux/debugobjects.h>
+#include <linux/sched.h>
+#include <linux/timer.h>
#include <asm/uaccess.h>
@@ -189,6 +191,17 @@ struct hrtimer_clock_base *lock_hrtimer_
}
}
+ktime_t clockevents_get_next_event(int cpu)
+{
+ struct tick_device *td;
+ struct clock_event_device *dev;
+
+ td = &per_cpu(tick_cpu_device, cpu);
+ dev = td->evtdev;
+
+ return dev->next_event;
+}
+
/*
* Switch the timer base to the current CPU when possible.
*/
@@ -198,8 +211,17 @@ switch_hrtimer_base(struct hrtimer *time
{
struct hrtimer_clock_base *new_base;
struct hrtimer_cpu_base *new_cpu_base;
+ int cpu, preferred_cpu = -1;
+
+ cpu = smp_processor_id();
+ if (get_sysctl_timer_migration() && !pinned && idle_cpu(cpu)) {
+ preferred_cpu = get_nohz_load_balancer();
+ if (preferred_cpu >= 0)
+ cpu = preferred_cpu;
+ }
- new_cpu_base = &__get_cpu_var(hrtimer_bases);
+again:
+ new_cpu_base = &per_cpu(hrtimer_bases, cpu);
new_base = &new_cpu_base->clock_base[base->index];
if (base != new_base) {
@@ -220,6 +242,32 @@ switch_hrtimer_base(struct hrtimer *time
spin_unlock(&base->cpu_base->lock);
spin_lock(&new_base->cpu_base->lock);
timer->base = new_base;
+
+ if (cpu == preferred_cpu) {
+ /* Calculate clock monotonic expiry time */
+ ktime_t expires = ktime_sub(hrtimer_get_expires(timer),
+ new_base->offset);
+
+ /*
+ * Get the next event on target cpu from the
+ * clock events layer.
+ * This covers the highres=off nohz=on case as well.
+ */
+ ktime_t next = clockevents_get_next_event(cpu);
+
+ ktime_t delta = ktime_sub(expires, next);
+
+ /*
+ * We do not migrate the timer when it is expiring
+ * before the next event on the target cpu because
+ * we cannot reprogram the target cpu hardware and
+ * we would cause it to fire late.
+ */
+ if (delta.tv64 < 0) {
+ cpu = smp_processor_id();
+ goto again;
+ }
+ }
}
return new_base;
}
Index: linux.trees.git/include/linux/sched.h
===================================================================
--- linux.trees.git.orig/include/linux/sched.h
+++ linux.trees.git/include/linux/sched.h
@@ -265,6 +265,7 @@ static inline int select_nohz_load_balan
}
#endif
+extern int get_nohz_load_balancer(void);
/*
* Only dump TASK_* tasks. (0 for all tasks)
*/
@@ -1769,6 +1770,17 @@ int sched_nr_latency_handler(struct ctl_
struct file *file, void __user *buffer, size_t *length,
loff_t *ppos);
#endif
+#ifdef CONFIG_SCHED_DEBUG
+static inline int get_sysctl_timer_migration(void)
+{
+ return sysctl_timer_migration;
+}
+#else
+static inline int get_sysctl_timer_migration(void)
+{
+ return 1;
+}
+#endif
extern unsigned int sysctl_sched_rt_period;
extern int sysctl_sched_rt_runtime;
Index: linux.trees.git/kernel/sched.c
===================================================================
--- linux.trees.git.orig/kernel/sched.c
+++ linux.trees.git/kernel/sched.c
@@ -4009,6 +4009,11 @@ static struct {
.load_balancer = ATOMIC_INIT(-1),
};
+int get_nohz_load_balancer(void)
+{
+ return atomic_read(&nohz.load_balancer);
+}
+
/*
* This routine will try to nominate the ilb (idle load balancing)
* owner among the cpus whose ticks are stopped. ilb owner will do the idle
* Re: [v4 RFC PATCH 4/4] timers: logic to move non pinned timers
2009-04-06 10:42 ` Arun R Bharadwaj
@ 2009-04-06 10:56 ` Thomas Gleixner
2009-04-06 15:28 ` Arun R Bharadwaj
0 siblings, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2009-04-06 10:56 UTC (permalink / raw)
To: Arun R Bharadwaj
Cc: linux-kernel, linux-pm, a.p.zijlstra, ego, mingo, andi,
venkatesh.pallipadi, vatsa, arjan, svaidy
Arun,
On Mon, 6 Apr 2009, Arun R Bharadwaj wrote:
>
> +ktime_t clockevents_get_next_event(int cpu)
> +{
> + struct tick_device *td;
> + struct clock_event_device *dev;
> +
> + td = &per_cpu(tick_cpu_device, cpu);
> + dev = td->evtdev;
> +
> + return dev->next_event;
> +}
> +
Preferably this function should live in the clock events code, and a
stub inline function which returns KTIME_MAX for non-clockevents
archs is probably necessary as well.
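The suggested stub might look roughly like this user-space sketch (names per the discussion, details assumed): on an arch without the clock events layer, returning KTIME_MAX makes the delta check in switch_hrtimer_base() always pass, so the build still works and migration is never vetoed by the hardware check.

```c
/* minimal stand-ins for the kernel definitions */
typedef struct {
	long long tv64;
} ktime_t;

#define KTIME_MAX	((long long)~(1ULL << 63))	/* LLONG_MAX */

/*
 * Sketch of the !CONFIG_GENERIC_CLOCKEVENTS stub: pretend the target
 * cpu's next event is infinitely far away, so expires - next_event is
 * always negative-free and the latency veto never triggers.
 */
static ktime_t clockevents_get_next_event(int cpu)
{
	ktime_t max = { KTIME_MAX };

	(void)cpu;
	return max;
}
```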
> /*
> * Switch the timer base to the current CPU when possible.
> */
> @@ -198,8 +211,17 @@ switch_hrtimer_base(struct hrtimer *time
> {
> struct hrtimer_clock_base *new_base;
> struct hrtimer_cpu_base *new_cpu_base;
> + int cpu, preferred_cpu = -1;
> +
> + cpu = smp_processor_id();
> + if (get_sysctl_timer_migration() && !pinned && idle_cpu(cpu)) {
> + preferred_cpu = get_nohz_load_balancer();
> + if (preferred_cpu >= 0)
> + cpu = preferred_cpu;
> + }
>
> - new_cpu_base = &__get_cpu_var(hrtimer_bases);
> +again:
> + new_cpu_base = &per_cpu(hrtimer_bases, cpu);
> new_base = &new_cpu_base->clock_base[base->index];
>
> if (base != new_base) {
> @@ -220,6 +242,32 @@ switch_hrtimer_base(struct hrtimer *time
> spin_unlock(&base->cpu_base->lock);
> spin_lock(&new_base->cpu_base->lock);
> timer->base = new_base;
> +
> + if (cpu == preferred_cpu) {
> + /* Calculate clock monotonic expiry time */
> + ktime_t expires = ktime_sub(hrtimer_get_expires(timer),
> + new_base->offset);
> +
> + /*
> + * Get the next event on target cpu from the
> + * clock events layer.
> + * This covers the highres=off nohz=on case as well.
> + */
> + ktime_t next = clockevents_get_next_event(cpu);
> +
> + ktime_t delta = ktime_sub(expires, next);
> +
> + /*
> + * We do not migrate the timer when it is expiring
> + * before the next event on the target cpu because
> + * we cannot reprogram the target cpu hardware and
> + * we would cause it to fire late.
> + */
> + if (delta.tv64 < 0) {
> + cpu = smp_processor_id();
You are missing a small but fatal detail here: You hold
new_base->cpu_base->lock. So you need to do:
spin_unlock(&new_base->cpu_base->lock);
spin_lock(&base->cpu_base->lock);
> + goto again;
> + }
Also you need to move
> timer->base = new_base;
here to avoid a stale timer->base setting.
> + }
> }
> return new_base;
> }
Thanks,
tglx
* Re: [v4 RFC PATCH 4/4] timers: logic to move non pinned timers
2009-04-06 10:56 ` Thomas Gleixner
@ 2009-04-06 15:28 ` Arun R Bharadwaj
2009-04-06 15:31 ` Arun R Bharadwaj
2009-04-06 15:35 ` Thomas Gleixner
0 siblings, 2 replies; 16+ messages in thread
From: Arun R Bharadwaj @ 2009-04-06 15:28 UTC (permalink / raw)
To: Thomas Gleixner
Cc: linux-kernel, linux-pm, a.p.zijlstra, ego, mingo, andi,
venkatesh.pallipadi, vatsa, arjan, svaidy, Arun Bharadwaj
* Thomas Gleixner <tglx@linutronix.de> [2009-04-06 12:56:17]:
> Arun,
>
> On Mon, 6 Apr 2009, Arun R Bharadwaj wrote:
> >
> > +ktime_t clockevents_get_next_event(int cpu)
> > +{
> > + struct tick_device *td;
> > + struct clock_event_device *dev;
> > +
> > + td = &per_cpu(tick_cpu_device, cpu);
> > + dev = td->evtdev;
> > +
> > + return dev->next_event;
> > +}
> > +
>
> Preferrably this function should be in the clock events code and a
> stub inline function which returns KTIME_MAX for non clock events
> archs is probably necessary as well.
>
Sure.
> > /*
> > * Switch the timer base to the current CPU when possible.
> > */
> > @@ -198,8 +211,17 @@ switch_hrtimer_base(struct hrtimer *time
> > {
> > struct hrtimer_clock_base *new_base;
> > struct hrtimer_cpu_base *new_cpu_base;
> > + int cpu, preferred_cpu = -1;
> > +
> > + cpu = smp_processor_id();
> > + if (get_sysctl_timer_migration() && !pinned && idle_cpu(cpu)) {
> > + preferred_cpu = get_nohz_load_balancer();
> > + if (preferred_cpu >= 0)
> > + cpu = preferred_cpu;
> > + }
> >
> > - new_cpu_base = &__get_cpu_var(hrtimer_bases);
> > +again:
> > + new_cpu_base = &per_cpu(hrtimer_bases, cpu);
> > new_base = &new_cpu_base->clock_base[base->index];
> >
> > if (base != new_base) {
> > @@ -220,6 +242,32 @@ switch_hrtimer_base(struct hrtimer *time
> > spin_unlock(&base->cpu_base->lock);
> > spin_lock(&new_base->cpu_base->lock);
> > timer->base = new_base;
> > +
> > + if (cpu == preferred_cpu) {
> > + /* Calculate clock monotonic expiry time */
> > + ktime_t expires = ktime_sub(hrtimer_get_expires(timer),
> > + new_base->offset);
> > +
> > + /*
> > + * Get the next event on target cpu from the
> > + * clock events layer.
> > + * This covers the highres=off nohz=on case as well.
> > + */
> > + ktime_t next = clockevents_get_next_event(cpu);
> > +
> > + ktime_t delta = ktime_sub(expires, next);
> > +
> > + /*
> > + * We do not migrate the timer when it is expiring
> > + * before the next event on the target cpu because
> > + * we cannot reprogram the target cpu hardware and
> > + * we would cause it to fire late.
> > + */
> > + if (delta.tv64 < 0) {
> > + cpu = smp_processor_id();
>
> You are missing a small but fatal detail here: You hold
> new_base->cpu_base->lock. So you need to do:
>
I just moved the if (cpu == preferred_cpu) block above the base
locking part to avoid the extra unlocking.
> spin_unlock(&new_base->cpu_base->lock);
> spin_lock(&base->cpu_base->lock);
>
> > + goto again;
> > + }
>
> Also you need to move
>
> > timer->base = new_base;
>
> here to avoid a stale timer->base setting.
>
The above takes care of this as well.
--arun
> > + }
> > }
> > return new_base;
> > }
>
> Thanks,
>
> tglx
* Re: [v4 RFC PATCH 4/4] timers: logic to move non pinned timers
2009-04-06 15:28 ` Arun R Bharadwaj
@ 2009-04-06 15:31 ` Arun R Bharadwaj
2009-04-06 15:35 ` Thomas Gleixner
1 sibling, 0 replies; 16+ messages in thread
From: Arun R Bharadwaj @ 2009-04-06 15:31 UTC (permalink / raw)
To: Thomas Gleixner
Cc: linux-kernel, linux-pm, a.p.zijlstra, ego, mingo, andi,
venkatesh.pallipadi, vatsa, arjan, svaidy, Arun Bharadwaj
* Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-04-06 20:58:43]:
This is a repost of [PATCH 4/4] in the series.
thanks,
arun
---
This patch migrates all non-pinned timers and hrtimers from all the
idle CPUs to the current idle load balancer. Timers firing on busy
CPUs are not migrated.
While migrating an hrtimer, care must be taken that the migration does
not introduce latency. So we compare the expiry of the hrtimer with
the next timer interrupt on the target cpu and migrate the hrtimer
only if it expires *after* the next interrupt on the target cpu.
To support this, a clockevents_get_next_event() helper function is
added to return the next_event of the target cpu's clock_event_device.
Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
---
include/linux/clockchips.h | 9 +++++++++
include/linux/sched.h | 12 ++++++++++++
kernel/hrtimer.c | 40 +++++++++++++++++++++++++++++++++++++++-
kernel/sched.c | 5 +++++
kernel/time/clockevents.c | 14 ++++++++++++++
kernel/timer.c | 12 +++++++++++-
6 files changed, 90 insertions(+), 2 deletions(-)
Index: linux.trees.git/kernel/timer.c
===================================================================
--- linux.trees.git.orig/kernel/timer.c
+++ linux.trees.git/kernel/timer.c
@@ -37,6 +37,7 @@
#include <linux/delay.h>
#include <linux/tick.h>
#include <linux/kallsyms.h>
+#include <linux/sched.h>
#include <asm/uaccess.h>
#include <asm/unistd.h>
@@ -606,7 +607,7 @@ __mod_timer(struct timer_list *timer, un
{
struct tvec_base *base, *new_base;
unsigned long flags;
- int ret;
+ int ret, preferred_cpu, cpu;
ret = 0;
@@ -627,6 +628,15 @@ __mod_timer(struct timer_list *timer, un
new_base = __get_cpu_var(tvec_bases);
+ cpu = smp_processor_id();
+ if (get_sysctl_timer_migration() && idle_cpu(cpu) && !pinned) {
+ preferred_cpu = get_nohz_load_balancer();
+ if (preferred_cpu >= 0)
+ cpu = preferred_cpu;
+ }
+
+ new_base = per_cpu(tvec_bases, cpu);
+
if (base != new_base) {
/*
* We are trying to schedule the timer on the local CPU.
Index: linux.trees.git/kernel/hrtimer.c
===================================================================
--- linux.trees.git.orig/kernel/hrtimer.c
+++ linux.trees.git/kernel/hrtimer.c
@@ -43,6 +43,8 @@
#include <linux/seq_file.h>
#include <linux/err.h>
#include <linux/debugobjects.h>
+#include <linux/sched.h>
+#include <linux/timer.h>
#include <asm/uaccess.h>
@@ -198,8 +200,17 @@ switch_hrtimer_base(struct hrtimer *time
{
struct hrtimer_clock_base *new_base;
struct hrtimer_cpu_base *new_cpu_base;
+ int cpu, preferred_cpu = -1;
- new_cpu_base = &__get_cpu_var(hrtimer_bases);
+ cpu = smp_processor_id();
+ if (get_sysctl_timer_migration() && !pinned && idle_cpu(cpu)) {
+ preferred_cpu = get_nohz_load_balancer();
+ if (preferred_cpu >= 0)
+ cpu = preferred_cpu;
+ }
+
+again:
+ new_cpu_base = &per_cpu(hrtimer_bases, cpu);
new_base = &new_cpu_base->clock_base[base->index];
if (base != new_base) {
@@ -215,11 +226,38 @@ switch_hrtimer_base(struct hrtimer *time
if (unlikely(hrtimer_callback_running(timer)))
return base;
+ if (cpu == preferred_cpu) {
+ /* Calculate clock monotonic expiry time */
+ ktime_t expires = ktime_sub(hrtimer_get_expires(timer),
+ new_base->offset);
+
+ /*
+ * Get the next event on target cpu from the
+ * clock events layer.
+ * This covers the highres=off nohz=on case as well.
+ */
+ ktime_t next = clockevents_get_next_event(cpu);
+
+ ktime_t delta = ktime_sub(expires, next);
+
+ /*
+ * We do not migrate the timer when it is expiring
+ * before the next event on the target cpu because
+ * we cannot reprogram the target cpu hardware and
+ * we would cause it to fire late.
+ */
+ if (delta.tv64 < 0) {
+ cpu = smp_processor_id();
+ goto again;
+ }
+ }
+
/* See the comment in lock_timer_base() */
timer->base = NULL;
spin_unlock(&base->cpu_base->lock);
spin_lock(&new_base->cpu_base->lock);
timer->base = new_base;
+
}
return new_base;
}
Index: linux.trees.git/include/linux/sched.h
===================================================================
--- linux.trees.git.orig/include/linux/sched.h
+++ linux.trees.git/include/linux/sched.h
@@ -265,6 +265,7 @@ static inline int select_nohz_load_balan
}
#endif
+extern int get_nohz_load_balancer(void);
/*
* Only dump TASK_* tasks. (0 for all tasks)
*/
@@ -1769,6 +1770,17 @@ int sched_nr_latency_handler(struct ctl_
struct file *file, void __user *buffer, size_t *length,
loff_t *ppos);
#endif
+#ifdef CONFIG_SCHED_DEBUG
+static inline int get_sysctl_timer_migration(void)
+{
+ return sysctl_timer_migration;
+}
+#else
+static inline int get_sysctl_timer_migration(void)
+{
+ return 1;
+}
+#endif
extern unsigned int sysctl_sched_rt_period;
extern int sysctl_sched_rt_runtime;
Index: linux.trees.git/kernel/sched.c
===================================================================
--- linux.trees.git.orig/kernel/sched.c
+++ linux.trees.git/kernel/sched.c
@@ -4009,6 +4009,11 @@ static struct {
.load_balancer = ATOMIC_INIT(-1),
};
+int get_nohz_load_balancer(void)
+{
+ return atomic_read(&nohz.load_balancer);
+}
+
/*
* This routine will try to nominate the ilb (idle load balancing)
* owner among the cpus whose ticks are stopped. ilb owner will do the idle
Index: linux.trees.git/kernel/time/clockevents.c
===================================================================
--- linux.trees.git.orig/kernel/time/clockevents.c
+++ linux.trees.git/kernel/time/clockevents.c
@@ -18,6 +18,7 @@
#include <linux/notifier.h>
#include <linux/smp.h>
#include <linux/sysdev.h>
+#include <linux/tick.h>
/* The registered clock event devices */
static LIST_HEAD(clockevent_devices);
@@ -252,3 +253,16 @@ void clockevents_notify(unsigned long re
}
EXPORT_SYMBOL_GPL(clockevents_notify);
#endif
+
+#ifdef CONFIG_GENERIC_CLOCKEVENTS
+ktime_t clockevents_get_next_event(int cpu)
+{
+ struct tick_device *td;
+ struct clock_event_device *dev;
+
+ td = &per_cpu(tick_cpu_device, cpu);
+ dev = td->evtdev;
+
+ return dev->next_event;
+}
+#endif
Index: linux.trees.git/include/linux/clockchips.h
===================================================================
--- linux.trees.git.orig/include/linux/clockchips.h
+++ linux.trees.git/include/linux/clockchips.h
@@ -143,3 +143,12 @@ extern void clockevents_notify(unsigned
#endif
#endif
+
+#ifdef CONFIG_GENERIC_CLOCKEVENTS
+extern ktime_t clockevents_get_next_event(int cpu);
+#else
+static inline ktime_t clockevents_get_next_event(int cpu)
+{
+ return KTIME_MAX;
+}
+#endif
* Re: [v4 RFC PATCH 4/4] timers: logic to move non pinned timers
2009-04-06 15:28 ` Arun R Bharadwaj
2009-04-06 15:31 ` Arun R Bharadwaj
@ 2009-04-06 15:35 ` Thomas Gleixner
2009-04-06 16:00 ` Arun R Bharadwaj
1 sibling, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2009-04-06 15:35 UTC (permalink / raw)
To: Arun R Bharadwaj
Cc: linux-kernel, linux-pm, a.p.zijlstra, ego, mingo, andi,
venkatesh.pallipadi, vatsa, arjan, svaidy
Arun,
On Mon, 6 Apr 2009, Arun R Bharadwaj wrote:
> > > - new_cpu_base = &__get_cpu_var(hrtimer_bases);
> > > +again:
> > > + new_cpu_base = &per_cpu(hrtimer_bases, cpu);
> > > new_base = &new_cpu_base->clock_base[base->index];
> > >
> > > if (base != new_base) {
> > > @@ -220,6 +242,32 @@ switch_hrtimer_base(struct hrtimer *time
> > > spin_unlock(&base->cpu_base->lock);
> > > spin_lock(&new_base->cpu_base->lock);
> > > timer->base = new_base;
> > > +
> > > + if (cpu == preferred_cpu) {
> > > + /* Calculate clock monotonic expiry time */
> > > + ktime_t expires = ktime_sub(hrtimer_get_expires(timer),
> > > + new_base->offset);
> > > +
> > > + /*
> > > + * Get the next event on target cpu from the
> > > + * clock events layer.
> > > + * This covers the highres=off nohz=on case as well.
> > > + */
> > > + ktime_t next = clockevents_get_next_event(cpu);
> > > +
> > > + ktime_t delta = ktime_sub(expires, next);
> > > +
> > > + /*
> > > + * We do not migrate the timer when it is expiring
> > > + * before the next event on the target cpu because
> > > + * we cannot reprogram the target cpu hardware and
> > > + * we would cause it to fire late.
> > > + */
> > > + if (delta.tv64 < 0) {
> > > + cpu = smp_processor_id();
> >
> > You are missing a small but fatal detail here: You hold
> > new_base->cpu_base->lock. So you need to do:
> >
>
> I just moved the if block.. if (cpu==preferred_cpu) above the base
> locking part to avoid the extra unlocking.
That's not a good idea. You want to look at the next event with the
base lock of the other cpu held. That prevents the case where the
other cpu expires its first pending timer right after you checked
next_event and before you queue your timer: your timer might then
become the first one to expire, but you can't reprogram the clock
event device on the other cpu.
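The window Thomas describes can be modeled sequentially (hypothetical user-space sketch): a migration decision based on a next_event snapshot taken before the target base lock is held can be stale by the time the timer is actually enqueued.

```c
/* model of the target cpu's programmed clock event, in nanoseconds */
struct target {
	long long next_event;
};

/* the target cpu expires its first pending timer and reprograms */
static void target_expires_first_timer(struct target *t, long long new_next)
{
	t->next_event = new_next;
}

/*
 * Unsafe check: decide based on a snapshot of next_event taken
 * without holding the target base lock.
 */
static int unsafe_migrate_ok(long long snapshot, long long expires)
{
	return expires >= snapshot;
}
```

With a snapshot of 50 the check approves migrating a timer expiring at 100, but if the target reprograms to 200 in between, the migrated timer now precedes the target's next event and fires late; holding the target base lock across check and enqueue closes the window.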
Thanks,
tglx
* Re: [v4 RFC PATCH 4/4] timers: logic to move non pinned timers
2009-04-06 15:35 ` Thomas Gleixner
@ 2009-04-06 16:00 ` Arun R Bharadwaj
0 siblings, 0 replies; 16+ messages in thread
From: Arun R Bharadwaj @ 2009-04-06 16:00 UTC (permalink / raw)
To: Thomas Gleixner
Cc: linux-kernel, linux-pm, a.p.zijlstra, ego, mingo, andi,
venkatesh.pallipadi, vatsa, arjan, svaidy, Arun Bharadwaj
* Thomas Gleixner <tglx@linutronix.de> [2009-04-06 17:35:09]:
> Arun,
>
> On Mon, 6 Apr 2009, Arun R Bharadwaj wrote:
> > > > - new_cpu_base = &__get_cpu_var(hrtimer_bases);
> > > > +again:
> > > > + new_cpu_base = &per_cpu(hrtimer_bases, cpu);
> > > > new_base = &new_cpu_base->clock_base[base->index];
> > > >
> > > > if (base != new_base) {
> > > > @@ -220,6 +242,32 @@ switch_hrtimer_base(struct hrtimer *time
> > > > spin_unlock(&base->cpu_base->lock);
> > > > spin_lock(&new_base->cpu_base->lock);
> > > > timer->base = new_base;
> > > > +
> > > > + if (cpu == preferred_cpu) {
> > > > + /* Calculate clock monotonic expiry time */
> > > > + ktime_t expires = ktime_sub(hrtimer_get_expires(timer),
> > > > + new_base->offset);
> > > > +
> > > > + /*
> > > > + * Get the next event on target cpu from the
> > > > + * clock events layer.
> > > > + * This covers the highres=off nohz=on case as well.
> > > > + */
> > > > + ktime_t next = clockevents_get_next_event(cpu);
> > > > +
> > > > + ktime_t delta = ktime_sub(expires, next);
> > > > +
> > > > + /*
> > > > + * We do not migrate the timer when it is expiring
> > > > + * before the next event on the target cpu because
> > > > + * we cannot reprogram the target cpu hardware and
> > > > + * we would cause it to fire late.
> > > > + */
> > > > + if (delta.tv64 < 0) {
> > > > + cpu = smp_processor_id();
> > >
> > > You are missing a small but fatal detail here: You hold
> > > new_base->cpu_base->lock. So you need to do:
> > >
> >
> > I just moved the if (cpu == preferred_cpu) block above the base
> > locking part to avoid the extra unlocking.
>
> That's not a good idea. You want to look at the next event with the
> base lock of the other cpu held. That prevents the timer interrupt on
> the other cpu from expiring the first pending timer right after you
> checked next_event and before you queue your timer. Otherwise your
> timer might become the first one to expire, but you cannot reprogram
> the clock event device on the other cpu.
>
Okay... I understand. Thanks for the explanation :)
Will post all the changes together in a separate thread.
--arun
> Thanks,
>
> tglx
end of thread [~2009-04-06 16:02 UTC | newest]
Thread overview: 16+ messages
2009-04-01 11:31 [v4 RFC PATCH 0/4] timers: Framework for migration of timers Arun R Bharadwaj
2009-04-01 11:32 ` [v4 RFC PATCH 1/4] timers: Framework for identifying pinned timers Arun R Bharadwaj
2009-04-01 11:41 ` Andi Kleen
2009-04-02 5:09 ` Arun R Bharadwaj
2009-04-01 11:34 ` [v4 RFC PATCH 2/4] timers: Identifying the existing " Arun R Bharadwaj
2009-04-01 11:36 ` [v4 RFC PATCH 3/4] timers: /proc/sys sysctl hook to enable timer migration Arun R Bharadwaj
2009-04-01 11:37 ` [v4 RFC PATCH 4/4] timers: logic to move non pinned timers Arun R Bharadwaj
2009-04-01 11:46 ` Arun R Bharadwaj
2009-04-03 21:52 ` Thomas Gleixner
2009-04-06 5:16 ` Arun R Bharadwaj
2009-04-06 10:42 ` Arun R Bharadwaj
2009-04-06 10:56 ` Thomas Gleixner
2009-04-06 15:28 ` Arun R Bharadwaj
2009-04-06 15:31 ` Arun R Bharadwaj
2009-04-06 15:35 ` Thomas Gleixner
2009-04-06 16:00 ` Arun R Bharadwaj