* [Resend RFC PATCH 0/5] cpuidle/ppc: Timer offload framework to support deep idle states
@ 2013-07-26 5:13 Preeti U Murthy
2013-07-26 5:15 ` [Resend RFC PATCH 1/5] powerpc: Free up the IPI message slot of ipi call function (PPC_MSG_CALL_FUNC) Preeti U Murthy
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: Preeti U Murthy @ 2013-07-26 5:13 UTC (permalink / raw)
To: benh, paul.gortmaker, paulus, shangw, galak, fweisbec, paulmck,
michael, arnd, linux-pm, rostedt, rjw, john.stultz, tglx,
chenhui.zhao, deepthi, geoff, linux-kernel, srivatsa.bhat,
schwidefsky, svaidy, linuxppc-dev
On PowerPC, when CPUs enter deep idle states, their local timers are
switched off. The responsibility of waking them up at their next timer event
needs to be handed over to an external device. Architectures like x86 use the
HPET for this, but PowerPC has no equivalent external device. Instead we
assign the local timer of one of the CPUs to do this job.
This patchset is an attempt to make use of the existing timer broadcast
framework in the kernel to meet the above requirement, except that the tick
broadcast device is the local timer of the boot CPU.
This patch series is ported on top of 3.11-rc1 + the cpuidle driver backend
for powernv posted by Deepthi Dharwar recently. The current design and
implementation supports the ONESHOT tick mode. It does not yet support
the PERIODIC tick mode. This patch series has been tested with NOHZ_FULL off.
Patch[1/5], Patch[2/5]: optimize the broadcast mechanism on ppc.
Patch[3/5]: Introduces the core of the timer offload framework on powerpc.
Patch[4/5]: The cpu doing the broadcast should not go into tickless idle.
Patch[5/5]: Add a deep idle state to the cpuidle state table on powernv.
Patch[5/5] is the patch that ultimately makes use of the timer offload
framework that Patch[1/5] through Patch[4/5] build.
This patch series is being resent to clarify some ambiguities in the patch
descriptions from the previous post. Discussion around this:
https://lkml.org/lkml/2013/7/25/754
---
Preeti U Murthy (3):
cpuidle/ppc: Add timer offload framework to support deep idle states
cpuidle/ppc: CPU goes tickless if there are no arch-specific constraints
cpuidle/ppc: Add longnap state to the idle states on powernv
Srivatsa S. Bhat (2):
powerpc: Free up the IPI message slot of ipi call function (PPC_MSG_CALL_FUNC)
powerpc: Implement broadcast timer interrupt as an IPI message
arch/powerpc/include/asm/smp.h | 3 +
arch/powerpc/include/asm/time.h | 3 +
arch/powerpc/kernel/smp.c | 23 ++++--
arch/powerpc/kernel/time.c | 86 +++++++++++++++++++++++
arch/powerpc/platforms/cell/interrupt.c | 2 -
arch/powerpc/platforms/powernv/Kconfig | 1
arch/powerpc/platforms/powernv/processor_idle.c | 48 +++++++++++++
arch/powerpc/platforms/ps3/smp.c | 2 -
kernel/time/tick-sched.c | 7 ++
9 files changed, 163 insertions(+), 12 deletions(-)
* [Resend RFC PATCH 1/5] powerpc: Free up the IPI message slot of ipi call function (PPC_MSG_CALL_FUNC)
2013-07-26 5:13 [Resend RFC PATCH 0/5] cpuidle/ppc: Timer offload framework to support deep idle states Preeti U Murthy
@ 2013-07-26 5:15 ` Preeti U Murthy
2013-07-26 5:16 ` [Resend RFC PATCH 2/5] powerpc: Implement broadcast timer interrupt as an IPI message Preeti U Murthy
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Preeti U Murthy @ 2013-07-26 5:15 UTC (permalink / raw)
To: benh, paul.gortmaker, paulus, shangw, galak, fweisbec, paulmck,
michael, arnd, linux-pm, rostedt, rjw, john.stultz, tglx,
chenhui.zhao, deepthi, geoff, linux-kernel, srivatsa.bhat,
schwidefsky, svaidy, linuxppc-dev
From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
The IPI handlers for both PPC_MSG_CALL_FUNCTION and PPC_MSG_CALL_FUNC_SINGLE
map to a common implementation - generic_smp_call_function_single_interrupt().
So, we can consolidate them and save one of the IPI message slots (which are
precious, since only 4 of those slots are available).
So, implement the functionality of PPC_MSG_CALL_FUNCTION using
PPC_MSG_CALL_FUNC_SINGLE itself and release its IPI message slot, so that it
can be used for something else in the future, if desired.
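To make the slot layout concrete, here is a minimal user-space sketch of the
bit test that the big-endian path of smp_ipi_demux() uses in the diff below.
It assumes, as the expression 1 << (24 - 8 * msg) suggests, that each of the
four messages occupies one byte of a 32-bit word. PPC_MSG_COUNT and
ipi_pending_count() are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Message numbers after this patch; slot 0 has been freed up. */
enum {
	PPC_MSG_UNUSED = 0,
	PPC_MSG_RESCHEDULE = 1,
	PPC_MSG_CALL_FUNC_SINGLE = 2,
	PPC_MSG_DEBUGGER_BREAK = 3,
	PPC_MSG_COUNT = 4,	/* hypothetical: number of slots */
};

/* Bit tested by the big-endian demux path for a given message slot. */
static uint32_t ipi_msg_mask(int msg)
{
	assert(msg >= 0 && msg < PPC_MSG_COUNT);
	return (uint32_t)1 << (24 - 8 * msg);
}

/* Hypothetical helper: count the slots pending in a message word. */
static int ipi_pending_count(uint32_t all)
{
	int msg, n = 0;

	for (msg = 0; msg < PPC_MSG_COUNT; msg++)
		if (all & ipi_msg_mask(msg))
			n++;
	return n;
}
```

With only four such slots in the word, routing arch_send_call_function_ipi_mask()
through PPC_MSG_CALL_FUNC_SINGLE leaves slot 0 free for another fixed message.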
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/smp.h | 2 +-
arch/powerpc/kernel/smp.c | 12 +++++-------
arch/powerpc/platforms/cell/interrupt.c | 2 +-
arch/powerpc/platforms/ps3/smp.c | 2 +-
4 files changed, 8 insertions(+), 10 deletions(-)
diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index ffbaabe..51bf017 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -117,7 +117,7 @@ extern int cpu_to_core_id(int cpu);
*
* Make sure this matches openpic_request_IPIs in open_pic.c, or what shows up
* in /proc/interrupts will be wrong!!! --Troy */
-#define PPC_MSG_CALL_FUNCTION 0
+#define PPC_MSG_UNUSED 0
#define PPC_MSG_RESCHEDULE 1
#define PPC_MSG_CALL_FUNC_SINGLE 2
#define PPC_MSG_DEBUGGER_BREAK 3
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 38b0ba6..bc41e9f 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -111,9 +111,9 @@ int smp_generic_kick_cpu(int nr)
}
#endif /* CONFIG_PPC64 */
-static irqreturn_t call_function_action(int irq, void *data)
+static irqreturn_t unused_action(int irq, void *data)
{
- generic_smp_call_function_interrupt();
+ /* This slot is unused and hence available for use, if needed */
return IRQ_HANDLED;
}
@@ -144,14 +144,14 @@ static irqreturn_t debug_ipi_action(int irq, void *data)
}
static irq_handler_t smp_ipi_action[] = {
- [PPC_MSG_CALL_FUNCTION] = call_function_action,
+ [PPC_MSG_UNUSED] = unused_action, /* Slot available for future use */
[PPC_MSG_RESCHEDULE] = reschedule_action,
[PPC_MSG_CALL_FUNC_SINGLE] = call_function_single_action,
[PPC_MSG_DEBUGGER_BREAK] = debug_ipi_action,
};
const char *smp_ipi_name[] = {
- [PPC_MSG_CALL_FUNCTION] = "ipi call function",
+ [PPC_MSG_UNUSED] = "ipi unused",
[PPC_MSG_RESCHEDULE] = "ipi reschedule",
[PPC_MSG_CALL_FUNC_SINGLE] = "ipi call function single",
[PPC_MSG_DEBUGGER_BREAK] = "ipi debugger",
@@ -221,8 +221,6 @@ irqreturn_t smp_ipi_demux(void)
all = xchg(&info->messages, 0);
#ifdef __BIG_ENDIAN
- if (all & (1 << (24 - 8 * PPC_MSG_CALL_FUNCTION)))
- generic_smp_call_function_interrupt();
if (all & (1 << (24 - 8 * PPC_MSG_RESCHEDULE)))
scheduler_ipi();
if (all & (1 << (24 - 8 * PPC_MSG_CALL_FUNC_SINGLE)))
@@ -265,7 +263,7 @@ void arch_send_call_function_ipi_mask(const struct cpumask *mask)
unsigned int cpu;
for_each_cpu(cpu, mask)
- do_message_pass(cpu, PPC_MSG_CALL_FUNCTION);
+ do_message_pass(cpu, PPC_MSG_CALL_FUNC_SINGLE);
}
#if defined(CONFIG_DEBUGGER) || defined(CONFIG_KEXEC)
diff --git a/arch/powerpc/platforms/cell/interrupt.c b/arch/powerpc/platforms/cell/interrupt.c
index 2d42f3b..28166e4 100644
--- a/arch/powerpc/platforms/cell/interrupt.c
+++ b/arch/powerpc/platforms/cell/interrupt.c
@@ -213,7 +213,7 @@ static void iic_request_ipi(int msg)
void iic_request_IPIs(void)
{
- iic_request_ipi(PPC_MSG_CALL_FUNCTION);
+ iic_request_ipi(PPC_MSG_UNUSED);
iic_request_ipi(PPC_MSG_RESCHEDULE);
iic_request_ipi(PPC_MSG_CALL_FUNC_SINGLE);
iic_request_ipi(PPC_MSG_DEBUGGER_BREAK);
diff --git a/arch/powerpc/platforms/ps3/smp.c b/arch/powerpc/platforms/ps3/smp.c
index 4b35166..488f069 100644
--- a/arch/powerpc/platforms/ps3/smp.c
+++ b/arch/powerpc/platforms/ps3/smp.c
@@ -74,7 +74,7 @@ static int __init ps3_smp_probe(void)
* to index needs to be setup.
*/
- BUILD_BUG_ON(PPC_MSG_CALL_FUNCTION != 0);
+ BUILD_BUG_ON(PPC_MSG_UNUSED != 0);
BUILD_BUG_ON(PPC_MSG_RESCHEDULE != 1);
BUILD_BUG_ON(PPC_MSG_CALL_FUNC_SINGLE != 2);
BUILD_BUG_ON(PPC_MSG_DEBUGGER_BREAK != 3);
* [Resend RFC PATCH 2/5] powerpc: Implement broadcast timer interrupt as an IPI message
2013-07-26 5:13 [Resend RFC PATCH 0/5] cpuidle/ppc: Timer offload framework to support deep idle states Preeti U Murthy
2013-07-26 5:15 ` [Resend RFC PATCH 1/5] powerpc: Free up the IPI message slot of ipi call function (PPC_MSG_CALL_FUNC) Preeti U Murthy
@ 2013-07-26 5:16 ` Preeti U Murthy
2013-07-26 5:17 ` [Resend RFC PATCH 3/5] cpuidle/ppc: Add timer offload framework to support deep idle states Preeti U Murthy
` (2 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Preeti U Murthy @ 2013-07-26 5:16 UTC (permalink / raw)
To: benh, paul.gortmaker, paulus, shangw, galak, fweisbec, paulmck,
michael, arnd, linux-pm, rostedt, rjw, john.stultz, tglx,
chenhui.zhao, deepthi, geoff, linux-kernel, srivatsa.bhat,
schwidefsky, svaidy, linuxppc-dev
From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
For scalability and performance reasons, we want the broadcast timer
interrupts to be handled as efficiently as possible. Fixed IPI messages
are one of the most efficient mechanisms available - they are faster
than the smp_call_function mechanism because the IPI handlers are fixed
and hence they don't involve costly operations such as adding IPI handlers
to the target CPU's function queue, acquiring locks for synchronization etc.
Luckily we have an unused IPI message slot, so use that to implement
broadcast timer interrupts efficiently.
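The fixed-message path can be modelled in user space roughly as follows. This
is only a sketch under stated assumptions: the cpumask is a plain bitmap,
NR_CPUS is set to 4 for illustration, and the model_* functions and the
timer_events[] counter are hypothetical stand-ins, not kernel code. The point
it shows is that the sender only sets a bit and the receiver dispatches to a
fixed handler, with no per-call queueing or locking.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS		4	/* hypothetical, small for illustration */
#define PPC_MSG_TIMER	0

static uint32_t pending[NR_CPUS];	/* per-CPU pending-message word */
static int timer_events[NR_CPUS];	/* times each CPU ran the timer handler */

/* Sender side: mark one fixed message as pending on the target CPU. */
static void model_message_pass(int cpu, int msg)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	pending[cpu] |= (uint32_t)1 << (24 - 8 * msg);
}

/* Same shape as arch_send_tick_broadcast(), with the mask as a bitmap. */
static void model_send_tick_broadcast(unsigned int mask)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (mask & (1u << cpu))
			model_message_pass(cpu, PPC_MSG_TIMER);
}

/* Receiver side: fetch-and-clear the word, then run the fixed handler. */
static void model_ipi_demux(int cpu)
{
	uint32_t all;

	assert(cpu >= 0 && cpu < NR_CPUS);
	all = pending[cpu];
	pending[cpu] = 0;
	if (all & ((uint32_t)1 << (24 - 8 * PPC_MSG_TIMER)))
		timer_events[cpu]++;	/* stands in for the timer handler */
}
```

Calling model_send_tick_broadcast(0x6) and then running model_ipi_demux() on
every CPU invokes the timer handler on CPUs 1 and 2 only.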
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/smp.h | 3 ++-
arch/powerpc/kernel/smp.c | 19 +++++++++++++++----
arch/powerpc/platforms/cell/interrupt.c | 2 +-
arch/powerpc/platforms/ps3/smp.c | 2 +-
4 files changed, 19 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 51bf017..d877b69 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -117,7 +117,7 @@ extern int cpu_to_core_id(int cpu);
*
* Make sure this matches openpic_request_IPIs in open_pic.c, or what shows up
* in /proc/interrupts will be wrong!!! --Troy */
-#define PPC_MSG_UNUSED 0
+#define PPC_MSG_TIMER 0
#define PPC_MSG_RESCHEDULE 1
#define PPC_MSG_CALL_FUNC_SINGLE 2
#define PPC_MSG_DEBUGGER_BREAK 3
@@ -190,6 +190,7 @@ extern struct smp_ops_t *smp_ops;
extern void arch_send_call_function_single_ipi(int cpu);
extern void arch_send_call_function_ipi_mask(const struct cpumask *mask);
+extern void arch_send_tick_broadcast(const struct cpumask *mask);
/* Definitions relative to the secondary CPU spin loop
* and entry point. Not all of them exist on both 32 and
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index bc41e9f..6a68ca4 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -35,6 +35,7 @@
#include <asm/ptrace.h>
#include <linux/atomic.h>
#include <asm/irq.h>
+#include <asm/hw_irq.h>
#include <asm/page.h>
#include <asm/pgtable.h>
#include <asm/prom.h>
@@ -111,9 +112,9 @@ int smp_generic_kick_cpu(int nr)
}
#endif /* CONFIG_PPC64 */
-static irqreturn_t unused_action(int irq, void *data)
+static irqreturn_t timer_action(int irq, void *data)
{
- /* This slot is unused and hence available for use, if needed */
+ timer_interrupt();
return IRQ_HANDLED;
}
@@ -144,14 +145,14 @@ static irqreturn_t debug_ipi_action(int irq, void *data)
}
static irq_handler_t smp_ipi_action[] = {
- [PPC_MSG_UNUSED] = unused_action, /* Slot available for future use */
+ [PPC_MSG_TIMER] = timer_action,
[PPC_MSG_RESCHEDULE] = reschedule_action,
[PPC_MSG_CALL_FUNC_SINGLE] = call_function_single_action,
[PPC_MSG_DEBUGGER_BREAK] = debug_ipi_action,
};
const char *smp_ipi_name[] = {
- [PPC_MSG_UNUSED] = "ipi unused",
+ [PPC_MSG_TIMER] = "ipi timer",
[PPC_MSG_RESCHEDULE] = "ipi reschedule",
[PPC_MSG_CALL_FUNC_SINGLE] = "ipi call function single",
[PPC_MSG_DEBUGGER_BREAK] = "ipi debugger",
@@ -221,6 +222,8 @@ irqreturn_t smp_ipi_demux(void)
all = xchg(&info->messages, 0);
#ifdef __BIG_ENDIAN
+ if (all & (1 << (24 - 8 * PPC_MSG_TIMER)))
+ timer_interrupt();
if (all & (1 << (24 - 8 * PPC_MSG_RESCHEDULE)))
scheduler_ipi();
if (all & (1 << (24 - 8 * PPC_MSG_CALL_FUNC_SINGLE)))
@@ -266,6 +269,14 @@ void arch_send_call_function_ipi_mask(const struct cpumask *mask)
do_message_pass(cpu, PPC_MSG_CALL_FUNC_SINGLE);
}
+void arch_send_tick_broadcast(const struct cpumask *mask)
+{
+ unsigned int cpu;
+
+ for_each_cpu(cpu, mask)
+ do_message_pass(cpu, PPC_MSG_TIMER);
+}
+
#if defined(CONFIG_DEBUGGER) || defined(CONFIG_KEXEC)
void smp_send_debugger_break(void)
{
diff --git a/arch/powerpc/platforms/cell/interrupt.c b/arch/powerpc/platforms/cell/interrupt.c
index 28166e4..1359113 100644
--- a/arch/powerpc/platforms/cell/interrupt.c
+++ b/arch/powerpc/platforms/cell/interrupt.c
@@ -213,7 +213,7 @@ static void iic_request_ipi(int msg)
void iic_request_IPIs(void)
{
- iic_request_ipi(PPC_MSG_UNUSED);
+ iic_request_ipi(PPC_MSG_TIMER);
iic_request_ipi(PPC_MSG_RESCHEDULE);
iic_request_ipi(PPC_MSG_CALL_FUNC_SINGLE);
iic_request_ipi(PPC_MSG_DEBUGGER_BREAK);
diff --git a/arch/powerpc/platforms/ps3/smp.c b/arch/powerpc/platforms/ps3/smp.c
index 488f069..5cb742a 100644
--- a/arch/powerpc/platforms/ps3/smp.c
+++ b/arch/powerpc/platforms/ps3/smp.c
@@ -74,7 +74,7 @@ static int __init ps3_smp_probe(void)
* to index needs to be setup.
*/
- BUILD_BUG_ON(PPC_MSG_UNUSED != 0);
+ BUILD_BUG_ON(PPC_MSG_TIMER != 0);
BUILD_BUG_ON(PPC_MSG_RESCHEDULE != 1);
BUILD_BUG_ON(PPC_MSG_CALL_FUNC_SINGLE != 2);
BUILD_BUG_ON(PPC_MSG_DEBUGGER_BREAK != 3);
* [Resend RFC PATCH 3/5] cpuidle/ppc: Add timer offload framework to support deep idle states
2013-07-26 5:13 [Resend RFC PATCH 0/5] cpuidle/ppc: Timer offload framework to support deep idle states Preeti U Murthy
2013-07-26 5:15 ` [Resend RFC PATCH 1/5] powerpc: Free up the IPI message slot of ipi call function (PPC_MSG_CALL_FUNC) Preeti U Murthy
2013-07-26 5:16 ` [Resend RFC PATCH 2/5] powerpc: Implement broadcast timer interrupt as an IPI message Preeti U Murthy
@ 2013-07-26 5:17 ` Preeti U Murthy
2013-07-26 5:19 ` [Resend RFC PATCH 4/5] cpuidle/ppc: CPU goes tickless if there are no arch-specific constraints Preeti U Murthy
2013-07-26 5:21 ` [Resend RFC PATCH 5/5] cpuidle/ppc: Add longnap state to the idle states on powernv Preeti U Murthy
4 siblings, 0 replies; 6+ messages in thread
From: Preeti U Murthy @ 2013-07-26 5:17 UTC (permalink / raw)
To: benh, paul.gortmaker, paulus, shangw, galak, fweisbec, paulmck,
michael, arnd, linux-pm, rostedt, rjw, john.stultz, tglx,
chenhui.zhao, deepthi, geoff, linux-kernel, srivatsa.bhat,
schwidefsky, svaidy, linuxppc-dev
On ppc, in deep idle states, the local clock event device of CPUs gets
switched off. On PowerPC, the local clock event device is called the
decrementer. Make use of the broadcast framework to issue interrupts to
cpus in deep idle states on their timer events, except that on ppc, we
do not have an external device such as HPET, but we use the decrementer
of one of the CPUs itself as the broadcast device.
Instantiate two different clock event devices, one representing the
decrementer and another representing the broadcast device for each cpu.
The cpu which registers its broadcast device will be responsible for
performing the function of issuing timer interrupts to CPUs in deep idle
states, and is referred to as the broadcast cpu in the changelogs of this
patchset for convenience. Such a CPU is not allowed to enter deep idle
states, where the decrementer is switched off.
For now, only the boot cpu's broadcast device gets registered as a clock event
device along with the decrementer. Hence this is the broadcast cpu.
On the broadcast cpu, on each timer interrupt, the broadcast handler is called
in addition to the regular local timer event handler. We avoid the overhead
of programming the decrementer specifically for a broadcast event, for
performance and scalability reasons. Say cpuX goes to a deep idle state. It
would have to ask the broadcast CPU to reprogram its (the broadcast CPU's)
decrementer for the next local timer event of cpuX, and cpuX can do so only by
sending an IPI to the broadcast CPU. With many more cpus going to deep idle,
this model of sending IPIs each time would become a performance bottleneck and
may not scale well.
Apart from this there is no change in the way broadcast is handled today. On
a broadcast IPI, the event handler for a timer interrupt is called on the cpu
in deep idle state to handle the local events.
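The polling scheme described above can be sketched in user space like this. It
is a simplified model, not the generic broadcast code: NR_CPUS, next_event[],
deep_idle_mask and both functions are hypothetical names, times are plain
integers, and a woken CPU is assumed to leave broadcast mode immediately.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 4	/* hypothetical, small for illustration */

static uint64_t next_event[NR_CPUS];	/* next timer event of each CPU */
static unsigned int deep_idle_mask;	/* CPUs whose decrementer is stopped */

/*
 * A CPU entering deep idle only records its next event. Nothing is sent
 * to the broadcast CPU, mirroring the no-op broadcast_set_next_event().
 */
static void model_enter_deep_idle(int cpu, uint64_t next)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	next_event[cpu] = next;
	deep_idle_mask |= 1u << cpu;
}

/*
 * Run on every timer interrupt of the broadcast CPU: return the mask of
 * sleeping CPUs whose event has expired and therefore need a wakeup IPI.
 */
static unsigned int model_broadcast_tick(uint64_t now)
{
	unsigned int wake = 0;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!(deep_idle_mask & (1u << cpu)))
			continue;
		if (next_event[cpu] <= now)
			wake |= 1u << cpu;
	}
	deep_idle_mask &= ~wake;	/* woken CPUs leave broadcast mode */
	return wake;
}
```

The trade-off is visible in the model: a sleeping CPU is woken at the first
broadcast-CPU interrupt at or after its event, not at the event time itself.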
The current design and implementation of the timer offload framework supports
the ONESHOT tick mode but not the PERIODIC mode.
Signed-off-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/time.h | 3 +
arch/powerpc/kernel/smp.c | 4 +-
arch/powerpc/kernel/time.c | 81 ++++++++++++++++++++++++++++++++
arch/powerpc/platforms/powernv/Kconfig | 1
4 files changed, 86 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index c1f2676..936be0d 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -24,14 +24,17 @@ extern unsigned long tb_ticks_per_jiffy;
extern unsigned long tb_ticks_per_usec;
extern unsigned long tb_ticks_per_sec;
extern struct clock_event_device decrementer_clockevent;
+extern struct clock_event_device broadcast_clockevent;
struct rtc_time;
extern void to_tm(int tim, struct rtc_time * tm);
extern void GregorianDay(struct rtc_time *tm);
+extern void decrementer_timer_interrupt(void);
extern void generic_calibrate_decr(void);
extern void set_dec_cpu6(unsigned int val);
+extern int bc_cpu;
/* Some sane defaults: 125 MHz timebase, 1GHz processor */
extern unsigned long ppc_proc_freq;
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 6a68ca4..d3b7014 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -114,7 +114,7 @@ int smp_generic_kick_cpu(int nr)
static irqreturn_t timer_action(int irq, void *data)
{
- timer_interrupt();
+ decrementer_timer_interrupt();
return IRQ_HANDLED;
}
@@ -223,7 +223,7 @@ irqreturn_t smp_ipi_demux(void)
#ifdef __BIG_ENDIAN
if (all & (1 << (24 - 8 * PPC_MSG_TIMER)))
- timer_interrupt();
+ decrementer_timer_interrupt();
if (all & (1 << (24 - 8 * PPC_MSG_RESCHEDULE)))
scheduler_ipi();
if (all & (1 << (24 - 8 * PPC_MSG_CALL_FUNC_SINGLE)))
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 65ab9e9..7e858e1 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -42,6 +42,7 @@
#include <linux/timex.h>
#include <linux/kernel_stat.h>
#include <linux/time.h>
+#include <linux/timer.h>
#include <linux/init.h>
#include <linux/profile.h>
#include <linux/cpu.h>
@@ -97,8 +98,11 @@ static struct clocksource clocksource_timebase = {
static int decrementer_set_next_event(unsigned long evt,
struct clock_event_device *dev);
+static int broadcast_set_next_event(unsigned long evt,
+ struct clock_event_device *dev);
static void decrementer_set_mode(enum clock_event_mode mode,
struct clock_event_device *dev);
+static void decrementer_timer_broadcast(const struct cpumask *mask);
struct clock_event_device decrementer_clockevent = {
.name = "decrementer",
@@ -106,13 +110,26 @@ struct clock_event_device decrementer_clockevent = {
.irq = 0,
.set_next_event = decrementer_set_next_event,
.set_mode = decrementer_set_mode,
- .features = CLOCK_EVT_FEAT_ONESHOT,
+ .broadcast = decrementer_timer_broadcast,
+ .features = CLOCK_EVT_FEAT_C3STOP | CLOCK_EVT_FEAT_ONESHOT,
};
EXPORT_SYMBOL(decrementer_clockevent);
+struct clock_event_device broadcast_clockevent = {
+ .name = "broadcast",
+ .rating = 200,
+ .irq = 0,
+ .set_next_event = broadcast_set_next_event,
+ .set_mode = decrementer_set_mode,
+ .features = CLOCK_EVT_FEAT_ONESHOT,
+};
+EXPORT_SYMBOL(broadcast_clockevent);
+
DEFINE_PER_CPU(u64, decrementers_next_tb);
static DEFINE_PER_CPU(struct clock_event_device, decrementers);
+static DEFINE_PER_CPU(struct clock_event_device, bc_timer);
+int bc_cpu;
#define XSEC_PER_SEC (1024*1024)
#ifdef CONFIG_PPC64
@@ -487,6 +504,8 @@ void timer_interrupt(struct pt_regs * regs)
struct pt_regs *old_regs;
u64 *next_tb = &__get_cpu_var(decrementers_next_tb);
struct clock_event_device *evt = &__get_cpu_var(decrementers);
+ struct clock_event_device *bc_evt = &__get_cpu_var(bc_timer);
+ int cpu = smp_processor_id();
u64 now;
/* Ensure a positive value is written to the decrementer, or else
@@ -532,6 +551,10 @@ void timer_interrupt(struct pt_regs * regs)
*next_tb = ~(u64)0;
if (evt->event_handler)
evt->event_handler(evt);
+ if (cpu == bc_cpu && bc_evt->event_handler) {
+ bc_evt->event_handler(bc_evt);
+ }
+
} else {
now = *next_tb - now;
if (now <= DECREMENTER_MAX)
@@ -806,6 +829,20 @@ static int decrementer_set_next_event(unsigned long evt,
return 0;
}
+/*
+ * We cannot program the decrementer of a remote CPU. Hence CPUs going into
+ * deep idle states need to send IPIs to the broadcast CPU to program its
+ * decrementer for their next local event so as to receive a broadcast IPI
+ * for the same. In order to avoid the overhead of multiple CPUs from sending
+ * IPIs, this function is a nop. Instead the broadcast CPU will handle the
+ * wakeup of CPUs in deep idle states in each of its local timer interrupts.
+ */
+static int broadcast_set_next_event(unsigned long evt,
+ struct clock_event_device *dev)
+{
+ return 0;
+}
+
static void decrementer_set_mode(enum clock_event_mode mode,
struct clock_event_device *dev)
{
@@ -813,6 +850,20 @@ static void decrementer_set_mode(enum clock_event_mode mode,
decrementer_set_next_event(DECREMENTER_MAX, dev);
}
+void decrementer_timer_interrupt(void)
+{
+ struct clock_event_device *evt;
+ evt = &per_cpu(decrementers, smp_processor_id());
+
+ if (evt->event_handler)
+ evt->event_handler(evt);
+}
+
+static void decrementer_timer_broadcast(const struct cpumask *mask)
+{
+ arch_send_tick_broadcast(mask);
+}
+
static void register_decrementer_clockevent(int cpu)
{
struct clock_event_device *dec = &per_cpu(decrementers, cpu);
@@ -826,6 +877,20 @@ static void register_decrementer_clockevent(int cpu)
clockevents_register_device(dec);
}
+static void register_broadcast_clockevent(int cpu)
+{
+ struct clock_event_device *bc_evt = &per_cpu(bc_timer, cpu);
+
+ *bc_evt = broadcast_clockevent;
+ bc_evt->cpumask = cpumask_of(cpu);
+
+ printk_once(KERN_DEBUG "clockevent: %s mult[%x] shift[%d] cpu[%d]\n",
+ bc_evt->name, bc_evt->mult, bc_evt->shift, cpu);
+
+ clockevents_register_device(bc_evt);
+ bc_cpu = cpu;
+}
+
static void __init init_decrementer_clockevent(void)
{
int cpu = smp_processor_id();
@@ -840,6 +905,19 @@ static void __init init_decrementer_clockevent(void)
register_decrementer_clockevent(cpu);
}
+static void __init init_broadcast_clockevent(void)
+{
+ int cpu = smp_processor_id();
+
+ clockevents_calc_mult_shift(&broadcast_clockevent, ppc_tb_freq, 4);
+
+ broadcast_clockevent.max_delta_ns =
+ clockevent_delta2ns(DECREMENTER_MAX, &broadcast_clockevent);
+ broadcast_clockevent.min_delta_ns =
+ clockevent_delta2ns(2, &broadcast_clockevent);
+ register_broadcast_clockevent(cpu);
+}
+
void secondary_cpu_time_init(void)
{
/* Start the decrementer on CPUs that have manual control
@@ -916,6 +994,7 @@ void __init time_init(void)
clocksource_init();
init_decrementer_clockevent();
+ init_broadcast_clockevent();
}
diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig
index ace2d22..e1a96eb 100644
--- a/arch/powerpc/platforms/powernv/Kconfig
+++ b/arch/powerpc/platforms/powernv/Kconfig
@@ -6,6 +6,7 @@ config PPC_POWERNV
select PPC_ICP_NATIVE
select PPC_P7_NAP
select PPC_PCI_CHOICE if EMBEDDED
+ select GENERIC_CLOCKEVENTS_BROADCAST
select EPAPR_BOOT
default y
* [Resend RFC PATCH 4/5] cpuidle/ppc: CPU goes tickless if there are no arch-specific constraints
2013-07-26 5:13 [Resend RFC PATCH 0/5] cpuidle/ppc: Timer offload framework to support deep idle states Preeti U Murthy
` (2 preceding siblings ...)
2013-07-26 5:17 ` [Resend RFC PATCH 3/5] cpuidle/ppc: Add timer offload framework to support deep idle states Preeti U Murthy
@ 2013-07-26 5:19 ` Preeti U Murthy
2013-07-26 5:21 ` [Resend RFC PATCH 5/5] cpuidle/ppc: Add longnap state to the idle states on powernv Preeti U Murthy
4 siblings, 0 replies; 6+ messages in thread
From: Preeti U Murthy @ 2013-07-26 5:19 UTC (permalink / raw)
To: benh, paul.gortmaker, paulus, shangw, galak, fweisbec, paulmck,
michael, arnd, linux-pm, rostedt, rjw, john.stultz, tglx,
chenhui.zhao, deepthi, geoff, linux-kernel, srivatsa.bhat,
schwidefsky, svaidy, linuxppc-dev
In the current design of the timer offload framework, the broadcast cpu should
*not* go into tickless idle so as to avoid missed wakeups on CPUs in deep idle states.
Since we prevent the CPUs entering deep idle states from programming the
decrementer of the broadcast cpu for their respective next local events for
reasons mentioned in PATCH[3/5], the broadcast CPU checks on each of its timer
interrupts, which are programmed for its own local events, whether there are
any CPUs to be woken up.
With tickless idle, the broadcast CPU might not have a timer interrupt
pending till after many ticks, which can result in missed wakeups on CPUs
in deep idle states. By disabling tickless idle, worst case, the tick_sched
hrtimer will trigger a timer interrupt every period.
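As a rough illustration of the worst case mentioned above, the following
sketch computes when a sleeping CPU would be woken if the broadcast CPU takes
a timer interrupt exactly once per period. The assumption that interrupts
fall on exact multiples of the period, and both function names, are
hypothetical simplifications. Overflow for events near UINT64_MAX is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* First broadcast-CPU tick at or after 'event', ticks at k * period. */
static uint64_t bc_wakeup_time(uint64_t event, uint64_t period)
{
	assert(period > 0);
	return ((event + period - 1) / period) * period;
}

/* Extra delay seen by the sleeping CPU; always less than one period. */
static uint64_t bc_wakeup_delay(uint64_t event, uint64_t period)
{
	return bc_wakeup_time(event, period) - event;
}
```

Under these assumptions the added latency is bounded by one tick period,
whereas with the broadcast CPU in tickless idle there is no such bound.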
However the current setup of tickless idle does not let us choose tickless
mode on individual cpus. NOHZ_MODE_INACTIVE, which disables tickless idle,
is a system-wide setting. Hence resort to an arch-specific call to check if a
cpu can go into tickless idle.
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
---
arch/powerpc/kernel/time.c | 5 +++++
kernel/time/tick-sched.c | 7 +++++++
2 files changed, 12 insertions(+)
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 7e858e1..916c32f 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -864,6 +864,11 @@ static void decrementer_timer_broadcast(const struct cpumask *mask)
arch_send_tick_broadcast(mask);
}
+int arch_can_stop_idle_tick(int cpu)
+{
+ return cpu != bc_cpu;
+}
+
static void register_decrementer_clockevent(int cpu)
{
struct clock_event_device *dec = &per_cpu(decrementers, cpu);
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 6960172..e9ffa84 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -700,8 +700,15 @@ static void tick_nohz_full_stop_tick(struct tick_sched *ts)
#endif
}
+int __weak arch_can_stop_idle_tick(int cpu)
+{
+ return 1;
+}
+
static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
{
+ if (!arch_can_stop_idle_tick(cpu))
+ return false;
/*
* If this cpu is offline and it is the one which updates
* jiffies, then give up the assignment and let it be taken by
* [Resend RFC PATCH 5/5] cpuidle/ppc: Add longnap state to the idle states on powernv
2013-07-26 5:13 [Resend RFC PATCH 0/5] cpuidle/ppc: Timer offload framework to support deep idle states Preeti U Murthy
` (3 preceding siblings ...)
2013-07-26 5:19 ` [Resend RFC PATCH 4/5] cpuidle/ppc: CPU goes tickless if there are no arch-specific constraints Preeti U Murthy
@ 2013-07-26 5:21 ` Preeti U Murthy
4 siblings, 0 replies; 6+ messages in thread
From: Preeti U Murthy @ 2013-07-26 5:21 UTC (permalink / raw)
To: benh, paul.gortmaker, paulus, shangw, galak, fweisbec, paulmck,
michael, arnd, linux-pm, rostedt, rjw, john.stultz, tglx,
chenhui.zhao, deepthi, geoff, linux-kernel, srivatsa.bhat,
schwidefsky, svaidy, linuxppc-dev
This patch hooks into the existing broadcast framework, using the support that
this patchset introduces for ppc and the cpuidle driver backend for powernv
(posted recently by Deepthi Dharwar), to add sleep as one of the deep idle
states, in which the decrementer is switched off.
However in this patch, we only emulate sleep by going into a state which does
a nap with the decrementer interrupts disabled, termed longnap. This keeps the
focus of this series on the timer broadcast framework for ppc, which is
required as a first step to enable sleep on ppc.
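The LPCR handling that distinguishes longnap on the broadcast cpu from longnap
on other cpus can be sketched as a pure function. The bit values below are
illustrative placeholders that I believe match asm/reg.h but should be checked
against it, and longnap_lpcr() is a hypothetical helper that does not exist in
the patch.

```c
#include <assert.h>

/* Illustrative values only; verify against arch/powerpc/include/asm/reg.h. */
#define LPCR_PECE	0x7000ul	/* powersave exit cause enable field */
#define LPCR_PECE0	0x4000ul	/* external interrupts cause exit */
#define LPCR_PECE1	0x2000ul	/* decrementer causes exit */
#define LPCR_MER	0x0800ul	/* mediated external request */

/* Compute the LPCR value that longnap would program for this cpu. */
static unsigned long longnap_lpcr(unsigned long lpcr, int is_bc_cpu)
{
	/* Sanity check: both wake-up bits lie inside the PECE field. */
	assert((LPCR_PECE & (LPCR_PECE0 | LPCR_PECE1)) ==
	       (LPCR_PECE0 | LPCR_PECE1));

	lpcr &= ~(LPCR_MER | LPCR_PECE);	/* lpcr[mer] must be 0 */
	lpcr |= LPCR_PECE0;			/* wake on external interrupt */
	if (is_bc_cpu)
		lpcr |= LPCR_PECE1;		/* broadcast cpu keeps its tick */
	return lpcr;
}
```

Only the broadcast cpu keeps the decrementer as a wake-up source; every other
cpu relies on the broadcast IPI, which arrives as an external interrupt.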
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
---
arch/powerpc/platforms/powernv/processor_idle.c | 48 +++++++++++++++++++++++
1 file changed, 47 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/platforms/powernv/processor_idle.c b/arch/powerpc/platforms/powernv/processor_idle.c
index f43ad91a..9aca502 100644
--- a/arch/powerpc/platforms/powernv/processor_idle.c
+++ b/arch/powerpc/platforms/powernv/processor_idle.c
@@ -9,16 +9,18 @@
#include <linux/cpuidle.h>
#include <linux/cpu.h>
#include <linux/notifier.h>
+#include <linux/clockchips.h>
#include <asm/machdep.h>
#include <asm/runlatch.h>
+#include <asm/time.h>
struct cpuidle_driver powernv_idle_driver = {
.name = "powernv_idle",
.owner = THIS_MODULE,
};
-#define MAX_IDLE_STATE_COUNT 2
+#define MAX_IDLE_STATE_COUNT 3
static int max_idle_state = MAX_IDLE_STATE_COUNT - 1;
static struct cpuidle_device __percpu *powernv_cpuidle_devices;
@@ -54,6 +56,43 @@ static int nap_loop(struct cpuidle_device *dev,
return index;
}
+/* Emulate sleep, with long nap.
+ * During sleep, the core does not receive decrementer interrupts.
+ * Emulate sleep using long nap with decrementers interrupts disabled.
+ * This is an initial prototype to test the timer offload framework for ppc.
+ * We will eventually introduce the sleep state once the timer offload framework
+ * for ppc is stable.
+ */
+static int longnap_loop(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv,
+ int index)
+{
+ int cpu = dev->cpu;
+
+ unsigned long lpcr = mfspr(SPRN_LPCR);
+
+ lpcr &= ~(LPCR_MER | LPCR_PECE); /* lpcr[mer] must be 0 */
+
+ /* exit powersave upon external interrupt, but not decrementer
+ * interrupt, Emulate sleep.
+ */
+ lpcr |= LPCR_PECE0;
+
+ if (cpu != bc_cpu) {
+ mtspr(SPRN_LPCR, lpcr);
+ clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu);
+ power7_nap();
+ clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);
+ } else {
+ /* Wakeup on a decrementer interrupt, Do a nap */
+ lpcr |= LPCR_PECE1;
+ mtspr(SPRN_LPCR, lpcr);
+ power7_nap();
+ }
+
+ return index;
+}
+
/*
* States for dedicated partition case.
*/
@@ -72,6 +111,13 @@ static struct cpuidle_state powernv_states[MAX_IDLE_STATE_COUNT] = {
.exit_latency = 10,
.target_residency = 100,
.enter = &nap_loop },
+ { /* LongNap */
+ .name = "LongNap",
+ .desc = "LongNap",
+ .flags = CPUIDLE_FLAG_TIME_VALID,
+ .exit_latency = 10,
+ .target_residency = 100,
+ .enter = &longnap_loop },
};
static int powernv_cpuidle_add_cpu_notifier(struct notifier_block *n,