* [PATCH 0/9] oops and kdump patches
From: Anton Blanchard @ 2011-11-30 10:23 UTC
To: benh, paulus, hbabu; +Cc: linuxppc-dev
Here are a number of patches I've put together based on some rather
strenuous testing of our oops and kdump paths.
* [PATCH 1/9] powerpc: Give us time to get all oopses out before panicking
From: Anton Blanchard @ 2011-11-30 10:23 UTC
To: benh, paulus, hbabu; +Cc: linuxppc-dev
In-Reply-To: <20111130102308.348262468@samba.org>
I've been seeing truncated output when people send system reset info
to me. We should see a backtrace for every CPU, but the panic() code
takes the box down before they all make it out to the console. The
panic code runs unlocked so we also see corrupted console output.
If we are going to panic, then delay 1 second before calling into the
panic code. Move oops_exit inside the die lock and put a newline
between oopses for clarity.
Signed-off-by: Anton Blanchard <anton@samba.org>
---
We should really rename kexec_should_crash() to something like
kernel_will_panic() and use it here. I'll work on that in a follow-up
patch.
Index: linux-powerpc/arch/powerpc/kernel/traps.c
===================================================================
--- linux-powerpc.orig/arch/powerpc/kernel/traps.c 2011-07-05 14:21:44.017308041 +1000
+++ linux-powerpc/arch/powerpc/kernel/traps.c 2011-07-05 14:21:52.727459818 +1000
@@ -158,6 +158,8 @@ int die(const char *str, struct pt_regs
bust_spinlocks(0);
die.lock_owner = -1;
add_taint(TAINT_DIE);
+ oops_exit();
+ printk("\n");
raw_spin_unlock_irqrestore(&die.lock, flags);
if (kexec_should_crash(current) ||
@@ -165,13 +167,23 @@ int die(const char *str, struct pt_regs
crash_kexec(regs);
crash_kexec_secondary(regs);
+ /*
+ * While our oops output is serialised by a spinlock, output
+ * from panic() called below can race and corrupt it. If we
+ * know we are going to panic, delay for 1 second so we have a
+ * chance to get clean backtraces from all CPUs that are oopsing.
+ */
+ if (in_interrupt() || panic_on_oops || !current->pid ||
+ is_global_init(current)) {
+ mdelay(MSEC_PER_SEC);
+ }
+
if (in_interrupt())
panic("Fatal exception in interrupt");
if (panic_on_oops)
panic("Fatal exception");
- oops_exit();
do_exit(err);
return 0;
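
The condition that gates the one-second delay can be read as a predicate for "this oops is going to end in a panic", which is what the suggested kernel_will_panic() rename is getting at. A minimal user-space sketch follows. The struct oops_ctx type is a hypothetical stand-in for the kernel state the patch inspects (in_interrupt(), panic_on_oops, current->pid, is_global_init(current)); it is not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel state checked in die(). */
struct oops_ctx {
	bool in_interrupt;    /* in_interrupt() */
	bool panic_on_oops;   /* panic_on_oops setting */
	int pid;              /* current->pid, 0 for the idle task */
	bool is_global_init;  /* is_global_init(current) */
};

/* Returns true when the oops is expected to end in panic(). */
static bool kernel_will_panic(const struct oops_ctx *c)
{
	return c->in_interrupt || c->panic_on_oops ||
	       c->pid == 0 || c->is_global_init;
}
```

Only when this predicate holds would the caller pay for the mdelay(MSEC_PER_SEC); an ordinary task that oopses in process context goes straight to do_exit().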
* [PATCH 2/9] powerpc: Remove broken and complicated kdump system reset code
From: Anton Blanchard @ 2011-11-30 10:23 UTC
To: benh, paulus, hbabu; +Cc: linuxppc-dev
In-Reply-To: <20111130102308.348262468@samba.org>
We have a lot of complicated logic that handles possible recursion between
kdump and a system reset exception. We can solve this in a much simpler
way using the same setjmp/longjmp tricks xmon does.
As a first step, this patch removes the old system reset code.
Signed-off-by: Anton Blanchard <anton@samba.org>
---
Index: linux-build/arch/powerpc/include/asm/kexec.h
===================================================================
--- linux-build.orig/arch/powerpc/include/asm/kexec.h 2011-11-09 16:27:35.918705307 +1100
+++ linux-build/arch/powerpc/include/asm/kexec.h 2011-11-14 08:31:02.111805931 +1100
@@ -73,11 +73,6 @@ extern void kexec_smp_wait(void); /* get
master to copy new code to 0 */
extern int crashing_cpu;
extern void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *));
-extern cpumask_t cpus_in_sr;
-static inline int kexec_sr_activated(int cpu)
-{
- return cpumask_test_cpu(cpu, &cpus_in_sr);
-}
struct kimage;
struct pt_regs;
@@ -94,7 +89,6 @@ extern void reserve_crashkernel(void);
extern void machine_kexec_mask_interrupts(void);
#else /* !CONFIG_KEXEC */
-static inline int kexec_sr_activated(int cpu) { return 0; }
static inline void crash_kexec_secondary(struct pt_regs *regs) { }
static inline int overlaps_crashkernel(unsigned long start, unsigned long size)
Index: linux-build/arch/powerpc/kernel/crash.c
===================================================================
--- linux-build.orig/arch/powerpc/kernel/crash.c 2011-11-09 16:27:35.950705892 +1100
+++ linux-build/arch/powerpc/kernel/crash.c 2011-11-14 08:31:02.111805931 +1100
@@ -47,7 +47,6 @@
/* This keeps a track of which one is crashing cpu. */
int crashing_cpu = -1;
static cpumask_t cpus_in_crash = CPU_MASK_NONE;
-cpumask_t cpus_in_sr = CPU_MASK_NONE;
#define CRASH_HANDLER_MAX 3
/* NULL terminated list of shutdown handles */
@@ -55,7 +54,6 @@ static crash_shutdown_t crash_shutdown_h
static DEFINE_SPINLOCK(crash_handlers_lock);
#ifdef CONFIG_SMP
-static atomic_t enter_on_soft_reset = ATOMIC_INIT(0);
void crash_ipi_callback(struct pt_regs *regs)
{
@@ -70,23 +68,8 @@ void crash_ipi_callback(struct pt_regs *
cpumask_set_cpu(cpu, &cpus_in_crash);
/*
- * Entered via soft-reset - could be the kdump
- * process is invoked using soft-reset or user activated
- * it if some CPU did not respond to an IPI.
- * For soft-reset, the secondary CPU can enter this func
- * twice. 1 - using IPI, and 2. soft-reset.
- * Tell the kexec CPU that entered via soft-reset and ready
- * to go down.
- */
- if (cpumask_test_cpu(cpu, &cpus_in_sr)) {
- cpumask_clear_cpu(cpu, &cpus_in_sr);
- atomic_inc(&enter_on_soft_reset);
- }
-
- /*
* Starting the kdump boot.
* This barrier is needed to make sure that all CPUs are stopped.
- * If not, soft-reset will be invoked to bring other CPUs.
*/
while (!cpumask_test_cpu(crashing_cpu, &cpus_in_crash))
cpu_relax();
@@ -103,25 +86,14 @@ void crash_ipi_callback(struct pt_regs *
/* NOTREACHED */
}
-/*
- * Wait until all CPUs are entered via soft-reset.
- */
-static void crash_soft_reset_check(int cpu)
-{
- unsigned int ncpus = num_online_cpus() - 1;/* Excluding the panic cpu */
-
- cpumask_clear_cpu(cpu, &cpus_in_sr);
- while (atomic_read(&enter_on_soft_reset) != ncpus)
- cpu_relax();
-}
-
-
static void crash_kexec_prepare_cpus(int cpu)
{
unsigned int msecs;
unsigned int ncpus = num_online_cpus() - 1;/* Excluding the panic cpu */
+ printk(KERN_EMERG "Sending IPI to other CPUs\n");
+
crash_send_ipi(crash_ipi_callback);
smp_wmb();
@@ -131,7 +103,6 @@ static void crash_kexec_prepare_cpus(int
* respond.
* Delay of at least 10 seconds.
*/
- printk(KERN_EMERG "Sending IPI to other cpus...\n");
msecs = 10000;
while ((cpumask_weight(&cpus_in_crash) < ncpus) && (--msecs > 0)) {
cpu_relax();
@@ -140,69 +111,36 @@ static void crash_kexec_prepare_cpus(int
/* Would it be better to replace the trap vector here? */
- /*
- * FIXME: In case if we do not get all CPUs, one possibility: ask the
- * user to do soft reset such that we get all.
- * Soft-reset will be used until better mechanism is implemented.
- */
if (cpumask_weight(&cpus_in_crash) < ncpus) {
- printk(KERN_EMERG "done waiting: %d cpu(s) not responding\n",
+ printk(KERN_EMERG "ERROR: %d CPU(s) not responding\n",
ncpus - cpumask_weight(&cpus_in_crash));
- printk(KERN_EMERG "Activate soft-reset to stop other cpu(s)\n");
- cpumask_clear(&cpus_in_sr);
- atomic_set(&enter_on_soft_reset, 0);
- while (cpumask_weight(&cpus_in_crash) < ncpus)
- cpu_relax();
}
- /*
- * Make sure all CPUs are entered via soft-reset if the kdump is
- * invoked using soft-reset.
- */
- if (cpumask_test_cpu(cpu, &cpus_in_sr))
- crash_soft_reset_check(cpu);
- /* Leave the IPI callback set */
+
+ printk(KERN_EMERG "IPI complete\n");
}
/*
- * This function will be called by secondary cpus or by kexec cpu
- * if soft-reset is activated to stop some CPUs.
+ * This function will be called by secondary cpus.
*/
void crash_kexec_secondary(struct pt_regs *regs)
{
- int cpu = smp_processor_id();
unsigned long flags;
- int msecs = 5;
+ int msecs = 500;
local_irq_save(flags);
- /* Wait 5ms if the kexec CPU is not entered yet. */
+
+ /* Wait 500ms for the primary crash CPU to signal its progress */
while (crashing_cpu < 0) {
if (--msecs < 0) {
- /*
- * Either kdump image is not loaded or
- * kdump process is not started - Probably xmon
- * exited using 'x'(exit and recover) or
- * kexec_should_crash() failed for all running tasks.
- */
- cpumask_clear_cpu(cpu, &cpus_in_sr);
+ /* No response, kdump image may not have been loaded */
local_irq_restore(flags);
return;
}
+
mdelay(1);
cpu_relax();
}
- if (cpu == crashing_cpu) {
- /*
- * Panic CPU will enter this func only via soft-reset.
- * Wait until all secondary CPUs entered and
- * then start kexec boot.
- */
- crash_soft_reset_check(cpu);
- cpumask_set_cpu(crashing_cpu, &cpus_in_crash);
- if (ppc_md.kexec_cpu_down)
- ppc_md.kexec_cpu_down(1, 0);
- machine_kexec(kexec_crash_image);
- /* NOTREACHED */
- }
+
crash_ipi_callback(regs);
}
@@ -225,7 +163,6 @@ static void crash_kexec_prepare_cpus(int
void crash_kexec_secondary(struct pt_regs *regs)
{
- cpumask_clear(&cpus_in_sr);
}
#endif /* CONFIG_SMP */
Index: linux-build/arch/powerpc/kernel/traps.c
===================================================================
--- linux-build.orig/arch/powerpc/kernel/traps.c 2011-11-14 08:31:01.519795547 +1100
+++ linux-build/arch/powerpc/kernel/traps.c 2011-11-14 08:31:02.111805931 +1100
@@ -162,10 +162,20 @@ int die(const char *str, struct pt_regs
printk("\n");
raw_spin_unlock_irqrestore(&die.lock, flags);
- if (kexec_should_crash(current) ||
- kexec_sr_activated(smp_processor_id()))
+ /*
+ * A system reset (0x100) is a request to dump, so we always send
+ * it through the crashdump code.
+ */
+ if (kexec_should_crash(current) || (TRAP(regs) == 0x100)) {
crash_kexec(regs);
- crash_kexec_secondary(regs);
+
+ /*
+ * We aren't the primary crash CPU. We need to send it
+ * to a holding pattern to avoid it ending up in the panic
+ * code.
+ */
+ crash_kexec_secondary(regs);
+ }
/*
* While our oops output is serialised by a spinlock, output
@@ -232,25 +242,8 @@ void system_reset_exception(struct pt_re
return;
}
-#ifdef CONFIG_KEXEC
- cpumask_set_cpu(smp_processor_id(), &cpus_in_sr);
-#endif
-
die("System Reset", regs, SIGABRT);
- /*
- * Some CPUs when released from the debugger will execute this path.
- * These CPUs entered the debugger via a soft-reset. If the CPU was
- * hung before entering the debugger it will return to the hung
- * state when exiting this function. This causes a problem in
- * kdump since the hung CPU(s) will not respond to the IPI sent
- * from kdump. To prevent the problem we call crash_kexec_secondary()
- * here. If a kdump had not been initiated or we exit the debugger
- * with the "exit and recover" command (x) crash_kexec_secondary()
- * will return after 5ms and the CPU returns to its previous state.
- */
- crash_kexec_secondary(regs);
-
/* Must die if the interrupt is not recoverable */
if (!(regs->msr & MSR_RI))
panic("Unrecoverable System Reset");
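
The simplified crash_kexec_secondary() boils down to a bounded poll: wait for the primary CPU to publish crashing_cpu, and give up once the millisecond budget is spent. A user-space sketch of that loop follows. The tick callback is a hypothetical stand-in for mdelay(1), and fake_crashing_cpu, ticks and fake_tick are test doubles that exist only in this example.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Poll *crashing_cpu once per tick until it becomes non-negative or the
 * budget runs out. tick() stands in for mdelay(1).
 */
static bool wait_for_primary(const volatile int *crashing_cpu, int msecs,
			     void (*tick)(void))
{
	while (*crashing_cpu < 0) {
		if (--msecs < 0)
			return false;	/* timed out, kdump may not be loaded */
		tick();
	}
	return true;
}

/* Test doubles: the "primary" announces itself on the third tick. */
static volatile int fake_crashing_cpu = -1;
static int ticks;

static void fake_tick(void)
{
	if (++ticks == 3)
		fake_crashing_cpu = 2;
}
```

With a budget of N the loop ticks at most N times, so the patch's msecs = 500 gives the 500ms wait named in its comment.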
* [PATCH 3/9] powerpc/kdump: Use setjmp/longjmp to handle kdump and system reset recursion
From: Anton Blanchard @ 2011-11-30 10:23 UTC
To: benh, paulus, hbabu; +Cc: linuxppc-dev
In-Reply-To: <20111130102308.348262468@samba.org>
We can handle recursion caused by system reset by reusing the crash
shutdown fault handler.
Since we don't have an OS-triggerable NMI, if not all CPUs make it
into kdump then we tell the user to issue a system reset. However, if
we have a panic timeout set we cannot wait forever and must continue
the kdump.
Signed-off-by: Anton Blanchard <anton@samba.org>
---
Index: linux-build/arch/powerpc/kernel/crash.c
===================================================================
--- linux-build.orig/arch/powerpc/kernel/crash.c 2011-11-25 16:41:06.228864087 +1100
+++ linux-build/arch/powerpc/kernel/crash.c 2011-11-25 16:42:05.825915628 +1100
@@ -53,6 +53,16 @@ static cpumask_t cpus_in_crash = CPU_MAS
static crash_shutdown_t crash_shutdown_handles[CRASH_HANDLER_MAX+1];
static DEFINE_SPINLOCK(crash_handlers_lock);
+static unsigned long crash_shutdown_buf[JMP_BUF_LEN];
+static int crash_shutdown_cpu = -1;
+
+static int handle_fault(struct pt_regs *regs)
+{
+ if (crash_shutdown_cpu == smp_processor_id())
+ longjmp(crash_shutdown_buf, 1);
+ return 0;
+}
+
#ifdef CONFIG_SMP
void crash_ipi_callback(struct pt_regs *regs)
@@ -89,14 +99,16 @@ void crash_ipi_callback(struct pt_regs *
static void crash_kexec_prepare_cpus(int cpu)
{
unsigned int msecs;
-
unsigned int ncpus = num_online_cpus() - 1;/* Excluding the panic cpu */
+ int tries = 0;
+ int (*old_handler)(struct pt_regs *regs);
printk(KERN_EMERG "Sending IPI to other CPUs\n");
crash_send_ipi(crash_ipi_callback);
smp_wmb();
+again:
/*
* FIXME: Until we will have the way to stop other CPUs reliably,
* the crash CPU will send an IPI and wait for other CPUs to
@@ -111,12 +123,52 @@ static void crash_kexec_prepare_cpus(int
/* Would it be better to replace the trap vector here? */
- if (cpumask_weight(&cpus_in_crash) < ncpus) {
- printk(KERN_EMERG "ERROR: %d CPU(s) not responding\n",
- ncpus - cpumask_weight(&cpus_in_crash));
+ if (cpumask_weight(&cpus_in_crash) >= ncpus) {
+ printk(KERN_EMERG "IPI complete\n");
+ return;
+ }
+
+ printk(KERN_EMERG "ERROR: %d cpu(s) not responding\n",
+ ncpus - cpumask_weight(&cpus_in_crash));
+
+ /*
+ * If we have a panic timeout set then we can't wait indefinitely
+ * for someone to activate system reset. We also give up on the
+ * second time through if system reset fails to work.
+ */
+ if ((panic_timeout > 0) || (tries > 0))
+ return;
+
+ /*
+ * A system reset will cause all CPUs to take a 0x100 exception.
+ * The primary CPU returns here via setjmp, and the secondary
+ * CPUs reexecute the crash_kexec_secondary path.
+ */
+ old_handler = __debugger;
+ __debugger = handle_fault;
+ crash_shutdown_cpu = smp_processor_id();
+
+ if (setjmp(crash_shutdown_buf) == 0) {
+ printk(KERN_EMERG "Activate system reset (dumprestart) "
+ "to stop other cpu(s)\n");
+
+ /*
+ * A system reset will force all CPUs to execute the
+ * crash code again. We need to reset cpus_in_crash so we
+ * wait for everyone to do this.
+ */
+ cpus_in_crash = CPU_MASK_NONE;
+ smp_mb();
+
+ while (cpumask_weight(&cpus_in_crash) < ncpus)
+ cpu_relax();
}
- printk(KERN_EMERG "IPI complete\n");
+ crash_shutdown_cpu = -1;
+ __debugger = old_handler;
+
+ tries++;
+ goto again;
}
/*
@@ -245,16 +297,6 @@ int crash_shutdown_unregister(crash_shut
}
EXPORT_SYMBOL(crash_shutdown_unregister);
-static unsigned long crash_shutdown_buf[JMP_BUF_LEN];
-static int crash_shutdown_cpu = -1;
-
-static int handle_fault(struct pt_regs *regs)
-{
- if (crash_shutdown_cpu == smp_processor_id())
- longjmp(crash_shutdown_buf, 1);
- return 0;
-}
-
void default_machine_crash_shutdown(struct pt_regs *regs)
{
unsigned int i;
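
The recursion handling relies on a recovery point that only the arming CPU may jump back to. The same pattern can be sketched in user space with the standard setjmp/longjmp. Here the CPU id is passed in as a plain int instead of coming from smp_processor_id(), and run_with_recovery() is a hypothetical driver that simulates an exception arriving on a given CPU.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf recover_buf;
static int recover_cpu = -1;	/* CPU that armed the recovery point */

/* Stand-in for the fault hook: only the arming CPU jumps back. */
static int handle_fault(int cpu)
{
	if (recover_cpu == cpu)
		longjmp(recover_buf, 1);
	return 0;
}

/*
 * Arm the recovery point on `cpu`, then simulate an exception taken on
 * `faulting_cpu`. Returns 1 if control came back via longjmp, 0 if the
 * handler declined and returned normally.
 */
static int run_with_recovery(int cpu, int faulting_cpu)
{
	volatile int recovered = 0;

	recover_cpu = cpu;
	if (setjmp(recover_buf) == 0)
		handle_fault(faulting_cpu);	/* no return if CPUs match */
	else
		recovered = 1;
	recover_cpu = -1;			/* disarm, as the patch does */
	return recovered;
}
```

Calling run_with_recovery(0, 0) takes the longjmp path, while run_with_recovery(0, 1) models a secondary CPU whose exception falls through to the normal handling.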
* [PATCH 4/9] powerpc: Cleanup crash/kexec code
From: Anton Blanchard @ 2011-11-30 10:23 UTC
To: benh, paulus, hbabu; +Cc: linuxppc-dev
In-Reply-To: <20111130102308.348262468@samba.org>
Remove some unnecessary defines and fix some spelling mistakes.
Signed-off-by: Anton Blanchard <anton@samba.org>
---
Index: linux-build/arch/powerpc/kernel/crash.c
===================================================================
--- linux-build.orig/arch/powerpc/kernel/crash.c 2011-11-25 16:42:11.554016696 +1100
+++ linux-build/arch/powerpc/kernel/crash.c 2011-11-25 16:42:37.110467611 +1100
@@ -10,41 +10,27 @@
*
*/
-#undef DEBUG
-
#include <linux/kernel.h>
#include <linux/smp.h>
#include <linux/reboot.h>
#include <linux/kexec.h>
-#include <linux/bootmem.h>
#include <linux/export.h>
#include <linux/crash_dump.h>
#include <linux/delay.h>
-#include <linux/elf.h>
-#include <linux/elfcore.h>
#include <linux/init.h>
#include <linux/irq.h>
#include <linux/types.h>
-#include <linux/memblock.h>
#include <asm/processor.h>
#include <asm/machdep.h>
#include <asm/kexec.h>
#include <asm/kdump.h>
#include <asm/prom.h>
-#include <asm/firmware.h>
#include <asm/smp.h>
#include <asm/system.h>
#include <asm/setjmp.h>
-#ifdef DEBUG
-#include <asm/udbg.h>
-#define DBG(fmt...) udbg_printf(fmt)
-#else
-#define DBG(fmt...)
-#endif
-
-/* This keeps a track of which one is crashing cpu. */
+/* This keeps a track of which one is the crashing cpu. */
int crashing_cpu = -1;
static cpumask_t cpus_in_crash = CPU_MASK_NONE;
@@ -201,7 +187,7 @@ void crash_kexec_secondary(struct pt_reg
static void crash_kexec_prepare_cpus(int cpu)
{
/*
- * move the secondarys to us so that we can copy
+ * move the secondaries to us so that we can copy
* the new kernel 0-0x100 safely
*
* do this if kexec in setup.c ?
@@ -302,7 +288,6 @@ void default_machine_crash_shutdown(stru
unsigned int i;
int (*old_handler)(struct pt_regs *regs);
-
/*
* This function is only called after the system
* has panicked or is otherwise in a critical state.
@@ -328,7 +313,7 @@ void default_machine_crash_shutdown(stru
machine_kexec_mask_interrupts();
/*
- * Call registered shutdown routines savely. Swap out
+ * Call registered shutdown routines safely. Swap out
* __debugger_fault_handler, and replace on exit.
*/
old_handler = __debugger_fault_handler;
Index: linux-build/arch/powerpc/include/asm/kexec.h
===================================================================
--- linux-build.orig/arch/powerpc/include/asm/kexec.h 2011-11-25 16:41:06.224864016 +1100
+++ linux-build/arch/powerpc/include/asm/kexec.h 2011-11-25 16:42:37.110467611 +1100
@@ -49,7 +49,6 @@
#define KEXEC_STATE_REAL_MODE 2
#ifndef __ASSEMBLY__
-#include <linux/cpumask.h>
#include <asm/reg.h>
typedef void (*crash_shutdown_t)(void);
* [PATCH 5/9] powerpc: Rework die()
From: Anton Blanchard @ 2011-11-30 10:23 UTC
To: benh, paulus, hbabu; +Cc: linuxppc-dev
In-Reply-To: <20111130102308.348262468@samba.org>
Our die() code was based off a very old x86 version. Update it to
mirror the current x86 code.
Signed-off-by: Anton Blanchard <anton@samba.org>
---
Index: linux-build/arch/powerpc/kernel/traps.c
===================================================================
--- linux-build.orig/arch/powerpc/kernel/traps.c 2011-11-14 08:31:02.111805931 +1100
+++ linux-build/arch/powerpc/kernel/traps.c 2011-11-14 08:32:21.185192875 +1100
@@ -98,18 +98,14 @@ static void pmac_backlight_unblank(void)
static inline void pmac_backlight_unblank(void) { }
#endif
-int die(const char *str, struct pt_regs *regs, long err)
+static arch_spinlock_t die_lock = __ARCH_SPIN_LOCK_UNLOCKED;
+static int die_owner = -1;
+static unsigned int die_nest_count;
+static int die_counter;
+
+static unsigned __kprobes long oops_begin(struct pt_regs *regs)
{
- static struct {
- raw_spinlock_t lock;
- u32 lock_owner;
- int lock_owner_depth;
- } die = {
- .lock = __RAW_SPIN_LOCK_UNLOCKED(die.lock),
- .lock_owner = -1,
- .lock_owner_depth = 0
- };
- static int die_counter;
+ int cpu;
unsigned long flags;
if (debugger(regs))
@@ -117,50 +113,37 @@ int die(const char *str, struct pt_regs
oops_enter();
- if (die.lock_owner != raw_smp_processor_id()) {
- console_verbose();
- raw_spin_lock_irqsave(&die.lock, flags);
- die.lock_owner = smp_processor_id();
- die.lock_owner_depth = 0;
- bust_spinlocks(1);
- if (machine_is(powermac))
- pmac_backlight_unblank();
- } else {
- local_save_flags(flags);
- }
-
- if (++die.lock_owner_depth < 3) {
- printk("Oops: %s, sig: %ld [#%d]\n", str, err, ++die_counter);
-#ifdef CONFIG_PREEMPT
- printk("PREEMPT ");
-#endif
-#ifdef CONFIG_SMP
- printk("SMP NR_CPUS=%d ", NR_CPUS);
-#endif
-#ifdef CONFIG_DEBUG_PAGEALLOC
- printk("DEBUG_PAGEALLOC ");
-#endif
-#ifdef CONFIG_NUMA
- printk("NUMA ");
-#endif
- printk("%s\n", ppc_md.name ? ppc_md.name : "");
-
- if (notify_die(DIE_OOPS, str, regs, err, 255,
- SIGSEGV) == NOTIFY_STOP)
- return 1;
-
- print_modules();
- show_regs(regs);
- } else {
- printk("Recursive die() failure, output suppressed\n");
+ /* racy, but better than risking deadlock. */
+ raw_local_irq_save(flags);
+ cpu = smp_processor_id();
+ if (!arch_spin_trylock(&die_lock)) {
+ if (cpu == die_owner)
+ /* nested oops. should stop eventually */;
+ else
+ arch_spin_lock(&die_lock);
}
+ die_nest_count++;
+ die_owner = cpu;
+ console_verbose();
+ bust_spinlocks(1);
+ if (machine_is(powermac))
+ pmac_backlight_unblank();
+ return flags;
+}
+static void __kprobes oops_end(unsigned long flags, struct pt_regs *regs,
+ int signr)
+{
bust_spinlocks(0);
- die.lock_owner = -1;
+ die_owner = -1;
add_taint(TAINT_DIE);
+ die_nest_count--;
oops_exit();
printk("\n");
- raw_spin_unlock_irqrestore(&die.lock, flags);
+ if (!die_nest_count)
+ /* Nest count reaches zero, release the lock. */
+ arch_spin_unlock(&die_lock);
+ raw_local_irq_restore(flags);
/*
* A system reset (0x100) is a request to dump, so we always send
@@ -177,6 +160,9 @@ int die(const char *str, struct pt_regs
crash_kexec_secondary(regs);
}
+ if (!signr)
+ return;
+
/*
* While our oops output is serialised by a spinlock, output
* from panic() called below can race and corrupt it. If we
@@ -190,15 +176,46 @@ int die(const char *str, struct pt_regs
if (in_interrupt())
panic("Fatal exception in interrupt");
-
if (panic_on_oops)
panic("Fatal exception");
+ do_exit(signr);
+}
- do_exit(err);
+static int __kprobes __die(const char *str, struct pt_regs *regs, long err)
+{
+ printk("Oops: %s, sig: %ld [#%d]\n", str, err, ++die_counter);
+#ifdef CONFIG_PREEMPT
+ printk("PREEMPT ");
+#endif
+#ifdef CONFIG_SMP
+ printk("SMP NR_CPUS=%d ", NR_CPUS);
+#endif
+#ifdef CONFIG_DEBUG_PAGEALLOC
+ printk("DEBUG_PAGEALLOC ");
+#endif
+#ifdef CONFIG_NUMA
+ printk("NUMA ");
+#endif
+ printk("%s\n", ppc_md.name ? ppc_md.name : "");
+
+ if (notify_die(DIE_OOPS, str, regs, err, 255, SIGSEGV) == NOTIFY_STOP)
+ return 1;
+
+ print_modules();
+ show_regs(regs);
return 0;
}
+void die(const char *str, struct pt_regs *regs, long err)
+{
+ unsigned long flags = oops_begin(regs);
+
+ if (__die(str, regs, err))
+ err = 0;
+ oops_end(flags, regs, err);
+}
+
void user_single_step_siginfo(struct task_struct *tsk,
struct pt_regs *regs, siginfo_t *info)
{
@@ -217,10 +234,11 @@ void _exception(int signr, struct pt_reg
"at %016lx nip %016lx lr %016lx code %x\n";
if (!user_mode(regs)) {
- if (die("Exception in kernel mode", regs, signr))
- return;
- } else if (show_unhandled_signals &&
- unhandled_signal(current, signr)) {
+ die("Exception in kernel mode", regs, signr);
+ return;
+ }
+
+ if (show_unhandled_signals && unhandled_signal(current, signr)) {
printk_ratelimited(regs->msr & MSR_64BIT ? fmt64 : fmt32,
current->comm, current->pid, signr,
addr, regs->nip, regs->link, code);
Index: linux-build/arch/powerpc/include/asm/system.h
===================================================================
--- linux-build.orig/arch/powerpc/include/asm/system.h 2011-11-08 11:40:10.408778257 +1100
+++ linux-build/arch/powerpc/include/asm/system.h 2011-11-14 08:32:21.185192875 +1100
@@ -193,8 +193,8 @@ extern void cacheable_memzero(void *p, u
extern void *cacheable_memcpy(void *, const void *, unsigned int);
extern int do_page_fault(struct pt_regs *, unsigned long, unsigned long);
extern void bad_page_fault(struct pt_regs *, unsigned long, int);
-extern int die(const char *, struct pt_regs *, long);
extern void _exception(int, struct pt_regs *, int, unsigned long);
+extern void die(const char *, struct pt_regs *, long);
extern void _nmask_and_or_msr(unsigned long nmask, unsigned long or_val);
#ifdef CONFIG_BOOKE_WDT
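
The locking scheme in oops_begin()/oops_end() is a trylock with an owner check, so a nested oops on the CPU that already holds the lock proceeds instead of deadlocking. A user-space sketch using C11 atomics follows. The CPU id is passed in explicitly, interrupt masking is left out, and the example is only exercised single-threaded, so it illustrates the bookkeeping and not the concurrency behaviour.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_flag die_lock = ATOMIC_FLAG_INIT;
static atomic_int die_owner = -1;
static unsigned int die_nest_count;

static void oops_begin(int cpu)
{
	if (atomic_flag_test_and_set(&die_lock)) {
		/* Lock is held: spin unless we are the owner (nested oops). */
		if (cpu != atomic_load(&die_owner)) {
			while (atomic_flag_test_and_set(&die_lock))
				;
		}
	}
	die_nest_count++;
	atomic_store(&die_owner, cpu);
}

static void oops_end(void)
{
	atomic_store(&die_owner, -1);
	die_nest_count--;
	if (!die_nest_count)
		atomic_flag_clear(&die_lock);	/* outermost oops releases */
}
```

As in the patch, the lock is released only when the nest count drops back to zero, so the outermost oops keeps the console serialised for its whole duration.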
* [PATCH 6/9] powerpc: Reduce pseries panic timeout from 180s to 10s
From: Anton Blanchard @ 2011-11-30 10:23 UTC
To: benh, paulus, hbabu; +Cc: linuxppc-dev
In-Reply-To: <20111130102308.348262468@samba.org>
We've had a 180 second panic timeout on ppc64 for as long as I
can remember. This patch reduces it to 10 seconds on pseries for a few
reasons:
- Almost all pseries machines have a hypervisor console so panic
output will be available in a scrollback buffer.
- The 180 seconds impacts our availability. Users (other than
kernel hackers) just want the box to come back around so it
can continue its work.
- I spend a lot of my life staring at the 180 second panic timeout.
Many pseries machines take minutes to power cycle, so it's quicker
to sit through the 180 seconds than it is to power cycle.
Signed-off-by: Anton Blanchard <anton@samba.org>
---
Index: linux-build/arch/powerpc/platforms/pseries/setup.c
===================================================================
--- linux-build.orig/arch/powerpc/platforms/pseries/setup.c 2011-11-08 11:41:51.842584444 +1100
+++ linux-build/arch/powerpc/platforms/pseries/setup.c 2011-11-14 08:32:24.581252444 +1100
@@ -354,6 +354,8 @@ early_initcall(alloc_dispatch_log_kmem_c
static void __init pSeries_setup_arch(void)
{
+ panic_timeout = 10;
+
/* Discover PIC type and setup ppc_md accordingly */
pseries_discover_pic();
* [PATCH 7/9] powerpc/xics: Reset the CPPR if H_EOI fails
From: Anton Blanchard @ 2011-11-30 10:23 UTC
To: benh, paulus, hbabu; +Cc: linuxppc-dev
In-Reply-To: <20111130102308.348262468@samba.org>
I have an intermittent kdump failure where the hypervisor fails an H_EOI.
As a result our CPPR is never reset to 0xff and we no longer accept
interrupts.
This patch calls icp_hv_set_cppr to reset the CPPR if H_EOI fails,
fixing the kdump failure.
Signed-off-by: Anton Blanchard <anton@samba.org>
---
- I'm still trying to understand why the H_EOI is failing; perhaps it's
the code in machine_kexec_mask_interrupts that EOIs and masks interrupts.
- The patch is simpler than the diff output suggests: icp_hv_set_cppr
was moved above icp_hv_set_xirr and the call to icp_hv_set_cppr was
added.
Index: linux-build/arch/powerpc/sysdev/xics/icp-hv.c
===================================================================
--- linux-build.orig/arch/powerpc/sysdev/xics/icp-hv.c 2011-11-25 17:35:38.454558874 +1100
+++ linux-build/arch/powerpc/sysdev/xics/icp-hv.c 2011-11-25 20:15:06.169174037 +1100
@@ -41,23 +41,24 @@ static inline unsigned int icp_hv_get_xi
return ret;
}
-static inline void icp_hv_set_xirr(unsigned int value)
+static inline void icp_hv_set_cppr(u8 value)
{
- long rc = plpar_hcall_norets(H_EOI, value);
+ long rc = plpar_hcall_norets(H_CPPR, value);
if (rc != H_SUCCESS) {
- pr_err("%s: bad return code eoi xirr=0x%x returned %ld\n",
+ pr_err("%s: bad return code cppr cppr=0x%x returned %ld\n",
__func__, value, rc);
WARN_ON_ONCE(1);
}
}
-static inline void icp_hv_set_cppr(u8 value)
+static inline void icp_hv_set_xirr(unsigned int value)
{
- long rc = plpar_hcall_norets(H_CPPR, value);
+ long rc = plpar_hcall_norets(H_EOI, value);
if (rc != H_SUCCESS) {
- pr_err("%s: bad return code cppr cppr=0x%x returned %ld\n",
+ pr_err("%s: bad return code eoi xirr=0x%x returned %ld\n",
__func__, value, rc);
WARN_ON_ONCE(1);
+ icp_hv_set_cppr(value >> 24);
}
}
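
The fallback passes value >> 24 to icp_hv_set_cppr, which recovers the priority from the value that was being written with H_EOI. A small sketch of that decomposition follows, assuming the usual XICS layout in which the top byte of the 32-bit XIRR holds the CPPR and the low 24 bits hold the interrupt source number. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Top 8 bits of the XIRR: the priority (CPPR) to restore. */
static uint8_t xirr_to_cppr(uint32_t xirr)
{
	return (uint8_t)(xirr >> 24);
}

/* Low 24 bits of the XIRR: the interrupt source number. */
static uint32_t xirr_to_source(uint32_t xirr)
{
	return xirr & 0x00ffffffu;
}
```

For an EOI value of 0xff000010 this yields a CPPR of 0xff, which is the "accept everything" priority the commit message says was never being restored.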
* [PATCH 8/9] powerpc/kdump: Delay before sending IPI on a system reset
From: Anton Blanchard @ 2011-11-30 10:23 UTC
To: benh, paulus, hbabu; +Cc: linuxppc-dev
In-Reply-To: <20111130102308.348262468@samba.org>
If we enter the kdump code via system reset, wait a bit before
sending the IPI to capture all secondary CPUs. Without it we race
with the hypervisor that is issuing the system reset to each CPU.
If the IPI gets there first the system reset oops output then shows
the register state of the IPI handler which is not what we want.
I took the opportunity to add defines for all the various delays
we have. There's no need for cpu_relax when we are doing an mdelay,
so remove those calls too.
Signed-off-by: Anton Blanchard <anton@samba.org>
---
Index: linux-build/arch/powerpc/kernel/crash.c
===================================================================
--- linux-build.orig/arch/powerpc/kernel/crash.c 2011-11-28 11:44:42.222009861 +1100
+++ linux-build/arch/powerpc/kernel/crash.c 2011-11-28 14:01:58.033283718 +1100
@@ -30,6 +30,20 @@
#include <asm/system.h>
#include <asm/setjmp.h>
+/*
+ * The primary CPU waits a while for all secondary CPUs to enter. This is to
+ * avoid sending an IPI if the secondary CPUs are entering
+ * crash_kexec_secondary on their own (eg via a system reset).
+ *
+ * The secondary timeout has to be longer than the primary. Both timeouts are
+ * in milliseconds.
+ */
+#define PRIMARY_TIMEOUT 500
+#define SECONDARY_TIMEOUT 1000
+
+#define IPI_TIMEOUT 10000
+#define REAL_MODE_TIMEOUT 10000
+
/* This keeps a track of which one is the crashing cpu. */
int crashing_cpu = -1;
static cpumask_t cpus_in_crash = CPU_MASK_NONE;
@@ -99,11 +113,9 @@ again:
* FIXME: Until we will have the way to stop other CPUs reliably,
* the crash CPU will send an IPI and wait for other CPUs to
* respond.
- * Delay of at least 10 seconds.
*/
- msecs = 10000;
+ msecs = IPI_TIMEOUT;
while ((cpumask_weight(&cpus_in_crash) < ncpus) && (--msecs > 0)) {
- cpu_relax();
mdelay(1);
}
@@ -163,11 +175,11 @@ again:
void crash_kexec_secondary(struct pt_regs *regs)
{
unsigned long flags;
- int msecs = 500;
+ int msecs = SECONDARY_TIMEOUT;
local_irq_save(flags);
- /* Wait 500ms for the primary crash CPU to signal its progress */
+ /* Wait for the primary crash CPU to signal its progress */
while (crashing_cpu < 0) {
if (--msecs < 0) {
/* No response, kdump image may not have been loaded */
@@ -176,7 +188,6 @@ void crash_kexec_secondary(struct pt_reg
}
mdelay(1);
- cpu_relax();
}
crash_ipi_callback(regs);
@@ -211,7 +222,7 @@ static void crash_kexec_wait_realmode(in
unsigned int msecs;
int i;
- msecs = 10000;
+ msecs = REAL_MODE_TIMEOUT;
for (i=0; i < nr_cpu_ids && msecs > 0; i++) {
if (i == cpu)
continue;
@@ -306,6 +317,14 @@ void default_machine_crash_shutdown(stru
*/
crashing_cpu = smp_processor_id();
crash_save_cpu(regs, crashing_cpu);
+
+ /*
+ * If we came in via system reset, wait a while for the secondary
+ * CPUs to enter.
+ */
+ if (TRAP(regs) == 0x100)
+ mdelay(PRIMARY_TIMEOUT);
+
crash_kexec_prepare_cpus(crashing_cpu);
cpumask_set_cpu(crashing_cpu, &cpus_in_crash);
crash_kexec_wait_realmode(crashing_cpu);
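
The comment above the new defines states that the secondary timeout has to be longer than the primary one. One way to keep that constraint from silently breaking is a compile-time check. The sketch below uses C11 _Static_assert in user space (inside the kernel, BUILD_BUG_ON would be the natural equivalent), and secondary_slack_ms() is a hypothetical helper added only for illustration.

```c
#include <assert.h>

/* Values taken from the patch; both are in milliseconds. */
#define PRIMARY_TIMEOUT		500
#define SECONDARY_TIMEOUT	1000

/* Fails the build if the ordering constraint is ever violated. */
_Static_assert(SECONDARY_TIMEOUT > PRIMARY_TIMEOUT,
	       "secondary CPUs must wait longer than the primary CPU");

/* How much longer a secondary CPU waits than the primary CPU. */
static int secondary_slack_ms(void)
{
	return SECONDARY_TIMEOUT - PRIMARY_TIMEOUT;
}
```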
* [PATCH 9/9] powerpc/kdump: Only save CPU state first time through the secondary CPU capture code
From: Anton Blanchard @ 2011-11-30 10:23 UTC
To: benh, paulus, hbabu; +Cc: linuxppc-dev
In-Reply-To: <20111130102308.348262468@samba.org>
We might enter the secondary CPU capture code twice, e.g. if we have to
unstick some CPUs with a system reset. In this case we don't want to
overwrite the state on CPUs that had made it into the capture code OK,
so use the cpus_state_saved cpumask for that and make it local to
crash_ipi_callback.
To control progress we now use an atomic_t, cpus_in_crash, to count how
many CPUs have made it into the kdump code, and a flag, time_to_dump,
to tell everyone it's time to dump.
Signed-off-by: Anton Blanchard <anton@samba.org>
---
Index: linux-build/arch/powerpc/kernel/crash.c
===================================================================
--- linux-build.orig/arch/powerpc/kernel/crash.c 2011-11-30 07:38:35.131392789 +1100
+++ linux-build/arch/powerpc/kernel/crash.c 2011-11-30 21:22:18.790917413 +1100
@@ -46,7 +46,8 @@
/* This keeps a track of which one is the crashing cpu. */
int crashing_cpu = -1;
-static cpumask_t cpus_in_crash = CPU_MASK_NONE;
+static atomic_t cpus_in_crash;
+static int time_to_dump;
#define CRASH_HANDLER_MAX 3
/* NULL terminated list of shutdown handles */
@@ -67,21 +68,27 @@ static int handle_fault(struct pt_regs *
void crash_ipi_callback(struct pt_regs *regs)
{
+ static cpumask_t cpus_state_saved = CPU_MASK_NONE;
+
int cpu = smp_processor_id();
if (!cpu_online(cpu))
return;
hard_irq_disable();
- if (!cpumask_test_cpu(cpu, &cpus_in_crash))
+ if (!cpumask_test_cpu(cpu, &cpus_state_saved)) {
crash_save_cpu(regs, cpu);
- cpumask_set_cpu(cpu, &cpus_in_crash);
+ cpumask_set_cpu(cpu, &cpus_state_saved);
+ }
+
+ atomic_inc(&cpus_in_crash);
+ smp_mb__after_atomic_inc();
/*
* Starting the kdump boot.
* This barrier is needed to make sure that all CPUs are stopped.
*/
- while (!cpumask_test_cpu(crashing_cpu, &cpus_in_crash))
+ while (!time_to_dump)
cpu_relax();
if (ppc_md.kexec_cpu_down)
@@ -115,19 +122,18 @@ again:
* respond.
*/
msecs = IPI_TIMEOUT;
- while ((cpumask_weight(&cpus_in_crash) < ncpus) && (--msecs > 0)) {
+ while ((atomic_read(&cpus_in_crash) < ncpus) && (--msecs > 0))
mdelay(1);
- }
/* Would it be better to replace the trap vector here? */
- if (cpumask_weight(&cpus_in_crash) >= ncpus) {
+ if (atomic_read(&cpus_in_crash) >= ncpus) {
printk(KERN_EMERG "IPI complete\n");
return;
}
printk(KERN_EMERG "ERROR: %d cpu(s) not responding\n",
- ncpus - cpumask_weight(&cpus_in_crash));
+ ncpus - atomic_read(&cpus_in_crash));
/*
* If we have a panic timeout set then we can't wait indefinitely
@@ -155,10 +161,10 @@ again:
* crash code again. We need to reset cpus_in_crash so we
* wait for everyone to do this.
*/
- cpus_in_crash = CPU_MASK_NONE;
+ atomic_set(&cpus_in_crash, 0);
smp_mb();
- while (cpumask_weight(&cpus_in_crash) < ncpus)
+ while (atomic_read(&cpus_in_crash) < ncpus)
cpu_relax();
}
@@ -316,7 +322,6 @@ void default_machine_crash_shutdown(stru
* such that another IPI will not be sent.
*/
crashing_cpu = smp_processor_id();
- crash_save_cpu(regs, crashing_cpu);
/*
* If we came in via system reset, wait a while for the secondary
@@ -326,7 +331,11 @@ void default_machine_crash_shutdown(stru
mdelay(PRIMARY_TIMEOUT);
crash_kexec_prepare_cpus(crashing_cpu);
- cpumask_set_cpu(crashing_cpu, &cpus_in_crash);
+
+ crash_save_cpu(regs, crashing_cpu);
+
+ time_to_dump = 1;
+
crash_kexec_wait_realmode(crashing_cpu);
machine_kexec_mask_interrupts();
^ permalink raw reply
* [PATCH] powerpc: Enable squashfs as a module
From: Anton Blanchard @ 2011-11-30 10:38 UTC (permalink / raw)
To: benh, paulus; +Cc: linuxppc-dev
Most distros use it so we may as well enable it and get regular compile
testing.
Signed-off-by: Anton Blanchard <anton@samba.org>
---
Index: linux-powerpc/arch/powerpc/configs/ppc64_defconfig
===================================================================
--- linux-powerpc.orig/arch/powerpc/configs/ppc64_defconfig 2011-11-03 10:36:30.106405452 +1100
+++ linux-powerpc/arch/powerpc/configs/ppc64_defconfig 2011-11-03 10:49:15.731301312 +1100
@@ -390,6 +390,11 @@ CONFIG_HUGETLBFS=y
CONFIG_HFS_FS=m
CONFIG_HFSPLUS_FS=m
CONFIG_CRAMFS=m
+CONFIG_SQUASHFS=m
+CONFIG_SQUASHFS_XATTR=y
+CONFIG_SQUASHFS_ZLIB=y
+CONFIG_SQUASHFS_LZO=y
+CONFIG_SQUASHFS_XZ=y
CONFIG_NFS_FS=y
CONFIG_NFS_V3=y
CONFIG_NFS_V3_ACL=y
Index: linux-powerpc/arch/powerpc/configs/pseries_defconfig
===================================================================
--- linux-powerpc.orig/arch/powerpc/configs/pseries_defconfig 2011-11-03 10:36:30.130405855 +1100
+++ linux-powerpc/arch/powerpc/configs/pseries_defconfig 2011-11-03 10:49:15.731301312 +1100
@@ -304,6 +304,11 @@ CONFIG_PROC_KCORE=y
CONFIG_TMPFS=y
CONFIG_HUGETLBFS=y
CONFIG_CRAMFS=m
+CONFIG_SQUASHFS=m
+CONFIG_SQUASHFS_XATTR=y
+CONFIG_SQUASHFS_ZLIB=y
+CONFIG_SQUASHFS_LZO=y
+CONFIG_SQUASHFS_XZ=y
CONFIG_NFS_FS=y
CONFIG_NFS_V3=y
CONFIG_NFS_V3_ACL=y
^ permalink raw reply
* Re: [BUG?]3.0-rc4+ftrace+kprobe: set kprobe at instruction 'stwu' lead to system crash/freeze
From: tiejun.chen @ 2011-11-30 11:06 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Jim Keniston, Anton Blanchard, linux-kernel, Steven Rostedt,
Yong Zhang, paulus, yrl.pp-manager.tt, Masami Hiramatsu,
linuxppc-dev
In-Reply-To: <1322626752.21641.22.camel@pasglop>
Benjamin Herrenschmidt wrote:
> On Fri, 2011-07-01 at 18:03 +0800, tiejun.chen wrote:
>> Here emulate_step() is called to emulate 'stwu'. Actually this is equivalent to
>> 1> update pr_regs->gpr[1] = mem(old r1 + (-A))
>> 2> 'stw <old r1>, mem<(old r1 + (-A)) >
>>
>> You should notice that the stack based on the new r1 would be overwritten by
>> mem<old r1 +(-A)>. So after this, when the kernel exits from post_kprobe,
>> something would be broken. This depends on the size of A.
>>
>> For the kprobe on show_interrupts, you can see pregs->nip is overwritten, so
>> the kernel runs into trouble.
>>
>> But sometimes we may only overwrite some volatile registers and the kernel
>> stays alive. This is why the kernel works well for some kprobed points
>> after you change some kernel options/toolchains.
>>
>> If I'm correct, it's difficult to kprobe these stwu sp operations since the
>> size of A is undetermined for the kernel. So we have to implement an
>> independent interrupt stack like PPC64.
>
> So I've spent a lot of time trying to find a better way to fix that bug
> and I think I might have finally found one :-)
I understand what you describe below, since I remember you already explained
this approach previously.
>
> - When you try to emulate stwcx on the kernel stack (and only there),
I think it should be stwu/stdu.
> don't actually perform the store. Set a TIF flag instead to indicate
> special processing in the exception return path and store the address to
> update somewhere either in a free slot of the stack frame itself or
> somewhere in the thread struct (the former would be easier). You may as
> well do some sanity checking on the value while at it to catch errors
> early.
>
> - In the exception return code, we already test for various TIF flags
> (*** see comment below, it's even buggy today for preempt ***), so we
> add a test for that flag and go to do_work.
>
> - At the end of do_work, we check for this TIF flag. If it's not set or
> any other flag is set, move on as usual. However, if it's the only flag
> still set:
>
> - Copy the exception frame we're about to unwind to just -below- the
> new r1 value we want to write to. Then perform the write, and change
> r1 to point to that copy of the frame.
>
> - Branch to restore: which will unwind everything from that copy of
> the frame, and eventually set r1 to GPR(1) in the copy which contains
> the new value of r1.
We still can't restore this there. As you know, this emulated store instruction
can touch any field inside pt_regs. Sometimes nip would be involved for this
problematic location. And unfortunately, this is exactly the case we are hitting
currently. So we have to go to exc_exit_restart.
.globl exc_exit_restart
exc_exit_restart:
lwz r11,_NIP(r1)
lwz r12,_MSR(r1)
I mean we have to do the real restore here. So I'm really not sure it's a
good idea to add such code here (the TIF check, the copy, fetching the new r1
and the restore operation), since this is so deep in the exception return code.
exc_exit_start:
mtspr SPRN_SRR0,r11
mtspr SPRN_SRR1,r12
>
> This is the less intrusive approach and should work just fine, it's also
> more robust than anything I've been able to think of and the approach
> would work for 32 and 64-bit similarly.
>
> (***) Above comment about a bug: If you look at entry_64.S version of
> ret_from_except_lite you'll notice that in the !preempt case, after
> we've checked MSR_PR we test for any TIF flag in _TIF_USER_WORK_MASK to
> decide whether to go to do_work or not. However, in the preempt case, we
> do a convoluted trick to test SIGPENDING only if PR was set and always
> test NEED_RESCHED ... but we forget to test any other bit of
> _TIF_USER_WORK_MASK !!! So that means that with preempt, we completely
> fail to test for things like single step, syscall tracing, etc...
>
This is another problem we should address.
> I think this should be fixed at the same time, by simplifying the code
> by doing:
>
> - Test PR. If set, go to test_work_user, else continue (or the other
> way around and call it test_work_kernel)
>
> - In test_work_user, always test for _TIF_USER_WORK_MASK to decide to
> go to do_work, maybe call it do_user_work
>
> - In test_work_kernel, test for _TIF_KERNEL_WORK_MASK which is set to
> our new flag along with NEED_RESCHED if preempt is enabled and branch to
> do_kernel_work.
>
> do_user_work is basically the same as today's user_work
>
> do_kernel_work is basically the same as today preempt block with added
> code to handle the new flag as described above.
>
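The proposed split into a user-side and a kernel-side work test can be
sketched in plain C roughly as follows. This is only a model of the
decision logic, not entry_64.S: the TOY_* bit values are invented, and
TOY_TIF_EMULATE_STORE is a hypothetical name for the new flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Invented bit values; the real TIF_* numbering is different. */
#define TOY_TIF_SIGPENDING    (1u << 0)
#define TOY_TIF_NEED_RESCHED  (1u << 1)
#define TOY_TIF_SINGLESTEP    (1u << 2)
#define TOY_TIF_EMULATE_STORE (1u << 3) /* hypothetical new flag */

#define TOY_USER_WORK_MASK \
	(TOY_TIF_SIGPENDING | TOY_TIF_NEED_RESCHED | TOY_TIF_SINGLESTEP)
#define TOY_ALL_FLAGS (TOY_USER_WORK_MASK | TOY_TIF_EMULATE_STORE)

/* Kernel-side mask: the new flag, plus NEED_RESCHED only with preempt. */
static unsigned int kernel_work_mask(bool preempt)
{
	return TOY_TIF_EMULATE_STORE |
	       (preempt ? TOY_TIF_NEED_RESCHED : 0u);
}

/*
 * Decide whether the exception return path has work to do.
 * from_user models "MSR_PR was set in the interrupted context".
 */
static bool exit_needs_work(unsigned int flags, bool from_user, bool preempt)
{
	assert((flags & ~TOY_ALL_FLAGS) == 0);

	if (from_user)
		return (flags & TOY_USER_WORK_MASK) != 0;
	return (flags & kernel_work_mask(preempt)) != 0;
}
```

With this shape, the user path always consults the full user mask, so a
pending single step is seen whether or not preempt is enabled.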
> Is anybody volunteering for fixing that ? I don't have the bandwidth
I have always used one specific kprobe stack to fix this for BOOKE, and it works
well in my local tree :) Do you remember my v3 patch? I think it's possible to
extend this to all PPC variants.
Anyway, I'd like to be this volunteer with our last solution.
Tiejun
> right now, but if nobody shows up I suppose I'll have to eventually deal
> with it myself :-)
>
> Cheers,
> Ben.
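For readers following the thread, the store-with-update semantics under
discussion can be modelled with a toy stack in C. This is a simplified
sketch, not emulate_step(): the struct, the 256-byte stack and the frame
bounds used in the overlap check are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 256u

/* Toy machine state: a stack pointer and a small byte-addressed stack. */
struct toy_cpu {
	uint32_t r1;                 /* stack pointer, an offset into stack[] */
	uint8_t stack[STACK_BYTES];
};

/* stwu r1,d(r1): store the old r1 at r1 + d, then set r1 to that address. */
static void emulate_stwu_r1(struct toy_cpu *cpu, int32_t d)
{
	uint32_t ea = cpu->r1 + (uint32_t)d;

	assert(ea <= STACK_BYTES - sizeof(uint32_t));
	memcpy(&cpu->stack[ea], &cpu->r1, sizeof(uint32_t));
	cpu->r1 = ea;
}

/* True if the 4-byte store at ea overlaps [frame, frame + frame_size). */
static int store_hits_frame(uint32_t ea, uint32_t frame, uint32_t frame_size)
{
	return ea + sizeof(uint32_t) > frame && ea < frame + frame_size;
}
```

If the exception frame was pushed just below the old r1, a small
displacement puts the store inside that frame, which is the corruption
described above; deferring the store until the frame has been copied
out of the way avoids it.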
^ permalink raw reply
* Re: [PATCH 3/6] 44x: Removing dead CONFIG_PPC47x
From: Josh Boyer @ 2011-11-30 11:43 UTC (permalink / raw)
To: Tony Breeds; +Cc: Christoph Egger, LinuxPPC-dev
In-Reply-To: <1322630640-13708-4-git-send-email-tony@bakeyournoodle.com>
On Wed, Nov 30, 2011 at 12:23 AM, Tony Breeds <tony@bakeyournoodle.com> wrote:
> From: Christoph Egger <siccegge@cs.fau.de>
>
> CONFIG_PPC47x doesn't exist in Kconfig, therefore removing all
> references for it from the source code.
>
> Signed-off-by: Christoph Egger <siccegge@cs.fau.de>
> ---
>  arch/powerpc/mm/44x_mmu.c |    4 ----
>  1 files changed, 0 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/mm/44x_mmu.c b/arch/powerpc/mm/44x_mmu.c
> index f60e006..5d4e3ff 100644
> --- a/arch/powerpc/mm/44x_mmu.c
> +++ b/arch/powerpc/mm/44x_mmu.c
> @@ -78,11 +78,7 @@ static void __init ppc44x_pin_tlb(unsigned int virt, unsigned int phys)
>                 "tlbwe  %1,%3,%5\n"
>                 "tlbwe  %0,%3,%6\n"
>         :
> -#ifdef CONFIG_PPC47x
> -       : "r" (PPC47x_TLB2_S_RWX),
> -#else
>         : "r" (PPC44x_TLB_SW | PPC44x_TLB_SR | PPC44x_TLB_SX | PPC44x_TLB_G),
> -#endif
That doesn't look right. The code is there doing something, why is it
just being removed? I would think the change would be to use
CONFIG_PPC_47x?
Or if the code there isn't needed any longer, the changelog should say why.
josh
^ permalink raw reply
* Re: [PATCH 4/6] powerpc/boot: Add extended precision shifts to the boot wrapper.
From: Josh Boyer @ 2011-11-30 11:45 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: LinuxPPC-dev
In-Reply-To: <1322632107.21641.43.camel@pasglop>
On Wed, Nov 30, 2011 at 12:48 AM, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
> On Wed, 2011-11-30 at 16:23 +1100, Tony Breeds wrote:
>> Code copied from arch/powerpc/kernel/misc_32.S
>>
>> Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
>> ---
>>  arch/powerpc/boot/div64.S |   52 +++++++++++++++++++++++++++++++++++++++++++++
>>  1 files changed, 52 insertions(+), 0 deletions(-)
>
> Should we just link with libgcc ? :-)
Please tell me you're joking.
However, adding this code is wonderful and all, but why do we need to
add it? The changelog should say why.
josh
^ permalink raw reply
* [PATCH net-next v5 4/4] powerpc: tqm8548/tqm8xx: add and update CAN device nodes
From: Wolfgang Grandegger @ 2011-11-30 12:10 UTC (permalink / raw)
To: netdev; +Cc: devicetree-discuss, linux-can, linuxppc-dev, socketcan-users
In-Reply-To: <1322655035-18809-1-git-send-email-wg@grandegger.com>
This patch enables or updates support for the CC770 and AN82527
CAN controller on the TQM8548 and TQM8xx boards.
CC: devicetree-discuss@lists.ozlabs.org
CC: linuxppc-dev@ozlabs.org
CC: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
---
arch/powerpc/boot/dts/tqm8548-bigflash.dts | 19 ++++++++++++++-----
arch/powerpc/boot/dts/tqm8548.dts | 19 ++++++++++++++-----
arch/powerpc/boot/dts/tqm8xx.dts | 25 +++++++++++++++++++++++++
3 files changed, 53 insertions(+), 10 deletions(-)
diff --git a/arch/powerpc/boot/dts/tqm8548-bigflash.dts b/arch/powerpc/boot/dts/tqm8548-bigflash.dts
index 9452c3c..d918752 100644
--- a/arch/powerpc/boot/dts/tqm8548-bigflash.dts
+++ b/arch/powerpc/boot/dts/tqm8548-bigflash.dts
@@ -352,7 +352,7 @@
ranges = <
0 0x0 0xfc000000 0x04000000 // NOR FLASH bank 1
1 0x0 0xf8000000 0x08000000 // NOR FLASH bank 0
- 2 0x0 0xa3000000 0x00008000 // CAN (2 x i82527)
+ 2 0x0 0xa3000000 0x00008000 // CAN (2 x CC770)
3 0x0 0xa3010000 0x00008000 // NAND FLASH
>;
@@ -393,18 +393,27 @@
};
/* Note: CAN support needs be enabled in U-Boot */
- can0@2,0 {
- compatible = "intel,82527"; // Bosch CC770
+ can@2,0 {
+ compatible = "bosch,cc770"; // Bosch CC770
reg = <2 0x0 0x100>;
interrupts = <4 1>;
interrupt-parent = <&mpic>;
+ bosch,external-clock-frequency = <16000000>;
+ bosch,disconnect-rx1-input;
+ bosch,disconnect-tx1-output;
+ bosch,iso-low-speed-mux;
+ bosch,clock-out-frequency = <16000000>;
};
- can1@2,100 {
- compatible = "intel,82527"; // Bosch CC770
+ can@2,100 {
+ compatible = "bosch,cc770"; // Bosch CC770
reg = <2 0x100 0x100>;
interrupts = <4 1>;
interrupt-parent = <&mpic>;
+ bosch,external-clock-frequency = <16000000>;
+ bosch,disconnect-rx1-input;
+ bosch,disconnect-tx1-output;
+ bosch,iso-low-speed-mux;
};
/* Note: NAND support needs to be enabled in U-Boot */
diff --git a/arch/powerpc/boot/dts/tqm8548.dts b/arch/powerpc/boot/dts/tqm8548.dts
index 619776f..988d887 100644
--- a/arch/powerpc/boot/dts/tqm8548.dts
+++ b/arch/powerpc/boot/dts/tqm8548.dts
@@ -352,7 +352,7 @@
ranges = <
0 0x0 0xfc000000 0x04000000 // NOR FLASH bank 1
1 0x0 0xf8000000 0x08000000 // NOR FLASH bank 0
- 2 0x0 0xe3000000 0x00008000 // CAN (2 x i82527)
+ 2 0x0 0xe3000000 0x00008000 // CAN (2 x CC770)
3 0x0 0xe3010000 0x00008000 // NAND FLASH
>;
@@ -393,18 +393,27 @@
};
/* Note: CAN support needs be enabled in U-Boot */
- can0@2,0 {
- compatible = "intel,82527"; // Bosch CC770
+ can@2,0 {
+ compatible = "bosch,cc770"; // Bosch CC770
reg = <2 0x0 0x100>;
interrupts = <4 1>;
interrupt-parent = <&mpic>;
+ bosch,external-clock-frequency = <16000000>;
+ bosch,disconnect-rx1-input;
+ bosch,disconnect-tx1-output;
+ bosch,iso-low-speed-mux;
+ bosch,clock-out-frequency = <16000000>;
};
- can1@2,100 {
- compatible = "intel,82527"; // Bosch CC770
+ can@2,100 {
+ compatible = "bosch,cc770"; // Bosch CC770
reg = <2 0x100 0x100>;
interrupts = <4 1>;
interrupt-parent = <&mpic>;
+ bosch,external-clock-frequency = <16000000>;
+ bosch,disconnect-rx1-input;
+ bosch,disconnect-tx1-output;
+ bosch,iso-low-speed-mux;
};
/* Note: NAND support needs to be enabled in U-Boot */
diff --git a/arch/powerpc/boot/dts/tqm8xx.dts b/arch/powerpc/boot/dts/tqm8xx.dts
index f6da7ec..c3dba25 100644
--- a/arch/powerpc/boot/dts/tqm8xx.dts
+++ b/arch/powerpc/boot/dts/tqm8xx.dts
@@ -57,6 +57,7 @@
ranges = <
0x0 0x0 0x40000000 0x800000
+ 0x3 0x0 0xc0000000 0x200
>;
flash@0,0 {
@@ -67,6 +68,30 @@
bank-width = <4>;
device-width = <2>;
};
+
+ /* Note: CAN support needs be enabled in U-Boot */
+ can@3,0 {
+ compatible = "intc,82527";
+ reg = <3 0x0 0x80>;
+ interrupts = <8 1>;
+ interrupt-parent = <&PIC>;
+ bosch,external-clock-frequency = <16000000>;
+ bosch,disconnect-rx1-input;
+ bosch,disconnect-tx1-output;
+ bosch,iso-low-speed-mux;
+ bosch,clock-out-frequency = <16000000>;
+ };
+
+ can@3,100 {
+ compatible = "intc,82527";
+ reg = <3 0x100 0x80>;
+ interrupts = <8 1>;
+ interrupt-parent = <&PIC>;
+ bosch,external-clock-frequency = <16000000>;
+ bosch,disconnect-rx1-input;
+ bosch,disconnect-tx1-output;
+ bosch,iso-low-speed-mux;
+ };
};
soc@fff00000 {
--
1.7.4.1
^ permalink raw reply related
* [PATCH net-next v5 3/4] can: cc770: add platform bus driver for the CC770 and AN82527
From: Wolfgang Grandegger @ 2011-11-30 12:10 UTC (permalink / raw)
To: netdev; +Cc: Devicetree-discuss, linux-can, linuxppc-dev, socketcan-users
In-Reply-To: <1322655035-18809-1-git-send-email-wg@grandegger.com>
This driver works with both static platform data and device tree
bindings. It has been tested on a TQM855L board with two AN82527
CAN controllers on the local bus.
CC: Devicetree-discuss@lists.ozlabs.org
CC: linuxppc-dev@ozlabs.org
CC: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
.../devicetree/bindings/net/can/cc770.txt | 56 ++++
drivers/net/can/cc770/Kconfig | 7 +
drivers/net/can/cc770/Makefile | 1 +
drivers/net/can/cc770/cc770_platform.c | 273 ++++++++++++++++++++
4 files changed, 337 insertions(+), 0 deletions(-)
create mode 100644 Documentation/devicetree/bindings/net/can/cc770.txt
create mode 100644 drivers/net/can/cc770/cc770_platform.c
diff --git a/Documentation/devicetree/bindings/net/can/cc770.txt b/Documentation/devicetree/bindings/net/can/cc770.txt
new file mode 100644
index 0000000..01e282d
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/can/cc770.txt
@@ -0,0 +1,56 @@
+Memory mapped Bosch CC770 and Intel AN82527 CAN controller
+
+Note: The CC770 is a CAN controller from Bosch, which is 100%
+compatible with the old AN82527 from Intel, but with "bugs" being fixed.
+
+Required properties:
+
+- compatible : should be "bosch,cc770" for the CC770 and "intc,82527"
+ for the AN82527.
+
+- reg : should specify the chip select, address offset and size required
+ to map the registers of the controller. The size is usually 0x80.
+
+- interrupts : property with a value describing the interrupt source
+ (number and sensitivity) required for the controller.
+
+Optional properties:
+
+- bosch,external-clock-frequency : frequency of the external oscillator
+ clock in Hz. Note that the internal clock frequency used by the
+ controller is half of that value. If not specified, a default
+ value of 16000000 (16 MHz) is used.
+
+- bosch,clock-out-frequency : clock frequency in Hz on the CLKOUT pin.
+ If not specified or if the specified value is 0, the CLKOUT pin
+ will be disabled.
+
+- bosch,slew-rate : slew rate of the CLKOUT signal. If not specified,
+ a reasonable value will be calculated.
+
+- bosch,disconnect-rx0-input : see data sheet.
+
+- bosch,disconnect-rx1-input : see data sheet.
+
+- bosch,disconnect-tx1-output : see data sheet.
+
+- bosch,polarity-dominant : see data sheet.
+
+- bosch,divide-memory-clock : see data sheet.
+
+- bosch,iso-low-speed-mux : see data sheet.
+
+For further information, please have a look at the CC770 or AN82527 data sheet.
+
+Examples:
+
+can@3,100 {
+ compatible = "bosch,cc770";
+ reg = <3 0x100 0x80>;
+ interrupts = <2 0>;
+ interrupt-parent = <&mpic>;
+ bosch,external-clock-frequency = <16000000>;
+};
+
+
+
diff --git a/drivers/net/can/cc770/Kconfig b/drivers/net/can/cc770/Kconfig
index 28e4d48..22c07a8 100644
--- a/drivers/net/can/cc770/Kconfig
+++ b/drivers/net/can/cc770/Kconfig
@@ -11,4 +11,11 @@ config CAN_CC770_ISA
connected to the ISA bus using I/O port, memory mapped or
indirect access.
+config CAN_CC770_PLATFORM
+ tristate "Generic Platform Bus based CC770 driver"
+ ---help---
+ This driver adds support for the CC770 and AN82527 chips
+ connected to the "platform bus" (Linux abstraction for devices
+ attached directly to the processor).
+
endif
diff --git a/drivers/net/can/cc770/Makefile b/drivers/net/can/cc770/Makefile
index 872ecff..9fb8321 100644
--- a/drivers/net/can/cc770/Makefile
+++ b/drivers/net/can/cc770/Makefile
@@ -4,5 +4,6 @@
obj-$(CONFIG_CAN_CC770) += cc770.o
obj-$(CONFIG_CAN_CC770_ISA) += cc770_isa.o
+obj-$(CONFIG_CAN_CC770_PLATFORM) += cc770_platform.o
ccflags-$(CONFIG_CAN_DEBUG_DEVICES) := -DDEBUG
diff --git a/drivers/net/can/cc770/cc770_platform.c b/drivers/net/can/cc770/cc770_platform.c
new file mode 100644
index 0000000..fb87b22
--- /dev/null
+++ b/drivers/net/can/cc770/cc770_platform.c
@@ -0,0 +1,273 @@
+/*
+ * Driver for CC770 and AN82527 CAN controllers on the platform bus
+ *
+ * Copyright (C) 2009, 2011 Wolfgang Grandegger <wg@grandegger.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the version 2 of the GNU General Public License
+ * as published by the Free Software Foundation
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+/*
+ * If platform data are used you should have similar definitions
+ * in your board-specific code:
+ *
+ * static struct cc770_platform_data myboard_cc770_pdata = {
+ * .osc_freq = 16000000,
+ * .cir = 0x41,
+ * .cor = 0x20,
+ * .bcr = 0x40,
+ * };
+ *
+ * Please see include/linux/can/platform/cc770.h for description of
+ * above fields.
+ *
+ * If the device tree is used, you need a CAN node definition in your
+ * DTS file similar to:
+ *
+ * can@3,100 {
+ * compatible = "bosch,cc770";
+ * reg = <3 0x100 0x80>;
+ * interrupts = <2 0>;
+ * interrupt-parent = <&mpic>;
+ * bosch,external-clock-frequency = <16000000>;
+ * };
+ *
+ * See "Documentation/devicetree/bindings/net/can/cc770.txt" for further
+ * information.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/interrupt.h>
+#include <linux/netdevice.h>
+#include <linux/delay.h>
+#include <linux/can.h>
+#include <linux/can/dev.h>
+#include <linux/can/platform/cc770.h>
+
+#include <linux/of_platform.h>
+
+#include "cc770.h"
+
+#define DRV_NAME "cc770_platform"
+
+MODULE_AUTHOR("Wolfgang Grandegger <wg@grandegger.com>");
+MODULE_DESCRIPTION("Socket-CAN driver for CC770 on the platform bus");
+MODULE_LICENSE("GPL v2");
+
+#define CC770_PLATFORM_CAN_CLOCK 16000000
+
+static u8 cc770_platform_read_reg(const struct cc770_priv *priv, int reg)
+{
+ return in_8(priv->reg_base + reg);
+}
+
+static void cc770_platform_write_reg(const struct cc770_priv *priv, int reg,
+ u8 val)
+{
+ out_8(priv->reg_base + reg, val);
+}
+
+static int __devinit cc770_get_of_node_data(struct platform_device *pdev,
+ struct cc770_priv *priv)
+{
+ struct device_node *np = pdev->dev.of_node;
+ const u32 *prop;
+ int prop_size;
+ u32 clkext;
+
+ prop = of_get_property(np, "bosch,external-clock-frequency",
+ &prop_size);
+ if (prop && (prop_size == sizeof(u32)))
+ clkext = *prop;
+ else
+ clkext = CC770_PLATFORM_CAN_CLOCK; /* default */
+ priv->can.clock.freq = clkext;
+
+ /* The system clock may not exceed 10 MHz */
+ if (priv->can.clock.freq > 10000000) {
+ priv->cpu_interface |= CPUIF_DSC;
+ priv->can.clock.freq /= 2;
+ }
+
+ /* The memory clock may not exceed 8 MHz */
+ if (priv->can.clock.freq > 8000000)
+ priv->cpu_interface |= CPUIF_DMC;
+
+ if (of_get_property(np, "bosch,divide-memory-clock", NULL))
+ priv->cpu_interface |= CPUIF_DMC;
+ if (of_get_property(np, "bosch,iso-low-speed-mux", NULL))
+ priv->cpu_interface |= CPUIF_MUX;
+
+ if (!of_get_property(np, "bosch,no-comperator-bypass", NULL))
+ priv->bus_config |= BUSCFG_CBY;
+ if (of_get_property(np, "bosch,disconnect-rx0-input", NULL))
+ priv->bus_config |= BUSCFG_DR0;
+ if (of_get_property(np, "bosch,disconnect-rx1-input", NULL))
+ priv->bus_config |= BUSCFG_DR1;
+ if (of_get_property(np, "bosch,disconnect-tx1-output", NULL))
+ priv->bus_config |= BUSCFG_DT1;
+ if (of_get_property(np, "bosch,polarity-dominant", NULL))
+ priv->bus_config |= BUSCFG_POL;
+
+ prop = of_get_property(np, "bosch,clock-out-frequency", &prop_size);
+ if (prop && (prop_size == sizeof(u32)) && *prop > 0) {
+ u32 cdv = clkext / *prop;
+ int slew;
+
+ if (cdv > 0 && cdv < 16) {
+ priv->cpu_interface |= CPUIF_CEN;
+ priv->clkout |= (cdv - 1) & CLKOUT_CD_MASK;
+
+ prop = of_get_property(np, "bosch,slew-rate",
+ &prop_size);
+ if (prop && (prop_size == sizeof(u32))) {
+ slew = *prop;
+ } else {
+ /* Determine default slew rate */
+ slew = (CLKOUT_SL_MASK >>
+ CLKOUT_SL_SHIFT) -
+ ((cdv * clkext - 1) / 8000000);
+ if (slew < 0)
+ slew = 0;
+ }
+ priv->clkout |= (slew << CLKOUT_SL_SHIFT) &
+ CLKOUT_SL_MASK;
+ } else {
+ dev_dbg(&pdev->dev, "invalid clock-out-frequency\n");
+ }
+ }
+
+ return 0;
+}
+
+static int __devinit cc770_get_platform_data(struct platform_device *pdev,
+ struct cc770_priv *priv)
+{
+
+ struct cc770_platform_data *pdata = pdev->dev.platform_data;
+
+ priv->cpu_interface = pdata->cir;
+ priv->can.clock.freq = pdata->osc_freq;
+ if (priv->cpu_interface & CPUIF_DSC)
+ priv->can.clock.freq /= 2;
+ priv->clkout = pdata->cor;
+ priv->bus_config = pdata->bcr;
+
+ return 0;
+}
+
+static int __devinit cc770_platform_probe(struct platform_device *pdev)
+{
+ struct net_device *dev;
+ struct cc770_priv *priv;
+ struct resource *mem;
+ resource_size_t mem_size;
+ void __iomem *base;
+ int err, irq;
+
+ mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ irq = platform_get_irq(pdev, 0);
+ if (!mem || irq <= 0)
+ return -ENODEV;
+
+ mem_size = resource_size(mem);
+ if (!request_mem_region(mem->start, mem_size, pdev->name))
+ return -EBUSY;
+
+ base = ioremap(mem->start, mem_size);
+ if (!base) {
+ err = -ENOMEM;
+ goto exit_release_mem;
+ }
+
+ dev = alloc_cc770dev(0);
+ if (!dev) {
+ err = -ENOMEM;
+ goto exit_unmap_mem;
+ }
+
+ dev->irq = irq;
+ priv = netdev_priv(dev);
+ priv->read_reg = cc770_platform_read_reg;
+ priv->write_reg = cc770_platform_write_reg;
+ priv->irq_flags = IRQF_SHARED;
+ priv->reg_base = base;
+
+ if (pdev->dev.of_node)
+ err = cc770_get_of_node_data(pdev, priv);
+ else if (pdev->dev.platform_data)
+ err = cc770_get_platform_data(pdev, priv);
+ else
+ err = -ENODEV;
+ if (err)
+ goto exit_free_cc770;
+
+ dev_dbg(&pdev->dev,
+ "reg_base=0x%p irq=%d clock=%d cpu_interface=0x%02x "
+ "bus_config=0x%02x clkout=0x%02x\n",
+ priv->reg_base, dev->irq, priv->can.clock.freq,
+ priv->cpu_interface, priv->bus_config, priv->clkout);
+
+ dev_set_drvdata(&pdev->dev, dev);
+ SET_NETDEV_DEV(dev, &pdev->dev);
+
+ err = register_cc770dev(dev);
+ if (err) {
+ dev_err(&pdev->dev,
+ "couldn't register CC700 device (err=%d)\n", err);
+ goto exit_free_cc770;
+ }
+
+ return 0;
+
+exit_free_cc770:
+ free_cc770dev(dev);
+exit_unmap_mem:
+ iounmap(base);
+exit_release_mem:
+ release_mem_region(mem->start, mem_size);
+
+ return err;
+}
+
+static int __devexit cc770_platform_remove(struct platform_device *pdev)
+{
+ struct net_device *dev = dev_get_drvdata(&pdev->dev);
+ struct cc770_priv *priv = netdev_priv(dev);
+ struct resource *mem;
+
+ unregister_cc770dev(dev);
+ iounmap(priv->reg_base);
+ free_cc770dev(dev);
+
+ mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ release_mem_region(mem->start, resource_size(mem));
+
+ return 0;
+}
+
+static struct of_device_id __devinitdata cc770_platform_table[] = {
+ {.compatible = "bosch,cc770"}, /* CC770 from Bosch */
+ {.compatible = "intc,82527"}, /* AN82527 from Intel CP */
+ {},
+};
+
+static struct platform_driver cc770_platform_driver = {
+ .driver = {
+ .name = DRV_NAME,
+ .owner = THIS_MODULE,
+ .of_match_table = cc770_platform_table,
+ },
+ .probe = cc770_platform_probe,
+ .remove = __devexit_p(cc770_platform_remove),
+};
+
+module_platform_driver(cc770_platform_driver);
+
--
1.7.4.1
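As a reading aid for the clock handling in cc770_get_of_node_data() above,
here is a stand-alone sketch of the same arithmetic that can be compiled in
user space. The SK_* bit values are placeholders chosen for the example
(the real definitions live in cc770.h), and the struct and function names
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit values for this sketch only. */
#define SK_CPUIF_DMC 0x20u /* divide memory clock */
#define SK_CPUIF_DSC 0x40u /* divide system clock */

struct sk_clock {
	uint32_t freq;          /* frequency handed to the CAN core, in Hz */
	uint8_t cpu_interface;  /* divider bits selected for the chip */
};

/* Derive the system clock and divider bits from the external clock. */
static struct sk_clock sk_derive_clock(uint32_t clkext)
{
	struct sk_clock c = { .freq = clkext, .cpu_interface = 0 };

	assert(clkext > 0);

	/* The system clock may not exceed 10 MHz: halve it if needed. */
	if (c.freq > 10000000u) {
		c.cpu_interface |= SK_CPUIF_DSC;
		c.freq /= 2;
	}
	/* The memory clock may not exceed 8 MHz. */
	if (c.freq > 8000000u)
		c.cpu_interface |= SK_CPUIF_DMC;

	return c;
}

/*
 * CLKOUT divider field for a requested output frequency, or -1 when
 * CLKOUT stays disabled (zero request or divider outside 1..15).
 */
static int sk_clkout_divider(uint32_t clkext, uint32_t clkout)
{
	if (clkout == 0)
		return -1;

	uint32_t cdv = clkext / clkout;

	if (cdv == 0 || cdv >= 16)
		return -1;
	return (int)(cdv - 1);
}
```

With the 16 MHz default, this yields an 8 MHz system clock with only the
DSC bit set, which matches the binding text saying the internal clock is
half of the external one.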
^ permalink raw reply related
* [RFC PATCH v3 0/4] cpuidle: (POWER) cpuidle driver for pSeries
From: Deepthi Dharwar @ 2011-11-30 12:46 UTC (permalink / raw)
To: benh, len.brown; +Cc: linuxppc-dev, linux-pm, linux-kernel, linux-pm
This patch series ports the cpuidle framework to the ppc64 platform and
implements a cpuidle back-end driver for the ppc64 (pSeries) platform.
Currently idle states are managed by pseries_{dedicated,shared}_idle_sleep()
routines in arch/powerpc/platforms/pseries/setup.c. There are
two idle states (snooze and cede) that are exploited by
these routines based on simple heuristics.
Moving the idle states over to the cpuidle framework lets us take advantage
of the advanced heuristics, tunables, and features provided by the cpuidle
framework. Additional idle states, like extended cede with hints, can later
be included and exploited using the cpuidle framework. The statistics and
tracing infrastructure provided by the cpuidle framework also helps in
enabling power management related tools and in tuning the system and
applications.
This series aims to maintain compatibility and functionality with the
existing pSeries idle cpu management code. There are no new functions
or idle states added as part of this series.
The previous version of this patch can be found at
https://lkml.org/lkml/2011/11/17/127
Changes from the previous version (v2):
1] Rebased to latest 3.2-rc3
2] Incorporated the changes from the feedback provided by Ben
in the previous version of this series.
This patch series includes:
[1/4] - Provides arch specific cpu_idle_wait() function required by cpuidle
subsystem.
[2/4] - pseries_idle cpuidle driver
[3/4] - Enables cpuidle for pSeries and directly calls cpuidle_idle_call()
[4/4] - Handles powersave=off kernel boot parameter and disables registration
of pseries_idle cpuidle driver.
This series has been tested on a ppc64 pSeries POWER7 system with the snooze
and cede states.
--
arch/powerpc/Kconfig | 4
arch/powerpc/include/asm/processor.h | 3
arch/powerpc/include/asm/system.h | 9 +
arch/powerpc/kernel/idle.c | 27 ++
arch/powerpc/kernel/sysfs.c | 2
arch/powerpc/platforms/Kconfig | 6
arch/powerpc/platforms/pseries/Kconfig | 9 +
arch/powerpc/platforms/pseries/Makefile | 1
arch/powerpc/platforms/pseries/processor_idle.c | 329 +++++++++++++++++++++++
arch/powerpc/platforms/pseries/pseries.h | 3
arch/powerpc/platforms/pseries/setup.c | 104 +------
arch/powerpc/platforms/pseries/smp.c | 1
include/linux/cpuidle.h | 2
13 files changed, 411 insertions(+), 89 deletions(-)
create mode 100644 arch/powerpc/platforms/pseries/processor_idle.c
-Deepthi
^ permalink raw reply
* [RFC PATCH v3 1/4] cpuidle: (powerpc) Add cpu_idle_wait() to allow switching of idle routines
From: Deepthi Dharwar @ 2011-11-30 12:46 UTC (permalink / raw)
To: benh, len.brown; +Cc: linuxppc-dev, linux-pm, linux-kernel, linux-pm
In-Reply-To: <20111130124608.972.87712.stgit@deepthi-ThinkPad-T420>
This patch provides cpu_idle_wait() routine for the powerpc
platform which is required by the cpuidle subsystem. This
routine is required to change the idle handler on SMP systems.
The equivalent routine for x86 is in arch/x86/kernel/process.c
but the powerpc implementation is different.
The cpuidle_disable variable records whether the powersave=off
option was given at boot time, so that the cpuidle framework
can be enabled or disabled accordingly.
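To make the role of cpuidle_disable concrete, here is a small user-space
model. Only the enum and the variable name come from this patch; the
string-matching helper and toy_idle_driver_init() are hypothetical
stand-ins for the __setup() hook and for whatever a cpuidle driver's
init routine does with the flag.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

enum idle_boot_override { IDLE_NO_OVERRIDE = 0, IDLE_POWERSAVE_OFF };

static unsigned long cpuidle_disable = IDLE_NO_OVERRIDE;

/* Stand-in for the __setup("powersave=off", ...) boot hook. */
static void parse_boot_arg(const char *arg)
{
	assert(arg != NULL);

	if (strcmp(arg, "powersave=off") == 0)
		cpuidle_disable = IDLE_POWERSAVE_OFF;
}

/* Hypothetical driver init: refuse to register when overridden. */
static int toy_idle_driver_init(void)
{
	if (cpuidle_disable != IDLE_NO_OVERRIDE)
		return -ENODEV;
	return 0;
}
```

The point of exporting the variable is that a driver built elsewhere
(possibly as a module) can make this check without parsing the command
line itself.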
Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Trinabh Gupta <g.trinabh@gmail.com>
Signed-off-by: Arun R Bharadwaj <arun.r.bharadwaj@gmail.com>
---
arch/powerpc/Kconfig | 4 ++++
arch/powerpc/include/asm/processor.h | 2 ++
arch/powerpc/include/asm/system.h | 1 +
arch/powerpc/kernel/idle.c | 27 +++++++++++++++++++++++++++
4 files changed, 34 insertions(+), 0 deletions(-)
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 951e18f..beeeec7 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -87,6 +87,10 @@ config ARCH_HAS_ILOG2_U64
bool
default y if 64BIT
+config ARCH_HAS_CPU_IDLE_WAIT
+ bool
+ default y
+
config GENERIC_HWEIGHT
bool
default y
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index eb11a44..811b7e7 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -382,6 +382,8 @@ static inline unsigned long get_clean_sp(struct pt_regs *regs, int is_32)
}
#endif
+enum idle_boot_override {IDLE_NO_OVERRIDE = 0, IDLE_POWERSAVE_OFF};
+
#endif /* __KERNEL__ */
#endif /* __ASSEMBLY__ */
#endif /* _ASM_POWERPC_PROCESSOR_H */
diff --git a/arch/powerpc/include/asm/system.h b/arch/powerpc/include/asm/system.h
index e30a13d..ff66680 100644
--- a/arch/powerpc/include/asm/system.h
+++ b/arch/powerpc/include/asm/system.h
@@ -221,6 +221,7 @@ extern unsigned long klimit;
extern void *zalloc_maybe_bootmem(size_t size, gfp_t mask);
extern int powersave_nap; /* set if nap mode can be used in idle loop */
+void cpu_idle_wait(void);
/*
* Atomic exchange
diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
index 39a2baa..8574b0e 100644
--- a/arch/powerpc/kernel/idle.c
+++ b/arch/powerpc/kernel/idle.c
@@ -39,9 +39,13 @@
#define cpu_should_die() 0
#endif
+unsigned long cpuidle_disable = IDLE_NO_OVERRIDE;
+EXPORT_SYMBOL(cpuidle_disable);
+
static int __init powersave_off(char *arg)
{
ppc_md.power_save = NULL;
+ cpuidle_disable = IDLE_POWERSAVE_OFF;
return 0;
}
__setup("powersave=off", powersave_off);
@@ -102,6 +106,29 @@ void cpu_idle(void)
}
}
+
+/*
+ * cpu_idle_wait - Used to ensure that all the CPUs come out of the old
+ * idle loop and start using the new idle loop.
+ * Required while changing idle handler on SMP systems.
+ * Caller must have changed idle handler to the new value before the call.
+ * This window may be larger on shared systems.
+ */
+void cpu_idle_wait(void)
+{
+ int cpu;
+ smp_mb();
+
+ /* kick all the CPUs so that they exit out of old idle routine */
+ get_online_cpus();
+ for_each_online_cpu(cpu) {
+ if (cpu != smp_processor_id())
+ smp_send_reschedule(cpu);
+ }
+ put_online_cpus();
+}
+EXPORT_SYMBOL_GPL(cpu_idle_wait);
+
int powersave_nap;
#ifdef CONFIG_SYSCTL
^ permalink raw reply related
* [RFC PATCH v3 2/4] cpuidle: (POWER) cpuidle driver for pSeries
From: Deepthi Dharwar @ 2011-11-30 12:46 UTC (permalink / raw)
To: benh, len.brown; +Cc: linuxppc-dev, linux-pm, linux-kernel, linux-pm
In-Reply-To: <20111130124608.972.87712.stgit@deepthi-ThinkPad-T420>
This patch implements a back-end cpuidle driver for pSeries
based on the pseries_dedicated_idle_sleep and pseries_shared_idle_sleep
routines. The driver is built only if CONFIG_CPU_IDLE is set. It
registers its idle states globally rather than per-CPU.
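As a rough illustration of the global registration described above, the
following userspace sketch models one driver-wide state table that every
CPU shares, with the table chosen once depending on whether the partition
is shared or dedicated. The names idle_state, idle_driver, enter_stub and
driver_init are hypothetical stand-ins for this note and are not the
kernel's cpuidle API; the latency and residency values mirror the tables
in the patch.

```c
#include <stddef.h>

#define MAX_IDLE_STATE_COUNT 2

/* Hypothetical, simplified stand-in for struct cpuidle_state. */
struct idle_state {
	const char *name;
	unsigned int exit_latency;     /* microseconds */
	unsigned int target_residency; /* microseconds */
	int (*enter)(int index);       /* NULL means "state not enabled" */
};

/* Hypothetical stand-in for struct cpuidle_driver: one table for all CPUs. */
struct idle_driver {
	struct idle_state states[MAX_IDLE_STATE_COUNT];
	int state_count;
};

/* Placeholder enter routine; a real one would snooze or cede the CPU. */
static int enter_stub(int index) { return index; }

static const struct idle_state dedicated_states[MAX_IDLE_STATE_COUNT] = {
	{ "snooze", 0, 0, enter_stub },
	{ "CEDE", 1, 10, enter_stub },
};

static const struct idle_state shared_states[MAX_IDLE_STATE_COUNT] = {
	{ "Shared Cede", 0, 0, enter_stub },
	/* second slot left zeroed: enter == NULL, so it is skipped */
};

/* Copy every enabled state from the chosen table into the driver. */
static int driver_init(struct idle_driver *drv, int shared_proc)
{
	const struct idle_state *table =
		shared_proc ? shared_states : dedicated_states;
	int i;

	drv->state_count = 0;
	for (i = 0; i < MAX_IDLE_STATE_COUNT; i++) {
		if (table[i].enter == NULL)
			continue;
		drv->states[drv->state_count++] = table[i]; /* structure copy */
	}
	return drv->state_count;
}
```

Calling driver_init(&drv, 0) fills two states for a dedicated partition
and driver_init(&drv, 1) fills one for a shared partition; a per-CPU
device then only needs to record drv.state_count.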
Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Trinabh Gupta <g.trinabh@gmail.com>
Signed-off-by: Arun R Bharadwaj <arun.r.bharadwaj@gmail.com>
---
arch/powerpc/include/asm/system.h | 8 +
arch/powerpc/kernel/sysfs.c | 2
arch/powerpc/platforms/pseries/Kconfig | 9 +
arch/powerpc/platforms/pseries/Makefile | 1
arch/powerpc/platforms/pseries/processor_idle.c | 326 +++++++++++++++++++++++
arch/powerpc/platforms/pseries/pseries.h | 3
arch/powerpc/platforms/pseries/setup.c | 3
arch/powerpc/platforms/pseries/smp.c | 1
8 files changed, 350 insertions(+), 3 deletions(-)
create mode 100644 arch/powerpc/platforms/pseries/processor_idle.c
diff --git a/arch/powerpc/include/asm/system.h b/arch/powerpc/include/asm/system.h
index ff66680..f56a0a7 100644
--- a/arch/powerpc/include/asm/system.h
+++ b/arch/powerpc/include/asm/system.h
@@ -223,6 +223,14 @@ extern void *zalloc_maybe_bootmem(size_t size, gfp_t mask);
extern int powersave_nap; /* set if nap mode can be used in idle loop */
void cpu_idle_wait(void);
+#ifdef CONFIG_PSERIES_IDLE
+extern void update_smt_snooze_delay(int snooze);
+extern int pseries_notify_cpuidle_add_cpu(int cpu);
+#else
+static inline void update_smt_snooze_delay(int snooze) {}
+static inline int pseries_notify_cpuidle_add_cpu(int cpu) { return 0; }
+#endif
+
/*
* Atomic exchange
*
diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index ce035c1..ebe5d78 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -18,6 +18,7 @@
#include <asm/machdep.h>
#include <asm/smp.h>
#include <asm/pmc.h>
+#include <asm/system.h>
#include "cacheinfo.h"
@@ -51,6 +52,7 @@ static ssize_t store_smt_snooze_delay(struct sys_device *dev,
return -EINVAL;
per_cpu(smt_snooze_delay, cpu->sysdev.id) = snooze;
+ update_smt_snooze_delay(snooze);
return count;
}
diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
index c81f6bb..ae7b6d4 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -120,3 +120,12 @@ config DTL
which are accessible through a debugfs file.
Say N if you are unsure.
+
+config PSERIES_IDLE
+ tristate "Cpuidle driver for pSeries platforms"
+ depends on CPU_IDLE
+ depends on PPC_PSERIES
+ default y
+ help
+ Select this option to enable processor idle state management
+ through cpuidle subsystem.
diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile
index 3556e40..236db46 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -22,6 +22,7 @@ obj-$(CONFIG_PHYP_DUMP) += phyp_dump.o
obj-$(CONFIG_CMM) += cmm.o
obj-$(CONFIG_DTL) += dtl.o
obj-$(CONFIG_IO_EVENT_IRQ) += io_event_irq.o
+obj-$(CONFIG_PSERIES_IDLE) += processor_idle.o
ifeq ($(CONFIG_PPC_PSERIES),y)
obj-$(CONFIG_SUSPEND) += suspend.o
diff --git a/arch/powerpc/platforms/pseries/processor_idle.c b/arch/powerpc/platforms/pseries/processor_idle.c
new file mode 100644
index 0000000..352b78a
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/processor_idle.c
@@ -0,0 +1,326 @@
+/*
+ * processor_idle - idle state cpuidle driver.
+ * Adapted from drivers/idle/intel_idle.c and
+ * drivers/acpi/processor_idle.c
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/moduleparam.h>
+#include <linux/cpuidle.h>
+#include <linux/cpu.h>
+
+#include <asm/paca.h>
+#include <asm/reg.h>
+#include <asm/system.h>
+#include <asm/machdep.h>
+#include <asm/firmware.h>
+
+#include "plpar_wrappers.h"
+#include "pseries.h"
+
+struct cpuidle_driver pseries_idle_driver = {
+ .name = "pseries_idle",
+ .owner = THIS_MODULE,
+};
+
+#define MAX_IDLE_STATE_COUNT 2
+
+static int max_idle_state = MAX_IDLE_STATE_COUNT - 1;
+static struct cpuidle_device __percpu *pseries_cpuidle_devices;
+static struct cpuidle_state *cpuidle_state_table;
+
+void update_smt_snooze_delay(int snooze)
+{
+ struct cpuidle_driver *drv = cpuidle_get_driver();
+ if (drv)
+ drv->states[0].target_residency = snooze;
+}
+
+static inline void idle_loop_prolog(unsigned long *in_purr, ktime_t *kt_before)
+{
+
+ *kt_before = ktime_get_real();
+ *in_purr = mfspr(SPRN_PURR);
+ /*
+ * Indicate to the HV that we are idle. Now would be
+ * a good time to find other work to dispatch.
+ */
+ get_lppaca()->idle = 1;
+}
+
+static inline s64 idle_loop_epilog(unsigned long in_purr, ktime_t kt_before)
+{
+ get_lppaca()->wait_state_cycles += mfspr(SPRN_PURR) - in_purr;
+ get_lppaca()->idle = 0;
+
+ return ktime_to_us(ktime_sub(ktime_get_real(), kt_before));
+}
+
+static int snooze_loop(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv,
+ int index)
+{
+ unsigned long in_purr;
+ ktime_t kt_before;
+ unsigned long start_snooze;
+ long snooze = drv->states[0].target_residency;
+
+ idle_loop_prolog(&in_purr, &kt_before);
+
+ if (snooze) {
+ start_snooze = get_tb() + snooze * tb_ticks_per_usec;
+ local_irq_enable();
+ set_thread_flag(TIF_POLLING_NRFLAG);
+
+ while ((snooze < 0) || (get_tb() < start_snooze)) {
+ if (need_resched() || cpu_is_offline(dev->cpu))
+ goto out;
+ ppc64_runlatch_off();
+ HMT_low();
+ HMT_very_low();
+ }
+
+ HMT_medium();
+ clear_thread_flag(TIF_POLLING_NRFLAG);
+ smp_mb();
+ local_irq_disable();
+ }
+
+out:
+ HMT_medium();
+ dev->last_residency =
+ (int)idle_loop_epilog(in_purr, kt_before);
+ return index;
+}
+
+static int dedicated_cede_loop(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv,
+ int index)
+{
+ unsigned long in_purr;
+ ktime_t kt_before;
+
+ idle_loop_prolog(&in_purr, &kt_before);
+ get_lppaca()->donate_dedicated_cpu = 1;
+
+ ppc64_runlatch_off();
+ HMT_medium();
+ cede_processor();
+
+ get_lppaca()->donate_dedicated_cpu = 0;
+ dev->last_residency =
+ (int)idle_loop_epilog(in_purr, kt_before);
+ return index;
+}
+
+static int shared_cede_loop(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv,
+ int index)
+{
+ unsigned long in_purr;
+ ktime_t kt_before;
+
+ idle_loop_prolog(&in_purr, &kt_before);
+
+ /*
+ * Yield the processor to the hypervisor. We return if
+ * an external interrupt occurs (which are driven prior
+ * to returning here) or if a prod occurs from another
+ * processor. When returning here, external interrupts
+ * are enabled.
+ */
+ cede_processor();
+
+ dev->last_residency =
+ (int)idle_loop_epilog(in_purr, kt_before);
+ return index;
+}
+
+/*
+ * States for dedicated partition case.
+ */
+static struct cpuidle_state dedicated_states[MAX_IDLE_STATE_COUNT] = {
+ { /* Snooze */
+ .name = "snooze",
+ .desc = "snooze",
+ .flags = CPUIDLE_FLAG_TIME_VALID,
+ .exit_latency = 0,
+ .target_residency = 0,
+ .enter = &snooze_loop },
+ { /* CEDE */
+ .name = "CEDE",
+ .desc = "CEDE",
+ .flags = CPUIDLE_FLAG_TIME_VALID,
+ .exit_latency = 1,
+ .target_residency = 10,
+ .enter = &dedicated_cede_loop },
+};
+
+/*
+ * States for shared partition case.
+ */
+static struct cpuidle_state shared_states[MAX_IDLE_STATE_COUNT] = {
+ { /* Shared Cede */
+ .name = "Shared Cede",
+ .desc = "Shared Cede",
+ .flags = CPUIDLE_FLAG_TIME_VALID,
+ .exit_latency = 0,
+ .target_residency = 0,
+ .enter = &shared_cede_loop },
+};
+
+int pseries_notify_cpuidle_add_cpu(int cpu)
+{
+ struct cpuidle_device *dev =
+ per_cpu_ptr(pseries_cpuidle_devices, cpu);
+ if (dev && cpuidle_get_driver()) {
+ cpuidle_disable_device(dev);
+ cpuidle_enable_device(dev);
+ }
+ return 0;
+}
+
+/*
+ * pseries_cpuidle_driver_init()
+ */
+static int pseries_cpuidle_driver_init(void)
+{
+ int idle_state;
+ struct cpuidle_driver *drv = &pseries_idle_driver;
+
+ drv->state_count = 0;
+
+ for (idle_state = 0; idle_state < MAX_IDLE_STATE_COUNT; ++idle_state) {
+
+ if (idle_state > max_idle_state)
+ break;
+
+ /* is the state not enabled? */
+ if (cpuidle_state_table[idle_state].enter == NULL)
+ continue;
+
+ drv->states[drv->state_count] = /* structure copy */
+ cpuidle_state_table[idle_state];
+
+ if (cpuidle_state_table == dedicated_states)
+ drv->states[drv->state_count].target_residency =
+ __get_cpu_var(smt_snooze_delay);
+
+ drv->state_count += 1;
+ }
+
+ return 0;
+}
+
+/* pseries_idle_devices_uninit(void)
+ * unregister cpuidle devices and de-allocate memory
+ */
+static void pseries_idle_devices_uninit(void)
+{
+ int i;
+ struct cpuidle_device *dev;
+
+ for_each_possible_cpu(i) {
+ dev = per_cpu_ptr(pseries_cpuidle_devices, i);
+ cpuidle_unregister_device(dev);
+ }
+
+ free_percpu(pseries_cpuidle_devices);
+ return;
+}
+
+/* pseries_idle_devices_init()
+ * allocate, initialize and register cpuidle device
+ */
+static int pseries_idle_devices_init(void)
+{
+ int i;
+ struct cpuidle_driver *drv = &pseries_idle_driver;
+ struct cpuidle_device *dev;
+
+ pseries_cpuidle_devices = alloc_percpu(struct cpuidle_device);
+ if (pseries_cpuidle_devices == NULL)
+ return -ENOMEM;
+
+ for_each_possible_cpu(i) {
+ dev = per_cpu_ptr(pseries_cpuidle_devices, i);
+ dev->state_count = drv->state_count;
+ dev->cpu = i;
+ if (cpuidle_register_device(dev)) {
+ printk(KERN_DEBUG \
+ "cpuidle_register_device %d failed!\n", i);
+ return -EIO;
+ }
+ }
+
+ return 0;
+}
+
+/*
+ * pseries_idle_probe()
+ * Choose state table for shared versus dedicated partition
+ */
+static int pseries_idle_probe(void)
+{
+
+ if (!firmware_has_feature(FW_FEATURE_SPLPAR))
+ return -ENODEV;
+
+ if (max_idle_state == 0) {
+ printk(KERN_DEBUG "pseries processor idle disabled.\n");
+ return -EPERM;
+ }
+
+ if (get_lppaca()->shared_proc)
+ cpuidle_state_table = shared_states;
+ else
+ cpuidle_state_table = dedicated_states;
+
+ return 0;
+}
+
+static int __init pseries_processor_idle_init(void)
+{
+ int retval;
+
+ retval = pseries_idle_probe();
+ if (retval)
+ return retval;
+
+ pseries_cpuidle_driver_init();
+ retval = cpuidle_register_driver(&pseries_idle_driver);
+ if (retval) {
+ printk(KERN_DEBUG "Registration of pseries driver failed.\n");
+ return retval;
+ }
+
+ retval = pseries_idle_devices_init();
+ if (retval) {
+ pseries_idle_devices_uninit();
+ cpuidle_unregister_driver(&pseries_idle_driver);
+ return retval;
+ }
+
+ printk(KERN_DEBUG "pseries_idle_driver registered\n");
+
+ return 0;
+}
+
+static void __exit pseries_processor_idle_exit(void)
+{
+
+ pseries_idle_devices_uninit();
+ cpuidle_unregister_driver(&pseries_idle_driver);
+
+ return;
+}
+
+module_init(pseries_processor_idle_init);
+module_exit(pseries_processor_idle_exit);
+
+MODULE_AUTHOR("Deepthi Dharwar <deepthi@linux.vnet.ibm.com>");
+MODULE_DESCRIPTION("Cpuidle driver for POWER");
+MODULE_LICENSE("GPL");
diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
index 24c7162..9a3dda0 100644
--- a/arch/powerpc/platforms/pseries/pseries.h
+++ b/arch/powerpc/platforms/pseries/pseries.h
@@ -57,4 +57,7 @@ extern struct device_node *dlpar_configure_connector(u32);
extern int dlpar_attach_node(struct device_node *);
extern int dlpar_detach_node(struct device_node *);
+/* Snooze Delay, pseries_idle */
+DECLARE_PER_CPU(long, smt_snooze_delay);
+
#endif /* _PSERIES_PSERIES_H */
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index c3408ca..9c6716a 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -586,9 +586,6 @@ static int __init pSeries_probe(void)
return 1;
}
-
-DECLARE_PER_CPU(long, smt_snooze_delay);
-
static void pseries_dedicated_idle_sleep(void)
{
unsigned int cpu = smp_processor_id();
diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c
index 26e93fd..bbc3c42 100644
--- a/arch/powerpc/platforms/pseries/smp.c
+++ b/arch/powerpc/platforms/pseries/smp.c
@@ -148,6 +148,7 @@ static void __devinit smp_xics_setup_cpu(int cpu)
set_cpu_current_state(cpu, CPU_STATE_ONLINE);
set_default_offline_state(cpu);
#endif
+ pseries_notify_cpuidle_add_cpu(cpu);
}
static int __devinit smp_pSeries_kick_cpu(int nr)
^ permalink raw reply related
* [RFC PATCH v3 3/4] cpuidle: (POWER) Enable cpuidle and directly call cpuidle_idle_call() for pSeries
From: Deepthi Dharwar @ 2011-11-30 12:46 UTC (permalink / raw)
To: benh, len.brown; +Cc: linuxppc-dev, linux-pm, linux-kernel, linux-pm
In-Reply-To: <20111130124608.972.87712.stgit@deepthi-ThinkPad-T420>
This patch enables cpuidle for pSeries: pSeries_idle is now called
directly from the idle loop and invokes cpuidle_idle_call(), so the
cpuidle driver registered with the cpuidle subsystem selects and
enters the idle state. If the driver or the cpuidle framework fails
to load, pSeries_idle runs the default idle handling within the same
function. This patch also removes the routines
pseries_shared_idle_sleep and pseries_dedicated_idle_sleep, as they
are now implemented as part of the pseries_idle cpuidle driver.
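The fallback behaviour described above can be sketched in plain C as
follows. This is only a userspace model under stated assumptions:
idle_call_fn, no_driver, driver_ready, default_idle and platform_idle are
hypothetical names, and the counter stands in for the thread-priority
change that the patch performs on the error path.

```c
#include <errno.h>

/*
 * Hypothetical hook standing in for cpuidle_idle_call(): returns 0 on
 * success or a negative errno when no driver is available.
 */
typedef int (*idle_call_fn)(void);

/* Counts fallback invocations, purely for illustration. */
static int default_idle_runs;

static void default_idle(void)
{
	/* The patch lowers thread priority at this point. */
	default_idle_runs++;
}

static int no_driver(void)    { return -ENODEV; }
static int driver_ready(void) { return 0; }

/* Model of pSeries_idle(): try the framework first, fall back on error. */
static void platform_idle(idle_call_fn idle_call)
{
	if (idle_call())
		default_idle();
}
```

With driver_ready the fallback never runs; with no_driver every call to
platform_idle() ends in default_idle().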
Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Trinabh Gupta <g.trinabh@gmail.com>
Signed-off-by: Arun R Bharadwaj <arun.r.bharadwaj@gmail.com>
---
arch/powerpc/platforms/Kconfig | 6 ++
arch/powerpc/platforms/pseries/setup.c | 101 +++++---------------------------
include/linux/cpuidle.h | 2 -
3 files changed, 23 insertions(+), 86 deletions(-)
diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig
index 3fe6d92..31e1ade 100644
--- a/arch/powerpc/platforms/Kconfig
+++ b/arch/powerpc/platforms/Kconfig
@@ -211,6 +211,12 @@ config PPC_PASEMI_CPUFREQ
endmenu
+menu "CPUIdle driver"
+
+source "drivers/cpuidle/Kconfig"
+
+endmenu
+
config PPC601_SYNC_FIX
bool "Workarounds for PPC601 bugs"
depends on 6xx && (PPC_PREP || PPC_PMAC)
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 9c6716a..f19fc52 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -39,6 +39,7 @@
#include <linux/irq.h>
#include <linux/seq_file.h>
#include <linux/root_dev.h>
+#include <linux/cpuidle.h>
#include <asm/mmu.h>
#include <asm/processor.h>
@@ -74,9 +75,6 @@ EXPORT_SYMBOL(CMO_PageSize);
int fwnmi_active; /* TRUE if an FWNMI handler is present */
-static void pseries_shared_idle_sleep(void);
-static void pseries_dedicated_idle_sleep(void);
-
static struct device_node *pSeries_mpic_node;
static void pSeries_show_cpuinfo(struct seq_file *m)
@@ -352,6 +350,21 @@ static int alloc_dispatch_log_kmem_cache(void)
}
early_initcall(alloc_dispatch_log_kmem_cache);
+static void pSeries_idle(void)
+{
+ /* This would call on the cpuidle framework, and the back-end pseries
+ * driver to go to idle states
+ */
+ if (cpuidle_idle_call()) {
+ /* On error, execute default handler
+ * to go into low thread priority and possibly
+ * low power mode.
+ */
+ HMT_low();
+ HMT_very_low();
+ }
+}
+
static void __init pSeries_setup_arch(void)
{
/* Discover PIC type and setup ppc_md accordingly */
@@ -374,18 +387,9 @@ static void __init pSeries_setup_arch(void)
pSeries_nvram_init();
- /* Choose an idle loop */
if (firmware_has_feature(FW_FEATURE_SPLPAR)) {
vpa_init(boot_cpuid);
- if (get_lppaca()->shared_proc) {
- printk(KERN_DEBUG "Using shared processor idle loop\n");
- ppc_md.power_save = pseries_shared_idle_sleep;
- } else {
- printk(KERN_DEBUG "Using dedicated idle loop\n");
- ppc_md.power_save = pseries_dedicated_idle_sleep;
- }
- } else {
- printk(KERN_DEBUG "Using default idle loop\n");
+ ppc_md.power_save = pSeries_idle;
}
if (firmware_has_feature(FW_FEATURE_LPAR))
@@ -586,77 +590,6 @@ static int __init pSeries_probe(void)
return 1;
}
-static void pseries_dedicated_idle_sleep(void)
-{
- unsigned int cpu = smp_processor_id();
- unsigned long start_snooze;
- unsigned long in_purr, out_purr;
- long snooze = __get_cpu_var(smt_snooze_delay);
-
- /*
- * Indicate to the HV that we are idle. Now would be
- * a good time to find other work to dispatch.
- */
- get_lppaca()->idle = 1;
- get_lppaca()->donate_dedicated_cpu = 1;
- in_purr = mfspr(SPRN_PURR);
-
- /*
- * We come in with interrupts disabled, and need_resched()
- * has been checked recently. If we should poll for a little
- * while, do so.
- */
- if (snooze) {
- start_snooze = get_tb() + snooze * tb_ticks_per_usec;
- local_irq_enable();
- set_thread_flag(TIF_POLLING_NRFLAG);
-
- while ((snooze < 0) || (get_tb() < start_snooze)) {
- if (need_resched() || cpu_is_offline(cpu))
- goto out;
- ppc64_runlatch_off();
- HMT_low();
- HMT_very_low();
- }
-
- HMT_medium();
- clear_thread_flag(TIF_POLLING_NRFLAG);
- smp_mb();
- local_irq_disable();
- if (need_resched() || cpu_is_offline(cpu))
- goto out;
- }
-
- cede_processor();
-
-out:
- HMT_medium();
- out_purr = mfspr(SPRN_PURR);
- get_lppaca()->wait_state_cycles += out_purr - in_purr;
- get_lppaca()->donate_dedicated_cpu = 0;
- get_lppaca()->idle = 0;
-}
-
-static void pseries_shared_idle_sleep(void)
-{
- /*
- * Indicate to the HV that we are idle. Now would be
- * a good time to find other work to dispatch.
- */
- get_lppaca()->idle = 1;
-
- /*
- * Yield the processor to the hypervisor. We return if
- * an external interrupt occurs (which are driven prior
- * to returning here) or if a prod occurs from another
- * processor. When returning here, external interrupts
- * are enabled.
- */
- cede_processor();
-
- get_lppaca()->idle = 0;
-}
-
static int pSeries_pci_probe_mode(struct pci_bus *bus)
{
if (firmware_has_feature(FW_FEATURE_LPAR))
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index 7408af8..23f81de 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -130,7 +130,6 @@ struct cpuidle_driver {
#ifdef CONFIG_CPU_IDLE
extern void disable_cpuidle(void);
extern int cpuidle_idle_call(void);
-
extern int cpuidle_register_driver(struct cpuidle_driver *drv);
struct cpuidle_driver *cpuidle_get_driver(void);
extern void cpuidle_unregister_driver(struct cpuidle_driver *drv);
@@ -145,7 +144,6 @@ extern void cpuidle_disable_device(struct cpuidle_device *dev);
#else
static inline void disable_cpuidle(void) { }
static inline int cpuidle_idle_call(void) { return -ENODEV; }
-
static inline int cpuidle_register_driver(struct cpuidle_driver *drv)
{return -ENODEV; }
static inline struct cpuidle_driver *cpuidle_get_driver(void) {return NULL; }
^ permalink raw reply related
* Re: [PATCH 5/6] powerpc/boot: Add mfdcrx
From: Segher Boessenkool @ 2011-11-30 13:09 UTC (permalink / raw)
To: Tony Breeds; +Cc: LinuxPPC-dev
In-Reply-To: <1322630640-13708-6-git-send-email-tony@bakeyournoodle.com>
> +#define mfdcrx(rn) \
> + ({ \
> + unsigned long rval; \
> + asm volatile("mfdcrx %0,%1" : "=r"(rval) : "g"(rn)); \
> + rval; \
> + })
"g" is never correct on PowerPC, you want "r" here. You can write
this as a static inline btw, you only need the #define stuff when
there is an "i" constraint involved.
Segher
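For reference, a minimal sketch of the static inline form suggested
above, with the "r" constraint so the DCR number always arrives in a
general-purpose register. The parameter type and the non-PowerPC
placeholder branch are assumptions added so the sketch compiles on other
hosts; they are not part of the original patch or of the review.

```c
/*
 * Sketch of mfdcrx as a static inline. The "r" input constraint forces
 * the DCR number into a register, which is what the instruction expects.
 */
static inline unsigned long mfdcrx(unsigned int rn)
{
	unsigned long rval;

#if defined(__powerpc__)
	__asm__ __volatile__("mfdcrx %0,%1" : "=r"(rval) : "r"(rn));
#else
	/* Placeholder so the sketch builds elsewhere; not a real DCR read. */
	(void)rn;
	rval = 0;
#endif
	return rval;
}
```

Because the argument is an ordinary function parameter, no macro is
needed; a macro would only matter if the operand had to be a
compile-time constant for an "i" constraint.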
^ permalink raw reply
* Re: [PATCH 6/6] 44x/currituck: Add support for the new IBM currituck platform
From: Kumar Gala @ 2011-11-30 13:23 UTC (permalink / raw)
To: Tony Breeds; +Cc: LinuxPPC-dev
In-Reply-To: <1322630640-13708-7-git-send-email-tony@bakeyournoodle.com>
On Nov 29, 2011, at 11:24 PM, Tony Breeds wrote:
> Based on original work by David 'Shaggy' Kliekamp.
>
> Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
> ---
> arch/powerpc/boot/Makefile | 5 +-
> arch/powerpc/boot/dts/currituck.dts | 240 ++++++++++++++++++++++++++
> arch/powerpc/boot/treeboot-currituck.c | 129 ++++++++++++++
> arch/powerpc/boot/wrapper | 3 +
> arch/powerpc/configs/44x/currituck_defconfig | 110 ++++++++++++
> arch/powerpc/include/asm/reg.h | 1 +
> arch/powerpc/kernel/cputable.c | 14 ++
> arch/powerpc/kernel/head_44x.S | 2 +
> arch/powerpc/platforms/44x/Kconfig | 10 +
> arch/powerpc/platforms/44x/Makefile | 1 +
> arch/powerpc/platforms/44x/ppc47x.c | 198 +++++++++++++++++++++
> arch/powerpc/sysdev/ppc4xx_pci.c | 57 ++++++-
> arch/powerpc/sysdev/ppc4xx_pci.h | 7 +
> 13 files changed, 775 insertions(+), 2 deletions(-)
> create mode 100644 arch/powerpc/boot/dts/currituck.dts
> create mode 100644 arch/powerpc/boot/treeboot-currituck.c
> create mode 100644 arch/powerpc/configs/44x/currituck_defconfig
> create mode 100644 arch/powerpc/platforms/44x/ppc47x.c
Split the board support patches from the SoC support.
> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
> index 559da19..aa38de6 100644
> --- a/arch/powerpc/include/asm/reg.h
> +++ b/arch/powerpc/include/asm/reg.h
> @@ -951,6 +951,7 @@
> #define PVR_403GCX 0x00201400
> #define PVR_405GP 0x40110000
> #define PVR_476 0x11a52000
> +#define PVR_476CURRITUCK 0x7ff50000
This seems like it should be PVR_476FPE
> #define PVR_STB03XXX 0x40310000
> #define PVR_NP405H 0x41410000
> #define PVR_NP405L 0x41610000
> diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
> index edae5bb..02e0749 100644
> --- a/arch/powerpc/kernel/cputable.c
> +++ b/arch/powerpc/kernel/cputable.c
> @@ -1830,6 +1830,20 @@ static struct cpu_spec __initdata cpu_specs[] = {
> .machine_check = machine_check_47x,
> .platform = "ppc470",
> },
> + { /* 476 core Currituck */
comment should probably be:
/* 476FPE */
> + .pvr_mask = 0xffff0000,
> + .pvr_value = 0x7ff50000,
> + .cpu_name = "476",
should probably be "476FPE"
> + .cpu_features = CPU_FTRS_47X | CPU_FTR_476_DD2,
> + .cpu_user_features = COMMON_USER_BOOKE |
> + PPC_FEATURE_HAS_FPU,
> + .mmu_features = MMU_FTR_TYPE_47x |
> + MMU_FTR_USE_TLBIVAX_BCAST | MMU_FTR_LOCK_BCAST_INVAL,
> + .icache_bsize = 32,
> + .dcache_bsize = 128,
> + .machine_check = machine_check_47x,
> + .platform = "ppc470",
> + },
> { /* 476 iss */
> .pvr_mask = 0xffff0000,
> .pvr_value = 0x00050000,
> diff --git a/arch/powerpc/kernel/head_44x.S b/arch/powerpc/kernel/head_44x.S
> index b725dab..3aca1e2 100644
> --- a/arch/powerpc/kernel/head_44x.S
> +++ b/arch/powerpc/kernel/head_44x.S
> @@ -732,6 +732,8 @@ _GLOBAL(init_cpu_state)
> /* We use the PVR to differenciate 44x cores from 476 */
> mfspr r3,SPRN_PVR
> srwi r3,r3,16
> + cmplwi cr0,r3,PVR_476CURRITUCK@h
> + beq head_start_47x
> cmplwi cr0,r3,PVR_476@h
> beq head_start_47x
> cmplwi cr0,r3,PVR_476_ISS@h
- k
^ permalink raw reply
* Re: Kernel support for the Freescale P2020-MSC8156 AdvancedMC Reference Design
From: Kumar Gala @ 2011-11-30 13:24 UTC (permalink / raw)
To: Daniel Ng2; +Cc: linuxppc-dev
In-Reply-To: <32883132.post@talk.nabble.com>
On Nov 29, 2011, at 11:34 PM, Daniel Ng2 wrote:
>
> Hi,
>
> Does anyone know of any kernel support for the Freescale P2020-MSC8156 AMC
> board?-
>
> http://freescale.com.hk/webapp/sps/site/prod_summary.jsp?code=P2020-MSC8156AMCRD
>
> I am looking for platform-specific files ie. the ones that go in
> arch/powerpc/platforms/85xx and DTS files if possible...
>
> Cheers,
> Daniel
There isn't any support for this board in the open source kernel.
- k
^ permalink raw reply
* [RFC PATCH v3 4/4] cpuidle: (POWER) Handle power_save=off
From: Deepthi Dharwar @ 2011-11-30 12:47 UTC (permalink / raw)
To: benh, len.brown; +Cc: linuxppc-dev, linux-pm, linux-kernel, linux-pm
In-Reply-To: <20111130124608.972.87712.stgit@deepthi-ThinkPad-T420>
This patch prevents pseries_idle_driver from being registered when
the powersave=off kernel boot option is specified. The
cpuidle_disable variable is used here much as it is on x86: if
cpuidle_disable is set, the sysfs entries for the cpuidle framework
are not created and the required drivers are not loaded.
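A minimal userspace sketch of this boot-override check, assuming the
option in question is the powersave=off handler registered in patch 1/4.
The enum and the cpuidle_disable variable follow the patches; idle_probe
and its has_splpar parameter are hypothetical simplifications of
pseries_idle_probe() and firmware_has_feature(FW_FEATURE_SPLPAR).

```c
#include <errno.h>

enum idle_boot_override { IDLE_NO_OVERRIDE = 0, IDLE_POWERSAVE_OFF };

/* Set once by the boot-option handler, read later by the driver probe. */
static unsigned long cpuidle_disable = IDLE_NO_OVERRIDE;

/* Model of the boot-option handler: record that idle was disabled. */
static int powersave_off(void)
{
	cpuidle_disable = IDLE_POWERSAVE_OFF;
	return 0;
}

/* Model of the probe: refuse to register when idle was disabled at boot. */
static int idle_probe(int has_splpar)
{
	if (!has_splpar)
		return -ENODEV;
	if (cpuidle_disable != IDLE_NO_OVERRIDE)
		return -ENODEV;
	return 0;
}
```

Once powersave_off() has run, idle_probe() returns -ENODEV even on a
platform that would otherwise qualify, so the driver is never registered.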
Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Trinabh Gupta <g.trinabh@gmail.com>
Signed-off-by: Arun R Bharadwaj <arun.r.bharadwaj@gmail.com>
---
arch/powerpc/include/asm/processor.h | 1 +
arch/powerpc/platforms/pseries/processor_idle.c | 3 +++
2 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 811b7e7..b585bff 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -382,6 +382,7 @@ static inline unsigned long get_clean_sp(struct pt_regs *regs, int is_32)
}
#endif
+extern unsigned long cpuidle_disable;
enum idle_boot_override {IDLE_NO_OVERRIDE = 0, IDLE_POWERSAVE_OFF};
#endif /* __KERNEL__ */
diff --git a/arch/powerpc/platforms/pseries/processor_idle.c b/arch/powerpc/platforms/pseries/processor_idle.c
index 352b78a..4f59af0 100644
--- a/arch/powerpc/platforms/pseries/processor_idle.c
+++ b/arch/powerpc/platforms/pseries/processor_idle.c
@@ -269,6 +269,9 @@ static int pseries_idle_probe(void)
if (!firmware_has_feature(FW_FEATURE_SPLPAR))
return -ENODEV;
+ if (cpuidle_disable != IDLE_NO_OVERRIDE)
+ return -ENODEV;
+
if (max_idle_state == 0) {
printk(KERN_DEBUG "pseries processor idle disabled.\n");
return -EPERM;
^ permalink raw reply related
* Re: [PATCH v3 2/8] [booke] Rename mapping based RELOCATABLE to DYNAMIC_MEMSTART for BookE
From: Josh Boyer @ 2011-11-30 14:41 UTC (permalink / raw)
To: Scott Wood
Cc: Josh Poimboeuf, David Laight, Alan Modra, Suzuki K. Poulose,
linuxppc-dev
In-Reply-To: <4ED4126D.7000201@freescale.com>
On Mon, Nov 28, 2011 at 5:59 PM, Scott Wood <scottwood@freescale.com> wrote:
> On 11/23/2011 10:47 AM, Josh Boyer wrote:
>> On Mon, Nov 14, 2011 at 12:41 AM, Suzuki K. Poulose <suzuki@in.ibm.com> wrote:
>>> The current implementation of CONFIG_RELOCATABLE in BookE is based
>>> on mapping the page aligned kernel load address to KERNELBASE. This
>>> approach however is not enough for platforms, where the TLB page size
>>> is large (e.g, 256M on 44x). So we are renaming the RELOCATABLE used
>>> currently in BookE to DYNAMIC_MEMSTART to reflect the actual method.
>
> Should reword the config help to make it clear what the alignment
> restriction is, or where to find the information for a particular
> platform. Someone reading "page aligned" without any context that we're
> talking about special large pages is going to think 4K -- and on e500,
> many large page sizes are supported, so the required alignment is found
> in Linux init code rather than a CPU manual.
>
>>>
>>> The CONFIG_RELOCATABLE for PPC32(BookE) based on processing of the
>>> dynamic relocations will be introduced later in the patch series.
>>>
>>> This change would allow the use of the old method of RELOCATABLE for
>>> platforms which can afford to enforce the page alignment (platforms with
>>> smaller TLB size).
>>
>> I'm OK with the general direction, but this touches a lot of non-4xx
>> code. I'd prefer it if Ben took this directly on whatever final
>> solution is done.
>>
>>> I have tested this change only on 440x. I don't have an FSL BookE to verify
>>> the changes there.
>>>
>>> Scott,
>>> Could you please test this patch on FSL and let me know the results ?
>>
>> Scott, did you ever get around to testing this? In my opinion, this
>> shouldn't go in without a Tested-by: from someone that tried it on an
>> FSL platform.
>
> Booted OK for me on e500v2 with RAM starting at 256M.
>
> Tested-by: Scott Wood <scottwood@freescale.com>
>
>> We add DYNAMIC_MEMSTART for 32-bit, and we have RELOCATABLE for
>> 64-bit. Then throughout almost the rest of the patch, all we're doing
>> is duplicating what RELOCATABLE already did (e.g. if ! either thing).
>> It works, but it is kind of ugly.
>>
>> Instead, could we define a helper config variable that can be used in
>> place of that construct? Something like:
>>
>> config NONSTATIC_KERNEL (or whatever)
>>     bool
>>     default n
>>
>> ...
>>
>> config DYNAMIC_MEMSTART
>>     <blah>
>>     select NONSTATIC_KERNEL
>>
>> ...
>>
>> config RELOCATABLE
>>     <blah>
>>     select NONSTATIC_KERNEL
>
> I agree.
Suzie do you think you could respin this patch with the suggested
changes and include Scott's Tested-by? The rest of the series looks
fine and I'd like to get it included in my next branch.
josh
^ permalink raw reply