* [PATCH] fixes for rcu_offline_cpu, rcu_move_batch (2.6.8-rc2)
@ 2004-07-26 19:37 Nathan Lynch
2004-07-27 0:01 ` Andrew Morton
0 siblings, 1 reply; 11+ messages in thread
From: Nathan Lynch @ 2004-07-26 19:37 UTC
To: lkml; +Cc: Rusty Russell, Andrew Morton
Hi,
rcu_offline_cpu and rcu_move_batch have been broken since the
list_head's in struct rcu_head and struct rcu_data were replaced with
singly-linked lists:
CC kernel/rcupdate.o
kernel/rcupdate.c: In function `rcu_move_batch':
kernel/rcupdate.c:222: warning: passing arg 2 of `list_add_tail' from
incompatible pointer type
kernel/rcupdate.c: In function `rcu_offline_cpu':
kernel/rcupdate.c:239: warning: passing arg 1 of `rcu_move_batch' from
incompatible pointer type
kernel/rcupdate.c:240: warning: passing arg 1 of `rcu_move_batch' from
incompatible pointer type
kernel/rcupdate.c:236: warning: label `unlock' defined but not used
Kernel crashes when you try to offline a cpu, not surprisingly.
It also looks like rcu_move_batch isn't preempt-safe so I touched that
up, and got rid of an unused label in rcu_offline_cpu.
This fixes the crash for me; does it look ok?
Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
diff -prauNX /home/nathanl/working/dontdiff 2.6.8-rc2/kernel/rcupdate.c 2.6.8-rc2-ntl/kernel/rcupdate.c
--- 2.6.8-rc2/kernel/rcupdate.c 2004-07-23 19:58:35.000000000 -0500
+++ 2.6.8-rc2-ntl/kernel/rcupdate.c 2004-07-26 13:41:15.000000000 -0500
@@ -210,16 +210,18 @@ static void rcu_check_quiescent_state(vo
* locking requirements, the list it's pulling from has to belong to a cpu
* which is dead and hence not processing interrupts.
*/
-static void rcu_move_batch(struct list_head *list)
+static void rcu_move_batch(struct rcu_head *list)
{
- struct list_head *entry;
- int cpu = smp_processor_id();
+ int cpu;
local_irq_disable();
- while (!list_empty(list)) {
- entry = list->next;
- list_del(entry);
- list_add_tail(entry, &RCU_nxtlist(cpu));
+
+ cpu = smp_processor_id();
+
+ while (list != NULL) {
+ *RCU_nxttail(cpu) = list;
+ RCU_nxttail(cpu) = &list->next;
+ list = list->next;
}
local_irq_enable();
}
@@ -233,11 +235,10 @@ static void rcu_offline_cpu(int cpu)
spin_lock_bh(&rcu_state.mutex);
if (rcu_ctrlblk.cur != rcu_ctrlblk.completed)
cpu_quiet(cpu);
-unlock:
spin_unlock_bh(&rcu_state.mutex);
- rcu_move_batch(&RCU_curlist(cpu));
- rcu_move_batch(&RCU_nxtlist(cpu));
+ rcu_move_batch(RCU_curlist(cpu));
+ rcu_move_batch(RCU_nxtlist(cpu));
tasklet_kill_immediate(&RCU_tasklet(cpu), cpu);
}
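The tail-pointer append that the fixed rcu_move_batch() relies on can be sketched in userspace C. This is a minimal illustration rather than the kernel code: struct node, struct queue, queue_init(), queue_move_batch() and queue_length() are hypothetical names standing in for struct rcu_head and the per-cpu RCU_nxtlist/RCU_nxttail pair, and the interrupt disabling that the real function needs is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rcu_head: a singly-linked node. */
struct node {
    struct node *next;
    int id;
};

/*
 * Hypothetical stand-in for one CPU's queue: the list head plus a
 * pointer to the last 'next' slot, so appending needs no list walk.
 */
struct queue {
    struct node *head;
    struct node **tail;
};

static void queue_init(struct queue *q)
{
    q->head = NULL;
    q->tail = &q->head;   /* empty queue: the tail slot is the head itself */
}

/*
 * Append every node of a NULL-terminated 'list' to 'q', one node at a
 * time, following the same three steps as the loop in the patch.
 */
static void queue_move_batch(struct queue *q, struct node *list)
{
    while (list != NULL) {
        *q->tail = list;          /* link the node after the current last one */
        q->tail = &list->next;    /* the tail slot is now this node's 'next' */
        list = list->next;        /* advance along the source list */
    }
}

/* Count the nodes currently reachable from the queue head. */
static int queue_length(const struct queue *q)
{
    int n = 0;
    const struct node *p;

    for (p = q->head; p != NULL; p = p->next)
        n++;
    return n;
}
```

Because the source list is already NULL-terminated, the destination stays terminated after the last node is linked, and the tail pointer ends up at the final node's next field, ready for the next append.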
* Re: [PATCH] fixes for rcu_offline_cpu, rcu_move_batch (2.6.8-rc2)
2004-07-26 19:37 [PATCH] fixes for rcu_offline_cpu, rcu_move_batch (2.6.8-rc2) Nathan Lynch
@ 2004-07-27 0:01 ` Andrew Morton
2004-07-27 5:49 ` Zwane Mwaikambo
0 siblings, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2004-07-27 0:01 UTC
To: Nathan Lynch; +Cc: linux-kernel, rusty
Nathan Lynch <nathanl@austin.ibm.com> wrote:
>
> Hi,
>
> rcu_offline_cpu and rcu_move_batch have been broken since the
> list_head's in struct rcu_head and struct rcu_data were replaced with
> singly-linked lists:
>
> CC kernel/rcupdate.o
> kernel/rcupdate.c: In function `rcu_move_batch':
> kernel/rcupdate.c:222: warning: passing arg 2 of `list_add_tail' from
> incompatible pointer type
> kernel/rcupdate.c: In function `rcu_offline_cpu':
> kernel/rcupdate.c:239: warning: passing arg 1 of `rcu_move_batch' from
> incompatible pointer type
> kernel/rcupdate.c:240: warning: passing arg 1 of `rcu_move_batch' from
> incompatible pointer type
> kernel/rcupdate.c:236: warning: label `unlock' defined but not used
oop. We really should find some way to get more people to enable CPU
hotplug. We have a coverage problem.
> Kernel crashes when you try to offline a cpu, not surprisingly.
>
> It also looks like rcu_move_batch isn't preempt-safe so I touched that
> up, and got rid of an unused label in rcu_offline_cpu.
>
> This fixes the crash for me; does it look ok?
>
Looks good to me, thanks.
* Re: [PATCH] fixes for rcu_offline_cpu, rcu_move_batch (2.6.8-rc2)
2004-07-27 0:01 ` Andrew Morton
@ 2004-07-27 5:49 ` Zwane Mwaikambo
2004-07-31 20:53 ` [PATCH][2.6-mm] i386 Hotplug CPU Zwane Mwaikambo
0 siblings, 1 reply; 11+ messages in thread
From: Zwane Mwaikambo @ 2004-07-27 5:49 UTC
To: Andrew Morton; +Cc: Nathan Lynch, Linux Kernel, Rusty Russell
On Mon, 26 Jul 2004, Andrew Morton wrote:
> oop. We really should find some way to get more people to enable CPU
> hotplug. We have a coverage problem.
You'll have to suck it in when I send the i386 implementation then ;) It
was in my queue, along with a bunch of fixes which need testing before I send it.
* [PATCH][2.6-mm] i386 Hotplug CPU
2004-07-27 5:49 ` Zwane Mwaikambo
@ 2004-07-31 20:53 ` Zwane Mwaikambo
2004-07-31 21:01 ` Zwane Mwaikambo
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Zwane Mwaikambo @ 2004-07-31 20:53 UTC
To: Andrew Morton; +Cc: Linux Kernel, Rusty Russell, lhcs-devel
Hi Andrew,
Could you consider taking the i386 arch support bits for cpu
hotplug? The main purpose is allowing for easier testing at each release.
Pulled from Rusty's previous changelog, plus a few additions (2,11,12)
1) Add CONFIG_HOTPLUG_CPU
2) disable local APIC timer on dead cpus.
3) Disable preempt around irq balancing to prevent CPUs going down.
4) Print irq stats for all possible cpus.
5) Debugging check for interrupts on offline cpus.
6) Hacky fixup_irqs() to redirect irqs when cpus go off/online.
7) play_dead() for offline cpus to spin inside.
8) Handle offline cpus set in flush_tlb_others().
9) Grab lock earlier in smp_call_function() to prevent CPUs going down.
10) Implement __cpu_disable() and __cpu_die().
11) Enable local interrupts in cpu_enable() after fixup_irqs()
12) Don't fiddle with NMI on dead cpu, but leave intact on other cpus.
arch/i386/Kconfig | 9 ++++
arch/i386/kernel/apic.c | 3 +
arch/i386/kernel/io_apic.c | 2 +
arch/i386/kernel/irq.c | 81 ++++++++++++++++++++++++++++++++++++------
arch/i386/kernel/msr.c | 2 -
arch/i386/kernel/process.c | 36 ++++++++++++++++++
arch/i386/kernel/smp.c | 25 ++++++++-----
arch/i386/kernel/smpboot.c | 86 ++++++++++++++++++++++++++++++++++++++++++---
arch/i386/kernel/traps.c | 8 ++++
include/asm-i386/cpu.h | 2 +
include/asm-i386/smp.h | 3 +
11 files changed, 229 insertions(+), 28 deletions(-)
Signed-off-by: Zwane Mwaikambo <zwane@fsmlabs.com>
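Items 6 and 8 in the list above both come down to intersecting a CPU mask with the online map before acting on it. A minimal sketch in plain C, assuming at most 32 CPUs so that a mask fits in a uint32_t; cpu_mask, effective_affinity() and live_targets() are hypothetical names for illustration, not the kernel's cpumask_t API:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mask type: bit n set means CPU n is a member. */
typedef uint32_t cpu_mask;

/*
 * Pattern used when redirecting irqs: keep only the online CPUs of the
 * requested affinity. If none remain, fall back to every online CPU
 * and report through *broken that the affinity had to be overridden.
 */
static cpu_mask effective_affinity(cpu_mask affinity, cpu_mask online,
                                   int *broken)
{
    cpu_mask mask = affinity & online;

    *broken = (mask == 0);
    return *broken ? online : mask;
}

/*
 * Pattern used before sending cross-CPU requests: silently drop CPUs
 * that have gone offline. A result of 0 means there is nothing to do.
 */
static cpu_mask live_targets(cpu_mask targets, cpu_mask online)
{
    return targets & online;
}
```

With CPUs 0 and 1 online (mask 0x3), an irq bound only to CPU 2 (mask 0x4) would be moved to 0x3 with the override flag set, while a request aimed at CPUs 1 and 2 (mask 0x6) would be trimmed to CPU 1 alone.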
Index: linux-2.6.8-rc2-mm1-lch/arch/i386/Kconfig
===================================================================
RCS file: /home/cvsroot/linux-2.6.8-rc2-mm1/arch/i386/Kconfig,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 Kconfig
--- linux-2.6.8-rc2-mm1-lch/arch/i386/Kconfig 28 Jul 2004 14:35:21 -0000 1.1.1.1
+++ linux-2.6.8-rc2-mm1-lch/arch/i386/Kconfig 30 Jul 2004 02:12:44 -0000
@@ -1168,6 +1168,15 @@ config SCx200
This support is also available as a module. If compiled as a
module, it will be called scx200.
+config HOTPLUG_CPU
+ bool "Support for hot-pluggable CPUs (EXPERIMENTAL)"
+ depends on SMP && HOTPLUG && EXPERIMENTAL
+ ---help---
+ Say Y here to experiment with turning CPUs off and on. CPUs
+ can be controlled through /sys/devices/system/cpu.
+
+ Say N.
+
source "drivers/pcmcia/Kconfig"
source "drivers/pci/hotplug/Kconfig"
Index: linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/apic.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.8-rc2-mm1/arch/i386/kernel/apic.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 apic.c
--- linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/apic.c 28 Jul 2004 14:35:21 -0000 1.1.1.1
+++ linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/apic.c 31 Jul 2004 18:16:45 -0000
@@ -26,6 +26,7 @@
#include <linux/mc146818rtc.h>
#include <linux/kernel_stat.h>
#include <linux/sysdev.h>
+#include <linux/cpu.h>
#include <asm/atomic.h>
#include <asm/smp.h>
@@ -1007,7 +1008,7 @@ void __init setup_secondary_APIC_clock(v
local_irq_enable();
}
-void __init disable_APIC_timer(void)
+void __devinit disable_APIC_timer(void)
{
if (using_apic_timer) {
unsigned long v;
Index: linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/io_apic.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.8-rc2-mm1/arch/i386/kernel/io_apic.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 io_apic.c
--- linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/io_apic.c 28 Jul 2004 14:35:21 -0000 1.1.1.1
+++ linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/io_apic.c 30 Jul 2004 02:12:44 -0000
@@ -574,9 +574,11 @@ static int balanced_irq(void *unused)
time_remaining = schedule_timeout(time_remaining);
if (time_after(jiffies,
prev_balance_time+balanced_irq_interval)) {
+ preempt_disable();
do_irq_balance();
prev_balance_time = jiffies;
time_remaining = balanced_irq_interval;
+ preempt_enable();
}
}
return 0;
Index: linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/irq.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.8-rc2-mm1/arch/i386/kernel/irq.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 irq.c
--- linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/irq.c 28 Jul 2004 14:35:21 -0000 1.1.1.1
+++ linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/irq.c 30 Jul 2004 02:12:44 -0000
@@ -34,6 +34,8 @@
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
#include <linux/kallsyms.h>
+#include <linux/notifier.h>
+#include <linux/cpu.h>
#include <asm/atomic.h>
#include <asm/io.h>
@@ -149,9 +151,8 @@ int show_interrupts(struct seq_file *p,
if (i == 0) {
seq_printf(p, " ");
- for (j=0; j<NR_CPUS; j++)
- if (cpu_online(j))
- seq_printf(p, "CPU%d ",j);
+ for_each_cpu(j)
+ seq_printf(p, "CPU%d ",j);
seq_putc(p, '\n');
}
@@ -164,9 +165,8 @@ int show_interrupts(struct seq_file *p,
#ifndef CONFIG_SMP
seq_printf(p, "%10u ", kstat_irqs(i));
#else
- for (j = 0; j < NR_CPUS; j++)
- if (cpu_online(j))
- seq_printf(p, "%10u ", kstat_cpu(j).irqs[i]);
+ for_each_cpu(j)
+ seq_printf(p, "%10u ", kstat_cpu(j).irqs[i]);
#endif
seq_printf(p, " %14s", irq_desc[i].handler->typename);
seq_printf(p, " %s", action->name);
@@ -179,15 +179,13 @@ skip:
spin_unlock_irqrestore(&irq_desc[i].lock, flags);
} else if (i == NR_IRQS) {
seq_printf(p, "NMI: ");
- for (j = 0; j < NR_CPUS; j++)
- if (cpu_online(j))
- seq_printf(p, "%10u ", nmi_count(j));
+ for_each_cpu(j)
+ seq_printf(p, "%10u ", nmi_count(j));
seq_putc(p, '\n');
#ifdef CONFIG_X86_LOCAL_APIC
seq_printf(p, "LOC: ");
- for (j = 0; j < NR_CPUS; j++)
- if (cpu_online(j))
- seq_printf(p, "%10u ", irq_stat[j].apic_timer_irqs);
+ for_each_cpu(j)
+ seq_printf(p, "%10u ", irq_stat[j].apic_timer_irqs);
seq_putc(p, '\n');
#endif
seq_printf(p, "ERR: %10u\n", atomic_read(&irq_err_count));
@@ -408,6 +406,20 @@ void enable_irq(unsigned int irq)
spin_unlock_irqrestore(&desc->lock, flags);
}
+#ifdef CONFIG_HOTPLUG_CPU
+static int irqs_stabilizing;
+
+#define irq_is_cpu_offline() do { \
+ if (cpu_is_offline(smp_processor_id()) \
+ && (system_state == SYSTEM_RUNNING) \
+ && !irqs_stabilizing) \
+ printk(KERN_ERR "IRQ %i on offline %i\n", \
+ irq, smp_processor_id()); \
+} while (0)
+#else
+#define irq_is_cpu_offline() do {} while (0)
+#endif
+
/*
* do_IRQ handles all normal device IRQ's (the special
* SMP cross-CPU interrupts have their own specific
@@ -431,6 +443,7 @@ asmlinkage unsigned int do_IRQ(struct pt
unsigned int status;
irq_enter();
+ irq_is_cpu_offline();
#ifdef CONFIG_DEBUG_STACKOVERFLOW
/* Debugging check for stack overflow: is there less than 1KB free? */
@@ -1025,7 +1038,51 @@ static int irq_affinity_write_proc(struc
return full_count;
}
+#endif
+
+#ifdef CONFIG_HOTPLUG_CPU
+#include <mach_apic.h>
+void fixup_irqs(void)
+{
+ unsigned int irq;
+ static int warned;
+
+ for (irq = 0; irq < NR_IRQS; irq++) {
+ cpumask_t mask;
+ if (irq == 2)
+ continue;
+
+ cpus_and(mask, irq_affinity[irq], cpu_online_map);
+ if (any_online_cpu(mask) == NR_CPUS) {
+ printk("Breaking affinity for irq %i\n", irq);
+ mask = cpu_online_map;
+ }
+ if (irq_desc[irq].handler->set_affinity)
+ irq_desc[irq].handler->set_affinity(irq, mask);
+ else if (irq_desc[irq].action && !(warned++))
+ printk("Cannot set affinity for irq %i\n", irq);
+ }
+
+#if 0
+ irqs_stabilizing = 1;
+ barrier();
+ /* Ingo Molnar says: "after the IO-APIC masks have been redirected
+ [note the nop - the interrupt-enable boundary on x86 is two
+ instructions from sti] - to flush out pending hardirqs and
+ IPIs. After this point nothing is supposed to reach this CPU." */
+ __asm__ __volatile__("sti; nop; cli");
+ barrier();
+ irqs_stabilizing = 0;
+#else
+ /* That doesn't seem sufficient. Give it 0.02ms. */
+ irqs_stabilizing = 1;
+ local_irq_enable();
+ udelay(20);
+ local_irq_disable();
+ irqs_stabilizing = 0;
+#endif
+}
#endif
static int prof_cpu_mask_read_proc (char *page, char **start, off_t off,
Index: linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/msr.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.8-rc2-mm1/arch/i386/kernel/msr.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 msr.c
--- linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/msr.c 28 Jul 2004 14:35:21 -0000 1.1.1.1
+++ linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/msr.c 31 Jul 2004 19:09:41 -0000
@@ -260,7 +260,7 @@ static struct file_operations msr_fops =
.open = msr_open,
};
-static int msr_class_simple_device_add(int i)
+static int __devinit msr_class_simple_device_add(int i)
{
int err = 0;
struct class_device *class_err;
Index: linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/process.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.8-rc2-mm1/arch/i386/kernel/process.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 process.c
--- linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/process.c 28 Jul 2004 14:35:21 -0000 1.1.1.1
+++ linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/process.c 31 Jul 2004 19:51:06 -0000
@@ -13,6 +13,7 @@
#include <stdarg.h>
+#include <linux/cpu.h>
#include <linux/errno.h>
#include <linux/sched.h>
#include <linux/fs.h>
@@ -54,6 +55,9 @@
#include <linux/irq.h>
#include <linux/err.h>
+#include <asm/tlbflush.h>
+#include <asm/cpu.h>
+
asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");
int hlt_counter;
@@ -132,6 +136,36 @@ static void poll_idle (void)
}
}
+#ifdef CONFIG_HOTPLUG_CPU
+#include <asm/nmi.h>
+/* We don't actually take CPU down, just spin without interrupts. */
+static inline void play_dead(void)
+{
+ disable_APIC_timer();
+
+ /* Ack it */
+ __get_cpu_var(cpu_state) = CPU_DEAD;
+
+ /* We shouldn't have to disable interrupts while dead, but
+ * some interrupts just don't seem to go away, and this makes
+ * it "work" for testing purposes. */
+ /* Death loop */
+ while (__get_cpu_var(cpu_state) != CPU_UP_PREPARE)
+ cpu_relax();
+
+ local_irq_disable();
+ __flush_tlb_all();
+ cpu_set(smp_processor_id(), cpu_online_map);
+ enable_APIC_timer();
+ local_irq_enable();
+}
+#else
+static inline void play_dead(void)
+{
+ BUG();
+}
+#endif /* CONFIG_HOTPLUG_CPU */
+
/*
* The idle thread. There's no useful work to be
* done, so just try to conserve power and have a
@@ -148,6 +182,8 @@ void cpu_idle (void)
if (!idle)
idle = default_idle;
+ if (cpu_is_offline(smp_processor_id()))
+ play_dead();
irq_stat[smp_processor_id()].idle_timestamp = jiffies;
idle();
}
Index: linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/smp.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.8-rc2-mm1/arch/i386/kernel/smp.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 smp.c
--- linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/smp.c 28 Jul 2004 14:35:21 -0000 1.1.1.1
+++ linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/smp.c 31 Jul 2004 20:02:59 -0000
@@ -19,6 +19,7 @@
#include <linux/mc146818rtc.h>
#include <linux/cache.h>
#include <linux/interrupt.h>
+#include <linux/cpu.h>
#include <asm/mtrr.h>
#include <asm/tlbflush.h>
@@ -163,7 +164,7 @@ void send_IPI_mask_bitmask(cpumask_t cpu
unsigned long flags;
local_irq_save(flags);
-
+ WARN_ON(mask & ~cpus_addr(cpu_online_map)[0]);
/*
* Wait for idle.
*/
@@ -345,21 +346,21 @@ out:
static void flush_tlb_others(cpumask_t cpumask, struct mm_struct *mm,
unsigned long va)
{
- cpumask_t tmp;
/*
* A couple of (to be removed) sanity checks:
*
- * - we do not send IPIs to not-yet booted CPUs.
* - current CPU must not be in mask
* - mask must exist :)
*/
BUG_ON(cpus_empty(cpumask));
-
- cpus_and(tmp, cpumask, cpu_online_map);
- BUG_ON(!cpus_equal(cpumask, tmp));
BUG_ON(cpu_isset(smp_processor_id(), cpumask));
BUG_ON(!mm);
+ /* If a CPU which we ran on has gone down, OK. */
+ cpus_and(cpumask, cpumask, cpu_online_map);
+ if (cpus_empty(cpumask))
+ return;
+
/*
* i'm not happy about this global shared spinlock in the
* MM hot path, but we'll see how contended it is.
@@ -484,6 +485,7 @@ void smp_send_nmi_allbutself(void)
*/
void smp_send_reschedule(int cpu)
{
+ WARN_ON(cpu_is_offline(cpu));
send_IPI_mask(cpumask_of_cpu(cpu), RESCHEDULE_VECTOR);
}
@@ -524,10 +526,16 @@ int smp_call_function (void (*func) (voi
*/
{
struct call_data_struct data;
- int cpus = num_online_cpus()-1;
+ int cpus;
- if (!cpus)
+ /* Holding any lock stops cpus from going down. */
+ spin_lock(&call_lock);
+ cpus = num_online_cpus()-1;
+
+ if (!cpus) {
+ spin_unlock(&call_lock);
return 0;
+ }
/* Can deadlock when called with interrupts disabled */
WARN_ON(irqs_disabled());
@@ -539,7 +547,6 @@ int smp_call_function (void (*func) (voi
if (wait)
atomic_set(&data.finished, 0);
- spin_lock(&call_lock);
call_data = &data;
mb();
Index: linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/smpboot.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.8-rc2-mm1/arch/i386/kernel/smpboot.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 smpboot.c
--- linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/smpboot.c 28 Jul 2004 14:35:21 -0000 1.1.1.1
+++ linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/smpboot.c 31 Jul 2004 19:30:38 -0000
@@ -44,6 +44,9 @@
#include <linux/smp_lock.h>
#include <linux/irq.h>
#include <linux/bootmem.h>
+#include <linux/notifier.h>
+#include <linux/cpu.h>
+#include <linux/percpu.h>
#include <linux/delay.h>
#include <linux/mc146818rtc.h>
@@ -88,6 +91,9 @@ extern unsigned char trampoline_end [];
static unsigned char *trampoline_base;
static int trampoline_exec;
+/* State of each CPU. */
+DEFINE_PER_CPU(int, cpu_state) = { 0 };
+
/*
* Currently trivial. Write the real->protected mode
* bootstrap into the page concerned. The caller
@@ -1139,6 +1145,9 @@ static void __init smp_boot_cpus(unsigne
who understands all this stuff should rewrite it properly. --RR 15/Jul/02 */
void __init smp_prepare_cpus(unsigned int max_cpus)
{
+ smp_commenced_mask = cpumask_of_cpu(0);
+ cpu_callin_map = cpumask_of_cpu(0);
+ mb();
smp_boot_cpus(max_cpus);
}
@@ -1148,20 +1157,87 @@ void __devinit smp_prepare_boot_cpu(void
cpu_set(smp_processor_id(), cpu_callout_map);
}
-int __devinit __cpu_up(unsigned int cpu)
+#ifdef CONFIG_HOTPLUG_CPU
+extern void fixup_irqs(void);
+
+/* must be called with the cpucontrol mutex held */
+static int __devinit cpu_enable(unsigned int cpu)
{
- /* This only works at boot for x86. See "rewrite" above. */
- if (cpu_isset(cpu, smp_commenced_mask)) {
- local_irq_enable();
- return -ENOSYS;
+ /* get the target out of its holding state */
+ per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
+ wmb();
+
+ /* wait for the processor to ack it. timeout? */
+ while (!cpu_online(cpu))
+ cpu_relax();
+
+ fixup_irqs();
+ /* counter the disable in fixup_irqs() */
+ local_irq_enable();
+ return 0;
+}
+
+int __cpu_disable(void)
+{
+ /*
+ * Perhaps use cpufreq to drop frequency, but that could go
+ * into generic code.
+ *
+ * We won't take down the boot processor on i386 due to some
+ * interrupts only being able to be serviced by the BSP.
+ * Especially so if we're not using an IOAPIC -zwane
+ */
+ if (smp_processor_id() == 0)
+ return -EBUSY;
+
+ fixup_irqs();
+ return 0;
+}
+
+void __cpu_die(unsigned int cpu)
+{
+ /* We don't do anything here: idle task is faking death itself. */
+ unsigned int i;
+
+ for (i = 0; i < 10; i++) {
+ /* They ack this in play_dead by setting CPU_DEAD */
+ if (per_cpu(cpu_state, cpu) == CPU_DEAD)
+ return;
+ current->state = TASK_UNINTERRUPTIBLE;
+ schedule_timeout(HZ/10);
}
+ printk(KERN_ERR "CPU %u didn't die...\n", cpu);
+}
+#else /* ... !CONFIG_HOTPLUG_CPU */
+int __cpu_disable(void)
+{
+ return -ENOSYS;
+}
+
+void __cpu_die(unsigned int cpu)
+{
+ /* We said "no" in __cpu_disable */
+ BUG();
+}
+#endif /* CONFIG_HOTPLUG_CPU */
+int __devinit __cpu_up(unsigned int cpu)
+{
/* In case one didn't come up */
if (!cpu_isset(cpu, cpu_callin_map)) {
+ printk(KERN_DEBUG "skipping cpu%d, didn't come online\n", cpu);
local_irq_enable();
return -EIO;
}
+#ifdef CONFIG_HOTPLUG_CPU
+ /* Already up, and in cpu_quiescent now? */
+ if (cpu_isset(cpu, smp_commenced_mask)) {
+ cpu_enable(cpu);
+ return 0;
+ }
+#endif
+
local_irq_enable();
/* Unleash the CPU! */
cpu_set(cpu, smp_commenced_mask);
Index: linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/traps.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.8-rc2-mm1/arch/i386/kernel/traps.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 traps.c
--- linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/traps.c 28 Jul 2004 14:35:21 -0000 1.1.1.1
+++ linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/traps.c 31 Jul 2004 20:44:55 -0000
@@ -627,6 +627,14 @@ asmlinkage void do_nmi(struct pt_regs *
nmi_enter();
cpu = smp_processor_id();
+
+#ifdef CONFIG_HOTPLUG_CPU
+ if (!cpu_online(cpu)) {
+ nmi_exit();
+ return;
+ }
+#endif
+
++nmi_count(cpu);
if (!nmi_callback(regs, cpu))
Index: linux-2.6.8-rc2-mm1-lch/include/asm-i386/cpu.h
===================================================================
RCS file: /home/cvsroot/linux-2.6.8-rc2-mm1/include/asm-i386/cpu.h,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 cpu.h
--- linux-2.6.8-rc2-mm1-lch/include/asm-i386/cpu.h 28 Jul 2004 14:35:23 -0000 1.1.1.1
+++ linux-2.6.8-rc2-mm1-lch/include/asm-i386/cpu.h 30 Jul 2004 02:13:42 -0000
@@ -5,6 +5,7 @@
#include <linux/cpu.h>
#include <linux/topology.h>
#include <linux/nodemask.h>
+#include <linux/percpu.h>
#include <asm/node.h>
@@ -26,4 +27,5 @@ static inline int arch_register_cpu(int
return register_cpu(&cpu_devices[num].cpu, num, parent);
}
+DECLARE_PER_CPU(int, cpu_state);
#endif /* _ASM_I386_CPU_H_ */
Index: linux-2.6.8-rc2-mm1-lch/include/asm-i386/smp.h
===================================================================
RCS file: /home/cvsroot/linux-2.6.8-rc2-mm1/include/asm-i386/smp.h,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 smp.h
--- linux-2.6.8-rc2-mm1-lch/include/asm-i386/smp.h 28 Jul 2004 14:35:23 -0000 1.1.1.1
+++ linux-2.6.8-rc2-mm1-lch/include/asm-i386/smp.h 30 Jul 2004 02:12:44 -0000
@@ -84,6 +84,9 @@ static __inline int logical_smp_processo
}
#endif
+
+extern int __cpu_disable(void);
+extern void __cpu_die(unsigned int cpu);
#endif /* !__ASSEMBLY__ */
#define NO_PROC_ID 0xFF /* No processor magic marker */
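The handshake carried by the per-cpu cpu_state variable (play_dead(), cpu_enable() and __cpu_die() in the patch above) can be modelled sequentially. The sketch below is an illustration only: every model_* name is hypothetical, and the real code runs the two sides on different CPUs with the barriers and spinning shown in the patch, which a single-threaded model leaves out.

```c
#include <assert.h>

/* Hypothetical mirror of the states used by the patch. */
enum model_state { MODEL_RUNNING, MODEL_DEAD, MODEL_UP_PREPARE };

struct model_cpu {
    enum model_state state;
    int online;              /* stands in for this CPU's bit in cpu_online_map */
};

/* Assumed to be done by the generic hotplug code: clear the online bit. */
static void model_offline(struct model_cpu *c)
{
    c->online = 0;
}

/* Dying CPU, on entering its death loop: acknowledge by marking itself dead. */
static void model_ack_death(struct model_cpu *c)
{
    c->state = MODEL_DEAD;
}

/* Controlling CPU: the condition polled while waiting for the ack. */
static int model_has_died(const struct model_cpu *c)
{
    return c->state == MODEL_DEAD;
}

/* Controlling CPU: ask the parked CPU to leave its death loop. */
static void model_request_up(struct model_cpu *c)
{
    c->state = MODEL_UP_PREPARE;
}

/*
 * Parked CPU: one pass of the death loop. Returns 1 and marks the CPU
 * online once a bring-up request has been seen, otherwise returns 0.
 */
static int model_death_loop_poll(struct model_cpu *c)
{
    if (c->state != MODEL_UP_PREPARE)
        return 0;
    c->online = 1;
    return 1;
}
```

The controlling side only ever writes MODEL_UP_PREPARE and the parked side only ever writes MODEL_DEAD and its own online flag, so each step has a single writer, which is what lets the real code get by with polling.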
* Re: [PATCH][2.6-mm] i386 Hotplug CPU
2004-07-31 20:53 ` [PATCH][2.6-mm] i386 Hotplug CPU Zwane Mwaikambo
@ 2004-07-31 21:01 ` Zwane Mwaikambo
2004-08-01 3:19 ` Zwane Mwaikambo
2004-08-11 13:50 ` Pavel Machek
2 siblings, 0 replies; 11+ messages in thread
From: Zwane Mwaikambo @ 2004-07-31 21:01 UTC
To: Andrew Morton; +Cc: Linux Kernel, Rusty Russell, lhcs-devel
Oops, bad patch, I'll resend shortly.
* Re: [PATCH][2.6-mm] i386 Hotplug CPU
2004-07-31 20:53 ` [PATCH][2.6-mm] i386 Hotplug CPU Zwane Mwaikambo
2004-07-31 21:01 ` Zwane Mwaikambo
@ 2004-08-01 3:19 ` Zwane Mwaikambo
2004-08-11 13:50 ` Pavel Machek
2 siblings, 0 replies; 11+ messages in thread
From: Zwane Mwaikambo @ 2004-08-01 3:19 UTC
To: Andrew Morton; +Cc: Linux Kernel, Rusty Russell, lhcs-devel
On Sat, 31 Jul 2004, Zwane Mwaikambo wrote:
> Hi Andrew,
> Could you consider taking the i386 arch support bits for cpu
> hotplug? The main purpose is allowing for easier testing at each release.
>
> Pulled from Rusty's previous changelog, plus a few additions (2,11,12)
>
> 1) Add CONFIG_HOTPLUG_CPU
> 2) disable local APIC timer on dead cpus.
> 3) Disable preempt around irq balancing to prevent CPUs going down.
> 4) Print irq stats for all possible cpus.
> 5) Debugging check for interrupts on offline cpus.
> 6) Hacky fixup_irqs() to redirect irqs when cpus go off/online.
> 7) play_dead() for offline cpus to spin inside.
> 8) Handle offline cpus set in flush_tlb_others().
> 9) Grab lock earlier in smp_call_function() to prevent CPUs going down.
> 10) Implement __cpu_disable() and __cpu_die().
> 11) Enable local interrupts in cpu_enable() after fixup_irqs()
> 12) Don't fiddle with NMI on dead cpu, but leave intact on other cpus.
There was a race between the cpu being cleared from cpu_online_map
and local APIC timer interrupts sneaking in, so I've moved the APIC timer
interrupt disable earlier. It survived a couple of hours of stress testing.
arch/i386/Kconfig | 9 ++++
arch/i386/kernel/apic.c | 3 +
arch/i386/kernel/io_apic.c | 2 +
arch/i386/kernel/irq.c | 81 ++++++++++++++++++++++++++++++++++------
arch/i386/kernel/msr.c | 2 -
arch/i386/kernel/process.c | 34 +++++++++++++++++
arch/i386/kernel/smp.c | 25 ++++++++----
arch/i386/kernel/smpboot.c | 89 ++++++++++++++++++++++++++++++++++++++++++---
arch/i386/kernel/traps.c | 8 ++++
include/asm-i386/cpu.h | 2 +
include/asm-i386/smp.h | 3 +
11 files changed, 230 insertions(+), 28 deletions(-)
Index: linux-2.6.8-rc2-mm1-lch/include/asm-i386/cpu.h
===================================================================
RCS file: /home/cvsroot/linux-2.6.8-rc2-mm1/include/asm-i386/cpu.h,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 cpu.h
--- linux-2.6.8-rc2-mm1-lch/include/asm-i386/cpu.h 28 Jul 2004 14:35:23 -0000 1.1.1.1
+++ linux-2.6.8-rc2-mm1-lch/include/asm-i386/cpu.h 30 Jul 2004 02:13:42 -0000
@@ -5,6 +5,7 @@
#include <linux/cpu.h>
#include <linux/topology.h>
#include <linux/nodemask.h>
+#include <linux/percpu.h>
#include <asm/node.h>
@@ -26,4 +27,5 @@ static inline int arch_register_cpu(int
return register_cpu(&cpu_devices[num].cpu, num, parent);
}
+DECLARE_PER_CPU(int, cpu_state);
#endif /* _ASM_I386_CPU_H_ */
Index: linux-2.6.8-rc2-mm1-lch/include/asm-i386/smp.h
===================================================================
RCS file: /home/cvsroot/linux-2.6.8-rc2-mm1/include/asm-i386/smp.h,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 smp.h
--- linux-2.6.8-rc2-mm1-lch/include/asm-i386/smp.h 28 Jul 2004 14:35:23 -0000 1.1.1.1
+++ linux-2.6.8-rc2-mm1-lch/include/asm-i386/smp.h 30 Jul 2004 02:12:44 -0000
@@ -84,6 +84,9 @@ static __inline int logical_smp_processo
}
#endif
+
+extern int __cpu_disable(void);
+extern void __cpu_die(unsigned int cpu);
#endif /* !__ASSEMBLY__ */
#define NO_PROC_ID 0xFF /* No processor magic marker */
Index: linux-2.6.8-rc2-mm1-lch/arch/i386/Kconfig
===================================================================
RCS file: /home/cvsroot/linux-2.6.8-rc2-mm1/arch/i386/Kconfig,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 Kconfig
--- linux-2.6.8-rc2-mm1-lch/arch/i386/Kconfig 28 Jul 2004 14:35:21 -0000 1.1.1.1
+++ linux-2.6.8-rc2-mm1-lch/arch/i386/Kconfig 30 Jul 2004 02:12:44 -0000
@@ -1168,6 +1168,15 @@ config SCx200
This support is also available as a module. If compiled as a
module, it will be called scx200.
+config HOTPLUG_CPU
+ bool "Support for hot-pluggable CPUs (EXPERIMENTAL)"
+ depends on SMP && HOTPLUG && EXPERIMENTAL
+ ---help---
+ Say Y here to experiment with turning CPUs off and on. CPUs
+ can be controlled through /sys/devices/system/cpu.
+
+ Say N.
+
source "drivers/pcmcia/Kconfig"
source "drivers/pci/hotplug/Kconfig"
Index: linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/apic.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.8-rc2-mm1/arch/i386/kernel/apic.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 apic.c
--- linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/apic.c 28 Jul 2004 14:35:21 -0000 1.1.1.1
+++ linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/apic.c 31 Jul 2004 18:16:45 -0000
@@ -26,6 +26,7 @@
#include <linux/mc146818rtc.h>
#include <linux/kernel_stat.h>
#include <linux/sysdev.h>
+#include <linux/cpu.h>
#include <asm/atomic.h>
#include <asm/smp.h>
@@ -1007,7 +1008,7 @@ void __init setup_secondary_APIC_clock(v
local_irq_enable();
}
-void __init disable_APIC_timer(void)
+void __devinit disable_APIC_timer(void)
{
if (using_apic_timer) {
unsigned long v;
Index: linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/io_apic.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.8-rc2-mm1/arch/i386/kernel/io_apic.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 io_apic.c
--- linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/io_apic.c 28 Jul 2004 14:35:21 -0000 1.1.1.1
+++ linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/io_apic.c 30 Jul 2004 02:12:44 -0000
@@ -574,9 +574,11 @@ static int balanced_irq(void *unused)
time_remaining = schedule_timeout(time_remaining);
if (time_after(jiffies,
prev_balance_time+balanced_irq_interval)) {
+ preempt_disable();
do_irq_balance();
prev_balance_time = jiffies;
time_remaining = balanced_irq_interval;
+ preempt_enable();
}
}
return 0;
Index: linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/irq.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.8-rc2-mm1/arch/i386/kernel/irq.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 irq.c
--- linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/irq.c 28 Jul 2004 14:35:21 -0000 1.1.1.1
+++ linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/irq.c 30 Jul 2004 02:12:44 -0000
@@ -34,6 +34,8 @@
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
#include <linux/kallsyms.h>
+#include <linux/notifier.h>
+#include <linux/cpu.h>
#include <asm/atomic.h>
#include <asm/io.h>
@@ -149,9 +151,8 @@ int show_interrupts(struct seq_file *p,
if (i == 0) {
seq_printf(p, " ");
- for (j=0; j<NR_CPUS; j++)
- if (cpu_online(j))
- seq_printf(p, "CPU%d ",j);
+ for_each_cpu(j)
+ seq_printf(p, "CPU%d ",j);
seq_putc(p, '\n');
}
@@ -164,9 +165,8 @@ int show_interrupts(struct seq_file *p,
#ifndef CONFIG_SMP
seq_printf(p, "%10u ", kstat_irqs(i));
#else
- for (j = 0; j < NR_CPUS; j++)
- if (cpu_online(j))
- seq_printf(p, "%10u ", kstat_cpu(j).irqs[i]);
+ for_each_cpu(j)
+ seq_printf(p, "%10u ", kstat_cpu(j).irqs[i]);
#endif
seq_printf(p, " %14s", irq_desc[i].handler->typename);
seq_printf(p, " %s", action->name);
@@ -179,15 +179,13 @@ skip:
spin_unlock_irqrestore(&irq_desc[i].lock, flags);
} else if (i == NR_IRQS) {
seq_printf(p, "NMI: ");
- for (j = 0; j < NR_CPUS; j++)
- if (cpu_online(j))
- seq_printf(p, "%10u ", nmi_count(j));
+ for_each_cpu(j)
+ seq_printf(p, "%10u ", nmi_count(j));
seq_putc(p, '\n');
#ifdef CONFIG_X86_LOCAL_APIC
seq_printf(p, "LOC: ");
- for (j = 0; j < NR_CPUS; j++)
- if (cpu_online(j))
- seq_printf(p, "%10u ", irq_stat[j].apic_timer_irqs);
+ for_each_cpu(j)
+ seq_printf(p, "%10u ", irq_stat[j].apic_timer_irqs);
seq_putc(p, '\n');
#endif
seq_printf(p, "ERR: %10u\n", atomic_read(&irq_err_count));
@@ -408,6 +406,20 @@ void enable_irq(unsigned int irq)
spin_unlock_irqrestore(&desc->lock, flags);
}
+#ifdef CONFIG_HOTPLUG_CPU
+static int irqs_stabilizing;
+
+#define irq_is_cpu_offline() do { \
+ if (cpu_is_offline(smp_processor_id()) \
+ && (system_state == SYSTEM_RUNNING) \
+ && !irqs_stabilizing) \
+ printk(KERN_ERR "IRQ %i on offline %i\n", \
+ irq, smp_processor_id()); \
+} while (0)
+#else
+#define irq_is_cpu_offline() do {} while (0)
+#endif
+
/*
* do_IRQ handles all normal device IRQ's (the special
* SMP cross-CPU interrupts have their own specific
@@ -431,6 +443,7 @@ asmlinkage unsigned int do_IRQ(struct pt
unsigned int status;
irq_enter();
+ irq_is_cpu_offline();
#ifdef CONFIG_DEBUG_STACKOVERFLOW
/* Debugging check for stack overflow: is there less than 1KB free? */
@@ -1025,7 +1038,51 @@ static int irq_affinity_write_proc(struc
return full_count;
}
+#endif
+
+#ifdef CONFIG_HOTPLUG_CPU
+#include <mach_apic.h>
+void fixup_irqs(void)
+{
+ unsigned int irq;
+ static int warned;
+
+ for (irq = 0; irq < NR_IRQS; irq++) {
+ cpumask_t mask;
+ if (irq == 2)
+ continue;
+
+ cpus_and(mask, irq_affinity[irq], cpu_online_map);
+ if (any_online_cpu(mask) == NR_CPUS) {
+ printk("Breaking affinity for irq %i\n", irq);
+ mask = cpu_online_map;
+ }
+ if (irq_desc[irq].handler->set_affinity)
+ irq_desc[irq].handler->set_affinity(irq, mask);
+ else if (irq_desc[irq].action && !(warned++))
+ printk("Cannot set affinity for irq %i\n", irq);
+ }
+
+#if 0
+ irqs_stabilizing = 1;
+ barrier();
+ /* Ingo Molnar says: "after the IO-APIC masks have been redirected
+ [note the nop - the interrupt-enable boundary on x86 is two
+ instructions from sti] - to flush out pending hardirqs and
+ IPIs. After this point nothing is supposed to reach this CPU." */
+ __asm__ __volatile__("sti; nop; cli");
+ barrier();
+ irqs_stabilizing = 0;
+#else
+ /* That doesn't seem sufficient. Give it 0.02ms. */
+ irqs_stabilizing = 1;
+ local_irq_enable();
+ udelay(20);
+ local_irq_disable();
+ irqs_stabilizing = 0;
+#endif
+}
#endif
static int prof_cpu_mask_read_proc (char *page, char **start, off_t off,
Index: linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/msr.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.8-rc2-mm1/arch/i386/kernel/msr.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 msr.c
--- linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/msr.c 28 Jul 2004 14:35:21 -0000 1.1.1.1
+++ linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/msr.c 31 Jul 2004 19:09:41 -0000
@@ -260,7 +260,7 @@ static struct file_operations msr_fops =
.open = msr_open,
};
-static int msr_class_simple_device_add(int i)
+static int __devinit msr_class_simple_device_add(int i)
{
int err = 0;
struct class_device *class_err;
Index: linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/process.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.8-rc2-mm1/arch/i386/kernel/process.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 process.c
--- linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/process.c 28 Jul 2004 14:35:21 -0000 1.1.1.1
+++ linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/process.c 1 Aug 2004 01:43:07 -0000
@@ -13,6 +13,7 @@
#include <stdarg.h>
+#include <linux/cpu.h>
#include <linux/errno.h>
#include <linux/sched.h>
#include <linux/fs.h>
@@ -54,6 +55,9 @@
#include <linux/irq.h>
#include <linux/err.h>
+#include <asm/tlbflush.h>
+#include <asm/cpu.h>
+
asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");
int hlt_counter;
@@ -132,6 +136,34 @@ static void poll_idle (void)
}
}
+#ifdef CONFIG_HOTPLUG_CPU
+#include <asm/nmi.h>
+/* We don't actually take CPU down, just spin without interrupts. */
+static inline void play_dead(void)
+{
+ /* Ack it */
+ __get_cpu_var(cpu_state) = CPU_DEAD;
+
+ /* We shouldn't have to disable interrupts while dead, but
+ * some interrupts just don't seem to go away, and this makes
+ * it "work" for testing purposes. */
+ /* Death loop */
+ while (__get_cpu_var(cpu_state) != CPU_UP_PREPARE)
+ cpu_relax();
+
+ local_irq_disable();
+ __flush_tlb_all();
+ cpu_set(smp_processor_id(), cpu_online_map);
+ enable_APIC_timer();
+ local_irq_enable();
+}
+#else
+static inline void play_dead(void)
+{
+ BUG();
+}
+#endif /* CONFIG_HOTPLUG_CPU */
+
/*
* The idle thread. There's no useful work to be
* done, so just try to conserve power and have a
@@ -148,6 +180,8 @@ void cpu_idle (void)
if (!idle)
idle = default_idle;
+ if (cpu_is_offline(smp_processor_id()))
+ play_dead();
irq_stat[smp_processor_id()].idle_timestamp = jiffies;
idle();
}
Index: linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/smp.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.8-rc2-mm1/arch/i386/kernel/smp.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 smp.c
--- linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/smp.c 28 Jul 2004 14:35:21 -0000 1.1.1.1
+++ linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/smp.c 31 Jul 2004 20:02:59 -0000
@@ -19,6 +19,7 @@
#include <linux/mc146818rtc.h>
#include <linux/cache.h>
#include <linux/interrupt.h>
+#include <linux/cpu.h>
#include <asm/mtrr.h>
#include <asm/tlbflush.h>
@@ -163,7 +164,7 @@ void send_IPI_mask_bitmask(cpumask_t cpu
unsigned long flags;
local_irq_save(flags);
-
+ WARN_ON(mask & ~cpus_addr(cpu_online_map)[0]);
/*
* Wait for idle.
*/
@@ -345,21 +346,21 @@ out:
static void flush_tlb_others(cpumask_t cpumask, struct mm_struct *mm,
unsigned long va)
{
- cpumask_t tmp;
/*
* A couple of (to be removed) sanity checks:
*
- * - we do not send IPIs to not-yet booted CPUs.
* - current CPU must not be in mask
* - mask must exist :)
*/
BUG_ON(cpus_empty(cpumask));
-
- cpus_and(tmp, cpumask, cpu_online_map);
- BUG_ON(!cpus_equal(cpumask, tmp));
BUG_ON(cpu_isset(smp_processor_id(), cpumask));
BUG_ON(!mm);
+ /* If a CPU which we ran on has gone down, OK. */
+ cpus_and(cpumask, cpumask, cpu_online_map);
+ if (cpus_empty(cpumask))
+ return;
+
/*
* i'm not happy about this global shared spinlock in the
* MM hot path, but we'll see how contended it is.
@@ -484,6 +485,7 @@ void smp_send_nmi_allbutself(void)
*/
void smp_send_reschedule(int cpu)
{
+ WARN_ON(cpu_is_offline(cpu));
send_IPI_mask(cpumask_of_cpu(cpu), RESCHEDULE_VECTOR);
}
@@ -524,10 +526,16 @@ int smp_call_function (void (*func) (voi
*/
{
struct call_data_struct data;
- int cpus = num_online_cpus()-1;
+ int cpus;
- if (!cpus)
+ /* Holding any lock stops cpus from going down. */
+ spin_lock(&call_lock);
+ cpus = num_online_cpus()-1;
+
+ if (!cpus) {
+ spin_unlock(&call_lock);
return 0;
+ }
/* Can deadlock when called with interrupts disabled */
WARN_ON(irqs_disabled());
@@ -539,7 +547,6 @@ int smp_call_function (void (*func) (voi
if (wait)
atomic_set(&data.finished, 0);
- spin_lock(&call_lock);
call_data = &data;
mb();
Index: linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/smpboot.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.8-rc2-mm1/arch/i386/kernel/smpboot.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 smpboot.c
--- linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/smpboot.c 28 Jul 2004 14:35:21 -0000 1.1.1.1
+++ linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/smpboot.c 1 Aug 2004 01:45:38 -0000
@@ -44,6 +44,9 @@
#include <linux/smp_lock.h>
#include <linux/irq.h>
#include <linux/bootmem.h>
+#include <linux/notifier.h>
+#include <linux/cpu.h>
+#include <linux/percpu.h>
#include <linux/delay.h>
#include <linux/mc146818rtc.h>
@@ -88,6 +91,9 @@ extern unsigned char trampoline_end [];
static unsigned char *trampoline_base;
static int trampoline_exec;
+/* State of each CPU. */
+DEFINE_PER_CPU(int, cpu_state) = { 0 };
+
/*
* Currently trivial. Write the real->protected mode
* bootstrap into the page concerned. The caller
@@ -1139,6 +1145,9 @@ static void __init smp_boot_cpus(unsigne
who understands all this stuff should rewrite it properly. --RR 15/Jul/02 */
void __init smp_prepare_cpus(unsigned int max_cpus)
{
+ smp_commenced_mask = cpumask_of_cpu(0);
+ cpu_callin_map = cpumask_of_cpu(0);
+ mb();
smp_boot_cpus(max_cpus);
}
@@ -1148,20 +1157,90 @@ void __devinit smp_prepare_boot_cpu(void
cpu_set(smp_processor_id(), cpu_callout_map);
}
-int __devinit __cpu_up(unsigned int cpu)
+#ifdef CONFIG_HOTPLUG_CPU
+extern void fixup_irqs(void);
+
+/* must be called with the cpucontrol mutex held */
+static int __devinit cpu_enable(unsigned int cpu)
{
- /* This only works at boot for x86. See "rewrite" above. */
- if (cpu_isset(cpu, smp_commenced_mask)) {
- local_irq_enable();
- return -ENOSYS;
+ /* get the target out of its holding state */
+ per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;
+ wmb();
+
+ /* wait for the processor to ack it. timeout? */
+ while (!cpu_online(cpu))
+ cpu_relax();
+
+ fixup_irqs();
+ /* counter the disable in fixup_irqs() */
+ local_irq_enable();
+ return 0;
+}
+
+int __cpu_disable(void)
+{
+ /*
+ * Perhaps use cpufreq to drop frequency, but that could go
+ * into generic code.
+ *
+ * We won't take down the boot processor on i386 due to some
+ * interrupts only being able to be serviced by the BSP.
+ * Especially so if we're not using an IOAPIC -zwane
+ */
+ if (smp_processor_id() == 0)
+ return -EBUSY;
+
+ fixup_irqs();
+
+ /* We enable the timer again on the exit path of the death loop */
+ disable_APIC_timer();
+ return 0;
+}
+
+void __cpu_die(unsigned int cpu)
+{
+ /* We don't do anything here: idle task is faking death itself. */
+ unsigned int i;
+
+ for (i = 0; i < 10; i++) {
+ /* They ack this in play_dead by setting CPU_DEAD */
+ if (per_cpu(cpu_state, cpu) == CPU_DEAD)
+ return;
+ current->state = TASK_UNINTERRUPTIBLE;
+ schedule_timeout(HZ/10);
}
+ printk(KERN_ERR "CPU %u didn't die...\n", cpu);
+}
+#else /* ... !CONFIG_HOTPLUG_CPU */
+int __cpu_disable(void)
+{
+ return -ENOSYS;
+}
+void __cpu_die(unsigned int cpu)
+{
+ /* We said "no" in __cpu_disable */
+ BUG();
+}
+#endif /* CONFIG_HOTPLUG_CPU */
+
+int __devinit __cpu_up(unsigned int cpu)
+{
/* In case one didn't come up */
if (!cpu_isset(cpu, cpu_callin_map)) {
+ printk(KERN_DEBUG "skipping cpu%d, didn't come online\n", cpu);
local_irq_enable();
return -EIO;
}
+#ifdef CONFIG_HOTPLUG_CPU
+ /* Already up, and in cpu_quiescent now? */
+ if (cpu_isset(cpu, smp_commenced_mask)) {
+ cpu_enable(cpu);
+ return 0;
+ }
+#endif
+
local_irq_enable();
/* Unleash the CPU! */
cpu_set(cpu, smp_commenced_mask);
Index: linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/traps.c
===================================================================
RCS file: /home/cvsroot/linux-2.6.8-rc2-mm1/arch/i386/kernel/traps.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 traps.c
--- linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/traps.c 28 Jul 2004 14:35:21 -0000 1.1.1.1
+++ linux-2.6.8-rc2-mm1-lch/arch/i386/kernel/traps.c 31 Jul 2004 20:44:55 -0000
@@ -627,6 +627,14 @@ asmlinkage void do_nmi(struct pt_regs *
nmi_enter();
cpu = smp_processor_id();
+
+#ifdef CONFIG_HOTPLUG_CPU
+ if (!cpu_online(cpu)) {
+ nmi_exit();
+ return;
+ }
+#endif
+
++nmi_count(cpu);
if (!nmi_callback(regs, cpu))
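The affinity handling in fixup_irqs() above reduces to one mask rule: keep the part of an IRQ's affinity that is still online, and fall back to every online CPU when nothing is left. A minimal user-space sketch of that rule follows. The cpu_mask type and pick_irq_mask() are hypothetical stand-ins for illustration, not kernel interfaces, and the sketch assumes at most 32 CPUs.

```c
#include <stdint.h>

/* Hypothetical user-space stand-in for cpumask_t: one bit per CPU. */
typedef uint32_t cpu_mask;

/*
 * Return the mask an IRQ should be routed to once some CPUs are offline.
 * *broken is set to 1 when the requested affinity contained no online CPU
 * and had to be widened to all online CPUs (the "Breaking affinity" case),
 * and to 0 otherwise.
 */
static cpu_mask pick_irq_mask(cpu_mask affinity, cpu_mask online, int *broken)
{
	cpu_mask mask = affinity & online;

	*broken = (mask == 0);
	if (*broken)
		mask = online;
	return mask;
}
```

With CPUs 0 and 2 online, an IRQ bound only to CPU 1 is widened to both online CPUs, while an IRQ bound to CPUs 1 and 2 simply loses CPU 1.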
* Re: [PATCH][2.6-mm] i386 Hotplug CPU
2004-07-31 20:53 ` [PATCH][2.6-mm] i386 Hotplug CPU Zwane Mwaikambo
2004-07-31 21:01 ` Zwane Mwaikambo
2004-08-01 3:19 ` Zwane Mwaikambo
@ 2004-08-11 13:50 ` Pavel Machek
2004-08-12 0:44 ` [lhcs-devel] " Zwane Mwaikambo
2 siblings, 1 reply; 11+ messages in thread
From: Pavel Machek @ 2004-08-11 13:50 UTC (permalink / raw)
To: Zwane Mwaikambo; +Cc: Andrew Morton, Linux Kernel, Rusty Russell, lhcs-devel
Hi!
> +#ifdef CONFIG_HOTPLUG_CPU
> +#include <asm/nmi.h>
> +/* We don't actually take CPU down, just spin without interrupts. */
> +static inline void play_dead(void)
> +{
Well... if this can be fixed to really take the CPU down, it will
be immediately useful for suspend-to-ram. If it is made to
at least survive registers being overwritten, it will be useful
for suspend-to-disk...
Pavel
--
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms
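For reference, the handshake that play_dead() takes part in can be modelled in user space as a small state machine. This is only a sketch under stated assumptions: struct fake_cpu, the ST_* states and the three functions are hypothetical names, ST_RUNNING is a simplification that the patch does not have, and the real code spins on a per-CPU variable rather than being stepped.

```c
#include <stdbool.h>

/* Hypothetical states; the patch uses CPU_DEAD and CPU_UP_PREPARE. */
enum state { ST_RUNNING, ST_DEAD, ST_UP_PREPARE };

struct fake_cpu {
	enum state state;
	bool online;
};

/* Controller side: mark the CPU offline so its idle loop parks it. */
static void request_offline(struct fake_cpu *c)
{
	c->online = false;
}

/* Controller side: release the CPU from its holding state. */
static void request_online(struct fake_cpu *c)
{
	c->state = ST_UP_PREPARE;
}

/* Target side: one pass of the idle loop. */
static void idle_step(struct fake_cpu *c)
{
	if (c->online)
		return;                 /* still online: normal idling */
	if (c->state == ST_RUNNING) {
		c->state = ST_DEAD;     /* ack the offline request */
	} else if (c->state == ST_UP_PREPARE) {
		c->online = true;       /* leave the death loop */
		c->state = ST_RUNNING;
	}
	/* ST_DEAD: keep spinning until released */
}
```

The model shows why the CPU is never really taken down: it only parks in the idle loop until the controller flips the state back.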
* Re: [lhcs-devel] Re: [PATCH][2.6-mm] i386 Hotplug CPU
2004-08-11 13:50 ` Pavel Machek
@ 2004-08-12 0:44 ` Zwane Mwaikambo
2004-08-15 3:19 ` Zwane Mwaikambo
0 siblings, 1 reply; 11+ messages in thread
From: Zwane Mwaikambo @ 2004-08-12 0:44 UTC (permalink / raw)
To: Pavel Machek; +Cc: Andrew Morton, Linux Kernel, Rusty Russell, lhcs-devel
Hi Pavel,
On Wed, 11 Aug 2004, Pavel Machek wrote:
> > +#ifdef CONFIG_HOTPLUG_CPU
> > +#include <asm/nmi.h>
> > +/* We don't actually take CPU down, just spin without interrupts. */
> > +static inline void play_dead(void)
> > +{
>
> Well... if this can be fixed to really take the CPU down, it will
> be immediately useful for suspend-to-ram. If it is made to
> at least survive registers being overwritten, it will be useful
> for suspend-to-disk...
Yeah, I recall you mentioning this earlier. I'll look into adding the
necessary bits so that you have enough state to resume from. Your
mentioning this was one of the reasons I wanted this in.
Thanks,
Zwane
* Re: [lhcs-devel] Re: [PATCH][2.6-mm] i386 Hotplug CPU
2004-08-12 0:44 ` [lhcs-devel] " Zwane Mwaikambo
@ 2004-08-15 3:19 ` Zwane Mwaikambo
2004-08-15 14:46 ` Pavel Machek
0 siblings, 1 reply; 11+ messages in thread
From: Zwane Mwaikambo @ 2004-08-15 3:19 UTC (permalink / raw)
To: Pavel Machek; +Cc: Andrew Morton, Linux Kernel, Rusty Russell, lhcs-devel
On Wed, 11 Aug 2004, Zwane Mwaikambo wrote:
> Yeah, I recall you mentioning this earlier. I'll look into adding the
> necessary bits so that you have enough state to resume from. Your
> mentioning this was one of the reasons I wanted this in.
Pavel, considering that the processor is in a quiescent state when it's in
the idle thread, can't we simply restart them all when we do the final
sleep? So on the resume, we steer the APs straight into the offline cpu
spin and manually bring them up again when the BSP has resumed? I reckon
we don't have to save any state at all. I probably don't have the full
picture yet so feel free to set me straight.
Thanks,
Zwane
* Re: [lhcs-devel] Re: [PATCH][2.6-mm] i386 Hotplug CPU
2004-08-15 3:19 ` Zwane Mwaikambo
@ 2004-08-15 14:46 ` Pavel Machek
2004-08-15 15:03 ` Zwane Mwaikambo
0 siblings, 1 reply; 11+ messages in thread
From: Pavel Machek @ 2004-08-15 14:46 UTC (permalink / raw)
To: Zwane Mwaikambo; +Cc: Andrew Morton, Linux Kernel, Rusty Russell, lhcs-devel
Hi!
> > Yeah, I recall you mentioning this earlier. I'll look into adding the
> > necessary bits so that you have enough state to resume from. Your
> > mentioning this was one of the reasons I wanted this in.
>
> Pavel, considering that the processor is in a quiescent state when it's in
> the idle thread, can't we simply restart them all when we do the final
> sleep? So on the resume, we steer the APs straight into the offline cpu
> spin and manually bring them up again when the BSP has resumed? I
> reckon
Sorry, I do not understand what AP and BSP mean in this context.
> we don't have to save any state at all. I probably don't have the full
> picture yet so feel free to set me straight.
Yes, we can just shut those cpus down on suspend and completely boot
them from real mode during resume... that should work. And we will
need to do that during suspend-to-ram.
Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!
* Re: [lhcs-devel] Re: [PATCH][2.6-mm] i386 Hotplug CPU
2004-08-15 14:46 ` Pavel Machek
@ 2004-08-15 15:03 ` Zwane Mwaikambo
0 siblings, 0 replies; 11+ messages in thread
From: Zwane Mwaikambo @ 2004-08-15 15:03 UTC (permalink / raw)
To: Pavel Machek; +Cc: Andrew Morton, Linux Kernel, Rusty Russell, lhcs-devel
On Sun, 15 Aug 2004, Pavel Machek wrote:
> > > Yeah, I recall you mentioning this earlier. I'll look into adding the
> > > necessary bits so that you have enough state to resume from. Your
> > > mentioning this was one of the reasons I wanted this in.
> >
> > Pavel, considering that the processor is in a quiescent state when it's in
> > the idle thread, can't we simply restart them all when we do the final
> > sleep? So on the resume, we steer the APs straight into the offline cpu
> > spin and manually bring them up again when the BSP has resumed? I
> > reckon
>
> Sorry, I do not understand what AP and BSP mean in this context.
My mistake, Application and Bootstrap Processors.
> Yes, we can just shut those cpus down on suspend and completely boot
> them from real mode during resume... that should work. And we will
> need to do that during suspend-to-ram.
Thanks, I just wanted to run that by you first.
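The flow agreed on in this exchange (take the secondary CPUs down before suspend, bring them back once the boot CPU has resumed) might be sketched roughly as below. This is a user-space illustration only: struct smp_state, suspend_secondaries() and resume_secondaries() are hypothetical names, CPU 0 is assumed to be the boot CPU, and a real implementation would go through the hotplug paths rather than flip bits in a mask.

```c
#include <stdint.h>

#define BOOT_CPU 0	/* assumption: CPU 0 is the bootstrap processor */

/* Hypothetical stand-in for cpumask_t: one bit per CPU. */
typedef uint32_t cpu_mask;

struct smp_state {
	cpu_mask online;	/* CPUs currently running */
	cpu_mask parked;	/* secondaries taken down for suspend */
};

/* Offline every CPU except the boot CPU, remembering which ones were up. */
static void suspend_secondaries(struct smp_state *s)
{
	cpu_mask boot = (cpu_mask)1 << BOOT_CPU;

	s->parked = s->online & ~boot;
	s->online &= boot;
}

/* After the boot CPU has resumed, bring the remembered CPUs back up. */
static void resume_secondaries(struct smp_state *s)
{
	s->online |= s->parked;
	s->parked = 0;
}
```

Recording the parked set keeps CPUs that were already offline before suspend from being brought up on resume.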
Thread overview: 11+ messages
2004-07-26 19:37 [PATCH] fixes for rcu_offline_cpu, rcu_move_batch (2.6.8-rc2) Nathan Lynch
2004-07-27 0:01 ` Andrew Morton
2004-07-27 5:49 ` Zwane Mwaikambo
2004-07-31 20:53 ` [PATCH][2.6-mm] i386 Hotplug CPU Zwane Mwaikambo
2004-07-31 21:01 ` Zwane Mwaikambo
2004-08-01 3:19 ` Zwane Mwaikambo
2004-08-11 13:50 ` Pavel Machek
2004-08-12 0:44 ` [lhcs-devel] " Zwane Mwaikambo
2004-08-15 3:19 ` Zwane Mwaikambo
2004-08-15 14:46 ` Pavel Machek
2004-08-15 15:03 ` Zwane Mwaikambo