[PATCH 0/3] x86: fix hang when AP bringup is too slow

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/3] x86: fix hang when AP bringup is too slow
@ 2014-03-13 14:25 Igor Mammedov
  2014-03-13 14:25 ` [PATCH 1/3] x86: replace timeouts when booting secondary CPU with infinite wait loop Igor Mammedov
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Igor Mammedov @ 2014-03-13 14:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: tglx, mingo, hpa, imammedo, bp, paul.gortmaker, JBeulich, prarit,
	drjones, toshi.kani, x86, riel, gong.chen

Hang is observed on virtual machines during CPU hotplug,
especially in big guests with many CPUs. (It happens more
often if host is over-committed).

Hang happens because master CPU timeouts on waiting till
AP boots and 'cancels' CPU online operation assuming AP
is not functional but AP may continue run wild later
causing various hangs or panics in running kernel that
is assuming that AP was offline.

This is an alternative approach, that instead of canceling
in-progress AP bringup (https://lkml.org/lkml/2014/3/6/257),
removes timeouts so that AP bringup won't be affected by
poor timing and syncs AP with master CPU at early startup
making sure that AP won't run wild if master CPU doesn't
expect AP to come online.

Below is detailed description of a more often happening hang:
-----
Master CPU may timeout before cpu_callin_mask is set and cancel
booting CPU, but being onlined CPU still continues to boot, sets
cpu_active_mask (CPU_STARTING notifiers) and spins in
check_tsc_sync_target() for master cpu to arrive. Following attempt
to online another cpu hangs in stop_machine, initiated from here:
smp_callin ->
  smp_store_cpu_info ->
    identify_secondary_cpu ->
      mtrr_ap_init -> set_mtrr_from_inactive_cpu

stop_machine waits on completion of stop_work on all CPUs from
cpu_active_mask including a failed CPU that spins in check_tsc_sync_target().

Igor Mammedov (3):
  x86: replace timeouts when booting secondary CPU with infinite wait
    loop
  x86: halt secondary CPU if master doesn't wait on it
  x86: cleanup not needed cpu_initialized_mask

 arch/x86/include/asm/cpumask.h |    1 -
 arch/x86/kernel/cpu/common.c   |   35 ++++++++++++-------
 arch/x86/kernel/smpboot.c      |   70 ++--------------------------------------
 3 files changed, 25 insertions(+), 81 deletions(-)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/3] x86: replace timeouts when booting secondary CPU with infinite wait loop
  2014-03-13 14:25 [PATCH 0/3] x86: fix hang when AP bringup is too slow Igor Mammedov
@ 2014-03-13 14:25 ` Igor Mammedov
  2014-03-13 14:25 ` [PATCH 2/3] x86: halt secondary CPU if master doesn't wait on it Igor Mammedov
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: Igor Mammedov @ 2014-03-13 14:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: tglx, mingo, hpa, imammedo, bp, paul.gortmaker, JBeulich, prarit,
	drjones, toshi.kani, x86, riel, gong.chen

Hang is observed on virtual machines during CPU hotplug,
especially in big guests with many CPUs. (It reproducible
more often if host is over-committed).

It happens because master CPU gives up waiting on
secondary CPU and allows it to run wild. As result
AP causes locking or crashing system. For example
as described here: https://lkml.org/lkml/2014/3/6/257

If master CPU have sent STARTUP IPI successfully,
make it wait indefinitely till AP boots.
To ensure that AP won't ever run wild, make it
wait at early startup till master CPU confirms its
intention to wait for it.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 arch/x86/kernel/cpu/common.c |   18 ++++++++++-
 arch/x86/kernel/smpboot.c    |   68 ++----------------------------------------
 2 files changed, 19 insertions(+), 67 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 8e28bf2..e048e87 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1235,16 +1235,22 @@ void cpu_init(void)
 	struct task_struct *me;
 	struct tss_struct *t;
 	unsigned long v;
-	int cpu;
+	int cpu = stack_smp_processor_id();
 	int i;
 
 	/*
+	 * wait till the master CPU completes it's STARTUP sequence,
+	 * and decides to wait till this AP boots
+	 */
+	while (!cpumask_test_cpu(cpu, cpu_callout_mask))
+		cpu_relax();
+
+	/*
 	 * Load microcode on this cpu if a valid microcode is available.
 	 * This is early microcode loading procedure.
 	 */
 	load_ucode_ap();
 
-	cpu = stack_smp_processor_id();
 	t = &per_cpu(init_tss, cpu);
 	oist = &per_cpu(orig_ist, cpu);
 
@@ -1335,6 +1341,14 @@ void cpu_init(void)
 	struct tss_struct *t = &per_cpu(init_tss, cpu);
 	struct thread_struct *thread = &curr->thread;
 
+	/*
+	 * wait till the master CPU completes it's STARTUP sequence,
+	 * and decides to wait till this AP boots
+	 */
+	while (!cpumask_test_cpu(cpu, cpu_callout_mask))
+		cpu_relax();
+
+
 	show_ucode_info_early();
 
 	if (cpumask_test_and_set_cpu(cpu, cpu_initialized_mask)) {
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 37e11e5..8bf67bd 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -111,7 +111,6 @@ atomic_t init_deasserted;
 static void smp_callin(void)
 {
 	int cpuid, phys_id;
-	unsigned long timeout;
 
 	/*
 	 * If waken up by an INIT in an 82489DX configuration
@@ -129,37 +128,6 @@ static void smp_callin(void)
 	 * (This works even if the APIC is not enabled.)
 	 */
 	phys_id = read_apic_id();
-	if (cpumask_test_cpu(cpuid, cpu_callin_mask)) {
-		panic("%s: phys CPU#%d, CPU#%d already present??\n", __func__,
-					phys_id, cpuid);
-	}
-	pr_debug("CPU#%d (phys ID: %d) waiting for CALLOUT\n", cpuid, phys_id);
-
-	/*
-	 * STARTUP IPIs are fragile beasts as they might sometimes
-	 * trigger some glue motherboard logic. Complete APIC bus
-	 * silence for 1 second, this overestimates the time the
-	 * boot CPU is spending to send the up to 2 STARTUP IPIs
-	 * by a factor of two. This should be enough.
-	 */
-
-	/*
-	 * Waiting 2s total for startup (udelay is not yet working)
-	 */
-	timeout = jiffies + 2*HZ;
-	while (time_before(jiffies, timeout)) {
-		/*
-		 * Has the boot CPU finished it's STARTUP sequence?
-		 */
-		if (cpumask_test_cpu(cpuid, cpu_callout_mask))
-			break;
-		cpu_relax();
-	}
-
-	if (!time_before(jiffies, timeout)) {
-		panic("%s: CPU%d started up but did not get a callout!\n",
-		      __func__, cpuid);
-	}
 
 	/*
 	 * the boot CPU has finished the init stage and is spinning
@@ -749,7 +717,6 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
 	unsigned long start_ip = real_mode_header->trampoline_start;
 
 	unsigned long boot_error = 0;
-	int timeout;
 	int cpu0_nmi_registered = 0;
 
 	/* Just in case we booted with a single CPU. */
@@ -818,12 +785,9 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
 		pr_debug("After Callout %d\n", cpu);
 
 		/*
-		 * Wait 5s total for a response
+		 * Wait till AP completes initial initialization
 		 */
-		for (timeout = 0; timeout < 50000; timeout++) {
-			if (cpumask_test_cpu(cpu, cpu_callin_mask))
-				break;	/* It has booted */
-			udelay(100);
+		while (!cpumask_test_cpu(cpu, cpu_callin_mask)) {
 			/*
 			 * Allow other tasks to run while we wait for the
 			 * AP to come online. This also gives a chance
@@ -832,33 +796,7 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
 			 */
 			schedule();
 		}
-
-		if (cpumask_test_cpu(cpu, cpu_callin_mask)) {
-			print_cpu_msr(&cpu_data(cpu));
-			pr_debug("CPU%d: has booted.\n", cpu);
-		} else {
-			boot_error = 1;
-			if (*trampoline_status == 0xA5A5A5A5)
-				/* trampoline started but...? */
-				pr_err("CPU%d: Stuck ??\n", cpu);
-			else
-				/* trampoline code not run */
-				pr_err("CPU%d: Not responding\n", cpu);
-			if (apic->inquire_remote_apic)
-				apic->inquire_remote_apic(apicid);
-		}
-	}
-
-	if (boot_error) {
-		/* Try to put things back the way they were before ... */
-		numa_remove_cpu(cpu); /* was set by numa_add_cpu */
-
-		/* was set by do_boot_cpu() */
-		cpumask_clear_cpu(cpu, cpu_callout_mask);
-
-		/* was set by cpu_init() */
-		cpumask_clear_cpu(cpu, cpu_initialized_mask);
-
+	} else {
 		set_cpu_present(cpu, false);
 		per_cpu(x86_cpu_to_apicid, cpu) = BAD_APICID;
 	}
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/3] x86: halt secondary CPU if master doesn't wait on it
  2014-03-13 14:25 [PATCH 0/3] x86: fix hang when AP bringup is too slow Igor Mammedov
  2014-03-13 14:25 ` [PATCH 1/3] x86: replace timeouts when booting secondary CPU with infinite wait loop Igor Mammedov
@ 2014-03-13 14:25 ` Igor Mammedov
  2014-03-13 14:25 ` [PATCH 3/3] x86: cleanup not needed cpu_initialized_mask Igor Mammedov
  2014-03-18 12:21 ` [PATCH 0/3] x86: fix hang when AP bringup is too slow Prarit Bhargava
  3 siblings, 0 replies; 10+ messages in thread
From: Igor Mammedov @ 2014-03-13 14:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: tglx, mingo, hpa, imammedo, bp, paul.gortmaker, JBeulich, prarit,
	drjones, toshi.kani, x86, riel, gong.chen

if during CPU hotplug master CPU had errors during
sending AP wake up IPIs, it fails hotplug operation.
But AP might start and spin on never set
cpu_callout_mask indefinitely, consuming power or
host VCPU resources needlessly.

On the other hand, master CPU marks AP with
BAD_APICID if hotplug attempt failed, so use this
fact to halt CPU to avoid needless spinning.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 arch/x86/kernel/cpu/common.c |   10 ++++++++--
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index e048e87..9f07b8e 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1242,8 +1242,11 @@ void cpu_init(void)
 	 * wait till the master CPU completes it's STARTUP sequence,
 	 * and decides to wait till this AP boots
 	 */
-	while (!cpumask_test_cpu(cpu, cpu_callout_mask))
+	while (!cpumask_test_cpu(cpu, cpu_callout_mask)) {
 		cpu_relax();
+		if (per_cpu(x86_cpu_to_apicid, cpu) == BAD_APICID)
+			halt();
+	}
 
 	/*
 	 * Load microcode on this cpu if a valid microcode is available.
@@ -1345,8 +1348,11 @@ void cpu_init(void)
 	 * wait till the master CPU completes it's STARTUP sequence,
 	 * and decides to wait till this AP boots
 	 */
-	while (!cpumask_test_cpu(cpu, cpu_callout_mask))
+	while (!cpumask_test_cpu(cpu, cpu_callout_mask)) {
 		cpu_relax();
+		if (per_cpu(x86_cpu_to_apicid, cpu) == BAD_APICID)
+			halt();
+	}
 
 
 	show_ucode_info_early();
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/3] x86: cleanup not needed cpu_initialized_mask
  2014-03-13 14:25 [PATCH 0/3] x86: fix hang when AP bringup is too slow Igor Mammedov
  2014-03-13 14:25 ` [PATCH 1/3] x86: replace timeouts when booting secondary CPU with infinite wait loop Igor Mammedov
  2014-03-13 14:25 ` [PATCH 2/3] x86: halt secondary CPU if master doesn't wait on it Igor Mammedov
@ 2014-03-13 14:25 ` Igor Mammedov
  2014-03-18 12:21 ` [PATCH 0/3] x86: fix hang when AP bringup is too slow Prarit Bhargava
  3 siblings, 0 replies; 10+ messages in thread
From: Igor Mammedov @ 2014-03-13 14:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: tglx, mingo, hpa, imammedo, bp, paul.gortmaker, JBeulich, prarit,
	drjones, toshi.kani, x86, riel, gong.chen

cpu_initialized_mask is used in cpu_init() for detecting
if AP has been already started. But now test could never
return true since master CPU doesn't give up waiting on
AP startup and native_cpu_up() checks for:

 physid_isset(apicid, phys_cpu_present_map) &&
 cpu_callin_mask

so that it would not attempt to call do_boot_cpu() on
already started AP.
And since cpu_initialized_mask isn't used for anything
else just remove it altogether.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 arch/x86/include/asm/cpumask.h |    1 -
 arch/x86/kernel/cpu/common.c   |   11 -----------
 arch/x86/kernel/smpboot.c      |    2 --
 3 files changed, 0 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/cpumask.h b/arch/x86/include/asm/cpumask.h
index 61c852f..c64c5f5 100644
--- a/arch/x86/include/asm/cpumask.h
+++ b/arch/x86/include/asm/cpumask.h
@@ -5,7 +5,6 @@
 
 extern cpumask_var_t cpu_callin_mask;
 extern cpumask_var_t cpu_callout_mask;
-extern cpumask_var_t cpu_initialized_mask;
 extern cpumask_var_t cpu_sibling_setup_mask;
 
 extern void setup_cpu_local_masks(void);
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 9f07b8e..d1382f9 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -47,7 +47,6 @@
 #include "cpu.h"
 
 /* all of these masks are initialized in setup_cpu_local_masks() */
-cpumask_var_t cpu_initialized_mask;
 cpumask_var_t cpu_callout_mask;
 cpumask_var_t cpu_callin_mask;
 
@@ -57,7 +56,6 @@ cpumask_var_t cpu_sibling_setup_mask;
 /* correctly size the local cpu masks */
 void __init setup_cpu_local_masks(void)
 {
-	alloc_bootmem_cpumask_var(&cpu_initialized_mask);
 	alloc_bootmem_cpumask_var(&cpu_callin_mask);
 	alloc_bootmem_cpumask_var(&cpu_callout_mask);
 	alloc_bootmem_cpumask_var(&cpu_sibling_setup_mask);
@@ -1265,9 +1263,6 @@ void cpu_init(void)
 
 	me = current;
 
-	if (cpumask_test_and_set_cpu(cpu, cpu_initialized_mask))
-		panic("CPU#%d already initialized!\n", cpu);
-
 	pr_debug("Initializing CPU#%d\n", cpu);
 
 	clear_in_cr4(X86_CR4_VME|X86_CR4_PVI|X86_CR4_TSD|X86_CR4_DE);
@@ -1357,12 +1352,6 @@ void cpu_init(void)
 
 	show_ucode_info_early();
 
-	if (cpumask_test_and_set_cpu(cpu, cpu_initialized_mask)) {
-		printk(KERN_WARNING "CPU#%d already initialized!\n", cpu);
-		for (;;)
-			local_irq_enable();
-	}
-
 	printk(KERN_INFO "Initializing CPU#%d\n", cpu);
 
 	if (cpu_has_vme || cpu_has_tsc || cpu_has_de)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 8bf67bd..803f658 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1237,8 +1237,6 @@ static void __ref remove_cpu_from_maps(int cpu)
 	set_cpu_online(cpu, false);
 	cpumask_clear_cpu(cpu, cpu_callout_mask);
 	cpumask_clear_cpu(cpu, cpu_callin_mask);
-	/* was set by cpu_init() */
-	cpumask_clear_cpu(cpu, cpu_initialized_mask);
 	numa_remove_cpu(cpu);
 }
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/3] x86: fix hang when AP bringup is too slow
  2014-03-13 14:25 [PATCH 0/3] x86: fix hang when AP bringup is too slow Igor Mammedov
                   ` (2 preceding siblings ...)
  2014-03-13 14:25 ` [PATCH 3/3] x86: cleanup not needed cpu_initialized_mask Igor Mammedov
@ 2014-03-18 12:21 ` Prarit Bhargava
  2014-03-18 18:49   ` Igor Mammedov
  3 siblings, 1 reply; 10+ messages in thread
From: Prarit Bhargava @ 2014-03-18 12:21 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: linux-kernel, tglx, mingo, hpa, bp, paul.gortmaker, JBeulich,
	drjones, toshi.kani, x86, riel, gong.chen

On 03/13/2014 10:25 AM, Igor Mammedov wrote:
> Hang is observed on virtual machines during CPU hotplug,
> especially in big guests with many CPUs. (It happens more
> often if host is over-committed).
> 

Hey Igor, I like this better than the previous version.  Thanks for taking into
account the possible races in this code.

A quick question on system behaviour.  As you know I've been more concerned
lately with error handling, etc., through the cpu hotplug code as we've seen
several customer reports of silent failures or cascading failures in the cpu
hotplug code when users have been attempting to perform physical hotplug.

After your patches have been applied, in theory the following can happen:

The master CPU is completing the AP cpu's bring up.  The AP cpu is doing (sorry
for the cut-and-paste),

void cpu_init(void)
{
        int cpu = smp_processor_id();
        struct task_struct *curr = current;
        struct tss_struct *t = &per_cpu(init_tss, cpu);
        struct thread_struct *thread = &curr->thread;

        /*
         * wait till the master CPU completes it's STARTUP sequence,
         * and decides to wait till this AP boots
         */
        while (!cpumask_test_cpu(cpu, cpu_callout_mask)) {
                cpu_relax();
                if (per_cpu(x86_cpu_to_apicid, cpu) == BAD_APICID)
                        halt();
        }

and is spinning on cpu_relax().  Suppose something goes wrong and the softlockup
watchdog fires on the AP cpu:

1.  Can it? :) ie) will the softlockup fire at this point of the AP init?  Okay,
I'm being really lazy and not looking at the code ;)

2.  Is there anything we can do in this code to notify the user of a problem?
Even a pr_crit() here I think would help to indicate what went wrong; it might
be useful for future debugging in this area to have some sort of output.  I
think a WARN() or BUG() is necessary here as there are several calls to cpu_init().

3.  Change this comment:

         * wait till the master CPU completes it's STARTUP sequence,
         * and decides to wait till this AP boots

to

	/* wait for the master CPU to complete this cpu's STARTUP. */ ?

Apologies for the late review,

P.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/3] x86: fix hang when AP bringup is too slow
  2014-03-18 12:21 ` [PATCH 0/3] x86: fix hang when AP bringup is too slow Prarit Bhargava
@ 2014-03-18 18:49   ` Igor Mammedov
  2014-03-19 11:51     ` Prarit Bhargava
  0 siblings, 1 reply; 10+ messages in thread
From: Igor Mammedov @ 2014-03-18 18:49 UTC (permalink / raw)
  To: Prarit Bhargava
  Cc: linux-kernel, tglx, mingo, hpa, bp, paul.gortmaker, JBeulich,
	drjones, toshi.kani, x86, riel, gong.chen

On Tue, 18 Mar 2014 08:21:19 -0400
Prarit Bhargava <prarit@redhat.com> wrote:

> 
> 
> On 03/13/2014 10:25 AM, Igor Mammedov wrote:
> > Hang is observed on virtual machines during CPU hotplug,
> > especially in big guests with many CPUs. (It happens more
> > often if host is over-committed).
> > 
> 
> Hey Igor, I like this better than the previous version.  Thanks for taking into
> account the possible races in this code.
> 
> A quick question on system behaviour.  As you know I've been more concerned
> lately with error handling, etc., through the cpu hotplug code as we've seen
> several customer reports of silent failures or cascading failures in the cpu
> hotplug code when users have been attempting to perform physical hotplug.
> 
> After your patches have been applied, in theory the following can happen:
> 
> The master CPU is completing the AP cpu's bring up.  The AP cpu is doing (sorry
> for the cut-and-paste),
> 
> void cpu_init(void)
> {
>         int cpu = smp_processor_id();
>         struct task_struct *curr = current;
>         struct tss_struct *t = &per_cpu(init_tss, cpu);
>         struct thread_struct *thread = &curr->thread;
> 
>         /*
>          * wait till the master CPU completes it's STARTUP sequence,
>          * and decides to wait till this AP boots
>          */
>         while (!cpumask_test_cpu(cpu, cpu_callout_mask)) {
>                 cpu_relax();
>                 if (per_cpu(x86_cpu_to_apicid, cpu) == BAD_APICID)
>                         halt();
>         }
> 
> and is spinning on cpu_relax().  Suppose something goes wrong and the softlockup
> watchdog fires on the AP cpu:
> 
> 1.  Can it? :) ie) will the softlockup fire at this point of the AP init?  Okay,
> I'm being really lazy and not looking at the code ;)
It shouldn't, CPU is in pristine state and just came from boot trampoline at
this point without interrupts configured yet.

> 
> 2.  Is there anything we can do in this code to notify the user of a problem?
> Even a pr_crit() here I think would help to indicate what went wrong; it might
> be useful for future debugging in this area to have some sort of output.  I
> think a WARN() or BUG() is necessary here as there are several calls to cpu_init().
Do you mean something like this:

+		if (per_cpu(x86_cpu_to_apicid, cpu) == BAD_APICID) {
+                       WARN(1);
+			halt();
+               }

> 
> 3.  Change this comment:
> 
>          * wait till the master CPU completes it's STARTUP sequence,
>          * and decides to wait till this AP boots
> 
> to
> 
> 	/* wait for the master CPU to complete this cpu's STARTUP. */ ?
well, that is not quite the same as above, comment should underline that
AP waits for ACK from master CPU before continuing with this AP initialization.

How about:
/* wait for ACK from master CPU before continuing with AP initialization */

> 
> Apologies for the late review,
> 
> P.


-- 
Regards,
  Igor

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/3] x86: fix hang when AP bringup is too slow
  2014-03-18 18:49   ` Igor Mammedov
@ 2014-03-19 11:51     ` Prarit Bhargava
  2014-03-19 12:54       ` Igor Mammedov
  0 siblings, 1 reply; 10+ messages in thread
From: Prarit Bhargava @ 2014-03-19 11:51 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: linux-kernel, tglx, mingo, hpa, bp, paul.gortmaker, JBeulich,
	drjones, toshi.kani, x86, riel, gong.chen



On 03/18/2014 02:49 PM, Igor Mammedov wrote:
> On Tue, 18 Mar 2014 08:21:19 -0400
> Prarit Bhargava <prarit@redhat.com> wrote:
> 
>>
>>
>> On 03/13/2014 10:25 AM, Igor Mammedov wrote:
>>> Hang is observed on virtual machines during CPU hotplug,
>>> especially in big guests with many CPUs. (It happens more
>>> often if host is over-committed).
>>>
>>
>> Hey Igor, I like this better than the previous version.  Thanks for taking into
>> account the possible races in this code.
>>
>> A quick question on system behaviour.  As you know I've been more concerned
>> lately with error handling, etc., through the cpu hotplug code as we've seen
>> several customer reports of silent failures or cascading failures in the cpu
>> hotplug code when users have been attempting to perform physical hotplug.
>>
>> After your patches have been applied, in theory the following can happen:
>>
>> The master CPU is completing the AP cpu's bring up.  The AP cpu is doing (sorry
>> for the cut-and-paste),
>>
>> void cpu_init(void)
>> {
>>         int cpu = smp_processor_id();
>>         struct task_struct *curr = current;
>>         struct tss_struct *t = &per_cpu(init_tss, cpu);
>>         struct thread_struct *thread = &curr->thread;
>>
>>         /*
>>          * wait till the master CPU completes it's STARTUP sequence,
>>          * and decides to wait till this AP boots
>>          */
>>         while (!cpumask_test_cpu(cpu, cpu_callout_mask)) {
>>                 cpu_relax();
>>                 if (per_cpu(x86_cpu_to_apicid, cpu) == BAD_APICID)
>>                         halt();
>>         }
>>
>> and is spinning on cpu_relax().  Suppose something goes wrong and the softlockup
>> watchdog fires on the AP cpu:
>>
>> 1.  Can it? :) ie) will the softlockup fire at this point of the AP init?  Okay,
>> I'm being really lazy and not looking at the code ;)
> It shouldn't, CPU is in pristine state and just came from boot trampoline at
> this point without interrupts configured yet.

Okay, not a big problem.

> 
>>
>> 2.  Is there anything we can do in this code to notify the user of a problem?
>> Even a pr_crit() here I think would help to indicate what went wrong; it might
>> be useful for future debugging in this area to have some sort of output.  I
>> think a WARN() or BUG() is necessary here as there are several calls to cpu_init().
> Do you mean something like this:
> 
> +		if (per_cpu(x86_cpu_to_apicid, cpu) == BAD_APICID) {
> +                       WARN(1);
> +			halt();
> +               }

Yeah, maybe WARN_ON(1, "some comment") though.

> 
>>
>> 3.  Change this comment:
>>
>>          * wait till the master CPU completes it's STARTUP sequence,
>>          * and decides to wait till this AP boots
>>
>> to
>>
>> 	/* wait for the master CPU to complete this cpu's STARTUP. */ ?
> well, that is not quite the same as above, comment should underline that
> AP waits for ACK from master CPU before continuing with this AP initialization.
> 
> How about:
> /* wait for ACK from master CPU before continuing with AP initialization */

Awesome :)

P.

> 
>>
>> Apologies for the late review,
>>
>> P.
> 
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/3] x86: fix hang when AP bringup is too slow
  2014-03-19 11:51     ` Prarit Bhargava
@ 2014-03-19 12:54       ` Igor Mammedov
  2014-03-25 11:36         ` Prarit Bhargava
  0 siblings, 1 reply; 10+ messages in thread
From: Igor Mammedov @ 2014-03-19 12:54 UTC (permalink / raw)
  To: Prarit Bhargava
  Cc: linux-kernel, tglx, mingo, hpa, bp, paul.gortmaker, JBeulich,
	drjones, toshi.kani, x86, riel, gong.chen

On Wed, 19 Mar 2014 07:51:05 -0400
Prarit Bhargava <prarit@redhat.com> wrote:

> 
> 
> On 03/18/2014 02:49 PM, Igor Mammedov wrote:
> > On Tue, 18 Mar 2014 08:21:19 -0400
> > Prarit Bhargava <prarit@redhat.com> wrote:
> > 
> >>
> >>
> >> On 03/13/2014 10:25 AM, Igor Mammedov wrote:
> >>> Hang is observed on virtual machines during CPU hotplug,
> >>> especially in big guests with many CPUs. (It happens more
> >>> often if host is over-committed).
> >>>
> >>
> >> Hey Igor, I like this better than the previous version.  Thanks for taking into
> >> account the possible races in this code.
> >>
> >> A quick question on system behaviour.  As you know I've been more concerned
> >> lately with error handling, etc., through the cpu hotplug code as we've seen
> >> several customer reports of silent failures or cascading failures in the cpu
> >> hotplug code when users have been attempting to perform physical hotplug.
> >>
> >> After your patches have been applied, in theory the following can happen:
> >>
> >> The master CPU is completing the AP cpu's bring up.  The AP cpu is doing (sorry
> >> for the cut-and-paste),
> >>
> >> void cpu_init(void)
> >> {
> >>         int cpu = smp_processor_id();
> >>         struct task_struct *curr = current;
> >>         struct tss_struct *t = &per_cpu(init_tss, cpu);
> >>         struct thread_struct *thread = &curr->thread;
> >>
> >>         /*
> >>          * wait till the master CPU completes it's STARTUP sequence,
> >>          * and decides to wait till this AP boots
> >>          */
> >>         while (!cpumask_test_cpu(cpu, cpu_callout_mask)) {
> >>                 cpu_relax();
> >>                 if (per_cpu(x86_cpu_to_apicid, cpu) == BAD_APICID)
> >>                         halt();
> >>         }
> >>
> >> and is spinning on cpu_relax().  Suppose something goes wrong and the softlockup
> >> watchdog fires on the AP cpu:
> >>
> >> 1.  Can it? :) ie) will the softlockup fire at this point of the AP init?  Okay,
> >> I'm being really lazy and not looking at the code ;)
> > It shouldn't, CPU is in pristine state and just came from boot trampoline at
> > this point without interrupts configured yet.
> 
> Okay, not a big problem.
> 
> > 
> >>
> >> 2.  Is there anything we can do in this code to notify the user of a problem?
> >> Even a pr_crit() here I think would help to indicate what went wrong; it might
> >> be useful for future debugging in this area to have some sort of output.  I
> >> think a WARN() or BUG() is necessary here as there are several calls to cpu_init().
> > Do you mean something like this:
> > 
> > +		if (per_cpu(x86_cpu_to_apicid, cpu) == BAD_APICID) {
> > +                       WARN(1);
> > +			halt();
> > +               }
> 
> Yeah, maybe WARN_ON(1, "some comment") though.
printk at so early stage might be cause issues, since it is quite complex.
Its' disabling/enabling irqs, calls *_delay_*() functions and takes locks.
The last is especially dangerous because if AP is shot down by another
INIT/SIPI, system will hang on next printk if locks were acquired by AP
at that time.
That case is possible if master CPU has got errors during wakeup_ap() and
failed cpu_up() then it was unplugged + plugged via ACPI and attempted
to be onlined again. 

It's much safer not to do anything complex at AP start-up so early.

BTW:
when AP reaches halt() line, failure is not silent. the master CPU might
print error message if debug level logging is active:
see arch/x86/kernel/smpboot.c:native_cpu_up()
...
        if (err) {
                pr_debug("do_boot_cpu failed %d\n", err);
                return -EIO;
        }
...

perhaps we should change pr_debug to pr_crit here to make it more visible.
something like:

@@ -858,7 +858,7 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
 
        err = do_boot_cpu(apicid, cpu, tidle);
        if (err) {
-               pr_debug("do_boot_cpu failed %d\n", err);
+               pr_crit("do_boot_cpu failed(%d) to wakeup CPU#%u\n", err, cpu);
                return -EIO;
        }


> 
> > 
> >>
> >> 3.  Change this comment:
> >>
> >>          * wait till the master CPU completes it's STARTUP sequence,
> >>          * and decides to wait till this AP boots
> >>
> >> to
> >>
> >> 	/* wait for the master CPU to complete this cpu's STARTUP. */ ?
> > well, that is not quite the same as above, comment should underline that
> > AP waits for ACK from master CPU before continuing with this AP initialization.
> > 
> > How about:
> > /* wait for ACK from master CPU before continuing with AP initialization */
> 
> Awesome :)
> 
> P.
> 
> > 
> >>
> >> Apologies for the late review,
> >>
> >> P.
> > 
> > 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/3] x86: fix hang when AP bringup is too slow
  2014-03-19 12:54       ` Igor Mammedov
@ 2014-03-25 11:36         ` Prarit Bhargava
  2014-03-25 12:44           ` Igor Mammedov
  0 siblings, 1 reply; 10+ messages in thread
From: Prarit Bhargava @ 2014-03-25 11:36 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: linux-kernel, tglx, mingo, hpa, bp, paul.gortmaker, JBeulich,
	drjones, toshi.kani, x86, riel, gong.chen



On 03/19/2014 08:54 AM, Igor Mammedov wrote:
> On Wed, 19 Mar 2014 07:51:05 -0400
> Prarit Bhargava <prarit@redhat.com> wrote:
> 
>>
>>
>> On 03/18/2014 02:49 PM, Igor Mammedov wrote:
>>> On Tue, 18 Mar 2014 08:21:19 -0400
>>> Prarit Bhargava <prarit@redhat.com> wrote:
>>>
>>>>
>>>>
>>>> On 03/13/2014 10:25 AM, Igor Mammedov wrote:
>>>>> Hang is observed on virtual machines during CPU hotplug,
>>>>> especially in big guests with many CPUs. (It happens more
>>>>> often if host is over-committed).
>>>>>
>>>>
>>>> Hey Igor, I like this better than the previous version.  Thanks for taking into
>>>> account the possible races in this code.
>>>>
>>>> A quick question on system behaviour.  As you know I've been more concerned
>>>> lately with error handling, etc., through the cpu hotplug code as we've seen
>>>> several customer reports of silent failures or cascading failures in the cpu
>>>> hotplug code when users have been attempting to perform physical hotplug.
>>>>
>>>> After your patches have been applied, in theory the following can happen:
>>>>
>>>> The master CPU is completing the AP cpu's bring up.  The AP cpu is doing (sorry
>>>> for the cut-and-paste),
>>>>
>>>> void cpu_init(void)
>>>> {
>>>>         int cpu = smp_processor_id();
>>>>         struct task_struct *curr = current;
>>>>         struct tss_struct *t = &per_cpu(init_tss, cpu);
>>>>         struct thread_struct *thread = &curr->thread;
>>>>
>>>>         /*
>>>>          * wait till the master CPU completes it's STARTUP sequence,
>>>>          * and decides to wait till this AP boots
>>>>          */
>>>>         while (!cpumask_test_cpu(cpu, cpu_callout_mask)) {
>>>>                 cpu_relax();
>>>>                 if (per_cpu(x86_cpu_to_apicid, cpu) == BAD_APICID)
>>>>                         halt();
>>>>         }
>>>>
>>>> and is spinning on cpu_relax().  Suppose something goes wrong and the softlockup
>>>> watchdog fires on the AP cpu:
>>>>
>>>> 1.  Can it? :) ie) will the softlockup fire at this point of the AP init?  Okay,
>>>> I'm being really lazy and not looking at the code ;)
>>> It shouldn't, CPU is in pristine state and just came from boot trampoline at
>>> this point without interrupts configured yet.
>>
>> Okay, not a big problem.
>>
>>>
>>>>
>>>> 2.  Is there anything we can do in this code to notify the user of a problem?
>>>> Even a pr_crit() here I think would help to indicate what went wrong; it might
>>>> be useful for future debugging in this area to have some sort of output.  I
>>>> think a WARN() or BUG() is necessary here as there are several calls to cpu_init().
>>> Do you mean something like this:
>>>
>>> +		if (per_cpu(x86_cpu_to_apicid, cpu) == BAD_APICID) {
>>> +                       WARN(1);
>>> +			halt();
>>> +               }
>>
>> Yeah, maybe WARN_ON(1, "some comment") though.
> printk at so early stage might be cause issues, since it is quite complex.
> Its' disabling/enabling irqs, calls *_delay_*() functions and takes locks.
> The last is especially dangerous because if AP is shot down by another
> INIT/SIPI, system will hang on next printk if locks were acquired by AP
> at that time.

early_printk()?

> That case is possible if master CPU has got errors during wakeup_ap() and
> failed cpu_up() then it was unplugged + plugged via ACPI and attempted
> to be onlined again. 
> 
> It's much safer not to do anything complex at AP start-up so early.
> 
> BTW:
> when AP reaches halt() line, failure is not silent. the master CPU might
> print error message if debug level logging is active:
> see arch/x86/kernel/smpboot.c:native_cpu_up()
> ...
>         if (err) {
>                 pr_debug("do_boot_cpu failed %d\n", err);
>                 return -EIO;
>         }
> ...
> 
> perhaps we should change pr_debug to pr_crit here to make it more visible.
> something like:
> 
> @@ -858,7 +858,7 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
>  
>         err = do_boot_cpu(apicid, cpu, tidle);
>         if (err) {
> -               pr_debug("do_boot_cpu failed %d\n", err);
> +               pr_crit("do_boot_cpu failed(%d) to wakeup CPU#%u\n", err, cpu);
>                 return -EIO;
>         }
> 

Yes, this is a good idea.

> 
>>
>>>
>>>>
>>>> 3.  Change this comment:
>>>>
>>>>          * wait till the master CPU completes it's STARTUP sequence,
>>>>          * and decides to wait till this AP boots
>>>>
>>>> to
>>>>
>>>> 	/* wait for the master CPU to complete this cpu's STARTUP. */ ?
>>> well, that is not quite the same as above, comment should underline that
>>> AP waits for ACK from master CPU before continuing with this AP initialization.
>>>
>>> How about:
>>> /* wait for ACK from master CPU before continuing with AP initialization */
>>
>> Awesome :)
>>
>> P.
>>
>>>
>>>>
>>>> Apologies for the late review,
>>>>
>>>> P.
>>>
>>>
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/3] x86: fix hang when AP bringup is too slow
  2014-03-25 11:36         ` Prarit Bhargava
@ 2014-03-25 12:44           ` Igor Mammedov
  0 siblings, 0 replies; 10+ messages in thread
From: Igor Mammedov @ 2014-03-25 12:44 UTC (permalink / raw)
  To: Prarit Bhargava
  Cc: linux-kernel, tglx, mingo, hpa, bp, paul.gortmaker, JBeulich,
	drjones, toshi.kani, x86, riel, gong.chen

On Tue, 25 Mar 2014 07:36:07 -0400
Prarit Bhargava <prarit@redhat.com> wrote:

> 
> 
> On 03/19/2014 08:54 AM, Igor Mammedov wrote:
> > On Wed, 19 Mar 2014 07:51:05 -0400
> > Prarit Bhargava <prarit@redhat.com> wrote:
> > 
> >>
> >>
> >> On 03/18/2014 02:49 PM, Igor Mammedov wrote:
> >>> On Tue, 18 Mar 2014 08:21:19 -0400
> >>> Prarit Bhargava <prarit@redhat.com> wrote:
> >>>
> >>>>
> >>>>
> >>>> On 03/13/2014 10:25 AM, Igor Mammedov wrote:
> >>>>> Hang is observed on virtual machines during CPU hotplug,
> >>>>> especially in big guests with many CPUs. (It happens more
> >>>>> often if host is over-committed).
> >>>>>
> >>>>
> >>>> Hey Igor, I like this better than the previous version.  Thanks for taking into
> >>>> account the possible races in this code.
> >>>>
> >>>> A quick question on system behaviour.  As you know I've been more concerned
> >>>> lately with error handling, etc., through the cpu hotplug code as we've seen
> >>>> several customer reports of silent failures or cascading failures in the cpu
> >>>> hotplug code when users have been attempting to perform physical hotplug.
> >>>>
> >>>> After your patches have been applied, in theory the following can happen:
> >>>>
> >>>> The master CPU is completing the AP cpu's bring up.  The AP cpu is doing (sorry
> >>>> for the cut-and-paste),
> >>>>
> >>>> void cpu_init(void)
> >>>> {
> >>>>         int cpu = smp_processor_id();
> >>>>         struct task_struct *curr = current;
> >>>>         struct tss_struct *t = &per_cpu(init_tss, cpu);
> >>>>         struct thread_struct *thread = &curr->thread;
> >>>>
> >>>>         /*
> >>>>          * wait till the master CPU completes it's STARTUP sequence,
> >>>>          * and decides to wait till this AP boots
> >>>>          */
> >>>>         while (!cpumask_test_cpu(cpu, cpu_callout_mask)) {
> >>>>                 cpu_relax();
> >>>>                 if (per_cpu(x86_cpu_to_apicid, cpu) == BAD_APICID)
> >>>>                         halt();
> >>>>         }
> >>>>
> >>>> and is spinning on cpu_relax().  Suppose something goes wrong and the softlockup
> >>>> watchdog fires on the AP cpu:
> >>>>
> >>>> 1.  Can it? :) ie) will the softlockup fire at this point of the AP init?  Okay,
> >>>> I'm being really lazy and not looking at the code ;)
> >>> It shouldn't, CPU is in pristine state and just came from boot trampoline at
> >>> this point without interrupts configured yet.
> >>
> >> Okay, not a big problem.
> >>
> >>>
> >>>>
> >>>> 2.  Is there anything we can do in this code to notify the user of a problem?
> >>>> Even a pr_crit() here I think would help to indicate what went wrong; it might
> >>>> be useful for future debugging in this area to have some sort of output.  I
> >>>> think a WARN() or BUG() is necessary here as there are several calls to cpu_init().
> >>> Do you mean something like this:
> >>>
> >>> +		if (per_cpu(x86_cpu_to_apicid, cpu) == BAD_APICID) {
> >>> +                       WARN(1);
> >>> +			halt();
> >>> +               }
> >>
> >> Yeah, maybe WARN_ON(1, "some comment") though.
> > printk at so early stage might be cause issues, since it is quite complex.
> > Its' disabling/enabling irqs, calls *_delay_*() functions and takes locks.
> > The last is especially dangerous because if AP is shot down by another
> > INIT/SIPI, system will hang on next printk if locks were acquired by AP
> > at that time.
> 
> early_printk()?
early_printk could mess with console output, when another CPU does output
on the associated console.

Just to be safe I'd avoid to do anything here, provided that master CPU
will print error message.


> 
> > That case is possible if master CPU has got errors during wakeup_ap() and
> > failed cpu_up() then it was unplugged + plugged via ACPI and attempted
> > to be onlined again. 
> > 
> > It's much safer not to do anything complex at AP start-up so early.
> > 
> > BTW:
> > when AP reaches halt() line, failure is not silent. the master CPU might
> > print error message if debug level logging is active:
> > see arch/x86/kernel/smpboot.c:native_cpu_up()
> > ...
> >         if (err) {
> >                 pr_debug("do_boot_cpu failed %d\n", err);
> >                 return -EIO;
> >         }
> > ...
> > 
> > perhaps we should change pr_debug to pr_crit here to make it more visible.
> > something like:
> > 
> > @@ -858,7 +858,7 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
> >  
> >         err = do_boot_cpu(apicid, cpu, tidle);
> >         if (err) {
> > -               pr_debug("do_boot_cpu failed %d\n", err);
> > +               pr_crit("do_boot_cpu failed(%d) to wakeup CPU#%u\n", err, cpu);
> >                 return -EIO;
> >         }
> > 
> 
> Yes, this is a good idea.
Ok, I'll respin series with above patch added.

> 
> > 
> >>
> >>>
> >>>>
> >>>> 3.  Change this comment:
> >>>>
> >>>>          * wait till the master CPU completes it's STARTUP sequence,
> >>>>          * and decides to wait till this AP boots
> >>>>
> >>>> to
> >>>>
> >>>> 	/* wait for the master CPU to complete this cpu's STARTUP. */ ?
> >>> well, that is not quite the same as above, comment should underline that
> >>> AP waits for ACK from master CPU before continuing with this AP initialization.
> >>>
> >>> How about:
> >>> /* wait for ACK from master CPU before continuing with AP initialization */
> >>
> >> Awesome :)
> >>
> >> P.
> >>
> >>>
> >>>>
> >>>> Apologies for the late review,
> >>>>
> >>>> P.
> >>>
> >>>
> > 


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2014-03-25 15:31 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-03-13 14:25 [PATCH 0/3] x86: fix hang when AP bringup is too slow Igor Mammedov
2014-03-13 14:25 ` [PATCH 1/3] x86: replace timeouts when booting secondary CPU with infinite wait loop Igor Mammedov
2014-03-13 14:25 ` [PATCH 2/3] x86: halt secondary CPU if master doesn't wait on it Igor Mammedov
2014-03-13 14:25 ` [PATCH 3/3] x86: cleanup not needed cpu_initialized_mask Igor Mammedov
2014-03-18 12:21 ` [PATCH 0/3] x86: fix hang when AP bringup is too slow Prarit Bhargava
2014-03-18 18:49   ` Igor Mammedov
2014-03-19 11:51     ` Prarit Bhargava
2014-03-19 12:54       ` Igor Mammedov
2014-03-25 11:36         ` Prarit Bhargava
2014-03-25 12:44           ` Igor Mammedov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).