From: Igor Mammedov <imammedo@redhat.com>
To: Prarit Bhargava <prarit@redhat.com>
Cc: linux-kernel@vger.kernel.org, tglx@linutronix.de,
	mingo@redhat.com, hpa@zytor.com, bp@suse.de,
	paul.gortmaker@windriver.com, JBeulich@suse.com,
	drjones@redhat.com, toshi.kani@hp.com, x86@kernel.org,
	riel@redhat.com, gong.chen@linux.intel.com
Subject: Re: [PATCH 0/3] x86: fix hang when AP bringup is too slow
Date: Tue, 18 Mar 2014 19:49:51 +0100
Message-ID: <20140318194951.17fd61ea@thinkpad>
In-Reply-To: <53283A3F.6040302@redhat.com>

On Tue, 18 Mar 2014 08:21:19 -0400
Prarit Bhargava <prarit@redhat.com> wrote:

> 
> 
> On 03/13/2014 10:25 AM, Igor Mammedov wrote:
> > Hang is observed on virtual machines during CPU hotplug,
> > especially in big guests with many CPUs. (It happens more
> > often if host is over-committed).
> > 
> 
> Hey Igor, I like this better than the previous version.  Thanks for taking into
> account the possible races in this code.
> 
> A quick question on system behaviour.  As you know I've been more concerned
> lately with error handling, etc., through the cpu hotplug code as we've seen
> several customer reports of silent failures or cascading failures in the cpu
> hotplug code when users have been attempting to perform physical hotplug.
> 
> After your patches have been applied, in theory the following can happen:
> 
> The master CPU is completing the AP cpu's bring up.  The AP cpu is doing (sorry
> for the cut-and-paste),
> 
> void cpu_init(void)
> {
>         int cpu = smp_processor_id();
>         struct task_struct *curr = current;
>         struct tss_struct *t = &per_cpu(init_tss, cpu);
>         struct thread_struct *thread = &curr->thread;
> 
>         /*
>          * wait till the master CPU completes it's STARTUP sequence,
>          * and decides to wait till this AP boots
>          */
>         while (!cpumask_test_cpu(cpu, cpu_callout_mask)) {
>                 cpu_relax();
>                 if (per_cpu(x86_cpu_to_apicid, cpu) == BAD_APICID)
>                         halt();
>         }
> 
> and is spinning on cpu_relax().  Suppose something goes wrong and the softlockup
> watchdog fires on the AP cpu:
> 
> 1.  Can it? :) ie) will the softlockup fire at this point of the AP init?  Okay,
> I'm being really lazy and not looking at the code ;)
It shouldn't. At this point the CPU is in a pristine state: it has just come
out of the boot trampoline and interrupts are not configured yet.

> 
> 2.  Is there anything we can do in this code to notify the user of a problem?
> Even a pr_crit() here I think would help to indicate what went wrong; it might
> be useful for future debugging in this area to have some sort of output.  I
> think a WARN() or BUG() is necessary here as there are several calls to cpu_init().
Do you mean something like this?

+		if (per_cpu(x86_cpu_to_apicid, cpu) == BAD_APICID) {
+			WARN_ON(1);
+			halt();
+		}

> 
> 3.  Change this comment:
> 
>          * wait till the master CPU completes it's STARTUP sequence,
>          * and decides to wait till this AP boots
> 
> to
> 
> 	/* wait for the master CPU to complete this cpu's STARTUP. */ ?
Well, that is not quite the same as the above. The comment should underline
that the AP waits for an ACK from the master CPU before continuing with its
own initialization.

How about:
/* wait for ACK from master CPU before continuing with AP initialization */
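
To make the handshake that comment describes concrete, here is a minimal
userspace sketch of the AP side of the wait. It is not kernel code: the
struct ap_slot flags are hypothetical stand-ins for this CPU's bit in
cpu_callout_mask and for x86_cpu_to_apicid being set to BAD_APICID, and the
max_polls budget exists only so the model terminates (the loop quoted above
spins without a limit).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-AP state, standing in for the kernel's per-CPU data. */
struct ap_slot {
	atomic_bool called_out;	/* models this CPU's bit in cpu_callout_mask */
	atomic_bool apicid_bad;	/* models x86_cpu_to_apicid == BAD_APICID */
};

enum ap_outcome {
	AP_CONTINUE,		/* master ACKed, AP goes on with its init */
	AP_HALT,		/* master gave up on this AP, AP must halt */
	AP_STILL_WAITING	/* model only: poll budget ran out */
};

/*
 * Poll in the same order as the quoted loop: the ACK is tested first, and
 * the "master gave up" flag is only looked at while the ACK is missing.
 */
static enum ap_outcome ap_wait_for_ack(struct ap_slot *slot,
				       unsigned long max_polls)
{
	for (unsigned long i = 0; i < max_polls; i++) {
		if (atomic_load(&slot->called_out))
			return AP_CONTINUE;
		if (atomic_load(&slot->apicid_bad))
			return AP_HALT;
	}
	return AP_STILL_WAITING;
}
```

With both flags clear, ap_wait_for_ack(&slot, 10) returns AP_STILL_WAITING.
Once apicid_bad is set it returns AP_HALT, and once called_out is set it
returns AP_CONTINUE even if apicid_bad is also set, because the ACK is
checked first.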

> 
> Apologies for the late review,
> 
> P.


-- 
Regards,
  Igor


Thread overview: 10+ messages
2014-03-13 14:25 [PATCH 0/3] x86: fix hang when AP bringup is too slow Igor Mammedov
2014-03-13 14:25 ` [PATCH 1/3] x86: replace timeouts when booting secondary CPU with infinite wait loop Igor Mammedov
2014-03-13 14:25 ` [PATCH 2/3] x86: halt secondary CPU if master doesn't wait on it Igor Mammedov
2014-03-13 14:25 ` [PATCH 3/3] x86: cleanup not needed cpu_initialized_mask Igor Mammedov
2014-03-18 12:21 ` [PATCH 0/3] x86: fix hang when AP bringup is too slow Prarit Bhargava
2014-03-18 18:49   ` Igor Mammedov [this message]
2014-03-19 11:51     ` Prarit Bhargava
2014-03-19 12:54       ` Igor Mammedov
2014-03-25 11:36         ` Prarit Bhargava
2014-03-25 12:44           ` Igor Mammedov
