From: Igor Mammedov <imammedo@redhat.com>
To: Andi Kleen <andi@firstfloor.org>
Cc: linux-kernel@vger.kernel.org, tglx@linutronix.de,
mingo@redhat.com, hpa@zytor.com, x86@kernel.org, bp@suse.de,
paul.gortmaker@windriver.com, JBeulich@suse.com,
prarit@redhat.com, drjones@redhat.com, toshi.kani@hp.com,
riel@redhat.com, gong.chen@linux.intel.com
Subject: Re: [PATCH v2 1/5] x86: replace timeouts when booting secondary CPU with infinite wait loop
Date: Wed, 2 Apr 2014 23:29:56 +0200
Message-ID: <20140402232956.54848fbe@thinkpad>
In-Reply-To: <87ppkzk5zi.fsf@tassilo.jf.intel.com>
On Wed, 02 Apr 2014 10:15:29 -0700
Andi Kleen <andi@firstfloor.org> wrote:
> Igor Mammedov <imammedo@redhat.com> writes:
>
> > A hang is observed on virtual machines during CPU hotplug,
> > especially in big guests with many CPUs. (It is reproducible
> > more often if the host is over-committed.)
> >
> > It happens because the master CPU gives up waiting on the
> > secondary CPU and allows it to run wild. As a result the
> > AP locks up or crashes the system. For example,
> > as described here: https://lkml.org/lkml/2014/3/6/257
> >
> > If the master CPU has sent the STARTUP IPI successfully,
> > make it wait indefinitely till the AP boots.
>
>
> But what happens on a real machine when the other CPU is dead?
One possible way to boot such a machine would be to disable the dead CPU
via kernel parameters.
> I've seen that. Kernel still boots. With your patch it would
> hang.
>
> I don't think you can do that. It needs to have some timeout.
> Maybe a longer or configurable one?
There was a patch that tried to keep the timeouts and 'gracefully'
cancel the AP boot if the master timed out on it:
https://lkml.org/lkml/2014/3/6/257
It's possible to keep the timeouts in do_boot_cpu(). Is setting
trampoline_status a sufficient indication that the AP is not dead
and is worth waiting for?
Then it could be rewritten like this:
	if (!boot_error) {
		boot_error = 1;
		for (timeout = 0; timeout < 50000; timeout++) {
			/* Wait till AP signals that it's ready to start initialization */
			if (*trampoline_status == 0xA5A5A5A5) {
				boot_error = 0;
				/* Allow AP to start initializing */
				cpumask_set_cpu(cpu, cpu_callout_mask);
				/* Wait, with no timeout, till AP reaches the cpu_callin_mask point */
				while (!cpumask_test_cpu(cpu, cpu_callin_mask))
					schedule();
				break; /* It has booted */
			}
			udelay(100);
		}
	}
It will provide a timeout if the AP is dead and still keep the AP from
running wild if the master CPU timed out on it.
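
To illustrate the two-phase wait outside the kernel, here is a minimal
single-threaded user-space sketch. It is not kernel code and makes several
assumptions: struct sim_ap, sim_tick() and sim_boot_cpu() are hypothetical
names invented for this illustration, one call to sim_tick() stands in for
one udelay(100)/schedule() step, and the plain struct fields stand in for
trampoline_status, cpu_callout_mask and cpu_callin_mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define AP_ALIVE_MAGIC 0xA5A5A5A5u

/* Hypothetical model of one secondary CPU (AP); not a kernel structure. */
struct sim_ap {
	bool dead;                      /* a dead AP never makes progress */
	unsigned int ticks_until_alive; /* ticks before it writes the magic */
	unsigned int ticks_to_init;     /* ticks of init work after callout */
	uint32_t trampoline_status;     /* AP -> master: "I am alive" */
	bool callout;                   /* master -> AP: "go ahead" */
	bool callin;                    /* AP -> master: "init finished" */
};

/* Advance the simulated AP by one time step. */
static void sim_tick(struct sim_ap *ap)
{
	if (ap->dead)
		return;
	if (ap->trampoline_status != AP_ALIVE_MAGIC) {
		if (ap->ticks_until_alive == 0)
			ap->trampoline_status = AP_ALIVE_MAGIC;
		else
			ap->ticks_until_alive--;
		return;
	}
	/* The AP stays parked until the master releases it. */
	if (!ap->callout || ap->callin)
		return;
	if (ap->ticks_to_init == 0)
		ap->callin = true;
	else
		ap->ticks_to_init--;
}

/* Returns 0 if the AP booted, 1 if it showed no sign of life in time. */
static int sim_boot_cpu(struct sim_ap *ap, unsigned int max_polls)
{
	int boot_error = 1;

	assert(ap != NULL);
	/* Phase 1: bounded wait for the AP to signal that it is alive. */
	for (unsigned int timeout = 0; timeout < max_polls; timeout++) {
		if (ap->trampoline_status == AP_ALIVE_MAGIC) {
			boot_error = 0;
			ap->callout = true;
			/* Phase 2: unbounded wait for the AP to finish. */
			while (!ap->callin)
				sim_tick(ap);
			break;
		}
		sim_tick(ap);
	}
	return boot_error;
}
```

In this model, an AP that signals within max_polls is waited for however
long its initialization takes. An AP that is dead, or too slow to signal,
yields boot_error = 1 with callout still clear, so it is never released.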
>
> -Andi
>
> --
> ak@linux.intel.com -- Speaking for myself only
--
Regards,
Igor
Thread overview: 11+ messages
2014-03-31 20:09 [PATCH v2 0/5] x86: fix hang when AP bringup is too slow Igor Mammedov
2014-03-31 20:09 ` [PATCH v2 1/5] x86: replace timeouts when booting secondary CPU with infinite wait loop Igor Mammedov
2014-04-02 17:15 ` Andi Kleen
2014-04-02 21:29 ` Igor Mammedov [this message]
2014-04-02 23:48 ` Andi Kleen
2014-04-03 6:43 ` Ingo Molnar
2014-04-03 21:03 ` Andi Kleen
2014-03-31 20:09 ` [PATCH v2 2/5] x86: cleanup not needed cpu_initialized_mask Igor Mammedov
2014-03-31 20:09 ` [PATCH v2 3/5] x86: log error on secondary CPU wakeup failure at ERR level Igor Mammedov
2014-03-31 20:09 ` [PATCH v2 4/5] x86: fix list corruption on CPU hotplug Igor Mammedov
2014-03-31 20:09 ` [PATCH v2 5/5] x86: fix memory corruption in acpi_unmap_lsapic() Igor Mammedov