From: Igor Mammedov <imammedo@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: tglx@linutronix.de, mingo@redhat.com, x86@kernel.org
Subject: Re: [PATCH v5 0/4] x86: fix hang when AP bringup is too slow
Date: Wed, 4 Jun 2014 15:21:55 +0200
Message-ID: <20140604152155.08e15821@nial.usersys.redhat.com>
In-Reply-To: <1399322991-19329-1-git-send-email-imammedo@redhat.com>

On Mon, 5 May 2014 22:49:47 +0200
Igor Mammedov <imammedo@redhat.com> wrote:
> changes since v4:
> * merge "[PATCH v4 1/5] x86: fix list corruption on CPU hotplug"
>   and "[PATCH v4 2/5] x86: fix memory corruption in acpi_unmap_lsapic()"
>   together
> * "x86: initialize secondary CPU only if master CPU will wait for it":
>   - add 10 seconds timeout description into commit message
>   - add smp_mb() after clearing cpu_initialized_mask
>
> changes since v3:
> * put simple bugfixes first
> * move common part of syncing with master CPU in cpu_init()
>   for x32/64 variant into helper function
> * cpu_init(): WARN_ON if cpu_initialized_mask is set
> * fix panic on CPU unplug, caused by erroneous removal
>   of "pr->dev = dev;" in drivers/acpi/acpi_processor.c
>
> --
> A hang is observed on virtual machines during CPU hotplug,
> especially in big guests with many CPUs. (It happens more
> often if the host is over-committed.)
>
> The hang happens because the master CPU times out waiting for
> the AP to boot and 'cancels' the CPU online operation, assuming
> the AP is not functional. The AP may nevertheless continue to
> run wild later, causing various hangs or panics in the running
> kernel, which assumes that the AP is offline.
>
> This is an alternative approach that, instead of canceling the
> in-progress AP bringup (https://lkml.org/lkml/2014/3/6/257),
> removes timeouts so that AP bringup won't be affected by
> poor timing, and syncs the AP with the master CPU at early startup,
> making sure that the AP won't run wild if the master CPU doesn't
> expect it to come online.
>
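
For readers skimming the thread, the synchronisation idea quoted above can be modelled roughly as follows. This is only a user-space sketch under stated assumptions, not the patch itself: `struct masks`, `simulate_bringup()`, the tick counter, the `callout` field and the `AP_ONLINE`/`AP_PARKED` outcomes are hypothetical names made up for illustration. The only things taken from the cover letter are that the master waits a bounded time for the AP to show up, and that an AP the master no longer expects must not proceed past early startup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for per-CPU bitmasks (one bit per CPU). */
struct masks {
    uint64_t initialized; /* AP reached its early init code */
    uint64_t callout;     /* master told the AP it may continue (assumed name) */
    uint64_t callin;      /* AP finished bringup */
};

enum outcome { AP_ONLINE, AP_PARKED };

/*
 * Simulate one bringup attempt in discrete ticks.
 * ap_delay:       tick at which the AP reaches early init
 * master_timeout: number of ticks the master waits for that to happen
 */
enum outcome simulate_bringup(struct masks *m, unsigned cpu,
                              unsigned ap_delay, unsigned master_timeout)
{
    assert(cpu < 64);
    const uint64_t bit = UINT64_C(1) << cpu;

    /* Master clears stale state before waking the AP. */
    m->initialized &= ~bit;
    m->callout &= ~bit;
    m->callin &= ~bit;

    /* Phase 1: master waits a bounded time for the AP to announce itself. */
    bool master_gave_up = true;
    for (unsigned tick = 0; tick < master_timeout; tick++) {
        if (tick == ap_delay)
            m->initialized |= bit;      /* AP: "I am alive" */
        if (m->initialized & bit) {
            m->callout |= bit;          /* master: "go ahead" */
            master_gave_up = false;
            break;
        }
    }

    if (master_gave_up) {
        /*
         * A late AP still announces itself, but never sees the
         * go-ahead, so it stays parked in early startup instead of
         * running wild.
         */
        m->initialized |= bit;
        return AP_PARKED;
    }

    /* Phase 2: AP continues; master waits for callin with no timeout. */
    m->callin |= bit;
    return AP_ONLINE;
}
```

In this model, `simulate_bringup(&m, 3, 2, 10)` yields `AP_ONLINE`, while `simulate_bringup(&m, 3, 50, 10)` yields `AP_PARKED` and leaves the CPU's `callin` bit clear, which is the property the approach is after.
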
> The series also fixes 3 bugs found while testing the CPU bringup
> failure case.

Since the 3.16 merge window is open now: ping.

> --
> Below is a detailed description of the more frequently occurring hang:
> ---
> The master CPU may time out before cpu_callin_mask is set and cancel
> booting the CPU, but the CPU being onlined still continues to boot, sets
> its bit in cpu_active_mask (CPU_STARTING notifiers) and spins in
> check_tsc_sync_target() waiting for the master CPU to arrive. A following
> attempt to online another CPU hangs in stop_machine, initiated from here:
>   smp_callin ->
>     smp_store_cpu_info ->
>       identify_secondary_cpu ->
>         mtrr_ap_init -> set_mtrr_from_inactive_cpu
>
> stop_machine waits for completion of stop_work on all CPUs in
> cpu_active_mask, including the failed CPU that spins in
> check_tsc_sync_target().
>
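
The reason a single half-booted CPU is enough to hang the next online attempt can be shown with a small bitmask model. This is a hedged illustration only: `stuck_cpus()` and `stop_machine_completes()` are hypothetical helpers, and the `responsive` mask is an assumption standing in for "CPUs that will actually run their stop_work"; the real stop_machine code is not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * CPUs that are marked active but will never run their stop work,
 * e.g. a cancelled AP spinning while it waits for the master.
 */
uint64_t stuck_cpus(uint64_t active_mask, uint64_t responsive_mask)
{
    return active_mask & ~responsive_mask;
}

/* The operation completes only if every active CPU responds. */
bool stop_machine_completes(uint64_t active_mask, uint64_t responsive_mask)
{
    return stuck_cpus(active_mask, responsive_mask) == 0;
}
```

With CPUs 0 and 1 healthy and a cancelled CPU 2 that still set its active bit, `stop_machine_completes(0x7, 0x3)` is false and `stuck_cpus(0x7, 0x3)` is `0x4`, i.e. the waiter blocks on CPU 2 indefinitely.
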
> Igor Mammedov (4):
>   x86: fix list/memory corruption on CPU hotplug
>   acpi_processor: do not mark present at boot but not onlined CPU as
>     onlined
>   x86: log error on secondary CPU wakeup failure at ERR level
>   x86: initialize secondary CPU only if master CPU will wait for it
>
>  arch/x86/kernel/cpu/common.c  |  27 ++++++----
>  arch/x86/kernel/smpboot.c     | 104 +++++++++++++----------------------------
>  drivers/acpi/acpi_processor.c |   1 -
>  3 files changed, 48 insertions(+), 84 deletions(-)
>