From: Igor Mammedov <imammedo@redhat.com>
To: prarit@redhat.com
Cc: Igor Mammedov <imammedo@redhat.com>,
drjones@redhat.com, toshi.kani@hp.com, rjw@rjwysocki.net,
linux-acpi@vger.kernel.org
Subject: Re: [PATCH v4 0/5] x86: fix hang when AP bringup is too slow
Date: Tue, 29 Apr 2014 10:36:28 +0200
Message-ID: <20140429103628.714e772f@thinkpad>
In-Reply-To: <1397488277-14865-1-git-send-email-imammedo@redhat.com>
On Mon, 14 Apr 2014 17:11:12 +0200
Igor Mammedov <imammedo@redhat.com> wrote:
> changes since v3:
> * put simple bugfixes first
> * move the common part of syncing with the master CPU in cpu_init()
>   for the x32/64 variants into a helper function
> * cpu_init(): WARN_ON if cpu_initialized_mask is set
> * fix panic on CPU unplug, caused by the erroneous removal
>   of "pr->dev = dev;" in drivers/acpi/acpi_processor.c
Hi guys,
It seems there won't be any more comments on the series;
could you review it, please?
>
> --
> A hang is observed on virtual machines during CPU hotplug,
> especially in big guests with many CPUs. (It happens more
> often if the host is over-committed.)
>
> The hang happens because the master CPU times out while waiting
> for the AP to boot and 'cancels' the CPU online operation, assuming
> the AP is not functional. The AP may nevertheless continue to run
> wild later, causing various hangs or panics in the running kernel,
> which assumes that the AP is offline.
>
> This is an alternative approach that, instead of canceling
> in-progress AP bringup (https://lkml.org/lkml/2014/3/6/257),
> removes the timeouts so that AP bringup won't be affected by
> poor timing, and syncs the AP with the master CPU at early startup,
> making sure that the AP won't run wild if the master CPU doesn't
> expect it to come online.
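
A minimal single-threaded sketch of the early-startup sync described
above, assuming a two-mask handshake. The struct and function names
are hypothetical userspace stand-ins, not the code from the patches:
the AP announces itself, and proceeds only once the master has
acknowledged that it is still waiting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel cpumasks; bit N represents CPU N (N < 64). */
struct bringup_state {
    uint64_t initialized_mask; /* set by the AP: "I reached early init" */
    uint64_t callout_mask;     /* set by the master: "I am waiting, go on" */
};

/*
 * AP side: announce arrival, then report whether the master has
 * acknowledged. A real AP would spin until this returns true.
 */
static bool ap_announce_and_poll(struct bringup_state *s, unsigned int cpu)
{
    s->initialized_mask |= UINT64_C(1) << cpu;
    return (s->callout_mask >> cpu) & 1u;
}

/*
 * Master side: acknowledge only an AP that has already announced
 * itself. If the master gave up earlier, it simply never calls this,
 * and a late AP stays parked in its poll loop instead of running wild.
 */
static bool master_ack(struct bringup_state *s, unsigned int cpu)
{
    if (!((s->initialized_mask >> cpu) & 1u))
        return false;
    s->callout_mask |= UINT64_C(1) << cpu;
    return true;
}
```

In this model, an AP that shows up after the master has abandoned the
bringup never sees its bit in callout_mask, so it never proceeds to
the later initialization steps.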
>
> The series also fixes 3 bugs found while testing the CPU bringup
> failure case.
>
> --
> Below is a detailed description of the more frequently occurring hang:
> ---
> The master CPU may time out before cpu_callin_mask is set and cancel
> booting the CPU, but the CPU being onlined still continues to boot, sets
> cpu_active_mask (CPU_STARTING notifiers) and spins in
> check_tsc_sync_target() waiting for the master CPU to arrive. A subsequent
> attempt to online another CPU hangs in stop_machine, initiated from here:
>   smp_callin ->
>     smp_store_cpu_info ->
>       identify_secondary_cpu ->
>         mtrr_ap_init -> set_mtrr_from_inactive_cpu
>
> stop_machine waits for completion of stop_work on all CPUs in
> cpu_active_mask, including the failed CPU that spins in
> check_tsc_sync_target().
>
>
> Igor Mammedov (5):
> x86: fix list corruption on CPU hotplug
> x86: fix memory corruption in acpi_unmap_lsapic()
> acpi_processor: do not mark present at boot but not onlined CPU as
> onlined
> x86: log error on secondary CPU wakeup failure at ERR level
> x86: initialize secondary CPU only if master CPU will wait for it
>
> arch/x86/kernel/cpu/common.c | 27 ++++++----
> arch/x86/kernel/smpboot.c | 103 ++++++++++++----------------------------
> drivers/acpi/acpi_processor.c | 1 -
> 3 files changed, 47 insertions(+), 84 deletions(-)
>
--
Regards,
Igor
Thread overview (21+ messages):
2014-04-14 15:11 [PATCH v4 0/5] x86: fix hang when AP bringup is too slow Igor Mammedov
2014-04-14 15:11 ` [PATCH v4 1/5] x86: fix list corruption on CPU hotplug Igor Mammedov
2014-04-30 21:18 ` Toshi Kani
2014-04-14 15:11 ` [PATCH v4 2/5] x86: fix memory corruption in acpi_unmap_lsapic() Igor Mammedov
2014-04-14 15:11 ` [PATCH v4 3/5] acpi_processor: do not mark present at boot but not onlined CPU as onlined Igor Mammedov
2014-04-15 5:48 ` Rafael J. Wysocki
2014-04-15 6:00 ` Igor Mammedov
2014-04-15 6:04 ` Ingo Molnar
2014-04-15 15:48 ` Rafael J. Wysocki
2014-04-15 5:53 ` Rafael J. Wysocki
2014-04-30 21:25 ` Toshi Kani
2014-05-02 11:32 ` Igor Mammedov
2014-05-02 17:23 ` Toshi Kani
2014-04-14 15:11 ` [PATCH v4 4/5] x86: log error on secondary CPU wakeup failure at ERR level Igor Mammedov
2014-04-30 21:30 ` Toshi Kani
2014-04-14 15:11 ` [PATCH v4 5/5] x86: initialize secondary CPU only if master CPU will wait for it Igor Mammedov
2014-05-01 23:11 ` Toshi Kani
2014-05-02 8:21 ` Igor Mammedov
2014-05-02 14:52 ` Toshi Kani
2014-05-05 20:26 ` Igor Mammedov
2014-04-29 8:36 ` Igor Mammedov [this message]