linux-acpi.vger.kernel.org archive mirror
From: Igor Mammedov <imammedo@redhat.com>
To: prarit@redhat.com
Cc: Igor Mammedov <imammedo@redhat.com>,
	drjones@redhat.com, toshi.kani@hp.com, rjw@rjwysocki.net,
	linux-acpi@vger.kernel.org
Subject: Re: [PATCH v4 0/5] x86: fix hang when AP bringup is too slow
Date: Tue, 29 Apr 2014 10:36:28 +0200
Message-ID: <20140429103628.714e772f@thinkpad>
In-Reply-To: <1397488277-14865-1-git-send-email-imammedo@redhat.com>

On Mon, 14 Apr 2014 17:11:12 +0200
Igor Mammedov <imammedo@redhat.com> wrote:

> changes since v3:
>  * put simple bugfixes first
>  * move common part of syncing with master CPU in cpu_init()
>    for x32/64 variant into helper function
>  * cpu_init(): WARN_ON if cpu_initialized_mask is set
>  * fix panic on CPU unplug, caused by erroneous removal
>    of "pr->dev = dev;" in drivers/acpi/acpi_processor.c
Hi guys,

It seems there won't be any more comments on the series;
could you review it, please?

> 
> --
> A hang is observed on virtual machines during CPU hotplug,
> especially in big guests with many CPUs. (It happens more
> often if the host is over-committed.)
> 
> The hang happens because the master CPU times out while waiting
> for the AP to boot and 'cancels' the CPU online operation, assuming
> the AP is not functional. The AP may nevertheless continue to run
> wild later, causing various hangs or panics in the running kernel,
> which assumes that the AP is offline.
> 
> This is an alternative approach that, instead of canceling
> in-progress AP bringup (https://lkml.org/lkml/2014/3/6/257),
> removes the timeouts so that AP bringup is not affected by
> poor timing, and syncs the AP with the master CPU at early
> startup, making sure that the AP won't run wild if the master
> CPU doesn't expect it to come online.
> 
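
To make the handshake described above concrete, here is a minimal
user-space sketch. It is not the code from the series: pthreads stand
in for CPUs, and struct ap_state, ap_entry(), bring_up_ap() and the
sim_stop flag are hypothetical names used only for illustration. The
AP announces itself and then goes no further until the master signals
that it will wait for it.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for per-CPU bits in the kernel's CPU masks. */
struct ap_state {
	atomic_bool initialized; /* AP reached early startup */
	atomic_bool callout;     /* master committed to waiting for the AP */
	atomic_bool online;      /* AP finished bringup */
	atomic_bool sim_stop;    /* simulation only: lets a parked AP exit */
};

/* AP side: announce itself, then wait for the master's go-ahead. */
static void *ap_entry(void *arg)
{
	struct ap_state *s = arg;

	atomic_store(&s->initialized, true);
	while (!atomic_load(&s->callout)) {
		if (atomic_load(&s->sim_stop))
			return NULL; /* parked AP never went online */
		sched_yield();
	}
	atomic_store(&s->online, true);
	return NULL;
}

/*
 * Master side. If master_waits is false, the master has given up on
 * the AP and never sets callout, so the AP stays parked.
 * Returns true if the AP came online.
 */
static bool bring_up_ap(bool master_waits)
{
	struct ap_state s;
	pthread_t ap;
	int rc;

	atomic_init(&s.initialized, false);
	atomic_init(&s.callout, false);
	atomic_init(&s.online, false);
	atomic_init(&s.sim_stop, false);

	rc = pthread_create(&ap, NULL, ap_entry, &s);
	assert(rc == 0);

	while (!atomic_load(&s.initialized))
		sched_yield();

	if (master_waits) {
		atomic_store(&s.callout, true);
		/* No timeout: wait as long as the AP needs. */
		while (!atomic_load(&s.online))
			sched_yield();
	} else {
		atomic_store(&s.sim_stop, true);
	}

	rc = pthread_join(ap, NULL);
	assert(rc == 0);
	return atomic_load(&s.online);
}
```

Calling bring_up_ap(true) models a normal bringup; bring_up_ap(false)
models a master that no longer expects the AP, in which case the AP
never marks itself online. Build with -pthread.
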
> The series also fixes 3 bugs found while testing the CPU bringup
> failure case.
> 
> --
> Below is a detailed description of the hang that occurs more often:
> ---
> The master CPU may time out before cpu_callin_mask is set and cancel
> the boot of the CPU, but the CPU being onlined still continues to
> boot, sets cpu_active_mask (CPU_STARTING notifiers) and spins in
> check_tsc_sync_target() waiting for the master CPU to arrive. A
> subsequent attempt to online another CPU hangs in stop_machine,
> initiated from here:
> smp_callin ->
>   smp_store_cpu_info ->
>     identify_secondary_cpu ->
>       mtrr_ap_init -> set_mtrr_from_inactive_cpu
> 
> stop_machine waits for completion of stop_work on all CPUs in
> cpu_active_mask, including the failed CPU that spins in
> check_tsc_sync_target().
> 
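
The hang condition above can be stated as a simple check on bitmasks.
This is only a toy model, assuming one bit per CPU in a 64-bit word;
stop_machine_can_complete() is a hypothetical helper, not a kernel
function. stop_machine needs every CPU in the active mask to run its
stop work, so a single active CPU that cannot respond blocks it.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model: bit N set in 'active' means CPU N is in the active mask;
 * bit N set in 'responsive' means CPU N can actually run stop work.
 * Completion is possible only if no active CPU is unresponsive.
 */
static bool stop_machine_can_complete(uint64_t active, uint64_t responsive)
{
	return (active & ~responsive) == 0;
}
```

With CPUs 0 and 1 healthy (mask 0x3) the check passes. Once a failed
CPU 2 has set its active bit while spinning (active 0x7, responsive
0x3) the check fails, which corresponds to the hang described above.
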
> 
> Igor Mammedov (5):
>   x86: fix list corruption on CPU hotplug
>   x86: fix memory corruption in acpi_unmap_lsapic()
>   acpi_processor: do not mark present at boot but not onlined CPU as
>     onlined
>   x86: log error on secondary CPU wakeup failure at ERR level
>   x86: initialize secondary CPU only if master CPU will wait for it
> 
>  arch/x86/kernel/cpu/common.c  |   27 ++++++----
>  arch/x86/kernel/smpboot.c     |  103 ++++++++++++----------------------------
>  drivers/acpi/acpi_processor.c |    1 -
>  3 files changed, 47 insertions(+), 84 deletions(-)
> 


-- 
Regards,
  Igor
