From: Alok Kataria <akataria@vmware.com>
To: "bigeasy@linutronix.de" <bigeasy@linutronix.de>,
	"mingo@kernel.org" <mingo@kernel.org>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"boris.ostrovsky@oracle.com" <boris.ostrovsky@oracle.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"hpa@zytor.com" <hpa@zytor.com>, "bp@alien8.de" <bp@alien8.de>,
	"m.v.b@runbox.com" <m.v.b@runbox.com>
Cc: "linux-tip-commits@vger.kernel.org"  <linux-tip-commits@vger.kernel.org>
Subject: Re: [tip:x86/urgent] x86/cpu: Deal with broken firmware (VMWare/XEN)
Date: Fri, 11 Nov 2016 05:49:18 +0000
Message-ID: <1478843692.2694.235.camel@vmware.com>
In-Reply-To: <tip-d49597fd3bc7d9534de55e9256767f073be1b33a@git.kernel.org>

Hi Thomas, 

On Wed, 2016-11-09 at 12:27 -0800, tip-bot for Thomas Gleixner wrote:
> Commit-ID:  d49597fd3bc7d9534de55e9256767f073be1b33a
> Gitweb:     http://git.kernel.org/tip/d49597fd3bc7d9534de55e9256767f073be1b33a
> Author:     Thomas Gleixner <tglx@linutronix.de>
> AuthorDate: Wed, 9 Nov 2016 16:35:51 +0100
> Committer:  Thomas Gleixner <tglx@linutronix.de>
> CommitDate: Wed, 9 Nov 2016 21:05:01 +0100
> 
> x86/cpu: Deal with broken firmware (VMWare/XEN)
> 
> Both ACPI and MP specifications require that the APIC id in the respective
> tables must be the same as the APIC id in CPUID.
> 
> The kernel retrieves the physical package id from the APIC id during the
> ACPI/MP table scan and builds the physical to logical package map. The
> physical package id which is used after a CPU comes up is retrieved from
> CPUID. So we rely on ACPI/MP tables and CPUID agreeing in that respect.
> 
> There exist VMware and XEN implementations which violate the spec. As a
> result the physical to logical package map, which relies on the ACPI/MP
> tables does not work on those systems, because the CPUID initialized
> physical package id does not match the firmware id. This causes system
> crashes and malfunction due to invalid package mappings.

For documentation purposes, let me note that VMware VMs running at
virtual hardware version 9 and above don't have this ACPI/MP and CPUID
divergence on the package id. So not everyone will see this issue on
their VMs; the bug is limited to VMs running at virtual hardware
version 8 and prior.

It's good that we can work around the platform bug for those VMs; thanks
for adding these checks.

Alok

> 
> The only way to cure this is to sanitize the physical package id after the
> CPUID enumeration and yell when the APIC ids are different. Fix up the
> initial APIC id, which is fine as it is only used for printout purposes.
> 
> If the physical package IDs differ, yell and use the package information
> from the ACPI/MP tables so the existing logical package map just works.
> 
> Chas provided the resulting dmesg output for his affected 4 virtual
> sockets, 1 core per socket VM:
> 
> [Firmware Bug]: CPU1: APIC id mismatch. Firmware: 1 CPUID: 2
> [Firmware Bug]: CPU1: Using firmware package id 1 instead of 2
> ....
> 
> Reported-and-tested-by: "Charles (Chas) Williams" <ciwillia@brocade.com>,
> Reported-by: M. Vefa Bicakci <m.v.b@runbox.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Alok Kataria <akataria@vmware.com>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: #4.6+ <stable@vger.kernel.org>
> Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1611091613540.3501@nanos
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  arch/x86/kernel/cpu/common.c | 32 ++++++++++++++++++++++++++++++--
>  1 file changed, 30 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 9bd910a..cc9e980 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -979,6 +979,35 @@ static void x86_init_cache_qos(struct cpuinfo_x86 *c)
>  }
>  
>  /*
> + * The physical to logical package id mapping is initialized from the
> + * acpi/mptables information. Make sure that CPUID actually agrees with
> + * that.
> + */
> +static void sanitize_package_id(struct cpuinfo_x86 *c)
> +{
> +#ifdef CONFIG_SMP
> +	unsigned int pkg, apicid, cpu = smp_processor_id();
> +
> +	apicid = apic->cpu_present_to_apicid(cpu);
> +	pkg = apicid >> boot_cpu_data.x86_coreid_bits;
> +
> +	if (apicid != c->initial_apicid) {
> +		pr_err(FW_BUG "CPU%u: APIC id mismatch. Firmware: %x CPUID: %x\n",
> +		       cpu, apicid, c->initial_apicid);
> +		c->initial_apicid = apicid;
> +	}
> +	if (pkg != c->phys_proc_id) {
> +		pr_err(FW_BUG "CPU%u: Using firmware package id %u instead of %u\n",
> +		       cpu, pkg, c->phys_proc_id);
> +		c->phys_proc_id = pkg;
> +	}
> +	c->logical_proc_id = topology_phys_to_logical_pkg(pkg);
> +#else
> +	c->logical_proc_id = 0;
> +#endif
> +}
> +
> +/*
>   * This does the hard work of actually picking apart the CPU stuff...
>   */
>  static void identify_cpu(struct cpuinfo_x86 *c)
> @@ -1103,8 +1132,7 @@ static void identify_cpu(struct cpuinfo_x86 *c)
>  #ifdef CONFIG_NUMA
>  	numa_add_cpu(smp_processor_id());
>  #endif
> -	/* The boot/hotplug time assigment got cleared, restore it */
> -	c->logical_proc_id = topology_phys_to_logical_pkg(c->phys_proc_id);
> +	sanitize_package_id(c);
>  }
>  
>  /*
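
To make the arithmetic in the quoted patch easier to follow, here is a
minimal user-space sketch of the same check. It is not kernel code: the
struct cpu_ids type, the sanitize_ids() helper and its parameters are
hypothetical stand-ins for struct cpuinfo_x86, sanitize_package_id(),
the value returned by apic->cpu_present_to_apicid() and
boot_cpu_data.x86_coreid_bits. The logical package lookup is left out.

```c
#include <stdio.h>

/* Hypothetical stand-in for the two cpuinfo_x86 fields the patch touches. */
struct cpu_ids {
	unsigned int initial_apicid;	/* APIC id as read from CPUID */
	unsigned int phys_proc_id;	/* package id derived from CPUID */
};

/*
 * Compare the CPUID-derived ids against the firmware (ACPI/MP table)
 * APIC id and let the firmware values win, as the patch does.
 * Returns the number of fields that had to be corrected (0, 1 or 2).
 */
static int sanitize_ids(struct cpu_ids *c, unsigned int cpu,
			unsigned int fw_apicid, unsigned int coreid_bits)
{
	/* The package id is the APIC id with the core id bits shifted out. */
	unsigned int pkg = fw_apicid >> coreid_bits;
	int fixed = 0;

	if (fw_apicid != c->initial_apicid) {
		printf("CPU%u: APIC id mismatch. Firmware: %x CPUID: %x\n",
		       cpu, fw_apicid, c->initial_apicid);
		c->initial_apicid = fw_apicid;
		fixed++;
	}
	if (pkg != c->phys_proc_id) {
		printf("CPU%u: Using firmware package id %u instead of %u\n",
		       cpu, pkg, c->phys_proc_id);
		c->phys_proc_id = pkg;
		fixed++;
	}
	return fixed;
}
```

Assuming zero core id bits (one core per socket, as in Chas's VM),
calling sanitize_ids() for CPU 1 with a firmware APIC id of 1 and
CPUID-derived ids of 2 prints the same two messages as the dmesg
excerpt above, minus the [Firmware Bug] prefix, and leaves both fields
set to 1.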
