[PATCH AUTOSEL 5.10 1/6] pm: cpupower: bench: Prevent NULL dereference on malloc failure

public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH AUTOSEL 5.10 1/6] pm: cpupower: bench: Prevent NULL dereference on malloc failure
@ 2025-03-31 14:37 Sasha Levin
  2025-03-31 14:37 ` [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine Sasha Levin
                   ` (5 more replies)
  0 siblings, 6 replies; 25+ messages in thread
From: Sasha Levin @ 2025-03-31 14:37 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Zhongqiu Han, Shuah Khan, Sasha Levin, trenn, shuah, jwyatt,
	jkacur, peng.fan, linux-pm

From: Zhongqiu Han <quic_zhonhan@quicinc.com>

[ Upstream commit 208baa3ec9043a664d9acfb8174b332e6b17fb69 ]

If malloc returns NULL due to low memory, 'config' pointer can be NULL.
Add a check to prevent NULL dereference.

Link: https://lore.kernel.org/r/20250219122715.3892223-1-quic_zhonhan@quicinc.com
Signed-off-by: Zhongqiu Han <quic_zhonhan@quicinc.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 tools/power/cpupower/bench/parse.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tools/power/cpupower/bench/parse.c b/tools/power/cpupower/bench/parse.c
index e63dc11fa3a53..48e25be6e1635 100644
--- a/tools/power/cpupower/bench/parse.c
+++ b/tools/power/cpupower/bench/parse.c
@@ -120,6 +120,10 @@ FILE *prepare_output(const char *dirname)
 struct config *prepare_default_config()
 {
 	struct config *config = malloc(sizeof(struct config));
+	if (!config) {
+		perror("malloc");
+		return NULL;
+	}
 
 	dprintf("loading defaults\n");
 
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine
  2025-03-31 14:37 [PATCH AUTOSEL 5.10 1/6] pm: cpupower: bench: Prevent NULL dereference on malloc failure Sasha Levin
@ 2025-03-31 14:37 ` Sasha Levin
  2025-04-18 16:54   ` Pavel Machek
  2025-03-31 14:37 ` [PATCH AUTOSEL 5.10 3/6] perf: arm_pmu: Don't disable counter in armpmu_add() Sasha Levin
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Sasha Levin @ 2025-03-31 14:37 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Max Grobecker, Ingo Molnar, Borislav Petkov, Sasha Levin, tglx,
	mingo, dave.hansen, x86, thomas.lendacky, perry.yuan,
	mario.limonciello, riel, mjguzik, seanjc, darwi

From: Max Grobecker <max@grobecker.info>

[ Upstream commit a4248ee16f411ac1ea7dfab228a6659b111e3d65 ]

When running in a virtual machine, we might see the original hardware CPU
vendor string (i.e. "AuthenticAMD"), but a model and family ID set by the
hypervisor. In case we run on AMD hardware and the hypervisor sets a model
ID < 0x14, the LAHF cpu feature is eliminated from the the list of CPU
capabilities present to circumvent a bug with some BIOSes in conjunction with
AMD K8 processors.

Parsing the flags list from /proc/cpuinfo seems to be happening mostly in
bash scripts and prebuilt Docker containers, as it does not need to have
additionals tools present – even though more reliable ways like using "kcpuid",
which calls the CPUID instruction instead of parsing a list, should be preferred.
Scripts, that use /proc/cpuinfo to determine if the current CPU is
"compliant" with defined microarchitecture levels like x86-64-v2 will falsely
claim the CPU is incapable of modern CPU instructions when "lahf_lm" is missing
in that flags list.

This can prevent some docker containers from starting or build scripts to create
unoptimized binaries.

Admittably, this is more a small inconvenience than a severe bug in the kernel
and the shoddy scripts that rely on parsing /proc/cpuinfo
should be fixed instead.

This patch adds an additional check to see if we're running inside a
virtual machine (X86_FEATURE_HYPERVISOR is present), which, to my
understanding, can't be present on a real K8 processor as it was introduced
only with the later/other Athlon64 models.

Example output with the "lahf_lm" flag missing in the flags list
(should be shown between "hypervisor" and "abm"):

    $ cat /proc/cpuinfo
    processor       : 0
    vendor_id       : AuthenticAMD
    cpu family      : 15
    model           : 6
    model name      : Common KVM processor
    stepping        : 1
    microcode       : 0x1000065
    cpu MHz         : 2599.998
    cache size      : 512 KB
    physical id     : 0
    siblings        : 1
    core id         : 0
    cpu cores       : 1
    apicid          : 0
    initial apicid  : 0
    fpu             : yes
    fpu_exception   : yes
    cpuid level     : 13
    wp              : yes
    flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
                      cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx rdtscp
                      lm rep_good nopl cpuid extd_apicid tsc_known_freq pni
                      pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt
                      tsc_deadline_timer aes xsave avx f16c hypervisor abm
                      3dnowprefetch vmmcall bmi1 avx2 bmi2 xsaveopt

... while kcpuid shows the feature to be present in the CPU:

    # kcpuid -d | grep lahf
         lahf_lm             - LAHF/SAHF available in 64-bit mode

[ mingo: Updated the comment a bit, incorporated Boris's review feedback. ]

Signed-off-by: Max Grobecker <max@grobecker.info>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org
Cc: Borislav Petkov <bp@alien8.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/x86/kernel/cpu/amd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index c10f7dcaa7b7c..5f0bdb53b0067 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -839,7 +839,7 @@ static void init_amd_k8(struct cpuinfo_x86 *c)
 	 * (model = 0x14) and later actually support it.
 	 * (AMD Erratum #110, docId: 25759).
 	 */
-	if (c->x86_model < 0x14 && cpu_has(c, X86_FEATURE_LAHF_LM)) {
+	if (c->x86_model < 0x14 && cpu_has(c, X86_FEATURE_LAHF_LM) && !cpu_has(c, X86_FEATURE_HYPERVISOR)) {
 		clear_cpu_cap(c, X86_FEATURE_LAHF_LM);
 		if (!rdmsrl_amd_safe(0xc001100d, &value)) {
 			value &= ~BIT_64(32);
-- 
2.39.5

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH AUTOSEL 5.10 3/6] perf: arm_pmu: Don't disable counter in armpmu_add()
  2025-03-31 14:37 [PATCH AUTOSEL 5.10 1/6] pm: cpupower: bench: Prevent NULL dereference on malloc failure Sasha Levin
  2025-03-31 14:37 ` [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine Sasha Levin
@ 2025-03-31 14:37 ` Sasha Levin
  2025-03-31 14:37 ` [PATCH AUTOSEL 5.10 4/6] arm64: cputype: Add QCOM_CPU_PART_KRYO_3XX_GOLD Sasha Levin
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 25+ messages in thread
From: Sasha Levin @ 2025-03-31 14:37 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Mark Rutland, Rob Herring, Anshuman Khandual, James Clark,
	Will Deacon, Sasha Levin, linux-arm-kernel, linux-perf-users

From: Mark Rutland <mark.rutland@arm.com>

[ Upstream commit dcca27bc1eccb9abc2552aab950b18a9742fb8e7 ]

Currently armpmu_add() tries to handle a newly-allocated counter having
a stale associated event, but this should not be possible, and if this
were to happen the current mitigation is insufficient and potentially
expensive. It would be better to warn if we encounter the impossible
case.

Calls to pmu::add() and pmu::del() are serialized by the core perf code,
and armpmu_del() clears the relevant slot in pmu_hw_events::events[]
before clearing the bit in pmu_hw_events::used_mask such that the
counter can be reallocated. Thus when armpmu_add() allocates a counter
index from pmu_hw_events::used_mask, it should not be possible to observe
a stale even in pmu_hw_events::events[] unless either
pmu_hw_events::used_mask or pmu_hw_events::events[] have been corrupted.

If this were to happen, we'd end up with two events with the same
event->hw.idx, which would clash with each other during reprogramming,
deletion, etc, and produce bogus results. Add a WARN_ON_ONCE() for this
case so that we can detect if this ever occurs in practice.

That possiblity aside, there's no need to call arm_pmu::disable(event)
for the new event. The PMU reset code initialises the counter in a
disabled state, and armpmu_del() will disable the counter before it can
be reused. Remove the redundant disable.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Tested-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250218-arm-brbe-v19-v20-2-4e9922fc2e8e@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/perf/arm_pmu.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 7fd11ef5cb8a2..8568b5a78c45b 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -338,12 +338,10 @@ armpmu_add(struct perf_event *event, int flags)
 	if (idx < 0)
 		return idx;

-	/*
-	 * If there is an event in the counter we are going to use then make
-	 * sure it is disabled.
-	 */
+	/* The newly-allocated counter should be empty */
+	WARN_ON_ONCE(hw_events->events[idx]);
+
 	event->hw.idx = idx;
-	armpmu->disable(event);
 	hw_events->events[idx] = event;

 	hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
-- 
2.39.5

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH AUTOSEL 5.10 4/6] arm64: cputype: Add QCOM_CPU_PART_KRYO_3XX_GOLD
  2025-03-31 14:37 [PATCH AUTOSEL 5.10 1/6] pm: cpupower: bench: Prevent NULL dereference on malloc failure Sasha Levin
  2025-03-31 14:37 ` [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine Sasha Levin
  2025-03-31 14:37 ` [PATCH AUTOSEL 5.10 3/6] perf: arm_pmu: Don't disable counter in armpmu_add() Sasha Levin
@ 2025-03-31 14:37 ` Sasha Levin
  2025-04-18 16:55   ` Pavel Machek
  2025-03-31 14:37 ` [PATCH AUTOSEL 5.10 5/6] xen/mcelog: Add __nonstring annotations for unterminated strings Sasha Levin
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Sasha Levin @ 2025-03-31 14:37 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Douglas Anderson, Dmitry Baryshkov, Trilok Soni, Catalin Marinas,
	Sasha Levin, will, mark.rutland, oliver.upton,
	shameerali.kolothum.thodi, maz, bwicaksono, linux-arm-kernel

From: Douglas Anderson <dianders@chromium.org>

[ Upstream commit 401c3333bb2396aa52e4121887a6f6a6e2f040bc ]

Add a definition for the Qualcomm Kryo 300-series Gold cores.

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Trilok Soni <quic_tsoni@quicinc.com>
Link: https://lore.kernel.org/r/20241219131107.v3.1.I18e0288742871393228249a768e5d56ea65d93dc@changeid
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/arm64/include/asm/cputype.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h
index d8305b4657d2e..5e292e08393d5 100644
--- a/arch/arm64/include/asm/cputype.h
+++ b/arch/arm64/include/asm/cputype.h
@@ -110,6 +110,7 @@
 #define QCOM_CPU_PART_KRYO		0x200
 #define QCOM_CPU_PART_KRYO_2XX_GOLD	0x800
 #define QCOM_CPU_PART_KRYO_2XX_SILVER	0x801
+#define QCOM_CPU_PART_KRYO_3XX_GOLD	0x802
 #define QCOM_CPU_PART_KRYO_3XX_SILVER	0x803
 #define QCOM_CPU_PART_KRYO_4XX_GOLD	0x804
 #define QCOM_CPU_PART_KRYO_4XX_SILVER	0x805
@@ -167,6 +168,7 @@
 #define MIDR_QCOM_KRYO MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_KRYO)
 #define MIDR_QCOM_KRYO_2XX_GOLD MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_KRYO_2XX_GOLD)
 #define MIDR_QCOM_KRYO_2XX_SILVER MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_KRYO_2XX_SILVER)
+#define MIDR_QCOM_KRYO_3XX_GOLD MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_KRYO_3XX_GOLD)
 #define MIDR_QCOM_KRYO_3XX_SILVER MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_KRYO_3XX_SILVER)
 #define MIDR_QCOM_KRYO_4XX_GOLD MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_KRYO_4XX_GOLD)
 #define MIDR_QCOM_KRYO_4XX_SILVER MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_KRYO_4XX_SILVER)
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH AUTOSEL 5.10 5/6] xen/mcelog: Add __nonstring annotations for unterminated strings
  2025-03-31 14:37 [PATCH AUTOSEL 5.10 1/6] pm: cpupower: bench: Prevent NULL dereference on malloc failure Sasha Levin
                   ` (2 preceding siblings ...)
  2025-03-31 14:37 ` [PATCH AUTOSEL 5.10 4/6] arm64: cputype: Add QCOM_CPU_PART_KRYO_3XX_GOLD Sasha Levin
@ 2025-03-31 14:37 ` Sasha Levin
  2025-03-31 14:37 ` [PATCH AUTOSEL 5.10 6/6] x86/mm/ident_map: Fix theoretical virtual address overflow to zero Sasha Levin
  2025-04-18 16:52 ` [PATCH AUTOSEL 5.10 1/6] pm: cpupower: bench: Prevent NULL dereference on malloc failure Pavel Machek
  5 siblings, 0 replies; 25+ messages in thread
From: Sasha Levin @ 2025-03-31 14:37 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Kees Cook, Juergen Gross, Stefano Stabellini,
	Oleksandr Tyshchenko, xen-devel, Sasha Levin

From: Kees Cook <kees@kernel.org>

[ Upstream commit 1c3dfc7c6b0f551fdca3f7c1f1e4c73be8adb17d ]

When a character array without a terminating NUL character has a static
initializer, GCC 15's -Wunterminated-string-initialization will only
warn if the array lacks the "nonstring" attribute[1]. Mark the arrays
with __nonstring to and correctly identify the char array as "not a C
string" and thereby eliminate the warning.

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117178 [1]
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Cc: xen-devel@lists.xenproject.org
Signed-off-by: Kees Cook <kees@kernel.org>
Acked-by: Juergen Gross <jgross@suse.com>
Message-ID: <20250310222234.work.473-kees@kernel.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 include/xen/interface/xen-mca.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/xen/interface/xen-mca.h b/include/xen/interface/xen-mca.h
index 7483a78d24251..20a3b320d1a58 100644
--- a/include/xen/interface/xen-mca.h
+++ b/include/xen/interface/xen-mca.h
@@ -371,7 +371,7 @@ struct xen_mce {
 #define XEN_MCE_LOG_LEN 32
 
 struct xen_mce_log {
-	char signature[12]; /* "MACHINECHECK" */
+	char signature[12] __nonstring; /* "MACHINECHECK" */
 	unsigned len;	    /* = XEN_MCE_LOG_LEN */
 	unsigned next;
 	unsigned flags;
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH AUTOSEL 5.10 6/6] x86/mm/ident_map: Fix theoretical virtual address overflow to zero
  2025-03-31 14:37 [PATCH AUTOSEL 5.10 1/6] pm: cpupower: bench: Prevent NULL dereference on malloc failure Sasha Levin
                   ` (3 preceding siblings ...)
  2025-03-31 14:37 ` [PATCH AUTOSEL 5.10 5/6] xen/mcelog: Add __nonstring annotations for unterminated strings Sasha Levin
@ 2025-03-31 14:37 ` Sasha Levin
  2025-04-18 16:52 ` [PATCH AUTOSEL 5.10 1/6] pm: cpupower: bench: Prevent NULL dereference on malloc failure Pavel Machek
  5 siblings, 0 replies; 25+ messages in thread
From: Sasha Levin @ 2025-03-31 14:37 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Kirill A. Shutemov, Ingo Molnar, Kai Huang, Tom Lendacky,
	Andy Lutomirski, Linus Torvalds, Sasha Levin, dave.hansen, peterz,
	tglx, mingo, bp, x86

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

[ Upstream commit f666c92090a41ac5524dade63ff96b3adcf8c2ab ]

The current calculation of the 'next' virtual address in the
page table initialization functions in arch/x86/mm/ident_map.c
doesn't protect against wrapping to zero.

This is a theoretical issue that cannot happen currently,
the problematic case is possible only if the user sets a
high enough x86_mapping_info::offset value - which no
current code in the upstream kernel does.

( The wrapping to zero only occurs if the top PGD entry is accessed.
  There are no such users upstream. Only hibernate_64.c uses
  x86_mapping_info::offset, and it operates on the direct mapping
  range, which is not the top PGD entry. )

Should such an overflow happen, it can result in page table
corruption and a hang.

To future-proof this code, replace the manual 'next' calculation
with p?d_addr_end() which handles wrapping correctly.

[ Backporter's note: there's no need to backport this patch. ]

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20241016111458.846228-2-kirill.shutemov@linux.intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/x86/mm/ident_map.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
index 968d7005f4a72..2f383e288c430 100644
--- a/arch/x86/mm/ident_map.c
+++ b/arch/x86/mm/ident_map.c
@@ -27,9 +27,7 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
 		pud_t *pud = pud_page + pud_index(addr);
 		pmd_t *pmd;
 
-		next = (addr & PUD_MASK) + PUD_SIZE;
-		if (next > end)
-			next = end;
+		next = pud_addr_end(addr, end);
 
 		if (info->direct_gbpages) {
 			pud_t pudval;
@@ -68,10 +66,7 @@ static int ident_p4d_init(struct x86_mapping_info *info, p4d_t *p4d_page,
 		p4d_t *p4d = p4d_page + p4d_index(addr);
 		pud_t *pud;
 
-		next = (addr & P4D_MASK) + P4D_SIZE;
-		if (next > end)
-			next = end;
-
+		next = p4d_addr_end(addr, end);
 		if (p4d_present(*p4d)) {
 			pud = pud_offset(p4d, 0);
 			result = ident_pud_init(info, pud, addr, next);
@@ -113,10 +108,7 @@ int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
 		pgd_t *pgd = pgd_page + pgd_index(addr);
 		p4d_t *p4d;
 
-		next = (addr & PGDIR_MASK) + PGDIR_SIZE;
-		if (next > end)
-			next = end;
-
+		next = pgd_addr_end(addr, end);
 		if (pgd_present(*pgd)) {
 			p4d = p4d_offset(pgd, 0);
 			result = ident_p4d_init(info, p4d, addr, next);
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH AUTOSEL 5.10 1/6] pm: cpupower: bench: Prevent NULL dereference on malloc failure
  2025-03-31 14:37 [PATCH AUTOSEL 5.10 1/6] pm: cpupower: bench: Prevent NULL dereference on malloc failure Sasha Levin
                   ` (4 preceding siblings ...)
  2025-03-31 14:37 ` [PATCH AUTOSEL 5.10 6/6] x86/mm/ident_map: Fix theoretical virtual address overflow to zero Sasha Levin
@ 2025-04-18 16:52 ` Pavel Machek
  5 siblings, 0 replies; 25+ messages in thread
From: Pavel Machek @ 2025-04-18 16:52 UTC (permalink / raw)
  To: Sasha Levin
  Cc: linux-kernel, stable, Zhongqiu Han, Shuah Khan, trenn, shuah,
	jwyatt, jkacur, peng.fan, linux-pm

[-- Attachment #1: Type: text/plain, Size: 805 bytes --]

Hi!

> [ Upstream commit 208baa3ec9043a664d9acfb8174b332e6b17fb69 ]
> 
> If malloc returns NULL due to low memory, 'config' pointer can be NULL.
> Add a check to prevent NULL dereference.

This fixes nothing. We have oom killer, so we don't have malloc
returning NULL.

Best regards,
								Pavel
								
> +++ b/tools/power/cpupower/bench/parse.c
> @@ -120,6 +120,10 @@ FILE *prepare_output(const char *dirname)
>  struct config *prepare_default_config()
>  {
>  	struct config *config = malloc(sizeof(struct config));
> +	if (!config) {
> +		perror("malloc");
> +		return NULL;
> +	}
>  
>  	dprintf("loading defaults\n");
>  

-- 
DENX Software Engineering GmbH,        Managing Director: Erika Unter
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine
  2025-03-31 14:37 ` [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine Sasha Levin
@ 2025-04-18 16:54   ` Pavel Machek
  2025-04-18 17:19     ` Sean Christopherson
  0 siblings, 1 reply; 25+ messages in thread
From: Pavel Machek @ 2025-04-18 16:54 UTC (permalink / raw)
  To: Sasha Levin
  Cc: linux-kernel, stable, Max Grobecker, Ingo Molnar, Borislav Petkov,
	tglx, mingo, dave.hansen, x86, thomas.lendacky, perry.yuan,
	mario.limonciello, riel, mjguzik, seanjc, darwi

[-- Attachment #1: Type: text/plain, Size: 1141 bytes --]

Hi!

> From: Max Grobecker <max@grobecker.info>
> 
> [ Upstream commit a4248ee16f411ac1ea7dfab228a6659b111e3d65 ]

> This can prevent some docker containers from starting or build scripts to create
> unoptimized binaries.
> 
> Admittably, this is more a small inconvenience than a severe bug in the kernel
> and the shoddy scripts that rely on parsing /proc/cpuinfo
> should be fixed instead.

I'd say this is not good stable candidate.

Best regards,
								Pavel

> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -839,7 +839,7 @@ static void init_amd_k8(struct cpuinfo_x86 *c)
>  	 * (model = 0x14) and later actually support it.
>  	 * (AMD Erratum #110, docId: 25759).
>  	 */
> -	if (c->x86_model < 0x14 && cpu_has(c, X86_FEATURE_LAHF_LM)) {
> +	if (c->x86_model < 0x14 && cpu_has(c, X86_FEATURE_LAHF_LM) && !cpu_has(c, X86_FEATURE_HYPERVISOR)) {
>  		clear_cpu_cap(c, X86_FEATURE_LAHF_LM);
>  		if (!rdmsrl_amd_safe(0xc001100d, &value)) {
>  			value &= ~BIT_64(32);

-- 
DENX Software Engineering GmbH,        Managing Director: Erika Unter
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH AUTOSEL 5.10 4/6] arm64: cputype: Add QCOM_CPU_PART_KRYO_3XX_GOLD
  2025-03-31 14:37 ` [PATCH AUTOSEL 5.10 4/6] arm64: cputype: Add QCOM_CPU_PART_KRYO_3XX_GOLD Sasha Levin
@ 2025-04-18 16:55   ` Pavel Machek
  2025-04-18 19:27     ` Doug Anderson
  0 siblings, 1 reply; 25+ messages in thread
From: Pavel Machek @ 2025-04-18 16:55 UTC (permalink / raw)
  To: Sasha Levin
  Cc: linux-kernel, stable, Douglas Anderson, Dmitry Baryshkov,
	Trilok Soni, Catalin Marinas, will, mark.rutland, oliver.upton,
	shameerali.kolothum.thodi, maz, bwicaksono, linux-arm-kernel

[-- Attachment #1: Type: text/plain, Size: 1518 bytes --]

Hi!

> From: Douglas Anderson <dianders@chromium.org>
> 
> [ Upstream commit 401c3333bb2396aa52e4121887a6f6a6e2f040bc ]
> 
> Add a definition for the Qualcomm Kryo 300-series Gold cores.

Why are we adding unused defines to stable?

Best regards,
							Pavel
							
> +++ b/arch/arm64/include/asm/cputype.h
> @@ -110,6 +110,7 @@
>  #define QCOM_CPU_PART_KRYO		0x200
>  #define QCOM_CPU_PART_KRYO_2XX_GOLD	0x800
>  #define QCOM_CPU_PART_KRYO_2XX_SILVER	0x801
> +#define QCOM_CPU_PART_KRYO_3XX_GOLD	0x802
>  #define QCOM_CPU_PART_KRYO_3XX_SILVER	0x803
>  #define QCOM_CPU_PART_KRYO_4XX_GOLD	0x804
>  #define QCOM_CPU_PART_KRYO_4XX_SILVER	0x805
> @@ -167,6 +168,7 @@
>  #define MIDR_QCOM_KRYO MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_KRYO)
>  #define MIDR_QCOM_KRYO_2XX_GOLD MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_KRYO_2XX_GOLD)
>  #define MIDR_QCOM_KRYO_2XX_SILVER MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_KRYO_2XX_SILVER)
> +#define MIDR_QCOM_KRYO_3XX_GOLD MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_KRYO_3XX_GOLD)
>  #define MIDR_QCOM_KRYO_3XX_SILVER MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_KRYO_3XX_SILVER)
>  #define MIDR_QCOM_KRYO_4XX_GOLD MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_KRYO_4XX_GOLD)
>  #define MIDR_QCOM_KRYO_4XX_SILVER MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_KRYO_4XX_SILVER)

-- 
DENX Software Engineering GmbH,        Managing Director: Erika Unter
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine
  2025-04-18 16:54   ` Pavel Machek
@ 2025-04-18 17:19     ` Sean Christopherson
  2025-04-18 17:36       ` Borislav Petkov
  0 siblings, 1 reply; 25+ messages in thread
From: Sean Christopherson @ 2025-04-18 17:19 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Sasha Levin, linux-kernel, stable, Max Grobecker, Ingo Molnar,
	Borislav Petkov, tglx, mingo, dave.hansen, x86, thomas.lendacky,
	perry.yuan, mario.limonciello, riel, mjguzik, darwi

On Fri, Apr 18, 2025, Pavel Machek wrote:
> Hi!
> 
> > From: Max Grobecker <max@grobecker.info>
> > 
> > [ Upstream commit a4248ee16f411ac1ea7dfab228a6659b111e3d65 ]
> 
> > This can prevent some docker containers from starting or build scripts to create
> > unoptimized binaries.
> > 
> > Admittably, this is more a small inconvenience than a severe bug in the kernel
> > and the shoddy scripts that rely on parsing /proc/cpuinfo
> > should be fixed instead.

Uh, and the hypervisor too?  Why is the hypervisor enumerating an old K8 CPU for
what appears to be a modern workload?

> I'd say this is not good stable candidate.

Eh, practically speaking, there's no chance of this causing problems.  The setup
is all kinds of weird, but AIUI, K8 CPUs don't support virtualization so there's
no chance that the underlying CPU is actually affected by the K8 bug, because the
underlying CPU can't be K8.  And no bare metal CPU will ever set the HYPERVISOR
bit, so there won't be false positives on that front.

I personally object to the patch itself; it's not the kernel's responsibility to
deal with a misconfigured VM.  But unless we revert the commit, I don't see any
reason to withhold this from stable@.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine
  2025-04-18 17:19     ` Sean Christopherson
@ 2025-04-18 17:36       ` Borislav Petkov
  2025-04-18 18:31         ` Sean Christopherson
  0 siblings, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2025-04-18 17:36 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Pavel Machek, Sasha Levin, linux-kernel, stable, Max Grobecker,
	Ingo Molnar, tglx, mingo, dave.hansen, x86, thomas.lendacky,
	perry.yuan, mario.limonciello, riel, mjguzik, darwi

On Fri, Apr 18, 2025 at 10:19:14AM -0700, Sean Christopherson wrote:
> Uh, and the hypervisor too?  Why is the hypervisor enumerating an old K8 CPU for
> what appears to be a modern workload?
> 
> > I'd say this is not good stable candidate.
> 
> Eh, practically speaking, there's no chance of this causing problems.  The setup
> is all kinds of weird, but AIUI, K8 CPUs don't support virtualization so there's
> no chance that the underlying CPU is actually affected by the K8 bug, because the
> underlying CPU can't be K8.  And no bare metal CPU will ever set the HYPERVISOR
> bit, so there won't be false positives on that front.
> 
> I personally object to the patch itself; it's not the kernel's responsibility to
> deal with a misconfigured VM.  But unless we revert the commit, I don't see any
> reason to withhold this from stable@.

I objected back then but it is some obscure VM migration madness (pasting the
whole reply here because it didn't land on any ML):

Date: Tue,  17 Dec 2024 21:32:21 +0100
From: Max Grobecker <max@grobecker.info>
To: Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>,
 Borislav Petkov <bp@alien8.de>, Dave Hansen <dave.hansen@linux.intel.com>
Cc: Max Grobecker <max@grobecker.info>, x86@kernel.org
Subject: Re: [PATCH v2] Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8()
 on AMD when running in a virtual machine
Message-ID: <d77caeea-b922-4bf5-8349-4b5acab4d2eb>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=utf-8

Hi,

sorry for my late response, was hit by a flu last days.

On Tue, 10 Dec 2024 13:51:50 +0100, Borislav Petkov wrote:
> Lemme get this straight: you - I don't know who "we" is - are running K8
> models in guests? Why?

Oh, I see, I missed to explain that, indeed.

This error happens, when I start a virtual machine using libvirt/QEMU while 
not passing through the host CPU. I do this, because I want to be 
able to live-migrate the VM between hosts, that have slightly different CPUs.
Migration works, but only if I choose the generic "kvm64" CPU preset to be
used with QEMU using the "-cpu kvm64" parameter:

  qemu-system-x86_64 -cpu kvm64

I also explicitly enabled additional features like SSE4.1 or AXV2 to have as
most features as I can but still being able to do live-migration between hosts.

By using this config, the CPU is identified as "Common KVM processor"
inside the VM:

  processor       : 0
  vendor_id       : AuthenticAMD
  cpu family      : 15
  model           : 6
  model name      : Common KVM processor

Also, the model reads as 0x06, which is set by that kvm64 CPU preset,
but usually does not pose a problem.

The original vendor id of the host CPU is still visible to the guest, and in
case the host uses an AMD CPU the combination of "AuthenticAMD" and model 0x06
triggers the bug and the lahf_lm flag vanishes.
If the guest is running with the same settings on an Intel CPU and therefore 
reads "GenuineIntel" as the vendor string, the model is still 0x06, but also 
the lahf_lm flag is still listed in /proc/cpuinfo.

The CPU is mistakenly identified to be an AMD K8 model, while, in fact, nearly
all features a modern Epyc or Xeon CPU is offering, are available.

Greetings,
 Max

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine
  2025-04-18 17:36       ` Borislav Petkov
@ 2025-04-18 18:31         ` Sean Christopherson
  2025-04-18 19:12           ` Borislav Petkov
  0 siblings, 1 reply; 25+ messages in thread
From: Sean Christopherson @ 2025-04-18 18:31 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Pavel Machek, Sasha Levin, linux-kernel, stable, Max Grobecker,
	Ingo Molnar, tglx, mingo, dave.hansen, x86, thomas.lendacky,
	perry.yuan, mario.limonciello, riel, mjguzik, darwi

On Fri, Apr 18, 2025, Borislav Petkov wrote:
> On Fri, Apr 18, 2025 at 10:19:14AM -0700, Sean Christopherson wrote:
> > Uh, and the hypervisor too?  Why is the hypervisor enumerating an old K8 CPU for
> > what appears to be a modern workload?
> > 
> > > I'd say this is not good stable candidate.
> > 
> > Eh, practically speaking, there's no chance of this causing problems.  The setup
> > is all kinds of weird, but AIUI, K8 CPUs don't support virtualization so there's
> > no chance that the underlying CPU is actually affected by the K8 bug, because the
> > underlying CPU can't be K8.  And no bare metal CPU will ever set the HYPERVISOR
> > bit, so there won't be false positives on that front.
> > 
> > I personally object to the patch itself; it's not the kernel's responsibility to
> > deal with a misconfigured VM.  But unless we revert the commit, I don't see any
> > reason to withhold this from stable@.
> 
> I objected back then but it is some obscure VM migration madness (pasting the
> whole reply here because it didn't land on any ML):
> 
> Date: Tue,  17 Dec 2024 21:32:21 +0100
> From: Max Grobecker <max@grobecker.info>
> To: Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>,
>  Borislav Petkov <bp@alien8.de>, Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Max Grobecker <max@grobecker.info>, x86@kernel.org
> Subject: Re: [PATCH v2] Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8()
>  on AMD when running in a virtual machine
> Message-ID: <d77caeea-b922-4bf5-8349-4b5acab4d2eb>
> MIME-Version: 1.0
> Content-Transfer-Encoding: 8bit
> Content-Type: text/plain; charset=utf-8
> 
> Hi,
> 
> sorry for my late response, was hit by a flu last days.
> 
> On Tue, 10 Dec 2024 13:51:50 +0100, Borislav Petkov wrote:
> > Lemme get this straight: you - I don't know who "we" is - are running K8
> > models in guests? Why?
> 
> Oh, I see, I missed to explain that, indeed.
> 
> This error happens, when I start a virtual machine using libvirt/QEMU while 
> not passing through the host CPU. I do this, because I want to be 
> able to live-migrate the VM between hosts, that have slightly different CPUs.
> Migration works, but only if I choose the generic "kvm64" CPU preset to be
> used with QEMU using the "-cpu kvm64" parameter:
>  
>   qemu-system-x86_64 -cpu kvm64
>  
> I also explicitly enabled additional features like SSE4.1 or AXV2 to have as
> most features as I can but still being able to do live-migration between hosts.
>   
> By using this config, the CPU is identified as "Common KVM processor"
> inside the VM:
> 
>   processor       : 0
>   vendor_id       : AuthenticAMD
>   cpu family      : 15
>   model           : 6
>   model name      : Common KVM processor
> 
> Also, the model reads as 0x06, which is set by that kvm64 CPU preset,
> but usually does not pose a problem.

IMO, this is blatantly a QEMU bug (I verified the behavior when using "kvm64" on AMD).
As per QEMU commit d1cd4bf419 ("introduce kvm64 CPU"), the vendor + FMS enumerates
an Intel P4:

        .name = "kvm64",
        .level = 0xd,
        .vendor = CPUID_VENDOR_INTEL,
        .family = 15,
        .model = 6,

Per x86_cpu_load_model(), QEMU overrides the vendor when using KVM (at a glance,
I can't find the code that actually overrides the vendor, gotta love QEMU's object
model):

    /*
     * vendor property is set here but then overloaded with the
     * host cpu vendor for KVM and HVF.
     */
    object_property_set_str(OBJECT(cpu), "vendor", def->vendor, &error_abort);

Overriding the vendor but using Intel's P4 FMS is flat out wrong.  IMO, QEMU
should use the same FMS as qemu64 for kvm64 when running on AMD.

        .name = "qemu64",
        .level = 0xd,
        .vendor = CPUID_VENDOR_AMD,
        .family = 15,
        .model = 107,
        .stepping = 1,

Yeah, scraping FMS information is a bad idea, but what QEMU is doing is arguably
far worse.

> The original vendor id of the host CPU is still visible to the guest, and in
> case the host uses an AMD CPU the combination of "AuthenticAMD" and model 0x06
> triggers the bug and the lahf_lm flag vanishes.
> If the guest is running with the same settings on an Intel CPU and therefore 
> reads "GenuineIntel" as the vendor string, the model is still 0x06, but also 
> the lahf_lm flag is still listed in /proc/cpuinfo.
> 
> The CPU is mistakenly identified to be an AMD K8 model, while, in fact, nearly
> all features a modern Epyc or Xeon CPU is offering, are available.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine
  2025-04-18 18:31         ` Sean Christopherson
@ 2025-04-18 19:12           ` Borislav Petkov
  2025-04-22 17:22             ` Sean Christopherson
  0 siblings, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2025-04-18 19:12 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Pavel Machek, Sasha Levin, linux-kernel, stable, Max Grobecker,
	Ingo Molnar, tglx, mingo, dave.hansen, x86, thomas.lendacky,
	perry.yuan, mario.limonciello, riel, mjguzik, darwi

On Fri, Apr 18, 2025 at 11:31:27AM -0700, Sean Christopherson wrote:
> IMO, this is blatantly a QEMU bug (I verified the behavior when using "kvm64" on AMD).
> As per QEMU commit d1cd4bf419 ("introduce kvm64 CPU"), the vendor + FMS enumerates
> an Intel P4:
> 
>         .name = "kvm64",
>         .level = 0xd,
>         .vendor = CPUID_VENDOR_INTEL,
>         .family = 15,
>         .model = 6,
> 
> Per x86_cpu_load_model(), QEMU overrides the vendor when using KVM (at a glance,
> I can't find the code that actually overrides the vendor, gotta love QEMU's object
> model):

LOL, I thought I was the only one who thought this is madness. :-P

> 
>     /*
>      * vendor property is set here but then overloaded with the
>      * host cpu vendor for KVM and HVF.
>      */
>     object_property_set_str(OBJECT(cpu), "vendor", def->vendor, &error_abort);
> 
> Overriding the vendor but using Intel's P4 FMS is flat out wrong.  IMO, QEMU
> should use the same FMS as qemu64 for kvm64 when running on AMD.
> 
>         .name = "qemu64",
>         .level = 0xd,
>         .vendor = CPUID_VENDOR_AMD,
>         .family = 15,
>         .model = 107,
>         .stepping = 1,
> 
> Yeah, scraping FMS information is a bad idea, but what QEMU is doing is arguably
> far worse.

Ok, let's fix qemu. I don't have a clue, though, how to go about that so I'd
rely on your guidance here.

Because I really hate wagging the dog and "fixing" the kernel because something
else can't be bothered. I didn't object stronger to that fix because it is
meh, more of those "if I'm a guest" gunk which we sprinkle nowadays and that's
apparently not that awful-ish...

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH AUTOSEL 5.10 4/6] arm64: cputype: Add QCOM_CPU_PART_KRYO_3XX_GOLD
  2025-04-18 16:55   ` Pavel Machek
@ 2025-04-18 19:27     ` Doug Anderson
  0 siblings, 0 replies; 25+ messages in thread
From: Doug Anderson @ 2025-04-18 19:27 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Sasha Levin, linux-kernel, stable, Dmitry Baryshkov, Trilok Soni,
	Catalin Marinas, will, mark.rutland, oliver.upton,
	shameerali.kolothum.thodi, maz, bwicaksono, linux-arm-kernel

Hi,

On Fri, Apr 18, 2025 at 9:55 AM Pavel Machek <pavel@denx.de> wrote:
>
> Hi!
>
> > From: Douglas Anderson <dianders@chromium.org>
> >
> > [ Upstream commit 401c3333bb2396aa52e4121887a6f6a6e2f040bc ]
> >
> > Add a definition for the Qualcomm Kryo 300-series Gold cores.
>
> Why are we adding unused defines to stable?

I don't really have a strong opinion, but I can see the logic at some
level. This patch definitely doesn't _hurt_ and it seems plausible
that a define like this could be used in a future errata. Having this
already in stable would mean that the future errata would just pick
cleanly without anyone having to track down the original patch.

-Doug

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine
  2025-04-18 19:12           ` Borislav Petkov
@ 2025-04-22 17:22             ` Sean Christopherson
  2025-04-22 17:33               ` CONFIG_X86_HYPERVISOR (was: Re: [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine) Borislav Petkov
  0 siblings, 1 reply; 25+ messages in thread
From: Sean Christopherson @ 2025-04-22 17:22 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Pavel Machek, Sasha Levin, linux-kernel, stable, Max Grobecker,
	Ingo Molnar, tglx, mingo, dave.hansen, x86, thomas.lendacky,
	perry.yuan, mario.limonciello, riel, mjguzik, darwi,
	Paolo Bonzini

+Paolo

On Fri, Apr 18, 2025, Borislav Petkov wrote:
> On Fri, Apr 18, 2025 at 11:31:27AM -0700, Sean Christopherson wrote:
> > IMO, this is blatantly a QEMU bug (I verified the behavior when using "kvm64" on AMD).
> > As per QEMU commit d1cd4bf419 ("introduce kvm64 CPU"), the vendor + FMS enumerates
> > an Intel P4:
> > 
> >         .name = "kvm64",
> >         .level = 0xd,
> >         .vendor = CPUID_VENDOR_INTEL,
> >         .family = 15,
> >         .model = 6,
> > 
> > Per x86_cpu_load_model(), QEMU overrides the vendor when using KVM (at a glance,
> > I can't find the code that actually overrides the vendor, gotta love QEMU's object
> > model):
> 
> LOL, I thought I was the only one who thought this is madness. :-P

Yeah, I've got backtraces and I still don't entirely understand who's doing what.

> >     /*
> >      * vendor property is set here but then overloaded with the
> >      * host cpu vendor for KVM and HVF.
> >      */
> >     object_property_set_str(OBJECT(cpu), "vendor", def->vendor, &error_abort);
> > 
> > Overriding the vendor but using Intel's P4 FMS is flat out wrong.  IMO, QEMU
> > should use the same FMS as qemu64 for kvm64 when running on AMD.
> > 
> >         .name = "qemu64",
> >         .level = 0xd,
> >         .vendor = CPUID_VENDOR_AMD,
> >         .family = 15,
> >         .model = 107,
> >         .stepping = 1,
> > 
> > Yeah, scraping FMS information is a bad idea, but what QEMU is doing is arguably
> > far worse.
> 
> Ok, let's fix qemu. I don't have a clue, though, how to go about that so I'd
> rely on your guidance here.

I have no idea how to fix the QEMU code.

Paolo,

The TL;DR of the problem is that QEMU's "kvm64" CPU type sets FMS to Intel P4,
and doesn't swizzle the FMS to something sane when running on AMD.  This results
in QEMU advertising the CPU as an ancient K8, which causes at least one *known*
problem due software making decisions on the funky FMS.

My stance is that QEMU is buggy/flawed and should stuff a FMS that is sane for
the underlying vendor for kvm64.  I'd send an RFC patch, but for the life of me
I can't figure what that would even look like.

> Because I really hate wagging the dog and "fixing" the kernel because something
> else can't be bothered. I didn't object stronger to that fix because it is
> meh, more of those "if I'm a guest" gunk which we sprinkle nowadays and that's
> apparently not that awful-ish...

FWIW, I think splattering X86_FEATURE_HYPERVISOR everywhere is quite awful.  There
are definitely cases where the kernel needs to know if it's running as a guest,
because the behavior of "hardware" fundamentally changes in ways that can't be
enumerated otherwise.  E.g. that things like the HPET are fully emulated and thus
will be prone to significant jitter.

But when it comes to feature enumeration, IMO sprinkling HYPERVISOR everywhere is
unnecessary because it's the hypervisor/VMM's responsibility to present a sane
model.  And I also think it's outright dangerous, because everywhere the kernel
does X for bare metal and Y for guest results in reduced test coverage.

E.g. things like syzkaller and other bots will largely be testing the HYPERVISOR
code, while humans will largely be testing and using the bare metal code.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* CONFIG_X86_HYPERVISOR (was: Re: [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine)
  2025-04-22 17:22             ` Sean Christopherson
@ 2025-04-22 17:33               ` Borislav Petkov
  2025-04-22 19:48                 ` Sean Christopherson
  0 siblings, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2025-04-22 17:33 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Pavel Machek, Sasha Levin, linux-kernel, stable, Max Grobecker,
	Ingo Molnar, tglx, mingo, dave.hansen, x86, thomas.lendacky,
	perry.yuan, mario.limonciello, riel, mjguzik, darwi,
	Paolo Bonzini

On Tue, Apr 22, 2025 at 10:22:54AM -0700, Sean Christopherson wrote:
> > Because I really hate wagging the dog and "fixing" the kernel because something
> > else can't be bothered. I didn't object stronger to that fix because it is
> > meh, more of those "if I'm a guest" gunk which we sprinkle nowadays and that's
> > apparently not that awful-ish...
> 
> FWIW, I think splattering X86_FEATURE_HYPERVISOR everywhere is quite awful.  There
> are definitely cases where the kernel needs to know if it's running as a guest,
> because the behavior of "hardware" fundamentally changes in ways that can't be
> enumerated otherwise.  E.g. that things like the HPET are fully emulated and thus
> will be prone to significant jitter.
> 
> But when it comes to feature enumeration, IMO sprinkling HYPERVISOR everywhere is
> unnecessary because it's the hypervisor/VMM's responsibility to present a sane
> model.  And I also think it's outright dangerous, because everywhere the kernel
> does X for bare metal and Y for guest results in reduced test coverage.
> 
> E.g. things like syzkaller and other bots will largely be testing the HYPERVISOR
> code, while humans will largely be testing and using the bare metal code.

All valid points...

At least one case justifies the X86_FEATURE_HYPERVISOR check: microcode loading
and we've chewed that topic back then with Xen ad nauseam.

But I'd love to whack as many of such checks as possible.

$ git grep X86_FEATURE_HYPERVISOR | wc -l
60

I think I should start whacking at those and CC you if I'm not sure.
It'll be a long-term, low prio thing but it'll be a good cleanup.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: CONFIG_X86_HYPERVISOR (was: Re: [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine)
  2025-04-22 17:33               ` CONFIG_X86_HYPERVISOR (was: Re: [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine) Borislav Petkov
@ 2025-04-22 19:48                 ` Sean Christopherson
  2025-04-23  7:20                   ` Borislav Petkov
  0 siblings, 1 reply; 25+ messages in thread
From: Sean Christopherson @ 2025-04-22 19:48 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Pavel Machek, Sasha Levin, linux-kernel, stable, Max Grobecker,
	Ingo Molnar, tglx, mingo, dave.hansen, x86, thomas.lendacky,
	perry.yuan, mario.limonciello, riel, mjguzik, darwi,
	Paolo Bonzini

On Tue, Apr 22, 2025, Borislav Petkov wrote:
> On Tue, Apr 22, 2025 at 10:22:54AM -0700, Sean Christopherson wrote:
> > > Because I really hate wagging the dog and "fixing" the kernel because something
> > > else can't be bothered. I didn't object stronger to that fix because it is
> > > meh, more of those "if I'm a guest" gunk which we sprinkle nowadays and that's
> > > apparently not that awful-ish...
> > 
> > FWIW, I think splattering X86_FEATURE_HYPERVISOR everywhere is quite awful.  There
> > are definitely cases where the kernel needs to know if it's running as a guest,
> > because the behavior of "hardware" fundamentally changes in ways that can't be
> > enumerated otherwise.  E.g. that things like the HPET are fully emulated and thus
> > will be prone to significant jitter.
> > 
> > But when it comes to feature enumeration, IMO sprinkling HYPERVISOR everywhere is
> > unnecessary because it's the hypervisor/VMM's responsibility to present a sane
> > model.  And I also think it's outright dangerous, because everywhere the kernel
> > does X for bare metal and Y for guest results in reduced test coverage.
> > 
> > E.g. things like syzkaller and other bots will largely be testing the HYPERVISOR
> > code, while humans will largely be testing and using the bare metal code.
> 
> All valid points...
> 
> At least one case justifies the X86_FEATURE_HYPERVISOR check: microcode loading
> and we've chewed that topic back then with Xen ad nauseam.

Yeah, from my perspective, ucode loading falls into the "fundamentally different"
bucket.

> 
> But I'd love to whack as many of such checks as possible.
> 
> $ git grep X86_FEATURE_HYPERVISOR | wc -l
> 60
> 
> I think I should start whacking at those and CC you if I'm not sure.
> It'll be a long-term, low prio thing but it'll be a good cleanup.

I did a quick pass.  Most of the usage is "fine".  E.g. explicit PV code, cases
where checking for HYPERVISOR is the least awful option, etc.

Looks sketchy, might be worth investigating?
--------------------------------------------
  arch/x86/kernel/cpu/amd.c:              if (!cpu_has(c, X86_FEATURE_HYPERVISOR) &&
  arch/x86/kernel/cpu/amd.c:      if (!cpu_has(c, X86_FEATURE_HYPERVISOR) && !cpu_has(c, X86_FEATURE_IBPB_BRTYPE)) {
  arch/x86/kernel/cpu/amd.c:      if (c->x86_model < 0x14 && cpu_has(c, X86_FEATURE_LAHF_LM) && !cpu_has(c, X86_FEATURE_HYPERVISOR)) {
  arch/x86/kernel/cpu/amd.c:      if (!cpu_has(c, X86_FEATURE_HYPERVISOR)) {
  arch/x86/kernel/cpu/amd.c:      if (!cpu_has(c, X86_FEATURE_HYPERVISOR)) {
  arch/x86/kernel/cpu/amd.c:      if (cpu_has(c, X86_FEATURE_HYPERVISOR))
  arch/x86/kernel/cpu/amd.c:      if (!cpu_has(c, X86_FEATURE_HYPERVISOR)) {
  arch/x86/kernel/cpu/amd.c:      if (!cpu_has(c, X86_FEATURE_HYPERVISOR))
  arch/x86/kernel/cpu/topology_amd.c:             if (!boot_cpu_has(X86_FEATURE_HYPERVISOR) && tscan->c->x86_model <= 0x3) {
  arch/x86/mm/init_64.c:  if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
  arch/x86/mm/pat/set_memory.c:   return !cpu_feature_enabled(X86_FEATURE_HYPERVISOR);
  drivers/platform/x86/intel/pmc/pltdrv.c:        if (cpu_feature_enabled(X86_FEATURE_HYPERVISOR) && !xen_initial_domain())
  drivers/platform/x86/intel/uncore-frequency/uncore-frequency.c: if (cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
--------------------------------------

Could do with some love, but not horrible.
------------------------------------------
Eww.  Optimization to lessen the pain of DR7 interception.  It'd be nice to clean
this up at some point, especially with things like SEV-ES with DebugSwap, where
DR7 is never intercepted.
  arch/x86/include/asm/debugreg.h:        if (static_cpu_has(X86_FEATURE_HYPERVISOR) && !hw_breakpoint_active())
  arch/x86/kernel/hw_breakpoint.c:                 * When in guest (X86_FEATURE_HYPERVISOR), local_db_save()

This usage should be restricted to just the FMS matching, but unfortunately
needs to be kept for that check.
  arch/x86/kernel/cpu/bus_lock.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR))

Most of these look sane, e.g. are just being transparent about the state of
mitigations when running in a VM.  The use in update_srbds_msr() is the only
one that stands out as somewhat sketchy.
  arch/x86/kernel/cpu/bugs.c:     if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
  arch/x86/kernel/cpu/bugs.c:     else if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
  arch/x86/kernel/cpu/bugs.c:     if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
  arch/x86/kernel/cpu/bugs.c:     if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
  arch/x86/kernel/cpu/bugs.c:     if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
  arch/x86/kernel/cpu/bugs.c:     if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
------------------------------------------

Don't bother
------------------------------------------
Most of these look sane, e.g. are just being transparent about the state of
mitigations when running in a VM.  The use in update_srbds_msr() is the only
one that stands out as somewhat sketchy.
  arch/x86/kernel/cpu/bugs.c:     if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
  arch/x86/kernel/cpu/bugs.c:     else if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
  arch/x86/kernel/cpu/bugs.c:     if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
  arch/x86/kernel/cpu/bugs.c:     if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
  arch/x86/kernel/cpu/bugs.c:     if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
  arch/x86/kernel/cpu/bugs.c:     if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) {

Perf, don't bother.  PMUs are notoriously virtualization-unfriendly, and perf
has had to resort to detecting its running in a VM to avoid crashing the kernel,
and I don't see this being fully solved any time soon.
  arch/x86/events/core.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
  arch/x86/events/intel/core.c:   if (!boot_cpu_has(X86_FEATURE_HYPERVISOR))
  arch/x86/events/intel/core.c:           int assume = 3 * !boot_cpu_has(X86_FEATURE_HYPERVISOR);
  arch/x86/events/intel/cstate.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
  arch/x86/events/intel/uncore.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR))

PV code of one form or another.
  arch/x86/include/asm/acrn.h:    if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
  arch/x86/hyperv/ivm.c:  if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
  arch/x86/kernel/cpu/mshyperv.c: if (!boot_cpu_has(X86_FEATURE_HYPERVISOR))
  arch/x86/kernel/cpu/vmware.c: * If !boot_cpu_has(X86_FEATURE_HYPERVISOR), vmware_hypercall_mode
  arch/x86/kernel/cpu/vmware.c:   if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
  arch/x86/kernel/jailhouse.c:        !boot_cpu_has(X86_FEATURE_HYPERVISOR))
  arch/x86/kernel/kvm.c:  if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
  arch/x86/kernel/paravirt.c:     if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
  arch/x86/kernel/tsc.c:  if (boot_cpu_has(X86_FEATURE_HYPERVISOR) ||
  arch/x86/kvm/vmx/vmx.c: if (!static_cpu_has(X86_FEATURE_HYPERVISOR) ||
  arch/x86/virt/svm/cmdline.c:                    if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR)) {

Ugh.  Eliding WBINVD when running as a VM.  Probably the least awful option as
there's no sane way to enumerate that WBINVD is a nop, and a "passthrough" setup
can (and should) simply omit HYPERVISOR.
  arch/x86/include/asm/acenv.h:   if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))       \

Skip sanity check on TSC deadline timer.  Makes sense to keep; either the timer
is emulated and thus not subject to hardware errata, or its passed through, in
which case HYPERVSIOR arguably shouldn't be set.
  arch/x86/kernel/apic/apic.c:    if (boot_cpu_has(X86_FEATURE_HYPERVISOR))

This "feature" is awful, but getting rid of it may not be feasible.
https://lore.kernel.org/all/20250201005048.657470-1-seanjc@google.com
  arch/x86/kernel/cpu/mtrr/generic.c:     if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))

Exempting VMs from a gross workaround for old, buggy Intel chipsets.  Fine to keep.
  drivers/acpi/processor_idle.c:  if (boot_cpu_has(X86_FEATURE_HYPERVISOR))

More mitigation crud, probably not worth pursuing.
  arch/x86/kernel/cpu/common.c:        boot_cpu_has(X86_FEATURE_HYPERVISOR)))
  arch/x86/kernel/cpu/common.c:   if (cpu_has(c, X86_FEATURE_HYPERVISOR)) {

LOL.  Skip ucode revision check when detecting bad Spectre mitigation.
  arch/x86/kernel/cpu/intel.c:    if (cpu_has(c, X86_FEATURE_HYPERVISOR))
------------------------------------------

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: CONFIG_X86_HYPERVISOR (was: Re: [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine)
  2025-04-22 19:48                 ` Sean Christopherson
@ 2025-04-23  7:20                   ` Borislav Petkov
  2025-04-23 14:10                     ` Sean Christopherson
  0 siblings, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2025-04-23  7:20 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Pavel Machek, Sasha Levin, linux-kernel, stable, Max Grobecker,
	Ingo Molnar, tglx, mingo, dave.hansen, x86, thomas.lendacky,
	perry.yuan, mario.limonciello, riel, mjguzik, darwi,
	Paolo Bonzini

On Tue, Apr 22, 2025 at 12:48:44PM -0700, Sean Christopherson wrote:
> I did a quick pass.

You couldn't resist, I know. Doing something else for a change is
always cool.

:-P

> Most of the usage is "fine".  E.g. explicit PV code, cases
> where checking for HYPERVISOR is the least awful option, etc.
> 
> Looks sketchy, might be worth investigating?

Oh, I will, it is on my
do-this-while-waiting-for-compile/test-to-finish. ;-P

> --------------------------------------------
>   arch/x86/kernel/cpu/amd.c:              if (!cpu_has(c, X86_FEATURE_HYPERVISOR) &&

So that first one is to set CC_ATTR_HOST_SEV_SNP when we really are
a SNP host. I'll go through the rest slowly.

>   arch/x86/kernel/cpu/amd.c:      if (!cpu_has(c, X86_FEATURE_HYPERVISOR) && !cpu_has(c, X86_FEATURE_IBPB_BRTYPE)) {
>   arch/x86/kernel/cpu/amd.c:      if (c->x86_model < 0x14 && cpu_has(c, X86_FEATURE_LAHF_LM) && !cpu_has(c, X86_FEATURE_HYPERVISOR)) {
>   arch/x86/kernel/cpu/amd.c:      if (!cpu_has(c, X86_FEATURE_HYPERVISOR)) {
>   arch/x86/kernel/cpu/amd.c:      if (!cpu_has(c, X86_FEATURE_HYPERVISOR)) {
>   arch/x86/kernel/cpu/amd.c:      if (cpu_has(c, X86_FEATURE_HYPERVISOR))
>   arch/x86/kernel/cpu/amd.c:      if (!cpu_has(c, X86_FEATURE_HYPERVISOR)) {
>   arch/x86/kernel/cpu/amd.c:      if (!cpu_has(c, X86_FEATURE_HYPERVISOR))
>   arch/x86/kernel/cpu/topology_amd.c:             if (!boot_cpu_has(X86_FEATURE_HYPERVISOR) && tscan->c->x86_model <= 0x3) {
>   arch/x86/mm/init_64.c:  if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
>   arch/x86/mm/pat/set_memory.c:   return !cpu_feature_enabled(X86_FEATURE_HYPERVISOR);
>   drivers/platform/x86/intel/pmc/pltdrv.c:        if (cpu_feature_enabled(X86_FEATURE_HYPERVISOR) && !xen_initial_domain())
>   drivers/platform/x86/intel/uncore-frequency/uncore-frequency.c: if (cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
> --------------------------------------
> 
> 
> Could do with some love, but not horrible.
> ------------------------------------------
> Eww.  Optimization to lessen the pain of DR7 interception.  It'd be nice to clean
> this up at some point, especially with things like SEV-ES with DebugSwap, where
> DR7 is never intercepted.
>   arch/x86/include/asm/debugreg.h:        if (static_cpu_has(X86_FEATURE_HYPERVISOR) && !hw_breakpoint_active())
>   arch/x86/kernel/hw_breakpoint.c:                 * When in guest (X86_FEATURE_HYPERVISOR), local_db_save()

Patch adding it says "Because DRn access is 'difficult' with virt;..."
so yeah. I guess we need to agree how to do debug exceptions in guests.
Probably start documenting it and then have guest and host adhere to
that. I'm talking completely without having looked at what the code does
but the "handshake" agreement should be something like this and then we
can start simplifying code...

> This usage should be restricted to just the FMS matching, but unfortunately
> needs to be kept for that check.
>   arch/x86/kernel/cpu/bus_lock.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR))

I have no idea why that was added - perhaps to avoid split-lock related
#ACs on guests...

/does more git archeology...

Aha, I see it: 6650cdd9a8ccf

Although this doesn't explicitly comment on the guest aspect...

Anyway, thanks for the initial run-thru - I'll keep coming back to this
as time provides and we can talk.

Others reading are ofc more than welcome to do patches...

;-)

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: CONFIG_X86_HYPERVISOR (was: Re: [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine)
  2025-04-23  7:20                   ` Borislav Petkov
@ 2025-04-23 14:10                     ` Sean Christopherson
  2025-04-23 18:43                       ` Borislav Petkov
  0 siblings, 1 reply; 25+ messages in thread
From: Sean Christopherson @ 2025-04-23 14:10 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Pavel Machek, Sasha Levin, linux-kernel, stable, Max Grobecker,
	Ingo Molnar, tglx, mingo, dave.hansen, x86, thomas.lendacky,
	perry.yuan, mario.limonciello, riel, mjguzik, darwi,
	Paolo Bonzini

On Wed, Apr 23, 2025, Borislav Petkov wrote:
> > Eww.  Optimization to lessen the pain of DR7 interception.  It'd be nice to clean
> > this up at some point, especially with things like SEV-ES with DebugSwap, where
> > DR7 is never intercepted.
> >   arch/x86/include/asm/debugreg.h:        if (static_cpu_has(X86_FEATURE_HYPERVISOR) && !hw_breakpoint_active())
> >   arch/x86/kernel/hw_breakpoint.c:                 * When in guest (X86_FEATURE_HYPERVISOR), local_db_save()
> 
> Patch adding it says "Because DRn access is 'difficult' with virt;..."
> so yeah. I guess we need to agree how to do debug exceptions in guests.
> Probably start documenting it and then have guest and host adhere to
> that. I'm talking completely without having looked at what the code does
> but the "handshake" agreement should be something like this and then we
> can start simplifying code...

I don't know that we'll be able to simplify the code.

#DBs in the guest are complex because DR[0-3] aren't context switched by hardware,
and running with active breakpoints is uncommon.  As a result, loading the guest's
DRs into hardware on every VM-Enter is undesirable, because it would add significant
latency (load DRs on entry, save DRs on exit) for a relatively rare situation
(guest has active breakpoints).

KVM (and presumably other hypervisors) intercepts DR accesses so that it can
detect when the guest has active breakpoints (DR7 bits enabled), at which point
KVM does load the guest's DRs into hardware and disables DR interception until
the next VM-Exit.

KVM also allows the host user to utilize hardware breakpoints to debug the guest,
which further adds to the madness, and that's not something the guest can change
or even influence.

So removing the "am I guest logic" entirely probably isn't feasible, because in
the common case where there are no active breakpoints, reading cpu_dr7 instead
of DR7 is a significant performance boost for "normal" VMs.

I mentioned SEV-ES+ DebugSwap because in that case DR7 is effectively guaranteed
to not be intercepted, and so the native behavior of reading DR7 instead of the
per-CPU variable is likely desirable.  I believe TDX has similar functionality
(I forget if it's always on, or opt-in).

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: CONFIG_X86_HYPERVISOR (was: Re: [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine)
  2025-04-23 14:10                     ` Sean Christopherson
@ 2025-04-23 18:43                       ` Borislav Petkov
  2025-04-24 19:18                         ` Sean Christopherson
  0 siblings, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2025-04-23 18:43 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Pavel Machek, Sasha Levin, linux-kernel, stable, Max Grobecker,
	Ingo Molnar, tglx, mingo, dave.hansen, x86, thomas.lendacky,
	perry.yuan, mario.limonciello, riel, mjguzik, darwi,
	Paolo Bonzini

On Wed, Apr 23, 2025 at 07:10:17AM -0700, Sean Christopherson wrote:
> On Wed, Apr 23, 2025, Borislav Petkov wrote:
> > > Eww.  Optimization to lessen the pain of DR7 interception.  It'd be nice to clean
> > > this up at some point, especially with things like SEV-ES with DebugSwap, where
> > > DR7 is never intercepted.
> > >   arch/x86/include/asm/debugreg.h:        if (static_cpu_has(X86_FEATURE_HYPERVISOR) && !hw_breakpoint_active())
> > >   arch/x86/kernel/hw_breakpoint.c:                 * When in guest (X86_FEATURE_HYPERVISOR), local_db_save()
> > 
> > Patch adding it says "Because DRn access is 'difficult' with virt;..."
> > so yeah. I guess we need to agree how to do debug exceptions in guests.
> > Probably start documenting it and then have guest and host adhere to
> > that. I'm talking completely without having looked at what the code does
> > but the "handshake" agreement should be something like this and then we
> > can start simplifying code...
> 
> I don't know that we'll be able to simplify the code.
> 
> #DBs in the guest are complex because DR[0-3] aren't context switched by hardware,
> and running with active breakpoints is uncommon.  As a result, loading the guest's
> DRs into hardware on every VM-Enter is undesirable, because it would add significant
> latency (load DRs on entry, save DRs on exit) for a relatively rare situation
> (guest has active breakpoints).
> 
> KVM (and presumably other hypervisors) intercepts DR accesses so that it can
> detect when the guest has active breakpoints (DR7 bits enabled), at which point
> KVM does load the guest's DRs into hardware and disables DR interception until
> the next VM-Exit.
> 
> KVM also allows the host user to utilize hardware breakpoints to debug the guest,
> which further adds to the madness, and that's not something the guest can change
> or even influence.
> 
> So removing the "am I guest logic" entirely probably isn't feasible, because in
> the common case where there are no active breakpoints, reading cpu_dr7 instead
> of DR7 is a significant performance boost for "normal" VMs.

So I see three modes:

- default off - the usual case

- host debugs the guest

- guests are allowed to do breakpoints

So depending on what is enabled, the code can behave properly - it just
needs logic which tells the relevant code - guest or host - which of the
debugging mode is enabled. And then everything adheres to that and DTRT.

But before any of that, the even more important question is: do we even
care to beef it up that much?

I get the feeling that we don't so it likely is a "whatever's the
easiest" game.

> I mentioned SEV-ES+ DebugSwap because in that case DR7 is effectively guaranteed
> to not be intercepted, and so the native behavior of reading DR7 instead of the
> per-CPU variable is likely desirable.  I believe TDX has similar functionality
> (I forget if it's always on, or opt-in).

Aha, the choice was made by the CoCo hw designers - guests are allowed
to do breakpoints.

Oh well...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: CONFIG_X86_HYPERVISOR (was: Re: [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine)
  2025-04-23 18:43                       ` Borislav Petkov
@ 2025-04-24 19:18                         ` Sean Christopherson
  2025-04-24 20:31                           ` Borislav Petkov
  0 siblings, 1 reply; 25+ messages in thread
From: Sean Christopherson @ 2025-04-24 19:18 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Pavel Machek, Sasha Levin, linux-kernel, stable, Max Grobecker,
	Ingo Molnar, tglx, mingo, dave.hansen, x86, thomas.lendacky,
	perry.yuan, mario.limonciello, riel, mjguzik, darwi,
	Paolo Bonzini

On Wed, Apr 23, 2025, Borislav Petkov wrote:
> On Wed, Apr 23, 2025 at 07:10:17AM -0700, Sean Christopherson wrote:
> > On Wed, Apr 23, 2025, Borislav Petkov wrote:
> > > > Eww.  Optimization to lessen the pain of DR7 interception.  It'd be nice to clean
> > > > this up at some point, especially with things like SEV-ES with DebugSwap, where
> > > > DR7 is never intercepted.
> > > >   arch/x86/include/asm/debugreg.h:        if (static_cpu_has(X86_FEATURE_HYPERVISOR) && !hw_breakpoint_active())
> > > >   arch/x86/kernel/hw_breakpoint.c:                 * When in guest (X86_FEATURE_HYPERVISOR), local_db_save()
> > > 
> > > Patch adding it says "Because DRn access is 'difficult' with virt;..."
> > > so yeah. I guess we need to agree how to do debug exceptions in guests.
> > > Probably start documenting it and then have guest and host adhere to
> > > that. I'm talking completely without having looked at what the code does
> > > but the "handshake" agreement should be something like this and then we
> > > can start simplifying code...
> > 
> > I don't know that we'll be able to simplify the code.
> > 
> > #DBs in the guest are complex because DR[0-3] aren't context switched by hardware,
> > and running with active breakpoints is uncommon.  As a result, loading the guest's
> > DRs into hardware on every VM-Enter is undesirable, because it would add significant
> > latency (load DRs on entry, save DRs on exit) for a relatively rare situation
> > (guest has active breakpoints).
> > 
> > KVM (and presumably other hypervisors) intercepts DR accesses so that it can
> > detect when the guest has active breakpoints (DR7 bits enabled), at which point
> > KVM does load the guest's DRs into hardware and disables DR interception until
> > the next VM-Exit.
> > 
> > KVM also allows the host user to utilize hardware breakpoints to debug the guest,
> > which further adds to the madness, and that's not something the guest can change
> > or even influence.
> > 
> > So removing the "am I guest logic" entirely probably isn't feasible, because in
> > the common case where there are no active breakpoints, reading cpu_dr7 instead
> > of DR7 is a significant performance boost for "normal" VMs.
> 
> So I see three modes:
> 
> - default off - the usual case
> 
> - host debugs the guest
> 
> - guests are allowed to do breakpoints

Not quite.  KVM supports all of those seamlessly, with some caveats.  E.g. if
host userspace and guest kernel are trying to use the same DRx, the guest will
"lose" and not get its #DBs.

> So depending on what is enabled, the code can behave properly - it just
> needs logic which tells the relevant code - guest or host - which of the
> debugging mode is enabled. And then everything adheres to that and DTRT.
> 
> But before any of that, the even more important question is: do we even
> care to beef it up that much?
> 
> I get the feeling that we don't so it likely is a "whatever's the
> easiest" game.

Definitely not.  All I was thinking was something like:

diff --git a/arch/x86/include/asm/debugreg.h b/arch/x86/include/asm/debugreg.h
index fdbbbfec745a..a218c5170ecd 100644
--- a/arch/x86/include/asm/debugreg.h
+++ b/arch/x86/include/asm/debugreg.h
@@ -121,7 +121,7 @@ static __always_inline unsigned long local_db_save(void)
 {
        unsigned long dr7;
 
-       if (static_cpu_has(X86_FEATURE_HYPERVISOR) && !hw_breakpoint_active())
+       if (static_cpu_has(X86_FEATURE_DRS_MAY_VMEXIT) && !hw_breakpoint_active())
                return 0;
 
        get_debugreg(dr7, 7);

Where X86_FEATURE_DRS_MAY_VMEXIT is set if HYPERVISOR is detected, but then
cleared by SEV-ES+ and TDX guests with guaranteed access to DRs.  That said,
even that much infrastructure probably isn't worth the marginal benefits.

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: CONFIG_X86_HYPERVISOR (was: Re: [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine)
  2025-04-24 19:18                         ` Sean Christopherson
@ 2025-04-24 20:31                           ` Borislav Petkov
  2025-04-26  0:08                             ` Sean Christopherson
  0 siblings, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2025-04-24 20:31 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Pavel Machek, Sasha Levin, linux-kernel, stable, Max Grobecker,
	Ingo Molnar, tglx, mingo, dave.hansen, x86, thomas.lendacky,
	perry.yuan, mario.limonciello, riel, mjguzik, darwi,
	Paolo Bonzini

On Thu, Apr 24, 2025 at 12:18:50PM -0700, Sean Christopherson wrote:
> Not quite.  KVM supports all of those seamlessly, with some caveats.  E.g. if
> host userspace and guest kernel are trying to use the same DRx, the guest will
> "lose" and not get its #DBs.

Pff, so cloud providers have big fat signs over their workstations
saying: you're not allowed to use breakpoints on production systems?

With my silly thinking, I'd prefer to reglement this more explicitly and
actually have the kernel enforce policy:

HV userspace has higher prio with #DB or guests do. But the "losing" bit
sounds weird and not nice.

> Definitely not.  All I was thinking was something like:
> 
> diff --git a/arch/x86/include/asm/debugreg.h b/arch/x86/include/asm/debugreg.h
> index fdbbbfec745a..a218c5170ecd 100644
> --- a/arch/x86/include/asm/debugreg.h
> +++ b/arch/x86/include/asm/debugreg.h
> @@ -121,7 +121,7 @@ static __always_inline unsigned long local_db_save(void)
>  {
>         unsigned long dr7;
>  
> -       if (static_cpu_has(X86_FEATURE_HYPERVISOR) && !hw_breakpoint_active())
> +       if (static_cpu_has(X86_FEATURE_DRS_MAY_VMEXIT) && !hw_breakpoint_active())
>                 return 0;
>  
>         get_debugreg(dr7, 7);
> 
> Where X86_FEATURE_DRS_MAY_VMEXIT is set if HYPERVISOR is detected, but then
> cleared by SEV-ES+ and TDX guests with guaranteed access to DRs.  That said,
> even that much infrastructure probably isn't worth the marginal benefits.

Btw you can replace that X86_FEATURE_DRS_MAY_VMEXIT with a cc_platform
flag which gets properly set on all those coco guest types as those
flags are exactly for that stuff.

In any case, I don't see why not. It is easy enough and doesn't make
things worse, API-wise.

Care to send a proper patch with rationale why?

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: CONFIG_X86_HYPERVISOR (was: Re: [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine)
  2025-04-24 20:31                           ` Borislav Petkov
@ 2025-04-26  0:08                             ` Sean Christopherson
  2025-04-26 11:26                               ` Borislav Petkov
  0 siblings, 1 reply; 25+ messages in thread
From: Sean Christopherson @ 2025-04-26  0:08 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Pavel Machek, Sasha Levin, linux-kernel, stable, Max Grobecker,
	Ingo Molnar, tglx, mingo, dave.hansen, x86, thomas.lendacky,
	perry.yuan, mario.limonciello, riel, mjguzik, darwi,
	Paolo Bonzini

On Thu, Apr 24, 2025, Borislav Petkov wrote:
> On Thu, Apr 24, 2025 at 12:18:50PM -0700, Sean Christopherson wrote:
> > Not quite.  KVM supports all of those seamlessly, with some caveats.  E.g. if
> > host userspace and guest kernel are trying to use the same DRx, the guest will
> > "lose" and not get its #DBs.
> 
> Pff, so cloud providers have big fat signs over their workstations
> saying: you're not allowed to use breakpoints on production systems?

Heh, it's a bit more than a sign.

> With my silly thinking, I'd prefer to reglement this more explicitly and
> actually have the kernel enforce policy:

The kernel already can enforce policy.  Setting host breakpoints on guest code
is done through a dedicated ioctl(), and access to said ioctl() can be restricted
through various sandboxing methods, e.g. seccomp.

> HV userspace has higher prio with #DB or guests do. But the "losing" bit
> sounds weird and not nice.

Yeah, it's weird and not nice.  But if a human is interactive debugging a guest,
odds are very, very good that a missing breakpoint in the guest is not at all a
concern.

> > Definitely not.  All I was thinking was something like:
> > 
> > diff --git a/arch/x86/include/asm/debugreg.h b/arch/x86/include/asm/debugreg.h
> > index fdbbbfec745a..a218c5170ecd 100644
> > --- a/arch/x86/include/asm/debugreg.h
> > +++ b/arch/x86/include/asm/debugreg.h
> > @@ -121,7 +121,7 @@ static __always_inline unsigned long local_db_save(void)
> >  {
> >         unsigned long dr7;
> >  
> > -       if (static_cpu_has(X86_FEATURE_HYPERVISOR) && !hw_breakpoint_active())
> > +       if (static_cpu_has(X86_FEATURE_DRS_MAY_VMEXIT) && !hw_breakpoint_active())
> >                 return 0;
> >  
> >         get_debugreg(dr7, 7);
> > 
> > Where X86_FEATURE_DRS_MAY_VMEXIT is set if HYPERVISOR is detected, but then
> > cleared by SEV-ES+ and TDX guests with guaranteed access to DRs.  That said,
> > even that much infrastructure probably isn't worth the marginal benefits.
> 
> Btw you can replace that X86_FEATURE_DRS_MAY_VMEXIT with a cc_platform
> flag which gets properly set on all those coco guest types as those
> flags are exactly for that stuff.

No, that would defeat the purpose of the check.  The X86_FEATURE_HYPERVISOR has
nothing to do with correctness, it's all about performance.  Critically, it's a
static check that gets patched at runtime.  It's a micro-optimization for bare
metal to avoid a single cache miss (the __this_cpu_read(cpu_dr7)).  Routing
through cc_platform_has() would be far, far heavier than calling hw_breakpoint_active().

I pointed out the SEV-ES+/TDX cases because they likely would benefit from that
same micro-optimization, i.e. by avoiding the call to hw_breakpoint_active().

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: CONFIG_X86_HYPERVISOR (was: Re: [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine)
  2025-04-26  0:08                             ` Sean Christopherson
@ 2025-04-26 11:26                               ` Borislav Petkov
  2025-05-06  1:04                                 ` Sean Christopherson
  0 siblings, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2025-04-26 11:26 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Pavel Machek, Sasha Levin, linux-kernel, stable, Max Grobecker,
	Ingo Molnar, tglx, mingo, dave.hansen, x86, thomas.lendacky,
	perry.yuan, mario.limonciello, riel, mjguzik, darwi,
	Paolo Bonzini

On April 26, 2025 3:08:29 AM GMT+03:00, Sean Christopherson <seanjc@google.com> wrote:
>The kernel already can enforce policy.  Setting host breakpoints on guest code
>is done through a dedicated ioctl(), and access to said ioctl() can be restricted
>through various sandboxing methods, e.g. seccomp.

Ok, makes sense.

>No, that would defeat the purpose of the check.  The X86_FEATURE_HYPERVISOR has
>nothing to do with correctness, it's all about performance.  Critically, it's a
>static check that gets patched at runtime.  It's a micro-optimization for bare
>metal to avoid a single cache miss (the __this_cpu_read(cpu_dr7)).  Routing
>through cc_platform_has() would be far, far heavier than calling hw_breakpoint_active().

Huh, we care so much about speed here?

-- 
Sent from a small device: formatting sucks and brevity is inevitable.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: CONFIG_X86_HYPERVISOR (was: Re: [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine)
  2025-04-26 11:26                               ` Borislav Petkov
@ 2025-05-06  1:04                                 ` Sean Christopherson
  0 siblings, 0 replies; 25+ messages in thread
From: Sean Christopherson @ 2025-05-06  1:04 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Pavel Machek, Sasha Levin, linux-kernel, stable, Max Grobecker,
	Ingo Molnar, tglx, mingo, dave.hansen, x86, thomas.lendacky,
	perry.yuan, mario.limonciello, riel, mjguzik, darwi,
	Paolo Bonzini

On Sat, Apr 26, 2025, Borislav Petkov wrote:
> On April 26, 2025 3:08:29 AM GMT+03:00, Sean Christopherson <seanjc@google.com> wrote:
> >No, that would defeat the purpose of the check.  The X86_FEATURE_HYPERVISOR has
> >nothing to do with correctness, it's all about performance.  Critically, it's a
> >static check that gets patched at runtime.  It's a micro-optimization for bare
> >metal to avoid a single cache miss (the __this_cpu_read(cpu_dr7)).  Routing
> >through cc_platform_has() would be far, far heavier than calling hw_breakpoint_active().
> 
> Huh, we care so much about speed here?

That's a PeterZ question :-)

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2025-05-06  1:04 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-31 14:37 [PATCH AUTOSEL 5.10 1/6] pm: cpupower: bench: Prevent NULL dereference on malloc failure Sasha Levin
2025-03-31 14:37 ` [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine Sasha Levin
2025-04-18 16:54   ` Pavel Machek
2025-04-18 17:19     ` Sean Christopherson
2025-04-18 17:36       ` Borislav Petkov
2025-04-18 18:31         ` Sean Christopherson
2025-04-18 19:12           ` Borislav Petkov
2025-04-22 17:22             ` Sean Christopherson
2025-04-22 17:33               ` CONFIG_X86_HYPERVISOR (was: Re: [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine) Borislav Petkov
2025-04-22 19:48                 ` Sean Christopherson
2025-04-23  7:20                   ` Borislav Petkov
2025-04-23 14:10                     ` Sean Christopherson
2025-04-23 18:43                       ` Borislav Petkov
2025-04-24 19:18                         ` Sean Christopherson
2025-04-24 20:31                           ` Borislav Petkov
2025-04-26  0:08                             ` Sean Christopherson
2025-04-26 11:26                               ` Borislav Petkov
2025-05-06  1:04                                 ` Sean Christopherson
2025-03-31 14:37 ` [PATCH AUTOSEL 5.10 3/6] perf: arm_pmu: Don't disable counter in armpmu_add() Sasha Levin
2025-03-31 14:37 ` [PATCH AUTOSEL 5.10 4/6] arm64: cputype: Add QCOM_CPU_PART_KRYO_3XX_GOLD Sasha Levin
2025-04-18 16:55   ` Pavel Machek
2025-04-18 19:27     ` Doug Anderson
2025-03-31 14:37 ` [PATCH AUTOSEL 5.10 5/6] xen/mcelog: Add __nonstring annotations for unterminated strings Sasha Levin
2025-03-31 14:37 ` [PATCH AUTOSEL 5.10 6/6] x86/mm/ident_map: Fix theoretical virtual address overflow to zero Sasha Levin
2025-04-18 16:52 ` [PATCH AUTOSEL 5.10 1/6] pm: cpupower: bench: Prevent NULL dereference on malloc failure Pavel Machek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox