public inbox for x86-cpuid@lists.linux.dev
* [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings
@ 2025-03-04  8:51 Ahmed S. Darwish
  2025-03-04  8:51 ` [PATCH v1 01/40] x86/cacheinfo: Validate cpuid leaf 0x2 EDX output Ahmed S. Darwish
                   ` (41 more replies)
  0 siblings, 42 replies; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-04  8:51 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar, Dave Hansen
  Cc: Thomas Gleixner, John Ogness, H. Peter Anvin, Andrew Cooper, x86,
	x86-cpuid, LKML, Ahmed S. Darwish

Hi,

As part of the ongoing x86-cpuid work [*], we've found that the leaf 0x2
and leaf 0x4 code paths are difficult to work with in their current
state.  This is mostly due to the organic growth of the x86/cpu and
x86/cacheinfo logic since the very early Linux days.

This series cleans up and refactors these code paths in preparation for
the new x86-cpuid model.

Summary:

- Patches 1 to 3 are independent bugfixes that were discovered during
  this refactoring work.

- Patches 4 to 10 are x86/cpu refactorings for code size and
  readability.

- Patch 10 adds standardized and kernel-doc documented logic for
  accessing leaf 0x2 one-byte descriptors.

  This centralizes the leaf 0x2 sanitization logic in one place.
  x86/cpu and x86/cacheinfo are modified to use these macros afterwards.

- Patches 11 to 28 refactor the x86/cacheinfo code.

  Besides improving readability, some unrelated logic (e.g. the AMD
  northbridge cache_disable sysfs code) was first split from the generic
  leaf 0x4 code paths, at the structure relationships level, then moved
  out into separate files.

- Patches 29 to 31 consolidate the existing (loop-based lookup) leaf 0x2
  cache and TLB descriptor tables into one hash-based lookup table.
  This reduces code size while still keeping rodata size in check.

  Standardized macros for accessing this consolidated table are also
  added.  Call sites can now just do:

	const struct leaf_0x2_table *entry;
	union leaf_0x2_regs regs;
	u8 *ptr;

	get_leaf_0x2_regs(&regs);
	for_each_leaf_0x2_entry(regs, ptr, entry) {
		switch (entry->c_type) {
			...
		}
	}

  without needing to worry about sanitizing registers, skipping certain
  descriptors, etc.

- Patches 32 and 33 use the consolidated table above for x86/cpu and
  x86/cacheinfo.

- Patches 34 to 40 provide the final set of x86/refactorings.
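
As an illustration of the consolidated-table idea from patches 29 to 31,
here is a minimal userspace sketch.  It assumes the "hash" is simply the
descriptor byte used as an array index; the series' real table layout is
not shown in this cover letter, and all names below (desc_type,
desc_entry, desc_table, desc_lookup) are hypothetical.  The entry counts
are taken from the existing intel_tlb_table[].

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor classes; not the series' actual enums. */
enum desc_type {
	DESC_UNKNOWN = 0,	/* zero-initialized slot: descriptor not in table */
	DESC_TLB_INST_4K,
	DESC_TLB_DATA_4K,
	DESC_TLB_DATA_2M_4M,
	DESC_TLB_DATA_1G,
};

struct desc_entry {
	enum desc_type	type;
	uint16_t	entries;
};

/*
 * One slot per possible one-byte descriptor.  A lookup is a single array
 * access instead of a linear scan over a { descriptor, type, count } list.
 */
static const struct desc_entry desc_table[256] = {
	[0x01] = { DESC_TLB_INST_4K,	32  },
	[0x61] = { DESC_TLB_INST_4K,	48  },
	[0x6b] = { DESC_TLB_DATA_4K,	256 },
	[0x6c] = { DESC_TLB_DATA_2M_4M,	128 },
	[0x6d] = { DESC_TLB_DATA_1G,	16  },
};

/* Return the table entry for a descriptor, or NULL if it is unknown. */
static const struct desc_entry *desc_lookup(uint8_t desc)
{
	const struct desc_entry *e = &desc_table[desc];

	return e->type == DESC_UNKNOWN ? NULL : e;
}
```

A caller would pass each descriptor byte to desc_lookup() and switch on
the returned type, much like the for_each_leaf_0x2_entry() loop above.
The trade-off in this sketch is a fixed 256-slot array in rodata in
exchange for dropping the per-descriptor search loop.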

This series is based on -rc5.  It also applies cleanly on top of
tip/x86/core.

Note, testing was done by comparing the files below:

	/proc/cpuinfo
	/sys/devices/system/cpu/
	/sys/kernel/debug/x86/topo/
	dmesg --notime | grep 'Last level [id]TLB entries'

before and after on various old and new x86 machine configurations.

[*] https://gitlab.com/x86-cpuid.org/x86-cpuid-db
    https://x86-cpuid.org

8<-----

Ahmed S. Darwish (33):
  x86/cacheinfo: Validate cpuid leaf 0x2 EDX output
  x86/cpu: Validate cpuid leaf 0x2 EDX output
  x86/cpu: Properly parse leaf 0x2 TLB descriptor 0x63
  x86/cpuid: Include linux/build_bug.h
  x86/cpu: Remove unnecessary headers and reorder the rest
  x86/cpu: Use max() for leaf 0x2 TLB descriptors parsing
  x86/cpu: Simplify TLB entry count storage
  x86/cpu: Remove leaf 0x2 parsing loop and add helpers
  x86/cacheinfo: Remove unnecessary headers and reorder the rest
  x86/cacheinfo: Use cpuid leaf 0x2 parsing helpers
  x86/cacheinfo: Constify _cpuid4_info_regs instances
  x86/cacheinfo: Align ci_info_init() assignment expressions
  x86/cacheinfo: Standardize _cpuid4_info_regs instance naming
  x86: treewide: Introduce x86_vendor_amd_or_hygon()
  x86/cacheinfo: Consolidate AMD/Hygon leaf 0x8000001d calls
  x86/cacheinfo: Separate amd_northbridge from _cpuid4_info_regs
  x86/cacheinfo: Move AMD cache_disable_0/1 handling to separate file
  x86/cacheinfo: Use sysfs_emit() for sysfs attributes show()
  x86/cacheinfo: Separate Intel and AMD leaf 0x4 code paths
  x86/cacheinfo: Rename _cpuid4_info_regs to _cpuid4_info
  x86/cacheinfo: Clarify type markers for leaf 0x2 cache descriptors
  x86/cacheinfo: Use enums for cache descriptor types
  x86/cpu: Use enums for TLB descriptor types
  sizes.h: Cover all possible x86 cpu cache sizes
  x86/cacheinfo: Use consolidated leaf 0x2 descriptor table
  x86/cpu: Use consolidated leaf 0x2 descriptor table
  x86/cacheinfo: Separate leaf 0x2 handling and post-processing logic
  x86/cacheinfo: Separate intel leaf 0x4 handling
  x86/cacheinfo: Extract out cache level topology ID calculation
  x86/cacheinfo: Extract out cache self-snoop checks
  x86/cacheinfo: Relocate leaf 0x4 cache_type mapping
  x86/cacheinfo: Introduce amd_hygon_cpu_has_l3_cache()
  x86/cacheinfo: Apply maintainer-tip coding style fixes

Thomas Gleixner (7):
  x86/cpu: Get rid of smp_store_cpu_info() indirection
  x86/cpu: Remove unused TLB strings
  x86/cacheinfo: Remove the P4 trace leftovers for real
  x86/cacheinfo: Refactor leaf 0x2 cache descriptor lookup
  x86/cacheinfo: Properly name amd_cpuid4()'s first parameter
  x86/cacheinfo: Use proper name for cacheinfo instances
  x86/cpu: Consolidate CPUID leaf 0x2 tables

 arch/x86/events/amd/uncore.c            |    3 +-
 arch/x86/events/rapl.c                  |    3 +-
 arch/x86/include/asm/cpuid.h            |    1 +
 arch/x86/include/asm/cpuid/types.h      |  173 ++++
 arch/x86/include/asm/processor.h        |   26 +-
 arch/x86/include/asm/smp.h              |    2 -
 arch/x86/kernel/amd_nb.c                |   16 +-
 arch/x86/kernel/cpu/Makefile            |    5 +-
 arch/x86/kernel/cpu/amd.c               |   18 +-
 arch/x86/kernel/cpu/amd_cache_disable.c |  301 +++++++
 arch/x86/kernel/cpu/bugs.c              |   12 +-
 arch/x86/kernel/cpu/cacheinfo.c         | 1062 +++++++----------------
 arch/x86/kernel/cpu/common.c            |   31 +-
 arch/x86/kernel/cpu/cpu.h               |   17 +-
 arch/x86/kernel/cpu/cpuid_0x2_table.c   |  128 +++
 arch/x86/kernel/cpu/hygon.c             |   16 +-
 arch/x86/kernel/cpu/intel.c             |  208 ++---
 arch/x86/kernel/cpu/mce/core.c          |    4 +-
 arch/x86/kernel/cpu/mce/severity.c      |    3 +-
 arch/x86/kernel/cpu/mtrr/cleanup.c      |    3 +-
 arch/x86/kernel/smpboot.c               |   27 +-
 arch/x86/kvm/svm/svm.c                  |    3 +-
 arch/x86/pci/amd_bus.c                  |    3 +-
 arch/x86/xen/enlighten.c                |   15 +-
 arch/x86/xen/pmu.c                      |    3 +-
 arch/x86/xen/smp_pv.c                   |    2 +-
 include/linux/sizes.h                   |    8 +
 27 files changed, 1076 insertions(+), 1017 deletions(-)
 create mode 100644 arch/x86/include/asm/cpuid/types.h
 create mode 100644 arch/x86/kernel/cpu/amd_cache_disable.c
 create mode 100644 arch/x86/kernel/cpu/cpuid_0x2_table.c

base-commit: 7eb172143d5508b4da468ed59ee857c6e5e01da6
--
2.48.1


* [PATCH v1 01/40] x86/cacheinfo: Validate cpuid leaf 0x2 EDX output
  2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
@ 2025-03-04  8:51 ` Ahmed S. Darwish
  2025-03-04  8:51 ` [PATCH v1 02/40] x86/cpu: " Ahmed S. Darwish
                   ` (40 subsequent siblings)
  41 siblings, 0 replies; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-04  8:51 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar, Dave Hansen
  Cc: Thomas Gleixner, John Ogness, H. Peter Anvin, Andrew Cooper, x86,
	x86-cpuid, LKML, Ahmed S. Darwish

cpuid leaf 0x2 emits one-byte descriptors in its four output registers
EAX, EBX, ECX, and EDX.  For these descriptors to be valid, the most
significant bit (MSB) of each register must be clear.

The historical git commit (019361a20f016: "- pre6: Intel: start to add
Pentium IV specific stuff (128-byte cacheline etc)...") introduced leaf
0x2 output parsing.  It only validated the MSBs of EAX, EBX, and ECX,
but left EDX unchecked.

Validate EDX's most-significant bit.
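
For illustration only, the validation rule can be sketched in userspace C
as below.  The helper names are hypothetical and this is not the kernel
code.  It assumes, per the Intel SDM, that the low byte of EAX is not a
descriptor and that a zero byte is a null descriptor.

```c
#include <stddef.h>
#include <stdint.h>

#define LEAF_0x2_NREGS	4	/* EAX, EBX, ECX, EDX */

/* Zero every register whose bit 31 is set: its bytes are not descriptors. */
static void leaf_0x2_sanitize(uint32_t regs[LEAF_0x2_NREGS])
{
	for (size_t i = 0; i < LEAF_0x2_NREGS; i++) {
		if (regs[i] & (UINT32_C(1) << 31))
			regs[i] = 0;
	}
}

/*
 * Collect the non-null descriptor bytes of already-sanitized registers.
 * "out" must have room for 15 bytes.  Returns the number of bytes stored.
 */
static size_t leaf_0x2_descriptors(const uint32_t regs[LEAF_0x2_NREGS],
				   uint8_t out[15])
{
	size_t n = 0;

	for (size_t i = 0; i < LEAF_0x2_NREGS; i++) {
		for (unsigned int b = 0; b < 4; b++) {
			uint8_t desc = (regs[i] >> (8 * b)) & 0xff;

			/* Assumption: EAX's low byte is not a descriptor */
			if (i == 0 && b == 0)
				continue;
			if (desc)
				out[n++] = desc;
		}
	}

	return n;
}
```

Looping over all four registers, rather than the first three, is the
whole point of the fix: an EDX value with bit 31 set is zeroed before
any of its bytes can be misread as descriptors.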

Cc: stable@vger.kernel.org
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
---
 arch/x86/kernel/cpu/cacheinfo.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index e6fa03ed9172..a6c6bccfa8b8 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -808,7 +808,7 @@ void init_intel_cacheinfo(struct cpuinfo_x86 *c)
 			cpuid(2, &regs[0], &regs[1], &regs[2], &regs[3]);
 
 			/* If bit 31 is set, this is an unknown format */
-			for (j = 0 ; j < 3 ; j++)
+			for (j = 0 ; j < 4 ; j++)
 				if (regs[j] & (1 << 31))
 					regs[j] = 0;
 
-- 
2.48.1



* [PATCH v1 02/40] x86/cpu: Validate cpuid leaf 0x2 EDX output
  2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
  2025-03-04  8:51 ` [PATCH v1 01/40] x86/cacheinfo: Validate cpuid leaf 0x2 EDX output Ahmed S. Darwish
@ 2025-03-04  8:51 ` Ahmed S. Darwish
  2025-03-04  8:51 ` [PATCH v1 03/40] x86/cpu: Properly parse leaf 0x2 TLB descriptor 0x63 Ahmed S. Darwish
                   ` (39 subsequent siblings)
  41 siblings, 0 replies; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-04  8:51 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar, Dave Hansen
  Cc: Thomas Gleixner, John Ogness, H. Peter Anvin, Andrew Cooper, x86,
	x86-cpuid, LKML, Ahmed S. Darwish

cpuid leaf 0x2 emits one-byte descriptors in its four output registers
EAX, EBX, ECX, and EDX.  For these descriptors to be valid, the most
significant bit (MSB) of each register must be clear.

Leaf 0x2 parsing in intel.c only validated the MSBs of EAX, EBX, and
ECX, but left EDX unchecked.

Validate EDX's most-significant bit as well.

Fixes: e0ba94f14f74 ("x86/tlb_info: get last level TLB entry number of CPU")
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
---
 arch/x86/kernel/cpu/intel.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 3dce22f00dc3..2a3716afee63 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -799,7 +799,7 @@ static void intel_detect_tlb(struct cpuinfo_x86 *c)
 		cpuid(2, &regs[0], &regs[1], &regs[2], &regs[3]);
 
 		/* If bit 31 is set, this is an unknown format */
-		for (j = 0 ; j < 3 ; j++)
+		for (j = 0 ; j < 4 ; j++)
 			if (regs[j] & (1 << 31))
 				regs[j] = 0;
 
-- 
2.48.1



* [PATCH v1 03/40] x86/cpu: Properly parse leaf 0x2 TLB descriptor 0x63
  2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
  2025-03-04  8:51 ` [PATCH v1 01/40] x86/cacheinfo: Validate cpuid leaf 0x2 EDX output Ahmed S. Darwish
  2025-03-04  8:51 ` [PATCH v1 02/40] x86/cpu: " Ahmed S. Darwish
@ 2025-03-04  8:51 ` Ahmed S. Darwish
  2025-03-04  8:51 ` [PATCH v1 04/40] x86/cpuid: Include linux/build_bug.h Ahmed S. Darwish
                   ` (38 subsequent siblings)
  41 siblings, 0 replies; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-04  8:51 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar, Dave Hansen
  Cc: Thomas Gleixner, John Ogness, H. Peter Anvin, Andrew Cooper, x86,
	x86-cpuid, LKML, Ahmed S. Darwish

cpuid leaf 0x2's one-byte TLB descriptors report the number of entries
for specific TLB types, among other properties.

Typically, each emitted descriptor implies the same number of entries
for its respective TLB type(s).  An emitted 0x63 descriptor is an
exception: it implies 4 data TLB entries for 1GB pages and 32 data TLB
entries for 2MB or 4MB pages.

In the TLB descriptor parsing code, the entry count for 1GB pages is
encoded in the intel_tlb_table[] mapping, but the 2MB/4MB entry count is
ignored entirely.

Update leaf 0x2's parsing logic to account for the 32 data TLB entries
for 2MB/4MB pages implied by the 0x63 descriptor.
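
As a rough userspace sketch of the accounting (not the kernel code; the
struct, enum, and helper names are hypothetical), descriptor 0x63 raises
the 2MB and 4MB dTLB counts to at least 32 and then takes the same path
as a plain 1GB descriptor.  The patch itself expresses this with a
switch case that falls through to TLB_DATA_1G.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the two TLB types involved. */
enum tlb_type {
	TLB_TYPE_DATA_1G,		/* ordinary 1GB dTLB descriptor */
	TLB_TYPE_DATA_1G_2M_4M,		/* descriptor 0x63 */
};

/* Largest dTLB entry counts seen so far, per page size. */
struct dtlb_counts {
	uint16_t lld_2m;
	uint16_t lld_4m;
	uint16_t lld_1g;
};

/* 2MB/4MB entry count implied by 0x63; not stored in the table entry. */
#define TLB_0x63_2M_4M_ENTRIES	32

static uint16_t max_u16(uint16_t a, uint16_t b)
{
	return a > b ? a : b;
}

/* "entries" is the table's count for the descriptor (4 for 0x63). */
static void dtlb_account(struct dtlb_counts *t, enum tlb_type type,
			 uint16_t entries)
{
	if (type == TLB_TYPE_DATA_1G_2M_4M) {
		t->lld_2m = max_u16(t->lld_2m, TLB_0x63_2M_4M_ENTRIES);
		t->lld_4m = max_u16(t->lld_4m, TLB_0x63_2M_4M_ENTRIES);
	}

	/* Both types carry a 1GB entry count */
	t->lld_1g = max_u16(t->lld_1g, entries);
}
```

With all counts starting at zero, accounting for 0x63 in this sketch
yields 4 entries for 1GB pages and 32 entries each for 2MB and 4MB
pages, matching the description above.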

Fixes: e0ba94f14f74 ("x86/tlb_info: get last level TLB entry number of CPU")
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
---
 arch/x86/kernel/cpu/intel.c | 50 +++++++++++++++++++++++++------------
 1 file changed, 34 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 2a3716afee63..134368a3f4b1 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -635,26 +635,37 @@ static unsigned int intel_size_cache(struct cpuinfo_x86 *c, unsigned int size)
 }
 #endif
 
-#define TLB_INST_4K	0x01
-#define TLB_INST_4M	0x02
-#define TLB_INST_2M_4M	0x03
+#define TLB_INST_4K		0x01
+#define TLB_INST_4M		0x02
+#define TLB_INST_2M_4M		0x03
 
-#define TLB_INST_ALL	0x05
-#define TLB_INST_1G	0x06
+#define TLB_INST_ALL		0x05
+#define TLB_INST_1G		0x06
 
-#define TLB_DATA_4K	0x11
-#define TLB_DATA_4M	0x12
-#define TLB_DATA_2M_4M	0x13
-#define TLB_DATA_4K_4M	0x14
+#define TLB_DATA_4K		0x11
+#define TLB_DATA_4M		0x12
+#define TLB_DATA_2M_4M		0x13
+#define TLB_DATA_4K_4M		0x14
 
-#define TLB_DATA_1G	0x16
+#define TLB_DATA_1G		0x16
+#define TLB_DATA_1G_2M_4M	0x17
 
-#define TLB_DATA0_4K	0x21
-#define TLB_DATA0_4M	0x22
-#define TLB_DATA0_2M_4M	0x23
+#define TLB_DATA0_4K		0x21
+#define TLB_DATA0_4M		0x22
+#define TLB_DATA0_2M_4M		0x23
 
-#define STLB_4K		0x41
-#define STLB_4K_2M	0x42
+#define STLB_4K			0x41
+#define STLB_4K_2M		0x42
+
+/*
+ * All of leaf 0x2's one-byte TLB descriptors imply the same number of
+ * entries for their respective TLB types.  The 0x63 descriptor is an
+ * exception: it implies 4 dTLB entries for 1GB pages and 32 dTLB entries
+ * for 2MB or 4MB pages.  Encode descriptor 0x63 dTLB entry count for
+ * 2MB/4MB pages here, as its count for dTLB 1GB pages is already at the
+ * intel_tlb_table[] mapping.
+ */
+#define TLB_0x63_2M_4M_ENTRIES	32
 
 static const struct _tlb_table intel_tlb_table[] = {
 	{ 0x01, TLB_INST_4K,		32,	" TLB_INST 4 KByte pages, 4-way set associative" },
@@ -676,7 +687,8 @@ static const struct _tlb_table intel_tlb_table[] = {
 	{ 0x5c, TLB_DATA_4K_4M,		128,	" TLB_DATA 4 KByte and 4 MByte pages" },
 	{ 0x5d, TLB_DATA_4K_4M,		256,	" TLB_DATA 4 KByte and 4 MByte pages" },
 	{ 0x61, TLB_INST_4K,		48,	" TLB_INST 4 KByte pages, full associative" },
-	{ 0x63, TLB_DATA_1G,		4,	" TLB_DATA 1 GByte pages, 4-way set associative" },
+	{ 0x63, TLB_DATA_1G_2M_4M,	4,	" TLB_DATA 1 GByte pages, 4-way set associative"
+						" (plus 32 entries TLB_DATA 2 MByte or 4 MByte pages, not encoded here)" },
 	{ 0x6b, TLB_DATA_4K,		256,	" TLB_DATA 4 KByte pages, 8-way associative" },
 	{ 0x6c, TLB_DATA_2M_4M,		128,	" TLB_DATA 2 MByte or 4 MByte pages, 8-way associative" },
 	{ 0x6d, TLB_DATA_1G,		16,	" TLB_DATA 1 GByte pages, fully associative" },
@@ -776,6 +788,12 @@ static void intel_tlb_lookup(const unsigned char desc)
 		if (tlb_lld_4m[ENTRIES] < intel_tlb_table[k].entries)
 			tlb_lld_4m[ENTRIES] = intel_tlb_table[k].entries;
 		break;
+	case TLB_DATA_1G_2M_4M:
+		if (tlb_lld_2m[ENTRIES] < TLB_0x63_2M_4M_ENTRIES)
+			tlb_lld_2m[ENTRIES] = TLB_0x63_2M_4M_ENTRIES;
+		if (tlb_lld_4m[ENTRIES] < TLB_0x63_2M_4M_ENTRIES)
+			tlb_lld_4m[ENTRIES] = TLB_0x63_2M_4M_ENTRIES;
+		fallthrough;
 	case TLB_DATA_1G:
 		if (tlb_lld_1g[ENTRIES] < intel_tlb_table[k].entries)
 			tlb_lld_1g[ENTRIES] = intel_tlb_table[k].entries;
-- 
2.48.1



* [PATCH v1 04/40] x86/cpuid: Include linux/build_bug.h
  2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
                   ` (2 preceding siblings ...)
  2025-03-04  8:51 ` [PATCH v1 03/40] x86/cpu: Properly parse leaf 0x2 TLB descriptor 0x63 Ahmed S. Darwish
@ 2025-03-04  8:51 ` Ahmed S. Darwish
  2025-03-04  8:51 ` [PATCH v1 05/40] x86/cpu: Remove unnecessary headers and reorder the rest Ahmed S. Darwish
                   ` (37 subsequent siblings)
  41 siblings, 0 replies; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-04  8:51 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar, Dave Hansen
  Cc: Thomas Gleixner, John Ogness, H. Peter Anvin, Andrew Cooper, x86,
	x86-cpuid, LKML, Ahmed S. Darwish

asm/cpuid.h uses static_assert() in multiple locations, but it does not
include linux/build_bug.h, which defines that CPP macro.

Include the needed header.

This gets triggered when cpuid.h is included in new C files, which is to
be done in further commits.

Fixes: 43d86e3cd9a7 ("x86/cpu: Provide cpuid_read() et al.")
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
---
 arch/x86/include/asm/cpuid.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpuid.h b/arch/x86/include/asm/cpuid.h
index b2b9b4ef3dae..a92e4b08820a 100644
--- a/arch/x86/include/asm/cpuid.h
+++ b/arch/x86/include/asm/cpuid.h
@@ -6,6 +6,7 @@
 #ifndef _ASM_X86_CPUID_H
 #define _ASM_X86_CPUID_H
 
+#include <linux/build_bug.h>
 #include <linux/types.h>
 
 #include <asm/string.h>
-- 
2.48.1



* [PATCH v1 05/40] x86/cpu: Remove unnecessary headers and reorder the rest
  2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
                   ` (3 preceding siblings ...)
  2025-03-04  8:51 ` [PATCH v1 04/40] x86/cpuid: Include linux/build_bug.h Ahmed S. Darwish
@ 2025-03-04  8:51 ` Ahmed S. Darwish
  2025-03-04  9:14   ` Ingo Molnar
  2025-03-04  8:51 ` [PATCH v1 06/40] x86/cpu: Use max() for leaf 0x2 TLB descriptors parsing Ahmed S. Darwish
                   ` (36 subsequent siblings)
  41 siblings, 1 reply; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-04  8:51 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar, Dave Hansen
  Cc: Thomas Gleixner, John Ogness, H. Peter Anvin, Andrew Cooper, x86,
	x86-cpuid, LKML, Ahmed S. Darwish

Remove the headers in intel.c that are no longer required.

Alphabetically reorder what remains since more headers will be included
in further commits.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
---
 arch/x86/kernel/cpu/intel.c | 35 ++++++++++++-----------------------
 1 file changed, 12 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 134368a3f4b1..72f519534e2b 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -1,40 +1,29 @@
 // SPDX-License-Identifier: GPL-2.0
-#include <linux/kernel.h>
-#include <linux/pgtable.h>
 
-#include <linux/string.h>
 #include <linux/bitops.h>
-#include <linux/smp.h>
-#include <linux/sched.h>
-#include <linux/sched/clock.h>
-#include <linux/thread_info.h>
 #include <linux/init.h>
-#include <linux/uaccess.h>
+#include <linux/kernel.h>
+#include <linux/smp.h>
+#include <linux/string.h>
+
+#ifdef CONFIG_X86_64
+#include <linux/topology.h>
+#endif
 
-#include <asm/cpufeature.h>
-#include <asm/msr.h>
 #include <asm/bugs.h>
+#include <asm/cpu_device_id.h>
+#include <asm/cpufeature.h>
 #include <asm/cpu.h>
+#include <asm/hwcap2.h>
 #include <asm/intel-family.h>
 #include <asm/microcode.h>
-#include <asm/hwcap2.h>
-#include <asm/elf.h>
-#include <asm/cpu_device_id.h>
-#include <asm/resctrl.h>
+#include <asm/msr.h>
 #include <asm/numa.h>
+#include <asm/resctrl.h>
 #include <asm/thermal.h>
 
-#ifdef CONFIG_X86_64
-#include <linux/topology.h>
-#endif
-
 #include "cpu.h"
 
-#ifdef CONFIG_X86_LOCAL_APIC
-#include <asm/mpspec.h>
-#include <asm/apic.h>
-#endif
-
 /*
  * Processors which have self-snooping capability can handle conflicting
  * memory type across CPUs by snooping its own cache. However, there exists
-- 
2.48.1



* [PATCH v1 06/40] x86/cpu: Use max() for leaf 0x2 TLB descriptors parsing
  2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
                   ` (4 preceding siblings ...)
  2025-03-04  8:51 ` [PATCH v1 05/40] x86/cpu: Remove unnecessary headers and reorder the rest Ahmed S. Darwish
@ 2025-03-04  8:51 ` Ahmed S. Darwish
  2025-03-04  8:51 ` [PATCH v1 07/40] x86/cpu: Simplify TLB entry count storage Ahmed S. Darwish
                   ` (35 subsequent siblings)
  41 siblings, 0 replies; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-04  8:51 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar, Dave Hansen
  Cc: Thomas Gleixner, John Ogness, H. Peter Anvin, Andrew Cooper, x86,
	x86-cpuid, LKML, Ahmed S. Darwish

The conditional statement "if (x < y) { x = y; }" appears 22 times in
the Intel leaf 0x2 descriptor parsing logic.

Replace each such instance with a max() expression.

Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
---
 arch/x86/kernel/cpu/intel.c | 76 ++++++++++++++-----------------------
 1 file changed, 28 insertions(+), 48 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 72f519534e2b..e972c72e2b5d 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -3,6 +3,7 @@
 #include <linux/bitops.h>
 #include <linux/init.h>
 #include <linux/kernel.h>
+#include <linux/minmax.h>
 #include <linux/smp.h>
 #include <linux/string.h>
 
@@ -699,7 +700,9 @@ static const struct _tlb_table intel_tlb_table[] = {
 
 static void intel_tlb_lookup(const unsigned char desc)
 {
+	unsigned int entries;
 	unsigned char k;
+
 	if (desc == 0)
 		return;
 
@@ -711,81 +714,58 @@ static void intel_tlb_lookup(const unsigned char desc)
 	if (intel_tlb_table[k].tlb_type == 0)
 		return;
 
+	entries = intel_tlb_table[k].entries;
 	switch (intel_tlb_table[k].tlb_type) {
 	case STLB_4K:
-		if (tlb_lli_4k[ENTRIES] < intel_tlb_table[k].entries)
-			tlb_lli_4k[ENTRIES] = intel_tlb_table[k].entries;
-		if (tlb_lld_4k[ENTRIES] < intel_tlb_table[k].entries)
-			tlb_lld_4k[ENTRIES] = intel_tlb_table[k].entries;
+		tlb_lli_4k[ENTRIES] = max(tlb_lli_4k[ENTRIES], entries);
+		tlb_lld_4k[ENTRIES] = max(tlb_lld_4k[ENTRIES], entries);
 		break;
 	case STLB_4K_2M:
-		if (tlb_lli_4k[ENTRIES] < intel_tlb_table[k].entries)
-			tlb_lli_4k[ENTRIES] = intel_tlb_table[k].entries;
-		if (tlb_lld_4k[ENTRIES] < intel_tlb_table[k].entries)
-			tlb_lld_4k[ENTRIES] = intel_tlb_table[k].entries;
-		if (tlb_lli_2m[ENTRIES] < intel_tlb_table[k].entries)
-			tlb_lli_2m[ENTRIES] = intel_tlb_table[k].entries;
-		if (tlb_lld_2m[ENTRIES] < intel_tlb_table[k].entries)
-			tlb_lld_2m[ENTRIES] = intel_tlb_table[k].entries;
-		if (tlb_lli_4m[ENTRIES] < intel_tlb_table[k].entries)
-			tlb_lli_4m[ENTRIES] = intel_tlb_table[k].entries;
-		if (tlb_lld_4m[ENTRIES] < intel_tlb_table[k].entries)
-			tlb_lld_4m[ENTRIES] = intel_tlb_table[k].entries;
+		tlb_lli_4k[ENTRIES] = max(tlb_lli_4k[ENTRIES], entries);
+		tlb_lld_4k[ENTRIES] = max(tlb_lld_4k[ENTRIES], entries);
+		tlb_lli_2m[ENTRIES] = max(tlb_lli_2m[ENTRIES], entries);
+		tlb_lld_2m[ENTRIES] = max(tlb_lld_2m[ENTRIES], entries);
+		tlb_lli_4m[ENTRIES] = max(tlb_lli_4m[ENTRIES], entries);
+		tlb_lld_4m[ENTRIES] = max(tlb_lld_4m[ENTRIES], entries);
 		break;
 	case TLB_INST_ALL:
-		if (tlb_lli_4k[ENTRIES] < intel_tlb_table[k].entries)
-			tlb_lli_4k[ENTRIES] = intel_tlb_table[k].entries;
-		if (tlb_lli_2m[ENTRIES] < intel_tlb_table[k].entries)
-			tlb_lli_2m[ENTRIES] = intel_tlb_table[k].entries;
-		if (tlb_lli_4m[ENTRIES] < intel_tlb_table[k].entries)
-			tlb_lli_4m[ENTRIES] = intel_tlb_table[k].entries;
+		tlb_lli_4k[ENTRIES] = max(tlb_lli_4k[ENTRIES], entries);
+		tlb_lli_2m[ENTRIES] = max(tlb_lli_2m[ENTRIES], entries);
+		tlb_lli_4m[ENTRIES] = max(tlb_lli_4m[ENTRIES], entries);
 		break;
 	case TLB_INST_4K:
-		if (tlb_lli_4k[ENTRIES] < intel_tlb_table[k].entries)
-			tlb_lli_4k[ENTRIES] = intel_tlb_table[k].entries;
+		tlb_lli_4k[ENTRIES] = max(tlb_lli_4k[ENTRIES], entries);
 		break;
 	case TLB_INST_4M:
-		if (tlb_lli_4m[ENTRIES] < intel_tlb_table[k].entries)
-			tlb_lli_4m[ENTRIES] = intel_tlb_table[k].entries;
+		tlb_lli_4m[ENTRIES] = max(tlb_lli_4m[ENTRIES], entries);
 		break;
 	case TLB_INST_2M_4M:
-		if (tlb_lli_2m[ENTRIES] < intel_tlb_table[k].entries)
-			tlb_lli_2m[ENTRIES] = intel_tlb_table[k].entries;
-		if (tlb_lli_4m[ENTRIES] < intel_tlb_table[k].entries)
-			tlb_lli_4m[ENTRIES] = intel_tlb_table[k].entries;
+		tlb_lli_2m[ENTRIES] = max(tlb_lli_2m[ENTRIES], entries);
+		tlb_lli_4m[ENTRIES] = max(tlb_lli_4m[ENTRIES], entries);
 		break;
 	case TLB_DATA_4K:
 	case TLB_DATA0_4K:
-		if (tlb_lld_4k[ENTRIES] < intel_tlb_table[k].entries)
-			tlb_lld_4k[ENTRIES] = intel_tlb_table[k].entries;
+		tlb_lld_4k[ENTRIES] = max(tlb_lld_4k[ENTRIES], entries);
 		break;
 	case TLB_DATA_4M:
 	case TLB_DATA0_4M:
-		if (tlb_lld_4m[ENTRIES] < intel_tlb_table[k].entries)
-			tlb_lld_4m[ENTRIES] = intel_tlb_table[k].entries;
+		tlb_lld_4m[ENTRIES] = max(tlb_lld_4m[ENTRIES], entries);
 		break;
 	case TLB_DATA_2M_4M:
 	case TLB_DATA0_2M_4M:
-		if (tlb_lld_2m[ENTRIES] < intel_tlb_table[k].entries)
-			tlb_lld_2m[ENTRIES] = intel_tlb_table[k].entries;
-		if (tlb_lld_4m[ENTRIES] < intel_tlb_table[k].entries)
-			tlb_lld_4m[ENTRIES] = intel_tlb_table[k].entries;
+		tlb_lld_2m[ENTRIES] = max(tlb_lld_2m[ENTRIES], entries);
+		tlb_lld_4m[ENTRIES] = max(tlb_lld_4m[ENTRIES], entries);
 		break;
 	case TLB_DATA_4K_4M:
-		if (tlb_lld_4k[ENTRIES] < intel_tlb_table[k].entries)
-			tlb_lld_4k[ENTRIES] = intel_tlb_table[k].entries;
-		if (tlb_lld_4m[ENTRIES] < intel_tlb_table[k].entries)
-			tlb_lld_4m[ENTRIES] = intel_tlb_table[k].entries;
+		tlb_lld_4k[ENTRIES] = max(tlb_lld_4k[ENTRIES], entries);
+		tlb_lld_4m[ENTRIES] = max(tlb_lld_4m[ENTRIES], entries);
 		break;
 	case TLB_DATA_1G_2M_4M:
-		if (tlb_lld_2m[ENTRIES] < TLB_0x63_2M_4M_ENTRIES)
-			tlb_lld_2m[ENTRIES] = TLB_0x63_2M_4M_ENTRIES;
-		if (tlb_lld_4m[ENTRIES] < TLB_0x63_2M_4M_ENTRIES)
-			tlb_lld_4m[ENTRIES] = TLB_0x63_2M_4M_ENTRIES;
+		tlb_lld_2m[ENTRIES] = max(tlb_lld_2m[ENTRIES], TLB_0x63_2M_4M_ENTRIES);
+		tlb_lld_4m[ENTRIES] = max(tlb_lld_4m[ENTRIES], TLB_0x63_2M_4M_ENTRIES);
 		fallthrough;
 	case TLB_DATA_1G:
-		if (tlb_lld_1g[ENTRIES] < intel_tlb_table[k].entries)
-			tlb_lld_1g[ENTRIES] = intel_tlb_table[k].entries;
+		tlb_lld_1g[ENTRIES] = max(tlb_lld_1g[ENTRIES], entries);
 		break;
 	}
 }
-- 
2.48.1



* [PATCH v1 07/40] x86/cpu: Simplify TLB entry count storage
  2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
                   ` (5 preceding siblings ...)
  2025-03-04  8:51 ` [PATCH v1 06/40] x86/cpu: Use max() for leaf 0x2 TLB descriptors parsing Ahmed S. Darwish
@ 2025-03-04  8:51 ` Ahmed S. Darwish
  2025-03-04  8:51 ` [PATCH v1 08/40] x86/cpu: Get rid of smp_store_cpu_info() indirection Ahmed S. Darwish
                   ` (34 subsequent siblings)
  41 siblings, 0 replies; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-04  8:51 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar, Dave Hansen
  Cc: Thomas Gleixner, John Ogness, H. Peter Anvin, Andrew Cooper, x86,
	x86-cpuid, LKML, Ahmed S. Darwish

Commit e0ba94f14f74 ("x86/tlb_info: get last level TLB entry number of
CPU") introduced u16 "info" arrays for each TLB type.

Since 2012, each array has stored just one type of information: the
number of TLB entries for its respective TLB type.

Replace such arrays with simple variables.

Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
---
 arch/x86/include/asm/processor.h | 19 +++++--------
 arch/x86/kernel/cpu/amd.c        | 18 ++++++------
 arch/x86/kernel/cpu/common.c     | 20 ++++++-------
 arch/x86/kernel/cpu/hygon.c      | 16 +++++------
 arch/x86/kernel/cpu/intel.c      | 48 ++++++++++++++++----------------
 5 files changed, 57 insertions(+), 64 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index c0cd10182e90..0ea227fa027c 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -60,18 +60,13 @@ struct vm86;
 # define ARCH_MIN_MMSTRUCT_ALIGN	0
 #endif
 
-enum tlb_infos {
-	ENTRIES,
-	NR_INFO
-};
-
-extern u16 __read_mostly tlb_lli_4k[NR_INFO];
-extern u16 __read_mostly tlb_lli_2m[NR_INFO];
-extern u16 __read_mostly tlb_lli_4m[NR_INFO];
-extern u16 __read_mostly tlb_lld_4k[NR_INFO];
-extern u16 __read_mostly tlb_lld_2m[NR_INFO];
-extern u16 __read_mostly tlb_lld_4m[NR_INFO];
-extern u16 __read_mostly tlb_lld_1g[NR_INFO];
+extern u16 __read_mostly tlb_lli_4k;
+extern u16 __read_mostly tlb_lli_2m;
+extern u16 __read_mostly tlb_lli_4m;
+extern u16 __read_mostly tlb_lld_4k;
+extern u16 __read_mostly tlb_lld_2m;
+extern u16 __read_mostly tlb_lld_4m;
+extern u16 __read_mostly tlb_lld_1g;
 
 /*
  * CPU type and hardware bug flags. Kept separately for each CPU.
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 54194f5995de..c43e5d4033bb 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -1105,8 +1105,8 @@ static void cpu_detect_tlb_amd(struct cpuinfo_x86 *c)
 
 	cpuid(0x80000006, &eax, &ebx, &ecx, &edx);
 
-	tlb_lld_4k[ENTRIES] = (ebx >> 16) & mask;
-	tlb_lli_4k[ENTRIES] = ebx & mask;
+	tlb_lld_4k = (ebx >> 16) & mask;
+	tlb_lli_4k = ebx & mask;
 
 	/*
 	 * K8 doesn't have 2M/4M entries in the L2 TLB so read out the L1 TLB
@@ -1119,26 +1119,26 @@ static void cpu_detect_tlb_amd(struct cpuinfo_x86 *c)
 
 	/* Handle DTLB 2M and 4M sizes, fall back to L1 if L2 is disabled */
 	if (!((eax >> 16) & mask))
-		tlb_lld_2m[ENTRIES] = (cpuid_eax(0x80000005) >> 16) & 0xff;
+		tlb_lld_2m = (cpuid_eax(0x80000005) >> 16) & 0xff;
 	else
-		tlb_lld_2m[ENTRIES] = (eax >> 16) & mask;
+		tlb_lld_2m = (eax >> 16) & mask;
 
 	/* a 4M entry uses two 2M entries */
-	tlb_lld_4m[ENTRIES] = tlb_lld_2m[ENTRIES] >> 1;
+	tlb_lld_4m = tlb_lld_2m >> 1;
 
 	/* Handle ITLB 2M and 4M sizes, fall back to L1 if L2 is disabled */
 	if (!(eax & mask)) {
 		/* Erratum 658 */
 		if (c->x86 == 0x15 && c->x86_model <= 0x1f) {
-			tlb_lli_2m[ENTRIES] = 1024;
+			tlb_lli_2m = 1024;
 		} else {
 			cpuid(0x80000005, &eax, &ebx, &ecx, &edx);
-			tlb_lli_2m[ENTRIES] = eax & 0xff;
+			tlb_lli_2m = eax & 0xff;
 		}
 	} else
-		tlb_lli_2m[ENTRIES] = eax & mask;
+		tlb_lli_2m = eax & mask;
 
-	tlb_lli_4m[ENTRIES] = tlb_lli_2m[ENTRIES] >> 1;
+	tlb_lli_4m = tlb_lli_2m >> 1;
 }
 
 static const struct cpu_dev amd_cpu_dev = {
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 7cce91b19fb2..486395356faf 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -846,13 +846,13 @@ void cpu_detect_cache_sizes(struct cpuinfo_x86 *c)
 	c->x86_cache_size = l2size;
 }
 
-u16 __read_mostly tlb_lli_4k[NR_INFO];
-u16 __read_mostly tlb_lli_2m[NR_INFO];
-u16 __read_mostly tlb_lli_4m[NR_INFO];
-u16 __read_mostly tlb_lld_4k[NR_INFO];
-u16 __read_mostly tlb_lld_2m[NR_INFO];
-u16 __read_mostly tlb_lld_4m[NR_INFO];
-u16 __read_mostly tlb_lld_1g[NR_INFO];
+u16 __read_mostly tlb_lli_4k;
+u16 __read_mostly tlb_lli_2m;
+u16 __read_mostly tlb_lli_4m;
+u16 __read_mostly tlb_lld_4k;
+u16 __read_mostly tlb_lld_2m;
+u16 __read_mostly tlb_lld_4m;
+u16 __read_mostly tlb_lld_1g;
 
 static void cpu_detect_tlb(struct cpuinfo_x86 *c)
 {
@@ -860,12 +860,10 @@ static void cpu_detect_tlb(struct cpuinfo_x86 *c)
 		this_cpu->c_detect_tlb(c);
 
 	pr_info("Last level iTLB entries: 4KB %d, 2MB %d, 4MB %d\n",
-		tlb_lli_4k[ENTRIES], tlb_lli_2m[ENTRIES],
-		tlb_lli_4m[ENTRIES]);
+		tlb_lli_4k, tlb_lli_2m, tlb_lli_4m);
 
 	pr_info("Last level dTLB entries: 4KB %d, 2MB %d, 4MB %d, 1GB %d\n",
-		tlb_lld_4k[ENTRIES], tlb_lld_2m[ENTRIES],
-		tlb_lld_4m[ENTRIES], tlb_lld_1g[ENTRIES]);
+		tlb_lld_4k, tlb_lld_2m, tlb_lld_4m, tlb_lld_1g);
 }
 
 void get_cpu_vendor(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/cpu/hygon.c b/arch/x86/kernel/cpu/hygon.c
index c5191b06f9f2..6af4a4a90a52 100644
--- a/arch/x86/kernel/cpu/hygon.c
+++ b/arch/x86/kernel/cpu/hygon.c
@@ -240,26 +240,26 @@ static void cpu_detect_tlb_hygon(struct cpuinfo_x86 *c)
 
 	cpuid(0x80000006, &eax, &ebx, &ecx, &edx);
 
-	tlb_lld_4k[ENTRIES] = (ebx >> 16) & mask;
-	tlb_lli_4k[ENTRIES] = ebx & mask;
+	tlb_lld_4k = (ebx >> 16) & mask;
+	tlb_lli_4k = ebx & mask;
 
 	/* Handle DTLB 2M and 4M sizes, fall back to L1 if L2 is disabled */
 	if (!((eax >> 16) & mask))
-		tlb_lld_2m[ENTRIES] = (cpuid_eax(0x80000005) >> 16) & 0xff;
+		tlb_lld_2m = (cpuid_eax(0x80000005) >> 16) & 0xff;
 	else
-		tlb_lld_2m[ENTRIES] = (eax >> 16) & mask;
+		tlb_lld_2m = (eax >> 16) & mask;
 
 	/* a 4M entry uses two 2M entries */
-	tlb_lld_4m[ENTRIES] = tlb_lld_2m[ENTRIES] >> 1;
+	tlb_lld_4m = tlb_lld_2m >> 1;
 
 	/* Handle ITLB 2M and 4M sizes, fall back to L1 if L2 is disabled */
 	if (!(eax & mask)) {
 		cpuid(0x80000005, &eax, &ebx, &ecx, &edx);
-		tlb_lli_2m[ENTRIES] = eax & 0xff;
+		tlb_lli_2m = eax & 0xff;
 	} else
-		tlb_lli_2m[ENTRIES] = eax & mask;
+		tlb_lli_2m = eax & mask;
 
-	tlb_lli_4m[ENTRIES] = tlb_lli_2m[ENTRIES] >> 1;
+	tlb_lli_4m = tlb_lli_2m >> 1;
 }
 
 static const struct cpu_dev hygon_cpu_dev = {
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index e972c72e2b5d..905f39fce375 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -717,55 +717,55 @@ static void intel_tlb_lookup(const unsigned char desc)
 	entries = intel_tlb_table[k].entries;
 	switch (intel_tlb_table[k].tlb_type) {
 	case STLB_4K:
-		tlb_lli_4k[ENTRIES] = max(tlb_lli_4k[ENTRIES], entries);
-		tlb_lld_4k[ENTRIES] = max(tlb_lld_4k[ENTRIES], entries);
+		tlb_lli_4k = max(tlb_lli_4k, entries);
+		tlb_lld_4k = max(tlb_lld_4k, entries);
 		break;
 	case STLB_4K_2M:
-		tlb_lli_4k[ENTRIES] = max(tlb_lli_4k[ENTRIES], entries);
-		tlb_lld_4k[ENTRIES] = max(tlb_lld_4k[ENTRIES], entries);
-		tlb_lli_2m[ENTRIES] = max(tlb_lli_2m[ENTRIES], entries);
-		tlb_lld_2m[ENTRIES] = max(tlb_lld_2m[ENTRIES], entries);
-		tlb_lli_4m[ENTRIES] = max(tlb_lli_4m[ENTRIES], entries);
-		tlb_lld_4m[ENTRIES] = max(tlb_lld_4m[ENTRIES], entries);
+		tlb_lli_4k = max(tlb_lli_4k, entries);
+		tlb_lld_4k = max(tlb_lld_4k, entries);
+		tlb_lli_2m = max(tlb_lli_2m, entries);
+		tlb_lld_2m = max(tlb_lld_2m, entries);
+		tlb_lli_4m = max(tlb_lli_4m, entries);
+		tlb_lld_4m = max(tlb_lld_4m, entries);
 		break;
 	case TLB_INST_ALL:
-		tlb_lli_4k[ENTRIES] = max(tlb_lli_4k[ENTRIES], entries);
-		tlb_lli_2m[ENTRIES] = max(tlb_lli_2m[ENTRIES], entries);
-		tlb_lli_4m[ENTRIES] = max(tlb_lli_4m[ENTRIES], entries);
+		tlb_lli_4k = max(tlb_lli_4k, entries);
+		tlb_lli_2m = max(tlb_lli_2m, entries);
+		tlb_lli_4m = max(tlb_lli_4m, entries);
 		break;
 	case TLB_INST_4K:
-		tlb_lli_4k[ENTRIES] = max(tlb_lli_4k[ENTRIES], entries);
+		tlb_lli_4k = max(tlb_lli_4k, entries);
 		break;
 	case TLB_INST_4M:
-		tlb_lli_4m[ENTRIES] = max(tlb_lli_4m[ENTRIES], entries);
+		tlb_lli_4m = max(tlb_lli_4m, entries);
 		break;
 	case TLB_INST_2M_4M:
-		tlb_lli_2m[ENTRIES] = max(tlb_lli_2m[ENTRIES], entries);
-		tlb_lli_4m[ENTRIES] = max(tlb_lli_4m[ENTRIES], entries);
+		tlb_lli_2m = max(tlb_lli_2m, entries);
+		tlb_lli_4m = max(tlb_lli_4m, entries);
 		break;
 	case TLB_DATA_4K:
 	case TLB_DATA0_4K:
-		tlb_lld_4k[ENTRIES] = max(tlb_lld_4k[ENTRIES], entries);
+		tlb_lld_4k = max(tlb_lld_4k, entries);
 		break;
 	case TLB_DATA_4M:
 	case TLB_DATA0_4M:
-		tlb_lld_4m[ENTRIES] = max(tlb_lld_4m[ENTRIES], entries);
+		tlb_lld_4m = max(tlb_lld_4m, entries);
 		break;
 	case TLB_DATA_2M_4M:
 	case TLB_DATA0_2M_4M:
-		tlb_lld_2m[ENTRIES] = max(tlb_lld_2m[ENTRIES], entries);
-		tlb_lld_4m[ENTRIES] = max(tlb_lld_4m[ENTRIES], entries);
+		tlb_lld_2m = max(tlb_lld_2m, entries);
+		tlb_lld_4m = max(tlb_lld_4m, entries);
 		break;
 	case TLB_DATA_4K_4M:
-		tlb_lld_4k[ENTRIES] = max(tlb_lld_4k[ENTRIES], entries);
-		tlb_lld_4m[ENTRIES] = max(tlb_lld_4m[ENTRIES], entries);
+		tlb_lld_4k = max(tlb_lld_4k, entries);
+		tlb_lld_4m = max(tlb_lld_4m, entries);
 		break;
 	case TLB_DATA_1G_2M_4M:
-		tlb_lld_2m[ENTRIES] = max(tlb_lld_2m[ENTRIES], TLB_0x63_2M_4M_ENTRIES);
-		tlb_lld_4m[ENTRIES] = max(tlb_lld_4m[ENTRIES], TLB_0x63_2M_4M_ENTRIES);
+		tlb_lld_2m = max(tlb_lld_2m, TLB_0x63_2M_4M_ENTRIES);
+		tlb_lld_4m = max(tlb_lld_4m, TLB_0x63_2M_4M_ENTRIES);
 		fallthrough;
 	case TLB_DATA_1G:
-		tlb_lld_1g[ENTRIES] = max(tlb_lld_1g[ENTRIES], entries);
+		tlb_lld_1g = max(tlb_lld_1g, entries);
 		break;
 	}
 }
-- 
2.48.1



* [PATCH v1 08/40] x86/cpu: Get rid of smp_store_cpu_info() indirection
  2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
                   ` (6 preceding siblings ...)
  2025-03-04  8:51 ` [PATCH v1 07/40] x86/cpu: Simplify TLB entry count storage Ahmed S. Darwish
@ 2025-03-04  8:51 ` Ahmed S. Darwish
  2025-03-04  8:51 ` [PATCH v1 09/40] x86/cpu: Remove unused TLB strings Ahmed S. Darwish
                   ` (33 subsequent siblings)
  41 siblings, 0 replies; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-04  8:51 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar, Dave Hansen
  Cc: Thomas Gleixner, John Ogness, H. Peter Anvin, Andrew Cooper, x86,
	x86-cpuid, LKML, Ahmed S. Darwish

From: Thomas Gleixner <tglx@linutronix.de>

smp_store_cpu_info() is just a wrapper around identify_secondary_cpu()
without further value.

Move the extra bits from smp_store_cpu_info() into identify_secondary_cpu()
and remove the wrapper.
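
As an illustration only, the merged logic can be modelled in userspace
roughly as below.  All names ending in _sketch, the array size, and the
loops_per_jiffy seed value are hypothetical stand-ins; the real function
operates on struct cpuinfo_x86 and calls identify_cpu() where the
placeholder comment sits.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_SKETCH 4	/* hypothetical, for illustration only */

/* Hypothetical, heavily reduced stand-in for struct cpuinfo_x86 */
struct cpuinfo_sketch {
	unsigned int cpu_index;
	unsigned long loops_per_jiffy;
	bool initialized;
};

static struct cpuinfo_sketch boot_cpu_sketch = { .loops_per_jiffy = 1000 };
static struct cpuinfo_sketch cpu_data_sketch[NR_CPUS_SKETCH];

static void identify_secondary_cpu_sketch(unsigned int cpu)
{
	struct cpuinfo_sketch *c = &cpu_data_sketch[cpu];

	/* Seed from the boot CPU data only on the first bring-up */
	if (!c->initialized)
		*c = boot_cpu_sketch;
	c->cpu_index = cpu;

	/* ... the real function runs identify_cpu(c) and friends here ... */

	c->initialized = true;
}
```

In this model, a value stored in the per-CPU slot after the first
bring-up is not overwritten by the boot CPU copy on a later bring-up of
the same CPU, which is what the c->initialized guard is for.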

[darwi: Make it compile and fixup the xen/smp_pv.c instance]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
---
 arch/x86/include/asm/processor.h |  2 +-
 arch/x86/include/asm/smp.h       |  2 --
 arch/x86/kernel/cpu/common.c     | 11 +++++++++--
 arch/x86/kernel/smpboot.c        | 24 ++----------------------
 arch/x86/xen/smp_pv.c            |  2 +-
 5 files changed, 13 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 0ea227fa027c..d5d9a071cddc 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -229,7 +229,7 @@ static inline unsigned long long l1tf_pfn_limit(void)
 void init_cpu_devs(void);
 void get_cpu_vendor(struct cpuinfo_x86 *c);
 extern void early_cpu_init(void);
-extern void identify_secondary_cpu(struct cpuinfo_x86 *);
+extern void identify_secondary_cpu(unsigned int cpu);
 extern void print_cpu_info(struct cpuinfo_x86 *);
 void print_cpu_msr(struct cpuinfo_x86 *);
 
diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index ca073f40698f..820a90d2fb4a 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -119,8 +119,6 @@ void native_smp_send_reschedule(int cpu);
 void native_send_call_func_ipi(const struct cpumask *mask);
 void native_send_call_func_single_ipi(int cpu);
 
-void smp_store_cpu_info(int id);
-
 asmlinkage __visible void smp_reboot_interrupt(void);
 __visible void smp_reschedule_interrupt(struct pt_regs *regs);
 __visible void smp_call_function_interrupt(struct pt_regs *regs);
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 486395356faf..749fe02ef1f7 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1960,9 +1960,15 @@ static __init void identify_boot_cpu(void)
 	lkgs_init();
 }
 
-void identify_secondary_cpu(struct cpuinfo_x86 *c)
+void identify_secondary_cpu(unsigned int cpu)
 {
-	BUG_ON(c == &boot_cpu_data);
+	struct cpuinfo_x86 *c = &cpu_data(cpu);
+
+	/* Copy boot_cpu_data only on the first bringup */
+	if (!c->initialized)
+		*c = boot_cpu_data;
+	c->cpu_index = cpu;
+
 	identify_cpu(c);
 #ifdef CONFIG_X86_32
 	enable_sep_cpu();
@@ -1973,6 +1979,7 @@ void identify_secondary_cpu(struct cpuinfo_x86 *c)
 		update_gds_msr();
 
 	tsx_ap_init();
+	c->initialized = true;
 }
 
 void print_cpu_info(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index c10850ae6f09..e199465dc9e1 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -190,7 +190,7 @@ static void ap_starting(void)
 	apic_ap_setup();
 
 	/* Save the processor parameters. */
-	smp_store_cpu_info(cpuid);
+	identify_secondary_cpu(cpuid);
 
 	/*
 	 * The topology information must be up to date before
@@ -215,7 +215,7 @@ static void ap_calibrate_delay(void)
 {
 	/*
 	 * Calibrate the delay loop and update loops_per_jiffy in cpu_data.
-	 * smp_store_cpu_info() stored a value that is close but not as
+	 * identify_secondary_cpu() stored a value that is close but not as
 	 * accurate as the value just calculated.
 	 *
 	 * As this is invoked after the TSC synchronization check,
@@ -315,26 +315,6 @@ static void notrace start_secondary(void *unused)
 	cpu_startup_entry(CPUHP_AP_ONLINE_IDLE);
 }
 
-/*
- * The bootstrap kernel entry code has set these up. Save them for
- * a given CPU
- */
-void smp_store_cpu_info(int id)
-{
-	struct cpuinfo_x86 *c = &cpu_data(id);
-
-	/* Copy boot_cpu_data only on the first bringup */
-	if (!c->initialized)
-		*c = boot_cpu_data;
-	c->cpu_index = id;
-	/*
-	 * During boot time, CPU0 has this setup already. Save the info when
-	 * bringing up an AP.
-	 */
-	identify_secondary_cpu(c);
-	c->initialized = true;
-}
-
 static bool
 topology_same_node(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 {
diff --git a/arch/x86/xen/smp_pv.c b/arch/x86/xen/smp_pv.c
index 6863d3da7dec..688ff59318ae 100644
--- a/arch/x86/xen/smp_pv.c
+++ b/arch/x86/xen/smp_pv.c
@@ -70,7 +70,7 @@ static void cpu_bringup(void)
 		xen_enable_syscall();
 	}
 	cpu = smp_processor_id();
-	smp_store_cpu_info(cpu);
+	identify_secondary_cpu(cpu);
 	set_cpu_sibling_map(cpu);
 
 	speculative_store_bypass_ht_init();
-- 
2.48.1



* [PATCH v1 09/40] x86/cpu: Remove unused TLB strings
  2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
                   ` (7 preceding siblings ...)
  2025-03-04  8:51 ` [PATCH v1 08/40] x86/cpu: Get rid of smp_store_cpu_info() indirection Ahmed S. Darwish
@ 2025-03-04  8:51 ` Ahmed S. Darwish
  2025-03-04  8:51 ` [PATCH v1 10/40] x86/cpu: Remove leaf 0x2 parsing loop and add helpers Ahmed S. Darwish
                   ` (32 subsequent siblings)
  41 siblings, 0 replies; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-04  8:51 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar, Dave Hansen
  Cc: Thomas Gleixner, John Ogness, H. Peter Anvin, Andrew Cooper, x86,
	x86-cpuid, LKML, Ahmed S. Darwish

From: Thomas Gleixner <tglx@linutronix.de>

commit e0ba94f14f74 ("x86/tlb_info: get last level TLB entry number of
CPU") added the TLB table for parsing CPUID(0x2), including strings
describing its entries. The string field in the table was never used.

Convert it to a comment.
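
To make the per-entry saving concrete, here is a small userspace sketch.
The two struct names and tlb_entry_bytes_saved() are hypothetical; only
the field lists are taken from the old and new struct _tlb_table
definitions in this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Layout before the patch: every entry carries a 128-byte string */
struct tlb_entry_with_info {
	unsigned char descriptor;
	char tlb_type;
	unsigned int entries;
	char info[128];
};

/* Layout after the patch: the description lives in a source comment */
struct tlb_entry_no_info {
	unsigned char descriptor;
	char tlb_type;
	unsigned int entries;
};

/* Hypothetical helper: bytes saved per table entry by dropping info[] */
static size_t tlb_entry_bytes_saved(void)
{
	return sizeof(struct tlb_entry_with_info) -
	       sizeof(struct tlb_entry_no_info);
}
```

Since info[] is a fixed-size array, the saving applies to every table
entry, including the terminating one, regardless of how short the
string was.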

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
---
 arch/x86/kernel/cpu/cpu.h   |  8 ----
 arch/x86/kernel/cpu/intel.c | 80 ++++++++++++++++++++-----------------
 2 files changed, 43 insertions(+), 45 deletions(-)

diff --git a/arch/x86/kernel/cpu/cpu.h b/arch/x86/kernel/cpu/cpu.h
index 1beccefbaff9..51deb60a9d26 100644
--- a/arch/x86/kernel/cpu/cpu.h
+++ b/arch/x86/kernel/cpu/cpu.h
@@ -33,14 +33,6 @@ struct cpu_dev {
 #endif
 };
 
-struct _tlb_table {
-	unsigned char descriptor;
-	char tlb_type;
-	unsigned int entries;
-	/* unsigned int ways; */
-	char info[128];
-};
-
 #define cpu_dev_register(cpu_devX) \
 	static const struct cpu_dev *const __cpu_dev_##cpu_devX __used \
 	__section(".x86_cpu_dev.init") = \
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 905f39fce375..cfd492cf9c3b 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -657,44 +657,50 @@ static unsigned int intel_size_cache(struct cpuinfo_x86 *c, unsigned int size)
  */
 #define TLB_0x63_2M_4M_ENTRIES	32
 
+struct _tlb_table {
+	unsigned char descriptor;
+	char tlb_type;
+	unsigned int entries;
+};
+
 static const struct _tlb_table intel_tlb_table[] = {
-	{ 0x01, TLB_INST_4K,		32,	" TLB_INST 4 KByte pages, 4-way set associative" },
-	{ 0x02, TLB_INST_4M,		2,	" TLB_INST 4 MByte pages, full associative" },
-	{ 0x03, TLB_DATA_4K,		64,	" TLB_DATA 4 KByte pages, 4-way set associative" },
-	{ 0x04, TLB_DATA_4M,		8,	" TLB_DATA 4 MByte pages, 4-way set associative" },
-	{ 0x05, TLB_DATA_4M,		32,	" TLB_DATA 4 MByte pages, 4-way set associative" },
-	{ 0x0b, TLB_INST_4M,		4,	" TLB_INST 4 MByte pages, 4-way set associative" },
-	{ 0x4f, TLB_INST_4K,		32,	" TLB_INST 4 KByte pages" },
-	{ 0x50, TLB_INST_ALL,		64,	" TLB_INST 4 KByte and 2-MByte or 4-MByte pages" },
-	{ 0x51, TLB_INST_ALL,		128,	" TLB_INST 4 KByte and 2-MByte or 4-MByte pages" },
-	{ 0x52, TLB_INST_ALL,		256,	" TLB_INST 4 KByte and 2-MByte or 4-MByte pages" },
-	{ 0x55, TLB_INST_2M_4M,		7,	" TLB_INST 2-MByte or 4-MByte pages, fully associative" },
-	{ 0x56, TLB_DATA0_4M,		16,	" TLB_DATA0 4 MByte pages, 4-way set associative" },
-	{ 0x57, TLB_DATA0_4K,		16,	" TLB_DATA0 4 KByte pages, 4-way associative" },
-	{ 0x59, TLB_DATA0_4K,		16,	" TLB_DATA0 4 KByte pages, fully associative" },
-	{ 0x5a, TLB_DATA0_2M_4M,	32,	" TLB_DATA0 2-MByte or 4 MByte pages, 4-way set associative" },
-	{ 0x5b, TLB_DATA_4K_4M,		64,	" TLB_DATA 4 KByte and 4 MByte pages" },
-	{ 0x5c, TLB_DATA_4K_4M,		128,	" TLB_DATA 4 KByte and 4 MByte pages" },
-	{ 0x5d, TLB_DATA_4K_4M,		256,	" TLB_DATA 4 KByte and 4 MByte pages" },
-	{ 0x61, TLB_INST_4K,		48,	" TLB_INST 4 KByte pages, full associative" },
-	{ 0x63, TLB_DATA_1G_2M_4M,	4,	" TLB_DATA 1 GByte pages, 4-way set associative"
-						" (plus 32 entries TLB_DATA 2 MByte or 4 MByte pages, not encoded here)" },
-	{ 0x6b, TLB_DATA_4K,		256,	" TLB_DATA 4 KByte pages, 8-way associative" },
-	{ 0x6c, TLB_DATA_2M_4M,		128,	" TLB_DATA 2 MByte or 4 MByte pages, 8-way associative" },
-	{ 0x6d, TLB_DATA_1G,		16,	" TLB_DATA 1 GByte pages, fully associative" },
-	{ 0x76, TLB_INST_2M_4M,		8,	" TLB_INST 2-MByte or 4-MByte pages, fully associative" },
-	{ 0xb0, TLB_INST_4K,		128,	" TLB_INST 4 KByte pages, 4-way set associative" },
-	{ 0xb1, TLB_INST_2M_4M,		4,	" TLB_INST 2M pages, 4-way, 8 entries or 4M pages, 4-way entries" },
-	{ 0xb2, TLB_INST_4K,		64,	" TLB_INST 4KByte pages, 4-way set associative" },
-	{ 0xb3, TLB_DATA_4K,		128,	" TLB_DATA 4 KByte pages, 4-way set associative" },
-	{ 0xb4, TLB_DATA_4K,		256,	" TLB_DATA 4 KByte pages, 4-way associative" },
-	{ 0xb5, TLB_INST_4K,		64,	" TLB_INST 4 KByte pages, 8-way set associative" },
-	{ 0xb6, TLB_INST_4K,		128,	" TLB_INST 4 KByte pages, 8-way set associative" },
-	{ 0xba, TLB_DATA_4K,		64,	" TLB_DATA 4 KByte pages, 4-way associative" },
-	{ 0xc0, TLB_DATA_4K_4M,		8,	" TLB_DATA 4 KByte and 4 MByte pages, 4-way associative" },
-	{ 0xc1, STLB_4K_2M,		1024,	" STLB 4 KByte and 2 MByte pages, 8-way associative" },
-	{ 0xc2, TLB_DATA_2M_4M,		16,	" TLB_DATA 2 MByte/4MByte pages, 4-way associative" },
-	{ 0xca, STLB_4K,		512,	" STLB 4 KByte pages, 4-way associative" },
+	{ 0x01, TLB_INST_4K,		32},	/* TLB_INST 4 KByte pages, 4-way set associative */
+	{ 0x02, TLB_INST_4M,		2},	/* TLB_INST 4 MByte pages, full associative */
+	{ 0x03, TLB_DATA_4K,		64},	/* TLB_DATA 4 KByte pages, 4-way set associative */
+	{ 0x04, TLB_DATA_4M,		8},	/* TLB_DATA 4 MByte pages, 4-way set associative */
+	{ 0x05, TLB_DATA_4M,		32},	/* TLB_DATA 4 MByte pages, 4-way set associative */
+	{ 0x0b, TLB_INST_4M,		4},	/* TLB_INST 4 MByte pages, 4-way set associative */
+	{ 0x4f, TLB_INST_4K,		32},	/* TLB_INST 4 KByte pages */
+	{ 0x50, TLB_INST_ALL,		64},	/* TLB_INST 4 KByte and 2-MByte or 4-MByte pages */
+	{ 0x51, TLB_INST_ALL,		128},	/* TLB_INST 4 KByte and 2-MByte or 4-MByte pages */
+	{ 0x52, TLB_INST_ALL,		256},	/* TLB_INST 4 KByte and 2-MByte or 4-MByte pages */
+	{ 0x55, TLB_INST_2M_4M,		7},	/* TLB_INST 2-MByte or 4-MByte pages, fully associative */
+	{ 0x56, TLB_DATA0_4M,		16},	/* TLB_DATA0 4 MByte pages, 4-way set associative */
+	{ 0x57, TLB_DATA0_4K,		16},	/* TLB_DATA0 4 KByte pages, 4-way associative */
+	{ 0x59, TLB_DATA0_4K,		16},	/* TLB_DATA0 4 KByte pages, fully associative */
+	{ 0x5a, TLB_DATA0_2M_4M,	32},	/* TLB_DATA0 2-MByte or 4 MByte pages, 4-way set associative */
+	{ 0x5b, TLB_DATA_4K_4M,		64},	/* TLB_DATA 4 KByte and 4 MByte pages */
+	{ 0x5c, TLB_DATA_4K_4M,		128},	/* TLB_DATA 4 KByte and 4 MByte pages */
+	{ 0x5d, TLB_DATA_4K_4M,		256},	/* TLB_DATA 4 KByte and 4 MByte pages */
+	{ 0x61, TLB_INST_4K,		48},	/* TLB_INST 4 KByte pages, full associative */
+	{ 0x63, TLB_DATA_1G_2M_4M,	4},	/* TLB_DATA 1 GByte pages, 4-way set associative
+						 * (plus 32 entries TLB_DATA 2 MByte or 4 MByte pages, not encoded here) */
+	{ 0x6b, TLB_DATA_4K,		256},	/* TLB_DATA 4 KByte pages, 8-way associative */
+	{ 0x6c, TLB_DATA_2M_4M,		128},	/* TLB_DATA 2 MByte or 4 MByte pages, 8-way associative */
+	{ 0x6d, TLB_DATA_1G,		16},	/* TLB_DATA 1 GByte pages, fully associative */
+	{ 0x76, TLB_INST_2M_4M,		8},	/* TLB_INST 2-MByte or 4-MByte pages, fully associative */
+	{ 0xb0, TLB_INST_4K,		128},	/* TLB_INST 4 KByte pages, 4-way set associative */
+	{ 0xb1, TLB_INST_2M_4M,		4},	/* TLB_INST 2M pages, 4-way, 8 entries or 4M pages, 4-way entries */
+	{ 0xb2, TLB_INST_4K,		64},	/* TLB_INST 4KByte pages, 4-way set associative */
+	{ 0xb3, TLB_DATA_4K,		128},	/* TLB_DATA 4 KByte pages, 4-way set associative */
+	{ 0xb4, TLB_DATA_4K,		256},	/* TLB_DATA 4 KByte pages, 4-way associative */
+	{ 0xb5, TLB_INST_4K,		64},	/* TLB_INST 4 KByte pages, 8-way set associative */
+	{ 0xb6, TLB_INST_4K,		128},	/* TLB_INST 4 KByte pages, 8-way set associative */
+	{ 0xba, TLB_DATA_4K,		64},	/* TLB_DATA 4 KByte pages, 4-way associative */
+	{ 0xc0, TLB_DATA_4K_4M,		8},	/* TLB_DATA 4 KByte and 4 MByte pages, 4-way associative */
+	{ 0xc1, STLB_4K_2M,		1024},	/* STLB 4 KByte and 2 MByte pages, 8-way associative */
+	{ 0xc2, TLB_DATA_2M_4M,		16},	/* TLB_DATA 2 MByte/4MByte pages, 4-way associative */
+	{ 0xca, STLB_4K,		512},	/* STLB 4 KByte pages, 4-way associative */
 	{ 0x00, 0, 0 }
 };
 
-- 
2.48.1



* [PATCH v1 10/40] x86/cpu: Remove leaf 0x2 parsing loop and add helpers
  2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
                   ` (8 preceding siblings ...)
  2025-03-04  8:51 ` [PATCH v1 09/40] x86/cpu: Remove unused TLB strings Ahmed S. Darwish
@ 2025-03-04  8:51 ` Ahmed S. Darwish
  2025-03-04  9:26   ` Ingo Molnar
  2025-03-04  8:51 ` [PATCH v1 11/40] x86/cacheinfo: Remove the P4 trace leftovers for real Ahmed S. Darwish
                   ` (31 subsequent siblings)
  41 siblings, 1 reply; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-04  8:51 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar, Dave Hansen
  Cc: Thomas Gleixner, John Ogness, H. Peter Anvin, Andrew Cooper, x86,
	x86-cpuid, LKML, Ahmed S. Darwish

Leaf 0x2 output includes a "query count" byte, which was supposed to
specify the number of repeated cpuid leaf 0x2 subleaf 0 queries needed
to extract all of the hardware's cache and TLB descriptors.

Per current Intel manuals, all CPUs supporting this leaf "will always"
return an iteration count of 1.

Remove the leaf 0x2 query count loop and just query the hardware once.
Parse the output with C99 bitfields instead of ugly bitwise operations.
Provide leaf 0x2 parsing helpers at asm/cpuid/types.h to do all that.

Use the new leaf 0x2 parsing helpers at x86/cpu intel.c.  Further
commits will also use them for x86/cacheinfo.
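
For readers who want to try the sanitization rules outside the kernel,
a rough userspace sketch follows.  It works on already-captured register
values instead of executing CPUID, tests bit 31 with a mask rather than
a bitfield, and assumes a little-endian host so that desc[0] aliases AL.
The names union leaf2_regs, leaf2_sanitize() and leaf2_count_descs() are
hypothetical and not part of this patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-in for union leaf_0x2_regs (little-endian only) */
union leaf2_regs {
	uint32_t regv[4];	/* EAX, EBX, ECX, EDX */
	uint8_t  desc[16];	/* same 16 bytes as one-byte descriptors */
};

/* Hypothetical helper: apply both sanitization rules to captured output */
static void leaf2_sanitize(union leaf2_regs *regs)
{
	/* Rule 1: the query count byte (AL) must be 0x01, else drop all */
	if (regs->desc[0] != 0x01) {
		memset(regs, 0, sizeof(*regs));
		return;
	}

	/* Rule 2: a register with bit 31 set has no valid descriptors */
	for (int i = 0; i < 4; i++) {
		if (regs->regv[i] & UINT32_C(0x80000000))
			regs->regv[i] = 0;
	}
}

/* Hypothetical helper: count non-NULL descriptors, skipping byte 0 */
static int leaf2_count_descs(const union leaf2_regs *regs)
{
	int n = 0;

	for (int i = 1; i < 16; i++) {
		if (regs->desc[i] != 0x00)
			n++;
	}
	return n;
}
```

Feeding it EAX=0x00665a01 with an EBX that has bit 31 set leaves only
the EAX descriptors 0x5a and 0x66 visible, while any EAX whose low byte
is not 0x01 yields no descriptors at all.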

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
---
 arch/x86/include/asm/cpuid/types.h | 79 ++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/intel.c        | 24 +++------
 2 files changed, 85 insertions(+), 18 deletions(-)
 create mode 100644 arch/x86/include/asm/cpuid/types.h

diff --git a/arch/x86/include/asm/cpuid/types.h b/arch/x86/include/asm/cpuid/types.h
new file mode 100644
index 000000000000..50f6046a57b9
--- /dev/null
+++ b/arch/x86/include/asm/cpuid/types.h
@@ -0,0 +1,79 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_CPUID_TYPES_H
+#define _ASM_X86_CPUID_TYPES_H
+
+#include <linux/types.h>
+
+#include <asm/cpuid.h>
+
+/*
+ * CPUID(0x2) parsing helpers
+ * Check for_each_leaf_0x2_desc() documentation.
+ */
+
+struct leaf_0x2_reg {
+		u32		: 31,
+			invalid	: 1;
+};
+
+union leaf_0x2_regs {
+	struct leaf_0x2_reg	reg[4];
+	u32			regv[4];
+	u8			desc[16];
+};
+
+/**
+ * get_leaf_0x2_regs() - Return sanitized leaf 0x2 register output
+ * @regs:	Output parameter
+ *
+ * Get leaf 0x2 register output and store it in @regs.  Invalid byte
+ * descriptors returned by the hardware will be force set to zero (the
+ * NULL cache/TLB descriptor) before returning them to the caller.
+ */
+static inline void get_leaf_0x2_regs(union leaf_0x2_regs *regs)
+{
+	cpuid_leaf(0x2, regs);
+
+	/*
+	 * All Intel CPUs must report an iteration count of 1.  In case
+	 * of bogus hardware, treat all returned descriptors as NULL.
+	 */
+	if (regs->desc[0] != 0x01) {
+		for (int i = 0; i < 4; i++)
+			regs->regv[i] = 0;
+		return;
+	}
+
+	/*
+	 * The most significant bit (MSB) of each register must be clear.
+	 * If a register is invalid, replace its descriptors with NULL.
+	 */
+	for (int i = 0; i < 4; i++) {
+		if (regs->reg[i].invalid)
+			regs->regv[i] = 0;
+	}
+}
+
+/**
+ * for_each_leaf_0x2_desc() - Iterator for leaf 0x2 descriptors
+ * @regs:	Leaf 0x2 register output, as returned by get_leaf_0x2_regs()
+ * @desc:	Pointer to the returned descriptor for each iteration
+ *
+ * Loop over the 1-byte descriptors in the passed leaf 0x2 output registers
+ * @regs.  Provide each descriptor through @desc.
+ *
+ * Sample usage::
+ *
+ *	union leaf_0x2_regs regs;
+ *	u8 *desc;
+ *
+ *	get_leaf_0x2_regs(&regs);
+ *	for_each_leaf_0x2_desc(regs, desc) {
+ *		// Handle *desc value
+ *	}
+ */
+#define for_each_leaf_0x2_desc(regs, desc)				\
+	/* Skip the first byte as it is not a descriptor */		\
+	for (desc = &(regs).desc[1]; desc < &(regs).desc[16]; desc++)
+
+#endif /* _ASM_X86_CPUID_TYPES_H */
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index cfd492cf9c3b..57e170ffe3ba 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -15,6 +15,7 @@
 #include <asm/cpu_device_id.h>
 #include <asm/cpufeature.h>
 #include <asm/cpu.h>
+#include <asm/cpuid/types.h>
 #include <asm/hwcap2.h>
 #include <asm/intel-family.h>
 #include <asm/microcode.h>
@@ -778,28 +779,15 @@ static void intel_tlb_lookup(const unsigned char desc)
 
 static void intel_detect_tlb(struct cpuinfo_x86 *c)
 {
-	int i, j, n;
-	unsigned int regs[4];
-	unsigned char *desc = (unsigned char *)regs;
+	union leaf_0x2_regs regs;
+	u8 *desc;
 
 	if (c->cpuid_level < 2)
 		return;
 
-	/* Number of times to iterate */
-	n = cpuid_eax(2) & 0xFF;
-
-	for (i = 0 ; i < n ; i++) {
-		cpuid(2, &regs[0], &regs[1], &regs[2], &regs[3]);
-
-		/* If bit 31 is set, this is an unknown format */
-		for (j = 0 ; j < 4 ; j++)
-			if (regs[j] & (1 << 31))
-				regs[j] = 0;
-
-		/* Byte 0 is level count, not a descriptor */
-		for (j = 1 ; j < 16 ; j++)
-			intel_tlb_lookup(desc[j]);
-	}
+	get_leaf_0x2_regs(&regs);
+	for_each_leaf_0x2_desc(regs, desc)
+		intel_tlb_lookup(*desc);
 }
 
 static const struct cpu_dev intel_cpu_dev = {
-- 
2.48.1



* [PATCH v1 11/40] x86/cacheinfo: Remove the P4 trace leftovers for real
  2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
                   ` (9 preceding siblings ...)
  2025-03-04  8:51 ` [PATCH v1 10/40] x86/cpu: Remove leaf 0x2 parsing loop and add helpers Ahmed S. Darwish
@ 2025-03-04  8:51 ` Ahmed S. Darwish
  2025-03-04  8:51 ` [PATCH v1 12/40] x86/cacheinfo: Remove unnecessary headers and reorder the rest Ahmed S. Darwish
                   ` (30 subsequent siblings)
  41 siblings, 0 replies; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-04  8:51 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar, Dave Hansen
  Cc: Thomas Gleixner, John Ogness, H. Peter Anvin, Andrew Cooper, x86,
	x86-cpuid, LKML, Ahmed S. Darwish

From: Thomas Gleixner <tglx@linutronix.de>

commit 851026a2bf54 ("x86/cacheinfo: Remove unused trace variable") removed
the switch case for LVL_TRACE but did not get rid of the surrounding gunk.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
---
 arch/x86/kernel/cpu/cacheinfo.c | 19 +++----------------
 1 file changed, 3 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index a6c6bccfa8b8..eccffe2ea06c 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -31,7 +31,6 @@
 #define LVL_1_DATA	2
 #define LVL_2		3
 #define LVL_3		4
-#define LVL_TRACE	5
 
 /* Shared last level cache maps */
 DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_llc_shared_map);
@@ -96,10 +95,6 @@ static const struct _cache_table cache_table[] =
 	{ 0x66, LVL_1_DATA, 8 },	/* 4-way set assoc, sectored cache, 64 byte line size */
 	{ 0x67, LVL_1_DATA, 16 },	/* 4-way set assoc, sectored cache, 64 byte line size */
 	{ 0x68, LVL_1_DATA, 32 },	/* 4-way set assoc, sectored cache, 64 byte line size */
-	{ 0x70, LVL_TRACE,  12 },	/* 8-way set assoc */
-	{ 0x71, LVL_TRACE,  16 },	/* 8-way set assoc */
-	{ 0x72, LVL_TRACE,  32 },	/* 8-way set assoc */
-	{ 0x73, LVL_TRACE,  64 },	/* 8-way set assoc */
 	{ 0x78, LVL_2,      MB(1) },	/* 4-way set assoc, 64 byte line size */
 	{ 0x79, LVL_2,      128 },	/* 8-way set assoc, sectored cache, 64 byte line size */
 	{ 0x7a, LVL_2,      256 },	/* 8-way set assoc, sectored cache, 64 byte line size */
@@ -787,19 +782,13 @@ void init_intel_cacheinfo(struct cpuinfo_x86 *c)
 			}
 		}
 	}
-	/*
-	 * Don't use cpuid2 if cpuid4 is supported. For P4, we use cpuid2 for
-	 * trace cache
-	 */
-	if ((!ci->num_leaves || c->x86 == 15) && c->cpuid_level > 1) {
+
+	/* Don't use CPUID(2) if CPUID(4) is supported. */
+	if (!ci->num_leaves && c->cpuid_level > 1) {
 		/* supports eax=2  call */
 		int j, n;
 		unsigned int regs[4];
 		unsigned char *dp = (unsigned char *)regs;
-		int only_trace = 0;
-
-		if (ci->num_leaves && c->x86 == 15)
-			only_trace = 1;
 
 		/* Number of times to iterate */
 		n = cpuid_eax(2) & 0xFF;
@@ -820,8 +809,6 @@ void init_intel_cacheinfo(struct cpuinfo_x86 *c)
 				/* look up this descriptor in the table */
 				while (cache_table[k].descriptor != 0) {
 					if (cache_table[k].descriptor == des) {
-						if (only_trace && cache_table[k].cache_type != LVL_TRACE)
-							break;
 						switch (cache_table[k].cache_type) {
 						case LVL_1_INST:
 							l1i += cache_table[k].size;
-- 
2.48.1



* [PATCH v1 12/40] x86/cacheinfo: Remove unnecessary headers and reorder the rest
  2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
                   ` (10 preceding siblings ...)
  2025-03-04  8:51 ` [PATCH v1 11/40] x86/cacheinfo: Remove the P4 trace leftovers for real Ahmed S. Darwish
@ 2025-03-04  8:51 ` Ahmed S. Darwish
  2025-03-04  8:51 ` [PATCH v1 13/40] x86/cacheinfo: Use cpuid leaf 0x2 parsing helpers Ahmed S. Darwish
                   ` (29 subsequent siblings)
  41 siblings, 0 replies; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-04  8:51 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar, Dave Hansen
  Cc: Thomas Gleixner, John Ogness, H. Peter Anvin, Andrew Cooper, x86,
	x86-cpuid, LKML, Ahmed S. Darwish

Remove the headers at cacheinfo.c that are no longer required.

Alphabetically reorder what remains since more headers will be included
in further commits.

Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
---
 arch/x86/kernel/cpu/cacheinfo.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index eccffe2ea06c..b3a520959b51 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -8,21 +8,19 @@
  *	Andi Kleen / Andreas Herrmann	: CPUID4 emulation on AMD.
  */
 
-#include <linux/slab.h>
 #include <linux/cacheinfo.h>
+#include <linux/capability.h>
 #include <linux/cpu.h>
 #include <linux/cpuhotplug.h>
-#include <linux/sched.h>
-#include <linux/capability.h>
-#include <linux/sysfs.h>
 #include <linux/pci.h>
 #include <linux/stop_machine.h>
+#include <linux/sysfs.h>
 
-#include <asm/cpufeature.h>
-#include <asm/cacheinfo.h>
 #include <asm/amd_nb.h>
-#include <asm/smp.h>
+#include <asm/cacheinfo.h>
+#include <asm/cpufeature.h>
 #include <asm/mtrr.h>
+#include <asm/smp.h>
 #include <asm/tlbflush.h>
 
 #include "cpu.h"
-- 
2.48.1



* [PATCH v1 13/40] x86/cacheinfo: Use cpuid leaf 0x2 parsing helpers
  2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
                   ` (11 preceding siblings ...)
  2025-03-04  8:51 ` [PATCH v1 12/40] x86/cacheinfo: Remove unnecessary headers and reorder the rest Ahmed S. Darwish
@ 2025-03-04  8:51 ` Ahmed S. Darwish
  2025-03-04  8:51 ` [PATCH v1 14/40] x86/cacheinfo: Refactor leaf 0x2 cache descriptor lookup Ahmed S. Darwish
                   ` (28 subsequent siblings)
  41 siblings, 0 replies; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-04  8:51 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar, Dave Hansen
  Cc: Thomas Gleixner, John Ogness, H. Peter Anvin, Andrew Cooper, x86,
	x86-cpuid, LKML, Ahmed S. Darwish

Use the cpuid leaf 0x2 parsing helpers added in previous commits, which
query the cpuid leaf just once.

Note, this also shares the leaf 0x2 parsing logic with x86/cpu intel.c.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
---
 arch/x86/kernel/cpu/cacheinfo.c | 66 +++++++++++++--------------------
 1 file changed, 26 insertions(+), 40 deletions(-)

diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index b3a520959b51..ebd72016e7a2 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -19,6 +19,7 @@
 #include <asm/amd_nb.h>
 #include <asm/cacheinfo.h>
 #include <asm/cpufeature.h>
+#include <asm/cpuid/types.h>
 #include <asm/mtrr.h>
 #include <asm/smp.h>
 #include <asm/tlbflush.h>
@@ -783,50 +784,35 @@ void init_intel_cacheinfo(struct cpuinfo_x86 *c)
 
 	/* Don't use CPUID(2) if CPUID(4) is supported. */
 	if (!ci->num_leaves && c->cpuid_level > 1) {
-		/* supports eax=2  call */
-		int j, n;
-		unsigned int regs[4];
-		unsigned char *dp = (unsigned char *)regs;
-
-		/* Number of times to iterate */
-		n = cpuid_eax(2) & 0xFF;
-
-		for (i = 0 ; i < n ; i++) {
-			cpuid(2, &regs[0], &regs[1], &regs[2], &regs[3]);
-
-			/* If bit 31 is set, this is an unknown format */
-			for (j = 0 ; j < 4 ; j++)
-				if (regs[j] & (1 << 31))
-					regs[j] = 0;
-
-			/* Byte 0 is level count, not a descriptor */
-			for (j = 1 ; j < 16 ; j++) {
-				unsigned char des = dp[j];
-				unsigned char k = 0;
-
-				/* look up this descriptor in the table */
-				while (cache_table[k].descriptor != 0) {
-					if (cache_table[k].descriptor == des) {
-						switch (cache_table[k].cache_type) {
-						case LVL_1_INST:
-							l1i += cache_table[k].size;
-							break;
-						case LVL_1_DATA:
-							l1d += cache_table[k].size;
-							break;
-						case LVL_2:
-							l2 += cache_table[k].size;
-							break;
-						case LVL_3:
-							l3 += cache_table[k].size;
-							break;
-						}
-
+		union leaf_0x2_regs regs;
+		u8 *desc;
+
+		get_leaf_0x2_regs(&regs);
+		for_each_leaf_0x2_desc(regs, desc) {
+			unsigned char k = 0;
+
+			/* look up this descriptor in the table */
+			while (cache_table[k].descriptor != 0) {
+				if (cache_table[k].descriptor == *desc) {
+					switch (cache_table[k].cache_type) {
+					case LVL_1_INST:
+						l1i += cache_table[k].size;
+						break;
+					case LVL_1_DATA:
+						l1d += cache_table[k].size;
+						break;
+					case LVL_2:
+						l2 += cache_table[k].size;
+						break;
+					case LVL_3:
+						l3 += cache_table[k].size;
 						break;
 					}
 
-					k++;
+					break;
 				}
+
+					k++;
 			}
 		}
 	}
-- 
2.48.1



* [PATCH v1 14/40] x86/cacheinfo: Refactor leaf 0x2 cache descriptor lookup
  2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
                   ` (12 preceding siblings ...)
  2025-03-04  8:51 ` [PATCH v1 13/40] x86/cacheinfo: Use cpuid leaf 0x2 parsing helpers Ahmed S. Darwish
@ 2025-03-04  8:51 ` Ahmed S. Darwish
  2025-03-04  8:51 ` [PATCH v1 15/40] x86/cacheinfo: Properly name amd_cpuid4()'s first parameter Ahmed S. Darwish
                   ` (27 subsequent siblings)
  41 siblings, 0 replies; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-04  8:51 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar, Dave Hansen
  Cc: Thomas Gleixner, John Ogness, H. Peter Anvin, Andrew Cooper, x86,
	x86-cpuid, LKML, Ahmed S. Darwish

From: Thomas Gleixner <tglx@linutronix.de>

Extract the cache descriptor lookup logic out of the leaf 0x2 parsing
code and into a dedicated function.  This disentangles the lookup from
the deeply nested leaf 0x2 parsing loop.

Remove the cache table's terminating entry, since the ARRAY_SIZE()-based
lookup no longer depends on it.

[darwi: Move refactoring logic into this separate commit + commit log.
	Remove the cache table termination entry.]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
---
 arch/x86/kernel/cpu/cacheinfo.c | 45 +++++++++++++++------------------
 1 file changed, 20 insertions(+), 25 deletions(-)
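
As a rough illustration of the commit log above, a minimal userspace
sketch of a lookup bounded by ARRAY_SIZE(), which is why the table needs no
terminating entry.  The struct layout, the size_kb field and the three
sample rows are assumptions made for this example; they do not reproduce
the kernel's struct _cache_table or its full table.

```c
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

enum cache_level { LVL_1_INST = 1, LVL_1_DATA, LVL_2, LVL_3 };

/* Hypothetical stand-in for the kernel's table entry type */
struct cache_desc {
	uint8_t  descriptor;
	uint8_t  cache_type;
	uint16_t size_kb;
};

/* Sample rows only; note the absence of a { 0x00, 0, 0 } terminator */
static const struct cache_desc cache_table[] = {
	{ 0x06, LVL_1_INST,    8 },
	{ 0x2c, LVL_1_DATA,   32 },
	{ 0x7d, LVL_2,      2048 },
};

/* Return the matching entry, or NULL if the descriptor is unknown */
const struct cache_desc *cache_table_get(uint8_t desc)
{
	for (size_t i = 0; i < ARRAY_SIZE(cache_table); i++) {
		if (cache_table[i].descriptor == desc)
			return &cache_table[i];
	}

	return NULL;
}
```

Because the loop bound comes from the array itself, an unknown descriptor
(including 0x00) simply yields NULL, and the caller can skip it.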

diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index ebd72016e7a2..3be7ea8444ec 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -123,7 +123,6 @@ static const struct _cache_table cache_table[] =
 	{ 0xea, LVL_3,      MB(12) },	/* 24-way set assoc, 64 byte line size */
 	{ 0xeb, LVL_3,      MB(18) },	/* 24-way set assoc, 64 byte line size */
 	{ 0xec, LVL_3,      MB(24) },	/* 24-way set assoc, 64 byte line size */
-	{ 0x00, 0, 0}
 };
 
 
@@ -728,6 +727,16 @@ void init_hygon_cacheinfo(struct cpuinfo_x86 *c)
 	ci->num_leaves = find_num_cache_leaves(c);
 }
 
+static const struct _cache_table *cache_table_get(u8 desc)
+{
+	for (int i = 0; i < ARRAY_SIZE(cache_table); i++) {
+		if (cache_table[i].descriptor == desc)
+			return &cache_table[i];
+	}
+
+	return NULL;
+}
+
 void init_intel_cacheinfo(struct cpuinfo_x86 *c)
 {
 	/* Cache sizes */
@@ -784,35 +793,21 @@ void init_intel_cacheinfo(struct cpuinfo_x86 *c)
 
 	/* Don't use CPUID(2) if CPUID(4) is supported. */
 	if (!ci->num_leaves && c->cpuid_level > 1) {
+		const struct _cache_table *entry;
 		union leaf_0x2_regs regs;
 		u8 *desc;
 
 		get_leaf_0x2_regs(&regs);
 		for_each_leaf_0x2_desc(regs, desc) {
-			unsigned char k = 0;
-
-			/* look up this descriptor in the table */
-			while (cache_table[k].descriptor != 0) {
-				if (cache_table[k].descriptor == *desc) {
-					switch (cache_table[k].cache_type) {
-					case LVL_1_INST:
-						l1i += cache_table[k].size;
-						break;
-					case LVL_1_DATA:
-						l1d += cache_table[k].size;
-						break;
-					case LVL_2:
-						l2 += cache_table[k].size;
-						break;
-					case LVL_3:
-						l3 += cache_table[k].size;
-						break;
-					}
-
-					break;
-				}
-
-					k++;
+			entry = cache_table_get(*desc);
+			if (!entry)
+				continue;
+
+			switch (entry->cache_type) {
+			case LVL_1_INST: l1i += entry->size; break;
+			case LVL_1_DATA: l1d += entry->size; break;
+			case LVL_2:	 l2  += entry->size; break;
+			case LVL_3:	 l3  += entry->size; break;
 			}
 		}
 	}
-- 
2.48.1



* [PATCH v1 15/40] x86/cacheinfo: Properly name amd_cpuid4()'s first parameter
  2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
                   ` (13 preceding siblings ...)
  2025-03-04  8:51 ` [PATCH v1 14/40] x86/cacheinfo: Refactor leaf 0x2 cache descriptor lookup Ahmed S. Darwish
@ 2025-03-04  8:51 ` Ahmed S. Darwish
  2025-03-04  8:51 ` [PATCH v1 16/40] x86/cacheinfo: Use proper name for cacheinfo instances Ahmed S. Darwish
                   ` (26 subsequent siblings)
  41 siblings, 0 replies; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-04  8:51 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar, Dave Hansen
  Cc: Thomas Gleixner, John Ogness, H. Peter Anvin, Andrew Cooper, x86,
	x86-cpuid, LKML, Ahmed S. Darwish

From: Thomas Gleixner <tglx@linutronix.de>

amd_cpuid4()'s first parameter, "leaf", is not a cpuid leaf as the name
implies.  Rather, it's an index emulating CPUID(4)'s subleaf semantics;
i.e. an ID for the cache object currently enumerated.  Rename that
parameter to "index".

Apply minor coding style fixes to the rest of the function as well.

[darwi: Move into a separate commit and write commit log.
	Use "index" instead of "subleaf" for amd_cpuid4()'s first param,
	as that's the name typically used throughout cacheinfo.c.]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
---
 arch/x86/kernel/cpu/cacheinfo.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)
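
To make the "index, not leaf" point concrete, a small userspace sketch of
how such an index can select a cache object through lookup tables, in the
spirit of the types[index] and levels[index] accesses in the diff below.
The table contents, the enum and the helper name are assumptions for
illustration (the only ordering taken from the diff is that index 1 is the
L1 instruction cache); they are not copied from amd_cpuid4().

```c
#include <stddef.h>

/* Hypothetical cache type codes, named after the CTYPE_* style */
enum sketch_ctype {
	SKETCH_CTYPE_NULL    = 0,
	SKETCH_CTYPE_DATA    = 1,
	SKETCH_CTYPE_INST    = 2,
	SKETCH_CTYPE_UNIFIED = 3,
};

struct cache_ident {
	enum sketch_ctype type;
	unsigned int level;
};

/* Assumed mapping: 0 = L1d, 1 = L1i, 2 = L2, 3 = L3 */
static const struct cache_ident ident_by_index[] = {
	{ SKETCH_CTYPE_DATA,    1 },
	{ SKETCH_CTYPE_INST,    1 },
	{ SKETCH_CTYPE_UNIFIED, 2 },
	{ SKETCH_CTYPE_UNIFIED, 3 },
};

/* Fill *out for a valid index and return 0; return -1 otherwise */
int amd_index_to_ident(int index, struct cache_ident *out)
{
	size_t count = sizeof(ident_by_index) / sizeof(ident_by_index[0]);

	if (index < 0 || (size_t)index >= count)
		return -1;

	*out = ident_by_index[index];
	return 0;
}
```

The parameter enumerates cache objects, much like a CPUID(4) subleaf does,
so naming it "leaf" would suggest a CPUID leaf number that it never holds.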

diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index 3be7ea8444ec..24a7503f37e2 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -233,12 +233,10 @@ static const enum cache_type cache_type_map[] = {
 };
 
 static void
-amd_cpuid4(int leaf, union _cpuid4_leaf_eax *eax,
-		     union _cpuid4_leaf_ebx *ebx,
-		     union _cpuid4_leaf_ecx *ecx)
+amd_cpuid4(int index, union _cpuid4_leaf_eax *eax,
+	   union _cpuid4_leaf_ebx *ebx, union _cpuid4_leaf_ecx *ecx)
 {
-	unsigned dummy;
-	unsigned line_size, lines_per_tag, assoc, size_in_kb;
+	unsigned int dummy, line_size, lines_per_tag, assoc, size_in_kb;
 	union l1_cache l1i, l1d;
 	union l2_cache l2;
 	union l3_cache l3;
@@ -251,7 +249,7 @@ amd_cpuid4(int leaf, union _cpuid4_leaf_eax *eax,
 	cpuid(0x80000005, &dummy, &dummy, &l1d.val, &l1i.val);
 	cpuid(0x80000006, &dummy, &dummy, &l2.val, &l3.val);
 
-	switch (leaf) {
+	switch (index) {
 	case 1:
 		l1 = &l1i;
 		fallthrough;
@@ -289,12 +287,11 @@ amd_cpuid4(int leaf, union _cpuid4_leaf_eax *eax,
 	}
 
 	eax->split.is_self_initializing = 1;
-	eax->split.type = types[leaf];
-	eax->split.level = levels[leaf];
+	eax->split.type = types[index];
+	eax->split.level = levels[index];
 	eax->split.num_threads_sharing = 0;
 	eax->split.num_cores_on_die = topology_num_cores_per_package();
 
-
 	if (assoc == 0xffff)
 		eax->split.is_fully_associative = 1;
 	ebx->split.coherency_line_size = line_size - 1;
-- 
2.48.1



* [PATCH v1 16/40] x86/cacheinfo: Use proper name for cacheinfo instances
  2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
                   ` (14 preceding siblings ...)
  2025-03-04  8:51 ` [PATCH v1 15/40] x86/cacheinfo: Properly name amd_cpuid4()'s first parameter Ahmed S. Darwish
@ 2025-03-04  8:51 ` Ahmed S. Darwish
  2025-03-04  8:51 ` [PATCH v1 17/40] x86/cacheinfo: Constify _cpuid4_info_regs instances Ahmed S. Darwish
                   ` (25 subsequent siblings)
  41 siblings, 0 replies; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-04  8:51 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar, Dave Hansen
  Cc: Thomas Gleixner, John Ogness, H. Peter Anvin, Andrew Cooper, x86,
	x86-cpuid, LKML, Ahmed S. Darwish

From: Thomas Gleixner <tglx@linutronix.de>

The cacheinfo structure defined at include/linux/cacheinfo.h is a
generic cache info object representation.

Calling its instances at x86 cacheinfo.c "leaf" confuses it with a cpuid
leaf, especially since multiple cpuid calls are already sprinkled across
that file.  Most such instances also carry a redundant "this_" prefix.

Rename all of the cacheinfo "this_leaf" instances to just "ci", and
rename "sibling_leaf" and ci_leaf_init() to match.

[darwi: Move into separate commit and write commit log]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
---
 arch/x86/kernel/cpu/cacheinfo.c | 89 ++++++++++++++++-----------------
 1 file changed, 43 insertions(+), 46 deletions(-)

diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index 24a7503f37e2..5deffc834291 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -349,11 +349,10 @@ static int amd_get_l3_disable_slot(struct amd_northbridge *nb, unsigned slot)
 	return -1;
 }
 
-static ssize_t show_cache_disable(struct cacheinfo *this_leaf, char *buf,
-				  unsigned int slot)
+static ssize_t show_cache_disable(struct cacheinfo *ci, char *buf, unsigned int slot)
 {
 	int index;
-	struct amd_northbridge *nb = this_leaf->priv;
+	struct amd_northbridge *nb = ci->priv;
 
 	index = amd_get_l3_disable_slot(nb, slot);
 	if (index >= 0)
@@ -367,8 +366,8 @@ static ssize_t								\
 cache_disable_##slot##_show(struct device *dev,				\
 			    struct device_attribute *attr, char *buf)	\
 {									\
-	struct cacheinfo *this_leaf = dev_get_drvdata(dev);		\
-	return show_cache_disable(this_leaf, buf, slot);		\
+	struct cacheinfo *ci = dev_get_drvdata(dev);			\
+	return show_cache_disable(ci, buf, slot);			\
 }
 SHOW_CACHE_DISABLE(0)
 SHOW_CACHE_DISABLE(1)
@@ -435,18 +434,17 @@ static int amd_set_l3_disable_slot(struct amd_northbridge *nb, int cpu,
 	return 0;
 }
 
-static ssize_t store_cache_disable(struct cacheinfo *this_leaf,
-				   const char *buf, size_t count,
-				   unsigned int slot)
+static ssize_t store_cache_disable(struct cacheinfo *ci, const char *buf,
+				   size_t count, unsigned int slot)
 {
 	unsigned long val = 0;
 	int cpu, err = 0;
-	struct amd_northbridge *nb = this_leaf->priv;
+	struct amd_northbridge *nb = ci->priv;
 
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
-	cpu = cpumask_first(&this_leaf->shared_cpu_map);
+	cpu = cpumask_first(&ci->shared_cpu_map);
 
 	if (kstrtoul(buf, 10, &val) < 0)
 		return -EINVAL;
@@ -467,8 +465,8 @@ cache_disable_##slot##_store(struct device *dev,			\
 			     struct device_attribute *attr,		\
 			     const char *buf, size_t count)		\
 {									\
-	struct cacheinfo *this_leaf = dev_get_drvdata(dev);		\
-	return store_cache_disable(this_leaf, buf, count, slot);	\
+	struct cacheinfo *ci = dev_get_drvdata(dev);			\
+	return store_cache_disable(ci, buf, count, slot);		\
 }
 STORE_CACHE_DISABLE(0)
 STORE_CACHE_DISABLE(1)
@@ -476,8 +474,8 @@ STORE_CACHE_DISABLE(1)
 static ssize_t subcaches_show(struct device *dev,
 			      struct device_attribute *attr, char *buf)
 {
-	struct cacheinfo *this_leaf = dev_get_drvdata(dev);
-	int cpu = cpumask_first(&this_leaf->shared_cpu_map);
+	struct cacheinfo *ci = dev_get_drvdata(dev);
+	int cpu = cpumask_first(&ci->shared_cpu_map);
 
 	return sprintf(buf, "%x\n", amd_get_subcaches(cpu));
 }
@@ -486,8 +484,8 @@ static ssize_t subcaches_store(struct device *dev,
 			       struct device_attribute *attr,
 			       const char *buf, size_t count)
 {
-	struct cacheinfo *this_leaf = dev_get_drvdata(dev);
-	int cpu = cpumask_first(&this_leaf->shared_cpu_map);
+	struct cacheinfo *ci = dev_get_drvdata(dev);
+	int cpu = cpumask_first(&ci->shared_cpu_map);
 	unsigned long val;
 
 	if (!capable(CAP_SYS_ADMIN))
@@ -511,10 +509,10 @@ cache_private_attrs_is_visible(struct kobject *kobj,
 			       struct attribute *attr, int unused)
 {
 	struct device *dev = kobj_to_dev(kobj);
-	struct cacheinfo *this_leaf = dev_get_drvdata(dev);
+	struct cacheinfo *ci = dev_get_drvdata(dev);
 	umode_t mode = attr->mode;
 
-	if (!this_leaf->priv)
+	if (!ci->priv)
 		return 0;
 
 	if ((attr == &dev_attr_subcaches.attr) &&
@@ -562,11 +560,11 @@ static void init_amd_l3_attrs(void)
 }
 
 const struct attribute_group *
-cache_get_priv_group(struct cacheinfo *this_leaf)
+cache_get_priv_group(struct cacheinfo *ci)
 {
-	struct amd_northbridge *nb = this_leaf->priv;
+	struct amd_northbridge *nb = ci->priv;
 
-	if (this_leaf->level < 3 || !nb)
+	if (ci->level < 3 || !nb)
 		return NULL;
 
 	if (nb && nb->l3_cache.indices)
@@ -846,7 +844,7 @@ static int __cache_amd_cpumap_setup(unsigned int cpu, int index,
 				    struct _cpuid4_info_regs *base)
 {
 	struct cpu_cacheinfo *this_cpu_ci;
-	struct cacheinfo *this_leaf;
+	struct cacheinfo *ci;
 	int i, sibling;
 
 	/*
@@ -858,12 +856,12 @@ static int __cache_amd_cpumap_setup(unsigned int cpu, int index,
 			this_cpu_ci = get_cpu_cacheinfo(i);
 			if (!this_cpu_ci->info_list)
 				continue;
-			this_leaf = this_cpu_ci->info_list + index;
+			ci = this_cpu_ci->info_list + index;
 			for_each_cpu(sibling, cpu_llc_shared_mask(cpu)) {
 				if (!cpu_online(sibling))
 					continue;
 				cpumask_set_cpu(sibling,
-						&this_leaf->shared_cpu_map);
+						&ci->shared_cpu_map);
 			}
 		}
 	} else if (boot_cpu_has(X86_FEATURE_TOPOEXT)) {
@@ -883,14 +881,14 @@ static int __cache_amd_cpumap_setup(unsigned int cpu, int index,
 			if ((apicid < first) || (apicid > last))
 				continue;
 
-			this_leaf = this_cpu_ci->info_list + index;
+			ci = this_cpu_ci->info_list + index;
 
 			for_each_online_cpu(sibling) {
 				apicid = cpu_data(sibling).topo.apicid;
 				if ((apicid < first) || (apicid > last))
 					continue;
 				cpumask_set_cpu(sibling,
-						&this_leaf->shared_cpu_map);
+						&ci->shared_cpu_map);
 			}
 		}
 	} else
@@ -903,7 +901,7 @@ static void __cache_cpumap_setup(unsigned int cpu, int index,
 				 struct _cpuid4_info_regs *base)
 {
 	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
-	struct cacheinfo *this_leaf, *sibling_leaf;
+	struct cacheinfo *ci, *sibling_ci;
 	unsigned long num_threads_sharing;
 	int index_msb, i;
 	struct cpuinfo_x86 *c = &cpu_data(cpu);
@@ -914,10 +912,10 @@ static void __cache_cpumap_setup(unsigned int cpu, int index,
 			return;
 	}
 
-	this_leaf = this_cpu_ci->info_list + index;
+	ci = this_cpu_ci->info_list + index;
 	num_threads_sharing = 1 + base->eax.split.num_threads_sharing;
 
-	cpumask_set_cpu(cpu, &this_leaf->shared_cpu_map);
+	cpumask_set_cpu(cpu, &ci->shared_cpu_map);
 	if (num_threads_sharing == 1)
 		return;
 
@@ -929,28 +927,27 @@ static void __cache_cpumap_setup(unsigned int cpu, int index,
 
 			if (i == cpu || !sib_cpu_ci->info_list)
 				continue;/* skip if itself or no cacheinfo */
-			sibling_leaf = sib_cpu_ci->info_list + index;
-			cpumask_set_cpu(i, &this_leaf->shared_cpu_map);
-			cpumask_set_cpu(cpu, &sibling_leaf->shared_cpu_map);
+			sibling_ci = sib_cpu_ci->info_list + index;
+			cpumask_set_cpu(i, &ci->shared_cpu_map);
+			cpumask_set_cpu(cpu, &sibling_ci->shared_cpu_map);
 		}
 }
 
-static void ci_leaf_init(struct cacheinfo *this_leaf,
-			 struct _cpuid4_info_regs *base)
+static void ci_info_init(struct cacheinfo *ci, struct _cpuid4_info_regs *base)
 {
-	this_leaf->id = base->id;
-	this_leaf->attributes = CACHE_ID;
-	this_leaf->level = base->eax.split.level;
-	this_leaf->type = cache_type_map[base->eax.split.type];
-	this_leaf->coherency_line_size =
+	ci->id = base->id;
+	ci->attributes = CACHE_ID;
+	ci->level = base->eax.split.level;
+	ci->type = cache_type_map[base->eax.split.type];
+	ci->coherency_line_size =
 				base->ebx.split.coherency_line_size + 1;
-	this_leaf->ways_of_associativity =
+	ci->ways_of_associativity =
 				base->ebx.split.ways_of_associativity + 1;
-	this_leaf->size = base->size;
-	this_leaf->number_of_sets = base->ecx.split.number_of_sets + 1;
-	this_leaf->physical_line_partition =
+	ci->size = base->size;
+	ci->number_of_sets = base->ecx.split.number_of_sets + 1;
+	ci->physical_line_partition =
 				base->ebx.split.physical_line_partition + 1;
-	this_leaf->priv = base->nb;
+	ci->priv = base->nb;
 }
 
 int init_cache_level(unsigned int cpu)
@@ -984,7 +981,7 @@ int populate_cache_leaves(unsigned int cpu)
 {
 	unsigned int idx, ret;
 	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
-	struct cacheinfo *this_leaf = this_cpu_ci->info_list;
+	struct cacheinfo *ci = this_cpu_ci->info_list;
 	struct _cpuid4_info_regs id4_regs = {};
 
 	for (idx = 0; idx < this_cpu_ci->num_leaves; idx++) {
@@ -992,7 +989,7 @@ int populate_cache_leaves(unsigned int cpu)
 		if (ret)
 			return ret;
 		get_cache_id(cpu, &id4_regs);
-		ci_leaf_init(this_leaf++, &id4_regs);
+		ci_info_init(ci++, &id4_regs);
 		__cache_cpumap_setup(cpu, idx, &id4_regs);
 	}
 	this_cpu_ci->cpu_map_populated = true;
-- 
2.48.1



* [PATCH v1 17/40] x86/cacheinfo: Constify _cpuid4_info_regs instances
  2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
                   ` (15 preceding siblings ...)
  2025-03-04  8:51 ` [PATCH v1 16/40] x86/cacheinfo: Use proper name for cacheinfo instances Ahmed S. Darwish
@ 2025-03-04  8:51 ` Ahmed S. Darwish
  2025-03-04  8:51 ` [PATCH v1 18/40] x86/cacheinfo: Align ci_info_init() assignment expressions Ahmed S. Darwish
                   ` (24 subsequent siblings)
  41 siblings, 0 replies; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-04  8:51 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar, Dave Hansen
  Cc: Thomas Gleixner, John Ogness, H. Peter Anvin, Andrew Cooper, x86,
	x86-cpuid, LKML, Ahmed S. Darwish

_cpuid4_info_regs instances are passed through a large number of
functions in cacheinfo.c.  For clarity, constify the instance parameters
of the functions that only read from _cpuid4_info_regs.

Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
---
 arch/x86/kernel/cpu/cacheinfo.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index 5deffc834291..15ae12c92a83 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -841,7 +841,7 @@ void init_intel_cacheinfo(struct cpuinfo_x86 *c)
 }
 
 static int __cache_amd_cpumap_setup(unsigned int cpu, int index,
-				    struct _cpuid4_info_regs *base)
+				    const struct _cpuid4_info_regs *base)
 {
 	struct cpu_cacheinfo *this_cpu_ci;
 	struct cacheinfo *ci;
@@ -898,7 +898,7 @@ static int __cache_amd_cpumap_setup(unsigned int cpu, int index,
 }
 
 static void __cache_cpumap_setup(unsigned int cpu, int index,
-				 struct _cpuid4_info_regs *base)
+				 const struct _cpuid4_info_regs *base)
 {
 	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
 	struct cacheinfo *ci, *sibling_ci;
@@ -933,7 +933,8 @@ static void __cache_cpumap_setup(unsigned int cpu, int index,
 		}
 }
 
-static void ci_info_init(struct cacheinfo *ci, struct _cpuid4_info_regs *base)
+static void ci_info_init(struct cacheinfo *ci,
+			 const struct _cpuid4_info_regs *base)
 {
 	ci->id = base->id;
 	ci->attributes = CACHE_ID;
-- 
2.48.1



* [PATCH v1 18/40] x86/cacheinfo: Align ci_info_init() assignment expressions
  2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
                   ` (16 preceding siblings ...)
  2025-03-04  8:51 ` [PATCH v1 17/40] x86/cacheinfo: Constify _cpuid4_info_regs instances Ahmed S. Darwish
@ 2025-03-04  8:51 ` Ahmed S. Darwish
  2025-03-04  8:51 ` [PATCH v1 19/40] x86/cacheinfo: Standardize _cpuid4_info_regs instance naming Ahmed S. Darwish
                   ` (23 subsequent siblings)
  41 siblings, 0 replies; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-04  8:51 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar, Dave Hansen
  Cc: Thomas Gleixner, John Ogness, H. Peter Anvin, Andrew Cooper, x86,
	x86-cpuid, LKML, Ahmed S. Darwish

The ci_info_init() function initializes 10 members of a struct cacheinfo
instance using data passed in from cpuid leaf 0x4.

These assignment expressions are difficult to read in their current form.
Align them for clarity.

Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
---
 arch/x86/kernel/cpu/cacheinfo.c | 23 ++++++++++-------------
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index 15ae12c92a83..f825d68e8de6 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -936,19 +936,16 @@ static void __cache_cpumap_setup(unsigned int cpu, int index,
 static void ci_info_init(struct cacheinfo *ci,
 			 const struct _cpuid4_info_regs *base)
 {
-	ci->id = base->id;
-	ci->attributes = CACHE_ID;
-	ci->level = base->eax.split.level;
-	ci->type = cache_type_map[base->eax.split.type];
-	ci->coherency_line_size =
-				base->ebx.split.coherency_line_size + 1;
-	ci->ways_of_associativity =
-				base->ebx.split.ways_of_associativity + 1;
-	ci->size = base->size;
-	ci->number_of_sets = base->ecx.split.number_of_sets + 1;
-	ci->physical_line_partition =
-				base->ebx.split.physical_line_partition + 1;
-	ci->priv = base->nb;
+	ci->id				= base->id;
+	ci->attributes			= CACHE_ID;
+	ci->level			= base->eax.split.level;
+	ci->type			= cache_type_map[base->eax.split.type];
+	ci->coherency_line_size		= base->ebx.split.coherency_line_size + 1;
+	ci->ways_of_associativity	= base->ebx.split.ways_of_associativity + 1;
+	ci->size			= base->size;
+	ci->number_of_sets		= base->ecx.split.number_of_sets + 1;
+	ci->physical_line_partition	= base->ebx.split.physical_line_partition + 1;
+	ci->priv			= base->nb;
 }
 
 int init_cache_level(unsigned int cpu)
-- 
2.48.1



* [PATCH v1 19/40] x86/cacheinfo: Standardize _cpuid4_info_regs instance naming
  2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
                   ` (17 preceding siblings ...)
  2025-03-04  8:51 ` [PATCH v1 18/40] x86/cacheinfo: Align ci_info_init() assignment expressions Ahmed S. Darwish
@ 2025-03-04  8:51 ` Ahmed S. Darwish
  2025-03-04  8:51 ` [PATCH v1 20/40] x86: treewide: Introduce x86_vendor_amd_or_hygon() Ahmed S. Darwish
                   ` (22 subsequent siblings)
  41 siblings, 0 replies; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-04  8:51 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar, Dave Hansen
  Cc: Thomas Gleixner, John Ogness, H. Peter Anvin, Andrew Cooper, x86,
	x86-cpuid, LKML, Ahmed S. Darwish

The cacheinfo code frequently uses the output registers from cpuid leaf
0x4.  Such registers are cached in struct _cpuid4_info_regs, augmented
with related information, and are then passed across functions.

The naming of these _cpuid4_info_regs instances is inconsistent.

Some instances are called "this_leaf", which is vague as "this" lacks
context and "leaf" is overly generic given that other cpuid leaves are
also processed within cacheinfo.  Other _cpuid4_info_regs instances are
called "base" or "id4_regs", adding further ambiguity.

Standardize on "id4" for all instances.

Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
---
 arch/x86/kernel/cpu/cacheinfo.c | 97 +++++++++++++++++----------------
 1 file changed, 49 insertions(+), 48 deletions(-)
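
For context on the "augmented with related information" part of the commit
log, a minimal userspace sketch of the two derived values that the diff
below stores next to the raw registers: the cache size (product of the four
"minus one" encoded fields) and the cache id (APIC ID shifted right by the
order of the sharing count).  The flat struct and the function names are
assumptions for illustration; the kernel uses bitfield unions and
get_count_order() instead.

```c
#include <stdint.h>

/* Hypothetical flat view of the leaf 0x4 fields; all are "value minus one" */
struct id4_sketch {
	uint32_t number_of_sets;
	uint32_t coherency_line_size;
	uint32_t physical_line_partition;
	uint32_t ways_of_associativity;
	uint32_t num_threads_sharing;
};

/* Cache size in bytes: sets * line size * partitions * ways */
uint64_t id4_cache_size(const struct id4_sketch *id4)
{
	return ((uint64_t)id4->number_of_sets          + 1) *
	       ((uint64_t)id4->coherency_line_size     + 1) *
	       ((uint64_t)id4->physical_line_partition + 1) *
	       ((uint64_t)id4->ways_of_associativity   + 1);
}

/* Smallest order with (1 << order) >= count; count is assumed <= 4096 */
static unsigned int count_order(uint32_t count)
{
	unsigned int order = 0;

	while ((UINT32_C(1) << order) < count)
		order++;

	return order;
}

/* Drop the APIC ID bits that distinguish threads sharing this cache */
uint32_t id4_cache_id(uint32_t apicid, const struct id4_sketch *id4)
{
	return apicid >> count_order(id4->num_threads_sharing + 1);
}
```

With 64 sets, 64-byte lines, one partition and 8 ways, the size works out
to 64 * 64 * 1 * 8 = 32768 bytes.  With two threads sharing the cache, APIC
IDs 4 and 5 both map to cache id 2.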

diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index f825d68e8de6..74a2949ff872 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -573,7 +573,7 @@ cache_get_priv_group(struct cacheinfo *ci)
 	return &cache_private_group;
 }
 
-static void amd_init_l3_cache(struct _cpuid4_info_regs *this_leaf, int index)
+static void amd_init_l3_cache(struct _cpuid4_info_regs *id4, int index)
 {
 	int node;
 
@@ -582,16 +582,16 @@ static void amd_init_l3_cache(struct _cpuid4_info_regs *this_leaf, int index)
 		return;
 
 	node = topology_amd_node_id(smp_processor_id());
-	this_leaf->nb = node_to_amd_nb(node);
-	if (this_leaf->nb && !this_leaf->nb->l3_cache.indices)
-		amd_calc_l3_indices(this_leaf->nb);
+	id4->nb = node_to_amd_nb(node);
+	if (id4->nb && !id4->nb->l3_cache.indices)
+		amd_calc_l3_indices(id4->nb);
 }
 #else
 #define amd_init_l3_cache(x, y)
 #endif  /* CONFIG_AMD_NB && CONFIG_SYSFS */
 
 static int
-cpuid4_cache_lookup_regs(int index, struct _cpuid4_info_regs *this_leaf)
+cpuid4_cache_lookup_regs(int index, struct _cpuid4_info_regs *id4)
 {
 	union _cpuid4_leaf_eax	eax;
 	union _cpuid4_leaf_ebx	ebx;
@@ -604,11 +604,11 @@ cpuid4_cache_lookup_regs(int index, struct _cpuid4_info_regs *this_leaf)
 				    &ebx.full, &ecx.full, &edx);
 		else
 			amd_cpuid4(index, &eax, &ebx, &ecx);
-		amd_init_l3_cache(this_leaf, index);
+		amd_init_l3_cache(id4, index);
 	} else if (boot_cpu_data.x86_vendor == X86_VENDOR_HYGON) {
 		cpuid_count(0x8000001d, index, &eax.full,
 			    &ebx.full, &ecx.full, &edx);
-		amd_init_l3_cache(this_leaf, index);
+		amd_init_l3_cache(id4, index);
 	} else {
 		cpuid_count(4, index, &eax.full, &ebx.full, &ecx.full, &edx);
 	}
@@ -616,13 +616,14 @@ cpuid4_cache_lookup_regs(int index, struct _cpuid4_info_regs *this_leaf)
 	if (eax.split.type == CTYPE_NULL)
 		return -EIO; /* better error ? */
 
-	this_leaf->eax = eax;
-	this_leaf->ebx = ebx;
-	this_leaf->ecx = ecx;
-	this_leaf->size = (ecx.split.number_of_sets          + 1) *
-			  (ebx.split.coherency_line_size     + 1) *
-			  (ebx.split.physical_line_partition + 1) *
-			  (ebx.split.ways_of_associativity   + 1);
+	id4->eax = eax;
+	id4->ebx = ebx;
+	id4->ecx = ecx;
+	id4->size = (ecx.split.number_of_sets          + 1) *
+		    (ebx.split.coherency_line_size     + 1) *
+		    (ebx.split.physical_line_partition + 1) *
+		    (ebx.split.ways_of_associativity   + 1);
+
 	return 0;
 }
 
@@ -754,29 +755,29 @@ void init_intel_cacheinfo(struct cpuinfo_x86 *c)
 		 * parameters cpuid leaf to find the cache details
 		 */
 		for (i = 0; i < ci->num_leaves; i++) {
-			struct _cpuid4_info_regs this_leaf = {};
+			struct _cpuid4_info_regs id4 = {};
 			int retval;
 
-			retval = cpuid4_cache_lookup_regs(i, &this_leaf);
+			retval = cpuid4_cache_lookup_regs(i, &id4);
 			if (retval < 0)
 				continue;
 
-			switch (this_leaf.eax.split.level) {
+			switch (id4.eax.split.level) {
 			case 1:
-				if (this_leaf.eax.split.type == CTYPE_DATA)
-					new_l1d = this_leaf.size/1024;
-				else if (this_leaf.eax.split.type == CTYPE_INST)
-					new_l1i = this_leaf.size/1024;
+				if (id4.eax.split.type == CTYPE_DATA)
+					new_l1d = id4.size/1024;
+				else if (id4.eax.split.type == CTYPE_INST)
+					new_l1i = id4.size/1024;
 				break;
 			case 2:
-				new_l2 = this_leaf.size/1024;
-				num_threads_sharing = 1 + this_leaf.eax.split.num_threads_sharing;
+				new_l2 = id4.size/1024;
+				num_threads_sharing = 1 + id4.eax.split.num_threads_sharing;
 				index_msb = get_count_order(num_threads_sharing);
 				l2_id = c->topo.apicid & ~((1 << index_msb) - 1);
 				break;
 			case 3:
-				new_l3 = this_leaf.size/1024;
-				num_threads_sharing = 1 + this_leaf.eax.split.num_threads_sharing;
+				new_l3 = id4.size/1024;
+				num_threads_sharing = 1 + id4.eax.split.num_threads_sharing;
 				index_msb = get_count_order(num_threads_sharing);
 				l3_id = c->topo.apicid & ~((1 << index_msb) - 1);
 				break;
@@ -841,7 +842,7 @@ void init_intel_cacheinfo(struct cpuinfo_x86 *c)
 }
 
 static int __cache_amd_cpumap_setup(unsigned int cpu, int index,
-				    const struct _cpuid4_info_regs *base)
+				    const struct _cpuid4_info_regs *id4)
 {
 	struct cpu_cacheinfo *this_cpu_ci;
 	struct cacheinfo *ci;
@@ -867,7 +868,7 @@ static int __cache_amd_cpumap_setup(unsigned int cpu, int index,
 	} else if (boot_cpu_has(X86_FEATURE_TOPOEXT)) {
 		unsigned int apicid, nshared, first, last;
 
-		nshared = base->eax.split.num_threads_sharing + 1;
+		nshared = id4->eax.split.num_threads_sharing + 1;
 		apicid = cpu_data(cpu).topo.apicid;
 		first = apicid - (apicid % nshared);
 		last = first + nshared - 1;
@@ -898,7 +899,7 @@ static int __cache_amd_cpumap_setup(unsigned int cpu, int index,
 }
 
 static void __cache_cpumap_setup(unsigned int cpu, int index,
-				 const struct _cpuid4_info_regs *base)
+				 const struct _cpuid4_info_regs *id4)
 {
 	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
 	struct cacheinfo *ci, *sibling_ci;
@@ -908,12 +909,12 @@ static void __cache_cpumap_setup(unsigned int cpu, int index,
 
 	if (c->x86_vendor == X86_VENDOR_AMD ||
 	    c->x86_vendor == X86_VENDOR_HYGON) {
-		if (__cache_amd_cpumap_setup(cpu, index, base))
+		if (__cache_amd_cpumap_setup(cpu, index, id4))
 			return;
 	}
 
 	ci = this_cpu_ci->info_list + index;
-	num_threads_sharing = 1 + base->eax.split.num_threads_sharing;
+	num_threads_sharing = 1 + id4->eax.split.num_threads_sharing;
 
 	cpumask_set_cpu(cpu, &ci->shared_cpu_map);
 	if (num_threads_sharing == 1)
@@ -934,18 +935,18 @@ static void __cache_cpumap_setup(unsigned int cpu, int index,
 }
 
 static void ci_info_init(struct cacheinfo *ci,
-			 const struct _cpuid4_info_regs *base)
+			 const struct _cpuid4_info_regs *id4)
 {
-	ci->id				= base->id;
+	ci->id				= id4->id;
 	ci->attributes			= CACHE_ID;
-	ci->level			= base->eax.split.level;
-	ci->type			= cache_type_map[base->eax.split.type];
-	ci->coherency_line_size		= base->ebx.split.coherency_line_size + 1;
-	ci->ways_of_associativity	= base->ebx.split.ways_of_associativity + 1;
-	ci->size			= base->size;
-	ci->number_of_sets		= base->ecx.split.number_of_sets + 1;
-	ci->physical_line_partition	= base->ebx.split.physical_line_partition + 1;
-	ci->priv			= base->nb;
+	ci->level			= id4->eax.split.level;
+	ci->type			= cache_type_map[id4->eax.split.type];
+	ci->coherency_line_size		= id4->ebx.split.coherency_line_size + 1;
+	ci->ways_of_associativity	= id4->ebx.split.ways_of_associativity + 1;
+	ci->size			= id4->size;
+	ci->number_of_sets		= id4->ecx.split.number_of_sets + 1;
+	ci->physical_line_partition	= id4->ebx.split.physical_line_partition + 1;
+	ci->priv			= id4->nb;
 }
 
 int init_cache_level(unsigned int cpu)
@@ -964,15 +965,15 @@ int init_cache_level(unsigned int cpu)
  * ECX as cache index. Then right shift apicid by the number's order to get
  * cache id for this cache node.
  */
-static void get_cache_id(int cpu, struct _cpuid4_info_regs *id4_regs)
+static void get_cache_id(int cpu, struct _cpuid4_info_regs *id4)
 {
 	struct cpuinfo_x86 *c = &cpu_data(cpu);
 	unsigned long num_threads_sharing;
 	int index_msb;
 
-	num_threads_sharing = 1 + id4_regs->eax.split.num_threads_sharing;
+	num_threads_sharing = 1 + id4->eax.split.num_threads_sharing;
 	index_msb = get_count_order(num_threads_sharing);
-	id4_regs->id = c->topo.apicid >> index_msb;
+	id4->id = c->topo.apicid >> index_msb;
 }
 
 int populate_cache_leaves(unsigned int cpu)
@@ -980,15 +981,15 @@ int populate_cache_leaves(unsigned int cpu)
 	unsigned int idx, ret;
 	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
 	struct cacheinfo *ci = this_cpu_ci->info_list;
-	struct _cpuid4_info_regs id4_regs = {};
+	struct _cpuid4_info_regs id4 = {};
 
 	for (idx = 0; idx < this_cpu_ci->num_leaves; idx++) {
-		ret = cpuid4_cache_lookup_regs(idx, &id4_regs);
+		ret = cpuid4_cache_lookup_regs(idx, &id4);
 		if (ret)
 			return ret;
-		get_cache_id(cpu, &id4_regs);
-		ci_info_init(ci++, &id4_regs);
-		__cache_cpumap_setup(cpu, idx, &id4_regs);
+		get_cache_id(cpu, &id4);
+		ci_info_init(ci++, &id4);
+		__cache_cpumap_setup(cpu, idx, &id4);
 	}
 	this_cpu_ci->cpu_map_populated = true;
 
-- 
2.48.1



* [PATCH v1 20/40] x86: treewide: Introduce x86_vendor_amd_or_hygon()
  2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
                   ` (18 preceding siblings ...)
  2025-03-04  8:51 ` [PATCH v1 19/40] x86/cacheinfo: Standardize _cpuid4_info_regs instance naming Ahmed S. Darwish
@ 2025-03-04  8:51 ` Ahmed S. Darwish
  2025-03-04  8:51 ` [PATCH v1 21/40] x86/cacheinfo: Consolidate AMD/Hygon leaf 0x8000001d calls Ahmed S. Darwish
                   ` (21 subsequent siblings)
  41 siblings, 0 replies; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-04  8:51 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar, Dave Hansen
  Cc: Thomas Gleixner, John Ogness, H. Peter Anvin, Andrew Cooper, x86,
	x86-cpuid, LKML, Ahmed S. Darwish

The pattern of checking whether an x86 vendor is AMD or Hygon (or is
neither of them) is pretty common across the x86 tree.

Introduce an x86_vendor_amd_or_hygon() inline helper in asm/processor.h,
and use it across the x86 tree.
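
As a rough userspace illustration of why both call-site shapes can be
converted, the sketch below compares the open-coded "neither AMD nor
Hygon" test with the negated helper.  The demo_* names and the vendor
values are hypothetical stand-ins, not the kernel's definitions:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical vendor IDs, for illustration only. */
enum { DEMO_VENDOR_INTEL = 0, DEMO_VENDOR_AMD = 2, DEMO_VENDOR_HYGON = 9 };

/* Stand-in for the helper this patch introduces. */
static inline bool demo_vendor_amd_or_hygon(uint8_t vendor)
{
	return vendor == DEMO_VENDOR_AMD || vendor == DEMO_VENDOR_HYGON;
}

/* Open-coded "neither AMD nor Hygon" check, as call sites had it. */
static inline bool demo_other_vendor_open_coded(uint8_t vendor)
{
	return vendor != DEMO_VENDOR_AMD && vendor != DEMO_VENDOR_HYGON;
}

/* The same check through the helper; equivalent by De Morgan's law. */
static inline bool demo_other_vendor_helper(uint8_t vendor)
{
	return !demo_vendor_amd_or_hygon(vendor);
}
```

Both forms should agree for every vendor value, which is what makes the
conversion in the diff below mechanical.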

Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
---
 arch/x86/events/amd/uncore.c       |  3 +--
 arch/x86/events/rapl.c             |  3 +--
 arch/x86/include/asm/processor.h   |  5 +++++
 arch/x86/kernel/amd_nb.c           |  9 +++------
 arch/x86/kernel/cpu/bugs.c         | 12 ++++--------
 arch/x86/kernel/cpu/cacheinfo.c    |  7 ++-----
 arch/x86/kernel/cpu/mce/core.c     |  4 ++--
 arch/x86/kernel/cpu/mce/severity.c |  3 +--
 arch/x86/kernel/cpu/mtrr/cleanup.c |  3 +--
 arch/x86/kernel/smpboot.c          |  3 +--
 arch/x86/kvm/svm/svm.c             |  3 +--
 arch/x86/pci/amd_bus.c             |  3 +--
 arch/x86/xen/enlighten.c           | 15 +++++----------
 arch/x86/xen/pmu.c                 |  3 +--
 14 files changed, 29 insertions(+), 47 deletions(-)

diff --git a/arch/x86/events/amd/uncore.c b/arch/x86/events/amd/uncore.c
index 49c26ce2b115..5141c0375990 100644
--- a/arch/x86/events/amd/uncore.c
+++ b/arch/x86/events/amd/uncore.c
@@ -1023,8 +1023,7 @@ static int __init amd_uncore_init(void)
 	int ret = -ENODEV;
 	int i;
 
-	if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD &&
-	    boot_cpu_data.x86_vendor != X86_VENDOR_HYGON)
+	if (!x86_vendor_amd_or_hygon(boot_cpu_data.x86_vendor))
 		return -ENODEV;
 
 	if (!boot_cpu_has(X86_FEATURE_TOPOEXT))
diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index 6941f4811bec..999ea90059ae 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -123,8 +123,7 @@ static struct perf_pmu_events_attr event_attr_##v = {				\
  *	     them as die-scope.
  */
 #define rapl_pkg_pmu_is_pkg_scope()				\
-	(boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||	\
-	 boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
+	x86_vendor_amd_or_hygon(boot_cpu_data.x86_vendor)
 
 struct rapl_pmu {
 	raw_spinlock_t		lock;
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index d5d9a071cddc..0f586c638e87 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -203,6 +203,11 @@ struct cpuinfo_x86 {
 
 #define X86_VENDOR_UNKNOWN	0xff
 
+static inline bool x86_vendor_amd_or_hygon(u8 vendor)
+{
+	return (vendor == X86_VENDOR_AMD || vendor == X86_VENDOR_HYGON);
+}
+
 /*
  * capabilities of CPUs
  */
diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
index 11fac09e3a8c..bac8d3b6f12b 100644
--- a/arch/x86/kernel/amd_nb.c
+++ b/arch/x86/kernel/amd_nb.c
@@ -127,8 +127,7 @@ bool __init early_is_amd_nb(u32 device)
 	const struct pci_device_id *id;
 	u32 vendor = device & 0xffff;
 
-	if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD &&
-	    boot_cpu_data.x86_vendor != X86_VENDOR_HYGON)
+	if (!x86_vendor_amd_or_hygon(boot_cpu_data.x86_vendor))
 		return false;
 
 	if (cpu_feature_enabled(X86_FEATURE_ZEN))
@@ -147,8 +146,7 @@ struct resource *amd_get_mmconfig_range(struct resource *res)
 	u64 base, msr;
 	unsigned int segn_busn_bits;
 
-	if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD &&
-	    boot_cpu_data.x86_vendor != X86_VENDOR_HYGON)
+	if (!x86_vendor_amd_or_hygon(boot_cpu_data.x86_vendor))
 		return NULL;
 
 	/* assume all cpus from fam10h have mmconfig */
@@ -320,8 +318,7 @@ static __init void fix_erratum_688(void)
 
 static __init int init_amd_nbs(void)
 {
-	if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD &&
-	    boot_cpu_data.x86_vendor != X86_VENDOR_HYGON)
+	if (!x86_vendor_amd_or_hygon(boot_cpu_data.x86_vendor))
 		return 0;
 
 	amd_cache_northbridges();
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index a5d0998d7604..b0dc4e96f4bc 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1081,8 +1081,7 @@ static void __init retbleed_select_mitigation(void)
 
 do_cmd_auto:
 	case RETBLEED_CMD_AUTO:
-		if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
-		    boot_cpu_data.x86_vendor == X86_VENDOR_HYGON) {
+		if (x86_vendor_amd_or_hygon(boot_cpu_data.x86_vendor)) {
 			if (IS_ENABLED(CONFIG_MITIGATION_UNRET_ENTRY))
 				retbleed_mitigation = RETBLEED_MITIGATION_UNRET;
 			else if (IS_ENABLED(CONFIG_MITIGATION_IBPB_ENTRY) &&
@@ -1106,8 +1105,7 @@ static void __init retbleed_select_mitigation(void)
 
 		x86_return_thunk = retbleed_return_thunk;
 
-		if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD &&
-		    boot_cpu_data.x86_vendor != X86_VENDOR_HYGON)
+		if (!x86_vendor_amd_or_hygon(boot_cpu_data.x86_vendor))
 			pr_err(RETBLEED_UNTRAIN_MSG);
 
 		mitigate_smt = true;
@@ -1872,8 +1870,7 @@ static void __init spectre_v2_select_mitigation(void)
 	 */
 	if (boot_cpu_has_bug(X86_BUG_RETBLEED) &&
 	    boot_cpu_has(X86_FEATURE_IBPB) &&
-	    (boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
-	     boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)) {
+	    x86_vendor_amd_or_hygon(boot_cpu_data.x86_vendor)) {
 
 		if (retbleed_cmd != RETBLEED_CMD_IBPB) {
 			setup_force_cpu_cap(X86_FEATURE_USE_IBPB_FW);
@@ -2903,8 +2900,7 @@ static ssize_t retbleed_show_state(char *buf)
 {
 	if (retbleed_mitigation == RETBLEED_MITIGATION_UNRET ||
 	    retbleed_mitigation == RETBLEED_MITIGATION_IBPB) {
-		if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD &&
-		    boot_cpu_data.x86_vendor != X86_VENDOR_HYGON)
+		if (!x86_vendor_amd_or_hygon(boot_cpu_data.x86_vendor))
 			return sysfs_emit(buf, "Vulnerable: untrained return thunk / IBPB on non-AMD based uarch\n");
 
 		return sysfs_emit(buf, "%s; SMT %s\n", retbleed_strings[retbleed_mitigation],
diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index 74a2949ff872..0024d126c385 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -633,8 +633,7 @@ static int find_num_cache_leaves(struct cpuinfo_x86 *c)
 	union _cpuid4_leaf_eax	cache_eax;
 	int 			i = -1;
 
-	if (c->x86_vendor == X86_VENDOR_AMD ||
-	    c->x86_vendor == X86_VENDOR_HYGON)
+	if (x86_vendor_amd_or_hygon(c->x86_vendor))
 		op = 0x8000001d;
 	else
 		op = 4;
@@ -907,11 +906,9 @@ static void __cache_cpumap_setup(unsigned int cpu, int index,
 	int index_msb, i;
 	struct cpuinfo_x86 *c = &cpu_data(cpu);
 
-	if (c->x86_vendor == X86_VENDOR_AMD ||
-	    c->x86_vendor == X86_VENDOR_HYGON) {
+	if (x86_vendor_amd_or_hygon(c->x86_vendor))
 		if (__cache_amd_cpumap_setup(cpu, index, id4))
 			return;
-	}
 
 	ci = this_cpu_ci->info_list + index;
 	num_threads_sharing = 1 + id4->eax.split.num_threads_sharing;
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 0dc00c9894c7..135d7b8f3e55 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -227,7 +227,7 @@ static void print_mce(struct mce_hw_err *err)
 
 	__print_mce(err);
 
-	if (m->cpuvendor != X86_VENDOR_AMD && m->cpuvendor != X86_VENDOR_HYGON)
+	if (!x86_vendor_amd_or_hygon(m->cpuvendor))
 		pr_emerg_ratelimited(HW_ERR "Run the above through 'mcelog --ascii'\n");
 }
 
@@ -2060,7 +2060,7 @@ static bool __mcheck_cpu_ancient_init(struct cpuinfo_x86 *c)
  */
 static void __mcheck_cpu_init_early(struct cpuinfo_x86 *c)
 {
-	if (c->x86_vendor == X86_VENDOR_AMD || c->x86_vendor == X86_VENDOR_HYGON) {
+	if (x86_vendor_amd_or_hygon(c->x86_vendor)) {
 		mce_flags.overflow_recov = !!cpu_has(c, X86_FEATURE_OVERFLOW_RECOV);
 		mce_flags.succor	 = !!cpu_has(c, X86_FEATURE_SUCCOR);
 		mce_flags.smca		 = !!cpu_has(c, X86_FEATURE_SMCA);
diff --git a/arch/x86/kernel/cpu/mce/severity.c b/arch/x86/kernel/cpu/mce/severity.c
index dac4d64dfb2a..a3f2f1c236bc 100644
--- a/arch/x86/kernel/cpu/mce/severity.c
+++ b/arch/x86/kernel/cpu/mce/severity.c
@@ -413,8 +413,7 @@ static noinstr int mce_severity_intel(struct mce *m, struct pt_regs *regs, char
 
 int noinstr mce_severity(struct mce *m, struct pt_regs *regs, char **msg, bool is_excp)
 {
-	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
-	    boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
+	if (x86_vendor_amd_or_hygon(boot_cpu_data.x86_vendor))
 		return mce_severity_amd(m, regs, msg, is_excp);
 	else
 		return mce_severity_intel(m, regs, msg, is_excp);
diff --git a/arch/x86/kernel/cpu/mtrr/cleanup.c b/arch/x86/kernel/cpu/mtrr/cleanup.c
index 18cf79d6e2c5..236d7e3b4e55 100644
--- a/arch/x86/kernel/cpu/mtrr/cleanup.c
+++ b/arch/x86/kernel/cpu/mtrr/cleanup.c
@@ -820,8 +820,7 @@ int __init amd_special_default_mtrr(void)
 {
 	u32 l, h;
 
-	if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD &&
-	    boot_cpu_data.x86_vendor != X86_VENDOR_HYGON)
+	if (!x86_vendor_amd_or_hygon(boot_cpu_data.x86_vendor))
 		return 0;
 	if (boot_cpu_data.x86 < 0xf)
 		return 0;
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index e199465dc9e1..5ba8424cf4e6 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1250,8 +1250,7 @@ static inline void mwait_play_dead(void)
 	unsigned int highest_subcstate = 0;
 	int i;
 
-	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
-	    boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
+	if (x86_vendor_amd_or_hygon(boot_cpu_data.x86_vendor))
 		return;
 	if (!this_cpu_has(X86_FEATURE_MWAIT))
 		return;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index a713c803a3a3..8c88f3c0c2cd 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -523,8 +523,7 @@ static bool __kvm_is_svm_supported(void)
 	int cpu = smp_processor_id();
 	struct cpuinfo_x86 *c = &cpu_data(cpu);
 
-	if (c->x86_vendor != X86_VENDOR_AMD &&
-	    c->x86_vendor != X86_VENDOR_HYGON) {
+	if (!x86_vendor_amd_or_hygon(c->x86_vendor)) {
 		pr_err("CPU %d isn't AMD or Hygon\n", cpu);
 		return false;
 	}
diff --git a/arch/x86/pci/amd_bus.c b/arch/x86/pci/amd_bus.c
index 631512f7ec85..43033d54080a 100644
--- a/arch/x86/pci/amd_bus.c
+++ b/arch/x86/pci/amd_bus.c
@@ -399,8 +399,7 @@ static int __init pci_io_ecs_init(void)
 
 static int __init amd_postcore_init(void)
 {
-	if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD &&
-	    boot_cpu_data.x86_vendor != X86_VENDOR_HYGON)
+	if (!x86_vendor_amd_or_hygon(boot_cpu_data.x86_vendor))
 		return 0;
 
 	early_root_info_init();
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 43dcd8c7badc..13df4917d7d8 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -82,11 +82,9 @@ void xen_hypercall_setfunc(void)
 	if (static_call_query(xen_hypercall) != xen_hypercall_hvm)
 		return;
 
-	if ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
-	     boot_cpu_data.x86_vendor == X86_VENDOR_HYGON))
-		static_call_update(xen_hypercall, xen_hypercall_amd);
-	else
-		static_call_update(xen_hypercall, xen_hypercall_intel);
+	static_call_update(xen_hypercall,
+			   x86_vendor_amd_or_hygon(boot_cpu_data.x86_vendor) ?
+			   xen_hypercall_amd : xen_hypercall_intel);
 }
 
 /*
@@ -118,11 +116,8 @@ noinstr void *__xen_hypercall_setfunc(void)
 	if (!boot_cpu_has(X86_FEATURE_CPUID))
 		xen_get_vendor();
 
-	if ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
-	     boot_cpu_data.x86_vendor == X86_VENDOR_HYGON))
-		func = xen_hypercall_amd;
-	else
-		func = xen_hypercall_intel;
+	func = x86_vendor_amd_or_hygon(boot_cpu_data.x86_vendor) ?
+		xen_hypercall_amd : xen_hypercall_intel;
 
 	static_call_update_early(xen_hypercall, func);
 
diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c
index f06987b0efc3..af5cb19b5990 100644
--- a/arch/x86/xen/pmu.c
+++ b/arch/x86/xen/pmu.c
@@ -130,8 +130,7 @@ static inline uint32_t get_fam15h_addr(u32 addr)
 
 static inline bool is_amd_pmu_msr(unsigned int msr)
 {
-	if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD &&
-	    boot_cpu_data.x86_vendor != X86_VENDOR_HYGON)
+	if (!x86_vendor_amd_or_hygon(boot_cpu_data.x86_vendor))
 		return false;
 
 	if ((msr >= MSR_F15H_PERF_CTL &&
-- 
2.48.1



* [PATCH v1 21/40] x86/cacheinfo: Consolidate AMD/Hygon leaf 0x8000001d calls
  2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
                   ` (19 preceding siblings ...)
  2025-03-04  8:51 ` [PATCH v1 20/40] x86: treewide: Introduce x86_vendor_amd_or_hygon() Ahmed S. Darwish
@ 2025-03-04  8:51 ` Ahmed S. Darwish
  2025-03-04  8:51 ` [PATCH v1 22/40] x86/cacheinfo: Separate amd_northbridge from _cpuid4_info_regs Ahmed S. Darwish
                   ` (20 subsequent siblings)
  41 siblings, 0 replies; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-04  8:51 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar, Dave Hansen
  Cc: Thomas Gleixner, John Ogness, H. Peter Anvin, Andrew Cooper, x86,
	x86-cpuid, LKML, Ahmed S. Darwish

While gathering CPU cache info, cpuid leaf 0x8000001d is invoked in two
separate if blocks: one for Hygon CPUs and one for AMDs with topology
extensions.  After each invocation, amd_init_l3_cache() is called.

Merge the two if blocks into a single condition, thus removing the
duplicated code.  Future commits will expand these if blocks, so
combining them now is both cleaner and more maintainable.

While at it, remove a useless "better error?" comment that has been in
the same function since the 2005 commit e2cac78935ff ("[PATCH] x86_64:
When running cpuid4 need to run on the correct CPU").
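
The merge is intended to be behavior-preserving.  A minimal userspace
model of the branch selection, before and after, is sketched here; the
demo_* enums and functions are hypothetical and only mirror the shape of
the conditionals, not the actual CPUID calls:

```c
#include <stdbool.h>

enum demo_vendor { DEMO_INTEL, DEMO_AMD, DEMO_HYGON };

/* Which data source the lookup would consult. */
enum demo_source {
	DEMO_SRC_LEAF_4,		/* cpuid leaf 0x4 */
	DEMO_SRC_LEAF_8000001D,		/* cpuid leaf 0x8000001d */
	DEMO_SRC_AMD_EMULATED,		/* legacy AMD emulation */
};

/* Before: AMD and Hygon handled in two separate if blocks. */
static inline enum demo_source demo_source_before(enum demo_vendor v, bool topoext)
{
	if (v == DEMO_AMD) {
		if (topoext)
			return DEMO_SRC_LEAF_8000001D;
		return DEMO_SRC_AMD_EMULATED;
	} else if (v == DEMO_HYGON) {
		return DEMO_SRC_LEAF_8000001D;
	}
	return DEMO_SRC_LEAF_4;
}

/* After: one AMD-or-Hygon block with a combined inner condition. */
static inline enum demo_source demo_source_after(enum demo_vendor v, bool topoext)
{
	if (v == DEMO_AMD || v == DEMO_HYGON) {
		if (topoext || v == DEMO_HYGON)
			return DEMO_SRC_LEAF_8000001D;
		return DEMO_SRC_AMD_EMULATED;
	}
	return DEMO_SRC_LEAF_4;
}
```

For all six vendor/TOPOEXT combinations the two functions pick the same
source, which matches the claim that only duplicated code is removed.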

Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
---
 arch/x86/kernel/cpu/cacheinfo.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index 0024d126c385..6aeabbd94997 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -598,23 +598,24 @@ cpuid4_cache_lookup_regs(int index, struct _cpuid4_info_regs *id4)
 	union _cpuid4_leaf_ecx	ecx;
 	unsigned		edx;
 
-	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) {
-		if (boot_cpu_has(X86_FEATURE_TOPOEXT))
+	if (x86_vendor_amd_or_hygon(boot_cpu_data.x86_vendor)) {
+		if (boot_cpu_has(X86_FEATURE_TOPOEXT) ||
+		    boot_cpu_data.x86_vendor == X86_VENDOR_HYGON) {
+			/* AMD with TOPOEXT, or HYGON */
 			cpuid_count(0x8000001d, index, &eax.full,
 				    &ebx.full, &ecx.full, &edx);
-		else
+		} else {
+			/* Legacy AMD fallback */
 			amd_cpuid4(index, &eax, &ebx, &ecx);
-		amd_init_l3_cache(id4, index);
-	} else if (boot_cpu_data.x86_vendor == X86_VENDOR_HYGON) {
-		cpuid_count(0x8000001d, index, &eax.full,
-			    &ebx.full, &ecx.full, &edx);
+		}
 		amd_init_l3_cache(id4, index);
 	} else {
+		/* Intel */
 		cpuid_count(4, index, &eax.full, &ebx.full, &ecx.full, &edx);
 	}
 
 	if (eax.split.type == CTYPE_NULL)
-		return -EIO; /* better error ? */
+		return -EIO;
 
 	id4->eax = eax;
 	id4->ebx = ebx;
-- 
2.48.1



* [PATCH v1 22/40] x86/cacheinfo: Separate amd_northbridge from _cpuid4_info_regs
  2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
                   ` (20 preceding siblings ...)
  2025-03-04  8:51 ` [PATCH v1 21/40] x86/cacheinfo: Consolidate AMD/Hygon leaf 0x8000001d calls Ahmed S. Darwish
@ 2025-03-04  8:51 ` Ahmed S. Darwish
  2025-03-04  8:51 ` [PATCH v1 23/40] x86/cacheinfo: Move AMD cache_disable_0/1 handling to separate file Ahmed S. Darwish
                   ` (19 subsequent siblings)
  41 siblings, 0 replies; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-04  8:51 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar, Dave Hansen
  Cc: Thomas Gleixner, John Ogness, H. Peter Anvin, Andrew Cooper, x86,
	x86-cpuid, LKML, Ahmed S. Darwish

The _cpuid4_info_regs structure is meant to hold the cpuid leaf 0x4
output registers (EAX, EBX, and ECX), as well as derived information
such as the cache node ID and size.

It also contains a reference to amd_northbridge, which is there only to
be "parked" until ci_info_init() can store it in the priv pointer of the
linux/cacheinfo.h API.  That priv pointer is then used by AMD-specific
L3 cache_disable_0/1 sysfs attributes.

Decouple amd_northbridge from _cpuid4_info_regs and pass it explicitly
through the functions at x86/cacheinfo.  Doing so clarifies when
amd_northbridge is actually needed (AMD-only code) and when it is
not (Intel-specific code).  It also prepares for moving the AMD-specific
L3 cache_disable_0/1 sysfs code into its own file in the next commit.
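
The calling convention described above is an optional out-parameter.  A
minimal userspace sketch follows, with hypothetical demo_* types and
functions standing in for the kernel ones.  It assumes the caller
initializes its pointer to NULL, since the non-AMD path never writes
through the out-parameter:

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_northbridge { int node; };
struct demo_regs { unsigned int id; };

static struct demo_northbridge demo_nb = { .node = 0 };

/*
 * Fill @regs.  Callers that do not care about the northbridge pass
 * NULL for @nb; it is only written on the AMD-like path.
 */
static inline int demo_lookup(int index, bool amd_like, struct demo_regs *regs,
			      struct demo_northbridge **nb)
{
	regs->id = (unsigned int)index;

	if (amd_like && nb)
		*nb = (index >= 3) ? &demo_nb : NULL;

	return 0;
}

/* A caller that wants the northbridge starts from a known NULL. */
static inline struct demo_northbridge *demo_populate(int index, bool amd_like)
{
	struct demo_northbridge *nb = NULL;
	struct demo_regs regs = { 0 };

	if (demo_lookup(index, amd_like, &regs, &nb))
		return NULL;

	return nb;
}
```

With this shape, a non-AMD caller ends up with a NULL pointer rather
than an indeterminate one, so the value can be stored unconditionally.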

Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
---
 arch/x86/kernel/cpu/cacheinfo.c | 45 +++++++++++++++++++++------------
 1 file changed, 29 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index 6aeabbd94997..2a56c7cc3c2d 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -168,7 +168,6 @@ struct _cpuid4_info_regs {
 	union _cpuid4_leaf_ecx ecx;
 	unsigned int id;
 	unsigned long size;
-	struct amd_northbridge *nb;
 };
 
 /* AMD doesn't have CPUID4. Emulate it here to report the same
@@ -573,25 +572,36 @@ cache_get_priv_group(struct cacheinfo *ci)
 	return &cache_private_group;
 }
 
-static void amd_init_l3_cache(struct _cpuid4_info_regs *id4, int index)
+static struct amd_northbridge *amd_init_l3_cache(int index)
 {
+	struct amd_northbridge *nb;
 	int node;
 
 	/* only for L3, and not in virtualized environments */
 	if (index < 3)
-		return;
+		return NULL;
 
 	node = topology_amd_node_id(smp_processor_id());
-	id4->nb = node_to_amd_nb(node);
-	if (id4->nb && !id4->nb->l3_cache.indices)
-		amd_calc_l3_indices(id4->nb);
+	nb = node_to_amd_nb(node);
+	if (nb && !nb->l3_cache.indices)
+		amd_calc_l3_indices(nb);
+
+	return nb;
 }
 #else
-#define amd_init_l3_cache(x, y)
+static struct amd_northbridge *amd_init_l3_cache(int index)
+{
+	return NULL;
+}
 #endif  /* CONFIG_AMD_NB && CONFIG_SYSFS */
 
-static int
-cpuid4_cache_lookup_regs(int index, struct _cpuid4_info_regs *id4)
+/*
+ * Fill passed _cpuid4_info_regs structure.
+ * Intel-only code paths should pass NULL for the amd_northbridge
+ * return pointer.
+ */
+static int cpuid4_cache_lookup_regs(int index, struct _cpuid4_info_regs *id4,
+				    struct amd_northbridge **nb)
 {
 	union _cpuid4_leaf_eax	eax;
 	union _cpuid4_leaf_ebx	ebx;
@@ -608,7 +618,9 @@ cpuid4_cache_lookup_regs(int index, struct _cpuid4_info_regs *id4)
 			/* Legacy AMD fallback */
 			amd_cpuid4(index, &eax, &ebx, &ecx);
 		}
-		amd_init_l3_cache(id4, index);
+
+		if (nb)
+			*nb = amd_init_l3_cache(index);
 	} else {
 		/* Intel */
 		cpuid_count(4, index, &eax.full, &ebx.full, &ecx.full, &edx);
@@ -758,7 +770,7 @@ void init_intel_cacheinfo(struct cpuinfo_x86 *c)
 			struct _cpuid4_info_regs id4 = {};
 			int retval;
 
-			retval = cpuid4_cache_lookup_regs(i, &id4);
+			retval = cpuid4_cache_lookup_regs(i, &id4, NULL);
 			if (retval < 0)
 				continue;
 
@@ -932,8 +944,8 @@ static void __cache_cpumap_setup(unsigned int cpu, int index,
 		}
 }
 
-static void ci_info_init(struct cacheinfo *ci,
-			 const struct _cpuid4_info_regs *id4)
+static void ci_info_init(struct cacheinfo *ci, const struct _cpuid4_info_regs *id4,
+			 struct amd_northbridge *nb)
 {
 	ci->id				= id4->id;
 	ci->attributes			= CACHE_ID;
@@ -944,7 +956,7 @@ static void ci_info_init(struct cacheinfo *ci,
 	ci->size			= id4->size;
 	ci->number_of_sets		= id4->ecx.split.number_of_sets + 1;
 	ci->physical_line_partition	= id4->ebx.split.physical_line_partition + 1;
-	ci->priv			= id4->nb;
+	ci->priv			= nb;
 }
 
 int init_cache_level(unsigned int cpu)
@@ -980,13 +992,14 @@ int populate_cache_leaves(unsigned int cpu)
 	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
 	struct cacheinfo *ci = this_cpu_ci->info_list;
 	struct _cpuid4_info_regs id4 = {};
+	struct amd_northbridge *nb = NULL;
 
 	for (idx = 0; idx < this_cpu_ci->num_leaves; idx++) {
-		ret = cpuid4_cache_lookup_regs(idx, &id4);
+		ret = cpuid4_cache_lookup_regs(idx, &id4, &nb);
 		if (ret)
 			return ret;
 		get_cache_id(cpu, &id4);
-		ci_info_init(ci++, &id4);
+		ci_info_init(ci++, &id4, nb);
 		__cache_cpumap_setup(cpu, idx, &id4);
 	}
 	this_cpu_ci->cpu_map_populated = true;
-- 
2.48.1



* [PATCH v1 23/40] x86/cacheinfo: Move AMD cache_disable_0/1 handling to separate file
  2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
                   ` (21 preceding siblings ...)
  2025-03-04  8:51 ` [PATCH v1 22/40] x86/cacheinfo: Separate amd_northbridge from _cpuid4_info_regs Ahmed S. Darwish
@ 2025-03-04  8:51 ` Ahmed S. Darwish
  2025-03-04  8:51 ` [PATCH v1 24/40] x86/cacheinfo: Use sysfs_emit() for sysfs attributes show() Ahmed S. Darwish
                   ` (18 subsequent siblings)
  41 siblings, 0 replies; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-04  8:51 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar, Dave Hansen
  Cc: Thomas Gleixner, John Ogness, H. Peter Anvin, Andrew Cooper, x86,
	x86-cpuid, LKML, Ahmed S. Darwish

The parent commit decoupled amd_northbridge from _cpuid4_info_regs, where
it was merely "parked" until ci_info_init() could store it in the priv
pointer of the linux/cacheinfo.h API.

Given that decoupling, move the AMD-specific L3 cache_disable_0/1 sysfs
code from the generic (and already extremely convoluted) x86/cacheinfo
code into its own file.

Compile the file only if CONFIG_AMD_NB and CONFIG_SYSFS are both
enabled, which mirrors the existing logic.
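
One common way for callers to cope with a conditionally compiled file is
a declaration paired with an inline stub.  The sketch below shows that
pattern in isolation.  The DEMO_CONFIG_* macros and demo_* names are
hypothetical, and whether the cpu.h hunk of this patch has exactly this
shape is an assumption:

```c
#include <stddef.h>

/*
 * Hypothetical stand-ins for the two Kconfig symbols.  Define both to
 * select the out-of-line implementation.
 */
/* #define DEMO_CONFIG_AMD_NB */
/* #define DEMO_CONFIG_SYSFS */

struct demo_northbridge { int node; };

#if defined(DEMO_CONFIG_AMD_NB) && defined(DEMO_CONFIG_SYSFS)
/* Provided by the separately compiled file. */
struct demo_northbridge *demo_init_l3_cache(int index);
#else
/* Stub so that callers need no #ifdef of their own. */
static inline struct demo_northbridge *demo_init_l3_cache(int index)
{
	(void)index;
	return NULL;
}
#endif
```

With neither macro defined, the stub is used and callers simply see a
NULL northbridge.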

Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
---
 arch/x86/kernel/cpu/Makefile            |   3 +
 arch/x86/kernel/cpu/amd_cache_disable.c | 301 ++++++++++++++++++++++++
 arch/x86/kernel/cpu/cacheinfo.c         | 298 -----------------------
 arch/x86/kernel/cpu/cpu.h               |   9 +
 4 files changed, 313 insertions(+), 298 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/amd_cache_disable.c

diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 4efdf5c2efc8..3a39396d422d 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -38,6 +38,9 @@ obj-y					+= intel.o tsx.o
 obj-$(CONFIG_PM)			+= intel_epb.o
 endif
 obj-$(CONFIG_CPU_SUP_AMD)		+= amd.o
+ifeq ($(CONFIG_AMD_NB)$(CONFIG_SYSFS),yy)
+obj-y					+= amd_cache_disable.o
+endif
 obj-$(CONFIG_CPU_SUP_HYGON)		+= hygon.o
 obj-$(CONFIG_CPU_SUP_CYRIX_32)		+= cyrix.o
 obj-$(CONFIG_CPU_SUP_CENTAUR)		+= centaur.o
diff --git a/arch/x86/kernel/cpu/amd_cache_disable.c b/arch/x86/kernel/cpu/amd_cache_disable.c
new file mode 100644
index 000000000000..6d53aee0d869
--- /dev/null
+++ b/arch/x86/kernel/cpu/amd_cache_disable.c
@@ -0,0 +1,301 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * AMD L3 cache_disable_{0,1} sysfs handling
+ * Documentation/ABI/testing/sysfs-devices-system-cpu
+ */
+
+#include <linux/cacheinfo.h>
+#include <linux/capability.h>
+#include <linux/pci.h>
+#include <linux/sysfs.h>
+
+#include <asm/amd_nb.h>
+
+#include "cpu.h"
+
+/*
+ * L3 cache descriptors
+ */
+static void amd_calc_l3_indices(struct amd_northbridge *nb)
+{
+	struct amd_l3_cache *l3 = &nb->l3_cache;
+	unsigned int sc0, sc1, sc2, sc3;
+	u32 val = 0;
+
+	pci_read_config_dword(nb->misc, 0x1C4, &val);
+
+	/* calculate subcache sizes */
+	l3->subcaches[0] = sc0 = !(val & BIT(0));
+	l3->subcaches[1] = sc1 = !(val & BIT(4));
+
+	if (boot_cpu_data.x86 == 0x15) {
+		l3->subcaches[0] = sc0 += !(val & BIT(1));
+		l3->subcaches[1] = sc1 += !(val & BIT(5));
+	}
+
+	l3->subcaches[2] = sc2 = !(val & BIT(8))  + !(val & BIT(9));
+	l3->subcaches[3] = sc3 = !(val & BIT(12)) + !(val & BIT(13));
+
+	l3->indices = (max(max3(sc0, sc1, sc2), sc3) << 10) - 1;
+}
+
+/*
+ * check whether a slot used for disabling an L3 index is occupied.
+ * @nb: northbridge descriptor holding the L3 cache state
+ * @slot: slot number (0..1)
+ *
+ * @returns: the disabled index if used or negative value if slot free.
+ */
+static int amd_get_l3_disable_slot(struct amd_northbridge *nb, unsigned int slot)
+{
+	unsigned int reg = 0;
+
+	pci_read_config_dword(nb->misc, 0x1BC + slot * 4, &reg);
+
+	/* check whether this slot is activated already */
+	if (reg & (3UL << 30))
+		return reg & 0xfff;
+
+	return -1;
+}
+
+static ssize_t show_cache_disable(struct cacheinfo *ci, char *buf, unsigned int slot)
+{
+	int index;
+	struct amd_northbridge *nb = ci->priv;
+
+	index = amd_get_l3_disable_slot(nb, slot);
+	if (index >= 0)
+		return sprintf(buf, "%d\n", index);
+
+	return sprintf(buf, "FREE\n");
+}
+
+#define SHOW_CACHE_DISABLE(slot)					\
+static ssize_t								\
+cache_disable_##slot##_show(struct device *dev,				\
+			    struct device_attribute *attr, char *buf)	\
+{									\
+	struct cacheinfo *ci = dev_get_drvdata(dev);			\
+	return show_cache_disable(ci, buf, slot);			\
+}
+
+SHOW_CACHE_DISABLE(0)
+SHOW_CACHE_DISABLE(1)
+
+static void amd_l3_disable_index(struct amd_northbridge *nb, int cpu,
+				 unsigned int slot, unsigned long idx)
+{
+	int i;
+
+	idx |= BIT(30);
+
+	/*
+	 *  disable index in all 4 subcaches
+	 */
+	for (i = 0; i < 4; i++) {
+		u32 reg = idx | (i << 20);
+
+		if (!nb->l3_cache.subcaches[i])
+			continue;
+
+		pci_write_config_dword(nb->misc, 0x1BC + slot * 4, reg);
+
+		/*
+		 * We need to WBINVD on a core on the node containing the L3
+		 * cache which indices we disable therefore a simple wbinvd()
+		 * is not sufficient.
+		 */
+		wbinvd_on_cpu(cpu);
+
+		reg |= BIT(31);
+		pci_write_config_dword(nb->misc, 0x1BC + slot * 4, reg);
+	}
+}
+
+/*
+ * disable a L3 cache index by using a disable-slot
+ *
+ * @nb:    northbridge descriptor holding the L3 cache state
+ * @cpu:   A CPU on the node containing the L3 cache
+ * @slot:  slot number (0..1)
+ * @index: index to disable
+ *
+ * @return: 0 on success, error status on failure
+ */
+static int amd_set_l3_disable_slot(struct amd_northbridge *nb, int cpu,
+				   unsigned int slot, unsigned long index)
+{
+	int ret = 0;
+
+	/*  check if @slot is already used or the index is already disabled */
+	ret = amd_get_l3_disable_slot(nb, slot);
+	if (ret >= 0)
+		return -EEXIST;
+
+	if (index > nb->l3_cache.indices)
+		return -EINVAL;
+
+	/* check whether the other slot has disabled the same index already */
+	if (index == amd_get_l3_disable_slot(nb, !slot))
+		return -EEXIST;
+
+	amd_l3_disable_index(nb, cpu, slot, index);
+
+	return 0;
+}
+
+static ssize_t store_cache_disable(struct cacheinfo *ci, const char *buf,
+				   size_t count, unsigned int slot)
+{
+	struct amd_northbridge *nb = ci->priv;
+	unsigned long val = 0;
+	int cpu, err = 0;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	cpu = cpumask_first(&ci->shared_cpu_map);
+
+	if (kstrtoul(buf, 10, &val) < 0)
+		return -EINVAL;
+
+	err = amd_set_l3_disable_slot(nb, cpu, slot, val);
+	if (err) {
+		if (err == -EEXIST)
+			pr_warn("L3 slot %d in use/index already disabled!\n",
+				   slot);
+		return err;
+	}
+	return count;
+}
+
+#define STORE_CACHE_DISABLE(slot)					\
+static ssize_t								\
+cache_disable_##slot##_store(struct device *dev,			\
+			     struct device_attribute *attr,		\
+			     const char *buf, size_t count)		\
+{									\
+	struct cacheinfo *ci = dev_get_drvdata(dev);			\
+	return store_cache_disable(ci, buf, count, slot);		\
+}
+
+STORE_CACHE_DISABLE(0)
+STORE_CACHE_DISABLE(1)
+
+static ssize_t subcaches_show(struct device *dev, struct device_attribute *attr,
+			      char *buf)
+{
+	struct cacheinfo *ci = dev_get_drvdata(dev);
+	int cpu = cpumask_first(&ci->shared_cpu_map);
+
+	return sprintf(buf, "%x\n", amd_get_subcaches(cpu));
+}
+
+static ssize_t subcaches_store(struct device *dev,
+			       struct device_attribute *attr,
+			       const char *buf, size_t count)
+{
+	struct cacheinfo *ci = dev_get_drvdata(dev);
+	int cpu = cpumask_first(&ci->shared_cpu_map);
+	unsigned long val;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	if (kstrtoul(buf, 16, &val) < 0)
+		return -EINVAL;
+
+	if (amd_set_subcaches(cpu, val))
+		return -EINVAL;
+
+	return count;
+}
+
+static DEVICE_ATTR_RW(cache_disable_0);
+static DEVICE_ATTR_RW(cache_disable_1);
+static DEVICE_ATTR_RW(subcaches);
+
+static umode_t cache_private_attrs_is_visible(struct kobject *kobj,
+					      struct attribute *attr, int unused)
+{
+	struct device *dev = kobj_to_dev(kobj);
+	struct cacheinfo *ci = dev_get_drvdata(dev);
+	umode_t mode = attr->mode;
+
+	if (!ci->priv)
+		return 0;
+
+	if ((attr == &dev_attr_subcaches.attr) &&
+	    amd_nb_has_feature(AMD_NB_L3_PARTITIONING))
+		return mode;
+
+	if ((attr == &dev_attr_cache_disable_0.attr ||
+	     attr == &dev_attr_cache_disable_1.attr) &&
+	    amd_nb_has_feature(AMD_NB_L3_INDEX_DISABLE))
+		return mode;
+
+	return 0;
+}
+
+static struct attribute_group cache_private_group = {
+	.is_visible = cache_private_attrs_is_visible,
+};
+
+static void init_amd_l3_attrs(void)
+{
+	static struct attribute **amd_l3_attrs;
+	int n = 1;
+
+	if (amd_l3_attrs) /* already initialized */
+		return;
+
+	if (amd_nb_has_feature(AMD_NB_L3_INDEX_DISABLE))
+		n += 2;
+	if (amd_nb_has_feature(AMD_NB_L3_PARTITIONING))
+		n += 1;
+
+	amd_l3_attrs = kcalloc(n, sizeof(*amd_l3_attrs), GFP_KERNEL);
+	if (!amd_l3_attrs)
+		return;
+
+	n = 0;
+	if (amd_nb_has_feature(AMD_NB_L3_INDEX_DISABLE)) {
+		amd_l3_attrs[n++] = &dev_attr_cache_disable_0.attr;
+		amd_l3_attrs[n++] = &dev_attr_cache_disable_1.attr;
+	}
+	if (amd_nb_has_feature(AMD_NB_L3_PARTITIONING))
+		amd_l3_attrs[n++] = &dev_attr_subcaches.attr;
+
+	cache_private_group.attrs = amd_l3_attrs;
+}
+
+const struct attribute_group *cache_get_priv_group(struct cacheinfo *ci)
+{
+	struct amd_northbridge *nb = ci->priv;
+
+	if (ci->level < 3 || !nb)
+		return NULL;
+
+	if (nb && nb->l3_cache.indices)
+		init_amd_l3_attrs();
+
+	return &cache_private_group;
+}
+
+struct amd_northbridge *amd_init_l3_cache(int index)
+{
+	struct amd_northbridge *nb;
+	int node;
+
+	/* only for L3, and not in virtualized environments */
+	if (index < 3)
+		return NULL;
+
+	node = topology_amd_node_id(smp_processor_id());
+	nb = node_to_amd_nb(node);
+	if (nb && !nb->l3_cache.indices)
+		amd_calc_l3_indices(nb);
+
+	return nb;
+}
diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index 2a56c7cc3c2d..eadd117c05e1 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -9,12 +9,9 @@
  */
 
 #include <linux/cacheinfo.h>
-#include <linux/capability.h>
 #include <linux/cpu.h>
 #include <linux/cpuhotplug.h>
-#include <linux/pci.h>
 #include <linux/stop_machine.h>
-#include <linux/sysfs.h>
 
 #include <asm/amd_nb.h>
 #include <asm/cacheinfo.h>
@@ -300,301 +297,6 @@ amd_cpuid4(int index, union _cpuid4_leaf_eax *eax,
 		(ebx->split.ways_of_associativity + 1) - 1;
 }
 
-#if defined(CONFIG_AMD_NB) && defined(CONFIG_SYSFS)
-
-/*
- * L3 cache descriptors
- */
-static void amd_calc_l3_indices(struct amd_northbridge *nb)
-{
-	struct amd_l3_cache *l3 = &nb->l3_cache;
-	unsigned int sc0, sc1, sc2, sc3;
-	u32 val = 0;
-
-	pci_read_config_dword(nb->misc, 0x1C4, &val);
-
-	/* calculate subcache sizes */
-	l3->subcaches[0] = sc0 = !(val & BIT(0));
-	l3->subcaches[1] = sc1 = !(val & BIT(4));
-
-	if (boot_cpu_data.x86 == 0x15) {
-		l3->subcaches[0] = sc0 += !(val & BIT(1));
-		l3->subcaches[1] = sc1 += !(val & BIT(5));
-	}
-
-	l3->subcaches[2] = sc2 = !(val & BIT(8))  + !(val & BIT(9));
-	l3->subcaches[3] = sc3 = !(val & BIT(12)) + !(val & BIT(13));
-
-	l3->indices = (max(max3(sc0, sc1, sc2), sc3) << 10) - 1;
-}
-
-/*
- * check whether a slot used for disabling an L3 index is occupied.
- * @l3: L3 cache descriptor
- * @slot: slot number (0..1)
- *
- * @returns: the disabled index if used or negative value if slot free.
- */
-static int amd_get_l3_disable_slot(struct amd_northbridge *nb, unsigned slot)
-{
-	unsigned int reg = 0;
-
-	pci_read_config_dword(nb->misc, 0x1BC + slot * 4, &reg);
-
-	/* check whether this slot is activated already */
-	if (reg & (3UL << 30))
-		return reg & 0xfff;
-
-	return -1;
-}
-
-static ssize_t show_cache_disable(struct cacheinfo *ci, char *buf, unsigned int slot)
-{
-	int index;
-	struct amd_northbridge *nb = ci->priv;
-
-	index = amd_get_l3_disable_slot(nb, slot);
-	if (index >= 0)
-		return sprintf(buf, "%d\n", index);
-
-	return sprintf(buf, "FREE\n");
-}
-
-#define SHOW_CACHE_DISABLE(slot)					\
-static ssize_t								\
-cache_disable_##slot##_show(struct device *dev,				\
-			    struct device_attribute *attr, char *buf)	\
-{									\
-	struct cacheinfo *ci = dev_get_drvdata(dev);			\
-	return show_cache_disable(ci, buf, slot);			\
-}
-SHOW_CACHE_DISABLE(0)
-SHOW_CACHE_DISABLE(1)
-
-static void amd_l3_disable_index(struct amd_northbridge *nb, int cpu,
-				 unsigned slot, unsigned long idx)
-{
-	int i;
-
-	idx |= BIT(30);
-
-	/*
-	 *  disable index in all 4 subcaches
-	 */
-	for (i = 0; i < 4; i++) {
-		u32 reg = idx | (i << 20);
-
-		if (!nb->l3_cache.subcaches[i])
-			continue;
-
-		pci_write_config_dword(nb->misc, 0x1BC + slot * 4, reg);
-
-		/*
-		 * We need to WBINVD on a core on the node containing the L3
-		 * cache which indices we disable therefore a simple wbinvd()
-		 * is not sufficient.
-		 */
-		wbinvd_on_cpu(cpu);
-
-		reg |= BIT(31);
-		pci_write_config_dword(nb->misc, 0x1BC + slot * 4, reg);
-	}
-}
-
-/*
- * disable a L3 cache index by using a disable-slot
- *
- * @l3:    L3 cache descriptor
- * @cpu:   A CPU on the node containing the L3 cache
- * @slot:  slot number (0..1)
- * @index: index to disable
- *
- * @return: 0 on success, error status on failure
- */
-static int amd_set_l3_disable_slot(struct amd_northbridge *nb, int cpu,
-			    unsigned slot, unsigned long index)
-{
-	int ret = 0;
-
-	/*  check if @slot is already used or the index is already disabled */
-	ret = amd_get_l3_disable_slot(nb, slot);
-	if (ret >= 0)
-		return -EEXIST;
-
-	if (index > nb->l3_cache.indices)
-		return -EINVAL;
-
-	/* check whether the other slot has disabled the same index already */
-	if (index == amd_get_l3_disable_slot(nb, !slot))
-		return -EEXIST;
-
-	amd_l3_disable_index(nb, cpu, slot, index);
-
-	return 0;
-}
-
-static ssize_t store_cache_disable(struct cacheinfo *ci, const char *buf,
-				   size_t count, unsigned int slot)
-{
-	unsigned long val = 0;
-	int cpu, err = 0;
-	struct amd_northbridge *nb = ci->priv;
-
-	if (!capable(CAP_SYS_ADMIN))
-		return -EPERM;
-
-	cpu = cpumask_first(&ci->shared_cpu_map);
-
-	if (kstrtoul(buf, 10, &val) < 0)
-		return -EINVAL;
-
-	err = amd_set_l3_disable_slot(nb, cpu, slot, val);
-	if (err) {
-		if (err == -EEXIST)
-			pr_warn("L3 slot %d in use/index already disabled!\n",
-				   slot);
-		return err;
-	}
-	return count;
-}
-
-#define STORE_CACHE_DISABLE(slot)					\
-static ssize_t								\
-cache_disable_##slot##_store(struct device *dev,			\
-			     struct device_attribute *attr,		\
-			     const char *buf, size_t count)		\
-{									\
-	struct cacheinfo *ci = dev_get_drvdata(dev);			\
-	return store_cache_disable(ci, buf, count, slot);		\
-}
-STORE_CACHE_DISABLE(0)
-STORE_CACHE_DISABLE(1)
-
-static ssize_t subcaches_show(struct device *dev,
-			      struct device_attribute *attr, char *buf)
-{
-	struct cacheinfo *ci = dev_get_drvdata(dev);
-	int cpu = cpumask_first(&ci->shared_cpu_map);
-
-	return sprintf(buf, "%x\n", amd_get_subcaches(cpu));
-}
-
-static ssize_t subcaches_store(struct device *dev,
-			       struct device_attribute *attr,
-			       const char *buf, size_t count)
-{
-	struct cacheinfo *ci = dev_get_drvdata(dev);
-	int cpu = cpumask_first(&ci->shared_cpu_map);
-	unsigned long val;
-
-	if (!capable(CAP_SYS_ADMIN))
-		return -EPERM;
-
-	if (kstrtoul(buf, 16, &val) < 0)
-		return -EINVAL;
-
-	if (amd_set_subcaches(cpu, val))
-		return -EINVAL;
-
-	return count;
-}
-
-static DEVICE_ATTR_RW(cache_disable_0);
-static DEVICE_ATTR_RW(cache_disable_1);
-static DEVICE_ATTR_RW(subcaches);
-
-static umode_t
-cache_private_attrs_is_visible(struct kobject *kobj,
-			       struct attribute *attr, int unused)
-{
-	struct device *dev = kobj_to_dev(kobj);
-	struct cacheinfo *ci = dev_get_drvdata(dev);
-	umode_t mode = attr->mode;
-
-	if (!ci->priv)
-		return 0;
-
-	if ((attr == &dev_attr_subcaches.attr) &&
-	    amd_nb_has_feature(AMD_NB_L3_PARTITIONING))
-		return mode;
-
-	if ((attr == &dev_attr_cache_disable_0.attr ||
-	     attr == &dev_attr_cache_disable_1.attr) &&
-	    amd_nb_has_feature(AMD_NB_L3_INDEX_DISABLE))
-		return mode;
-
-	return 0;
-}
-
-static struct attribute_group cache_private_group = {
-	.is_visible = cache_private_attrs_is_visible,
-};
-
-static void init_amd_l3_attrs(void)
-{
-	int n = 1;
-	static struct attribute **amd_l3_attrs;
-
-	if (amd_l3_attrs) /* already initialized */
-		return;
-
-	if (amd_nb_has_feature(AMD_NB_L3_INDEX_DISABLE))
-		n += 2;
-	if (amd_nb_has_feature(AMD_NB_L3_PARTITIONING))
-		n += 1;
-
-	amd_l3_attrs = kcalloc(n, sizeof(*amd_l3_attrs), GFP_KERNEL);
-	if (!amd_l3_attrs)
-		return;
-
-	n = 0;
-	if (amd_nb_has_feature(AMD_NB_L3_INDEX_DISABLE)) {
-		amd_l3_attrs[n++] = &dev_attr_cache_disable_0.attr;
-		amd_l3_attrs[n++] = &dev_attr_cache_disable_1.attr;
-	}
-	if (amd_nb_has_feature(AMD_NB_L3_PARTITIONING))
-		amd_l3_attrs[n++] = &dev_attr_subcaches.attr;
-
-	cache_private_group.attrs = amd_l3_attrs;
-}
-
-const struct attribute_group *
-cache_get_priv_group(struct cacheinfo *ci)
-{
-	struct amd_northbridge *nb = ci->priv;
-
-	if (ci->level < 3 || !nb)
-		return NULL;
-
-	if (nb && nb->l3_cache.indices)
-		init_amd_l3_attrs();
-
-	return &cache_private_group;
-}
-
-static struct amd_northbridge *amd_init_l3_cache(int index)
-{
-	struct amd_northbridge *nb;
-	int node;
-
-	/* only for L3, and not in virtualized environments */
-	if (index < 3)
-		return NULL;
-
-	node = topology_amd_node_id(smp_processor_id());
-	nb = node_to_amd_nb(node);
-	if (nb && !nb->l3_cache.indices)
-		amd_calc_l3_indices(nb);
-
-	return nb;
-}
-#else
-static struct amd_northbridge *amd_init_l3_cache(int index)
-{
-	return NULL;
-}
-#endif  /* CONFIG_AMD_NB && CONFIG_SYSFS */
-
 /*
  * Fill passed _cpuid4_info_regs structure.
  * Intel-only code paths should pass NULL for the amd_northbridge
diff --git a/arch/x86/kernel/cpu/cpu.h b/arch/x86/kernel/cpu/cpu.h
index 51deb60a9d26..bc38b2d56f26 100644
--- a/arch/x86/kernel/cpu/cpu.h
+++ b/arch/x86/kernel/cpu/cpu.h
@@ -75,6 +75,15 @@ extern void check_null_seg_clears_base(struct cpuinfo_x86 *c);
 void cacheinfo_amd_init_llc_id(struct cpuinfo_x86 *c, u16 die_id);
 void cacheinfo_hygon_init_llc_id(struct cpuinfo_x86 *c);
 
+#if defined(CONFIG_AMD_NB) && defined(CONFIG_SYSFS)
+struct amd_northbridge *amd_init_l3_cache(int index);
+#else
+static inline struct amd_northbridge *amd_init_l3_cache(int index)
+{
+	return NULL;
+}
+#endif
+
 unsigned int aperfmperf_get_khz(int cpu);
 void cpu_select_mitigations(void);
 
-- 
2.48.1



* [PATCH v1 24/40] x86/cacheinfo: Use sysfs_emit() for sysfs attributes show()
  2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
                   ` (22 preceding siblings ...)
  2025-03-04  8:51 ` [PATCH v1 23/40] x86/cacheinfo: Move AMD cache_disable_0/1 handling to separate file Ahmed S. Darwish
@ 2025-03-04  8:51 ` Ahmed S. Darwish
  2025-03-04  8:51 ` [PATCH v1 25/40] x86/cacheinfo: Separate Intel and AMD leaf 0x4 code paths Ahmed S. Darwish
                   ` (17 subsequent siblings)
  41 siblings, 0 replies; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-04  8:51 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar, Dave Hansen
  Cc: Thomas Gleixner, John Ogness, H. Peter Anvin, Andrew Cooper, x86,
	x86-cpuid, LKML, Ahmed S. Darwish

Per Documentation/filesystems/sysfs.rst, a sysfs attribute's show()
method should only use sysfs_emit() or sysfs_emit_at() when returning
values to user space.

Use sysfs_emit() for the AMD L3 cache sysfs attributes cache_disable_0,
cache_disable_1, and subcaches.
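
As a rough user-space illustration of the difference (not kernel code):
sysfs_emit() bounds its output to the PAGE_SIZE buffer that sysfs passes
to show(), while sprintf() applies no bound at all.  The sketch_emit()
helper, SKETCH_PAGE_SIZE, and sketch_show_cache_disable() below are
hypothetical stand-ins invented for this example; they only mimic the
bounded-write behaviour and are not the kernel implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Assumption: a 4 KiB page, standing in for the kernel's PAGE_SIZE */
#define SKETCH_PAGE_SIZE 4096

/*
 * Hypothetical stand-in for sysfs_emit(): format into a page-sized
 * buffer and return the number of characters actually stored,
 * excluding the terminating NUL.
 */
static int sketch_emit(char *buf, const char *fmt, ...)
{
	va_list args;
	int len;

	assert(buf != NULL);

	va_start(args, fmt);
	len = vsnprintf(buf, SKETCH_PAGE_SIZE, fmt, args);
	va_end(args);

	if (len < 0)
		return 0;
	/* vsnprintf() reports the untruncated length; clamp it */
	if (len >= SKETCH_PAGE_SIZE)
		len = SKETCH_PAGE_SIZE - 1;

	return len;
}

/* Same shape as show_cache_disable(): print the index, or "FREE" */
static int sketch_show_cache_disable(char *buf, int index)
{
	if (index >= 0)
		return sketch_emit(buf, "%d\n", index);

	return sketch_emit(buf, "FREE\n");
}
```

The return value is the byte count that a show() method would hand back
to sysfs, so the call sites keep their "return <emit call>" shape.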

Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
---
 arch/x86/kernel/cpu/amd_cache_disable.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/amd_cache_disable.c b/arch/x86/kernel/cpu/amd_cache_disable.c
index 6d53aee0d869..d860ad3f8a5a 100644
--- a/arch/x86/kernel/cpu/amd_cache_disable.c
+++ b/arch/x86/kernel/cpu/amd_cache_disable.c
@@ -66,9 +66,9 @@ static ssize_t show_cache_disable(struct cacheinfo *ci, char *buf, unsigned int
 
 	index = amd_get_l3_disable_slot(nb, slot);
 	if (index >= 0)
-		return sprintf(buf, "%d\n", index);
+		return sysfs_emit(buf, "%d\n", index);
 
-	return sprintf(buf, "FREE\n");
+	return sysfs_emit(buf, "FREE\n");
 }
 
 #define SHOW_CACHE_DISABLE(slot)					\
@@ -189,7 +189,7 @@ static ssize_t subcaches_show(struct device *dev, struct device_attribute *attr,
 	struct cacheinfo *ci = dev_get_drvdata(dev);
 	int cpu = cpumask_first(&ci->shared_cpu_map);
 
-	return sprintf(buf, "%x\n", amd_get_subcaches(cpu));
+	return sysfs_emit(buf, "%x\n", amd_get_subcaches(cpu));
 }
 
 static ssize_t subcaches_store(struct device *dev,
-- 
2.48.1



* [PATCH v1 25/40] x86/cacheinfo: Separate Intel and AMD leaf 0x4 code paths
  2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
                   ` (23 preceding siblings ...)
  2025-03-04  8:51 ` [PATCH v1 24/40] x86/cacheinfo: Use sysfs_emit() for sysfs attributes show() Ahmed S. Darwish
@ 2025-03-04  8:51 ` Ahmed S. Darwish
  2025-03-04  8:51 ` [PATCH v1 26/40] x86/cacheinfo: Rename _cpuid4_info_regs to _cpuid4_info Ahmed S. Darwish
                   ` (16 subsequent siblings)
  41 siblings, 0 replies; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-04  8:51 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar, Dave Hansen
  Cc: Thomas Gleixner, John Ogness, H. Peter Anvin, Andrew Cooper, x86,
	x86-cpuid, LKML, Ahmed S. Darwish

The leaf 0x4 parsing code at cpuid4_cache_lookup_regs() is ugly and
convoluted.  It's tangled with multiple nested conditions to handle:

  - AMD with TOPOEXT, or Hygon CPUs via leaf 0x8000001d
  - Legacy AMD fallback via leaf 0x4 emulation
  - Intel CPUs via the actual cpuid leaf 0x4

AMD L3 northbridge initialization is also awkwardly placed alongside the
cpuid calls of the first two scenarios.

Refactor all of that as follows:

  - Update AMD's leaf 0x4 emulation comment to represent current state.
  - Clearly label the AMD leaf 0x4 emulation function as a fallback.
  - Split AMD/Hygon and Intel code paths into separate functions.
  - Move AMD L3 northbridge initialization out of leaf 0x4 code, and
    into populate_cache_leaves() where it belongs.  There,
    ci_info_init() can directly store the initialized object in the
    priv pointer of the linux/cacheinfo.h API.

Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
---
 arch/x86/kernel/cpu/cacheinfo.c | 93 ++++++++++++++++++---------------
 1 file changed, 51 insertions(+), 42 deletions(-)

diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index eadd117c05e1..cc320817cfc3 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -167,12 +167,11 @@ struct _cpuid4_info_regs {
 	unsigned long size;
 };
 
-/* AMD doesn't have CPUID4. Emulate it here to report the same
-   information to the user.  This makes some assumptions about the machine:
-   L2 not shared, no SMT etc. that is currently true on AMD CPUs.
+/*
+ * Fallback AMD CPUID(4) emulation
+ * AMD CPUs with TOPOEXT can just use CPUID(0x8000001d)
+ */
 
-   In theory the TLBs could be reported as fake type (they are in "dummy").
-   Maybe later */
 union l1_cache {
 	struct {
 		unsigned line_size:8;
@@ -228,9 +227,8 @@ static const enum cache_type cache_type_map[] = {
 	[CTYPE_UNIFIED] = CACHE_TYPE_UNIFIED,
 };
 
-static void
-amd_cpuid4(int index, union _cpuid4_leaf_eax *eax,
-	   union _cpuid4_leaf_ebx *ebx, union _cpuid4_leaf_ecx *ecx)
+static void legacy_amd_cpuid4(int index, union _cpuid4_leaf_eax *eax,
+			      union _cpuid4_leaf_ebx *ebx, union _cpuid4_leaf_ecx *ecx)
 {
 	unsigned int dummy, line_size, lines_per_tag, assoc, size_in_kb;
 	union l1_cache l1i, l1d;
@@ -297,37 +295,9 @@ amd_cpuid4(int index, union _cpuid4_leaf_eax *eax,
 		(ebx->split.ways_of_associativity + 1) - 1;
 }
 
-/*
- * Fill passed _cpuid4_info_regs structure.
- * Intel-only code paths should pass NULL for the amd_northbridge
- * return pointer.
- */
-static int cpuid4_cache_lookup_regs(int index, struct _cpuid4_info_regs *id4,
-				    struct amd_northbridge **nb)
+static int cpuid4_info_fill_done(struct _cpuid4_info_regs *id4, union _cpuid4_leaf_eax eax,
+				 union _cpuid4_leaf_ebx ebx, union _cpuid4_leaf_ecx ecx)
 {
-	union _cpuid4_leaf_eax	eax;
-	union _cpuid4_leaf_ebx	ebx;
-	union _cpuid4_leaf_ecx	ecx;
-	unsigned		edx;
-
-	if (x86_vendor_amd_or_hygon(boot_cpu_data.x86_vendor)) {
-		if (boot_cpu_has(X86_FEATURE_TOPOEXT) ||
-		    boot_cpu_data.x86_vendor == X86_VENDOR_HYGON) {
-			/* AMD with TOPOEXT, or HYGON */
-			cpuid_count(0x8000001d, index, &eax.full,
-				    &ebx.full, &ecx.full, &edx);
-		} else {
-			/* Legacy AMD fallback */
-			amd_cpuid4(index, &eax, &ebx, &ecx);
-		}
-
-		if (nb)
-			*nb = amd_init_l3_cache(index);
-	} else {
-		/* Intel */
-		cpuid_count(4, index, &eax.full, &ebx.full, &ecx.full, &edx);
-	}
-
 	if (eax.split.type == CTYPE_NULL)
 		return -EIO;
 
@@ -342,6 +312,40 @@ static int cpuid4_cache_lookup_regs(int index, struct _cpuid4_info_regs *id4,
 	return 0;
 }
 
+static int amd_fill_cpuid4_info(int index, struct _cpuid4_info_regs *id4)
+{
+	union _cpuid4_leaf_eax eax;
+	union _cpuid4_leaf_ebx ebx;
+	union _cpuid4_leaf_ecx ecx;
+	unsigned int ignored;
+
+	if (boot_cpu_has(X86_FEATURE_TOPOEXT) || boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
+		cpuid_count(0x8000001d, index, &eax.full, &ebx.full, &ecx.full, &ignored);
+	else
+		legacy_amd_cpuid4(index, &eax, &ebx, &ecx);
+
+	return cpuid4_info_fill_done(id4, eax, ebx, ecx);
+}
+
+static int intel_fill_cpuid4_info(int index, struct _cpuid4_info_regs *id4)
+{
+	union _cpuid4_leaf_eax eax;
+	union _cpuid4_leaf_ebx ebx;
+	union _cpuid4_leaf_ecx ecx;
+	unsigned int ignored;
+
+	cpuid_count(4, index, &eax.full, &ebx.full, &ecx.full, &ignored);
+
+	return cpuid4_info_fill_done(id4, eax, ebx, ecx);
+}
+
+static int fill_cpuid4_info(int index, struct _cpuid4_info_regs *id4)
+{
+	return x86_vendor_amd_or_hygon(boot_cpu_data.x86_vendor) ?
+		amd_fill_cpuid4_info(index, id4) :
+		intel_fill_cpuid4_info(index, id4);
+}
+
 static int find_num_cache_leaves(struct cpuinfo_x86 *c)
 {
 	unsigned int		eax, ebx, ecx, edx, op;
@@ -472,7 +476,7 @@ void init_intel_cacheinfo(struct cpuinfo_x86 *c)
 			struct _cpuid4_info_regs id4 = {};
 			int retval;
 
-			retval = cpuid4_cache_lookup_regs(i, &id4, NULL);
+			retval = intel_fill_cpuid4_info(i, &id4);
 			if (retval < 0)
 				continue;
 
@@ -690,17 +694,22 @@ static void get_cache_id(int cpu, struct _cpuid4_info_regs *id4)
 
 int populate_cache_leaves(unsigned int cpu)
 {
-	unsigned int idx, ret;
 	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
 	struct cacheinfo *ci = this_cpu_ci->info_list;
 	struct _cpuid4_info_regs id4 = {};
-	struct amd_northbridge *nb;
+	struct amd_northbridge *nb = NULL;
+	int idx, ret;
 
 	for (idx = 0; idx < this_cpu_ci->num_leaves; idx++) {
-		ret = cpuid4_cache_lookup_regs(idx, &id4, &nb);
+		ret = fill_cpuid4_info(idx, &id4);
 		if (ret)
 			return ret;
+
 		get_cache_id(cpu, &id4);
+
+		if (x86_vendor_amd_or_hygon(boot_cpu_data.x86_vendor))
+			nb = amd_init_l3_cache(idx);
+
 		ci_info_init(ci++, &id4, nb);
 		__cache_cpumap_setup(cpu, idx, &id4);
 	}
-- 
2.48.1



* [PATCH v1 26/40] x86/cacheinfo: Rename _cpuid4_info_regs to _cpuid4_info
  2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
                   ` (24 preceding siblings ...)
  2025-03-04  8:51 ` [PATCH v1 25/40] x86/cacheinfo: Separate Intel and AMD leaf 0x4 code paths Ahmed S. Darwish
@ 2025-03-04  8:51 ` Ahmed S. Darwish
  2025-03-04  8:51 ` [PATCH v1 27/40] x86/cacheinfo: Clarify type markers for leaf 0x2 cache descriptors Ahmed S. Darwish
                   ` (15 subsequent siblings)
  41 siblings, 0 replies; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-04  8:51 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar, Dave Hansen
  Cc: Thomas Gleixner, John Ogness, H. Peter Anvin, Andrew Cooper, x86,
	x86-cpuid, LKML, Ahmed S. Darwish

Parent commits decoupled amd_northbridge from _cpuid4_info_regs, moved
AMD L3 northbridge cache_disable_0/1 sysfs code to its own file, and
split AMD vs. Intel leaf 0x4 handling into:

    amd_fill_cpuid4_info()
    intel_fill_cpuid4_info()
    fill_cpuid4_info()

After doing all that, the "_cpuid4_info_regs" name becomes a mouthful.
It is also not entirely accurate, as the structure holds cpuid4-derived
information like cache node ID and size -- not just regs.

Rename struct _cpuid4_info_regs to _cpuid4_info.  That new name also
better matches the AMD/Intel leaf 0x4 functions mentioned above.

Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
---
 arch/x86/kernel/cpu/cacheinfo.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index cc320817cfc3..2d4180b961f4 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -159,7 +159,7 @@ union _cpuid4_leaf_ecx {
 	u32 full;
 };
 
-struct _cpuid4_info_regs {
+struct _cpuid4_info {
 	union _cpuid4_leaf_eax eax;
 	union _cpuid4_leaf_ebx ebx;
 	union _cpuid4_leaf_ecx ecx;
@@ -295,7 +295,7 @@ static void legacy_amd_cpuid4(int index, union _cpuid4_leaf_eax *eax,
 		(ebx->split.ways_of_associativity + 1) - 1;
 }
 
-static int cpuid4_info_fill_done(struct _cpuid4_info_regs *id4, union _cpuid4_leaf_eax eax,
+static int cpuid4_info_fill_done(struct _cpuid4_info *id4, union _cpuid4_leaf_eax eax,
 				 union _cpuid4_leaf_ebx ebx, union _cpuid4_leaf_ecx ecx)
 {
 	if (eax.split.type == CTYPE_NULL)
@@ -312,7 +312,7 @@ static int cpuid4_info_fill_done(struct _cpuid4_info_regs *id4, union _cpuid4_le
 	return 0;
 }
 
-static int amd_fill_cpuid4_info(int index, struct _cpuid4_info_regs *id4)
+static int amd_fill_cpuid4_info(int index, struct _cpuid4_info *id4)
 {
 	union _cpuid4_leaf_eax eax;
 	union _cpuid4_leaf_ebx ebx;
@@ -327,7 +327,7 @@ static int amd_fill_cpuid4_info(int index, struct _cpuid4_info_regs *id4)
 	return cpuid4_info_fill_done(id4, eax, ebx, ecx);
 }
 
-static int intel_fill_cpuid4_info(int index, struct _cpuid4_info_regs *id4)
+static int intel_fill_cpuid4_info(int index, struct _cpuid4_info *id4)
 {
 	union _cpuid4_leaf_eax eax;
 	union _cpuid4_leaf_ebx ebx;
@@ -339,7 +339,7 @@ static int intel_fill_cpuid4_info(int index, struct _cpuid4_info_regs *id4)
 	return cpuid4_info_fill_done(id4, eax, ebx, ecx);
 }
 
-static int fill_cpuid4_info(int index, struct _cpuid4_info_regs *id4)
+static int fill_cpuid4_info(int index, struct _cpuid4_info *id4)
 {
 	return x86_vendor_amd_or_hygon(boot_cpu_data.x86_vendor) ?
 		amd_fill_cpuid4_info(index, id4) :
@@ -473,7 +473,7 @@ void init_intel_cacheinfo(struct cpuinfo_x86 *c)
 		 * parameters cpuid leaf to find the cache details
 		 */
 		for (i = 0; i < ci->num_leaves; i++) {
-			struct _cpuid4_info_regs id4 = {};
+			struct _cpuid4_info id4 = {};
 			int retval;
 
 			retval = intel_fill_cpuid4_info(i, &id4);
@@ -560,7 +560,7 @@ void init_intel_cacheinfo(struct cpuinfo_x86 *c)
 }
 
 static int __cache_amd_cpumap_setup(unsigned int cpu, int index,
-				    const struct _cpuid4_info_regs *id4)
+				    const struct _cpuid4_info *id4)
 {
 	struct cpu_cacheinfo *this_cpu_ci;
 	struct cacheinfo *ci;
@@ -617,7 +617,7 @@ static int __cache_amd_cpumap_setup(unsigned int cpu, int index,
 }
 
 static void __cache_cpumap_setup(unsigned int cpu, int index,
-				 const struct _cpuid4_info_regs *id4)
+				 const struct _cpuid4_info *id4)
 {
 	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
 	struct cacheinfo *ci, *sibling_ci;
@@ -650,7 +650,7 @@ static void __cache_cpumap_setup(unsigned int cpu, int index,
 		}
 }
 
-static void ci_info_init(struct cacheinfo *ci, const struct _cpuid4_info_regs *id4,
+static void ci_info_init(struct cacheinfo *ci, const struct _cpuid4_info *id4,
 			 struct amd_northbridge *nb)
 {
 	ci->id				= id4->id;
@@ -681,7 +681,7 @@ int init_cache_level(unsigned int cpu)
  * ECX as cache index. Then right shift apicid by the number's order to get
  * cache id for this cache node.
  */
-static void get_cache_id(int cpu, struct _cpuid4_info_regs *id4)
+static void get_cache_id(int cpu, struct _cpuid4_info *id4)
 {
 	struct cpuinfo_x86 *c = &cpu_data(cpu);
 	unsigned long num_threads_sharing;
@@ -696,8 +696,8 @@ int populate_cache_leaves(unsigned int cpu)
 {
 	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
 	struct cacheinfo *ci = this_cpu_ci->info_list;
-	struct _cpuid4_info_regs id4 = {};
 	struct amd_northbridge *nb = NULL;
+	struct _cpuid4_info id4 = {};
 	int idx, ret;
 
 	for (idx = 0; idx < this_cpu_ci->num_leaves; idx++) {
-- 
2.48.1



* [PATCH v1 27/40] x86/cacheinfo: Clarify type markers for leaf 0x2 cache descriptors
  2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
                   ` (25 preceding siblings ...)
  2025-03-04  8:51 ` [PATCH v1 26/40] x86/cacheinfo: Rename _cpuid4_info_regs to _cpuid4_info Ahmed S. Darwish
@ 2025-03-04  8:51 ` Ahmed S. Darwish
  2025-03-04  8:51 ` [PATCH v1 28/40] x86/cacheinfo: Use enums for cache descriptor types Ahmed S. Darwish
                   ` (14 subsequent siblings)
  41 siblings, 0 replies; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-04  8:51 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar, Dave Hansen
  Cc: Thomas Gleixner, John Ogness, H. Peter Anvin, Andrew Cooper, x86,
	x86-cpuid, LKML, Ahmed S. Darwish

cpuid leaf 0x2 output is a stream of one-byte descriptors, each implying
certain details about the CPU's cache and TLB entries.

Two separate tables exist for interpreting these descriptors: one for
TLBs in intel.c and one for caches in cacheinfo.c.  These mapping tables
will be merged in later commits, along with other improvements to their
model.

In preparation for this, use more descriptive type names for the leaf
0x2 descriptors associated with cpu caches.  Namely:

	LVL_1_INST	=>	CACHE_L1_INST
	LVL_1_DATA	=>	CACHE_L1_DATA
	LVL_2		=>	CACHE_L2
	LVL_3		=>	CACHE_L3

After the TLB and cache descriptors mapping tables are merged, this will
make it clear that such descriptors correspond to cpu caches.
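
For illustration, a loop-based lookup over a few of these descriptors
might look like the user-space sketch below.  The struct layout, the
sketch_* names, and the per-type accumulator are hypothetical, and sizes
are assumed to be in KB.  Only the four descriptor rows and the CACHE_*
markers are taken from the table in this patch.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_L1_INST	1
#define CACHE_L1_DATA	2
#define CACHE_L2	3
#define CACHE_L3	4

/* Hypothetical row layout: descriptor byte, type marker, size */
struct sketch_cache_desc {
	unsigned char descriptor;
	char type;
	short size;		/* assumed to be in KB */
};

/* A small excerpt of the cache descriptor table */
static const struct sketch_cache_desc sketch_table[] = {
	{ 0x06, CACHE_L1_INST,	8	},
	{ 0x21, CACHE_L2,	256	},
	{ 0x22, CACHE_L3,	512	},
	{ 0x2c, CACHE_L1_DATA,	32	},
};

/* Linear search; returns NULL for descriptors not in the excerpt */
static const struct sketch_cache_desc *sketch_lookup(unsigned char desc)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_table) / sizeof(sketch_table[0]); i++) {
		if (sketch_table[i].descriptor == desc)
			return &sketch_table[i];
	}

	return NULL;
}

/*
 * Walk a stream of one-byte descriptors and add up the size per cache
 * type.  kb[] is indexed by the CACHE_* markers, so it needs 5 slots.
 */
static void sketch_account(const unsigned char *descs, size_t n,
			   unsigned int kb[5])
{
	size_t i;

	assert(descs != NULL && kb != NULL);

	for (i = 0; i < n; i++) {
		const struct sketch_cache_desc *d = sketch_lookup(descs[i]);

		if (d)
			kb[(int)d->type] += d->size;
	}
}
```

With the CACHE_ prefix, a reader of such a loop can tell at a glance
that the matched entry describes a cache rather than a TLB.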

Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
---
 arch/x86/kernel/cpu/cacheinfo.c | 152 ++++++++++++++++----------------
 1 file changed, 76 insertions(+), 76 deletions(-)

diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index 2d4180b961f4..9e87321466fe 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -23,10 +23,10 @@
 
 #include "cpu.h"
 
-#define LVL_1_INST	1
-#define LVL_1_DATA	2
-#define LVL_2		3
-#define LVL_3		4
+#define CACHE_L1_INST	1
+#define CACHE_L1_DATA	2
+#define CACHE_L2	3
+#define CACHE_L3	4
 
 /* Shared last level cache maps */
 DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_llc_shared_map);
@@ -52,74 +52,74 @@ struct _cache_table {
 
 static const struct _cache_table cache_table[] =
 {
-	{ 0x06, LVL_1_INST, 8 },	/* 4-way set assoc, 32 byte line size */
-	{ 0x08, LVL_1_INST, 16 },	/* 4-way set assoc, 32 byte line size */
-	{ 0x09, LVL_1_INST, 32 },	/* 4-way set assoc, 64 byte line size */
-	{ 0x0a, LVL_1_DATA, 8 },	/* 2 way set assoc, 32 byte line size */
-	{ 0x0c, LVL_1_DATA, 16 },	/* 4-way set assoc, 32 byte line size */
-	{ 0x0d, LVL_1_DATA, 16 },	/* 4-way set assoc, 64 byte line size */
-	{ 0x0e, LVL_1_DATA, 24 },	/* 6-way set assoc, 64 byte line size */
-	{ 0x21, LVL_2,      256 },	/* 8-way set assoc, 64 byte line size */
-	{ 0x22, LVL_3,      512 },	/* 4-way set assoc, sectored cache, 64 byte line size */
-	{ 0x23, LVL_3,      MB(1) },	/* 8-way set assoc, sectored cache, 64 byte line size */
-	{ 0x25, LVL_3,      MB(2) },	/* 8-way set assoc, sectored cache, 64 byte line size */
-	{ 0x29, LVL_3,      MB(4) },	/* 8-way set assoc, sectored cache, 64 byte line size */
-	{ 0x2c, LVL_1_DATA, 32 },	/* 8-way set assoc, 64 byte line size */
-	{ 0x30, LVL_1_INST, 32 },	/* 8-way set assoc, 64 byte line size */
-	{ 0x39, LVL_2,      128 },	/* 4-way set assoc, sectored cache, 64 byte line size */
-	{ 0x3a, LVL_2,      192 },	/* 6-way set assoc, sectored cache, 64 byte line size */
-	{ 0x3b, LVL_2,      128 },	/* 2-way set assoc, sectored cache, 64 byte line size */
-	{ 0x3c, LVL_2,      256 },	/* 4-way set assoc, sectored cache, 64 byte line size */
-	{ 0x3d, LVL_2,      384 },	/* 6-way set assoc, sectored cache, 64 byte line size */
-	{ 0x3e, LVL_2,      512 },	/* 4-way set assoc, sectored cache, 64 byte line size */
-	{ 0x3f, LVL_2,      256 },	/* 2-way set assoc, 64 byte line size */
-	{ 0x41, LVL_2,      128 },	/* 4-way set assoc, 32 byte line size */
-	{ 0x42, LVL_2,      256 },	/* 4-way set assoc, 32 byte line size */
-	{ 0x43, LVL_2,      512 },	/* 4-way set assoc, 32 byte line size */
-	{ 0x44, LVL_2,      MB(1) },	/* 4-way set assoc, 32 byte line size */
-	{ 0x45, LVL_2,      MB(2) },	/* 4-way set assoc, 32 byte line size */
-	{ 0x46, LVL_3,      MB(4) },	/* 4-way set assoc, 64 byte line size */
-	{ 0x47, LVL_3,      MB(8) },	/* 8-way set assoc, 64 byte line size */
-	{ 0x48, LVL_2,      MB(3) },	/* 12-way set assoc, 64 byte line size */
-	{ 0x49, LVL_3,      MB(4) },	/* 16-way set assoc, 64 byte line size */
-	{ 0x4a, LVL_3,      MB(6) },	/* 12-way set assoc, 64 byte line size */
-	{ 0x4b, LVL_3,      MB(8) },	/* 16-way set assoc, 64 byte line size */
-	{ 0x4c, LVL_3,      MB(12) },	/* 12-way set assoc, 64 byte line size */
-	{ 0x4d, LVL_3,      MB(16) },	/* 16-way set assoc, 64 byte line size */
-	{ 0x4e, LVL_2,      MB(6) },	/* 24-way set assoc, 64 byte line size */
-	{ 0x60, LVL_1_DATA, 16 },	/* 8-way set assoc, sectored cache, 64 byte line size */
-	{ 0x66, LVL_1_DATA, 8 },	/* 4-way set assoc, sectored cache, 64 byte line size */
-	{ 0x67, LVL_1_DATA, 16 },	/* 4-way set assoc, sectored cache, 64 byte line size */
-	{ 0x68, LVL_1_DATA, 32 },	/* 4-way set assoc, sectored cache, 64 byte line size */
-	{ 0x78, LVL_2,      MB(1) },	/* 4-way set assoc, 64 byte line size */
-	{ 0x79, LVL_2,      128 },	/* 8-way set assoc, sectored cache, 64 byte line size */
-	{ 0x7a, LVL_2,      256 },	/* 8-way set assoc, sectored cache, 64 byte line size */
-	{ 0x7b, LVL_2,      512 },	/* 8-way set assoc, sectored cache, 64 byte line size */
-	{ 0x7c, LVL_2,      MB(1) },	/* 8-way set assoc, sectored cache, 64 byte line size */
-	{ 0x7d, LVL_2,      MB(2) },	/* 8-way set assoc, 64 byte line size */
-	{ 0x7f, LVL_2,      512 },	/* 2-way set assoc, 64 byte line size */
-	{ 0x80, LVL_2,      512 },	/* 8-way set assoc, 64 byte line size */
-	{ 0x82, LVL_2,      256 },	/* 8-way set assoc, 32 byte line size */
-	{ 0x83, LVL_2,      512 },	/* 8-way set assoc, 32 byte line size */
-	{ 0x84, LVL_2,      MB(1) },	/* 8-way set assoc, 32 byte line size */
-	{ 0x85, LVL_2,      MB(2) },	/* 8-way set assoc, 32 byte line size */
-	{ 0x86, LVL_2,      512 },	/* 4-way set assoc, 64 byte line size */
-	{ 0x87, LVL_2,      MB(1) },	/* 8-way set assoc, 64 byte line size */
-	{ 0xd0, LVL_3,      512 },	/* 4-way set assoc, 64 byte line size */
-	{ 0xd1, LVL_3,      MB(1) },	/* 4-way set assoc, 64 byte line size */
-	{ 0xd2, LVL_3,      MB(2) },	/* 4-way set assoc, 64 byte line size */
-	{ 0xd6, LVL_3,      MB(1) },	/* 8-way set assoc, 64 byte line size */
-	{ 0xd7, LVL_3,      MB(2) },	/* 8-way set assoc, 64 byte line size */
-	{ 0xd8, LVL_3,      MB(4) },	/* 12-way set assoc, 64 byte line size */
-	{ 0xdc, LVL_3,      MB(2) },	/* 12-way set assoc, 64 byte line size */
-	{ 0xdd, LVL_3,      MB(4) },	/* 12-way set assoc, 64 byte line size */
-	{ 0xde, LVL_3,      MB(8) },	/* 12-way set assoc, 64 byte line size */
-	{ 0xe2, LVL_3,      MB(2) },	/* 16-way set assoc, 64 byte line size */
-	{ 0xe3, LVL_3,      MB(4) },	/* 16-way set assoc, 64 byte line size */
-	{ 0xe4, LVL_3,      MB(8) },	/* 16-way set assoc, 64 byte line size */
-	{ 0xea, LVL_3,      MB(12) },	/* 24-way set assoc, 64 byte line size */
-	{ 0xeb, LVL_3,      MB(18) },	/* 24-way set assoc, 64 byte line size */
-	{ 0xec, LVL_3,      MB(24) },	/* 24-way set assoc, 64 byte line size */
+	{ 0x06, CACHE_L1_INST,	8	},	/* 4-way set assoc, 32 byte line size */
+	{ 0x08, CACHE_L1_INST,	16	},	/* 4-way set assoc, 32 byte line size */
+	{ 0x09, CACHE_L1_INST,	32	},	/* 4-way set assoc, 64 byte line size */
+	{ 0x0a, CACHE_L1_DATA,	8	},	/* 2 way set assoc, 32 byte line size */
+	{ 0x0c, CACHE_L1_DATA,	16	},	/* 4-way set assoc, 32 byte line size */
+	{ 0x0d, CACHE_L1_DATA,	16	},	/* 4-way set assoc, 64 byte line size */
+	{ 0x0e, CACHE_L1_DATA,	24	},	/* 6-way set assoc, 64 byte line size */
+	{ 0x21, CACHE_L2,	256	},	/* 8-way set assoc, 64 byte line size */
+	{ 0x22, CACHE_L3,	512	},	/* 4-way set assoc, sectored cache, 64 byte line size */
+	{ 0x23, CACHE_L3,	MB(1)	},	/* 8-way set assoc, sectored cache, 64 byte line size */
+	{ 0x25, CACHE_L3,	MB(2)	},	/* 8-way set assoc, sectored cache, 64 byte line size */
+	{ 0x29, CACHE_L3,	MB(4)	},	/* 8-way set assoc, sectored cache, 64 byte line size */
+	{ 0x2c, CACHE_L1_DATA,	32	},	/* 8-way set assoc, 64 byte line size */
+	{ 0x30, CACHE_L1_INST,	32	},	/* 8-way set assoc, 64 byte line size */
+	{ 0x39, CACHE_L2,	128	},	/* 4-way set assoc, sectored cache, 64 byte line size */
+	{ 0x3a, CACHE_L2,	192	},	/* 6-way set assoc, sectored cache, 64 byte line size */
+	{ 0x3b, CACHE_L2,	128	},	/* 2-way set assoc, sectored cache, 64 byte line size */
+	{ 0x3c, CACHE_L2,	256	},	/* 4-way set assoc, sectored cache, 64 byte line size */
+	{ 0x3d, CACHE_L2,	384	},	/* 6-way set assoc, sectored cache, 64 byte line size */
+	{ 0x3e, CACHE_L2,	512	},	/* 4-way set assoc, sectored cache, 64 byte line size */
+	{ 0x3f, CACHE_L2,	256	},	/* 2-way set assoc, 64 byte line size */
+	{ 0x41, CACHE_L2,	128	},	/* 4-way set assoc, 32 byte line size */
+	{ 0x42, CACHE_L2,	256	},	/* 4-way set assoc, 32 byte line size */
+	{ 0x43, CACHE_L2,	512	},	/* 4-way set assoc, 32 byte line size */
+	{ 0x44, CACHE_L2,	MB(1)	},	/* 4-way set assoc, 32 byte line size */
+	{ 0x45, CACHE_L2,	MB(2)	},	/* 4-way set assoc, 32 byte line size */
+	{ 0x46, CACHE_L3,	MB(4)	},	/* 4-way set assoc, 64 byte line size */
+	{ 0x47, CACHE_L3,	MB(8)	},	/* 8-way set assoc, 64 byte line size */
+	{ 0x48, CACHE_L2,	MB(3)	},	/* 12-way set assoc, 64 byte line size */
+	{ 0x49, CACHE_L3,	MB(4)	},	/* 16-way set assoc, 64 byte line size */
+	{ 0x4a, CACHE_L3,	MB(6)	},	/* 12-way set assoc, 64 byte line size */
+	{ 0x4b, CACHE_L3,	MB(8)	},	/* 16-way set assoc, 64 byte line size */
+	{ 0x4c, CACHE_L3,	MB(12)	},	/* 12-way set assoc, 64 byte line size */
+	{ 0x4d, CACHE_L3,	MB(16)	},	/* 16-way set assoc, 64 byte line size */
+	{ 0x4e, CACHE_L2,	MB(6)	},	/* 24-way set assoc, 64 byte line size */
+	{ 0x60, CACHE_L1_DATA,	16	},	/* 8-way set assoc, sectored cache, 64 byte line size */
+	{ 0x66, CACHE_L1_DATA,	8	},	/* 4-way set assoc, sectored cache, 64 byte line size */
+	{ 0x67, CACHE_L1_DATA,	16	},	/* 4-way set assoc, sectored cache, 64 byte line size */
+	{ 0x68, CACHE_L1_DATA,	32	},	/* 4-way set assoc, sectored cache, 64 byte line size */
+	{ 0x78, CACHE_L2,	MB(1)	},	/* 4-way set assoc, 64 byte line size */
+	{ 0x79, CACHE_L2,	128	},	/* 8-way set assoc, sectored cache, 64 byte line size */
+	{ 0x7a, CACHE_L2,	256	},	/* 8-way set assoc, sectored cache, 64 byte line size */
+	{ 0x7b, CACHE_L2,	512	},	/* 8-way set assoc, sectored cache, 64 byte line size */
+	{ 0x7c, CACHE_L2,	MB(1)	},	/* 8-way set assoc, sectored cache, 64 byte line size */
+	{ 0x7d, CACHE_L2,	MB(2)	},	/* 8-way set assoc, 64 byte line size */
+	{ 0x7f, CACHE_L2,	512	},	/* 2-way set assoc, 64 byte line size */
+	{ 0x80, CACHE_L2,	512	},	/* 8-way set assoc, 64 byte line size */
+	{ 0x82, CACHE_L2,	256	},	/* 8-way set assoc, 32 byte line size */
+	{ 0x83, CACHE_L2,	512	},	/* 8-way set assoc, 32 byte line size */
+	{ 0x84, CACHE_L2,	MB(1)	},	/* 8-way set assoc, 32 byte line size */
+	{ 0x85, CACHE_L2,	MB(2)	},	/* 8-way set assoc, 32 byte line size */
+	{ 0x86, CACHE_L2,	512	},	/* 4-way set assoc, 64 byte line size */
+	{ 0x87, CACHE_L2,	MB(1)	},	/* 8-way set assoc, 64 byte line size */
+	{ 0xd0, CACHE_L3,	512	},	/* 4-way set assoc, 64 byte line size */
+	{ 0xd1, CACHE_L3,	MB(1)	},	/* 4-way set assoc, 64 byte line size */
+	{ 0xd2, CACHE_L3,	MB(2)	},	/* 4-way set assoc, 64 byte line size */
+	{ 0xd6, CACHE_L3,	MB(1)	},	/* 8-way set assoc, 64 byte line size */
+	{ 0xd7, CACHE_L3,	MB(2)	},	/* 8-way set assoc, 64 byte line size */
+	{ 0xd8, CACHE_L3,	MB(4)	},	/* 12-way set assoc, 64 byte line size */
+	{ 0xdc, CACHE_L3,	MB(2)	},	/* 12-way set assoc, 64 byte line size */
+	{ 0xdd, CACHE_L3,	MB(4)	},	/* 12-way set assoc, 64 byte line size */
+	{ 0xde, CACHE_L3,	MB(8)	},	/* 12-way set assoc, 64 byte line size */
+	{ 0xe2, CACHE_L3,	MB(2)	},	/* 16-way set assoc, 64 byte line size */
+	{ 0xe3, CACHE_L3,	MB(4)	},	/* 16-way set assoc, 64 byte line size */
+	{ 0xe4, CACHE_L3,	MB(8)	},	/* 16-way set assoc, 64 byte line size */
+	{ 0xea, CACHE_L3,	MB(12)	},	/* 24-way set assoc, 64 byte line size */
+	{ 0xeb, CACHE_L3,	MB(18)	},	/* 24-way set assoc, 64 byte line size */
+	{ 0xec, CACHE_L3,	MB(24)	},	/* 24-way set assoc, 64 byte line size */
 };
 
 
@@ -518,10 +518,10 @@ void init_intel_cacheinfo(struct cpuinfo_x86 *c)
 				continue;
 
 			switch (entry->cache_type) {
-			case LVL_1_INST: l1i += entry->size; break;
-			case LVL_1_DATA: l1d += entry->size; break;
-			case LVL_2:	 l2  += entry->size; break;
-			case LVL_3:	 l3  += entry->size; break;
+			case CACHE_L1_INST:	l1i += entry->size; break;
+			case CACHE_L1_DATA:	l1d += entry->size; break;
+			case CACHE_L2:		l2  += entry->size; break;
+			case CACHE_L3:		l3  += entry->size; break;
 			}
 		}
 	}
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v1 28/40] x86/cacheinfo: Use enums for cache descriptor types
  2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
                   ` (26 preceding siblings ...)
  2025-03-04  8:51 ` [PATCH v1 27/40] x86/cacheinfo: Clarify type markers for leaf 0x2 cache descriptors Ahmed S. Darwish
@ 2025-03-04  8:51 ` Ahmed S. Darwish
  2025-03-04  8:51 ` [PATCH v1 29/40] x86/cpu: Use enums for TLB " Ahmed S. Darwish
                   ` (13 subsequent siblings)
  41 siblings, 0 replies; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-04  8:51 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar, Dave Hansen
  Cc: Thomas Gleixner, John Ogness, H. Peter Anvin, Andrew Cooper, x86,
	x86-cpuid, LKML, Ahmed S. Darwish

The leaf 0x2 one-byte cache descriptor types:

	CACHE_L1_INST
	CACHE_L1_DATA
	CACHE_L2
	CACHE_L3

are just discriminators to be used within the cache_table[] mapping.
Their specific values are irrelevant.

Use enums for such types.

Make the enum packed and static assert that its values remain within a
single byte so that the cache_table[] array size does not get out of hand.

Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
---
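Reader's note, not part of the commit message: a minimal userspace sketch
of the packed-enum pattern described above, assuming GCC or Clang.  All
demo_* names are hypothetical; demo_packed stands in for the kernel's
__packed, and the two-argument C11 static_assert() replaces the kernel's
one-argument form.

```c
#include <assert.h>	/* static_assert (C11) and assert() */

/* Hypothetical stand-in for the kernel's __packed attribute macro */
#define demo_packed	__attribute__((__packed__))

/* The values do not matter; the enum only discriminates table entries */
enum demo_cache_type {
	DEMO_CACHE_L1_INST,
	DEMO_CACHE_L1_DATA,
	DEMO_CACHE_L2,
	DEMO_CACHE_L3,
} demo_packed;

/* Break the build if the enum ever outgrows a single byte */
static_assert(sizeof(enum demo_cache_type) == 1,
	      "descriptor type must fit in one byte");

/* Same field layout as the patched struct _cache_table */
struct demo_cache_entry {
	unsigned char		descriptor;
	enum demo_cache_type	type;
	short			size;	/* in KB */
};
```

Without the packed attribute the enum would typically be as wide as an
int, and each table entry would grow accordingly.
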
 arch/x86/include/asm/cpuid/types.h | 14 ++++++++++++++
 arch/x86/kernel/cpu/cacheinfo.c    |  9 ++-------
 2 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/cpuid/types.h b/arch/x86/include/asm/cpuid/types.h
index 50f6046a57b9..0051d8c2b9ee 100644
--- a/arch/x86/include/asm/cpuid/types.h
+++ b/arch/x86/include/asm/cpuid/types.h
@@ -2,6 +2,8 @@
 #ifndef _ASM_X86_CPUID_TYPES_H
 #define _ASM_X86_CPUID_TYPES_H
 
+#include <linux/build_bug.h>
+#include <linux/compiler_attributes.h>
 #include <linux/types.h>
 
 #include <asm/cpuid.h>
@@ -76,4 +78,16 @@ static inline void get_leaf_0x2_regs(union leaf_0x2_regs *regs)
 	/* Skip the first byte as it is not a descriptor */		\
 	for (desc = &(regs).desc[1]; desc < &(regs).desc[16]; desc++)
 
+/*
+ * Leaf 0x2 1-byte descriptors' cache types
+ * To be used for their mappings at cache_table[]
+ */
+enum _cache_table_type {
+	CACHE_L1_INST,
+	CACHE_L1_DATA,
+	CACHE_L2,
+	CACHE_L3,
+} __packed;
+static_assert(sizeof(enum _cache_table_type) == 1);
+
 #endif /* _ASM_X86_CPUID_TYPES_H */
diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index 9e87321466fe..a7fccbab268d 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -23,11 +23,6 @@
 
 #include "cpu.h"
 
-#define CACHE_L1_INST	1
-#define CACHE_L1_DATA	2
-#define CACHE_L2	3
-#define CACHE_L3	4
-
 /* Shared last level cache maps */
 DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_llc_shared_map);
 
@@ -41,7 +36,7 @@ unsigned int memory_caching_control __ro_after_init;
 
 struct _cache_table {
 	unsigned char descriptor;
-	char cache_type;
+	enum _cache_table_type type;
 	short size;
 };
 
@@ -517,7 +512,7 @@ void init_intel_cacheinfo(struct cpuinfo_x86 *c)
 			if (!entry)
 				continue;
 
-			switch (entry->cache_type) {
+			switch (entry->type) {
 			case CACHE_L1_INST:	l1i += entry->size; break;
 			case CACHE_L1_DATA:	l1d += entry->size; break;
 			case CACHE_L2:		l2  += entry->size; break;
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v1 29/40] x86/cpu: Use enums for TLB descriptor types
  2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
                   ` (27 preceding siblings ...)
  2025-03-04  8:51 ` [PATCH v1 28/40] x86/cacheinfo: Use enums for cache descriptor types Ahmed S. Darwish
@ 2025-03-04  8:51 ` Ahmed S. Darwish
  2025-03-04  8:51 ` [PATCH v1 30/40] sizes.h: Cover all possible x86 cpu cache sizes Ahmed S. Darwish
                   ` (12 subsequent siblings)
  41 siblings, 0 replies; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-04  8:51 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar, Dave Hansen
  Cc: Thomas Gleixner, John Ogness, H. Peter Anvin, Andrew Cooper, x86,
	x86-cpuid, LKML, Ahmed S. Darwish

The leaf 0x2 one-byte TLB descriptor types:

	TLB_INST_4K
	TLB_INST_4M
	TLB_INST_2M_4M
	...

are just discriminators to be used within the intel_tlb_table[] mapping.
Their specific values are irrelevant.

Use enums for such types.

Make the enum packed and static assert that its values remain within a
single byte so that the intel_tlb_table[] size does not get out of hand.

Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
---
 arch/x86/include/asm/cpuid/types.h | 26 ++++++++++++++++++++++++++
 arch/x86/kernel/cpu/intel.c        | 28 +++-------------------------
 2 files changed, 29 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/cpuid/types.h b/arch/x86/include/asm/cpuid/types.h
index 0051d8c2b9ee..2a4653af2ba2 100644
--- a/arch/x86/include/asm/cpuid/types.h
+++ b/arch/x86/include/asm/cpuid/types.h
@@ -90,4 +90,30 @@ enum _cache_table_type {
 } __packed;
 static_assert(sizeof(enum _cache_table_type) == 1);
 
+/*
+ * Leaf 0x2 1-byte descriptors' TLB types
+ * To be used for their mappings at intel_tlb_table[]
+ */
+enum _tlb_table_type {
+	TLB_INST_4K,
+	TLB_INST_4M,
+	TLB_INST_2M_4M,
+	TLB_INST_ALL,
+
+	TLB_DATA_4K,
+	TLB_DATA_4M,
+	TLB_DATA_2M_4M,
+	TLB_DATA_4K_4M,
+	TLB_DATA_1G,
+	TLB_DATA_1G_2M_4M,
+
+	TLB_DATA0_4K,
+	TLB_DATA0_4M,
+	TLB_DATA0_2M_4M,
+
+	STLB_4K,
+	STLB_4K_2M,
+} __packed;
+static_assert(sizeof(enum _tlb_table_type) == 1);
+
 #endif /* _ASM_X86_CPUID_TYPES_H */
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 57e170ffe3ba..884cd1b1e4ff 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -626,28 +626,6 @@ static unsigned int intel_size_cache(struct cpuinfo_x86 *c, unsigned int size)
 }
 #endif
 
-#define TLB_INST_4K		0x01
-#define TLB_INST_4M		0x02
-#define TLB_INST_2M_4M		0x03
-
-#define TLB_INST_ALL		0x05
-#define TLB_INST_1G		0x06
-
-#define TLB_DATA_4K		0x11
-#define TLB_DATA_4M		0x12
-#define TLB_DATA_2M_4M		0x13
-#define TLB_DATA_4K_4M		0x14
-
-#define TLB_DATA_1G		0x16
-#define TLB_DATA_1G_2M_4M	0x17
-
-#define TLB_DATA0_4K		0x21
-#define TLB_DATA0_4M		0x22
-#define TLB_DATA0_2M_4M		0x23
-
-#define STLB_4K			0x41
-#define STLB_4K_2M		0x42
-
 /*
  * All of leaf 0x2's one-byte TLB descriptors implies the same number of
  * entries for their respective TLB types.  The 0x63 descriptor is an
@@ -660,7 +638,7 @@ static unsigned int intel_size_cache(struct cpuinfo_x86 *c, unsigned int size)
 
 struct _tlb_table {
 	unsigned char descriptor;
-	char tlb_type;
+	enum _tlb_table_type type;
 	unsigned int entries;
 };
 
@@ -718,11 +696,11 @@ static void intel_tlb_lookup(const unsigned char desc)
 	     intel_tlb_table[k].descriptor != 0; k++)
 		;
 
-	if (intel_tlb_table[k].tlb_type == 0)
+	if (intel_tlb_table[k].type == 0)
 		return;
 
 	entries = intel_tlb_table[k].entries;
-	switch (intel_tlb_table[k].tlb_type) {
+	switch (intel_tlb_table[k].type) {
 	case STLB_4K:
 		tlb_lli_4k = max(tlb_lli_4k, entries);
 		tlb_lld_4k = max(tlb_lld_4k, entries);
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v1 30/40] sizes.h: Cover all possible x86 cpu cache sizes
  2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
                   ` (28 preceding siblings ...)
  2025-03-04  8:51 ` [PATCH v1 29/40] x86/cpu: Use enums for TLB " Ahmed S. Darwish
@ 2025-03-04  8:51 ` Ahmed S. Darwish
  2025-03-04  9:35   ` Ingo Molnar
  2025-03-04  8:51 ` [PATCH v1 31/40] x86/cpu: Consolidate CPUID leaf 0x2 tables Ahmed S. Darwish
                   ` (11 subsequent siblings)
  41 siblings, 1 reply; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-04  8:51 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar, Dave Hansen
  Cc: Thomas Gleixner, John Ogness, H. Peter Anvin, Andrew Cooper, x86,
	x86-cpuid, LKML, Ahmed S. Darwish

Add size macros for 24/192/384 Kilobytes and 3/6/12/18/24 Megabytes.

With that, the x86 subsystem can avoid locally defining its own macros
for CPU cache sizes.

Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
---
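Reader's note, not part of the commit message: a small worked check of
how these macros are meant to be consumed.  The next patch in this series
stores (_size) / SZ_1K in a short, so the sketch below verifies that even
the largest new size still fits.  SZ_1K is taken from linux/sizes.h; the
DEMO_SIZE_KB() helper is hypothetical.

```c
#include <assert.h>	/* static_assert (C11) and assert() */
#include <limits.h>	/* SHRT_MAX */

/* Existing value from linux/sizes.h, plus three of the new macros */
#define SZ_1K		0x00000400
#define SZ_24K		0x00006000
#define SZ_3M		0x00300000
#define SZ_24M		0x01800000

/* Hypothetical helper: express a byte count in KB */
#define DEMO_SIZE_KB(bytes)	((bytes) / SZ_1K)

static_assert(DEMO_SIZE_KB(SZ_24K) == 24, "24K is 24 KB");
static_assert(DEMO_SIZE_KB(SZ_3M) == 3 * 1024, "3M is 3072 KB");
static_assert(DEMO_SIZE_KB(SZ_24M) == 24 * 1024, "24M is 24576 KB");

/* The largest cache size in the table must still fit in a short */
static_assert(DEMO_SIZE_KB(SZ_24M) <= SHRT_MAX, "KB value fits in short");
```
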
 include/linux/sizes.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/include/linux/sizes.h b/include/linux/sizes.h
index c3a00b967d18..49039494076f 100644
--- a/include/linux/sizes.h
+++ b/include/linux/sizes.h
@@ -23,17 +23,25 @@
 #define SZ_4K				0x00001000
 #define SZ_8K				0x00002000
 #define SZ_16K				0x00004000
+#define SZ_24K				0x00006000
 #define SZ_32K				0x00008000
 #define SZ_64K				0x00010000
 #define SZ_128K				0x00020000
+#define SZ_192K				0x00030000
 #define SZ_256K				0x00040000
+#define SZ_384K				0x00060000
 #define SZ_512K				0x00080000
 
 #define SZ_1M				0x00100000
 #define SZ_2M				0x00200000
+#define SZ_3M				0x00300000
 #define SZ_4M				0x00400000
+#define SZ_6M				0x00600000
 #define SZ_8M				0x00800000
+#define SZ_12M				0x00c00000
 #define SZ_16M				0x01000000
+#define SZ_18M				0x01200000
+#define SZ_24M				0x01800000
 #define SZ_32M				0x02000000
 #define SZ_64M				0x04000000
 #define SZ_128M				0x08000000
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v1 31/40] x86/cpu: Consolidate CPUID leaf 0x2 tables
  2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
                   ` (29 preceding siblings ...)
  2025-03-04  8:51 ` [PATCH v1 30/40] sizes.h: Cover all possible x86 cpu cache sizes Ahmed S. Darwish
@ 2025-03-04  8:51 ` Ahmed S. Darwish
  2025-03-04  8:51 ` [PATCH v1 32/40] x86/cacheinfo: Use consolidated leaf 0x2 descriptor table Ahmed S. Darwish
                   ` (10 subsequent siblings)
  41 siblings, 0 replies; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-04  8:51 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar, Dave Hansen
  Cc: Thomas Gleixner, John Ogness, H. Peter Anvin, Andrew Cooper, x86,
	x86-cpuid, LKML, Ahmed S. Darwish

From: Thomas Gleixner <tglx@linutronix.de>

CPUID leaf 0x2 describes TLBs and caches. So there are two tables with the
respective descriptor constants in intel.c and cacheinfo.c. The tables
occupy almost 600 bytes and require a loop-based lookup for each variant.

Combining them into one table occupies exactly 1k rodata and allows getting
rid of the loop-based lookup by just using the descriptor byte provided by
CPUID leaf 0x2 as an index into the table, which simplifies the code and
reduces text size.

The conversion of the intel and cacheinfo code is done separately.

[darwi: Actually define struct leaf_0x2_table.
	Tab-align all of cpuid_0x2_table[] mapping entries.
	Define needed SZ_* macros at linux/sizes.h instead (parent commit.)
	Use CACHE_L1_{INST,DATA} as names for L1 cache descriptor types.
	Set descriptor 0x63 type as TLB_DATA_1G_2M_4M and explain why.
	Use enums for cache and TLB descriptor types (parent commits.)
	Start enum types at 1 since type 0 is reserved for unknown descriptors.
	Ensure that cache and TLB enum type values do not intersect.
	Add leaf 0x2 table accessor for_each_leaf_0x2_entry() + documentation.]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Co-developed-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
---
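Reader's note, not part of the commit message: a minimal userspace sketch
of the direct-index lookup described above, assuming GCC or Clang on a
common ABI where short is 2 bytes.  All demo_* names are hypothetical and
only three descriptors are populated.  It also shows where the "exactly
1k" figure comes from: 256 slots of 4 bytes each.

```c
#include <assert.h>	/* static_assert (C11) and assert() */
#include <stddef.h>	/* NULL */

#define demo_packed	__attribute__((__packed__))

/* Type 0 stays unused so that unpopulated slots read as "unknown" */
enum demo_type {
	DEMO_CACHE_L1_DATA = 1,
	DEMO_CACHE_L2,
	DEMO_TLB_INST_4K,
} demo_packed;

struct demo_entry {
	enum demo_type	type;
	short		value;	/* cache size in KB, or TLB entry count */
};

/* One slot per possible descriptor byte; unlisted slots are zero-filled */
static const struct demo_entry demo_table[256] = {
	[0x01] = { DEMO_TLB_INST_4K,	32  },
	[0x21] = { DEMO_CACHE_L2,	256 },
	[0x2c] = { DEMO_CACHE_L1_DATA,	32  },
};

static_assert(sizeof(demo_table) == 1024, "256 slots * 4 bytes");

/* Constant-time lookup; NULL for descriptors without a mapping */
static inline const struct demo_entry *demo_lookup(unsigned char desc)
{
	const struct demo_entry *entry = &demo_table[desc];

	return entry->type ? entry : NULL;
}
```

Since the index is an unsigned char, no bounds check is needed: every
possible descriptor byte maps to a valid slot.
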
 arch/x86/include/asm/cpuid/types.h    |  66 ++++++++++++-
 arch/x86/kernel/cpu/Makefile          |   2 +-
 arch/x86/kernel/cpu/cpuid_0x2_table.c | 128 ++++++++++++++++++++++++++
 3 files changed, 191 insertions(+), 5 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/cpuid_0x2_table.c

diff --git a/arch/x86/include/asm/cpuid/types.h b/arch/x86/include/asm/cpuid/types.h
index 2a4653af2ba2..c23f187060aa 100644
--- a/arch/x86/include/asm/cpuid/types.h
+++ b/arch/x86/include/asm/cpuid/types.h
@@ -10,7 +10,7 @@
 
 /*
  * CPUID(0x2) parsing helpers
- * Check for_each_leaf_0x2_desc() documentation.
+ * Check for_each_leaf_0x2_entry() documentation.
  */
 
 struct leaf_0x2_reg {
@@ -81,21 +81,32 @@ static inline void get_leaf_0x2_regs(union leaf_0x2_regs *regs)
 /*
  * Leaf 0x2 1-byte descriptors' cache types
  * To be used for their mappings at cache_table[]
+ *
+ * Start at 1 since type 0 is reserved for HW byte descriptors which are
+ * not recognized by the kernel; i.e., those without an explicit mapping
+ * entry at cpuid_0x2_table[].
  */
 enum _cache_table_type {
-	CACHE_L1_INST,
+	CACHE_L1_INST	= 1,
 	CACHE_L1_DATA,
 	CACHE_L2,
-	CACHE_L3,
+	CACHE_L3
+	/* Adjust __TLB_TABLE_TYPE_BEGIN before adding more types */
 } __packed;
 static_assert(sizeof(enum _cache_table_type) == 1);
 
+/*
+ * Ensure that leaf 0x2 cache and TLB type values do not intersect,
+ * since they share the same field at struct leaf_0x2_table.
+ */
+#define __TLB_TABLE_TYPE_BEGIN		(CACHE_L3 + 1)
+
 /*
  * Leaf 0x2 1-byte descriptors' TLB types
  * To be used for their mappings at intel_tlb_table[]
  */
 enum _tlb_table_type {
-	TLB_INST_4K,
+	TLB_INST_4K	= __TLB_TABLE_TYPE_BEGIN,
 	TLB_INST_4M,
 	TLB_INST_2M_4M,
 	TLB_INST_ALL,
@@ -116,4 +127,51 @@ enum _tlb_table_type {
 } __packed;
 static_assert(sizeof(enum _tlb_table_type) == 1);
 
+/*
+ * Combined table for leaf 0x2 cache and TLB descriptors.
+ */
+struct leaf_0x2_table {
+	union {
+		enum _cache_table_type	c_type;
+		enum _tlb_table_type	t_type;
+	};
+	union {
+		short			c_size;
+		short			entries;
+	};
+};
+
+extern const struct leaf_0x2_table cpuid_0x2_table[256];
+
+/**
+ * for_each_leaf_0x2_entry() - Iterator for parsed leaf 0x2 descriptors
+ * @regs:   Leaf 0x2 register output, as returned by get_leaf_0x2_regs()
+ * @__ptr:  u8 pointer, for macro internal use only
+ * @entry:  Pointer to parsed descriptor information for each iteration
+ *
+ * Loop over the 1-byte descriptors in the passed leaf 0x2 output registers
+ * @regs.  Provide the parsed information for each descriptor through @entry.
+ * To handle cache specific descriptors, switch on @entry->c_type.  For TLB
+ * specific descriptors, switch on @entry->t_type.
+ *
+ * Example usage for cache descriptors::
+ *
+ *	const struct leaf_0x2_table *entry;
+ *	union leaf_0x2_regs regs;
+ *	u8 *ptr;
+ *
+ *	get_leaf_0x2_regs(&regs);
+ *	for_each_leaf_0x2_entry(regs, ptr, entry) {
+ *		switch (entry->c_type) {
+ *			...
+ *		}
+ *	}
+ *
+ */
+#define for_each_leaf_0x2_entry(regs, __ptr, entry)				\
+	/* Skip the first byte as it is not a descriptor */			\
+	for (__ptr = &(regs).desc[1];						\
+	     __ptr < &(regs).desc[16] && (entry = &cpuid_0x2_table[*__ptr]);	\
+	     __ptr++)
+
 #endif /* _ASM_X86_CPUID_TYPES_H */
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 3a39396d422d..1e26179ff18c 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -24,7 +24,7 @@ obj-y			+= rdrand.o
 obj-y			+= match.o
 obj-y			+= bugs.o
 obj-y			+= aperfmperf.o
-obj-y			+= cpuid-deps.o
+obj-y			+= cpuid-deps.o cpuid_0x2_table.o
 obj-y			+= umwait.o
 obj-y 			+= capflags.o powerflags.o
 
diff --git a/arch/x86/kernel/cpu/cpuid_0x2_table.c b/arch/x86/kernel/cpu/cpuid_0x2_table.c
new file mode 100644
index 000000000000..487b87b1acd3
--- /dev/null
+++ b/arch/x86/kernel/cpu/cpuid_0x2_table.c
@@ -0,0 +1,128 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <asm/cpuid/types.h>
+
+#include <linux/sizes.h>
+
+#include "cpu.h"
+
+#define CACHE_ENTRY(_desc, _type, _size)	\
+	[_desc] = {				\
+		.c_type = (_type),		\
+		.c_size = (_size) / SZ_1K,	\
+	}
+
+#define TLB_ENTRY(_desc, _type, _entries)	\
+	[_desc] = {				\
+		.t_type = (_type),		\
+		.entries = (_entries),		\
+	}
+
+const struct leaf_0x2_table cpuid_0x2_table[256] = {
+	CACHE_ENTRY(0x06, CACHE_L1_INST,	SZ_8K	),	/* 4-way set assoc, 32 byte line size */
+	CACHE_ENTRY(0x08, CACHE_L1_INST,	SZ_16K	),	/* 4-way set assoc, 32 byte line size */
+	CACHE_ENTRY(0x09, CACHE_L1_INST,	SZ_32K	),	/* 4-way set assoc, 64 byte line size */
+	CACHE_ENTRY(0x0a, CACHE_L1_DATA,	SZ_8K	),	/* 2 way set assoc, 32 byte line size */
+	CACHE_ENTRY(0x0c, CACHE_L1_DATA,	SZ_16K	),	/* 4-way set assoc, 32 byte line size */
+	CACHE_ENTRY(0x0d, CACHE_L1_DATA,	SZ_16K	),	/* 4-way set assoc, 64 byte line size */
+	CACHE_ENTRY(0x0e, CACHE_L1_DATA,	SZ_24K	),	/* 6-way set assoc, 64 byte line size */
+	CACHE_ENTRY(0x21, CACHE_L2,		SZ_256K	),	/* 8-way set assoc, 64 byte line size */
+	CACHE_ENTRY(0x22, CACHE_L3,		SZ_512K	),	/* 4-way set assoc, sectored cache, 64 byte line size */
+	CACHE_ENTRY(0x23, CACHE_L3,		SZ_1M	),	/* 8-way set assoc, sectored cache, 64 byte line size */
+	CACHE_ENTRY(0x25, CACHE_L3,		SZ_2M	),	/* 8-way set assoc, sectored cache, 64 byte line size */
+	CACHE_ENTRY(0x29, CACHE_L3,		SZ_4M	),	/* 8-way set assoc, sectored cache, 64 byte line size */
+	CACHE_ENTRY(0x2c, CACHE_L1_DATA,	SZ_32K	),	/* 8-way set assoc, 64 byte line size */
+	CACHE_ENTRY(0x30, CACHE_L1_INST,	SZ_32K	),	/* 8-way set assoc, 64 byte line size */
+	CACHE_ENTRY(0x39, CACHE_L2,		SZ_128K	),	/* 4-way set assoc, sectored cache, 64 byte line size */
+	CACHE_ENTRY(0x3a, CACHE_L2,		SZ_192K	),	/* 6-way set assoc, sectored cache, 64 byte line size */
+	CACHE_ENTRY(0x3b, CACHE_L2,		SZ_128K	),	/* 2-way set assoc, sectored cache, 64 byte line size */
+	CACHE_ENTRY(0x3c, CACHE_L2,		SZ_256K	),	/* 4-way set assoc, sectored cache, 64 byte line size */
+	CACHE_ENTRY(0x3d, CACHE_L2,		SZ_384K	),	/* 6-way set assoc, sectored cache, 64 byte line size */
+	CACHE_ENTRY(0x3e, CACHE_L2,		SZ_512K	),	/* 4-way set assoc, sectored cache, 64 byte line size */
+	CACHE_ENTRY(0x3f, CACHE_L2,		SZ_256K	),	/* 2-way set assoc, 64 byte line size */
+	CACHE_ENTRY(0x41, CACHE_L2,		SZ_128K	),	/* 4-way set assoc, 32 byte line size */
+	CACHE_ENTRY(0x42, CACHE_L2,		SZ_256K	),	/* 4-way set assoc, 32 byte line size */
+	CACHE_ENTRY(0x43, CACHE_L2,		SZ_512K	),	/* 4-way set assoc, 32 byte line size */
+	CACHE_ENTRY(0x44, CACHE_L2,		SZ_1M	),	/* 4-way set assoc, 32 byte line size */
+	CACHE_ENTRY(0x45, CACHE_L2,		SZ_2M	),	/* 4-way set assoc, 32 byte line size */
+	CACHE_ENTRY(0x46, CACHE_L3,		SZ_4M	),	/* 4-way set assoc, 64 byte line size */
+	CACHE_ENTRY(0x47, CACHE_L3,		SZ_8M	),	/* 8-way set assoc, 64 byte line size */
+	CACHE_ENTRY(0x48, CACHE_L2,		SZ_3M	),	/* 12-way set assoc, 64 byte line size */
+	CACHE_ENTRY(0x49, CACHE_L3,		SZ_4M	),	/* 16-way set assoc, 64 byte line size */
+	CACHE_ENTRY(0x4a, CACHE_L3,		SZ_6M	),	/* 12-way set assoc, 64 byte line size */
+	CACHE_ENTRY(0x4b, CACHE_L3,		SZ_8M	),	/* 16-way set assoc, 64 byte line size */
+	CACHE_ENTRY(0x4c, CACHE_L3,		SZ_12M	),	/* 12-way set assoc, 64 byte line size */
+	CACHE_ENTRY(0x4d, CACHE_L3,		SZ_16M	),	/* 16-way set assoc, 64 byte line size */
+	CACHE_ENTRY(0x4e, CACHE_L2,		SZ_6M	),	/* 24-way set assoc, 64 byte line size */
+	CACHE_ENTRY(0x60, CACHE_L1_DATA,	SZ_16K	),	/* 8-way set assoc, sectored cache, 64 byte line size */
+	CACHE_ENTRY(0x66, CACHE_L1_DATA,	SZ_8K	),	/* 4-way set assoc, sectored cache, 64 byte line size */
+	CACHE_ENTRY(0x67, CACHE_L1_DATA,	SZ_16K	),	/* 4-way set assoc, sectored cache, 64 byte line size */
+	CACHE_ENTRY(0x68, CACHE_L1_DATA,	SZ_32K	),	/* 4-way set assoc, sectored cache, 64 byte line size */
+	CACHE_ENTRY(0x78, CACHE_L2,		SZ_1M	),	/* 4-way set assoc, 64 byte line size */
+	CACHE_ENTRY(0x79, CACHE_L2,		SZ_128K	),	/* 8-way set assoc, sectored cache, 64 byte line size */
+	CACHE_ENTRY(0x7a, CACHE_L2,		SZ_256K	),	/* 8-way set assoc, sectored cache, 64 byte line size */
+	CACHE_ENTRY(0x7b, CACHE_L2,		SZ_512K	),	/* 8-way set assoc, sectored cache, 64 byte line size */
+	CACHE_ENTRY(0x7c, CACHE_L2,		SZ_1M	),	/* 8-way set assoc, sectored cache, 64 byte line size */
+	CACHE_ENTRY(0x7d, CACHE_L2,		SZ_2M	),	/* 8-way set assoc, 64 byte line size */
+	CACHE_ENTRY(0x7f, CACHE_L2,		SZ_512K	),	/* 2-way set assoc, 64 byte line size */
+	CACHE_ENTRY(0x80, CACHE_L2,		SZ_512K	),	/* 8-way set assoc, 64 byte line size */
+	CACHE_ENTRY(0x82, CACHE_L2,		SZ_256K	),	/* 8-way set assoc, 32 byte line size */
+	CACHE_ENTRY(0x83, CACHE_L2,		SZ_512K	),	/* 8-way set assoc, 32 byte line size */
+	CACHE_ENTRY(0x84, CACHE_L2,		SZ_1M	),	/* 8-way set assoc, 32 byte line size */
+	CACHE_ENTRY(0x85, CACHE_L2,		SZ_2M	),	/* 8-way set assoc, 32 byte line size */
+	CACHE_ENTRY(0x86, CACHE_L2,		SZ_512K	),	/* 4-way set assoc, 64 byte line size */
+	CACHE_ENTRY(0x87, CACHE_L2,		SZ_1M	),	/* 8-way set assoc, 64 byte line size */
+	CACHE_ENTRY(0xd0, CACHE_L3,		SZ_512K	),	/* 4-way set assoc, 64 byte line size */
+	CACHE_ENTRY(0xd1, CACHE_L3,		SZ_1M	),	/* 4-way set assoc, 64 byte line size */
+	CACHE_ENTRY(0xd2, CACHE_L3,		SZ_2M	),	/* 4-way set assoc, 64 byte line size */
+	CACHE_ENTRY(0xd6, CACHE_L3,		SZ_1M	),	/* 8-way set assoc, 64 byte line size */
+	CACHE_ENTRY(0xd7, CACHE_L3,		SZ_2M	),	/* 8-way set assoc, 64 byte line size */
+	CACHE_ENTRY(0xd8, CACHE_L3,		SZ_4M	),	/* 12-way set assoc, 64 byte line size */
+	CACHE_ENTRY(0xdc, CACHE_L3,		SZ_2M	),	/* 12-way set assoc, 64 byte line size */
+	CACHE_ENTRY(0xdd, CACHE_L3,		SZ_4M	),	/* 12-way set assoc, 64 byte line size */
+	CACHE_ENTRY(0xde, CACHE_L3,		SZ_8M	),	/* 12-way set assoc, 64 byte line size */
+	CACHE_ENTRY(0xe2, CACHE_L3,		SZ_2M	),	/* 16-way set assoc, 64 byte line size */
+	CACHE_ENTRY(0xe3, CACHE_L3,		SZ_4M	),	/* 16-way set assoc, 64 byte line size */
+	CACHE_ENTRY(0xe4, CACHE_L3,		SZ_8M	),	/* 16-way set assoc, 64 byte line size */
+	CACHE_ENTRY(0xea, CACHE_L3,		SZ_12M	),	/* 24-way set assoc, 64 byte line size */
+	CACHE_ENTRY(0xeb, CACHE_L3,		SZ_18M	),	/* 24-way set assoc, 64 byte line size */
+	CACHE_ENTRY(0xec, CACHE_L3,		SZ_24M	),	/* 24-way set assoc, 64 byte line size */
+
+	TLB_ENTRY(  0x01, TLB_INST_4K,		32	),	/* TLB_INST 4 KByte pages, 4-way set associative */
+	TLB_ENTRY(  0x02, TLB_INST_4M,		2	),	/* TLB_INST 4 MByte pages, full associative */
+	TLB_ENTRY(  0x03, TLB_DATA_4K,		64	),	/* TLB_DATA 4 KByte pages, 4-way set associative */
+	TLB_ENTRY(  0x04, TLB_DATA_4M,		8	),	/* TLB_DATA 4 MByte pages, 4-way set associative */
+	TLB_ENTRY(  0x05, TLB_DATA_4M,		32	),	/* TLB_DATA 4 MByte pages, 4-way set associative */
+	TLB_ENTRY(  0x0b, TLB_INST_4M,		4	),	/* TLB_INST 4 MByte pages, 4-way set associative */
+	TLB_ENTRY(  0x4f, TLB_INST_4K,		32	),	/* TLB_INST 4 KByte pages */
+	TLB_ENTRY(  0x50, TLB_INST_ALL,		64	),	/* TLB_INST 4 KByte and 2-MByte or 4-MByte pages */
+	TLB_ENTRY(  0x51, TLB_INST_ALL,		128	),	/* TLB_INST 4 KByte and 2-MByte or 4-MByte pages */
+	TLB_ENTRY(  0x52, TLB_INST_ALL,		256	),	/* TLB_INST 4 KByte and 2-MByte or 4-MByte pages */
+	TLB_ENTRY(  0x55, TLB_INST_2M_4M,	7	),	/* TLB_INST 2-MByte or 4-MByte pages, fully associative */
+	TLB_ENTRY(  0x56, TLB_DATA0_4M,		16	),	/* TLB_DATA0 4 MByte pages, 4-way set associative */
+	TLB_ENTRY(  0x57, TLB_DATA0_4K,		16	),	/* TLB_DATA0 4 KByte pages, 4-way associative */
+	TLB_ENTRY(  0x59, TLB_DATA0_4K,		16	),	/* TLB_DATA0 4 KByte pages, fully associative */
+	TLB_ENTRY(  0x5a, TLB_DATA0_2M_4M,	32	),	/* TLB_DATA0 2-MByte or 4 MByte pages, 4-way set associative */
+	TLB_ENTRY(  0x5b, TLB_DATA_4K_4M,	64	),	/* TLB_DATA 4 KByte and 4 MByte pages */
+	TLB_ENTRY(  0x5c, TLB_DATA_4K_4M,	128	),	/* TLB_DATA 4 KByte and 4 MByte pages */
+	TLB_ENTRY(  0x5d, TLB_DATA_4K_4M,	256	),	/* TLB_DATA 4 KByte and 4 MByte pages */
+	TLB_ENTRY(  0x61, TLB_INST_4K,		48	),	/* TLB_INST 4 KByte pages, full associative */
+	TLB_ENTRY(  0x63, TLB_DATA_1G_2M_4M,	4	),	/* TLB_DATA 1 GByte pages, 4-way set associative
+								 * (plus 32 entries TLB_DATA 2 MByte or 4 MByte pages, not encoded here) */
+	TLB_ENTRY(  0x6b, TLB_DATA_4K,		256	),	/* TLB_DATA 4 KByte pages, 8-way associative */
+	TLB_ENTRY(  0x6c, TLB_DATA_2M_4M,	128	),	/* TLB_DATA 2 MByte or 4 MByte pages, 8-way associative */
+	TLB_ENTRY(  0x6d, TLB_DATA_1G,		16	),	/* TLB_DATA 1 GByte pages, fully associative */
+	TLB_ENTRY(  0x76, TLB_INST_2M_4M,	8	),	/* TLB_INST 2-MByte or 4-MByte pages, fully associative */
+	TLB_ENTRY(  0xb0, TLB_INST_4K,		128	),	/* TLB_INST 4 KByte pages, 4-way set associative */
+	TLB_ENTRY(  0xb1, TLB_INST_2M_4M,	4	),	/* TLB_INST 2M pages, 4-way, 8 entries or 4M pages, 4-way entries */
+	TLB_ENTRY(  0xb2, TLB_INST_4K,		64	),	/* TLB_INST 4KByte pages, 4-way set associative */
+	TLB_ENTRY(  0xb3, TLB_DATA_4K,		128	),	/* TLB_DATA 4 KByte pages, 4-way set associative */
+	TLB_ENTRY(  0xb4, TLB_DATA_4K,		256	),	/* TLB_DATA 4 KByte pages, 4-way associative */
+	TLB_ENTRY(  0xb5, TLB_INST_4K,		64	),	/* TLB_INST 4 KByte pages, 8-way set associative */
+	TLB_ENTRY(  0xb6, TLB_INST_4K,		128	),	/* TLB_INST 4 KByte pages, 8-way set associative */
+	TLB_ENTRY(  0xba, TLB_DATA_4K,		64	),	/* TLB_DATA 4 KByte pages, 4-way associative */
+	TLB_ENTRY(  0xc0, TLB_DATA_4K_4M,	8	),	/* TLB_DATA 4 KByte and 4 MByte pages, 4-way associative */
+	TLB_ENTRY(  0xc1, STLB_4K_2M,		1024	),	/* STLB 4 KByte and 2 MByte pages, 8-way associative */
+	TLB_ENTRY(  0xc2, TLB_DATA_2M_4M,	16	),	/* TLB_DATA 2 MByte/4MByte pages, 4-way associative */
+	TLB_ENTRY(  0xca, STLB_4K,		512	),	/* STLB 4 KByte pages, 4-way associative */
+};
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v1 32/40] x86/cacheinfo: Use consolidated leaf 0x2 descriptor table
  2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
                   ` (30 preceding siblings ...)
  2025-03-04  8:51 ` [PATCH v1 31/40] x86/cpu: Consolidate CPUID leaf 0x2 tables Ahmed S. Darwish
@ 2025-03-04  8:51 ` Ahmed S. Darwish
  2025-03-04  8:51 ` [PATCH v1 33/40] x86/cpu: " Ahmed S. Darwish
                   ` (9 subsequent siblings)
  41 siblings, 0 replies; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-04  8:51 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar, Dave Hansen
  Cc: Thomas Gleixner, John Ogness, H. Peter Anvin, Andrew Cooper, x86,
	x86-cpuid, LKML, Ahmed S. Darwish

cpuid leaf 0x2 output is a stream of one-byte descriptors, each implying
certain details about the CPU's cache and TLB entries.

In previous commits, the mapping tables for such descriptors were merged
into one consolidated table.  The mapping was also transformed into a
hash lookup instead of a loop-based lookup for each descriptor.

Use the new consolidated table and its hash-based lookup through the
for_each_leaf_0x2_entry() accessor.  Remove the old cache-specific
mapping, cache_table[], as it is no longer used.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
---
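Reader's note, not part of the commit message: a rough userspace sketch
of how a call site can accumulate cache sizes from the consolidated
table.  The demo_* names are hypothetical, the table holds only four of
the real descriptors (sizes in KB), and a plain index loop is used in
place of the kernel iterator.

```c
#include <assert.h>

enum demo_cache_type {
	DEMO_NONE,		/* 0: descriptor not known to the table */
	DEMO_L1_INST,
	DEMO_L1_DATA,
	DEMO_L2,
	DEMO_L3,
};

struct demo_entry {
	unsigned char	type;
	short		size_kb;
};

static const struct demo_entry demo_table[256] = {
	[0x21] = { DEMO_L2,		256	},
	[0x2c] = { DEMO_L1_DATA,	32	},
	[0x30] = { DEMO_L1_INST,	32	},
	[0x4d] = { DEMO_L3,		16384	},	/* 16M in KB */
};

struct demo_totals {
	unsigned int l1i, l1d, l2, l3;	/* all in KB */
};

/* Sum the cache sizes implied by the 16 leaf 0x2 output bytes */
static inline struct demo_totals demo_sum_caches(const unsigned char desc[16])
{
	struct demo_totals t = { 0 };

	/* Byte 0 is skipped, as it is not a descriptor */
	for (int i = 1; i < 16; i++) {
		const struct demo_entry *e = &demo_table[desc[i]];

		switch (e->type) {
		case DEMO_L1_INST:	t.l1i += e->size_kb; break;
		case DEMO_L1_DATA:	t.l1d += e->size_kb; break;
		case DEMO_L2:		t.l2  += e->size_kb; break;
		case DEMO_L3:		t.l3  += e->size_kb; break;
		default:		break;	/* unknown or non-cache */
		}
	}

	return t;
}
```

The NULL check of the old loop-based lookup disappears: unknown
descriptors land on a zero-filled slot and fall through to the default
case.
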
 arch/x86/include/asm/cpuid/types.h |   2 +-
 arch/x86/kernel/cpu/cacheinfo.c    | 114 ++---------------------------
 2 files changed, 9 insertions(+), 107 deletions(-)

diff --git a/arch/x86/include/asm/cpuid/types.h b/arch/x86/include/asm/cpuid/types.h
index c23f187060aa..4af9f6c32895 100644
--- a/arch/x86/include/asm/cpuid/types.h
+++ b/arch/x86/include/asm/cpuid/types.h
@@ -80,7 +80,7 @@ static inline void get_leaf_0x2_regs(union leaf_0x2_regs *regs)
 
 /*
  * Leaf 0x2 1-byte descriptors' cache types
- * To be used for their mappings at cache_table[]
+ * To be used for their mappings at cpuid_0x2_table[].
  *
  * Start at 1 since type 0 is reserved for HW byte descriptors which are
  * not recognized by the kernel; i.e., those without an explicit mapping
diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index a7fccbab268d..a1cfb6716272 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -34,90 +34,6 @@ static cpumask_var_t cpu_cacheinfo_mask;
 /* Kernel controls MTRR and/or PAT MSRs. */
 unsigned int memory_caching_control __ro_after_init;
 
-struct _cache_table {
-	unsigned char descriptor;
-	enum _cache_table_type type;
-	short size;
-};
-
-#define MB(x)	((x) * 1024)
-
-/* All the cache descriptor types we care about (no TLB or
-   trace cache entries) */
-
-static const struct _cache_table cache_table[] =
-{
-	{ 0x06, CACHE_L1_INST,	8	},	/* 4-way set assoc, 32 byte line size */
-	{ 0x08, CACHE_L1_INST,	16	},	/* 4-way set assoc, 32 byte line size */
-	{ 0x09, CACHE_L1_INST,	32	},	/* 4-way set assoc, 64 byte line size */
-	{ 0x0a, CACHE_L1_DATA,	8	},	/* 2 way set assoc, 32 byte line size */
-	{ 0x0c, CACHE_L1_DATA,	16	},	/* 4-way set assoc, 32 byte line size */
-	{ 0x0d, CACHE_L1_DATA,	16	},	/* 4-way set assoc, 64 byte line size */
-	{ 0x0e, CACHE_L1_DATA,	24	},	/* 6-way set assoc, 64 byte line size */
-	{ 0x21, CACHE_L2,	256	},	/* 8-way set assoc, 64 byte line size */
-	{ 0x22, CACHE_L3,	512	},	/* 4-way set assoc, sectored cache, 64 byte line size */
-	{ 0x23, CACHE_L3,	MB(1)	},	/* 8-way set assoc, sectored cache, 64 byte line size */
-	{ 0x25, CACHE_L3,	MB(2)	},	/* 8-way set assoc, sectored cache, 64 byte line size */
-	{ 0x29, CACHE_L3,	MB(4)	},	/* 8-way set assoc, sectored cache, 64 byte line size */
-	{ 0x2c, CACHE_L1_DATA,	32	},	/* 8-way set assoc, 64 byte line size */
-	{ 0x30, CACHE_L1_INST,	32	},	/* 8-way set assoc, 64 byte line size */
-	{ 0x39, CACHE_L2,	128	},	/* 4-way set assoc, sectored cache, 64 byte line size */
-	{ 0x3a, CACHE_L2,	192	},	/* 6-way set assoc, sectored cache, 64 byte line size */
-	{ 0x3b, CACHE_L2,	128	},	/* 2-way set assoc, sectored cache, 64 byte line size */
-	{ 0x3c, CACHE_L2,	256	},	/* 4-way set assoc, sectored cache, 64 byte line size */
-	{ 0x3d, CACHE_L2,	384	},	/* 6-way set assoc, sectored cache, 64 byte line size */
-	{ 0x3e, CACHE_L2,	512	},	/* 4-way set assoc, sectored cache, 64 byte line size */
-	{ 0x3f, CACHE_L2,	256	},	/* 2-way set assoc, 64 byte line size */
-	{ 0x41, CACHE_L2,	128	},	/* 4-way set assoc, 32 byte line size */
-	{ 0x42, CACHE_L2,	256	},	/* 4-way set assoc, 32 byte line size */
-	{ 0x43, CACHE_L2,	512	},	/* 4-way set assoc, 32 byte line size */
-	{ 0x44, CACHE_L2,	MB(1)	},	/* 4-way set assoc, 32 byte line size */
-	{ 0x45, CACHE_L2,	MB(2)	},	/* 4-way set assoc, 32 byte line size */
-	{ 0x46, CACHE_L3,	MB(4)	},	/* 4-way set assoc, 64 byte line size */
-	{ 0x47, CACHE_L3,	MB(8)	},	/* 8-way set assoc, 64 byte line size */
-	{ 0x48, CACHE_L2,	MB(3)	},	/* 12-way set assoc, 64 byte line size */
-	{ 0x49, CACHE_L3,	MB(4)	},	/* 16-way set assoc, 64 byte line size */
-	{ 0x4a, CACHE_L3,	MB(6)	},	/* 12-way set assoc, 64 byte line size */
-	{ 0x4b, CACHE_L3,	MB(8)	},	/* 16-way set assoc, 64 byte line size */
-	{ 0x4c, CACHE_L3,	MB(12)	},	/* 12-way set assoc, 64 byte line size */
-	{ 0x4d, CACHE_L3,	MB(16)	},	/* 16-way set assoc, 64 byte line size */
-	{ 0x4e, CACHE_L2,	MB(6)	},	/* 24-way set assoc, 64 byte line size */
-	{ 0x60, CACHE_L1_DATA,	16	},	/* 8-way set assoc, sectored cache, 64 byte line size */
-	{ 0x66, CACHE_L1_DATA,	8	},	/* 4-way set assoc, sectored cache, 64 byte line size */
-	{ 0x67, CACHE_L1_DATA,	16	},	/* 4-way set assoc, sectored cache, 64 byte line size */
-	{ 0x68, CACHE_L1_DATA,	32	},	/* 4-way set assoc, sectored cache, 64 byte line size */
-	{ 0x78, CACHE_L2,	MB(1)	},	/* 4-way set assoc, 64 byte line size */
-	{ 0x79, CACHE_L2,	128	},	/* 8-way set assoc, sectored cache, 64 byte line size */
-	{ 0x7a, CACHE_L2,	256	},	/* 8-way set assoc, sectored cache, 64 byte line size */
-	{ 0x7b, CACHE_L2,	512	},	/* 8-way set assoc, sectored cache, 64 byte line size */
-	{ 0x7c, CACHE_L2,	MB(1)	},	/* 8-way set assoc, sectored cache, 64 byte line size */
-	{ 0x7d, CACHE_L2,	MB(2)	},	/* 8-way set assoc, 64 byte line size */
-	{ 0x7f, CACHE_L2,	512	},	/* 2-way set assoc, 64 byte line size */
-	{ 0x80, CACHE_L2,	512	},	/* 8-way set assoc, 64 byte line size */
-	{ 0x82, CACHE_L2,	256	},	/* 8-way set assoc, 32 byte line size */
-	{ 0x83, CACHE_L2,	512	},	/* 8-way set assoc, 32 byte line size */
-	{ 0x84, CACHE_L2,	MB(1)	},	/* 8-way set assoc, 32 byte line size */
-	{ 0x85, CACHE_L2,	MB(2)	},	/* 8-way set assoc, 32 byte line size */
-	{ 0x86, CACHE_L2,	512	},	/* 4-way set assoc, 64 byte line size */
-	{ 0x87, CACHE_L2,	MB(1)	},	/* 8-way set assoc, 64 byte line size */
-	{ 0xd0, CACHE_L3,	512	},	/* 4-way set assoc, 64 byte line size */
-	{ 0xd1, CACHE_L3,	MB(1)	},	/* 4-way set assoc, 64 byte line size */
-	{ 0xd2, CACHE_L3,	MB(2)	},	/* 4-way set assoc, 64 byte line size */
-	{ 0xd6, CACHE_L3,	MB(1)	},	/* 8-way set assoc, 64 byte line size */
-	{ 0xd7, CACHE_L3,	MB(2)	},	/* 8-way set assoc, 64 byte line size */
-	{ 0xd8, CACHE_L3,	MB(4)	},	/* 12-way set assoc, 64 byte line size */
-	{ 0xdc, CACHE_L3,	MB(2)	},	/* 12-way set assoc, 64 byte line size */
-	{ 0xdd, CACHE_L3,	MB(4)	},	/* 12-way set assoc, 64 byte line size */
-	{ 0xde, CACHE_L3,	MB(8)	},	/* 12-way set assoc, 64 byte line size */
-	{ 0xe2, CACHE_L3,	MB(2)	},	/* 16-way set assoc, 64 byte line size */
-	{ 0xe3, CACHE_L3,	MB(4)	},	/* 16-way set assoc, 64 byte line size */
-	{ 0xe4, CACHE_L3,	MB(8)	},	/* 16-way set assoc, 64 byte line size */
-	{ 0xea, CACHE_L3,	MB(12)	},	/* 24-way set assoc, 64 byte line size */
-	{ 0xeb, CACHE_L3,	MB(18)	},	/* 24-way set assoc, 64 byte line size */
-	{ 0xec, CACHE_L3,	MB(24)	},	/* 24-way set assoc, 64 byte line size */
-};
-
-
 enum _cache_type {
 	CTYPE_NULL = 0,
 	CTYPE_DATA = 1,
@@ -436,16 +352,6 @@ void init_hygon_cacheinfo(struct cpuinfo_x86 *c)
 	ci->num_leaves = find_num_cache_leaves(c);
 }
 
-static const struct _cache_table *cache_table_get(u8 desc)
-{
-	for (int i = 0; i < ARRAY_SIZE(cache_table); i++) {
-		if (cache_table[i].descriptor == desc)
-			return &cache_table[i];
-	}
-
-	return NULL;
-}
-
 void init_intel_cacheinfo(struct cpuinfo_x86 *c)
 {
 	/* Cache sizes */
@@ -502,21 +408,17 @@ void init_intel_cacheinfo(struct cpuinfo_x86 *c)
 
 	/* Don't use CPUID(2) if CPUID(4) is supported. */
 	if (!ci->num_leaves && c->cpuid_level > 1) {
-		const struct _cache_table *entry;
+		const struct leaf_0x2_table *entry;
 		union leaf_0x2_regs regs;
-		u8 *desc;
+		u8 *ptr;
 
 		get_leaf_0x2_regs(&regs);
-		for_each_leaf_0x2_desc(regs, desc) {
-			entry = cache_table_get(*desc);
-			if (!entry)
-				continue;
-
-			switch (entry->type) {
-			case CACHE_L1_INST:	l1i += entry->size; break;
-			case CACHE_L1_DATA:	l1d += entry->size; break;
-			case CACHE_L2:		l2  += entry->size; break;
-			case CACHE_L3:		l3  += entry->size; break;
+		for_each_leaf_0x2_entry(regs, ptr, entry) {
+			switch (entry->c_type) {
+			case CACHE_L1_INST:	l1i += entry->c_size; break;
+			case CACHE_L1_DATA:	l1d += entry->c_size; break;
+			case CACHE_L2:		l2  += entry->c_size; break;
+			case CACHE_L3:		l3  += entry->c_size; break;
 			}
 		}
 	}
-- 
2.48.1



* [PATCH v1 33/40] x86/cpu: Use consolidated leaf 0x2 descriptor table
  2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
                   ` (31 preceding siblings ...)
  2025-03-04  8:51 ` [PATCH v1 32/40] x86/cacheinfo: Use consolidated leaf 0x2 descriptor table Ahmed S. Darwish
@ 2025-03-04  8:51 ` Ahmed S. Darwish
  2025-03-04  8:51 ` [PATCH v1 34/40] x86/cacheinfo: Separate leaf 0x2 handling and post-processing logic Ahmed S. Darwish
                   ` (8 subsequent siblings)
  41 siblings, 0 replies; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-04  8:51 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar, Dave Hansen
  Cc: Thomas Gleixner, John Ogness, H. Peter Anvin, Andrew Cooper, x86,
	x86-cpuid, LKML, Ahmed S. Darwish

cpuid leaf 0x2 output is a stream of one-byte descriptors, each implying
certain details about the CPU's cache and TLB entries.

In previous commits, the mapping tables for such descriptors were merged
into one consolidated table.  The mapping was also transformed into a
hash lookup instead of a loop-based lookup for each descriptor.

Use the new consolidated table and its hash-based lookup through the
for_each_leaf_0x2_entry() accessor.
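
A minimal userspace sketch of the idea, not the kernel code: the
descriptor byte directly indexes a 256-entry table, so no per-descriptor
search loop is needed.  The struct layout and the names tlb_entry,
tlb_table[] and max_tlb_entries() are illustrative assumptions; the three
sample descriptors are taken from the removed intel_tlb_table[].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel's types. */
enum tlb_type { TLB_NONE = 0, TLB_INST_4K, TLB_DATA_4K };

struct tlb_entry {
	enum tlb_type type;
	short entries;
};

/* Indexed directly by descriptor byte; unlisted descriptors stay zeroed. */
static const struct tlb_entry tlb_table[256] = {
	[0x01] = { TLB_INST_4K,  32 },
	[0x03] = { TLB_DATA_4K,  64 },
	[0xb0] = { TLB_INST_4K, 128 },
};

/*
 * Return the largest entry count of @type among the 16 leaf 0x2 bytes.
 * Byte 0 is skipped, as it is not a descriptor.
 */
static short max_tlb_entries(const uint8_t desc[16], enum tlb_type type)
{
	short max = 0;

	assert(desc != NULL);
	for (size_t i = 1; i < 16; i++) {
		const struct tlb_entry *e = &tlb_table[desc[i]];

		if (e->type == type && e->entries > max)
			max = e->entries;
	}
	return max;
}
```

Unknown descriptors land on a zeroed slot (TLB_NONE here), so the caller
needs no separate "not found" branch.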

Remove the TLB-specific mapping, intel_tlb_table[], as it is no longer
used.  Remove the cpuid/types.h macro, for_each_leaf_0x2_desc(), since
the converted code was its last user.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
---
 arch/x86/include/asm/cpuid/types.h | 33 ++++--------
 arch/x86/kernel/cpu/intel.c        | 83 +++---------------------------
 2 files changed, 17 insertions(+), 99 deletions(-)

diff --git a/arch/x86/include/asm/cpuid/types.h b/arch/x86/include/asm/cpuid/types.h
index 4af9f6c32895..4d4ab8fc4758 100644
--- a/arch/x86/include/asm/cpuid/types.h
+++ b/arch/x86/include/asm/cpuid/types.h
@@ -56,28 +56,6 @@ static inline void get_leaf_0x2_regs(union leaf_0x2_regs *regs)
 	}
 }
 
-/**
- * for_each_leaf_0x2_desc() - Iterator for leaf 0x2 descriptors
- * @regs:	Leaf 0x2 register output, as returned by get_leaf_0x2_regs()
- * @desc:	Pointer to the returned descriptor for each iteration
- *
- * Loop over the 1-byte descriptors in the passed leaf 0x2 output registers
- * @regs.  Provide each descriptor through @desc.
- *
- * Sample usage::
- *
- *	union leaf_0x2_regs regs;
- *	u8 *desc;
- *
- *	get_leaf_0x2_regs(&regs);
- *	for_each_leaf_0x2_desc(regs, desc) {
- *		// Handle *desc value
- *	}
- */
-#define for_each_leaf_0x2_desc(regs, desc)				\
-	/* Skip the first byte as it is not a descriptor */		\
-	for (desc = &(regs).desc[1]; desc < &(regs).desc[16]; desc++)
-
 /*
  * Leaf 0x2 1-byte descriptors' cache types
  * To be used for their mappings at cpuid_0x2_table[].
@@ -103,7 +81,7 @@ static_assert(sizeof(enum _cache_table_type) == 1);
 
 /*
  * Leaf 0x2 1-byte descriptors' TLB types
- * To be used for their mappings at intel_tlb_table[]
+ * To be used for their mappings at cpuid_0x2_table[]
  */
 enum _tlb_table_type {
 	TLB_INST_4K	= __TLB_TABLE_TYPE_BEGIN,
@@ -174,4 +152,13 @@ extern const struct leaf_0x2_table cpuid_0x2_table[256];
 	     __ptr < &(regs).desc[16];						\
 	     __ptr++, entry = &cpuid_0x2_table[*__ptr])
 
+/*
+ * All of leaf 0x2's one-byte TLB descriptors imply the same number of entries
+ * for their respective TLB types.  TLB descriptor 0x63 is an exception: it
+ * implies 4 dTLB entries for 1GB pages and 32 dTLB entries for 2MB or 4MB pages.
+ * Encode that descriptor's dTLB entry count for 2MB/4MB pages here, as the entry
+ * count for dTLB 1GB pages is already encoded at the cpuid_0x2_table[]'s mapping.
+ */
+#define TLB_0x63_2M_4M_ENTRIES	32
+
 #endif /* _ASM_X86_CPUID_TYPES_H */
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 884cd1b1e4ff..76be957196ef 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -626,81 +626,11 @@ static unsigned int intel_size_cache(struct cpuinfo_x86 *c, unsigned int size)
 }
 #endif
 
-/*
- * All of leaf 0x2's one-byte TLB descriptors implies the same number of
- * entries for their respective TLB types.  The 0x63 descriptor is an
- * exception: it implies 4 dTLB entries for 1GB pages 32 dTLB entries
- * for 2MB or 4MB pages.  Encode descriptor 0x63 dTLB entry count for
- * 2MB/4MB pages here, as its count for dTLB 1GB pages is already at the
- * intel_tlb_table[] mapping.
- */
-#define TLB_0x63_2M_4M_ENTRIES	32
-
-struct _tlb_table {
-	unsigned char descriptor;
-	enum _tlb_table_type type;
-	unsigned int entries;
-};
-
-static const struct _tlb_table intel_tlb_table[] = {
-	{ 0x01, TLB_INST_4K,		32},	/* TLB_INST 4 KByte pages, 4-way set associative */
-	{ 0x02, TLB_INST_4M,		2},	/* TLB_INST 4 MByte pages, full associative */
-	{ 0x03, TLB_DATA_4K,		64},	/* TLB_DATA 4 KByte pages, 4-way set associative */
-	{ 0x04, TLB_DATA_4M,		8},	/* TLB_DATA 4 MByte pages, 4-way set associative */
-	{ 0x05, TLB_DATA_4M,		32},	/* TLB_DATA 4 MByte pages, 4-way set associative */
-	{ 0x0b, TLB_INST_4M,		4},	/* TLB_INST 4 MByte pages, 4-way set associative */
-	{ 0x4f, TLB_INST_4K,		32},	/* TLB_INST 4 KByte pages */
-	{ 0x50, TLB_INST_ALL,		64},	/* TLB_INST 4 KByte and 2-MByte or 4-MByte pages */
-	{ 0x51, TLB_INST_ALL,		128},	/* TLB_INST 4 KByte and 2-MByte or 4-MByte pages */
-	{ 0x52, TLB_INST_ALL,		256},	/* TLB_INST 4 KByte and 2-MByte or 4-MByte pages */
-	{ 0x55, TLB_INST_2M_4M,		7},	/* TLB_INST 2-MByte or 4-MByte pages, fully associative */
-	{ 0x56, TLB_DATA0_4M,		16},	/* TLB_DATA0 4 MByte pages, 4-way set associative */
-	{ 0x57, TLB_DATA0_4K,		16},	/* TLB_DATA0 4 KByte pages, 4-way associative */
-	{ 0x59, TLB_DATA0_4K,		16},	/* TLB_DATA0 4 KByte pages, fully associative */
-	{ 0x5a, TLB_DATA0_2M_4M,	32},	/* TLB_DATA0 2-MByte or 4 MByte pages, 4-way set associative */
-	{ 0x5b, TLB_DATA_4K_4M,		64},	/* TLB_DATA 4 KByte and 4 MByte pages */
-	{ 0x5c, TLB_DATA_4K_4M,		128},	/* TLB_DATA 4 KByte and 4 MByte pages */
-	{ 0x5d, TLB_DATA_4K_4M,		256},	/* TLB_DATA 4 KByte and 4 MByte pages */
-	{ 0x61, TLB_INST_4K,		48},	/* TLB_INST 4 KByte pages, full associative */
-	{ 0x63, TLB_DATA_1G_2M_4M,	4},	/* TLB_DATA 1 GByte pages, 4-way set associative
-						 * (plus 32 entries TLB_DATA 2 MByte or 4 MByte pages, not encoded here) */
-	{ 0x6b, TLB_DATA_4K,		256},	/* TLB_DATA 4 KByte pages, 8-way associative */
-	{ 0x6c, TLB_DATA_2M_4M,		128},	/* TLB_DATA 2 MByte or 4 MByte pages, 8-way associative */
-	{ 0x6d, TLB_DATA_1G,		16},	/* TLB_DATA 1 GByte pages, fully associative */
-	{ 0x76, TLB_INST_2M_4M,		8},	/* TLB_INST 2-MByte or 4-MByte pages, fully associative */
-	{ 0xb0, TLB_INST_4K,		128},	/* TLB_INST 4 KByte pages, 4-way set associative */
-	{ 0xb1, TLB_INST_2M_4M,		4},	/* TLB_INST 2M pages, 4-way, 8 entries or 4M pages, 4-way entries */
-	{ 0xb2, TLB_INST_4K,		64},	/* TLB_INST 4KByte pages, 4-way set associative */
-	{ 0xb3, TLB_DATA_4K,		128},	/* TLB_DATA 4 KByte pages, 4-way set associative */
-	{ 0xb4, TLB_DATA_4K,		256},	/* TLB_DATA 4 KByte pages, 4-way associative */
-	{ 0xb5, TLB_INST_4K,		64},	/* TLB_INST 4 KByte pages, 8-way set associative */
-	{ 0xb6, TLB_INST_4K,		128},	/* TLB_INST 4 KByte pages, 8-way set associative */
-	{ 0xba, TLB_DATA_4K,		64},	/* TLB_DATA 4 KByte pages, 4-way associative */
-	{ 0xc0, TLB_DATA_4K_4M,		8},	/* TLB_DATA 4 KByte and 4 MByte pages, 4-way associative */
-	{ 0xc1, STLB_4K_2M,		1024},	/* STLB 4 KByte and 2 MByte pages, 8-way associative */
-	{ 0xc2, TLB_DATA_2M_4M,		16},	/* TLB_DATA 2 MByte/4MByte pages, 4-way associative */
-	{ 0xca, STLB_4K,		512},	/* STLB 4 KByte pages, 4-way associative */
-	{ 0x00, 0, 0 }
-};
-
-static void intel_tlb_lookup(const unsigned char desc)
+static void intel_tlb_lookup(const struct leaf_0x2_table *entry)
 {
-	unsigned int entries;
-	unsigned char k;
-
-	if (desc == 0)
-		return;
-
-	/* look up this descriptor in the table */
-	for (k = 0; intel_tlb_table[k].descriptor != desc &&
-	     intel_tlb_table[k].descriptor != 0; k++)
-		;
-
-	if (intel_tlb_table[k].type == 0)
-		return;
+	short entries = entry->entries;
 
-	entries = intel_tlb_table[k].entries;
-	switch (intel_tlb_table[k].type) {
+	switch (entry->t_type) {
 	case STLB_4K:
 		tlb_lli_4k = max(tlb_lli_4k, entries);
 		tlb_lld_4k = max(tlb_lld_4k, entries);
@@ -757,15 +687,16 @@ static void intel_tlb_lookup(const unsigned char desc)
 
 static void intel_detect_tlb(struct cpuinfo_x86 *c)
 {
+	const struct leaf_0x2_table *entry;
 	union leaf_0x2_regs regs;
-	u8 *desc;
+	u8 *ptr;
 
 	if (c->cpuid_level < 2)
 		return;
 
 	get_leaf_0x2_regs(&regs);
-	for_each_leaf_0x2_desc(regs, desc)
-		intel_tlb_lookup(*desc);
+	for_each_leaf_0x2_entry(regs, ptr, entry)
+		intel_tlb_lookup(entry);
 }
 
 static const struct cpu_dev intel_cpu_dev = {
-- 
2.48.1



* [PATCH v1 34/40] x86/cacheinfo: Separate leaf 0x2 handling and post-processing logic
  2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
                   ` (32 preceding siblings ...)
  2025-03-04  8:51 ` [PATCH v1 33/40] x86/cpu: " Ahmed S. Darwish
@ 2025-03-04  8:51 ` Ahmed S. Darwish
  2025-03-04  8:51 ` [PATCH v1 35/40] x86/cacheinfo: Separate intel leaf 0x4 handling Ahmed S. Darwish
                   ` (7 subsequent siblings)
  41 siblings, 0 replies; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-04  8:51 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar, Dave Hansen
  Cc: Thomas Gleixner, John Ogness, H. Peter Anvin, Andrew Cooper, x86,
	x86-cpuid, LKML, Ahmed S. Darwish

The logic of init_intel_cacheinfo() is quite convoluted: it mixes leaf
0x4 parsing, leaf 0x2 parsing, plus some post-processing, in a single
place.

Begin simplifying its logic by extracting the leaf 0x2 parsing code, and
the post-processing logic, into their own functions.  While at it,
rework the SMT LLC topology ID comment for clarity.
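
As a rough userspace illustration of the post-processing step that is
being split out, the sketch below picks the reported cache size from the
per-level sizes, using the same precedence as the expression in the
diff: L3, else L2, else L1I + L1D.  The struct cache_sizes type and the
function name effective_cache_size() are assumptions made for this
example only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for per-level cache sizes, in KB. */
struct cache_sizes {
	unsigned int l1i;
	unsigned int l1d;
	unsigned int l2;
	unsigned int l3;
};

/* Mirrors: l3 ? l3 : (l2 ? l2 : l1i + l1d) */
static unsigned int effective_cache_size(const struct cache_sizes *s)
{
	assert(s != NULL);

	if (s->l3)
		return s->l3;
	if (s->l2)
		return s->l2;
	return s->l1i + s->l1d;
}
```

Keeping this step in one helper lets both the leaf 0x2 and the leaf 0x4
parsers end with the same call.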

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
---
 arch/x86/kernel/cpu/cacheinfo.c | 106 +++++++++++++++++---------------
 1 file changed, 58 insertions(+), 48 deletions(-)

diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index a1cfb6716272..a15538d72432 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -352,14 +352,56 @@ void init_hygon_cacheinfo(struct cpuinfo_x86 *c)
 	ci->num_leaves = find_num_cache_leaves(c);
 }
 
-void init_intel_cacheinfo(struct cpuinfo_x86 *c)
+static void intel_cacheinfo_done(struct cpuinfo_x86 *c, unsigned int l3,
+				 unsigned int l2, unsigned int l1i, unsigned int l1d)
+{
+	/*
+	 * If llc_id is still unset, then cpuid_level < 4, which implies
+	 * that the only possibility left is SMT.  Since CPUID(2) doesn't
+	 * specify any shared caches and SMT shares all caches, we can
+	 * unconditionally set LLC ID to the package ID so that all
+	 * threads share it.
+	 */
+	if (c->topo.llc_id == BAD_APICID)
+		c->topo.llc_id = c->topo.pkg_id;
+
+	c->x86_cache_size = l3 ? l3 : (l2 ? l2 : l1i + l1d);
+
+	if (!l2)
+		cpu_detect_cache_sizes(c);
+}
+
+/*
+ * Legacy Intel CPUID(2) path if CPUID(4) is not available.
+ */
+static void intel_cacheinfo_0x2(struct cpuinfo_x86 *c)
 {
-	/* Cache sizes */
 	unsigned int l1i = 0, l1d = 0, l2 = 0, l3 = 0;
-	unsigned int new_l1d = 0, new_l1i = 0; /* Cache sizes from cpuid(4) */
-	unsigned int new_l2 = 0, new_l3 = 0, i; /* Cache sizes from cpuid(4) */
-	unsigned int l2_id = 0, l3_id = 0, num_threads_sharing, index_msb;
+	const struct leaf_0x2_table *entry;
+	union leaf_0x2_regs regs;
+	u8 *ptr;
+
+	if (c->cpuid_level < 2)
+		return;
+
+	get_leaf_0x2_regs(&regs);
+	for_each_leaf_0x2_entry(regs, ptr, entry) {
+		switch (entry->c_type) {
+		case CACHE_L1_INST:	l1i += entry->c_size; break;
+		case CACHE_L1_DATA:	l1d += entry->c_size; break;
+		case CACHE_L2:		l2  += entry->c_size; break;
+		case CACHE_L3:		l3  += entry->c_size; break;
+		}
+	}
+
+	intel_cacheinfo_done(c, l3, l2, l1i, l1d);
+}
+
+void init_intel_cacheinfo(struct cpuinfo_x86 *c)
+{
 	struct cpu_cacheinfo *ci = get_cpu_cacheinfo(c->cpu_index);
+	unsigned int l1i = 0, l1d = 0, l2 = 0, l3 = 0;
+	unsigned int l2_id = 0, l3_id = 0;
 
 	if (c->cpuid_level > 3) {
 		/*
@@ -373,7 +415,8 @@ void init_intel_cacheinfo(struct cpuinfo_x86 *c)
 		 * Whenever possible use cpuid(4), deterministic cache
 		 * parameters cpuid leaf to find the cache details
 		 */
-		for (i = 0; i < ci->num_leaves; i++) {
+		for (int i = 0; i < ci->num_leaves; i++) {
+			unsigned int num_threads_sharing, index_msb;
 			struct _cpuid4_info id4 = {};
 			int retval;
 
@@ -384,18 +427,18 @@ void init_intel_cacheinfo(struct cpuinfo_x86 *c)
 			switch (id4.eax.split.level) {
 			case 1:
 				if (id4.eax.split.type == CTYPE_DATA)
-					new_l1d = id4.size/1024;
+					l1d = id4.size / 1024;
 				else if (id4.eax.split.type == CTYPE_INST)
-					new_l1i = id4.size/1024;
+					l1i = id4.size / 1024;
 				break;
 			case 2:
-				new_l2 = id4.size/1024;
+				l2 = id4.size / 1024;
 				num_threads_sharing = 1 + id4.eax.split.num_threads_sharing;
 				index_msb = get_count_order(num_threads_sharing);
 				l2_id = c->topo.apicid & ~((1 << index_msb) - 1);
 				break;
 			case 3:
-				new_l3 = id4.size/1024;
+				l3 = id4.size / 1024;
 				num_threads_sharing = 1 + id4.eax.split.num_threads_sharing;
 				index_msb = get_count_order(num_threads_sharing);
 				l3_id = c->topo.apicid & ~((1 << index_msb) - 1);
@@ -408,52 +451,19 @@ void init_intel_cacheinfo(struct cpuinfo_x86 *c)
 
 	/* Don't use CPUID(2) if CPUID(4) is supported. */
 	if (!ci->num_leaves && c->cpuid_level > 1) {
-		const struct leaf_0x2_table *entry;
-		union leaf_0x2_regs regs;
-		u8 *ptr;
-
-		get_leaf_0x2_regs(&regs);
-		for_each_leaf_0x2_entry(regs, ptr, entry) {
-			switch (entry->c_type) {
-			case CACHE_L1_INST:	l1i += entry->c_size; break;
-			case CACHE_L1_DATA:	l1d += entry->c_size; break;
-			case CACHE_L2:		l2  += entry->c_size; break;
-			case CACHE_L3:		l3  += entry->c_size; break;
-			}
-		}
+		intel_cacheinfo_0x2(c);
+		return;
 	}
 
-	if (new_l1d)
-		l1d = new_l1d;
-
-	if (new_l1i)
-		l1i = new_l1i;
-
-	if (new_l2) {
-		l2 = new_l2;
+	if (l2) {
 		c->topo.llc_id = l2_id;
 		c->topo.l2c_id = l2_id;
 	}
 
-	if (new_l3) {
-		l3 = new_l3;
+	if (l3)
 		c->topo.llc_id = l3_id;
-	}
 
-	/*
-	 * If llc_id is not yet set, this means cpuid_level < 4 which in
-	 * turns means that the only possibility is SMT (as indicated in
-	 * cpuid1). Since cpuid2 doesn't specify shared caches, and we know
-	 * that SMT shares all caches, we can unconditionally set cpu_llc_id to
-	 * c->topo.pkg_id.
-	 */
-	if (c->topo.llc_id == BAD_APICID)
-		c->topo.llc_id = c->topo.pkg_id;
-
-	c->x86_cache_size = l3 ? l3 : (l2 ? l2 : (l1i+l1d));
-
-	if (!l2)
-		cpu_detect_cache_sizes(c);
+	intel_cacheinfo_done(c, l3, l2, l1i, l1d);
 }
 
 static int __cache_amd_cpumap_setup(unsigned int cpu, int index,
-- 
2.48.1



* [PATCH v1 35/40] x86/cacheinfo: Separate intel leaf 0x4 handling
  2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
                   ` (33 preceding siblings ...)
  2025-03-04  8:51 ` [PATCH v1 34/40] x86/cacheinfo: Separate leaf 0x2 handling and post-processing logic Ahmed S. Darwish
@ 2025-03-04  8:51 ` Ahmed S. Darwish
  2025-03-04  8:51 ` [PATCH v1 36/40] x86/cacheinfo: Extract out cache level topology ID calculation Ahmed S. Darwish
                   ` (6 subsequent siblings)
  41 siblings, 0 replies; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-04  8:51 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar, Dave Hansen
  Cc: Thomas Gleixner, John Ogness, H. Peter Anvin, Andrew Cooper, x86,
	x86-cpuid, LKML, Ahmed S. Darwish

init_intel_cacheinfo() was overly complex.  It parsed leaf 0x4 data,
leaf 0x2 data, and performed post-processing, all within one function.
The parent commit moved leaf 0x2 parsing and the post-processing logic into
their own functions.

Continue the refactoring by extracting leaf 0x4 parsing into its own
function.  Initialize local L2/L3 topology ID variables to BAD_APICID by
default, thus ensuring they can be used unconditionally.
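
A small sketch of why the sentinel default allows unconditional use: if
no L3 leaf was seen, l3_id is still the sentinel and the LLC ID falls
back to the L2 ID.  NO_ID is a hypothetical stand-in for the kernel's
BAD_APICID, and struct topo_ids and set_topo_ids() are names invented
for this example.

```c
#include <assert.h>
#include <stddef.h>

#define NO_ID	0xFFFFu	/* hypothetical stand-in for BAD_APICID */

/* Hypothetical subset of the per-CPU topology fields. */
struct topo_ids {
	unsigned int l2c_id;
	unsigned int llc_id;
};

/* The LLC is the L3 if one was found, otherwise the L2. */
static void set_topo_ids(struct topo_ids *t, unsigned int l2_id,
			 unsigned int l3_id)
{
	assert(t != NULL);

	t->l2c_id = l2_id;
	t->llc_id = (l3_id == NO_ID) ? l2_id : l3_id;
}
```

If neither level was found, llc_id stays at the sentinel, which is the
case intel_cacheinfo_done() then handles by falling back to the package
ID.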

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
---
 arch/x86/kernel/cpu/cacheinfo.c | 110 ++++++++++++++++----------------
 1 file changed, 54 insertions(+), 56 deletions(-)

diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index a15538d72432..7bd3c33b7f04 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -397,73 +397,71 @@ static void intel_cacheinfo_0x2(struct cpuinfo_x86 *c)
 	intel_cacheinfo_done(c, l3, l2, l1i, l1d);
 }
 
-void init_intel_cacheinfo(struct cpuinfo_x86 *c)
+static bool intel_cacheinfo_0x4(struct cpuinfo_x86 *c)
 {
 	struct cpu_cacheinfo *ci = get_cpu_cacheinfo(c->cpu_index);
-	unsigned int l1i = 0, l1d = 0, l2 = 0, l3 = 0;
-	unsigned int l2_id = 0, l3_id = 0;
-
-	if (c->cpuid_level > 3) {
-		/*
-		 * There should be at least one leaf. A non-zero value means
-		 * that the number of leaves has been initialized.
-		 */
-		if (!ci->num_leaves)
-			ci->num_leaves = find_num_cache_leaves(c);
+	unsigned int l2_id = BAD_APICID, l3_id = BAD_APICID;
+	unsigned int l1d = 0, l1i = 0, l2 = 0, l3 = 0;
 
-		/*
-		 * Whenever possible use cpuid(4), deterministic cache
-		 * parameters cpuid leaf to find the cache details
-		 */
-		for (int i = 0; i < ci->num_leaves; i++) {
-			unsigned int num_threads_sharing, index_msb;
-			struct _cpuid4_info id4 = {};
-			int retval;
+	if (c->cpuid_level < 4)
+		return false;
 
-			retval = intel_fill_cpuid4_info(i, &id4);
-			if (retval < 0)
-				continue;
+	/*
+	 * There should be at least one leaf. A non-zero value means
+	 * that the number of leaves has been previously initialized.
+	 */
+	if (!ci->num_leaves)
+		ci->num_leaves = find_num_cache_leaves(c);
 
-			switch (id4.eax.split.level) {
-			case 1:
-				if (id4.eax.split.type == CTYPE_DATA)
-					l1d = id4.size / 1024;
-				else if (id4.eax.split.type == CTYPE_INST)
-					l1i = id4.size / 1024;
-				break;
-			case 2:
-				l2 = id4.size / 1024;
-				num_threads_sharing = 1 + id4.eax.split.num_threads_sharing;
-				index_msb = get_count_order(num_threads_sharing);
-				l2_id = c->topo.apicid & ~((1 << index_msb) - 1);
-				break;
-			case 3:
-				l3 = id4.size / 1024;
-				num_threads_sharing = 1 + id4.eax.split.num_threads_sharing;
-				index_msb = get_count_order(num_threads_sharing);
-				l3_id = c->topo.apicid & ~((1 << index_msb) - 1);
-				break;
-			default:
-				break;
-			}
+	if (!ci->num_leaves)
+		return false;
+
+	for (int i = 0; i < ci->num_leaves; i++) {
+		unsigned int num_threads_sharing, index_msb;
+		struct _cpuid4_info id4 = {};
+		int ret;
+
+		ret = intel_fill_cpuid4_info(i, &id4);
+		if (ret < 0)
+			continue;
+
+		switch (id4.eax.split.level) {
+		case 1:
+			if (id4.eax.split.type == CTYPE_DATA)
+				l1d = id4.size / 1024;
+			else if (id4.eax.split.type == CTYPE_INST)
+				l1i = id4.size / 1024;
+			break;
+		case 2:
+			l2 = id4.size / 1024;
+			num_threads_sharing = 1 + id4.eax.split.num_threads_sharing;
+			index_msb = get_count_order(num_threads_sharing);
+			l2_id = c->topo.apicid & ~((1 << index_msb) - 1);
+			break;
+		case 3:
+			l3 = id4.size / 1024;
+			num_threads_sharing = 1 + id4.eax.split.num_threads_sharing;
+			index_msb = get_count_order(num_threads_sharing);
+			l3_id = c->topo.apicid & ~((1 << index_msb) - 1);
+			break;
+		default:
+			break;
 		}
 	}
 
+	c->topo.l2c_id = l2_id;
+	c->topo.llc_id = (l3_id == BAD_APICID) ? l2_id : l3_id;
+	intel_cacheinfo_done(c, l3, l2, l1i, l1d);
+	return true;
+}
+
+void init_intel_cacheinfo(struct cpuinfo_x86 *c)
+{
 	/* Don't use CPUID(2) if CPUID(4) is supported. */
-	if (!ci->num_leaves && c->cpuid_level > 1) {
-		intel_cacheinfo_0x2(c);
+	if (intel_cacheinfo_0x4(c))
 		return;
-	}
-
-	if (l2) {
-		c->topo.llc_id = l2_id;
-		c->topo.l2c_id = l2_id;
-	}
-
-	if (l3)
-		c->topo.llc_id = l3_id;
 
-	intel_cacheinfo_done(c, l3, l2, l1i, l1d);
+	intel_cacheinfo_0x2(c);
 }
 
 static int __cache_amd_cpumap_setup(unsigned int cpu, int index,
-- 
2.48.1



* [PATCH v1 36/40] x86/cacheinfo: Extract out cache level topology ID calculation
  2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
                   ` (34 preceding siblings ...)
  2025-03-04  8:51 ` [PATCH v1 35/40] x86/cacheinfo: Separate intel leaf 0x4 handling Ahmed S. Darwish
@ 2025-03-04  8:51 ` Ahmed S. Darwish
  2025-03-04  8:51 ` [PATCH v1 37/40] x86/cacheinfo: Extract out cache self-snoop checks Ahmed S. Darwish
                   ` (5 subsequent siblings)
  41 siblings, 0 replies; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-04  8:51 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar, Dave Hansen
  Cc: Thomas Gleixner, John Ogness, H. Peter Anvin, Andrew Cooper, x86,
	x86-cpuid, LKML, Ahmed S. Darwish

For Intel leaf 0x4 parsing, refactor the cache level topology ID
calculation code into its own function instead of repeating the same logic
twice for L2 and L3.
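
For illustration, the calculation can be reproduced in userspace as
below.  count_order() is an assumed stand-in for the kernel's
get_count_order(), i.e. ceil(log2(count)), and cache_topo_id() takes the
raw APIC ID and the raw num_threads_sharing field as plain integers
instead of the kernel structures.

```c
#include <assert.h>

/* Userspace stand-in for get_count_order(): ceil(log2(count)). */
static int count_order(unsigned int count)
{
	int order = 0;

	assert(count > 0);
	while ((1u << order) < count)
		order++;
	return order;
}

/*
 * Mask off the low APIC ID bits that distinguish the threads sharing
 * the cache.  @sharing_field is the raw CPUID(4) value, which encodes
 * "number of sharing threads minus one".
 */
static unsigned int cache_topo_id(unsigned int apicid,
				  unsigned int sharing_field)
{
	unsigned int num_threads_sharing = 1 + sharing_field;
	int index_msb = count_order(num_threads_sharing);

	return apicid & ~((1u << index_msb) - 1);
}
```

For example, with APIC ID 0x1d and a sharing field of 7 (8 threads),
the low 3 bits are cleared and the resulting ID is 0x18.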

Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
---
 arch/x86/kernel/cpu/cacheinfo.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index 7bd3c33b7f04..254c0b2e1d72 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -397,6 +397,16 @@ static void intel_cacheinfo_0x2(struct cpuinfo_x86 *c)
 	intel_cacheinfo_done(c, l3, l2, l1i, l1d);
 }
 
+static unsigned int calc_cache_topo_id(struct cpuinfo_x86 *c, const struct _cpuid4_info *id4)
+{
+	unsigned int num_threads_sharing;
+	int index_msb;
+
+	num_threads_sharing = 1 + id4->eax.split.num_threads_sharing;
+	index_msb = get_count_order(num_threads_sharing);
+	return c->topo.apicid & ~((1 << index_msb) - 1);
+}
+
 static bool intel_cacheinfo_0x4(struct cpuinfo_x86 *c)
 {
 	struct cpu_cacheinfo *ci = get_cpu_cacheinfo(c->cpu_index);
@@ -417,7 +427,6 @@ static bool intel_cacheinfo_0x4(struct cpuinfo_x86 *c)
 		return false;
 
 	for (int i = 0; i < ci->num_leaves; i++) {
-		unsigned int num_threads_sharing, index_msb;
 		struct _cpuid4_info id4 = {};
 		int ret;
 
@@ -434,15 +443,11 @@ static bool intel_cacheinfo_0x4(struct cpuinfo_x86 *c)
 			break;
 		case 2:
 			l2 = id4.size / 1024;
-			num_threads_sharing = 1 + id4.eax.split.num_threads_sharing;
-			index_msb = get_count_order(num_threads_sharing);
-			l2_id = c->topo.apicid & ~((1 << index_msb) - 1);
+			l2_id = calc_cache_topo_id(c, &id4);
 			break;
 		case 3:
 			l3 = id4.size / 1024;
-			num_threads_sharing = 1 + id4.eax.split.num_threads_sharing;
-			index_msb = get_count_order(num_threads_sharing);
-			l3_id = c->topo.apicid & ~((1 << index_msb) - 1);
+			l3_id = calc_cache_topo_id(c, &id4);
 			break;
 		default:
 			break;
-- 
2.48.1



* [PATCH v1 37/40] x86/cacheinfo: Extract out cache self-snoop checks
  2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
                   ` (35 preceding siblings ...)
  2025-03-04  8:51 ` [PATCH v1 36/40] x86/cacheinfo: Extract out cache level topology ID calculation Ahmed S. Darwish
@ 2025-03-04  8:51 ` Ahmed S. Darwish
  2025-03-04 10:38   ` Andrew Cooper
  2025-03-04  8:51 ` [PATCH v1 38/40] x86/cacheinfo: Relocate leaf 0x4 cache_type mapping Ahmed S. Darwish
                   ` (4 subsequent siblings)
  41 siblings, 1 reply; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-04  8:51 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar, Dave Hansen
  Cc: Thomas Gleixner, John Ogness, H. Peter Anvin, Andrew Cooper, x86,
	x86-cpuid, LKML, Ahmed S. Darwish

The logic of not doing a cache flush if the CPU declares cache
self-snooping support is repeated across the x86/cacheinfo code.  Extract it
into its own function.
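
A userspace sketch of the extracted check, under stated assumptions: the
self-snoop capability is read from a caller-supplied CPUID(1) EDX value
(bit 27), and fake_wbinvd() merely counts calls since the real
instruction is privileged.  All names below are illustrative, not the
kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID.01H:EDX bit 27 advertises self-snoop support. */
#define CPUID_1_EDX_SS	(1u << 27)

static_assert(CPUID_1_EDX_SS == 0x08000000u, "self-snoop is EDX bit 27");

/* Counts simulated flushes; stands in for the privileged WBINVD. */
static unsigned int flush_count;

static void fake_wbinvd(void)
{
	flush_count++;
}

static bool has_selfsnoop(uint32_t cpuid_1_edx)
{
	return (cpuid_1_edx & CPUID_1_EDX_SS) != 0;
}

/* Flush only when the CPU does not declare self-snooping. */
static void maybe_flush_caches_sketch(uint32_t cpuid_1_edx)
{
	if (!has_selfsnoop(cpuid_1_edx))
		fake_wbinvd();
}
```

With the check in one helper, both call sites in cache_disable() reduce
to a single call each.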

Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
---
 arch/x86/kernel/cpu/cacheinfo.c | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index 254c0b2e1d72..ac47d1b4f775 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -640,6 +640,17 @@ int populate_cache_leaves(unsigned int cpu)
 static unsigned long saved_cr4;
 static DEFINE_RAW_SPINLOCK(cache_disable_lock);
 
+/*
+ * Cache flushing is the most time-consuming step when programming the
+ * MTRRs.  On many Intel CPUs without known errata, it can be skipped
+ * if the CPU declares cache self-snooping support.
+ */
+static void maybe_flush_caches(void)
+{
+	if (!static_cpu_has(X86_FEATURE_SELFSNOOP))
+		wbinvd();
+}
+
 void cache_disable(void) __acquires(cache_disable_lock)
 {
 	unsigned long cr0;
@@ -657,14 +668,7 @@ void cache_disable(void) __acquires(cache_disable_lock)
 	cr0 = read_cr0() | X86_CR0_CD;
 	write_cr0(cr0);
 
-	/*
-	 * Cache flushing is the most time-consuming step when programming
-	 * the MTRRs. Fortunately, as per the Intel Software Development
-	 * Manual, we can skip it if the processor supports cache self-
-	 * snooping.
-	 */
-	if (!static_cpu_has(X86_FEATURE_SELFSNOOP))
-		wbinvd();
+	maybe_flush_caches();
 
 	/* Save value of CR4 and clear Page Global Enable (bit 7) */
 	if (cpu_feature_enabled(X86_FEATURE_PGE)) {
@@ -679,9 +683,7 @@ void cache_disable(void) __acquires(cache_disable_lock)
 	if (cpu_feature_enabled(X86_FEATURE_MTRR))
 		mtrr_disable();
 
-	/* Again, only flush caches if we have to. */
-	if (!static_cpu_has(X86_FEATURE_SELFSNOOP))
-		wbinvd();
+	maybe_flush_caches();
 }
 
 void cache_enable(void) __releases(cache_disable_lock)
-- 
2.48.1



* [PATCH v1 38/40] x86/cacheinfo: Relocate leaf 0x4 cache_type mapping
  2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
                   ` (36 preceding siblings ...)
  2025-03-04  8:51 ` [PATCH v1 37/40] x86/cacheinfo: Extract out cache self-snoop checks Ahmed S. Darwish
@ 2025-03-04  8:51 ` Ahmed S. Darwish
  2025-03-04  8:51 ` [PATCH v1 39/40] x86/cacheinfo: Introduce amd_hygon_cpu_has_l3_cache() Ahmed S. Darwish
                   ` (3 subsequent siblings)
  41 siblings, 0 replies; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-04  8:51 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar, Dave Hansen
  Cc: Thomas Gleixner, John Ogness, H. Peter Anvin, Andrew Cooper, x86,
	x86-cpuid, LKML, Ahmed S. Darwish

The cache_type_map[] array is used to map Intel leaf 0x4 cache_type
values to their corresponding types in linux/cacheinfo.h.

Move that array's definition after the actual cpuid leaf 0x4 structures,
instead of having it in the middle of AMD leaf 0x4 emulation code.
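
The mapping technique itself is a designated-initializer array indexed
by the hardware value.  A self-contained sketch follows; the GEN_*
enumerators are hypothetical stand-ins for the linux/cacheinfo.h types,
and map_cache_type() with its bounds check is an addition made for this
example only.

```c
#include <assert.h>
#include <stddef.h>

/* CPUID(4) EAX cache type field values. */
enum ctype {
	CTYPE_NULL	= 0,
	CTYPE_DATA	= 1,
	CTYPE_INST	= 2,
	CTYPE_UNIFIED	= 3,
};

/* Hypothetical stand-ins for the generic cacheinfo types. */
enum generic_type { GEN_NOCACHE, GEN_INST, GEN_DATA, GEN_UNIFIED };

static const enum generic_type type_map[] = {
	[CTYPE_NULL]	= GEN_NOCACHE,
	[CTYPE_DATA]	= GEN_DATA,
	[CTYPE_INST]	= GEN_INST,
	[CTYPE_UNIFIED]	= GEN_UNIFIED,
};

/* Translate a raw cache type; values past the table are rejected. */
static enum generic_type map_cache_type(unsigned int ctype)
{
	assert(ctype < sizeof(type_map) / sizeof(type_map[0]));
	return type_map[ctype];
}
```

Because the array is keyed by the CTYPE_* values, placing it next to
their definitions keeps the two in view together.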

Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
---
 arch/x86/kernel/cpu/cacheinfo.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index ac47d1b4f775..bb934f81dcd1 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -78,6 +78,14 @@ struct _cpuid4_info {
 	unsigned long size;
 };
 
+/* Map CPUID(4) EAX.cache_type to linux/cacheinfo.h types */
+static const enum cache_type cache_type_map[] = {
+	[CTYPE_NULL]	= CACHE_TYPE_NOCACHE,
+	[CTYPE_DATA]	= CACHE_TYPE_DATA,
+	[CTYPE_INST]	= CACHE_TYPE_INST,
+	[CTYPE_UNIFIED] = CACHE_TYPE_UNIFIED,
+};
+
 /*
  * Fallback AMD CPUID(4) emulation
  * AMD CPUs with TOPOEXT can just use CPUID(0x8000001d)
@@ -131,13 +139,6 @@ static const unsigned short assocs[] = {
 static const unsigned char levels[] = { 1, 1, 2, 3 };
 static const unsigned char types[] = { 1, 2, 3, 3 };
 
-static const enum cache_type cache_type_map[] = {
-	[CTYPE_NULL] = CACHE_TYPE_NOCACHE,
-	[CTYPE_DATA] = CACHE_TYPE_DATA,
-	[CTYPE_INST] = CACHE_TYPE_INST,
-	[CTYPE_UNIFIED] = CACHE_TYPE_UNIFIED,
-};
-
 static void legacy_amd_cpuid4(int index, union _cpuid4_leaf_eax *eax,
 			      union _cpuid4_leaf_ebx *ebx, union _cpuid4_leaf_ecx *ecx)
 {
-- 
2.48.1



* [PATCH v1 39/40] x86/cacheinfo: Introduce amd_hygon_cpu_has_l3_cache()
  2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
                   ` (37 preceding siblings ...)
  2025-03-04  8:51 ` [PATCH v1 38/40] x86/cacheinfo: Relocate leaf 0x4 cache_type mapping Ahmed S. Darwish
@ 2025-03-04  8:51 ` Ahmed S. Darwish
  2025-03-04  8:51 ` [PATCH v1 40/40] x86/cacheinfo: Apply maintainer-tip coding style fixes Ahmed S. Darwish
                   ` (2 subsequent siblings)
  41 siblings, 0 replies; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-04  8:51 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar, Dave Hansen
  Cc: Thomas Gleixner, John Ogness, H. Peter Anvin, Andrew Cooper, x86,
	x86-cpuid, LKML, Ahmed S. Darwish

Multiple code paths in cacheinfo.c and amd_nb.c check for L3 cache
presence on AMD/Hygon CPUs by directly checking leaf 0x80000006 EDX output.

Extract that logic into its own function.  While at it, rework the
AMD/Hygon LLC topology ID calculation comments for clarity.
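
As a rough userspace sketch of the three LLC ID cases that the reworked
comments describe: amd_llc_id() and count_order() are hypothetical names,
count_order() is assumed to behave like get_count_order() for non-zero
input, and the L3 presence check is left out.

```c
#include <stdint.h>

/* Smallest order such that (1 << order) >= count, for count >= 1 */
static int count_order(uint32_t count)
{
	int order = 0;

	while ((UINT32_C(1) << order) < count)
		order++;

	return order;
}

/*
 * Hypothetical pure-function version of the LLC ID selection.
 * @num_sharing_cache must be non-zero for the "newer families" case.
 */
static uint32_t amd_llc_id(unsigned int family, unsigned int model,
			   uint32_t apicid, uint16_t die_id,
			   uint32_t num_sharing_cache)
{
	/* Pre-Zen: LLC is at the node level */
	if (family < 0x17)
		return die_id;

	/* Family 17h up to 1F models: core complex ID is ApicId[3] */
	if (family == 0x17 && model <= 0x1f)
		return apicid >> 3;

	/* Newer families: derive it from the threads sharing the L3 */
	return apicid >> count_order(num_sharing_cache);
}
```

For example, with 16 threads sharing the L3, an APIC ID of 0x2c is
shifted right by 4 and lands in LLC 2.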

Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
---
 arch/x86/include/asm/cpuid/types.h |  9 +++++++++
 arch/x86/kernel/amd_nb.c           |  7 +++----
 arch/x86/kernel/cpu/cacheinfo.c    | 32 +++++++++++++-----------------
 3 files changed, 26 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/cpuid/types.h b/arch/x86/include/asm/cpuid/types.h
index 4d4ab8fc4758..a01cea960ea0 100644
--- a/arch/x86/include/asm/cpuid/types.h
+++ b/arch/x86/include/asm/cpuid/types.h
@@ -161,4 +161,13 @@ extern const struct leaf_0x2_table cpuid_0x2_table[256];
  */
 #define TLB_0x63_2M_4M_ENTRIES	32
 
+/*
+ * CPUID(0x80000006) parsing helpers
+ */
+
+static inline bool amd_hygon_cpu_has_l3_cache(void)
+{
+	return cpuid_edx(0x80000006);
+}
+
 #endif /* _ASM_X86_CPUID_TYPES_H */
diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
index bac8d3b6f12b..e73697cefa16 100644
--- a/arch/x86/kernel/amd_nb.c
+++ b/arch/x86/kernel/amd_nb.c
@@ -13,7 +13,9 @@
 #include <linux/export.h>
 #include <linux/spinlock.h>
 #include <linux/pci_ids.h>
+
 #include <asm/amd_nb.h>
+#include <asm/cpuid/types.h>
 
 static u32 *flush_words;
 
@@ -92,10 +94,7 @@ static int amd_cache_northbridges(void)
 	if (amd_gart_present())
 		amd_northbridges.flags |= AMD_NB_GART;
 
-	/*
-	 * Check for L3 cache presence.
-	 */
-	if (!cpuid_edx(0x80000006))
+	if (!amd_hygon_cpu_has_l3_cache())
 		return 0;
 
 	/*
diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index bb934f81dcd1..f85a3ddfc3cc 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -278,29 +278,29 @@ static int find_num_cache_leaves(struct cpuinfo_x86 *c)
 	return i;
 }
 
+/*
+ * AMD/Hygon CPUs may have multiple LLCs if L3 caches exist.
+ */
+
 void cacheinfo_amd_init_llc_id(struct cpuinfo_x86 *c, u16 die_id)
 {
-	/*
-	 * We may have multiple LLCs if L3 caches exist, so check if we
-	 * have an L3 cache by looking at the L3 cache CPUID leaf.
-	 */
-	if (!cpuid_edx(0x80000006))
+	if (!amd_hygon_cpu_has_l3_cache())
 		return;
 
 	if (c->x86 < 0x17) {
-		/* LLC is at the node level. */
+		/* Pre-Zen: LLC is at the node level */
 		c->topo.llc_id = die_id;
 	} else if (c->x86 == 0x17 && c->x86_model <= 0x1F) {
 		/*
-		 * LLC is at the core complex level.
-		 * Core complex ID is ApicId[3] for these processors.
+		 * Family 17h up to 1F models: LLC is at the core
+		 * complex level.  Core complex ID is ApicId[3].
 		 */
 		c->topo.llc_id = c->topo.apicid >> 3;
 	} else {
 		/*
-		 * LLC ID is calculated from the number of threads sharing the
-		 * cache.
-		 * */
+		 * Newer families: LLC ID is calculated from the number
+		 * of threads sharing the L3 cache.
+		 */
 		u32 eax, ebx, ecx, edx, num_sharing_cache = 0;
 		u32 llc_index = find_num_cache_leaves(c) - 1;
 
@@ -318,16 +318,12 @@ void cacheinfo_amd_init_llc_id(struct cpuinfo_x86 *c, u16 die_id)
 
 void cacheinfo_hygon_init_llc_id(struct cpuinfo_x86 *c)
 {
-	/*
-	 * We may have multiple LLCs if L3 caches exist, so check if we
-	 * have an L3 cache by looking at the L3 cache CPUID leaf.
-	 */
-	if (!cpuid_edx(0x80000006))
+	if (!amd_hygon_cpu_has_l3_cache())
 		return;
 
 	/*
-	 * LLC is at the core complex level.
-	 * Core complex ID is ApicId[3] for these processors.
+	 * Hygons are similar to AMD Family 17h up to 1F models: LLC is
+	 * at the core complex level.  Core complex ID is ApicId[3].
 	 */
 	c->topo.llc_id = c->topo.apicid >> 3;
 }
-- 
2.48.1



* [PATCH v1 40/40] x86/cacheinfo: Apply maintainer-tip coding style fixes
  2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
                   ` (38 preceding siblings ...)
  2025-03-04  8:51 ` [PATCH v1 39/40] x86/cacheinfo: Introduce amd_hygon_cpu_has_l3_cache() Ahmed S. Darwish
@ 2025-03-04  8:51 ` Ahmed S. Darwish
  2025-03-04  9:19 ` [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ingo Molnar
  2025-03-04  9:33 ` Ingo Molnar
  41 siblings, 0 replies; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-04  8:51 UTC (permalink / raw)
  To: Borislav Petkov, Ingo Molnar, Dave Hansen
  Cc: Thomas Gleixner, John Ogness, H. Peter Anvin, Andrew Cooper, x86,
	x86-cpuid, LKML, Ahmed S. Darwish

The x86/cacheinfo code has been heavily refactored and fleshed out at
parent commits, where any necessary coding style fixes were also done
in place.

Apply maintainer-tip.rst coding style fixes to the rest of the code,
and align its assignment expressions for readability.

At cacheinfo_amd_init_llc_id(), rename variable bits to index_msb, as
this is what it is called in the rest of the cacheinfo.c code.

Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
---
 arch/x86/kernel/cpu/cacheinfo.c | 214 ++++++++++++++++----------------
 1 file changed, 108 insertions(+), 106 deletions(-)

diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index f85a3ddfc3cc..a2359590dde7 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -1,11 +1,11 @@
 // SPDX-License-Identifier: GPL-2.0
 /*
- *	Routines to identify caches on Intel CPU.
+ * x86 CPU caches detection and configuration
  *
- *	Changes:
- *	Venkatesh Pallipadi	: Adding cache identification through cpuid(4)
- *	Ashok Raj <ashok.raj@intel.com>: Work with CPU hotplug infrastructure.
- *	Andi Kleen / Andreas Herrmann	: CPUID4 emulation on AMD.
+ * Previous changes
+ * - Venkatesh Pallipadi:		Cache identification through CPUID(4)
+ * - Ashok Raj <ashok.raj@intel.com>:	Work with CPU hotplug infrastructure
+ * - Andi Kleen / Andreas Herrmann:	CPUID(4) emulation on AMD
  */
 
 #include <linux/cacheinfo.h>
@@ -35,37 +35,37 @@ static cpumask_var_t cpu_cacheinfo_mask;
 unsigned int memory_caching_control __ro_after_init;
 
 enum _cache_type {
-	CTYPE_NULL = 0,
-	CTYPE_DATA = 1,
-	CTYPE_INST = 2,
-	CTYPE_UNIFIED = 3
+	CTYPE_NULL	= 0,
+	CTYPE_DATA	= 1,
+	CTYPE_INST	= 2,
+	CTYPE_UNIFIED	= 3
 };
 
 union _cpuid4_leaf_eax {
 	struct {
-		enum _cache_type	type:5;
-		unsigned int		level:3;
-		unsigned int		is_self_initializing:1;
-		unsigned int		is_fully_associative:1;
-		unsigned int		reserved:4;
-		unsigned int		num_threads_sharing:12;
-		unsigned int		num_cores_on_die:6;
+		enum _cache_type	type			:5;
+		unsigned int		level			:3;
+		unsigned int		is_self_initializing	:1;
+		unsigned int		is_fully_associative	:1;
+		unsigned int		reserved		:4;
+		unsigned int		num_threads_sharing	:12;
+		unsigned int		num_cores_on_die	:6;
 	} split;
 	u32 full;
 };
 
 union _cpuid4_leaf_ebx {
 	struct {
-		unsigned int		coherency_line_size:12;
-		unsigned int		physical_line_partition:10;
-		unsigned int		ways_of_associativity:10;
+		unsigned int		coherency_line_size	:12;
+		unsigned int		physical_line_partition	:10;
+		unsigned int		ways_of_associativity	:10;
 	} split;
 	u32 full;
 };
 
 union _cpuid4_leaf_ecx {
 	struct {
-		unsigned int		number_of_sets:32;
+		unsigned int		number_of_sets		:32;
 	} split;
 	u32 full;
 };
@@ -93,60 +93,59 @@ static const enum cache_type cache_type_map[] = {
 
 union l1_cache {
 	struct {
-		unsigned line_size:8;
-		unsigned lines_per_tag:8;
-		unsigned assoc:8;
-		unsigned size_in_kb:8;
+		unsigned line_size	:8;
+		unsigned lines_per_tag	:8;
+		unsigned assoc		:8;
+		unsigned size_in_kb	:8;
 	};
-	unsigned val;
+	unsigned int val;
 };
 
 union l2_cache {
 	struct {
-		unsigned line_size:8;
-		unsigned lines_per_tag:4;
-		unsigned assoc:4;
-		unsigned size_in_kb:16;
+		unsigned line_size	:8;
+		unsigned lines_per_tag	:4;
+		unsigned assoc		:4;
+		unsigned size_in_kb	:16;
 	};
-	unsigned val;
+	unsigned int val;
 };
 
 union l3_cache {
 	struct {
-		unsigned line_size:8;
-		unsigned lines_per_tag:4;
-		unsigned assoc:4;
-		unsigned res:2;
-		unsigned size_encoded:14;
+		unsigned line_size	:8;
+		unsigned lines_per_tag	:4;
+		unsigned assoc		:4;
+		unsigned res		:2;
+		unsigned size_encoded	:14;
 	};
-	unsigned val;
+	unsigned int val;
 };
 
 static const unsigned short assocs[] = {
-	[1] = 1,
-	[2] = 2,
-	[4] = 4,
-	[6] = 8,
-	[8] = 16,
-	[0xa] = 32,
-	[0xb] = 48,
-	[0xc] = 64,
-	[0xd] = 96,
-	[0xe] = 128,
-	[0xf] = 0xffff /* fully associative - no way to show this currently */
+	[1]		= 1,
+	[2]		= 2,
+	[4]		= 4,
+	[6]		= 8,
+	[8]		= 16,
+	[0xa]		= 32,
+	[0xb]		= 48,
+	[0xc]		= 64,
+	[0xd]		= 96,
+	[0xe]		= 128,
+	[0xf]		= 0xffff	/* Fully associative */
 };
 
 static const unsigned char levels[] = { 1, 1, 2, 3 };
-static const unsigned char types[] = { 1, 2, 3, 3 };
+static const unsigned char types[]  = { 1, 2, 3, 3 };
 
 static void legacy_amd_cpuid4(int index, union _cpuid4_leaf_eax *eax,
 			      union _cpuid4_leaf_ebx *ebx, union _cpuid4_leaf_ecx *ecx)
 {
 	unsigned int dummy, line_size, lines_per_tag, assoc, size_in_kb;
-	union l1_cache l1i, l1d;
+	union l1_cache l1i, l1d, *l1;
 	union l2_cache l2;
 	union l3_cache l3;
-	union l1_cache *l1 = &l1d;
 
 	eax->full = 0;
 	ebx->full = 0;
@@ -155,6 +154,7 @@ static void legacy_amd_cpuid4(int index, union _cpuid4_leaf_eax *eax,
 	cpuid(0x80000005, &dummy, &dummy, &l1d.val, &l1i.val);
 	cpuid(0x80000006, &dummy, &dummy, &l2.val, &l3.val);
 
+	l1 = &l1d;
 	switch (index) {
 	case 1:
 		l1 = &l1i;
@@ -162,48 +162,52 @@ static void legacy_amd_cpuid4(int index, union _cpuid4_leaf_eax *eax,
 	case 0:
 		if (!l1->val)
 			return;
-		assoc = assocs[l1->assoc];
-		line_size = l1->line_size;
-		lines_per_tag = l1->lines_per_tag;
-		size_in_kb = l1->size_in_kb;
+
+		assoc		= assocs[l1->assoc];
+		line_size	= l1->line_size;
+		lines_per_tag	= l1->lines_per_tag;
+		size_in_kb	= l1->size_in_kb;
 		break;
 	case 2:
 		if (!l2.val)
 			return;
-		assoc = assocs[l2.assoc];
-		line_size = l2.line_size;
-		lines_per_tag = l2.lines_per_tag;
-		/* cpu_data has errata corrections for K7 applied */
-		size_in_kb = __this_cpu_read(cpu_info.x86_cache_size);
+
+		/* Use x86_cache_size as it might have K7 errata fixes */
+		assoc		= assocs[l2.assoc];
+		line_size	= l2.line_size;
+		lines_per_tag	= l2.lines_per_tag;
+		size_in_kb	= __this_cpu_read(cpu_info.x86_cache_size);
 		break;
 	case 3:
 		if (!l3.val)
 			return;
-		assoc = assocs[l3.assoc];
-		line_size = l3.line_size;
-		lines_per_tag = l3.lines_per_tag;
-		size_in_kb = l3.size_encoded * 512;
+
+		assoc		= assocs[l3.assoc];
+		line_size	= l3.line_size;
+		lines_per_tag	= l3.lines_per_tag;
+		size_in_kb	= l3.size_encoded * 512;
 		if (boot_cpu_has(X86_FEATURE_AMD_DCM)) {
-			size_in_kb = size_in_kb >> 1;
-			assoc = assoc >> 1;
+			size_in_kb	= size_in_kb >> 1;
+			assoc		= assoc >> 1;
 		}
 		break;
 	default:
 		return;
 	}
 
-	eax->split.is_self_initializing = 1;
-	eax->split.type = types[index];
-	eax->split.level = levels[index];
-	eax->split.num_threads_sharing = 0;
-	eax->split.num_cores_on_die = topology_num_cores_per_package();
+	eax->split.is_self_initializing		= 1;
+	eax->split.type				= types[index];
+	eax->split.level			= levels[index];
+	eax->split.num_threads_sharing		= 0;
+	eax->split.num_cores_on_die		= topology_num_cores_per_package();
 
 	if (assoc == 0xffff)
 		eax->split.is_fully_associative = 1;
-	ebx->split.coherency_line_size = line_size - 1;
-	ebx->split.ways_of_associativity = assoc - 1;
-	ebx->split.physical_line_partition = lines_per_tag - 1;
-	ecx->split.number_of_sets = (size_in_kb * 1024) / line_size /
+
+	ebx->split.coherency_line_size		= line_size - 1;
+	ebx->split.ways_of_associativity	= assoc - 1;
+	ebx->split.physical_line_partition	= lines_per_tag - 1;
+	ecx->split.number_of_sets		= (size_in_kb * 1024) / line_size /
 		(ebx->split.ways_of_associativity + 1) - 1;
 }
 
@@ -260,18 +264,14 @@ static int fill_cpuid4_info(int index, struct _cpuid4_info *id4)
 
 static int find_num_cache_leaves(struct cpuinfo_x86 *c)
 {
-	unsigned int		eax, ebx, ecx, edx, op;
-	union _cpuid4_leaf_eax	cache_eax;
-	int 			i = -1;
-
-	if (x86_vendor_amd_or_hygon(c->x86_vendor))
-		op = 0x8000001d;
-	else
-		op = 4;
+	unsigned int eax, ebx, ecx, edx, op;
+	union _cpuid4_leaf_eax cache_eax;
+	int i = -1;
 
+	/* Do a CPUID(op) loop to calculate num_cache_leaves */
+	op = x86_vendor_amd_or_hygon(c->x86_vendor) ? 0x8000001d : 4;
 	do {
 		++i;
-		/* Do cpuid(op) loop to find out num_cache_leaves */
 		cpuid_count(op, i, &eax, &ebx, &ecx, &edx);
 		cache_eax.full = eax;
 	} while (cache_eax.split.type != CTYPE_NULL);
@@ -309,9 +309,9 @@ void cacheinfo_amd_init_llc_id(struct cpuinfo_x86 *c, u16 die_id)
 			num_sharing_cache = ((eax >> 14) & 0xfff) + 1;
 
 		if (num_sharing_cache) {
-			int bits = get_count_order(num_sharing_cache);
+			int index_msb = get_count_order(num_sharing_cache);
 
-			c->topo.llc_id = c->topo.apicid >> bits;
+			c->topo.llc_id = c->topo.apicid >> index_msb;
 		}
 	}
 }
@@ -332,14 +332,10 @@ void init_amd_cacheinfo(struct cpuinfo_x86 *c)
 {
 	struct cpu_cacheinfo *ci = get_cpu_cacheinfo(c->cpu_index);
 
-	if (boot_cpu_has(X86_FEATURE_TOPOEXT)) {
+	if (boot_cpu_has(X86_FEATURE_TOPOEXT))
 		ci->num_leaves = find_num_cache_leaves(c);
-	} else if (c->extended_cpuid_level >= 0x80000006) {
-		if (cpuid_edx(0x80000006) & 0xf000)
-			ci->num_leaves = 4;
-		else
-			ci->num_leaves = 3;
-	}
+	else if (c->extended_cpuid_level >= 0x80000006)
+		ci->num_leaves = (cpuid_edx(0x80000006) & 0xf000) ? 4 : 3;
 }
 
 void init_hygon_cacheinfo(struct cpuinfo_x86 *c)
@@ -466,6 +462,9 @@ void init_intel_cacheinfo(struct cpuinfo_x86 *c)
 	intel_cacheinfo_0x2(c);
 }
 
+/*
+ * linux/cacheinfo.h shared_cpu_map setup, AMD/Hygon
+ */
 static int __cache_amd_cpumap_setup(unsigned int cpu, int index,
 				    const struct _cpuid4_info *id4)
 {
@@ -482,12 +481,12 @@ static int __cache_amd_cpumap_setup(unsigned int cpu, int index,
 			this_cpu_ci = get_cpu_cacheinfo(i);
 			if (!this_cpu_ci->info_list)
 				continue;
+
 			ci = this_cpu_ci->info_list + index;
 			for_each_cpu(sibling, cpu_llc_shared_mask(cpu)) {
 				if (!cpu_online(sibling))
 					continue;
-				cpumask_set_cpu(sibling,
-						&ci->shared_cpu_map);
+				cpumask_set_cpu(sibling, &ci->shared_cpu_map);
 			}
 		}
 	} else if (boot_cpu_has(X86_FEATURE_TOPOEXT)) {
@@ -513,8 +512,7 @@ static int __cache_amd_cpumap_setup(unsigned int cpu, int index,
 				apicid = cpu_data(sibling).topo.apicid;
 				if ((apicid < first) || (apicid > last))
 					continue;
-				cpumask_set_cpu(sibling,
-						&ci->shared_cpu_map);
+				cpumask_set_cpu(sibling, &ci->shared_cpu_map);
 			}
 		}
 	} else
@@ -523,18 +521,22 @@ static int __cache_amd_cpumap_setup(unsigned int cpu, int index,
 	return 1;
 }
 
+/*
+ * linux/cacheinfo.h shared_cpu_map setup, Intel + fallback AMD/Hygon
+ */
 static void __cache_cpumap_setup(unsigned int cpu, int index,
 				 const struct _cpuid4_info *id4)
 {
 	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
+	struct cpuinfo_x86 *c = &cpu_data(cpu);
 	struct cacheinfo *ci, *sibling_ci;
 	unsigned long num_threads_sharing;
 	int index_msb, i;
-	struct cpuinfo_x86 *c = &cpu_data(cpu);
 
-	if (x86_vendor_amd_or_hygon(c->x86_vendor))
+	if (x86_vendor_amd_or_hygon(c->x86_vendor)) {
 		if (__cache_amd_cpumap_setup(cpu, index, id4))
 			return;
+	}
 
 	ci = this_cpu_ci->info_list + index;
 	num_threads_sharing = 1 + id4->eax.split.num_threads_sharing;
@@ -549,8 +551,10 @@ static void __cache_cpumap_setup(unsigned int cpu, int index,
 		if (cpu_data(i).topo.apicid >> index_msb == c->topo.apicid >> index_msb) {
 			struct cpu_cacheinfo *sib_cpu_ci = get_cpu_cacheinfo(i);
 
+			/* Skip if itself or no cacheinfo */
 			if (i == cpu || !sib_cpu_ci->info_list)
-				continue;/* skip if itself or no cacheinfo */
+				continue;
+
 			sibling_ci = sib_cpu_ci->info_list + index;
 			cpumask_set_cpu(i, &ci->shared_cpu_map);
 			cpumask_set_cpu(cpu, &sibling_ci->shared_cpu_map);
@@ -584,7 +588,7 @@ int init_cache_level(unsigned int cpu)
 }
 
 /*
- * The max shared threads number comes from CPUID.4:EAX[25-14] with input
+ * The max shared threads number comes from CPUID(4) EAX[25-14] with input
  * ECX as cache index. Then right shift apicid by the number's order to get
  * cache id for this cache node.
  */
@@ -620,8 +624,8 @@ int populate_cache_leaves(unsigned int cpu)
 		ci_info_init(ci++, &id4, nb);
 		__cache_cpumap_setup(cpu, idx, &id4);
 	}
-	this_cpu_ci->cpu_map_populated = true;
 
+	this_cpu_ci->cpu_map_populated = true;
 	return 0;
 }
 
@@ -653,12 +657,10 @@ void cache_disable(void) __acquires(cache_disable_lock)
 	unsigned long cr0;
 
 	/*
-	 * Note that this is not ideal
-	 * since the cache is only flushed/disabled for this CPU while the
-	 * MTRRs are changed, but changing this requires more invasive
-	 * changes to the way the kernel boots
+	 * This is not ideal since the cache is only flushed/disabled
+	 * for this CPU while the MTRRs are changed, but changing this
+	 * requires more invasive changes to the way the kernel boots.
 	 */
-
 	raw_spin_lock(&cache_disable_lock);
 
 	/* Enter the no-fill (CD=1, NW=0) cache mode and flush caches. */
-- 
2.48.1



* Re: [PATCH v1 05/40] x86/cpu: Remove unnecessary headers and reorder the rest
  2025-03-04  8:51 ` [PATCH v1 05/40] x86/cpu: Remove unnecessary headers and reorder the rest Ahmed S. Darwish
@ 2025-03-04  9:14   ` Ingo Molnar
  2025-03-04  9:28     ` Ahmed S. Darwish
  0 siblings, 1 reply; 57+ messages in thread
From: Ingo Molnar @ 2025-03-04  9:14 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Borislav Petkov, Ingo Molnar, Dave Hansen, Thomas Gleixner,
	John Ogness, H. Peter Anvin, Andrew Cooper, x86, x86-cpuid, LKML


* Ahmed S. Darwish <darwi@linutronix.de> wrote:

> Remove the headers at intel.c that are no longer required.
> 
> Alphabetically reorder what remains since more headers will be included
> in further commits.
> 
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
> ---
>  arch/x86/kernel/cpu/intel.c | 35 ++++++++++++-----------------------
>  1 file changed, 12 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> index 134368a3f4b1..72f519534e2b 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -1,40 +1,29 @@
>  // SPDX-License-Identifier: GPL-2.0
> -#include <linux/kernel.h>
> -#include <linux/pgtable.h>
>  
> -#include <linux/string.h>
>  #include <linux/bitops.h>
> -#include <linux/smp.h>
> -#include <linux/sched.h>
> -#include <linux/sched/clock.h>
> -#include <linux/thread_info.h>
>  #include <linux/init.h>
> -#include <linux/uaccess.h>
> +#include <linux/kernel.h>
> +#include <linux/smp.h>
> +#include <linux/string.h>
> +
> +#ifdef CONFIG_X86_64
> +#include <linux/topology.h>
> +#endif
>  
> -#include <asm/cpufeature.h>
> -#include <asm/msr.h>
>  #include <asm/bugs.h>
> +#include <asm/cpu_device_id.h>
> +#include <asm/cpufeature.h>
>  #include <asm/cpu.h>
> +#include <asm/hwcap2.h>
>  #include <asm/intel-family.h>
>  #include <asm/microcode.h>
> -#include <asm/hwcap2.h>
> -#include <asm/elf.h>
> -#include <asm/cpu_device_id.h>
> -#include <asm/resctrl.h>
> +#include <asm/msr.h>
>  #include <asm/numa.h>
> +#include <asm/resctrl.h>
>  #include <asm/thermal.h>
>  
> -#ifdef CONFIG_X86_64
> -#include <linux/topology.h>
> -#endif
> -
>  #include "cpu.h"
>  
> -#ifdef CONFIG_X86_LOCAL_APIC
> -#include <asm/mpspec.h>
> -#include <asm/apic.h>
> -#endif

This patch has an unexpected side-effect on i386 allmodconfig builds:

  arch/x86/kernel/cpu/intel.c: In function ‘intel_workarounds’:
  arch/x86/kernel/cpu/intel.c:452:17: error: ‘movsl_mask’ undeclared (first use in this function)
  arch/x86/kernel/cpu/intel.c:452:17: note: each undeclared identifier is reported only once for each function it appears in
  make[5]: *** [scripts/Makefile.build:207: arch/x86/kernel/cpu/intel.o] Error 1

Due to the removal of the <asm/uaccess.h> header.

The attached patch fixes it.

Thanks,

	Ingo

==================>
 arch/x86/kernel/cpu/intel.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index a7d297f6bc11..291c82816797 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -22,6 +22,7 @@
 #include <asm/numa.h>
 #include <asm/resctrl.h>
 #include <asm/thermal.h>
+#include <asm/uaccess.h>
 
 #include "cpu.h"
 


* Re: [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings
  2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
                   ` (39 preceding siblings ...)
  2025-03-04  8:51 ` [PATCH v1 40/40] x86/cacheinfo: Apply maintainer-tip coding style fixes Ahmed S. Darwish
@ 2025-03-04  9:19 ` Ingo Molnar
  2025-03-04  9:38   ` Ingo Molnar
  2025-03-04  9:33 ` Ingo Molnar
  41 siblings, 1 reply; 57+ messages in thread
From: Ingo Molnar @ 2025-03-04  9:19 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Borislav Petkov, Ingo Molnar, Dave Hansen, Thomas Gleixner,
	John Ogness, H. Peter Anvin, Andrew Cooper, x86, x86-cpuid, LKML


* Ahmed S. Darwish <darwi@linutronix.de> wrote:

> Hi,
> 
> As part of the onging x86-cpuid work [*], we've found that the handling
> of leaf 0x2 and leaf 0x4 code paths is difficult to work with in its
> current state.  This was mostly due to the organic growth of the x86/cpu
> and x86/cacheinfo logic since its very early Linux days.
> 
> This series cleans up and refactors these code paths in preparation for
> the new x86-cpuid model.

Nice!

> Summary:
> 
> - Patches 1 to 3 are independent bugfixes that were discovered during
>   this refactoring work.

I've applied these three to tip:x86/urgent. I added Cc: stable to all 3 
commits, because while these are old bugs, the first one had Cc: stable 
and if we do it for one it's justified for all of them AFAICS. Arguably 
our cacheinfo output in procps was inaccurate at times, and possibly 
these bugs were part of the problem.

> - Patches 4 to 10 are x86/cpu refactorings for code size and
>   readability.

I've applied patches 4 to 9 to tip:x86/cpu (with x86/urgent merged in 
due to dependencies and to give a singular topical base branch in the 
x86 tree), they look good and obvious. (I added the build fix to 05/40)

I've left 10 to 40 for further review by others too.

Thanks,

	Ingo


* Re: [PATCH v1 10/40] x86/cpu: Remove leaf 0x2 parsing loop and add helpers
  2025-03-04  8:51 ` [PATCH v1 10/40] x86/cpu: Remove leaf 0x2 parsing loop and add helpers Ahmed S. Darwish
@ 2025-03-04  9:26   ` Ingo Molnar
  2025-03-05 16:01     ` Ahmed S. Darwish
  0 siblings, 1 reply; 57+ messages in thread
From: Ingo Molnar @ 2025-03-04  9:26 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Borislav Petkov, Ingo Molnar, Dave Hansen, Thomas Gleixner,
	John Ogness, H. Peter Anvin, Andrew Cooper, x86, x86-cpuid, LKML


* Ahmed S. Darwish <darwi@linutronix.de> wrote:

> Leaf 0x2 output includes a "query count" byte where it was supposed to
> specify the number of repeated cpuid leaf 0x2 subleaf 0 queries needed
> to extract all of the hardware's cache and TLB descriptors.

s/cpuid
 /CPUID

Please do this in the rest of the series too. (I did it for the first 9 
patches.)

> +++ b/arch/x86/include/asm/cpuid/types.h
> @@ -0,0 +1,79 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_X86_CPUID_TYPES_H
> +#define _ASM_X86_CPUID_TYPES_H
> +
> +#include <linux/types.h>
> +
> +#include <asm/cpuid.h>

So that header organization is a bit messy: if <asm/cpuid.h> is 
supposed to be the main header, why is there a <asm/cpuid/types.h>?

I'd suggest we follow the FPU header structure:

  starship:~/tip/arch/x86/include/asm/fpu> ls -l
  total 48
  -rw-rw-r-- 1 mingo mingo  5732 Feb 27 19:24 api.h
  -rw-rw-r-- 1 mingo mingo   671 Feb 26 16:13 regset.h
  -rw-rw-r-- 1 mingo mingo  2203 Feb 27 13:52 sched.h
  -rw-rw-r-- 1 mingo mingo  1110 Feb 27 19:24 signal.h
  -rw-rw-r-- 1 mingo mingo 14741 Feb 27 19:24 types.h
  -rw-rw-r-- 1 mingo mingo   811 Feb 26 16:13 xcr.h
  -rw-rw-r-- 1 mingo mingo  4401 Feb 27 23:01 xstate.h

With <asm/cpuid/api.h> being the main header - established via a 
separate preparatory patch.

This followup patch can then add <asm/cpuid/types.h> which will also be 
included in <asm/cpuid/api.h>.


> +/*
> + * CPUID(0x2) parsing helpers
> + * Check for_each_leaf_0x2_desc() documentation.
> + */
> +
> +struct leaf_0x2_reg {
> +		u32		: 31,
> +			invalid	: 1;
> +};
> +
> +union leaf_0x2_regs {
> +	struct leaf_0x2_reg	reg[4];
> +	u32			regv[4];
> +	u8			desc[16];
> +};
> +
> +/**
> + * get_leaf_0x2_regs() - Return sanitized leaf 0x2 register output
> + * @regs:	Output parameter
> + *
> + * Get leaf 0x2 register output and store it in @regs.  Invalid byte
> + * descriptors returned by the hardware will be force set to zero (the
> + * NULL cache/TLB descriptor) before returning them to the caller.
> + */
> +static inline void get_leaf_0x2_regs(union leaf_0x2_regs *regs)


Please prefix all new cpuid API functions and types with cpuid_.

> +#define for_each_leaf_0x2_desc(regs, desc)				\
> +	/* Skip the first byte as it is not a descriptor */		\
> +	for (desc = &(regs).desc[1]; desc < &(regs).desc[16]; desc++)

The comment line can come before the macro.
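
I.e. something like the below, as a minimal userspace sketch.  The union
is trimmed down to the two members needed here, and count_descs() is a
made-up consumer that only exists to exercise the macro:

```c
#include <stdint.h>

/* Trimmed-down stand-in for the patch's union leaf_0x2_regs */
union leaf_0x2_regs {
	uint32_t	regv[4];
	uint8_t		desc[16];
};

/* Skip the first byte as it is not a descriptor */
#define for_each_leaf_0x2_desc(regs, desc)				\
	for (desc = &(regs).desc[1]; desc < &(regs).desc[16]; desc++)

/* Hypothetical consumer: count the non-NULL (non-zero) descriptors */
static int count_descs(const union leaf_0x2_regs *regs)
{
	const uint8_t *desc;
	int n = 0;

	for_each_leaf_0x2_desc(*regs, desc) {
		if (*desc)
			n++;
	}

	return n;
}
```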

> +	get_leaf_0x2_regs(&regs);
> +	for_each_leaf_0x2_desc(regs, desc)
> +		intel_tlb_lookup(*desc);

Nice interface otherwise.

Thanks,

	Ingo


* Re: [PATCH v1 05/40] x86/cpu: Remove unnecessary headers and reorder the rest
  2025-03-04  9:14   ` Ingo Molnar
@ 2025-03-04  9:28     ` Ahmed S. Darwish
  0 siblings, 0 replies; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-04  9:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Borislav Petkov, Ingo Molnar, Dave Hansen, Thomas Gleixner,
	John Ogness, H. Peter Anvin, Andrew Cooper, x86, x86-cpuid, LKML

On Tue, 04 Mar 2025, Ingo Molnar wrote:
>
> This patch has an unexpected side-effect on i386 allmodconfig builds:
>
>   arch/x86/kernel/cpu/intel.c: In function ‘intel_workarounds’:
>   arch/x86/kernel/cpu/intel.c:452:17: error: ‘movsl_mask’ undeclared (first use in this function)
>   arch/x86/kernel/cpu/intel.c:452:17: note: each undeclared identifier is reported only once for each function it appears in
>   make[5]: *** [scripts/Makefile.build:207: arch/x86/kernel/cpu/intel.o] Error 1
>
> Due to the removal of the <asm/uaccess.h> header.
>
> The attached patch fixes it.
>

Thanks a lot, will do.


* Re: [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings
  2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
                   ` (40 preceding siblings ...)
  2025-03-04  9:19 ` [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ingo Molnar
@ 2025-03-04  9:33 ` Ingo Molnar
  2025-03-05 16:38   ` Ahmed S. Darwish
  41 siblings, 1 reply; 57+ messages in thread
From: Ingo Molnar @ 2025-03-04  9:33 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Borislav Petkov, Ingo Molnar, Dave Hansen, Thomas Gleixner,
	John Ogness, H. Peter Anvin, Andrew Cooper, x86, x86-cpuid, LKML


* Ahmed S. Darwish <darwi@linutronix.de> wrote:

> Ahmed S. Darwish (33):
>   x86/cacheinfo: Validate cpuid leaf 0x2 EDX output
>   x86/cpu: Validate cpuid leaf 0x2 EDX output
>   x86/cpu: Properly parse leaf 0x2 TLB descriptor 0x63
>   x86/cpuid: Include linux/build_bug.h
>   x86/cpu: Remove unnecessary headers and reorder the rest
>   x86/cpu: Use max() for leaf 0x2 TLB descriptors parsing
>   x86/cpu: Simplify TLB entry count storage
>   x86/cpu: Remove leaf 0x2 parsing loop and add helpers
>   x86/cacheinfo: Remove unnecessary headers and reorder the rest
>   x86/cacheinfo: Use cpuid leaf 0x2 parsing helpers
>   x86/cacheinfo: Constify _cpuid4_info_regs instances
>   x86/cacheinfo: Align ci_info_init() assignment expressions
>   x86/cacheinfo: Standardize _cpuid4_info_regs instance naming
>   x86: treewide: Introduce x86_vendor_amd_or_hygon()
>   x86/cacheinfo: Consolidate AMD/Hygon leaf 0x8000001d calls
>   x86/cacheinfo: Separate amd_northbridge from _cpuid4_info_regs
>   x86/cacheinfo: Move AMD cache_disable_0/1 handling to separate file
>   x86/cacheinfo: Use sysfs_emit() for sysfs attributes show()
>   x86/cacheinfo: Separate Intel and AMD leaf 0x4 code paths
>   x86/cacheinfo: Rename _cpuid4_info_regs to _cpuid4_info
>   x86/cacheinfo: Clarify type markers for leaf 0x2 cache descriptors
>   x86/cacheinfo: Use enums for cache descriptor types
>   x86/cpu: Use enums for TLB descriptor types
>   sizes.h: Cover all possible x86 cpu cache sizes
>   x86/cacheinfo: Use consolidated leaf 0x2 descriptor table
>   x86/cpu: Use consolidated leaf 0x2 descriptor table
>   x86/cacheinfo: Separate leaf 0x2 handling and post-processing logic
>   x86/cacheinfo: Separate intel leaf 0x4 handling
>   x86/cacheinfo: Extract out cache level topology ID calculation
>   x86/cacheinfo: Extract out cache self-snoop checks
>   x86/cacheinfo: Relocate leaf 0x4 cache_type mapping
>   x86/cacheinfo: Introduce amd_hygon_cpu_has_l3_cache()
>   x86/cacheinfo: Apply maintainer-tip coding style fixes

Meta spelling comments for the entire series:

Please capitalize acronyms and names consistently in titles, changelogs 
and comments alike:

 s/cpu
  /CPU

 s/intel
  /Intel

When referring to headers, please write out their canonical names where 
appropriate - for example:

   - x86/cpuid: Include linux/build_bug.h
   + x86/cpuid: Include <linux/build_bug.h>

Thanks,

	Ingo

* Re: [PATCH v1 30/40] sizes.h: Cover all possible x86 cpu cache sizes
  2025-03-04  8:51 ` [PATCH v1 30/40] sizes.h: Cover all possible x86 cpu cache sizes Ahmed S. Darwish
@ 2025-03-04  9:35   ` Ingo Molnar
  2025-03-05 16:18     ` Ahmed S. Darwish
  0 siblings, 1 reply; 57+ messages in thread
From: Ingo Molnar @ 2025-03-04  9:35 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Borislav Petkov, Ingo Molnar, Dave Hansen, Thomas Gleixner,
	John Ogness, H. Peter Anvin, Andrew Cooper, x86, x86-cpuid, LKML


* Ahmed S. Darwish <darwi@linutronix.de> wrote:

> Add size macros for 24/192/384 Kilobyes and 3/6/12/18/24 Megabytes.
> 
> With that, the x86 subsystem can avoid locally defining its own macros
> for CPU cache sizs.

Please take some time to read your own changelogs:

 s/Kilobyes
  /Kilobytes

 s/sizs
  /sizes

 s/cpu
  /CPU

Thanks,

	Ingo
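
For illustration only, size macros of the kind the changelog describes
could look like the sketch below.  The EX_ names are hypothetical and
are not taken from the patch; the actual header uses its own naming and
values, which are not shown in this thread.

```c
#include <assert.h>

/* Hypothetical base units; not the real <linux/sizes.h> definitions. */
#define EX_SZ_1K	0x00000400UL
#define EX_SZ_1M	0x00100000UL

/* The kilobyte sizes named in the changelog. */
#define EX_SZ_24K	(24 * EX_SZ_1K)
#define EX_SZ_192K	(192 * EX_SZ_1K)
#define EX_SZ_384K	(384 * EX_SZ_1K)

/* The megabyte sizes named in the changelog. */
#define EX_SZ_3M	(3 * EX_SZ_1M)
#define EX_SZ_6M	(6 * EX_SZ_1M)
#define EX_SZ_12M	(12 * EX_SZ_1M)
#define EX_SZ_18M	(18 * EX_SZ_1M)
#define EX_SZ_24M	(24 * EX_SZ_1M)

/* Convert a byte count to whole kilobytes, e.g. for printing. */
static unsigned long ex_size_in_kb(unsigned long bytes)
{
	assert(EX_SZ_1K != 0);
	return bytes / EX_SZ_1K;
}
```

With shared macros like these, a cache descriptor table can spell a
size as EX_SZ_384K instead of defining a local constant per file.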

* Re: [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings
  2025-03-04  9:19 ` [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ingo Molnar
@ 2025-03-04  9:38   ` Ingo Molnar
  2025-03-05 17:36     ` Ahmed S. Darwish
  0 siblings, 1 reply; 57+ messages in thread
From: Ingo Molnar @ 2025-03-04  9:38 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Borislav Petkov, Ingo Molnar, Dave Hansen, Thomas Gleixner,
	John Ogness, H. Peter Anvin, Andrew Cooper, x86, x86-cpuid, LKML


* Ingo Molnar <mingo@kernel.org> wrote:

> > Summary:
> > 
> > - Patches 1 to 3 are independent bugfixes that were discovered during
> >   this refactoring work.
> 
> I've applied these three to tip:x86/urgent. I added Cc: stable to all 3 
> commits, because while these are old bugs, the first one had Cc: stable 
> and if we do it for one it's justified for all of them AFAICS. Arguably 
> our cacheinfo output in procps was inaccurate at times, and possibly 
> these bugs were part of the problem.
> 
> > - Patches 4 to 10 are x86/cpu refactorings for code size and
> >   readability.
> 
> I've applied patches 4 to 9 to tip:x86/cpu (with x86/urgent merged in 
> due to dependencies and to give a singular topical base branch in the 
> x86 tree), they look good and obvious. (I added the build fix to 05/40)
> 
> I've left 10 to 40 for further review by others too.

While going through the rest I also picked up these patches as easy 
preparatory commits in tip:x86/cpu, there was no reason to have them 
later in the series:

  29517791c478 x86/cacheinfo: Remove the P4 trace leftovers for real
  d61b5118f719 x86/cacheinfo: Remove unnecessary headers and reorder the rest
  0d22030c49bf <linux/sizes.h>: Cover all possible x86 CPU cache sizes

Thanks,

	Ingo

* Re: [PATCH v1 37/40] x86/cacheinfo: Extract out cache self-snoop checks
  2025-03-04  8:51 ` [PATCH v1 37/40] x86/cacheinfo: Extract out cache self-snoop checks Ahmed S. Darwish
@ 2025-03-04 10:38   ` Andrew Cooper
  2025-03-05 18:40     ` Ahmed S. Darwish
  0 siblings, 1 reply; 57+ messages in thread
From: Andrew Cooper @ 2025-03-04 10:38 UTC (permalink / raw)
  To: Ahmed S. Darwish, Borislav Petkov, Ingo Molnar, Dave Hansen
  Cc: Thomas Gleixner, John Ogness, H. Peter Anvin, x86, x86-cpuid,
	LKML

On 04/03/2025 8:51 am, Ahmed S. Darwish wrote:
> The logic of not doing a cache flush if the CPU declares cache self
> snooping support is repeated across the x86/cacheinfo code.  Extract it
> into its own function.
>
> Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>

I know you're just refactoring code, but the SDM has basically reverted
this statement about it being safe to skip WBINVD based on SELFSNOOP.

It turns out not to be safe in cases where the underlying physical
memory changes from cacheable to uncacheable.  By skipping the WBINVD
as part of changing the memory type, you end up with spurious writebacks
at a later point when the memory is expected to be UC.  Apparently this
is a problem for CLX devices, hence the change in the SDM.

~Andrew

* Re: [PATCH v1 10/40] x86/cpu: Remove leaf 0x2 parsing loop and add helpers
  2025-03-04  9:26   ` Ingo Molnar
@ 2025-03-05 16:01     ` Ahmed S. Darwish
  0 siblings, 0 replies; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-05 16:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Borislav Petkov, Ingo Molnar, Dave Hansen, Thomas Gleixner,
	John Ogness, H. Peter Anvin, Andrew Cooper, x86, x86-cpuid, LKML

Hi Ingo,

On Tue, 04 Mar 2025, Ingo Molnar wrote:
> ...
> * Ahmed S. Darwish <darwi@linutronix.de> wrote:
> > +
> > +#include <linux/types.h>
> > +
> > +#include <asm/cpuid.h>
>
> So that header organization is a bit messy: if <asm/cpuid.h> is
> supposed to be the main header, why is there a <asm/cpuid/types.h>?
>
> I'd suggest we follow the FPU header structure:
>
>   starship:~/tip/arch/x86/include/asm/fpu> ls -l
>   total 48
>   -rw-rw-r-- 1 mingo mingo  5732 Feb 27 19:24 api.h
>   -rw-rw-r-- 1 mingo mingo   671 Feb 26 16:13 regset.h
>   -rw-rw-r-- 1 mingo mingo  2203 Feb 27 13:52 sched.h
>   -rw-rw-r-- 1 mingo mingo  1110 Feb 27 19:24 signal.h
>   -rw-rw-r-- 1 mingo mingo 14741 Feb 27 19:24 types.h
>   -rw-rw-r-- 1 mingo mingo   811 Feb 26 16:13 xcr.h
>   -rw-rw-r-- 1 mingo mingo  4401 Feb 27 23:01 xstate.h
>
> With <asm/cpuid/api.h> being the main header - established via a
> separate preparatory patch.
>
> This followup patch can then add <asm/cpuid/types.h> which will also be
> included in <asm/cpuid/api.h>.
>

Sounds sensible.  Thanks!

FYI, in our CPUID-model patch queue (the one after this), we have
something like:

    <asm/cpuid/>
    │
    ├── leaves.h   CPUID bitfields; auto-generated by x86-cpuid-db
    ├── data.h     Internal data structures for the model
    ├── api.h      The new CPUID-model API
    └── ops.h      The raw CPUID ops [Formerly <asm/cpuid.h>]

So doing this from within this PQ should fit nicely.

> > +
> > +/**
> > + * get_leaf_0x2_regs() - Return sanitized leaf 0x2 register output
> > + * @regs:	Output parameter
> > + *
> > + * Get leaf 0x2 register output and store it in @regs.  Invalid byte
> > + * descriptors returned by the hardware will be force set to zero (the
> > + * NULL cache/TLB descriptor) before returning them to the caller.
> > + */
> > +static inline void get_leaf_0x2_regs(union leaf_0x2_regs *regs)
>
> Please prefix all new cpuid API functions and types with cpuid_.
>

ACK.

> > +#define for_each_leaf_0x2_desc(regs, desc)				\
> > +	/* Skip the first byte as it is not a descriptor */		\
> > +	for (desc = &(regs).desc[1]; desc < &(regs).desc[16]; desc++)
>
> The comment line can come before the macro.
>

ACK.

> > +	get_leaf_0x2_regs(&regs);
> > +	for_each_leaf_0x2_desc(regs, desc)
> > +		intel_tlb_lookup(*desc);
>
> Nice interface otherwise.
>

Thanks!
Ahmed
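
A minimal userspace sketch of the interface discussed above, assuming
the raw leaf 0x2 register values are already available.  All leaf2_*
names are hypothetical stand-ins and not the series' helpers.  It
assumes that a register with bit 31 set carries no valid descriptors,
and it skips byte 0 as the quoted macro does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the series' union leaf_0x2_regs. */
struct leaf2_regs {
	uint8_t desc[16];	/* EAX, EBX, ECX, EDX: four bytes each */
};

/*
 * Unpack raw EAX..EDX values into one-byte descriptors.  A register
 * with bit 31 set is treated as invalid and its bytes are forced to
 * zero (the NULL descriptor).
 */
static void leaf2_sanitize(const uint32_t raw[4], struct leaf2_regs *regs)
{
	assert(raw != NULL && regs != NULL);

	for (int r = 0; r < 4; r++) {
		uint32_t v = (raw[r] & 0x80000000u) ? 0 : raw[r];

		for (int b = 0; b < 4; b++)
			regs->desc[r * 4 + b] = (uint8_t)(v >> (8 * b));
	}
}

/* Skip the first byte as it is not a descriptor. */
#define for_each_leaf2_desc(regs, d)				\
	for ((d) = &(regs).desc[1]; (d) < &(regs).desc[16]; (d)++)

/* Count the non-NULL descriptors left after sanitization. */
static int leaf2_count_descs(const uint32_t raw[4])
{
	struct leaf2_regs regs;
	const uint8_t *d;
	int n = 0;

	leaf2_sanitize(raw, &regs);
	for_each_leaf2_desc(regs, d)
		if (*d)
			n++;

	return n;
}
```

A call site then only deals with sanitized bytes, which is the point of
centralizing the validation in one helper.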

* Re: [PATCH v1 30/40] sizes.h: Cover all possible x86 cpu cache sizes
  2025-03-04  9:35   ` Ingo Molnar
@ 2025-03-05 16:18     ` Ahmed S. Darwish
  0 siblings, 0 replies; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-05 16:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Borislav Petkov, Ingo Molnar, Dave Hansen, Thomas Gleixner,
	John Ogness, H. Peter Anvin, Andrew Cooper, x86, x86-cpuid, LKML

Hi Ingo,

On Tue, 04 Mar 2025, Ingo Molnar wrote:
>
> * Ahmed S. Darwish <darwi@linutronix.de> wrote:
>
> > Add size macros for 24/192/384 Kilobyes and 3/6/12/18/24 Megabytes.
> >
> > With that, the x86 subsystem can avoid locally defining its own macros
> > for CPU cache sizs.
>
> Please take some time to read your own changelogs:
>
>  s/Kilobyes
>   /Kilobytes
>
>  s/sizs
>   /sizes
>

Sorry about that.

I just don't see my own typos, no matter how many times my eyes actually
pass over the text.  I'll integrate a checker in my workflow.

Thanks,
Ahmed

* Re: [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings
  2025-03-04  9:33 ` Ingo Molnar
@ 2025-03-05 16:38   ` Ahmed S. Darwish
  0 siblings, 0 replies; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-05 16:38 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Borislav Petkov, Ingo Molnar, Dave Hansen, Thomas Gleixner,
	John Ogness, H. Peter Anvin, Andrew Cooper, x86, x86-cpuid, LKML

On Tue, 04 Mar 2025, Ingo Molnar wrote:
...
>
> Please capitalize acronyms and names consistently in titles, changelogs
> and comments alike:
...
> When referring to headers, please write out their canonical names where
> appropriate - for example:
>

Thanks, will do.

* Re: [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings
  2025-03-04  9:38   ` Ingo Molnar
@ 2025-03-05 17:36     ` Ahmed S. Darwish
  0 siblings, 0 replies; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-05 17:36 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Borislav Petkov, Ingo Molnar, Dave Hansen, Thomas Gleixner,
	John Ogness, H. Peter Anvin, Andrew Cooper, x86, x86-cpuid, LKML

On Tue, 04 Mar 2025, Ingo Molnar wrote:
> >
> > I've applied these three to tip:x86/urgent. I added Cc: stable to all 3
> > commits, because while these are old bugs, the first one had Cc: stable
> > and if we do it for one it's justified for all of them AFAICS. Arguably
> > our cacheinfo output in procps was inaccurate at times, and possibly
> > these bugs were part of the problem.
> >
...
> >
> > I've applied patches 4 to 9 to tip:x86/cpu (with x86/urgent merged in
> > due to dependencies and to give a singular topical base branch in the
> > x86 tree), they look good and obvious. (I added the build fix to 05/40)
> >
> > I've left 10 to 40 for further review by others too.
>
> While going through the rest I also picked up these patches as easy
> preparatory commits in tip:x86/cpu, there was no reason to have them
> later in the series:
>
>   29517791c478 x86/cacheinfo: Remove the P4 trace leftovers for real
>   d61b5118f719 x86/cacheinfo: Remove unnecessary headers and reorder the rest
>   0d22030c49bf <linux/sizes.h>: Cover all possible x86 CPU cache sizes
>

ACK on all the notes.

Thanks a lot for the amazing turnaround time :)

* Re: [PATCH v1 37/40] x86/cacheinfo: Extract out cache self-snoop checks
  2025-03-04 10:38   ` Andrew Cooper
@ 2025-03-05 18:40     ` Ahmed S. Darwish
  2025-03-05 18:42       ` Andrew Cooper
  0 siblings, 1 reply; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-05 18:40 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Borislav Petkov, Ingo Molnar, Dave Hansen, Thomas Gleixner,
	John Ogness, H. Peter Anvin, x86, x86-cpuid, LKML

Hi Andrew,

On Tue, 04 Mar 2025, Andrew Cooper wrote:
>
> On 04/03/2025 8:51 am, Ahmed S. Darwish wrote:
> > The logic of not doing a cache flush if the CPU declares cache self
> > snooping support is repeated across the x86/cacheinfo code.  Extract it
> > into its own function.
> >
> > Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
>
> I know you're just refactoring code, but the SDM has basically reverted
> this statement about it being safe to skip WBINVD based on SELFSNOOP.
>

Still, thanks a lot for sharing :)

> It turns out not to be safe in cases where the underlying physical
> memory changes from cacheable to uncacheable.  By skipping the WBINVD
> as part of changing the memory type, you end up with spurious writebacks
> at a later point when the memory is expected to be UC.  Apparently this
> is a problem for CLX devices, hence the change in the SDM.

While writing that refactoring patch, I indeed noticed that there is an
errata list of CPUs where X86_FEATURE_SELFSNOOP is force disabled, thus
ensuring WBINVD is never skipped:

    static void check_memory_type_self_snoop_errata(...)
    {
     	switch (c->x86_vfm) {
     	case INTEL_CORE_YONAH:
     	case INTEL_CORE2_MEROM:
     	case INTEL_CORE2_MEROM_L:
     	case INTEL_CORE2_PENRYN:
     	case INTEL_CORE2_DUNNINGTON:
     	case INTEL_NEHALEM:
     	case INTEL_NEHALEM_G:
     	case INTEL_NEHALEM_EP:
     	case INTEL_NEHALEM_EX:
     	case INTEL_WESTMERE:
     	case INTEL_WESTMERE_EP:
     	case INTEL_SANDYBRIDGE:
     		setup_clear_cpu_cap(X86_FEATURE_SELFSNOOP);
     	}
    }

That's why I added "CPUs without known errata" in the comments:

    /*
     * Cache flushing is the most time-consuming step when programming
     * the MTRRs.  On many Intel CPUs without known errata, it can be
     * skipped if the CPU declares cache self-snooping support.
     */
    static void maybe_flush_caches(void)
    {
           if (!static_cpu_has(X86_FEATURE_SELFSNOOP))
                   wbinvd();
    }

But, interestingly, CLX devices (intel-family.h CASCADELAKE_X /
SKYLAKE_X) are not part of the kernel's Self Snoop errata list above.

@Thomas, @Ingo, any ideas?

Thanks,

--
Ahmed S. Darwish
Linutronix GmbH
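
The gating described in this message can be modelled in userspace
roughly as follows.  This is a sketch under stated assumptions: the
ex_* names, the model enum and the capability bit are hypothetical, and
only a subset of the errata list is reproduced.  It shows how the
kernel currently decides, and says nothing about whether skipping the
flush is safe on a given part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model IDs; the kernel matches on INTEL_* x86_vfm values. */
enum ex_cpu_model {
	EX_CORE2_MEROM,
	EX_NEHALEM,
	EX_SANDYBRIDGE,
	EX_SKYLAKE_X,
};

/* Hypothetical capability bit standing in for X86_FEATURE_SELFSNOOP. */
#define EX_FEATURE_SELFSNOOP	(1u << 0)

struct ex_cpu {
	enum ex_cpu_model model;
	uint32_t caps;
};

/* Clear the self-snoop capability on models with a known erratum. */
static void ex_check_self_snoop_errata(struct ex_cpu *c)
{
	assert(c != NULL);

	switch (c->model) {
	case EX_CORE2_MEROM:
	case EX_NEHALEM:
	case EX_SANDYBRIDGE:
		c->caps &= ~EX_FEATURE_SELFSNOOP;
		break;
	default:
		break;
	}
}

/* The flush is skipped only if the capability survived the errata check. */
static bool ex_needs_cache_flush(const struct ex_cpu *c)
{
	return !(c->caps & EX_FEATURE_SELFSNOOP);
}
```

In this model a CPU that is absent from the errata list keeps the
capability and therefore skips the flush, which is the case the
question above is about.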

* Re: [PATCH v1 37/40] x86/cacheinfo: Extract out cache self-snoop checks
  2025-03-05 18:40     ` Ahmed S. Darwish
@ 2025-03-05 18:42       ` Andrew Cooper
  2025-03-05 18:58         ` Ahmed S. Darwish
  0 siblings, 1 reply; 57+ messages in thread
From: Andrew Cooper @ 2025-03-05 18:42 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Borislav Petkov, Ingo Molnar, Dave Hansen, Thomas Gleixner,
	John Ogness, H. Peter Anvin, x86, x86-cpuid, LKML

On 05/03/2025 6:40 pm, Ahmed S. Darwish wrote:
> Hi Andrew,
>
> On Tue, 04 Mar 2025, Andrew Cooper wrote:
>> On 04/03/2025 8:51 am, Ahmed S. Darwish wrote:
>>> The logic of not doing a cache flush if the CPU declares cache self
>>> snooping support is repeated across the x86/cacheinfo code.  Extract it
>>> into its own function.
>>>
>>> Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
>> I know you're just refactoring code, but the SDM has basically reverted
>> this statement about it being safe to skip WBINVD based on SELFSNOOP.
>>
> Still, thanks a lot for sharing :)
>
>> It turns out not to be safe in cases where the underlying physical
>> memory changes from cacheable to uncacheable.  By skipping the WBINVD
>> as part of changing the memory type, you end up with spurious writebacks
>> at a later point when the memory is expected to be UC.  Apparently this
>> is a problem for CLX devices, hence the change in the SDM.
> While writing that refactoring patch, I indeed noticed that there is an
> errata list of CPUs where X86_FEATURE_SELFSNOOP is force disabled, thus
> ensuring WBINVD is never skipped:
>
>     static void check_memory_type_self_snoop_errata(...)
>     {
>      	switch (c->x86_vfm) {
>      	case INTEL_CORE_YONAH:
>      	case INTEL_CORE2_MEROM:
>      	case INTEL_CORE2_MEROM_L:
>      	case INTEL_CORE2_PENRYN:
>      	case INTEL_CORE2_DUNNINGTON:
>      	case INTEL_NEHALEM:
>      	case INTEL_NEHALEM_G:
>      	case INTEL_NEHALEM_EP:
>      	case INTEL_NEHALEM_EX:
>      	case INTEL_WESTMERE:
>      	case INTEL_WESTMERE_EP:
>      	case INTEL_SANDYBRIDGE:
>      		setup_clear_cpu_cap(X86_FEATURE_SELFSNOOP);
>      	}
>     }
>
> That's why I added "CPUs without known errata" in the comments:
>
>     /*
>      * Cache flushing is the most time-consuming step when programming
>      * the MTRRs.  On many Intel CPUs without known errata, it can be
>      * skipped if the CPU declares cache self-snooping support.
>      */
>     static void maybe_flush_caches(void)
>     {
>            if (!static_cpu_has(X86_FEATURE_SELFSNOOP))
>                    wbinvd();
>     }
>
> But, interestingly, CLX devices (intel-family.h CASCADELAKE_X /
> SKYLAKE_X) are not part of the kernel's Self Snoop errata list above.

CLX (Cascade Lake) != CXL (Compute eXpress Link).

CXL is the new PCIe.  (So say the CXL consortium at least.)

~Andrew

* Re: [PATCH v1 37/40] x86/cacheinfo: Extract out cache self-snoop checks
  2025-03-05 18:42       ` Andrew Cooper
@ 2025-03-05 18:58         ` Ahmed S. Darwish
  2025-03-05 19:01           ` Andrew Cooper
  0 siblings, 1 reply; 57+ messages in thread
From: Ahmed S. Darwish @ 2025-03-05 18:58 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Borislav Petkov, Ingo Molnar, Dave Hansen, Thomas Gleixner,
	John Ogness, H. Peter Anvin, x86, x86-cpuid, LKML

On Wed, 05 Mar 2025, Andrew Cooper wrote:
...
> It turns out not to be safe in cases where the underlying physical
> memory changes from cacheable to uncacheable.  By skipping the WBINVD
> as part of changing the memory type, you end up with spurious writebacks
> at a later point when the memory is expected to be UC.  Apparently this
> is a problem for CLX devices, hence the change in the SDM.
...
>
> CLX (Cascade Lake) != CXL (Compute eXpress Link).
>
> CXL is the new PCIe.  (So say the CXL consortium at least.)
>

Oh, sorry, you wrote "CLX devices" above, not CXL... Only thing my poor
brain could come up with was CASCADELAKE_X :)

* Re: [PATCH v1 37/40] x86/cacheinfo: Extract out cache self-snoop checks
  2025-03-05 18:58         ` Ahmed S. Darwish
@ 2025-03-05 19:01           ` Andrew Cooper
  0 siblings, 0 replies; 57+ messages in thread
From: Andrew Cooper @ 2025-03-05 19:01 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Borislav Petkov, Ingo Molnar, Dave Hansen, Thomas Gleixner,
	John Ogness, H. Peter Anvin, x86, x86-cpuid, LKML

On 05/03/2025 6:58 pm, Ahmed S. Darwish wrote:
> On Wed, 05 Mar 2025, Andrew Cooper wrote:
> ...
>> It turns out not to be safe in cases where the underlying physical
>> memory changes from cacheable to uncacheable.  By skipping the WBINVD
>> as part of changing the memory type, you end up with spurious writebacks
>> at a later point when the memory is expected to be UC.  Apparently this
>> is a problem for CLX devices, hence the change in the SDM.
> ...
>> CLX (Cascade Lake) != CXL (Compute eXpress Link).
>>
>> CXL is the new PCIe.  (So say the CXL consortium at least.)
>>
> Oh, sorry, you wrote "CLX devices" above, not CXL... Only thing my poor
> brain could come up with was CASCADELAKE_X :)

Oops, so I did.  Sorry.

~Andrew

end of thread, other threads:[~2025-03-05 19:01 UTC | newest]

Thread overview: 57+ messages
2025-03-04  8:51 [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ahmed S. Darwish
2025-03-04  8:51 ` [PATCH v1 01/40] x86/cacheinfo: Validate cpuid leaf 0x2 EDX output Ahmed S. Darwish
2025-03-04  8:51 ` [PATCH v1 02/40] x86/cpu: " Ahmed S. Darwish
2025-03-04  8:51 ` [PATCH v1 03/40] x86/cpu: Properly parse leaf 0x2 TLB descriptor 0x63 Ahmed S. Darwish
2025-03-04  8:51 ` [PATCH v1 04/40] x86/cpuid: Include linux/build_bug.h Ahmed S. Darwish
2025-03-04  8:51 ` [PATCH v1 05/40] x86/cpu: Remove unnecessary headers and reorder the rest Ahmed S. Darwish
2025-03-04  9:14   ` Ingo Molnar
2025-03-04  9:28     ` Ahmed S. Darwish
2025-03-04  8:51 ` [PATCH v1 06/40] x86/cpu: Use max() for leaf 0x2 TLB descriptors parsing Ahmed S. Darwish
2025-03-04  8:51 ` [PATCH v1 07/40] x86/cpu: Simplify TLB entry count storage Ahmed S. Darwish
2025-03-04  8:51 ` [PATCH v1 08/40] x86/cpu: Get rid of smp_store_cpu_info() indirection Ahmed S. Darwish
2025-03-04  8:51 ` [PATCH v1 09/40] x86/cpu: Remove unused TLB strings Ahmed S. Darwish
2025-03-04  8:51 ` [PATCH v1 10/40] x86/cpu: Remove leaf 0x2 parsing loop and add helpers Ahmed S. Darwish
2025-03-04  9:26   ` Ingo Molnar
2025-03-05 16:01     ` Ahmed S. Darwish
2025-03-04  8:51 ` [PATCH v1 11/40] x86/cacheinfo: Remove the P4 trace leftovers for real Ahmed S. Darwish
2025-03-04  8:51 ` [PATCH v1 12/40] x86/cacheinfo: Remove unnecessary headers and reorder the rest Ahmed S. Darwish
2025-03-04  8:51 ` [PATCH v1 13/40] x86/cacheinfo: Use cpuid leaf 0x2 parsing helpers Ahmed S. Darwish
2025-03-04  8:51 ` [PATCH v1 14/40] x86/cacheinfo: Refactor leaf 0x2 cache descriptor lookup Ahmed S. Darwish
2025-03-04  8:51 ` [PATCH v1 15/40] x86/cacheinfo: Properly name amd_cpuid4()'s first parameter Ahmed S. Darwish
2025-03-04  8:51 ` [PATCH v1 16/40] x86/cacheinfo: Use proper name for cacheinfo instances Ahmed S. Darwish
2025-03-04  8:51 ` [PATCH v1 17/40] x86/cacheinfo: Constify _cpuid4_info_regs instances Ahmed S. Darwish
2025-03-04  8:51 ` [PATCH v1 18/40] x86/cacheinfo: Align ci_info_init() assignment expressions Ahmed S. Darwish
2025-03-04  8:51 ` [PATCH v1 19/40] x86/cacheinfo: Standardize _cpuid4_info_regs instance naming Ahmed S. Darwish
2025-03-04  8:51 ` [PATCH v1 20/40] x86: treewide: Introduce x86_vendor_amd_or_hygon() Ahmed S. Darwish
2025-03-04  8:51 ` [PATCH v1 21/40] x86/cacheinfo: Consolidate AMD/Hygon leaf 0x8000001d calls Ahmed S. Darwish
2025-03-04  8:51 ` [PATCH v1 22/40] x86/cacheinfo: Separate amd_northbridge from _cpuid4_info_regs Ahmed S. Darwish
2025-03-04  8:51 ` [PATCH v1 23/40] x86/cacheinfo: Move AMD cache_disable_0/1 handling to separate file Ahmed S. Darwish
2025-03-04  8:51 ` [PATCH v1 24/40] x86/cacheinfo: Use sysfs_emit() for sysfs attributes show() Ahmed S. Darwish
2025-03-04  8:51 ` [PATCH v1 25/40] x86/cacheinfo: Separate Intel and AMD leaf 0x4 code paths Ahmed S. Darwish
2025-03-04  8:51 ` [PATCH v1 26/40] x86/cacheinfo: Rename _cpuid4_info_regs to _cpuid4_info Ahmed S. Darwish
2025-03-04  8:51 ` [PATCH v1 27/40] x86/cacheinfo: Clarify type markers for leaf 0x2 cache descriptors Ahmed S. Darwish
2025-03-04  8:51 ` [PATCH v1 28/40] x86/cacheinfo: Use enums for cache descriptor types Ahmed S. Darwish
2025-03-04  8:51 ` [PATCH v1 29/40] x86/cpu: Use enums for TLB " Ahmed S. Darwish
2025-03-04  8:51 ` [PATCH v1 30/40] sizes.h: Cover all possible x86 cpu cache sizes Ahmed S. Darwish
2025-03-04  9:35   ` Ingo Molnar
2025-03-05 16:18     ` Ahmed S. Darwish
2025-03-04  8:51 ` [PATCH v1 31/40] x86/cpu: Consolidate CPUID leaf 0x2 tables Ahmed S. Darwish
2025-03-04  8:51 ` [PATCH v1 32/40] x86/cacheinfo: Use consolidated leaf 0x2 descriptor table Ahmed S. Darwish
2025-03-04  8:51 ` [PATCH v1 33/40] x86/cpu: " Ahmed S. Darwish
2025-03-04  8:51 ` [PATCH v1 34/40] x86/cacheinfo: Separate leaf 0x2 handling and post-processing logic Ahmed S. Darwish
2025-03-04  8:51 ` [PATCH v1 35/40] x86/cacheinfo: Separate intel leaf 0x4 handling Ahmed S. Darwish
2025-03-04  8:51 ` [PATCH v1 36/40] x86/cacheinfo: Extract out cache level topology ID calculation Ahmed S. Darwish
2025-03-04  8:51 ` [PATCH v1 37/40] x86/cacheinfo: Extract out cache self-snoop checks Ahmed S. Darwish
2025-03-04 10:38   ` Andrew Cooper
2025-03-05 18:40     ` Ahmed S. Darwish
2025-03-05 18:42       ` Andrew Cooper
2025-03-05 18:58         ` Ahmed S. Darwish
2025-03-05 19:01           ` Andrew Cooper
2025-03-04  8:51 ` [PATCH v1 38/40] x86/cacheinfo: Relocate leaf 0x4 cache_type mapping Ahmed S. Darwish
2025-03-04  8:51 ` [PATCH v1 39/40] x86/cacheinfo: Introduce amd_hygon_cpu_has_l3_cache() Ahmed S. Darwish
2025-03-04  8:51 ` [PATCH v1 40/40] x86/cacheinfo: Apply maintainer-tip coding style fixes Ahmed S. Darwish
2025-03-04  9:19 ` [PATCH v1 00/40] x86: Leaf 0x2 and leaf 0x4 refactorings Ingo Molnar
2025-03-04  9:38   ` Ingo Molnar
2025-03-05 17:36     ` Ahmed S. Darwish
2025-03-04  9:33 ` Ingo Molnar
2025-03-05 16:38   ` Ahmed S. Darwish
