[PATCH RFC 0/7] x86/microcode: Support for Intel Staging Feature

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH RFC 0/7] x86/microcode: Support for Intel Staging Feature
@ 2024-10-01 16:10 Chang S. Bae
  2024-10-01 16:10 ` [PATCH RFC 1/7] x86/microcode/intel: Remove unnecessary cache writeback and invalidation Chang S. Bae
                   ` (7 more replies)
  0 siblings, 8 replies; 44+ messages in thread
From: Chang S. Bae @ 2024-10-01 16:10 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen, chang.seok.bae

Hi all,

I'd like to ask initial feedback on this series enabling the staging
feature. Thanks!

== Latency Spike Issue ==

As microcode images have increased in size, a corresponding rise in load
latency has become inevitable. This latency spike significantly impacts
late loading, which remains in use despite the cautions highlighted in
the documentation [1]. The issue is especially critical for continuously
running workloads and virtual machines, where excessive delays can lead
to timeouts.

== Staging for Latency Reduction ==

Currently, writing to MSR_IA32_UCODE_WRITE triggers the entire update
process -- loading, validating, and activation -- all of which contribute
to the latency during CPU halt. The staging feature mitigates this by
refactoring all but the activation step out of the critical path,
allowing CPUs to continue serving workloads while staging takes place.

== Cache Flush Removal ==

Before resolving this latency spike caused by larger images, another
major latency issue -- cache invalidation [2] -- must first be addressed.
Originally introduced to handle a specific erratum, this cache
invalidation is now unnecessary because the problematic microcode images
have been banned. This cache flush has been found to negate the benefits
of staging, so this patch series begins by removing the WRINVD
instruction.

== Validation ==

We internally established pseudocode to clearly define all essential
steps for interacting with the firmware. Any firmware implementation
supporting staging should adhere to this contract. This patch set
incorporates that staging logic, which I successfully tested on one
firmware implementation. Multiple teams at Intel have also validated the
feature across different implementations.

Preliminary results from a pre-production system show a significant
reduction in latency (about 40%) with the staging approach alone.
Further improvements are possible with additional optimizations [*].

== Call for Review ==

This RFC series aims to present the proposed approach for community
review, to assess its soundness, and to discuss potential alternatives
if necessary. There are several key points to highlight for feedback:

  1. Staging Integration Approach

     In the core code, the high-level sequence for late loading is:

     (1) request_microcode_fw(), and
     (2) load_late_stop_cpus()->apply_microcode()

     Staging doesn't fit neatly into either steps, as it involves the
     loading process but not the activation. Therefore, a new callback is
     introduced:

       core::load_late_locked()
       -> intel::staging_microcode()
          -> intel_staging::staging_work()
             -> intel_staging::...

  2. Code Abstraction

     The newly added intel_staging.c file contains all staging-related
     code to keep it self-contained. Ideally, the entire firmware
     interaction could eventually be abstracted into a single MSR write,
     which remains a long-term goal. Fortunately, recent protocol
     simplifications have made this more feasible.

  3. Staging Policy (TODO)

     While staging is always attempted, the system will fall back to the
     legacy update method if staging fails. There is an open question
     regarding staging policy: should it be mandatory, without fallback,
     in certain usage scenarios? This could lead further refinements in
     the flow depending on feedback and use cases.

  4. Specification Updates

     Recent specification updates have simplified the staging protocol
     and clarified the behavior of MSR_IA32_UCODE_WRITE in conjunction
     with staging:

     4.1. Protocol Simplification

     The specification update [3] has significantly reduced the
     complexity of staging code, trimming the kernel code from ~1K lines
     in preliminary implementations. Thanks to Dave for guiding this
     redesign effort.

     4.2. Clarification of Legacy Update Behavior

     Chapter 5 of the specification adds further clarification on
     MSR_IA32_UCODE_WRITE. Key points are summarized below:

     (a) When staging is not performed or failed, a WRMSR will still load
     the patch image, but with higher latency.

     (b) During an active staging process, MSR_IA32_UCODE_WRITE can
     load a new microcode image, again with higher latency.

     (c) If the versions differ between the staged microcode and the
     version loaded via MSR_IA32_UCODE_WRITE, the version loaded through
     the MSR takes precedence.

     I'd also make sure there is no further ambiguity in this documentation
     [3]. Feel free to provide feedback if anything seems unclear or
     unreasonable.

As noted [*], an additional series focused on further latency
optimizations will follow. However, the staging approach was prioritized
due to its significant first-order impact on latency.

This series is based on 6.12-rc1. You can also find it from this repo:
    git://github.com/intel-staging/microcode.git staging_rfc-v1

Thanks,
Chang

[1]: https://docs.kernel.org/arch/x86/microcode.html#why-is-late-loading-dangerous
[2]: https://lore.kernel.org/all/20240701212012.21499-1-chang.seok.bae@intel.com/
[3]: https://cdrdv2.intel.com/v1/dl/getContent/782715
[*]: Further latency improvements will be addressed in the upcoming
     ‘Uniform’ feature series.

Chang S. Bae (7):
  x86/microcode/intel: Remove unnecessary cache writeback and
    invalidation
  x86/microcode: Introduce staging option to reduce late-loading latency
  x86/msr-index: Define MSR index and bit for the microcode staging
    feature
  x86/microcode/intel: Prepare for microcode staging
  x86/microcode/intel_staging: Implement staging logic
  x86/microcode/intel_staging: Support mailbox data transfer
  x86/microcode/intel: Enable staging when available

 arch/x86/include/asm/msr-index.h              |   9 +
 arch/x86/kernel/cpu/microcode/Makefile        |   2 +-
 arch/x86/kernel/cpu/microcode/core.c          |  12 +-
 arch/x86/kernel/cpu/microcode/intel.c         |  77 ++++++++-
 arch/x86/kernel/cpu/microcode/intel_staging.c | 154 ++++++++++++++++++
 arch/x86/kernel/cpu/microcode/internal.h      |   5 +-
 6 files changed, 247 insertions(+), 12 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/microcode/intel_staging.c

-- 
2.43.0

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH RFC 1/7] x86/microcode/intel: Remove unnecessary cache writeback and invalidation
  2024-10-01 16:10 [PATCH RFC 0/7] x86/microcode: Support for Intel Staging Feature Chang S. Bae
@ 2024-10-01 16:10 ` Chang S. Bae
  2024-10-25 16:24   ` [tip: x86/microcode] " tip-bot2 for Chang S. Bae
  2024-10-01 16:10 ` [PATCH RFC 2/7] x86/microcode: Introduce staging option to reduce late-loading latency Chang S. Bae
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 44+ messages in thread
From: Chang S. Bae @ 2024-10-01 16:10 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, tglx, mingo, bp, dave.hansen, chang.seok.bae, Yan Hua Wu,
	William Xie, Ashok Raj

Currently, an unconditional cache flush is performed during every
microcode update. Although the original changelog did not mention a
specific erratum, this measure was primarily intended to address a
specific microcode bug, the load of which has already been blocked by
is_blacklisted(). Therefore, this cache flush is no longer necessary.

Additionally, the side effects of doing this have been overlooked. It
increases CPU rendezvous time during late loading, where the cache flush
takes between 1x to 3.5x longer than the actual microcode update.

Remove native_wbinvd() and update the erratum name to align with the
latest errata documentation:

    https://cdrdv2.intel.com/v1/dl/getContent/334165
    # Document 334163 Version 022US

Fixes: 91df9fdf5149 ("x86/microcode/intel: Writeback and invalidate caches before updating microcode")
Reported-by: Yan Hua Wu <yanhua1.wu@intel.com>
Reported-by: William Xie <william.xie@intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Tested-by: Yan Hua Wu <yanhua1.wu@intel.com>
Acked-by: Ashok Raj <ashok.raj@intel.com>
---
The original patch [1] was posted earlier, with a cover letter [2]
providing context on the prior investigation into the cache flush.

Changes in this revision include:
* Added Ashok's acknowledgment tag
* Corrected the subject prefix
* Included the link to the errata documentation in the changelog
* Refined the changelog

This fix is particularly relevant to staging, as the cache flush was
found to significantly increase latency during staged late loading.

[1]: https://lore.kernel.org/all/20240701212012.21499-2-chang.seok.bae@intel.com/
[2]: https://lore.kernel.org/all/20240701212012.21499-1-chang.seok.bae@intel.com/
---
 arch/x86/kernel/cpu/microcode/intel.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 815fa67356a2..f3d534807d91 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -319,12 +319,6 @@ static enum ucode_state __apply_microcode(struct ucode_cpu_info *uci,
 		return UCODE_OK;
 	}
 
-	/*
-	 * Writeback and invalidate caches before updating microcode to avoid
-	 * internal issues depending on what the microcode is updating.
-	 */
-	native_wbinvd();
-
 	/* write microcode via MSR 0x79 */
 	native_wrmsrl(MSR_IA32_UCODE_WRITE, (unsigned long)mc->bits);
 
@@ -574,14 +568,14 @@ static bool is_blacklisted(unsigned int cpu)
 	/*
 	 * Late loading on model 79 with microcode revision less than 0x0b000021
 	 * and LLC size per core bigger than 2.5MB may result in a system hang.
-	 * This behavior is documented in item BDF90, #334165 (Intel Xeon
+	 * This behavior is documented in item BDX90, #334165 (Intel Xeon
 	 * Processor E7-8800/4800 v4 Product Family).
 	 */
 	if (c->x86_vfm == INTEL_BROADWELL_X &&
 	    c->x86_stepping == 0x01 &&
 	    llc_size_per_core > 2621440 &&
 	    c->microcode < 0x0b000021) {
-		pr_err_once("Erratum BDF90: late loading with revision < 0x0b000021 (0x%x) disabled.\n", c->microcode);
+		pr_err_once("Erratum BDX90: late loading with revision < 0x0b000021 (0x%x) disabled.\n", c->microcode);
 		pr_err_once("Please consider either early loading through initrd/built-in or a potential BIOS update.\n");
 		return true;
 	}
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [tip: x86/microcode] x86/microcode/intel: Remove unnecessary cache writeback and invalidation
  2024-10-01 16:10 ` [PATCH RFC 1/7] x86/microcode/intel: Remove unnecessary cache writeback and invalidation Chang S. Bae
@ 2024-10-25 16:24   ` tip-bot2 for Chang S. Bae
  0 siblings, 0 replies; 44+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2024-10-25 16:24 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Yan Hua Wu, William Xie, Chang S. Bae, Borislav Petkov (AMD),
	Ashok Raj, x86, linux-kernel

The following commit has been merged into the x86/microcode branch of tip:

Commit-ID:     9a819753b0209c6edebdea447a1aa53e8c697653
Gitweb:        https://git.kernel.org/tip/9a819753b0209c6edebdea447a1aa53e8c697653
Author:        Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate:    Tue, 01 Oct 2024 09:10:36 -07:00
Committer:     Borislav Petkov (AMD) <bp@alien8.de>
CommitterDate: Fri, 25 Oct 2024 18:12:03 +02:00

x86/microcode/intel: Remove unnecessary cache writeback and invalidation

Currently, an unconditional cache flush is performed during every
microcode update. Although the original changelog did not mention
a specific erratum, this measure was primarily intended to address
a specific microcode bug, the load of which has already been blocked by
is_blacklisted(). Therefore, this cache flush is no longer necessary.

Additionally, the side effects of doing this have been overlooked. It
increases CPU rendezvous time during late loading, where the cache flush
takes between 1x to 3.5x longer than the actual microcode update.

Remove native_wbinvd() and update the erratum name to align with the
latest errata documentation, document ID 334163 Version 022US.

  [ bp: Zap the flaky documentation URL. ]

Fixes: 91df9fdf5149 ("x86/microcode/intel: Writeback and invalidate caches before updating microcode")
Reported-by: Yan Hua Wu <yanhua1.wu@intel.com>
Reported-by: William Xie <william.xie@intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Ashok Raj <ashok.raj@intel.com>
Tested-by: Yan Hua Wu <yanhua1.wu@intel.com>
Link: https://lore.kernel.org/r/20241001161042.465584-2-chang.seok.bae@intel.com
---
 arch/x86/kernel/cpu/microcode/intel.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 815fa67..f3d5348 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -319,12 +319,6 @@ static enum ucode_state __apply_microcode(struct ucode_cpu_info *uci,
 		return UCODE_OK;
 	}
 
-	/*
-	 * Writeback and invalidate caches before updating microcode to avoid
-	 * internal issues depending on what the microcode is updating.
-	 */
-	native_wbinvd();
-
 	/* write microcode via MSR 0x79 */
 	native_wrmsrl(MSR_IA32_UCODE_WRITE, (unsigned long)mc->bits);
 
@@ -574,14 +568,14 @@ static bool is_blacklisted(unsigned int cpu)
 	/*
 	 * Late loading on model 79 with microcode revision less than 0x0b000021
 	 * and LLC size per core bigger than 2.5MB may result in a system hang.
-	 * This behavior is documented in item BDF90, #334165 (Intel Xeon
+	 * This behavior is documented in item BDX90, #334165 (Intel Xeon
 	 * Processor E7-8800/4800 v4 Product Family).
 	 */
 	if (c->x86_vfm == INTEL_BROADWELL_X &&
 	    c->x86_stepping == 0x01 &&
 	    llc_size_per_core > 2621440 &&
 	    c->microcode < 0x0b000021) {
-		pr_err_once("Erratum BDF90: late loading with revision < 0x0b000021 (0x%x) disabled.\n", c->microcode);
+		pr_err_once("Erratum BDX90: late loading with revision < 0x0b000021 (0x%x) disabled.\n", c->microcode);
 		pr_err_once("Please consider either early loading through initrd/built-in or a potential BIOS update.\n");
 		return true;
 	}

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH RFC 2/7] x86/microcode: Introduce staging option to reduce late-loading latency
  2024-10-01 16:10 [PATCH RFC 0/7] x86/microcode: Support for Intel Staging Feature Chang S. Bae
  2024-10-01 16:10 ` [PATCH RFC 1/7] x86/microcode/intel: Remove unnecessary cache writeback and invalidation Chang S. Bae
@ 2024-10-01 16:10 ` Chang S. Bae
  2024-11-04 10:45   ` Borislav Petkov
  2024-10-01 16:10 ` [PATCH RFC 3/7] x86/msr-index: Define MSR index and bit for the microcode staging feature Chang S. Bae
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 44+ messages in thread
From: Chang S. Bae @ 2024-10-01 16:10 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen, chang.seok.bae

As microcode patch sizes continue to grow, the latency during
late-loading can spike, leading to timeouts and interruptions in running
workloads. This trend of increasing patch sizes is expected to continue,
so a foundational solution is needed to address the issue.

To mitigate the problem, a new staging feature is introduced. This option
processes most of the microcode update (excluding activation) on a
non-critical path, allowing CPUs to remain operational during the
majority of the update. By moving most of the work off the critical path,
the latency spike can be significantly reduced.

Integrate the staging process as an additional step in the late-loading
flow. Introduce a new callback for staging, which is invoked after the
microcode patch image is prepared but before entering the CPU rendezvous
for triggering the update.

Staging follows an opportunistic model: it is attempted when available.
If successful, it reduces CPU rendezvous time; if not, the process falls
back to the legacy loading, potentially exposing the system to higher
latency.

Extend struct microcode_ops to incorporate staging properties, which will
be updated in the vendor code from subsequent patches.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
Whether staging should be mandatory is a policy decision that is beyond
the scope of this patch at the moment. For now, the focus is on
establishing a basic flow, with the intention of attracting focused
reviews, while deferring the discussion on staging policy later.

In terms of the flow, an alternative approach could be to integrate
staging as part of microcode preparation on the vendor code side.
However, this was deemed too implicit, as staging involves loading and
validating the microcode image, which differs from typical microcode file
handling.
---
 arch/x86/kernel/cpu/microcode/core.c     | 12 ++++++++++--
 arch/x86/kernel/cpu/microcode/internal.h |  4 +++-
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index b3658d11e7b6..4ddb5ba42f3f 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -676,19 +676,27 @@ static bool setup_cpus(void)
 
 static int load_late_locked(void)
 {
+	bool is_safe = false;
+
 	if (!setup_cpus())
 		return -EBUSY;
 
 	switch (microcode_ops->request_microcode_fw(0, &microcode_pdev->dev)) {
 	case UCODE_NEW:
-		return load_late_stop_cpus(false);
+		break;
 	case UCODE_NEW_SAFE:
-		return load_late_stop_cpus(true);
+		is_safe = true;
+		break;
 	case UCODE_NFOUND:
 		return -ENOENT;
 	default:
 		return -EBADFD;
 	}
+
+	if (microcode_ops->use_staging)
+		microcode_ops->staging_microcode();
+
+	return load_late_stop_cpus(is_safe);
 }
 
 static ssize_t reload_store(struct device *dev,
diff --git a/arch/x86/kernel/cpu/microcode/internal.h b/arch/x86/kernel/cpu/microcode/internal.h
index 21776c529fa9..cb58e83e4934 100644
--- a/arch/x86/kernel/cpu/microcode/internal.h
+++ b/arch/x86/kernel/cpu/microcode/internal.h
@@ -31,10 +31,12 @@ struct microcode_ops {
 	 * See also the "Synchronization" section in microcode_core.c.
 	 */
 	enum ucode_state	(*apply_microcode)(int cpu);
+	void			(*staging_microcode)(void);
 	int			(*collect_cpu_info)(int cpu, struct cpu_signature *csig);
 	void			(*finalize_late_load)(int result);
 	unsigned int		nmi_safe	: 1,
-				use_nmi		: 1;
+				use_nmi		: 1,
+				use_staging	: 1;
 };
 
 struct early_load_data {
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH RFC 2/7] x86/microcode: Introduce staging option to reduce late-loading latency
  2024-10-01 16:10 ` [PATCH RFC 2/7] x86/microcode: Introduce staging option to reduce late-loading latency Chang S. Bae
@ 2024-11-04 10:45   ` Borislav Petkov
  0 siblings, 0 replies; 44+ messages in thread
From: Borislav Petkov @ 2024-11-04 10:45 UTC (permalink / raw)
  To: Chang S. Bae; +Cc: linux-kernel, x86, tglx, mingo, dave.hansen

On Tue, Oct 01, 2024 at 09:10:37AM -0700, Chang S. Bae wrote:
> diff --git a/arch/x86/kernel/cpu/microcode/internal.h b/arch/x86/kernel/cpu/microcode/internal.h
> index 21776c529fa9..cb58e83e4934 100644
> --- a/arch/x86/kernel/cpu/microcode/internal.h
> +++ b/arch/x86/kernel/cpu/microcode/internal.h
> @@ -31,10 +31,12 @@ struct microcode_ops {
>  	 * See also the "Synchronization" section in microcode_core.c.
>  	 */
>  	enum ucode_state	(*apply_microcode)(int cpu);
> +	void			(*staging_microcode)(void);

"stage_microcode"

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH RFC 3/7] x86/msr-index: Define MSR index and bit for the microcode staging feature
  2024-10-01 16:10 [PATCH RFC 0/7] x86/microcode: Support for Intel Staging Feature Chang S. Bae
  2024-10-01 16:10 ` [PATCH RFC 1/7] x86/microcode/intel: Remove unnecessary cache writeback and invalidation Chang S. Bae
  2024-10-01 16:10 ` [PATCH RFC 2/7] x86/microcode: Introduce staging option to reduce late-loading latency Chang S. Bae
@ 2024-10-01 16:10 ` Chang S. Bae
  2024-10-01 16:10 ` [PATCH RFC 4/7] x86/microcode/intel: Prepare for microcode staging Chang S. Bae
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 44+ messages in thread
From: Chang S. Bae @ 2024-10-01 16:10 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen, chang.seok.bae

The microcode staging feature involves two key MSR entities, the
presence of which is indicated by bit 16 of IA32_ARCH_CAPABILITIES:

* Bit 4 in IA32_MCU_ENUMERATION shows the availability of the microcode
  staging feature.

* Staging is managed through MMIO registers, with
  IA32_MCU_STAGING_MBOX_ADDR MSR specifying the physical address of the
  first MMIO register.

Define the MSR index and bit assignments, helping the upcoming staging
code to make use of the hardware feature.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
 arch/x86/include/asm/msr-index.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 3ae84c3b8e6d..2840a2fe340b 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -164,6 +164,10 @@
 						 * Processor MMIO stale data
 						 * vulnerabilities.
 						 */
+#define ARCH_CAP_MCU_ENUM		BIT(16) /*
+						 * Indicates the presence of microcode update
+						 * feature enumeration and status information
+						 */
 #define ARCH_CAP_FB_CLEAR		BIT(17)	/*
 						 * VERW clears CPU fill buffer
 						 * even on MDS_NO CPUs.
@@ -884,6 +888,11 @@
 #define MSR_IA32_UCODE_WRITE		0x00000079
 #define MSR_IA32_UCODE_REV		0x0000008b
 
+#define MSR_IA32_MCU_ENUMERATION	0x0000007b
+#define MCU_STAGING			BIT(4)
+
+#define MSR_IA32_MCU_STAGING_MBOX_ADDR	0x000007a5
+
 /* Intel SGX Launch Enclave Public Key Hash MSRs */
 #define MSR_IA32_SGXLEPUBKEYHASH0	0x0000008C
 #define MSR_IA32_SGXLEPUBKEYHASH1	0x0000008D
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH RFC 4/7] x86/microcode/intel: Prepare for microcode staging
  2024-10-01 16:10 [PATCH RFC 0/7] x86/microcode: Support for Intel Staging Feature Chang S. Bae
                   ` (2 preceding siblings ...)
  2024-10-01 16:10 ` [PATCH RFC 3/7] x86/msr-index: Define MSR index and bit for the microcode staging feature Chang S. Bae
@ 2024-10-01 16:10 ` Chang S. Bae
  2024-11-04 11:16   ` Borislav Petkov
  2024-10-01 16:10 ` [PATCH RFC 5/7] x86/microcode/intel_staging: Implement staging logic Chang S. Bae
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 44+ messages in thread
From: Chang S. Bae @ 2024-10-01 16:10 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen, chang.seok.bae

When microcode staging is initiated, operations are carried out through
an MMIO interface. Each interface is independently discoverable per CPU
via the IA32_MCU_STAGING_MBOX_ADDR MSR, which points to a set of
dword-sized registers.

Software must first ensure the microcode image is dword-aligned, then
proceed to stage the update for each exposed MMIO space as specified.

Follow these two steps to arrange staging process. Identify each unique
MMIO interface by iterating over the CPUs and reading the MSR for each
one. While this process can be tedious, it remains simple enough and
acceptable in the slow path.

Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
Note:

1. Initially, a per-package and parallel staging invocation approach was
considered, but it seemed overly complex. Dave helped to identify a
simpler way.

2. Ideally, the staging_work() function would be as simple as a single
WRMSR execution. If this were the case, the staging flow could be
completed with this patch. From this perspective, the software handling
for interacting with the staging firmware has been separated from this
vendor code and moved into a new file dedicated to staging logic.
---
 arch/x86/kernel/cpu/microcode/intel.c    | 50 ++++++++++++++++++++++++
 arch/x86/kernel/cpu/microcode/internal.h |  5 +++
 2 files changed, 55 insertions(+)

diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index f3d534807d91..03c4b0e7e97c 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -299,6 +299,55 @@ static __init struct microcode_intel *scan_microcode(void *data, size_t size,
 	return size ? NULL : patch;
 }
 
+static inline u64 staging_addr(u32 cpu)
+{
+	u32 lo, hi;
+
+	rdmsr_on_cpu(cpu, MSR_IA32_MCU_STAGING_MBOX_ADDR, &lo, &hi);
+	return lo | ((u64)hi << 32);
+}
+
+static bool need_staging(u64 *mmio_addrs, u64 pa)
+{
+	unsigned int i;
+
+	for (i = 0; mmio_addrs[i] != 0; i++) {
+		if (mmio_addrs[i] == pa)
+			return false;
+	}
+	mmio_addrs[i] = pa;
+	return true;
+}
+
+static void staging_microcode(void)
+{
+	u64 *mmio_addrs, mmio_pa;
+	unsigned int totalsize;
+	int cpu;
+
+	totalsize = get_totalsize(&ucode_patch_late->hdr);
+	if (!IS_ALIGNED(totalsize, sizeof(u32)))
+		return;
+
+	mmio_addrs = kcalloc(nr_cpu_ids, sizeof(*mmio_addrs), GFP_KERNEL);
+	if (WARN_ON_ONCE(!mmio_addrs))
+		return;
+
+	for_each_cpu(cpu, cpu_online_mask) {
+		mmio_pa = staging_addr(cpu);
+
+		if (need_staging(mmio_addrs, mmio_pa) &&
+		    !staging_work(mmio_pa, ucode_patch_late, totalsize)) {
+			pr_err("Error: staging failed.\n");
+			goto out;
+		}
+	}
+
+	pr_info("Staging succeeded.\n");
+out:
+	kfree(mmio_addrs);
+}
+
 static enum ucode_state __apply_microcode(struct ucode_cpu_info *uci,
 					  struct microcode_intel *mc,
 					  u32 *cur_rev)
@@ -627,6 +676,7 @@ static struct microcode_ops microcode_intel_ops = {
 	.collect_cpu_info	= collect_cpu_info,
 	.apply_microcode	= apply_microcode_late,
 	.finalize_late_load	= finalize_late_load,
+	.staging_microcode	= staging_microcode,
 	.use_nmi		= IS_ENABLED(CONFIG_X86_64),
 };
 
diff --git a/arch/x86/kernel/cpu/microcode/internal.h b/arch/x86/kernel/cpu/microcode/internal.h
index cb58e83e4934..06c3c8db4ceb 100644
--- a/arch/x86/kernel/cpu/microcode/internal.h
+++ b/arch/x86/kernel/cpu/microcode/internal.h
@@ -120,6 +120,11 @@ void load_ucode_intel_bsp(struct early_load_data *ed);
 void load_ucode_intel_ap(void);
 void reload_ucode_intel(void);
 struct microcode_ops *init_intel_microcode(void);
+static inline bool staging_work(u64 mmio_pa, void *ucode_ptr, unsigned int totalsize)
+{
+	pr_debug_once("Need to implement the Staging code.\n");
+	return false;
+}
 #else /* CONFIG_CPU_SUP_INTEL */
 static inline void load_ucode_intel_bsp(struct early_load_data *ed) { }
 static inline void load_ucode_intel_ap(void) { }
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH RFC 4/7] x86/microcode/intel: Prepare for microcode staging
  2024-10-01 16:10 ` [PATCH RFC 4/7] x86/microcode/intel: Prepare for microcode staging Chang S. Bae
@ 2024-11-04 11:16   ` Borislav Petkov
  2024-11-04 16:08     ` Dave Hansen
  2024-11-06 18:28     ` [PATCH RFC 4/7] x86/microcode/intel: Prepare for microcode staging Chang S. Bae
  0 siblings, 2 replies; 44+ messages in thread
From: Borislav Petkov @ 2024-11-04 11:16 UTC (permalink / raw)
  To: Chang S. Bae; +Cc: linux-kernel, x86, tglx, mingo, dave.hansen

On Tue, Oct 01, 2024 at 09:10:39AM -0700, Chang S. Bae wrote:
> +static inline u64 staging_addr(u32 cpu)
> +{
> +	u32 lo, hi;
> +
> +	rdmsr_on_cpu(cpu, MSR_IA32_MCU_STAGING_MBOX_ADDR, &lo, &hi);
> +	return lo | ((u64)hi << 32);
> +}

A single usage site. Move its code there and get rid of the function.

> +
> +static bool need_staging(u64 *mmio_addrs, u64 pa)
> +{
> +	unsigned int i;
> +
> +	for (i = 0; mmio_addrs[i] != 0; i++) {
> +		if (mmio_addrs[i] == pa)
> +			return false;
> +	}
> +	mmio_addrs[i] = pa;

This is a weird function - its name is supposed to mean it queries something
but then it has side effects too.

> +	return true;
> +}
> +
> +static void staging_microcode(void)

stage_microcode().

Functions should have verbs in their name and have the meaning of
a "do-something".

> +{
> +	u64 *mmio_addrs, mmio_pa;
> +	unsigned int totalsize;
> +	int cpu;
> +
> +	totalsize = get_totalsize(&ucode_patch_late->hdr);
> +	if (!IS_ALIGNED(totalsize, sizeof(u32)))
> +		return;
> +
> +	mmio_addrs = kcalloc(nr_cpu_ids, sizeof(*mmio_addrs), GFP_KERNEL);
> +	if (WARN_ON_ONCE(!mmio_addrs))
> +		return;
> +
> +	for_each_cpu(cpu, cpu_online_mask) {

Oh great, and someone went and offlined one of those CPUs right here. Fun.

> +		mmio_pa = staging_addr(cpu);
> +
> +		if (need_staging(mmio_addrs, mmio_pa) &&
> +		    !staging_work(mmio_pa, ucode_patch_late, totalsize)) {

do_stage()

> +			pr_err("Error: staging failed.\n");

							... on CPU%d, err_val: 0x%x"\n"

perhaps?

For more info debugging something like that?

> +			goto out;
> +		}
> +	}
> +
> +	pr_info("Staging succeeded.\n");

		"Staging of patch revision 0x%x succeeded.\n"...

more user-friendly.

> +out:
> +	kfree(mmio_addrs);
> +}
> +
>  static enum ucode_state __apply_microcode(struct ucode_cpu_info *uci,
>  					  struct microcode_intel *mc,
>  					  u32 *cur_rev)
> @@ -627,6 +676,7 @@ static struct microcode_ops microcode_intel_ops = {
>  	.collect_cpu_info	= collect_cpu_info,
>  	.apply_microcode	= apply_microcode_late,
>  	.finalize_late_load	= finalize_late_load,
> +	.staging_microcode	= staging_microcode,
>  	.use_nmi		= IS_ENABLED(CONFIG_X86_64),
>  };
>  
> diff --git a/arch/x86/kernel/cpu/microcode/internal.h b/arch/x86/kernel/cpu/microcode/internal.h
> index cb58e83e4934..06c3c8db4ceb 100644
> --- a/arch/x86/kernel/cpu/microcode/internal.h
> +++ b/arch/x86/kernel/cpu/microcode/internal.h
> @@ -120,6 +120,11 @@ void load_ucode_intel_bsp(struct early_load_data *ed);
>  void load_ucode_intel_ap(void);
>  void reload_ucode_intel(void);
>  struct microcode_ops *init_intel_microcode(void);
> +static inline bool staging_work(u64 mmio_pa, void *ucode_ptr, unsigned int totalsize)
> +{
> +	pr_debug_once("Need to implement the Staging code.\n");
> +	return false;
> +}
>  #else /* CONFIG_CPU_SUP_INTEL */

You better have an empty stub for that work function here too.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH RFC 4/7] x86/microcode/intel: Prepare for microcode staging
  2024-11-04 11:16   ` Borislav Petkov
@ 2024-11-04 16:08     ` Dave Hansen
  2024-11-04 18:34       ` Chang S. Bae
  2024-11-06 18:28     ` [PATCH RFC 4/7] x86/microcode/intel: Prepare for microcode staging Chang S. Bae
  1 sibling, 1 reply; 44+ messages in thread
From: Dave Hansen @ 2024-11-04 16:08 UTC (permalink / raw)
  To: Borislav Petkov, Chang S. Bae; +Cc: linux-kernel, x86, tglx, mingo, dave.hansen

On 11/4/24 03:16, Borislav Petkov wrote:
> On Tue, Oct 01, 2024 at 09:10:39AM -0700, Chang S. Bae wrote:
>> +static inline u64 staging_addr(u32 cpu)
>> +{
>> +	u32 lo, hi;
>> +
>> +	rdmsr_on_cpu(cpu, MSR_IA32_MCU_STAGING_MBOX_ADDR, &lo, &hi);
>> +	return lo | ((u64)hi << 32);
>> +}
> A single usage site. Move its code there and get rid of the function.

Yeah, and it'll look a lot nicer if you use:

	rdmsrl_on_cpu(cpu, MSR_IA32_MCU_STAGING_MBOX_ADDR, &addr);

and don't have to do the high/lo munging.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH RFC 4/7] x86/microcode/intel: Prepare for microcode staging
  2024-11-04 16:08     ` Dave Hansen
@ 2024-11-04 18:34       ` Chang S. Bae
  2024-11-04 20:10         ` Chang S. Bae
  0 siblings, 1 reply; 44+ messages in thread
From: Chang S. Bae @ 2024-11-04 18:34 UTC (permalink / raw)
  To: Dave Hansen, Borislav Petkov; +Cc: linux-kernel, x86, tglx, mingo, dave.hansen

On 11/4/2024 8:08 AM, Dave Hansen wrote:
> On 11/4/24 03:16, Borislav Petkov wrote:
>> On Tue, Oct 01, 2024 at 09:10:39AM -0700, Chang S. Bae wrote:
>>> +static inline u64 staging_addr(u32 cpu)
>>> +{
>>> +	u32 lo, hi;
>>> +
>>> +	rdmsr_on_cpu(cpu, MSR_IA32_MCU_STAGING_MBOX_ADDR, &lo, &hi);
>>> +	return lo | ((u64)hi << 32);
>>> +}
>> A single usage site. Move its code there and get rid of the function.
> 
> Yeah, and it'll look a lot nicer if you use:
> 
> 	rdmsrl_on_cpu(cpu, MSR_IA32_MCU_STAGING_MBOX_ADDR, &addr);
> 
> and don't have to do the high/lo munging.

Oh, silly me missed this function. Thanks.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH RFC 4/7] x86/microcode/intel: Prepare for microcode staging
  2024-11-04 18:34       ` Chang S. Bae
@ 2024-11-04 20:10         ` Chang S. Bae
  2024-11-06 18:23           ` [PATCH] cpufreq: Simplify MSR read on the boot CPU Chang S. Bae
  0 siblings, 1 reply; 44+ messages in thread
From: Chang S. Bae @ 2024-11-04 20:10 UTC (permalink / raw)
  To: Dave Hansen, Borislav Petkov; +Cc: linux-kernel, x86, tglx, mingo, dave.hansen

On 11/4/2024 10:34 AM, Chang S. Bae wrote:
> On 11/4/2024 8:08 AM, Dave Hansen wrote:
>> On 11/4/24 03:16, Borislav Petkov wrote:
>>> On Tue, Oct 01, 2024 at 09:10:39AM -0700, Chang S. Bae wrote:
>>>> +static inline u64 staging_addr(u32 cpu)
>>>> +{
>>>> +    u32 lo, hi;
>>>> +
>>>> +    rdmsr_on_cpu(cpu, MSR_IA32_MCU_STAGING_MBOX_ADDR, &lo, &hi);
>>>> +    return lo | ((u64)hi << 32);
>>>> +}
>>> A single usage site. Move its code there and get rid of the function.
>>
>> Yeah, and it'll look a lot nicer if you use:
>>
>>     rdmsrl_on_cpu(cpu, MSR_IA32_MCU_STAGING_MBOX_ADDR, &addr);
>>
>> and don't have to do the high/lo munging.
> 
> Oh, silly me missed this function. Thanks.

Okay, I took another look and found a similar case:

diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
index 0f04feb6cafa..b942cd11e179 100644
--- a/drivers/cpufreq/acpi-cpufreq.c
+++ b/drivers/cpufreq/acpi-cpufreq.c
@@ -73,20 +73,17 @@ static unsigned int acpi_pstate_strict;

  static bool boost_state(unsigned int cpu)
  {
-       u32 lo, hi;
         u64 msr;

         switch (boot_cpu_data.x86_vendor) {
         case X86_VENDOR_INTEL:
         case X86_VENDOR_CENTAUR:
         case X86_VENDOR_ZHAOXIN:
-               rdmsr_on_cpu(cpu, MSR_IA32_MISC_ENABLE, &lo, &hi);
-               msr = lo | ((u64)hi << 32);
+               rdmsrl_on_cpu(cpu, MSR_IA32_MISC_ENABLE, &msr);
                 return !(msr & MSR_IA32_MISC_ENABLE_TURBO_DISABLE);
         case X86_VENDOR_HYGON:
         case X86_VENDOR_AMD:
-               rdmsr_on_cpu(cpu, MSR_K7_HWCR, &lo, &hi);
-               msr = lo | ((u64)hi << 32);
+               rdmsrl_on_cpu(cpu, MSR_K7_HWCR, &msr);
                 return !(msr & MSR_K7_HWCR_CPB_DIS);
         }
         return false;

I'll be following up with a patch to clean this up as well.

Thanks,
Chang

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH] cpufreq: Simplify MSR read on the boot CPU
  2024-11-04 20:10         ` Chang S. Bae
@ 2024-11-06 18:23           ` Chang S. Bae
  2024-11-12 20:44             ` Rafael J. Wysocki
  0 siblings, 1 reply; 44+ messages in thread
From: Chang S. Bae @ 2024-11-06 18:23 UTC (permalink / raw)
  To: linux-pm, linux-kernel
  Cc: rafael, viresh.kumar, dave.hansen, bp, chang.seok.bae

Replace the 32-bit MSR access function with a 64-bit variant to simplify
the call site, eliminating unnecessary 32-bit value manipulations.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
---
I've received feedback to use rdmsrl_on_cpu() instead of rdmsr_on_cpu()
for similar code in my feature-enabling series [1]. While auditing the
tree, I found this case as well, so here's another cleanup.

[1] https://lore.kernel.org/lkml/e9afefb7-3c4e-48ee-aab1-2f338c4e989d@intel.com/
---
 drivers/cpufreq/acpi-cpufreq.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
index 0f04feb6cafa..b942cd11e179 100644
--- a/drivers/cpufreq/acpi-cpufreq.c
+++ b/drivers/cpufreq/acpi-cpufreq.c
@@ -73,20 +73,17 @@ static unsigned int acpi_pstate_strict;
 
 static bool boost_state(unsigned int cpu)
 {
-	u32 lo, hi;
 	u64 msr;
 
 	switch (boot_cpu_data.x86_vendor) {
 	case X86_VENDOR_INTEL:
 	case X86_VENDOR_CENTAUR:
 	case X86_VENDOR_ZHAOXIN:
-		rdmsr_on_cpu(cpu, MSR_IA32_MISC_ENABLE, &lo, &hi);
-		msr = lo | ((u64)hi << 32);
+		rdmsrl_on_cpu(cpu, MSR_IA32_MISC_ENABLE, &msr);
 		return !(msr & MSR_IA32_MISC_ENABLE_TURBO_DISABLE);
 	case X86_VENDOR_HYGON:
 	case X86_VENDOR_AMD:
-		rdmsr_on_cpu(cpu, MSR_K7_HWCR, &lo, &hi);
-		msr = lo | ((u64)hi << 32);
+		rdmsrl_on_cpu(cpu, MSR_K7_HWCR, &msr);
 		return !(msr & MSR_K7_HWCR_CPB_DIS);
 	}
 	return false;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH] cpufreq: Simplify MSR read on the boot CPU
  2024-11-06 18:23           ` [PATCH] cpufreq: Simplify MSR read on the boot CPU Chang S. Bae
@ 2024-11-12 20:44             ` Rafael J. Wysocki
  0 siblings, 0 replies; 44+ messages in thread
From: Rafael J. Wysocki @ 2024-11-12 20:44 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: linux-pm, linux-kernel, rafael, viresh.kumar, dave.hansen, bp

On Wed, Nov 6, 2024 at 7:23 PM Chang S. Bae <chang.seok.bae@intel.com> wrote:
>
> Replace the 32-bit MSR access function with a 64-bit variant to simplify
> the call site, eliminating unnecessary 32-bit value manipulations.
>
> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> Cc: Rafael J. Wysocki <rafael@kernel.org>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Borislav Petkov <bp@alien8.de>
> ---
> I've received feedback to use rdmsrl_on_cpu() instead of rdmsr_on_cpu()
> for similar code in my feature-enabling series [1]. While auditing the
> tree, I found this case as well, so here's another cleanup.
>
> [1] https://lore.kernel.org/lkml/e9afefb7-3c4e-48ee-aab1-2f338c4e989d@intel.com/
> ---
>  drivers/cpufreq/acpi-cpufreq.c | 7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
> index 0f04feb6cafa..b942cd11e179 100644
> --- a/drivers/cpufreq/acpi-cpufreq.c
> +++ b/drivers/cpufreq/acpi-cpufreq.c
> @@ -73,20 +73,17 @@ static unsigned int acpi_pstate_strict;
>
>  static bool boost_state(unsigned int cpu)
>  {
> -       u32 lo, hi;
>         u64 msr;
>
>         switch (boot_cpu_data.x86_vendor) {
>         case X86_VENDOR_INTEL:
>         case X86_VENDOR_CENTAUR:
>         case X86_VENDOR_ZHAOXIN:
> -               rdmsr_on_cpu(cpu, MSR_IA32_MISC_ENABLE, &lo, &hi);
> -               msr = lo | ((u64)hi << 32);
> +               rdmsrl_on_cpu(cpu, MSR_IA32_MISC_ENABLE, &msr);
>                 return !(msr & MSR_IA32_MISC_ENABLE_TURBO_DISABLE);
>         case X86_VENDOR_HYGON:
>         case X86_VENDOR_AMD:
> -               rdmsr_on_cpu(cpu, MSR_K7_HWCR, &lo, &hi);
> -               msr = lo | ((u64)hi << 32);
> +               rdmsrl_on_cpu(cpu, MSR_K7_HWCR, &msr);
>                 return !(msr & MSR_K7_HWCR_CPB_DIS);
>         }
>         return false;
> --

Applied as 6.13 material, thanks!

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH RFC 4/7] x86/microcode/intel: Prepare for microcode staging
  2024-11-04 11:16   ` Borislav Petkov
  2024-11-04 16:08     ` Dave Hansen
@ 2024-11-06 18:28     ` Chang S. Bae
  2024-11-07  1:12       ` Thomas Gleixner
  1 sibling, 1 reply; 44+ messages in thread
From: Chang S. Bae @ 2024-11-06 18:28 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-kernel, x86, tglx, mingo, dave.hansen

On 11/4/2024 3:16 AM, Borislav Petkov wrote:
> On Tue, Oct 01, 2024 at 09:10:39AM -0700, Chang S. Bae wrote:
>> +
>> +static bool need_staging(u64 *mmio_addrs, u64 pa)
>> +{
>> +	unsigned int i;
>> +
>> +	for (i = 0; mmio_addrs[i] != 0; i++) {
>> +		if (mmio_addrs[i] == pa)
>> +			return false;
>> +	}
>> +	mmio_addrs[i] = pa;
> 
> This is a weird function - its name is supposed to mean it queries something
> but then it has side effects too.

I think I've carved out a bit more out of the loop and convoluted them
into a single function.

Instead, let the helper simply find the position in the array,

         static unsigned int find_pos(u64 *mmio_addrs, u64 mmio_pa)
         {
                 unsigned int i;

                 for (i = 0; mmio_addrs[i] != 0; i++) {
                         if (mmio_addrs[i] == mmio_pa)
                                 break;
                 }
                 return i;
         }

and update the array from there:

         static void stage_microcode(void)
         {
                 unsigned int pos;
                 ...
                 for_each_cpu(cpu, cpu_online_mask) {
                         /* set 'mmio_pa' from RDMSR */

                         pos = find_pos(mmio_addrs, mmio_pa);
                         if (mmio_addrs[pos] == mmio_pa)
                                 continue;

                         /* do staging */

                         mmio_addrs[pos] = mmio_pa;
                 }
                 ...
         }

Or, even the loop code can include it:

         for_each_cpu(...) {
                 /* set 'mmio_pa' from RDMSR */

                 /* Find either the last position or a match */
                 for (i = 0; mmio_addrs[i] != 0 && mmio_addrs[i] != 
mmio_pa; i++);

                 if (mmio_addrs[i] == mmio_pa)
                         continue;

                 /* do staging */

                 mmio_addrs[i] = mmio_pa;
         }

Maybe, if this unusual one line for-loop is not an issue, this can make
the code simple. Otherwise, at least the helper should remain doing one
thing.

    >> +	for_each_cpu(cpu, cpu_online_mask) {
> 
> Oh great, and someone went and offlined one of those CPUs right here. Fun.

Yes, offlining matters. Let me summarize a few things related to the
patchset for the record:

   *  The staging flow is tiggered by load_late_locked(), which already
      holds cpus_read_lock() [1].

   *  When any SMT primary threads is offlined, setup_cpus() makes it
      exit right away [2].

   *  When an offlined CPU is brought up online, it follows the early
      loading path, which does not include staging.

[1]: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/cpu/microcode/core.c#n705
[2]: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/cpu/microcode/core.c#n658

> 
>> +		mmio_pa = staging_addr(cpu);
>> +
>> +		if (need_staging(mmio_addrs, mmio_pa) &&
>> +		    !staging_work(mmio_pa, ucode_patch_late, totalsize)) {
> 
> do_stage()
> 
>> +			pr_err("Error: staging failed.\n");
> 
> 							... on CPU%d, err_val: 0x%x"\n"
> 
> perhaps?
> 
> For more info debugging something like that?

Maybe, either one of two ways:

The work function remains returning bool value, then it can print the
error message from there.

         bool do_stage(...)
         {
                 ...
                 pr_err("Error: staging failed on CPU%d, with %s.\n",
                         smp_processor_id(), state == UCODE_TIMEOUT ? 
"timeout" : "error");
                 return false
         }

Then, this function may simply return on error.

         static void stage_microcode(void)
         {
                 ...
                 if (!do_stage(...))
                         goto out;
                 ...
         out:
                 kfree(mmio_addrs);
         }

Or, let the work function return a value like an enum ucode_state
indicating the status, and print the error message here.

         static void stage_microcode(void)
         {
                 enum ucode_state state;

                 ...
                 state = do_stage(...);
                 if (state != UCODE_OK)
                         goto err_out;

         err_out:
                 pr_err("Error: ...);
         out:
                 ...
         }

> 
>> +			goto out;
>> +		}
>> +	}
>> +
>> +	pr_info("Staging succeeded.\n");
> 
> 		"Staging of patch revision 0x%x succeeded.\n"...
> 
> more user-friendly.

Yes, that seems to be useful to put it on:

+       pr_info("Staging of patch revision 0x%x succeeded.\n",
+               ((struct microcode_header_intel *)ucode_patch_late)->rev);

If successful, users will see something like this:

[   25.352716] microcode: Staging is available.
[   25.357684] microcode: Current revision: 0x1234
[  136.203269] microcode: Staging of patch revision 0xabcd succeeded.
[  137.398491] microcode: load: updated on xx primary CPUs with yy siblings
[  137.427386] microcode: revision: 0x1234 -> 0xabcd

Thanks,
Chang

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH RFC 4/7] x86/microcode/intel: Prepare for microcode staging
  2024-11-06 18:28     ` [PATCH RFC 4/7] x86/microcode/intel: Prepare for microcode staging Chang S. Bae
@ 2024-11-07  1:12       ` Thomas Gleixner
  2024-11-08 22:42         ` Chang S. Bae
  2024-11-08 22:51         ` Dave Hansen
  0 siblings, 2 replies; 44+ messages in thread
From: Thomas Gleixner @ 2024-11-07  1:12 UTC (permalink / raw)
  To: Chang S. Bae, Borislav Petkov; +Cc: linux-kernel, x86, mingo, dave.hansen

On Wed, Nov 06 2024 at 10:28, Chang S. Bae wrote:
> On 11/4/2024 3:16 AM, Borislav Petkov wrote:
>> On Tue, Oct 01, 2024 at 09:10:39AM -0700, Chang S. Bae wrote:
>>> +
>>> +static bool need_staging(u64 *mmio_addrs, u64 pa)
>>> +{
>>> +	unsigned int i;
>>> +
>>> +	for (i = 0; mmio_addrs[i] != 0; i++) {
>>> +		if (mmio_addrs[i] == pa)
>>> +			return false;
>>> +	}
>>> +	mmio_addrs[i] = pa;
>> 
>> This is a weird function - its name is supposed to mean it queries something
>> but then it has side effects too.
>
> I think I've carved out a bit more out of the loop and convoluted them
> into a single function.
>
> Instead, let the helper simply find the position in the array,
>
>          static unsigned int find_pos(u64 *mmio_addrs, u64 mmio_pa)
>          {
>                  unsigned int i;
>
>                  for (i = 0; mmio_addrs[i] != 0; i++) {
>                          if (mmio_addrs[i] == mmio_pa)
>                                  break;
>                  }
>                  return i;
>          }
>
> and update the array from there:
>
>          static void stage_microcode(void)
>          {
>                  unsigned int pos;
>                  ...
>                  for_each_cpu(cpu, cpu_online_mask) {
>                          /* set 'mmio_pa' from RDMSR */
>
>                          pos = find_pos(mmio_addrs, mmio_pa);
>                          if (mmio_addrs[pos] == mmio_pa)
>                                  continue;
>
>                          /* do staging */
>
>                          mmio_addrs[pos] = mmio_pa;
>                  }
>                  ...
>          }
>
> Or, even the loop code can include it:
>
>          for_each_cpu(...) {
>                  /* set 'mmio_pa' from RDMSR */
>
>                  /* Find either the last position or a match */
>                  for (i = 0; mmio_addrs[i] != 0 && mmio_addrs[i] != 
> mmio_pa; i++);
>
>                  if (mmio_addrs[i] == mmio_pa)
>                          continue;
>
>                  /* do staging */
>
>                  mmio_addrs[i] = mmio_pa;
>          }

This looks all overly complicated. The documentation says:

  "There is one set of mailbox registers and internal staging buffers per
   physical processor package. Therefore, the IA32_MCU_STAGING_MBOX_ADDR
   MSR is package-scoped and will provide a different physical address on
   each physical package."

So why going through loops and hoops?

   pkg_id = UINT_MAX;

   for_each_online_cpu(cpu) {
   	if (topology_logical_package_id(cpu) == pkg_id)
              continue;
        pkg_id = topology_logical_package_id(cpu);

        rdmsrl_on_cpu(cpu, MSR_IA32_MCU_STAGING_MBOX_ADDR, &pa);
        staging_work(pa, ucode_patch_late, totalsize);
   }

Something like that should just work, no?

Thanks,

        tglx




^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH RFC 4/7] x86/microcode/intel: Prepare for microcode staging
  2024-11-07  1:12       ` Thomas Gleixner
@ 2024-11-08 22:42         ` Chang S. Bae
  2024-11-08 22:51         ` Dave Hansen
  1 sibling, 0 replies; 44+ messages in thread
From: Chang S. Bae @ 2024-11-08 22:42 UTC (permalink / raw)
  To: Thomas Gleixner, Borislav Petkov; +Cc: linux-kernel, x86, mingo, dave.hansen

On 11/6/2024 5:12 PM, Thomas Gleixner wrote:
> 
> This looks all overly complicated. The documentation says:
> 
>    "There is one set of mailbox registers and internal staging buffers per
>     physical processor package. Therefore, the IA32_MCU_STAGING_MBOX_ADDR
>     MSR is package-scoped and will provide a different physical address on
>     each physical package."
> 
> So why going through loops and hoops?

While the initial approach was like the one below, the aspect I bought 
was to avoid relying on topology knowledge by simply searching for 
unique addresses. But you're right -- this introduces unnecessary loops, 
complicating the code anyway.

I should have explicitly highlighted this as a review point, so thank 
you for taking a closer look and correcting the approach.

Given that the scope is clearly stated in the spec as packaged-scoped 
and should be permanent, I think there is no question in leveraging it 
to make the code simple as you suggested:

> 
>     pkg_id = UINT_MAX;
> 
>     for_each_online_cpu(cpu) {
>     	if (topology_logical_package_id(cpu) == pkg_id)
>                continue;
>          pkg_id = topology_logical_package_id(cpu);
> 
>          rdmsrl_on_cpu(cpu, MSR_IA32_MCU_STAGING_MBOX_ADDR, &pa);
>          staging_work(pa, ucode_patch_late, totalsize);
>     }
> 
> Something like that should just work, no?

Yes, indeed.

Then I observed the package number repeated according to the CPU number 
mapping. For example, all SMT sibling threads map in the later CPU numbers.

Unless staging redundancy is intentional, it can be avoided by tracking 
package ids. But, in this case, the loop may be streamlined to SMT 
primary threads instead of all online CPUs, since core::setup_cpus() 
already ensures primary threads are online. Perhaps,

     /*
      * The MMIO address is unique per package, and all the SMT
      * primary threads are ensured online. Find staging addresses
      * by their package ids.
      */
     for_each_cpu(cpu, cpu_primary_thread_mask) {
         ...
     }

Thanks,
Chang

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH RFC 4/7] x86/microcode/intel: Prepare for microcode staging
  2024-11-07  1:12       ` Thomas Gleixner
  2024-11-08 22:42         ` Chang S. Bae
@ 2024-11-08 22:51         ` Dave Hansen
  1 sibling, 0 replies; 44+ messages in thread
From: Dave Hansen @ 2024-11-08 22:51 UTC (permalink / raw)
  To: Thomas Gleixner, Chang S. Bae, Borislav Petkov
  Cc: linux-kernel, x86, mingo, dave.hansen

On 11/6/24 17:12, Thomas Gleixner wrote:
> This looks all overly complicated. The documentation says:
> 
>   "There is one set of mailbox registers and internal staging buffers per
>    physical processor package. Therefore, the IA32_MCU_STAGING_MBOX_ADDR
>    MSR is package-scoped and will provide a different physical address on
>    each physical package."
> 
> So why going through loops and hoops?

I'm to blame for that one.

It was the smallest amount of code I could think of at the time that
could work when all the CPUs in a package aren't consecutively numbered.
 It also happens to work even if the topology parsing or firmware goes
wonky.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH RFC 5/7] x86/microcode/intel_staging: Implement staging logic
  2024-10-01 16:10 [PATCH RFC 0/7] x86/microcode: Support for Intel Staging Feature Chang S. Bae
                   ` (3 preceding siblings ...)
  2024-10-01 16:10 ` [PATCH RFC 4/7] x86/microcode/intel: Prepare for microcode staging Chang S. Bae
@ 2024-10-01 16:10 ` Chang S. Bae
  2024-10-01 16:10 ` [PATCH RFC 6/7] x86/microcode/intel_staging: Support mailbox data transfer Chang S. Bae
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 44+ messages in thread
From: Chang S. Bae @ 2024-10-01 16:10 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen, chang.seok.bae

The staging firmware operates through a protocol via the MMIO interface.
The protocol defines a serialized sequence that begins by clearing the
hardware with an abort request. It then proceeds through iterative
process of sending data, initiating transactions, waiting for processing,
and reading responses.

To facilitate this interaction, follow the outlined protocol. Refactor
the waiting code to manage loop breaks more effectively. Data transfer
involves a next level of detail to handle the mailbox format. While
defining helpers, leave them empty for now.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
 arch/x86/kernel/cpu/microcode/Makefile        |   2 +-
 arch/x86/kernel/cpu/microcode/intel_staging.c | 105 ++++++++++++++++++
 arch/x86/kernel/cpu/microcode/internal.h      |   6 +-
 3 files changed, 107 insertions(+), 6 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/microcode/intel_staging.c

diff --git a/arch/x86/kernel/cpu/microcode/Makefile b/arch/x86/kernel/cpu/microcode/Makefile
index 193d98b33a0a..a9f79aaffcb0 100644
--- a/arch/x86/kernel/cpu/microcode/Makefile
+++ b/arch/x86/kernel/cpu/microcode/Makefile
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
 microcode-y				:= core.o
 obj-$(CONFIG_MICROCODE)			+= microcode.o
-microcode-$(CONFIG_CPU_SUP_INTEL)	+= intel.o
+microcode-$(CONFIG_CPU_SUP_INTEL)	+= intel.o intel_staging.o
 microcode-$(CONFIG_CPU_SUP_AMD)		+= amd.o
diff --git a/arch/x86/kernel/cpu/microcode/intel_staging.c b/arch/x86/kernel/cpu/microcode/intel_staging.c
new file mode 100644
index 000000000000..9989a78f9ef2
--- /dev/null
+++ b/arch/x86/kernel/cpu/microcode/intel_staging.c
@@ -0,0 +1,105 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#define pr_fmt(fmt) "microcode: " fmt
+#include <linux/delay.h>
+#include <linux/io.h>
+
+#include "internal.h"
+
+#define MBOX_REG_NUM		4
+#define MBOX_REG_SIZE		sizeof(u32)
+
+#define MBOX_CONTROL_OFFSET	0x0
+#define MBOX_STATUS_OFFSET	0x4
+
+#define MASK_MBOX_CTRL_ABORT	BIT(0)
+
+#define MASK_MBOX_STATUS_ERROR	BIT(2)
+#define MASK_MBOX_STATUS_READY	BIT(31)
+
+#define MBOX_XACTION_LEN	PAGE_SIZE
+#define MBOX_XACTION_MAX(imgsz)	((imgsz) * 2)
+#define MBOX_XACTION_TIMEOUT	(10 * MSEC_PER_SEC)
+
+#define STAGING_OFFSET_END	0xffffffff
+
+static inline void abort_xaction(void __iomem *base)
+{
+	writel(MASK_MBOX_CTRL_ABORT, base + MBOX_CONTROL_OFFSET);
+}
+
+static void request_xaction(void __iomem *base, u32 *chunk, unsigned int chunksize)
+{
+	pr_debug_once("Need to implement staging mailbox loading code.\n");
+}
+
+static enum ucode_state wait_for_xaction(void __iomem *base)
+{
+	u32 timeout, status;
+
+	for (timeout = 0; timeout < MBOX_XACTION_TIMEOUT; timeout++) {
+		msleep(1);
+		status = readl(base + MBOX_STATUS_OFFSET);
+		if (status & MASK_MBOX_STATUS_READY)
+			break;
+	}
+
+	status = readl(base + MBOX_STATUS_OFFSET);
+	if (status & MASK_MBOX_STATUS_ERROR)
+		return UCODE_ERROR;
+	if (!(status & MASK_MBOX_STATUS_READY))
+		return UCODE_TIMEOUT;
+
+	return UCODE_OK;
+}
+
+static enum ucode_state read_xaction_response(void __iomem *base, unsigned int *offset)
+{
+	pr_debug_once("Need to implement staging response handler.\n");
+	return UCODE_ERROR;
+}
+
+static inline unsigned int get_chunksize(unsigned int totalsize, unsigned int offset)
+{
+	WARN_ON_ONCE(totalsize < offset);
+	return min(MBOX_XACTION_LEN, totalsize - offset);
+}
+
+bool staging_work(u64 mmio_pa, void *ucode_ptr, unsigned int totalsize)
+{
+	unsigned int xaction_bytes = 0, offset = 0, chunksize;
+	void __iomem *mmio_base;
+	enum ucode_state state;
+
+	mmio_base = ioremap(mmio_pa, MBOX_REG_NUM * MBOX_REG_SIZE);
+	if (WARN_ON_ONCE(!mmio_base))
+		return false;
+
+	abort_xaction(mmio_base);
+
+	while (offset != STAGING_OFFSET_END) {
+		chunksize = get_chunksize(totalsize, offset);
+		if (xaction_bytes + chunksize > MBOX_XACTION_MAX(totalsize)) {
+			state = UCODE_TIMEOUT;
+			break;
+		}
+
+		request_xaction(mmio_base, ucode_ptr + offset, chunksize);
+		state = wait_for_xaction(mmio_base);
+		if (state != UCODE_OK)
+			break;
+
+		xaction_bytes += chunksize;
+		state = read_xaction_response(mmio_base, &offset);
+		if (state != UCODE_OK)
+			break;
+	}
+
+	iounmap(mmio_base);
+
+	if (state == UCODE_OK)
+		return true;
+
+	pr_err("Staging failed with %s.\n", state == UCODE_TIMEOUT ? "timeout" : "error");
+	return false;
+}
diff --git a/arch/x86/kernel/cpu/microcode/internal.h b/arch/x86/kernel/cpu/microcode/internal.h
index 06c3c8db4ceb..6b0d9a4374db 100644
--- a/arch/x86/kernel/cpu/microcode/internal.h
+++ b/arch/x86/kernel/cpu/microcode/internal.h
@@ -120,11 +120,7 @@ void load_ucode_intel_bsp(struct early_load_data *ed);
 void load_ucode_intel_ap(void);
 void reload_ucode_intel(void);
 struct microcode_ops *init_intel_microcode(void);
-static inline bool staging_work(u64 mmio_pa, void *ucode_ptr, unsigned int totalsize)
-{
-	pr_debug_once("Need to implement the Staging code.\n");
-	return false;
-}
+bool staging_work(u64 mmio_pa, void *ucode_ptr, unsigned int totalsize);
 #else /* CONFIG_CPU_SUP_INTEL */
 static inline void load_ucode_intel_bsp(struct early_load_data *ed) { }
 static inline void load_ucode_intel_ap(void) { }
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH RFC 6/7] x86/microcode/intel_staging: Support mailbox data transfer
  2024-10-01 16:10 [PATCH RFC 0/7] x86/microcode: Support for Intel Staging Feature Chang S. Bae
                   ` (4 preceding siblings ...)
  2024-10-01 16:10 ` [PATCH RFC 5/7] x86/microcode/intel_staging: Implement staging logic Chang S. Bae
@ 2024-10-01 16:10 ` Chang S. Bae
  2024-10-01 16:10 ` [PATCH RFC 7/7] x86/microcode/intel: Enable staging when available Chang S. Bae
  2024-12-11  1:42 ` [PATCH 0/6] x86/microcode: Support for Intel Staging Feature Chang S. Bae
  7 siblings, 0 replies; 44+ messages in thread
From: Chang S. Bae @ 2024-10-01 16:10 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen, chang.seok.bae

The staging architecture features a narrowed interface for data transfer.
Instead of allocating MMIO space based on data chunk size, it utilizes
two data registers: one for reading and one for writing, enforcing the
serialization of read and write operations. Additionally, it defines a
mailbox data format.

To facilitate data transfer, implement helper functions in line with this
specified format for reading and writing staging data. This mailbox
format is a customized version and is not compatible with the existing
mailbox code, so reuse is not feasible.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
 arch/x86/kernel/cpu/microcode/intel_staging.c | 55 ++++++++++++++++++-
 1 file changed, 52 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/intel_staging.c b/arch/x86/kernel/cpu/microcode/intel_staging.c
index 9989a78f9ef2..d56bad30164c 100644
--- a/arch/x86/kernel/cpu/microcode/intel_staging.c
+++ b/arch/x86/kernel/cpu/microcode/intel_staging.c
@@ -3,6 +3,7 @@
 #define pr_fmt(fmt) "microcode: " fmt
 #include <linux/delay.h>
 #include <linux/io.h>
+#include <linux/pci_ids.h>
 
 #include "internal.h"
 
@@ -11,17 +12,44 @@
 
 #define MBOX_CONTROL_OFFSET	0x0
 #define MBOX_STATUS_OFFSET	0x4
+#define MBOX_WRDATA_OFFSET	0x8
+#define MBOX_RDDATA_OFFSET	0xc
 
 #define MASK_MBOX_CTRL_ABORT	BIT(0)
+#define MASK_MBOX_CTRL_GO	BIT(31)
 
 #define MASK_MBOX_STATUS_ERROR	BIT(2)
 #define MASK_MBOX_STATUS_READY	BIT(31)
 
+#define MASK_MBOX_RESP_SUCCESS	BIT(0)
+#define MASK_MBOX_RESP_PROGRESS	BIT(1)
+#define MASK_MBOX_RESP_ERROR	BIT(2)
+
+#define MBOX_CMD_LOAD		0x3
+#define MBOX_OBJ_STAGING	0xb
+#define MBOX_HDR		(PCI_VENDOR_ID_INTEL | (MBOX_OBJ_STAGING << 16))
+#define MBOX_HDR_SIZE		16
+
 #define MBOX_XACTION_LEN	PAGE_SIZE
 #define MBOX_XACTION_MAX(imgsz)	((imgsz) * 2)
 #define MBOX_XACTION_TIMEOUT	(10 * MSEC_PER_SEC)
 
 #define STAGING_OFFSET_END	0xffffffff
+#define DWORD_SIZE(s)		((s) / sizeof(u32))
+
+static inline u32 read_mbox_dword(void __iomem *base)
+{
+	u32 dword = readl(base + MBOX_RDDATA_OFFSET);
+
+	/* Inform the read completion to the staging firmware */
+	writel(0, base + MBOX_RDDATA_OFFSET);
+	return dword;
+}
+
+static inline void write_mbox_dword(void __iomem *base, u32 dword)
+{
+	writel(dword, base + MBOX_WRDATA_OFFSET);
+}
 
 static inline void abort_xaction(void __iomem *base)
 {
@@ -30,7 +58,18 @@ static inline void abort_xaction(void __iomem *base)
 
 static void request_xaction(void __iomem *base, u32 *chunk, unsigned int chunksize)
 {
-	pr_debug_once("Need to implement staging mailbox loading code.\n");
+	unsigned int i, dwsize = DWORD_SIZE(chunksize);
+
+	write_mbox_dword(base, MBOX_HDR);
+	write_mbox_dword(base, dwsize + DWORD_SIZE(MBOX_HDR_SIZE));
+
+	write_mbox_dword(base, MBOX_CMD_LOAD);
+	write_mbox_dword(base, 0);
+
+	for (i = 0; i < dwsize; i++)
+		write_mbox_dword(base, chunk[i]);
+
+	writel(MASK_MBOX_CTRL_GO, base + MBOX_CONTROL_OFFSET);
 }
 
 static enum ucode_state wait_for_xaction(void __iomem *base)
@@ -55,8 +94,18 @@ static enum ucode_state wait_for_xaction(void __iomem *base)
 
 static enum ucode_state read_xaction_response(void __iomem *base, unsigned int *offset)
 {
-	pr_debug_once("Need to implement staging response handler.\n");
-	return UCODE_ERROR;
+	u32 flag;
+
+	WARN_ON_ONCE(read_mbox_dword(base) != MBOX_HDR);
+	WARN_ON_ONCE(read_mbox_dword(base) != DWORD_SIZE(MBOX_HDR_SIZE));
+
+	*offset = read_mbox_dword(base);
+
+	flag = read_mbox_dword(base);
+	if (flag & MASK_MBOX_RESP_ERROR)
+		return UCODE_ERROR;
+
+	return UCODE_OK;
 }
 
 static inline unsigned int get_chunksize(unsigned int totalsize, unsigned int offset)
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH RFC 7/7] x86/microcode/intel: Enable staging when available
  2024-10-01 16:10 [PATCH RFC 0/7] x86/microcode: Support for Intel Staging Feature Chang S. Bae
                   ` (5 preceding siblings ...)
  2024-10-01 16:10 ` [PATCH RFC 6/7] x86/microcode/intel_staging: Support mailbox data transfer Chang S. Bae
@ 2024-10-01 16:10 ` Chang S. Bae
  2024-12-11  1:42 ` [PATCH 0/6] x86/microcode: Support for Intel Staging Feature Chang S. Bae
  7 siblings, 0 replies; 44+ messages in thread
From: Chang S. Bae @ 2024-10-01 16:10 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen, chang.seok.bae

With the staging code being ready, check the relevant MSRs and set the
feature chicken bit to allow staging to be invoked from the core
microcode update process.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
 arch/x86/kernel/cpu/microcode/intel.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 03c4b0e7e97c..8f2476fbe8f2 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -688,6 +688,18 @@ static __init void calc_llc_size_per_core(struct cpuinfo_x86 *c)
 	llc_size_per_core = (unsigned int)llc_size;
 }
 
+static __init bool staging_available(void)
+{
+	u64 val;
+
+	val = x86_read_arch_cap_msr();
+	if (!(val & ARCH_CAP_MCU_ENUM))
+		return false;
+
+	rdmsrl(MSR_IA32_MCU_ENUMERATION, val);
+	return !!(val & MCU_STAGING);
+}
+
 struct microcode_ops * __init init_intel_microcode(void)
 {
 	struct cpuinfo_x86 *c = &boot_cpu_data;
@@ -698,6 +710,11 @@ struct microcode_ops * __init init_intel_microcode(void)
 		return NULL;
 	}
 
+	if (staging_available()) {
+		pr_info("Staging is available.\n");
+		microcode_intel_ops.use_staging = true;
+	}
+
 	calc_llc_size_per_core(c);
 
 	return &microcode_intel_ops;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 0/6] x86/microcode: Support for Intel Staging Feature
  2024-10-01 16:10 [PATCH RFC 0/7] x86/microcode: Support for Intel Staging Feature Chang S. Bae
                   ` (6 preceding siblings ...)
  2024-10-01 16:10 ` [PATCH RFC 7/7] x86/microcode/intel: Enable staging when available Chang S. Bae
@ 2024-12-11  1:42 ` Chang S. Bae
  2024-12-11  1:42   ` [PATCH 1/6] x86/microcode: Introduce staging option to reduce late-loading latency Chang S. Bae
                     ` (6 more replies)
  7 siblings, 7 replies; 44+ messages in thread
From: Chang S. Bae @ 2024-12-11  1:42 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen, chang.seok.bae

Hi all,

Changes since the RFC posting [1]:

* Simplified the staging address discovery code. Leveraging the staging
  topology, stage only if package id changes (Thomas).

* Cleaned up the MSR read logic (Boris and Dave).

* Renamed functions to align with the do_something naming convention
  (Boris).

* Polished staging result messages (Boris).

* Dropped the WBINVD removal as mainlined now.

This series is based on 6.13-rc2. You can also find it from this repo:
    git://github.com/intel-staging/microcode.git staging_v1

I would appreciate further reviews and feedback.

Thanks,
Chang

---

Here is the original cover letter with minor updates -- removing the
WBINVD story and updating the function names:

== Latency Spike Issue ==

As microcode images have increased in size, a corresponding rise in load
latency has become inevitable. This latency spike significantly impacts
late loading, which remains in use despite the cautions highlighted in
the documentation [2]. The issue is especially critical for continuously
running workloads and virtual machines, where excessive delays can lead
to timeouts.

== Staging for Latency Reduction ==

Currently, writing to MSR_IA32_UCODE_WRITE triggers the entire update
process -- loading, validating, and activation -- all of which contribute
to the latency during CPU halt. The staging feature mitigates this by
refactoring all but the activation step out of the critical path,
allowing CPUs to continue serving workloads while staging takes place.

== Validation ==

We internally established pseudocode to clearly define all essential
steps for interacting with the firmware. Any firmware implementation
supporting staging should adhere to this contract. This patch set
incorporates that staging logic, which I successfully tested on one
firmware implementation. Multiple teams at Intel have also validated the
feature across different implementations.

Preliminary results from a pre-production system show a significant
reduction in latency (about 40%) with the staging approach alone.
Further improvements are possible with additional optimizations [*].

== Call for Review ==

Here are several key points to highlight for feedback:

  1. Staging Integration Approach:

     In the core code, the high-level sequence for late loading is:

     (1) request_microcode_fw(), and
     (2) load_late_stop_cpus()->apply_microcode()

     Staging doesn't fit neatly into either steps, as it involves the
     loading process but not the activation. Therefore, a new callback is
     introduced:

       core::load_late_locked()
       -> intel::staging_microcode()
          -> intel_staging::do_stage()

  2. Code Abstraction:

     The newly added intel_staging.c file contains all staging-related
     code to keep it self-contained. Ideally, the entire firmware
     interaction could eventually be abstracted into a single MSR write,
     which remains a long-term goal. Fortunately, recent protocol
     simplifications have made this more feasible.

  3. Staging Policy (TODO):

     While staging is always attempted, the system will fall back to the
     legacy update method if staging fails. There is an open question
     regarding staging policy: should it be mandatory, without fallback,
     in certain usage scenarios? This could lead further refinements in
     the flow depending on feedback and use cases.

  4. Specification Updates

     Recent specification updates have simplified the staging protocol
     and clarified the behavior of MSR_IA32_UCODE_WRITE in conjunction
     with staging:

     4.1. Protocol Simplification

     The specification update [3] has significantly reduced the
     complexity of staging code, trimming the kernel code from ~1K lines
     in preliminary implementations. Thanks to Dave for guiding this
     redesign effort.

     4.2. Clarification of Legacy Update Behavior

     Chapter 5 of the specification adds further clarification on
     MSR_IA32_UCODE_WRITE. Key points are summarized below:

     (a) When staging is not performed or failed, a WRMSR will still load
     the patch image, but with higher latency.

     (b) During an active staging process, MSR_IA32_UCODE_WRITE can
     load a new microcode image, again with higher latency.

     (c) If the versions differ between the staged microcode and the
     version loaded via MSR_IA32_UCODE_WRITE, the version loaded through
     the MSR takes precedence.

     I'd also make sure there is no further ambiguity in this documentation
     [3]. Feel free to provide feedback if anything seems unclear or
     unreasonable.

As noted [*], an additional series focused on further latency
optimizations will follow. However, the staging approach was prioritized
due to its significant first-order impact on latency.

[1]: https://lore.kernel.org/all/20241001161042.465584-1-chang.seok.bae@intel.com/
[2]: https://docs.kernel.org/arch/x86/microcode.html#why-is-late-loading-dangerous
[3]: https://cdrdv2.intel.com/v1/dl/getContent/782715
[*]: Further latency improvements will be addressed in the upcoming
     ‘Uniform’ feature series.

Chang S. Bae (6):
  x86/microcode: Introduce staging option to reduce late-loading latency
  x86/msr-index: Define MSR index and bit for the microcode staging
    feature
  x86/microcode/intel: Prepare for microcode staging
  x86/microcode/intel_staging: Implement staging logic
  x86/microcode/intel_staging: Support mailbox data transfer
  x86/microcode/intel: Enable staging when available

 arch/x86/include/asm/msr-index.h              |   9 ++
 arch/x86/kernel/cpu/microcode/Makefile        |   2 +-
 arch/x86/kernel/cpu/microcode/core.c          |  12 +-
 arch/x86/kernel/cpu/microcode/intel.c         |  53 +++++++
 arch/x86/kernel/cpu/microcode/intel_staging.c | 149 ++++++++++++++++++
 arch/x86/kernel/cpu/microcode/internal.h      |   7 +-
 6 files changed, 228 insertions(+), 4 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/microcode/intel_staging.c

-- 
2.45.2

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 1/6] x86/microcode: Introduce staging option to reduce late-loading latency
  2024-12-11  1:42 ` [PATCH 0/6] x86/microcode: Support for Intel Staging Feature Chang S. Bae
@ 2024-12-11  1:42   ` Chang S. Bae
  2025-02-17 13:33     ` Borislav Petkov
  2025-02-18 15:16     ` Dave Hansen
  2024-12-11  1:42   ` [PATCH 2/6] x86/msr-index: Define MSR index and bit for the microcode staging feature Chang S. Bae
                     ` (5 subsequent siblings)
  6 siblings, 2 replies; 44+ messages in thread
From: Chang S. Bae @ 2024-12-11  1:42 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen, chang.seok.bae

As microcode patch sizes continue to grow, the latency during
late-loading can spike, leading to timeouts and interruptions in running
workloads. This trend of increasing patch sizes is expected to continue,
so a foundational solution is needed to address the issue.

To mitigate the problem, a new staging feature is introduced. This option
processes most of the microcode update (excluding activation) on a
non-critical path, allowing CPUs to remain operational during the
majority of the update. By moving most of the work off the critical path,
the latency spike can be significantly reduced.

Integrate the staging process as an additional step in the late-loading
flow. Introduce a new callback for staging, which is invoked after the
microcode patch image is prepared but before entering the CPU rendezvous
for triggering the update.

Staging follows an opportunistic model: it is attempted when available.
If successful, it reduces CPU rendezvous time; if not, the process falls
back to the legacy loading, potentially exposing the system to higher
latency.

Extend struct microcode_ops to incorporate staging properties, which will
be updated in the vendor code from subsequent patches.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
RFC-V1 -> V1: Rename the function name to the do_something() style (Boris).

Note.

Whether staging should be mandatory is a policy decision that is beyond
the scope of this patch at the moment. For now, the focus is on
establishing a basic flow, with the intention of attracting focused
reviews, while deferring the discussion on staging policy later.

In terms of the flow, an alternative approach could be to integrate
staging as part of microcode preparation on the vendor code side.
However, this was deemed too implicit, as staging involves loading and
validating the microcode image, which differs from typical microcode file
handling.
---
 arch/x86/kernel/cpu/microcode/core.c     | 12 ++++++++++--
 arch/x86/kernel/cpu/microcode/internal.h |  4 +++-
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index b3658d11e7b6..0967fd15be6e 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -676,19 +676,27 @@ static bool setup_cpus(void)

 static int load_late_locked(void)
 {
+	bool is_safe = false;
+
 	if (!setup_cpus())
 		return -EBUSY;

 	switch (microcode_ops->request_microcode_fw(0, &microcode_pdev->dev)) {
 	case UCODE_NEW:
-		return load_late_stop_cpus(false);
+		break;
 	case UCODE_NEW_SAFE:
-		return load_late_stop_cpus(true);
+		is_safe = true;
+		break;
 	case UCODE_NFOUND:
 		return -ENOENT;
 	default:
 		return -EBADFD;
 	}
+
+	if (microcode_ops->use_staging)
+		microcode_ops->stage_microcode();
+
+	return load_late_stop_cpus(is_safe);
 }

 static ssize_t reload_store(struct device *dev,
diff --git a/arch/x86/kernel/cpu/microcode/internal.h b/arch/x86/kernel/cpu/microcode/internal.h
index 21776c529fa9..b27cb8e1228d 100644
--- a/arch/x86/kernel/cpu/microcode/internal.h
+++ b/arch/x86/kernel/cpu/microcode/internal.h
@@ -31,10 +31,12 @@ struct microcode_ops {
 	 * See also the "Synchronization" section in microcode_core.c.
 	 */
 	enum ucode_state	(*apply_microcode)(int cpu);
+	void			(*stage_microcode)(void);
 	int			(*collect_cpu_info)(int cpu, struct cpu_signature *csig);
 	void			(*finalize_late_load)(int result);
 	unsigned int		nmi_safe	: 1,
-				use_nmi		: 1;
+				use_nmi		: 1,
+				use_staging	: 1;
 };

 struct early_load_data {
-- 
2.45.2

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/6] x86/microcode: Introduce staging option to reduce late-loading latency
  2024-12-11  1:42   ` [PATCH 1/6] x86/microcode: Introduce staging option to reduce late-loading latency Chang S. Bae
@ 2025-02-17 13:33     ` Borislav Petkov
  2025-02-18  7:51       ` Chang S. Bae
  2025-02-18 15:16     ` Dave Hansen
  1 sibling, 1 reply; 44+ messages in thread
From: Borislav Petkov @ 2025-02-17 13:33 UTC (permalink / raw)
  To: Chang S. Bae; +Cc: linux-kernel, x86, tglx, mingo, dave.hansen

On Tue, Dec 10, 2024 at 05:42:07PM -0800, Chang S. Bae wrote:
>  static int load_late_locked(void)
>  {
> +	bool is_safe = false;
> +
>  	if (!setup_cpus())
>  		return -EBUSY;
>  
>  	switch (microcode_ops->request_microcode_fw(0, &microcode_pdev->dev)) {
>  	case UCODE_NEW:
> -		return load_late_stop_cpus(false);
> +		break;
>  	case UCODE_NEW_SAFE:
> -		return load_late_stop_cpus(true);
> +		is_safe = true;
> +		break;
>  	case UCODE_NFOUND:
>  		return -ENOENT;
>  	default:
>  		return -EBADFD;
>  	}
> +
> +	if (microcode_ops->use_staging)
> +		microcode_ops->stage_microcode();
> +
> +	return load_late_stop_cpus(is_safe);
>  }

Why are you even touching this function instead of doing the staging in the
beginning of load_late_stop_cpus()?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/6] x86/microcode: Introduce staging option to reduce late-loading latency
  2025-02-17 13:33     ` Borislav Petkov
@ 2025-02-18  7:51       ` Chang S. Bae
  2025-02-18 11:36         ` Borislav Petkov
  0 siblings, 1 reply; 44+ messages in thread
From: Chang S. Bae @ 2025-02-18  7:51 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-kernel, x86, tglx, mingo, dave.hansen

On 2/17/2025 5:33 AM, Borislav Petkov wrote:
> 
> Why are you even touching this function instead of doing the staging in the
> beginning of load_late_stop_cpus()?

I thought staging is logically distinguishable. While 
load_late_stop_cpus() currently performs loading when CPUs are 
_stopped_, staging occurs on a non-critical path and remains 
interruptible. So, the function name itself seems misaligned with the 
staging process.

It looks like commit 4b753955e915 ("x86/microcode: Add per CPU result 
state") renamed microcode_reload_late() to the current name:

   -/*
   - * Reload microcode late on all CPUs. Wait for a sec until they
   - * all gather together.
   - */
   -static int microcode_reload_late(void)
   +static int load_late_stop_cpus(void)

which primarily narrowed the function’s scope to microcode rendezvous 
for late loading.


Given them all, maybe another option is to introduce a wrapper, instead 
of modifying load_late_locked() directly, like:

@@ -536,11 +536,6 @@ static int load_late_stop_cpus(bool is_safe)
         int old_rev = boot_cpu_data.microcode;
         struct cpuinfo_x86 prev_info;

-       if (!is_safe) {
-               pr_err("Late microcode loading without minimal revision 
check.\n");
-               pr_err("You should switch to early loading, if 
possible.\n");
-       }
-
         atomic_set(&late_cpus_in, num_online_cpus());
         atomic_set(&offline_in_nmi, 0);
         loops_per_usec = loops_per_jiffy / (TICK_NSEC / 1000);
@@ -674,6 +669,20 @@ static bool setup_cpus(void)
         return true;
  }

+static int load_late_apply(bool is_safe)
+{
+       if (!is_safe) {
+               pr_err("Late microcode loading without minimal revision 
check.\n");
+               pr_err("You should switch to early loading, if 
possible.\n");
+       }
+
+       /* Stage microcode without stopping CPUs */
+       if (microcode_ops->use_staging)
+               microcode_ops->stage_microcode();
+
+       return load_late_stop_cpus(is_safe);
+}
+
  static int load_late_locked(void)
  {
         if (!setup_cpus())
@@ -681,9 +690,9 @@ static int load_late_locked(void)

         switch (microcode_ops->request_microcode_fw(0, 
&microcode_pdev->dev)) {
         case UCODE_NEW:
-               return load_late_stop_cpus(false);
+               return load_late_apply(false);
         case UCODE_NEW_SAFE:
-               return load_late_stop_cpus(true);
+               return load_late_apply(true);
         case UCODE_NFOUND:
                 return -ENOENT;
         default:

Thanks,
Chang

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/6] x86/microcode: Introduce staging option to reduce late-loading latency
  2025-02-18  7:51       ` Chang S. Bae
@ 2025-02-18 11:36         ` Borislav Petkov
  0 siblings, 0 replies; 44+ messages in thread
From: Borislav Petkov @ 2025-02-18 11:36 UTC (permalink / raw)
  To: Chang S. Bae; +Cc: linux-kernel, x86, tglx, mingo, dave.hansen

On Mon, Feb 17, 2025 at 11:51:28PM -0800, Chang S. Bae wrote:
> On 2/17/2025 5:33 AM, Borislav Petkov wrote:
> > 
> > Why are you even touching this function instead of doing the staging in the
> > beginning of load_late_stop_cpus()?
> 
> I thought staging is logically distinguishable.

What does that even mean?

> While load_late_stop_cpus() currently performs loading when CPUs are
> _stopped_,

Not entirely - it does preparatory work and then stops the CPUs. Staging could
be part of that prep work.

> staging occurs on a non-critical path and remains interruptible.

So if you really wanna do that and be really "free", then you should do it
outside of load_late_locked() because that runs with the hotplug lock taken.

But then the only thing that matters is, when *exactly* you should stage. If
->request_microcode_fw() fails, staging would be unnecessary work.

So instead of trying to too hard, just stick the staging at the beginning of
load_late_stop_cpus() and be done with it.

Also, if you want to send a patch, don't send it from a mail client which will
mangle it so that it is inapplicable and no one can play with it.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/6] x86/microcode: Introduce staging option to reduce late-loading latency
  2024-12-11  1:42   ` [PATCH 1/6] x86/microcode: Introduce staging option to reduce late-loading latency Chang S. Bae
  2025-02-17 13:33     ` Borislav Petkov
@ 2025-02-18 15:16     ` Dave Hansen
  1 sibling, 0 replies; 44+ messages in thread
From: Dave Hansen @ 2025-02-18 15:16 UTC (permalink / raw)
  To: Chang S. Bae, linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen

On 12/10/24 17:42, Chang S. Bae wrote:
> +	if (microcode_ops->use_staging)
> +		microcode_ops->stage_microcode();

I don't know if any non-Intel vendors will implement one of these, but
could we please comment this a _bit_?

Somebody is going to come along at some point and ask themselves whether
they should add a new staging handler or dump some new code in an
existing one. The key aspects of the existing staging handler are:

	1. Helps the future actual microcode load in some way
	2. Has little impact on the rest of the system
	3. Can succeed or fail without affecting functionality

Did I miss any?

Could we add a comment here, or maybe even at the Intel
stage_microcode() function explaining this intent?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 2/6] x86/msr-index: Define MSR index and bit for the microcode staging feature
  2024-12-11  1:42 ` [PATCH 0/6] x86/microcode: Support for Intel Staging Feature Chang S. Bae
  2024-12-11  1:42   ` [PATCH 1/6] x86/microcode: Introduce staging option to reduce late-loading latency Chang S. Bae
@ 2024-12-11  1:42   ` Chang S. Bae
  2025-02-26 17:19     ` Borislav Petkov
  2024-12-11  1:42   ` [PATCH 3/6] x86/microcode/intel: Prepare for microcode staging Chang S. Bae
                     ` (4 subsequent siblings)
  6 siblings, 1 reply; 44+ messages in thread
From: Chang S. Bae @ 2024-12-11  1:42 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen, chang.seok.bae

The microcode staging feature involves two key MSR entities, the
presence of which is indicated by bit 16 of IA32_ARCH_CAPABILITIES:

* Bit 4 in IA32_MCU_ENUMERATION shows the availability of the microcode
  staging feature.

* Staging is managed through MMIO registers, with
  IA32_MCU_STAGING_MBOX_ADDR MSR specifying the physical address of the
  first MMIO register.

Define the MSR index and bit assignments, helping the upcoming staging
code to make use of the hardware feature.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
 arch/x86/include/asm/msr-index.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 3ae84c3b8e6d..2840a2fe340b 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -164,6 +164,10 @@
 						 * Processor MMIO stale data
 						 * vulnerabilities.
 						 */
+#define ARCH_CAP_MCU_ENUM		BIT(16) /*
+						 * Indicates the presence of microcode update
+						 * feature enumeration and status information
+						 */
 #define ARCH_CAP_FB_CLEAR		BIT(17)	/*
 						 * VERW clears CPU fill buffer
 						 * even on MDS_NO CPUs.
@@ -884,6 +888,11 @@
 #define MSR_IA32_UCODE_WRITE		0x00000079
 #define MSR_IA32_UCODE_REV		0x0000008b
 
+#define MSR_IA32_MCU_ENUMERATION	0x0000007b
+#define MCU_STAGING			BIT(4)
+
+#define MSR_IA32_MCU_STAGING_MBOX_ADDR	0x000007a5
+
 /* Intel SGX Launch Enclave Public Key Hash MSRs */
 #define MSR_IA32_SGXLEPUBKEYHASH0	0x0000008C
 #define MSR_IA32_SGXLEPUBKEYHASH1	0x0000008D
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/6] x86/msr-index: Define MSR index and bit for the microcode staging feature
  2024-12-11  1:42   ` [PATCH 2/6] x86/msr-index: Define MSR index and bit for the microcode staging feature Chang S. Bae
@ 2025-02-26 17:19     ` Borislav Petkov
  0 siblings, 0 replies; 44+ messages in thread
From: Borislav Petkov @ 2025-02-26 17:19 UTC (permalink / raw)
  To: Chang S. Bae; +Cc: linux-kernel, x86, tglx, mingo, dave.hansen

On Tue, Dec 10, 2024 at 05:42:08PM -0800, Chang S. Bae wrote:
> The microcode staging feature involves two key MSR entities, the
> presence of which is indicated by bit 16 of IA32_ARCH_CAPABILITIES:
> 
> * Bit 4 in IA32_MCU_ENUMERATION shows the availability of the microcode
>   staging feature.
> 
> * Staging is managed through MMIO registers, with
>   IA32_MCU_STAGING_MBOX_ADDR MSR specifying the physical address of the
>   first MMIO register.
> 
> Define the MSR index and bit assignments, helping the upcoming staging
> code to make use of the hardware feature.
> 
> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> ---
>  arch/x86/include/asm/msr-index.h | 9 +++++++++
>  1 file changed, 9 insertions(+)

Merge this patch with the patch where those MSR bits are used - no need for
a separate one.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 3/6] x86/microcode/intel: Prepare for microcode staging
  2024-12-11  1:42 ` [PATCH 0/6] x86/microcode: Support for Intel Staging Feature Chang S. Bae
  2024-12-11  1:42   ` [PATCH 1/6] x86/microcode: Introduce staging option to reduce late-loading latency Chang S. Bae
  2024-12-11  1:42   ` [PATCH 2/6] x86/msr-index: Define MSR index and bit for the microcode staging feature Chang S. Bae
@ 2024-12-11  1:42   ` Chang S. Bae
  2025-02-26 17:52     ` Borislav Petkov
  2024-12-11  1:42   ` [PATCH 4/6] x86/microcode/intel_staging: Implement staging logic Chang S. Bae
                     ` (3 subsequent siblings)
  6 siblings, 1 reply; 44+ messages in thread
From: Chang S. Bae @ 2024-12-11  1:42 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen, chang.seok.bae

When microcode staging is initiated, operations are carried out through
an MMIO interface. Each package has a unique interface specified by the
IA32_MCU_STAGING_MBOX_ADDR MSR, which points to a set of dword-sized
registers.

Prepare staging with the following steps: First, ensure the microcode
image is dword-aligned to correspond with MMIO registers. Next,
identify each MMIO interface based on its per-package scope. Then,
invoke the staging function for each identified interface.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
RFC-V1 -> V1:
* Simplify code by leveraging the architectural per-package staging scope
  (Thomas).
* Fix MSR read code (Boris and Dave).
* Rename the staging function: staging_work() -> do_stage() (Boris).
* Polish the result messages (Boris).
* Add a prototype for builds without CONFIG_CPU_SUP_INTEL (Boris).
* Massage the changelog.

Note:

1. Using a direct reference to 'cpu_primary_thread_mask' in
   for_each_cpu(...) causes a build error when !CONFIG_SMP. Instead, use
   the wrapper function topology_is_primary_thread() to avoid it.

2. Ideally, the do_stage() function would be as simple as a single WRMSR
   execution. If this were the case, the staging flow could be completed
   with this patch. From this perspective, the software handling for
   interacting with the staging firmware has been separated from this
   vendor code and moved into a new file dedicated to staging logic.
---
 arch/x86/kernel/cpu/microcode/intel.c    | 36 ++++++++++++++++++++++++
 arch/x86/kernel/cpu/microcode/internal.h |  7 +++++
 2 files changed, 43 insertions(+)

diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index f3d534807d91..325068bb5524 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -299,6 +299,41 @@ static __init struct microcode_intel *scan_microcode(void *data, size_t size,
 	return size ? NULL : patch;
 }
 
+static void stage_microcode(void)
+{
+	unsigned int totalsize, pkg_id = UINT_MAX;
+	enum ucode_state state;
+	int cpu;
+	u64 pa;
+
+	totalsize = get_totalsize(&ucode_patch_late->hdr);
+	if (!IS_ALIGNED(totalsize, sizeof(u32)))
+		return;
+
+	/*
+	 * The MMIO address is unique per package, and all the SMT
+	 * primary threads are online here. Find each MMIO space by
+	 * their package ids to avoid duplicate staging.
+	 */
+	for_each_cpu(cpu, cpu_online_mask) {
+		if (!topology_is_primary_thread(cpu) || topology_logical_package_id(cpu) == pkg_id)
+			continue;
+		pkg_id = topology_logical_package_id(cpu);
+
+		rdmsrl_on_cpu(cpu, MSR_IA32_MCU_STAGING_MBOX_ADDR, &pa);
+
+		state = do_stage(pa, ucode_patch_late, totalsize);
+		if (state != UCODE_OK) {
+			pr_err("Error: staging failed with %s for CPU%d at package %u.\n",
+			       state == UCODE_TIMEOUT ? "timeout" : "error state", cpu, pkg_id);
+			return;
+		}
+	}
+
+	pr_info("Staging of patch revision 0x%x succeeded.\n",
+		((struct microcode_header_intel *)ucode_patch_late)->rev);
+}
+
 static enum ucode_state __apply_microcode(struct ucode_cpu_info *uci,
 					  struct microcode_intel *mc,
 					  u32 *cur_rev)
@@ -627,6 +662,7 @@ static struct microcode_ops microcode_intel_ops = {
 	.collect_cpu_info	= collect_cpu_info,
 	.apply_microcode	= apply_microcode_late,
 	.finalize_late_load	= finalize_late_load,
+	.stage_microcode	= stage_microcode,
 	.use_nmi		= IS_ENABLED(CONFIG_X86_64),
 };
 
diff --git a/arch/x86/kernel/cpu/microcode/internal.h b/arch/x86/kernel/cpu/microcode/internal.h
index b27cb8e1228d..158429d80f93 100644
--- a/arch/x86/kernel/cpu/microcode/internal.h
+++ b/arch/x86/kernel/cpu/microcode/internal.h
@@ -120,11 +120,18 @@ void load_ucode_intel_bsp(struct early_load_data *ed);
 void load_ucode_intel_ap(void);
 void reload_ucode_intel(void);
 struct microcode_ops *init_intel_microcode(void);
+static inline enum ucode_state do_stage(u64 pa, void *ucode_ptr, unsigned int totalsize)
+{
+	pr_debug_once("Need to implement the staging code.\n");
+	return UCODE_ERROR;
+}
 #else /* CONFIG_CPU_SUP_INTEL */
 static inline void load_ucode_intel_bsp(struct early_load_data *ed) { }
 static inline void load_ucode_intel_ap(void) { }
 static inline void reload_ucode_intel(void) { }
 static inline struct microcode_ops *init_intel_microcode(void) { return NULL; }
+static inline enum ucode_state
+do_stage(u64 pa, void *ucode_ptr, unsigned int totalsize) { return UCODE_ERROR; }
 #endif  /* !CONFIG_CPU_SUP_INTEL */
 
 #endif /* _X86_MICROCODE_INTERNAL_H */
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/6] x86/microcode/intel: Prepare for microcode staging
  2024-12-11  1:42   ` [PATCH 3/6] x86/microcode/intel: Prepare for microcode staging Chang S. Bae
@ 2025-02-26 17:52     ` Borislav Petkov
  0 siblings, 0 replies; 44+ messages in thread
From: Borislav Petkov @ 2025-02-26 17:52 UTC (permalink / raw)
  To: Chang S. Bae; +Cc: linux-kernel, x86, tglx, mingo, dave.hansen

On Tue, Dec 10, 2024 at 05:42:09PM -0800, Chang S. Bae wrote:
> +static void stage_microcode(void)
> +{
> +	unsigned int totalsize, pkg_id = UINT_MAX;
> +	enum ucode_state state;
> +	int cpu;
> +	u64 pa;
> +
> +	totalsize = get_totalsize(&ucode_patch_late->hdr);
> +	if (!IS_ALIGNED(totalsize, sizeof(u32)))
> +		return;


	... lockdep assert hotplug lock held blabla...

> +
> +	/*
> +	 * The MMIO address is unique per package, and all the SMT
> +	 * primary threads are online here. Find each MMIO space by
> +	 * their package ids to avoid duplicate staging.
> +	 */
> +	for_each_cpu(cpu, cpu_online_mask) {
> +		if (!topology_is_primary_thread(cpu) || topology_logical_package_id(cpu) == pkg_id)
> +			continue;

		if (!topology_is_primary_thread(cpu) ||
		     topology_logical_package_id(cpu) == pkg_id)


> +		pkg_id = topology_logical_package_id(cpu);
> +
> +		rdmsrl_on_cpu(cpu, MSR_IA32_MCU_STAGING_MBOX_ADDR, &pa);
> +
> +		state = do_stage(pa, ucode_patch_late, totalsize);
> +		if (state != UCODE_OK) {
> +			pr_err("Error: staging failed with %s for CPU%d at package %u.\n",
> +			       state == UCODE_TIMEOUT ? "timeout" : "error state", cpu, pkg_id);
> +			return;
> +		}
> +	}
> +
> +	pr_info("Staging of patch revision 0x%x succeeded.\n",
> +		((struct microcode_header_intel *)ucode_patch_late)->rev);
> +}
> +
>  static enum ucode_state __apply_microcode(struct ucode_cpu_info *uci,
>  					  struct microcode_intel *mc,
>  					  u32 *cur_rev)
> @@ -627,6 +662,7 @@ static struct microcode_ops microcode_intel_ops = {
>  	.collect_cpu_info	= collect_cpu_info,
>  	.apply_microcode	= apply_microcode_late,
>  	.finalize_late_load	= finalize_late_load,
> +	.stage_microcode	= stage_microcode,

Btw, you can get rid of ->use_staging and simply check whether
->stage_microcode is not NULL.

>  	.use_nmi		= IS_ENABLED(CONFIG_X86_64),
>  };
>  
> diff --git a/arch/x86/kernel/cpu/microcode/internal.h b/arch/x86/kernel/cpu/microcode/internal.h
> index b27cb8e1228d..158429d80f93 100644
> --- a/arch/x86/kernel/cpu/microcode/internal.h
> +++ b/arch/x86/kernel/cpu/microcode/internal.h
> @@ -120,11 +120,18 @@ void load_ucode_intel_bsp(struct early_load_data *ed);
>  void load_ucode_intel_ap(void);
>  void reload_ucode_intel(void);
>  struct microcode_ops *init_intel_microcode(void);
> +static inline enum ucode_state do_stage(u64 pa, void *ucode_ptr, unsigned int totalsize)
> +{
> +	pr_debug_once("Need to implement the staging code.\n");
> +	return UCODE_ERROR;
> +}
>  #else /* CONFIG_CPU_SUP_INTEL */
>  static inline void load_ucode_intel_bsp(struct early_load_data *ed) { }
>  static inline void load_ucode_intel_ap(void) { }
>  static inline void reload_ucode_intel(void) { }
>  static inline struct microcode_ops *init_intel_microcode(void) { return NULL; }
> +static inline enum ucode_state
> +do_stage(u64 pa, void *ucode_ptr, unsigned int totalsize) { return UCODE_ERROR; }
>  #endif  /* !CONFIG_CPU_SUP_INTEL */
>
You don't need to export those if staging is going to be done only by the
Intel side. Just keep everything in intel.c

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 4/6] x86/microcode/intel_staging: Implement staging logic
  2024-12-11  1:42 ` [PATCH 0/6] x86/microcode: Support for Intel Staging Feature Chang S. Bae
                     ` (2 preceding siblings ...)
  2024-12-11  1:42   ` [PATCH 3/6] x86/microcode/intel: Prepare for microcode staging Chang S. Bae
@ 2024-12-11  1:42   ` Chang S. Bae
  2025-02-18 20:16     ` Dave Hansen
  2025-02-26 17:56     ` Borislav Petkov
  2024-12-11  1:42   ` [PATCH 5/6] x86/microcode/intel_staging: Support mailbox data transfer Chang S. Bae
                     ` (2 subsequent siblings)
  6 siblings, 2 replies; 44+ messages in thread
From: Chang S. Bae @ 2024-12-11  1:42 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen, chang.seok.bae

The staging firmware operates through a protocol via the MMIO interface.
The protocol defines a serialized sequence that begins by clearing the
hardware with an abort request. It then proceeds through iterative
process of sending data, initiating transactions, waiting for processing,
and reading responses.

To facilitate this interaction, follow the outlined protocol. Refactor
the waiting code to manage loop breaks more effectively. Data transfer
involves a next level of detail to handle the mailbox format. While
defining helpers, leave them empty for now.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
RFC-V1 -> V1: Rename the function name and change the return type.
---
 arch/x86/kernel/cpu/microcode/Makefile        |   2 +-
 arch/x86/kernel/cpu/microcode/intel_staging.c | 100 ++++++++++++++++++
 arch/x86/kernel/cpu/microcode/internal.h      |   6 +-
 3 files changed, 102 insertions(+), 6 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/microcode/intel_staging.c

diff --git a/arch/x86/kernel/cpu/microcode/Makefile b/arch/x86/kernel/cpu/microcode/Makefile
index 193d98b33a0a..a9f79aaffcb0 100644
--- a/arch/x86/kernel/cpu/microcode/Makefile
+++ b/arch/x86/kernel/cpu/microcode/Makefile
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
 microcode-y				:= core.o
 obj-$(CONFIG_MICROCODE)			+= microcode.o
-microcode-$(CONFIG_CPU_SUP_INTEL)	+= intel.o
+microcode-$(CONFIG_CPU_SUP_INTEL)	+= intel.o intel_staging.o
 microcode-$(CONFIG_CPU_SUP_AMD)		+= amd.o
diff --git a/arch/x86/kernel/cpu/microcode/intel_staging.c b/arch/x86/kernel/cpu/microcode/intel_staging.c
new file mode 100644
index 000000000000..2fc8667cab45
--- /dev/null
+++ b/arch/x86/kernel/cpu/microcode/intel_staging.c
@@ -0,0 +1,100 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#define pr_fmt(fmt) "microcode: " fmt
+#include <linux/delay.h>
+#include <linux/io.h>
+
+#include "internal.h"
+
+#define MBOX_REG_NUM		4
+#define MBOX_REG_SIZE		sizeof(u32)
+
+#define MBOX_CONTROL_OFFSET	0x0
+#define MBOX_STATUS_OFFSET	0x4
+
+#define MASK_MBOX_CTRL_ABORT	BIT(0)
+
+#define MASK_MBOX_STATUS_ERROR	BIT(2)
+#define MASK_MBOX_STATUS_READY	BIT(31)
+
+#define MBOX_XACTION_LEN	PAGE_SIZE
+#define MBOX_XACTION_MAX(imgsz)	((imgsz) * 2)
+#define MBOX_XACTION_TIMEOUT	(10 * MSEC_PER_SEC)
+
+#define STAGING_OFFSET_END	0xffffffff
+
+static inline void abort_xaction(void __iomem *base)
+{
+	writel(MASK_MBOX_CTRL_ABORT, base + MBOX_CONTROL_OFFSET);
+}
+
+static void request_xaction(void __iomem *base, u32 *chunk, unsigned int chunksize)
+{
+	pr_debug_once("Need to implement staging mailbox loading code.\n");
+}
+
+static enum ucode_state wait_for_xaction(void __iomem *base)
+{
+	u32 timeout, status;
+
+	for (timeout = 0; timeout < MBOX_XACTION_TIMEOUT; timeout++) {
+		msleep(1);
+		status = readl(base + MBOX_STATUS_OFFSET);
+		if (status & MASK_MBOX_STATUS_READY)
+			break;
+	}
+
+	status = readl(base + MBOX_STATUS_OFFSET);
+	if (status & MASK_MBOX_STATUS_ERROR)
+		return UCODE_ERROR;
+	if (!(status & MASK_MBOX_STATUS_READY))
+		return UCODE_TIMEOUT;
+
+	return UCODE_OK;
+}
+
+static enum ucode_state read_xaction_response(void __iomem *base, unsigned int *offset)
+{
+	pr_debug_once("Need to implement staging response handler.\n");
+	return UCODE_ERROR;
+}
+
+static inline unsigned int get_chunksize(unsigned int totalsize, unsigned int offset)
+{
+	WARN_ON_ONCE(totalsize < offset);
+	return min(MBOX_XACTION_LEN, totalsize - offset);
+}
+
+enum ucode_state do_stage(u64 pa, void *ucode_ptr, unsigned int totalsize)
+{
+	unsigned int xaction_bytes = 0, offset = 0, chunksize;
+	void __iomem *mmio_base;
+	enum ucode_state state;
+
+	mmio_base = ioremap(pa, MBOX_REG_NUM * MBOX_REG_SIZE);
+	if (WARN_ON_ONCE(!mmio_base))
+		return UCODE_ERROR;
+
+	abort_xaction(mmio_base);
+
+	while (offset != STAGING_OFFSET_END) {
+		chunksize = get_chunksize(totalsize, offset);
+		if (xaction_bytes + chunksize > MBOX_XACTION_MAX(totalsize)) {
+			state = UCODE_TIMEOUT;
+			break;
+		}
+
+		request_xaction(mmio_base, ucode_ptr + offset, chunksize);
+		state = wait_for_xaction(mmio_base);
+		if (state != UCODE_OK)
+			break;
+
+		xaction_bytes += chunksize;
+		state = read_xaction_response(mmio_base, &offset);
+		if (state != UCODE_OK)
+			break;
+	}
+
+	iounmap(mmio_base);
+	return state;
+}
diff --git a/arch/x86/kernel/cpu/microcode/internal.h b/arch/x86/kernel/cpu/microcode/internal.h
index 158429d80f93..787524e4ef1e 100644
--- a/arch/x86/kernel/cpu/microcode/internal.h
+++ b/arch/x86/kernel/cpu/microcode/internal.h
@@ -120,11 +120,7 @@ void load_ucode_intel_bsp(struct early_load_data *ed);
 void load_ucode_intel_ap(void);
 void reload_ucode_intel(void);
 struct microcode_ops *init_intel_microcode(void);
-static inline enum ucode_state do_stage(u64 pa, void *ucode_ptr, unsigned int totalsize)
-{
-	pr_debug_once("Need to implement the staging code.\n");
-	return UCODE_ERROR;
-}
+enum ucode_state do_stage(u64 pa, void *ucode_ptr, unsigned int totalsize);
 #else /* CONFIG_CPU_SUP_INTEL */
 static inline void load_ucode_intel_bsp(struct early_load_data *ed) { }
 static inline void load_ucode_intel_ap(void) { }
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 4/6] x86/microcode/intel_staging: Implement staging logic
  2024-12-11  1:42   ` [PATCH 4/6] x86/microcode/intel_staging: Implement staging logic Chang S. Bae
@ 2025-02-18 20:16     ` Dave Hansen
  2025-02-26 17:56     ` Borislav Petkov
  1 sibling, 0 replies; 44+ messages in thread
From: Dave Hansen @ 2025-02-18 20:16 UTC (permalink / raw)
  To: Chang S. Bae, linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen

Remember that the subject prefixes are logical areas of the code, not
filenames.

Bad:

	x86/microcode/intel_staging

Good:

	x86/microcode/intel


On 12/10/24 17:42, Chang S. Bae wrote:
> The staging firmware operates through a protocol via the MMIO interface.
> The protocol defines a serialized sequence that begins by clearing the
> hardware with an abort request. It then proceeds through iterative
> process of sending data, initiating transactions, waiting for processing,
> and reading responses.

I'm not sure how much value this adds. It's rehashing a few things that
are or should be blatantly obvious from the code.

For instance, mentioning that it's an "MMIO interface" is kinda obvious
from the ioremap() and variable named "mmio_base".

> To facilitate this interaction, follow the outlined protocol. Refactor
> the waiting code to manage loop breaks more effectively. Data transfer
> involves a next level of detail to handle the mailbox format. While
> defining helpers, leave them empty for now.

I'm not sure what's being refactored here.


> --- /dev/null
> +++ b/arch/x86/kernel/cpu/microcode/intel_staging.c
> @@ -0,0 +1,100 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +
> +#define pr_fmt(fmt) "microcode: " fmt
> +#include <linux/delay.h>
> +#include <linux/io.h>
> +
> +#include "internal.h"
> +
> +#define MBOX_REG_NUM		4
> +#define MBOX_REG_SIZE		sizeof(u32)
> +
> +#define MBOX_CONTROL_OFFSET	0x0
> +#define MBOX_STATUS_OFFSET	0x4
> +
> +#define MASK_MBOX_CTRL_ABORT	BIT(0)
> +
> +#define MASK_MBOX_STATUS_ERROR	BIT(2)
> +#define MASK_MBOX_STATUS_READY	BIT(31)
> +
> +#define MBOX_XACTION_LEN	PAGE_SIZE
> +#define MBOX_XACTION_MAX(imgsz)	((imgsz) * 2)

The MBOX_XACTION_MAX() concept needs to be explained _somewhere_. The
concept as I remember it is:

	Each image is broken up into pieces which are at most
	MBOX_XACTION_LEN in length. So a 10*MBOX_XACTION_LEN
	will need at least 10 actions. The hardware on the other side of
	the mbox has very few resources. It may not be able to cache the
	entire image and may ask for the same part of the image more
	than once. This means that it may take more than 10 actions to
	send a 10-piece image. Allow a 10-piece image to try 20 times to
	send the whole thing.

Can we comment that here or in the loop?

That's a rather verbose comment, but it is a kinda complicated thing. I
remember trying to talk the hardware guys out of this, but they were
rather insistent that it's required. But in the end it does make sense
to me that the (relatively) svelte device on the other end of this
mailbox might not have megabytes of spare storage and it's a relatively
simple thing to have the CPU send some data more than once.

> +#define MBOX_XACTION_TIMEOUT	(10 * MSEC_PER_SEC)

I find these a lot more self-explanatory when they gave units on them.

#define MBOX_XACTION_TIMEOUT_MS	(10 * MSEC_PER_SEC)

plus:

	for (timeout = 0; timeout < MBOX_XACTION_TIMEOUT_MS; timeout++)
		msleep(1);

makes it a bit more obvious why there's an msleep(1) if the timeout is
obviously in milliseconds in the first place.

> +#define STAGING_OFFSET_END	0xffffffff

Should this get an explicit type?

> +static inline void abort_xaction(void __iomem *base)
> +{
> +	writel(MASK_MBOX_CTRL_ABORT, base + MBOX_CONTROL_OFFSET);
> +}
> +
> +static void request_xaction(void __iomem *base, u32 *chunk, unsigned int chunksize)
> +{
> +	pr_debug_once("Need to implement staging mailbox loading code.\n");
> +}

Can we comment this a little more please?

/*
 * Callers should use this after a request_xaction() to handle
 * explicit errors or timeouts when the hardware does not respond.
 *
 * Returns UCODE_OK on success.
 */
> +static enum ucode_state wait_for_xaction(void __iomem *base)
> +{
> +	u32 timeout, status;
> +

	/* Give the hardware time to perform the action: */

> +	for (timeout = 0; timeout < MBOX_XACTION_TIMEOUT; timeout++) {
> +		msleep(1);
> +		status = readl(base + MBOX_STATUS_OFFSET);

		/* Break out early if the hardware is ready: */
> +		if (status & MASK_MBOX_STATUS_READY)
> +			break;
> +	}
> +
> +	status = readl(base + MBOX_STATUS_OFFSET);

	/* The hardware signaled an explicit error: */
> +	if (status & MASK_MBOX_STATUS_ERROR)
> +		return UCODE_ERROR;

	/*
	 * Hardware neither responded to the action nor
	 * signaled an error. It must be out to lunch.
	 */
> +	if (!(status & MASK_MBOX_STATUS_READY))
> +		return UCODE_TIMEOUT;
> +
> +	return UCODE_OK;
> +}
> +
> +static enum ucode_state read_xaction_response(void __iomem *base, unsigned int *offset)
> +{
> +	pr_debug_once("Need to implement staging response handler.\n");
> +	return UCODE_ERROR;
> +}
> +
> +static inline unsigned int get_chunksize(unsigned int totalsize, unsigned int offset)
> +{
> +	WARN_ON_ONCE(totalsize < offset);
> +	return min(MBOX_XACTION_LEN, totalsize - offset);
> +}

I honestly forgot what this is trying to do. Is it trying to break up a
"totalsize" action into pieces that are at most MBOX_XACTION_LEN bytes
in size? But it is also trying to ensure that if it has 10 bytes left
that it doesn't try to request MBOX_XACTION_LEN?

Also, "chunk_size", please. "chunksize" is a bit less readable.

> +enum ucode_state do_stage(u64 pa, void *ucode_ptr, unsigned int totalsize)

"totalsize" is the length of the data at 'ucode_ptr', right? Would the
connection be clearer if we had:

	void *ucode_ptr,
	int ucode_len_bytes);

? Or even "ucode_len"?

> +{

Could we rename "pa" to "mmio_pa", please? It makes it much more clear
that it's not the "pa" of ucode_ptr or something.

> +	unsigned int xaction_bytes = 0, offset = 0, chunksize;
> +	void __iomem *mmio_base;
> +	enum ucode_state state;
> +
> +	mmio_base = ioremap(pa, MBOX_REG_NUM * MBOX_REG_SIZE);
> +	if (WARN_ON_ONCE(!mmio_base))
> +		return UCODE_ERROR;
> +
> +	abort_xaction(mmio_base);

Why is this aborting first? Why doesn't it have to wait for the abort to
complete?

> +	while (offset != STAGING_OFFSET_END) {
> +		chunksize = get_chunksize(totalsize, offset);
> +		if (xaction_bytes + chunksize > MBOX_XACTION_MAX(totalsize)) {
> +			state = UCODE_TIMEOUT;
> +			break;
> +		}

So, "xaction_bytes" is the number of bytes that we tried to send. If it
exceeds MBOX_XACTION_MAX(), then too many retries occurred. That's a
timeout.

Right?

I don't think that's obvious from the code.

> +
> +		request_xaction(mmio_base, ucode_ptr + offset, chunksize);
> +		state = wait_for_xaction(mmio_base);
> +		if (state != UCODE_OK)
> +			break;
> +
> +		xaction_bytes += chunksize;
> +		state = read_xaction_response(mmio_base, &offset);
> +		if (state != UCODE_OK)
> +			break;
> +	}

So, a dumb me would look at this loop and expect it to look like this:

	while (need_moar_data) {
		if (too_many_retries())
			break;

		send_data();
	}

But it doesn't. There's a two-step process to send data, make sure the
device got the data, and then read a separate response.  It _seems_ like
it's double-checking the response.

Could we comment it something more like this to make that more clear?

	while (need_moar_data) {
		if (too_many_retries())
			break;


		/* Send the data: */
		ret = hw_send_data(offset);
		if (ret)
			break;

		/*
		 * Ask the hardware which piece of the image it needs 	
		 * next. The same piece may be sent more than once.
		 */
		ret = hw_get_next_offset(&offset);
		if (ret)
			break;
	}


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 4/6] x86/microcode/intel_staging: Implement staging logic
  2024-12-11  1:42   ` [PATCH 4/6] x86/microcode/intel_staging: Implement staging logic Chang S. Bae
  2025-02-18 20:16     ` Dave Hansen
@ 2025-02-26 17:56     ` Borislav Petkov
  1 sibling, 0 replies; 44+ messages in thread
From: Borislav Petkov @ 2025-02-26 17:56 UTC (permalink / raw)
  To: Chang S. Bae; +Cc: linux-kernel, x86, tglx, mingo, dave.hansen

On Tue, Dec 10, 2024 at 05:42:10PM -0800, Chang S. Bae wrote:
> diff --git a/arch/x86/kernel/cpu/microcode/Makefile b/arch/x86/kernel/cpu/microcode/Makefile
> index 193d98b33a0a..a9f79aaffcb0 100644
> --- a/arch/x86/kernel/cpu/microcode/Makefile
> +++ b/arch/x86/kernel/cpu/microcode/Makefile
> @@ -1,5 +1,5 @@
>  # SPDX-License-Identifier: GPL-2.0-only
>  microcode-y				:= core.o
>  obj-$(CONFIG_MICROCODE)			+= microcode.o
> -microcode-$(CONFIG_CPU_SUP_INTEL)	+= intel.o
> +microcode-$(CONFIG_CPU_SUP_INTEL)	+= intel.o intel_staging.o
						   ^^^^^^^^^^^^^^^

No need for that guy - just stick everything in intel.c

> +enum ucode_state do_stage(u64 pa, void *ucode_ptr, unsigned int totalsize)

static

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 5/6] x86/microcode/intel_staging: Support mailbox data transfer
  2024-12-11  1:42 ` [PATCH 0/6] x86/microcode: Support for Intel Staging Feature Chang S. Bae
                     ` (3 preceding siblings ...)
  2024-12-11  1:42   ` [PATCH 4/6] x86/microcode/intel_staging: Implement staging logic Chang S. Bae
@ 2024-12-11  1:42   ` Chang S. Bae
  2025-02-18 20:54     ` Dave Hansen
  2024-12-11  1:42   ` [PATCH 6/6] x86/microcode/intel: Enable staging when available Chang S. Bae
  2025-02-07 18:37   ` [PATCH 0/6] x86/microcode: Support for Intel Staging Feature Chang S. Bae
  6 siblings, 1 reply; 44+ messages in thread
From: Chang S. Bae @ 2024-12-11  1:42 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen, chang.seok.bae

The staging architecture features a narrowed interface for data transfer.
Instead of allocating MMIO space based on data chunk size, it utilizes
two data registers: one for reading and one for writing, enforcing the
serialization of read and write operations. Additionally, it defines a
mailbox data format.

To facilitate data transfer, implement helper functions in line with this
specified format for reading and writing staging data. This mailbox
format is a customized version and is not compatible with the existing
mailbox code, so reuse is not feasible.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
 arch/x86/kernel/cpu/microcode/intel_staging.c | 55 ++++++++++++++++++-
 1 file changed, 52 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/intel_staging.c b/arch/x86/kernel/cpu/microcode/intel_staging.c
index 2fc8667cab45..eab6e891db9c 100644
--- a/arch/x86/kernel/cpu/microcode/intel_staging.c
+++ b/arch/x86/kernel/cpu/microcode/intel_staging.c
@@ -3,6 +3,7 @@
 #define pr_fmt(fmt) "microcode: " fmt
 #include <linux/delay.h>
 #include <linux/io.h>
+#include <linux/pci_ids.h>
 
 #include "internal.h"
 
@@ -11,17 +12,44 @@
 
 #define MBOX_CONTROL_OFFSET	0x0
 #define MBOX_STATUS_OFFSET	0x4
+#define MBOX_WRDATA_OFFSET	0x8
+#define MBOX_RDDATA_OFFSET	0xc
 
 #define MASK_MBOX_CTRL_ABORT	BIT(0)
+#define MASK_MBOX_CTRL_GO	BIT(31)
 
 #define MASK_MBOX_STATUS_ERROR	BIT(2)
 #define MASK_MBOX_STATUS_READY	BIT(31)
 
+#define MASK_MBOX_RESP_SUCCESS	BIT(0)
+#define MASK_MBOX_RESP_PROGRESS	BIT(1)
+#define MASK_MBOX_RESP_ERROR	BIT(2)
+
+#define MBOX_CMD_LOAD		0x3
+#define MBOX_OBJ_STAGING	0xb
+#define MBOX_HDR		(PCI_VENDOR_ID_INTEL | (MBOX_OBJ_STAGING << 16))
+#define MBOX_HDR_SIZE		16
+
 #define MBOX_XACTION_LEN	PAGE_SIZE
 #define MBOX_XACTION_MAX(imgsz)	((imgsz) * 2)
 #define MBOX_XACTION_TIMEOUT	(10 * MSEC_PER_SEC)
 
 #define STAGING_OFFSET_END	0xffffffff
+#define DWORD_SIZE(s)		((s) / sizeof(u32))
+
+static inline u32 read_mbox_dword(void __iomem *base)
+{
+	u32 dword = readl(base + MBOX_RDDATA_OFFSET);
+
+	/* Inform the read completion to the staging firmware */
+	writel(0, base + MBOX_RDDATA_OFFSET);
+	return dword;
+}
+
+static inline void write_mbox_dword(void __iomem *base, u32 dword)
+{
+	writel(dword, base + MBOX_WRDATA_OFFSET);
+}
 
 static inline void abort_xaction(void __iomem *base)
 {
@@ -30,7 +58,18 @@ static inline void abort_xaction(void __iomem *base)
 
 static void request_xaction(void __iomem *base, u32 *chunk, unsigned int chunksize)
 {
-	pr_debug_once("Need to implement staging mailbox loading code.\n");
+	unsigned int i, dwsize = DWORD_SIZE(chunksize);
+
+	write_mbox_dword(base, MBOX_HDR);
+	write_mbox_dword(base, dwsize + DWORD_SIZE(MBOX_HDR_SIZE));
+
+	write_mbox_dword(base, MBOX_CMD_LOAD);
+	write_mbox_dword(base, 0);
+
+	for (i = 0; i < dwsize; i++)
+		write_mbox_dword(base, chunk[i]);
+
+	writel(MASK_MBOX_CTRL_GO, base + MBOX_CONTROL_OFFSET);
 }
 
 static enum ucode_state wait_for_xaction(void __iomem *base)
@@ -55,8 +94,18 @@ static enum ucode_state wait_for_xaction(void __iomem *base)
 
 static enum ucode_state read_xaction_response(void __iomem *base, unsigned int *offset)
 {
-	pr_debug_once("Need to implement staging response handler.\n");
-	return UCODE_ERROR;
+	u32 flag;
+
+	WARN_ON_ONCE(read_mbox_dword(base) != MBOX_HDR);
+	WARN_ON_ONCE(read_mbox_dword(base) != DWORD_SIZE(MBOX_HDR_SIZE));
+
+	*offset = read_mbox_dword(base);
+
+	flag = read_mbox_dword(base);
+	if (flag & MASK_MBOX_RESP_ERROR)
+		return UCODE_ERROR;
+
+	return UCODE_OK;
 }
 
 static inline unsigned int get_chunksize(unsigned int totalsize, unsigned int offset)
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 5/6] x86/microcode/intel_staging: Support mailbox data transfer
  2024-12-11  1:42   ` [PATCH 5/6] x86/microcode/intel_staging: Support mailbox data transfer Chang S. Bae
@ 2025-02-18 20:54     ` Dave Hansen
  2025-03-20 23:42       ` Chang S. Bae
  0 siblings, 1 reply; 44+ messages in thread
From: Dave Hansen @ 2025-02-18 20:54 UTC (permalink / raw)
  To: Chang S. Bae, linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen

On 12/10/24 17:42, Chang S. Bae wrote:
> The staging architecture features a narrowed interface for data transfer.
> Instead of allocating MMIO space based on data chunk size, it utilizes
> two data registers: one for reading and one for writing, enforcing the
> serialization of read and write operations. Additionally, it defines a
> mailbox data format.

So I'm going back and reading this _after_ all of the code. I don't
think I comprehended what "narrowed interface" or "serialization" really
meant when I read this. I was thinking "serializing instructions".

Maybe something like this would help:

Let's say you want to write 2 bytes of data to a device. Typically, the
device would expose 2 bytes of "wide" MMIO space. To send the data to
the device, you could do:

	writeb(buf[0], io_addr + 0);
	writeb(buf[1], io_addr + 1);

But this mailbox is a bit different. Instead of having a "wide"
interface where there is separate MMIO space for each word in a
transaction, it has a "narrow" interface where several words are written
to the same spot in MMIO space:

	writeb(buf[0], io_addr);
	writeb(buf[1], io_addr);

The same goes for the read side.

To me, it's much more akin to talking over a serial port than how I
think of devices attached via MMIO.

> To facilitate data transfer, implement helper functions in line with this
> specified format for reading and writing staging data. This mailbox
> format is a customized version and is not compatible with the existing
> mailbox code, so reuse is not feasible.

This is getting a bit too flowery.

	Implement helper functions that send and receive data to and
	from the device.

	Note: the kernel has support for similar mailboxes. But none of
	them are compatible with this one. Trying to share code resulted
	in a bloated mess, so this code is standalone.

> diff --git a/arch/x86/kernel/cpu/microcode/intel_staging.c b/arch/x86/kernel/cpu/microcode/intel_staging.c
> index 2fc8667cab45..eab6e891db9c 100644
> --- a/arch/x86/kernel/cpu/microcode/intel_staging.c
> +++ b/arch/x86/kernel/cpu/microcode/intel_staging.c
> @@ -3,6 +3,7 @@
>  #define pr_fmt(fmt) "microcode: " fmt
>  #include <linux/delay.h>
>  #include <linux/io.h>
> +#include <linux/pci_ids.h>
>  
>  #include "internal.h"
>  
> @@ -11,17 +12,44 @@
>  
>  #define MBOX_CONTROL_OFFSET	0x0
>  #define MBOX_STATUS_OFFSET	0x4
> +#define MBOX_WRDATA_OFFSET	0x8
> +#define MBOX_RDDATA_OFFSET	0xc
>  
>  #define MASK_MBOX_CTRL_ABORT	BIT(0)
> +#define MASK_MBOX_CTRL_GO	BIT(31)
>  
>  #define MASK_MBOX_STATUS_ERROR	BIT(2)
>  #define MASK_MBOX_STATUS_READY	BIT(31)
>  
> +#define MASK_MBOX_RESP_SUCCESS	BIT(0)
> +#define MASK_MBOX_RESP_PROGRESS	BIT(1)
> +#define MASK_MBOX_RESP_ERROR	BIT(2)
> +
> +#define MBOX_CMD_LOAD		0x3
> +#define MBOX_OBJ_STAGING	0xb
> +#define MBOX_HDR		(PCI_VENDOR_ID_INTEL | (MBOX_OBJ_STAGING << 16))
> +#define MBOX_HDR_SIZE		16
> +
>  #define MBOX_XACTION_LEN	PAGE_SIZE
>  #define MBOX_XACTION_MAX(imgsz)	((imgsz) * 2)
>  #define MBOX_XACTION_TIMEOUT	(10 * MSEC_PER_SEC)
>  
>  #define STAGING_OFFSET_END	0xffffffff
> +#define DWORD_SIZE(s)		((s) / sizeof(u32))
> +
> +static inline u32 read_mbox_dword(void __iomem *base)
> +{
> +	u32 dword = readl(base + MBOX_RDDATA_OFFSET);
> +
> +	/* Inform the read completion to the staging firmware */
> +	writel(0, base + MBOX_RDDATA_OFFSET);
> +	return dword;
> +}

That comment doesn't quite parse for me.  Maybe:

	/* Inform the staging firmware that the read is complete: */

BTW, is this kind of read/write normal for MMIO reads? It looks really
goofy to me, but I don't deal with devices much.

> +static inline void write_mbox_dword(void __iomem *base, u32 dword)
> +{
> +	writel(dword, base + MBOX_WRDATA_OFFSET);
> +}
>  
>  static inline void abort_xaction(void __iomem *base)
>  {
> @@ -30,7 +58,18 @@ static inline void abort_xaction(void __iomem *base)
>  
>  static void request_xaction(void __iomem *base, u32 *chunk, unsigned int chunksize)
>  {
> -	pr_debug_once("Need to implement staging mailbox loading code.\n");
> +	unsigned int i, dwsize = DWORD_SIZE(chunksize);
> +
> +	write_mbox_dword(base, MBOX_HDR);
> +	write_mbox_dword(base, dwsize + DWORD_SIZE(MBOX_HDR_SIZE));
> +
> +	write_mbox_dword(base, MBOX_CMD_LOAD);
> +	write_mbox_dword(base, 0);

This last write is a mystery. Why is it here?

> +
> +	for (i = 0; i < dwsize; i++)
> +		write_mbox_dword(base, chunk[i]);

So, again, I'm a device dummy here. But this _looks_ nonsensical like
the code is just scribbling over itself repeatedly by rewriting data at
'base' over and over.

Would a comment like this help?

/*
 * 'base' is mapped UnCached (UC). Each write shows up at the device
 * as a separate "packet" in program order. That property allows the
 * device to reassemble everything in order on its side.
 */

> +	writel(MASK_MBOX_CTRL_GO, base + MBOX_CONTROL_OFFSET);
>  }

Can we comment the MASK_MBOX_CTRL_GO?  Maybe:

	/*
	 * Tell the device that this chunk is
	 * complete and can be processed.
	 */

>  static enum ucode_state wait_for_xaction(void __iomem *base)
> @@ -55,8 +94,18 @@ static enum ucode_state wait_for_xaction(void __iomem *base)
>  
>  static enum ucode_state read_xaction_response(void __iomem *base, unsigned int *offset)
>  {
> -	pr_debug_once("Need to implement staging response handler.\n");
> -	return UCODE_ERROR;
> +	u32 flag;

	/*
	 * All responses begin with the same header
	 * word and are the same length: 4 dwords.
	 */
	
> +	WARN_ON_ONCE(read_mbox_dword(base) != MBOX_HDR);
> +	WARN_ON_ONCE(read_mbox_dword(base) != DWORD_SIZE(MBOX_HDR_SIZE));
> +
> +	*offset = read_mbox_dword(base);
> +
> +	flag = read_mbox_dword(base);
> +	if (flag & MASK_MBOX_RESP_ERROR)
> +		return UCODE_ERROR;
> +
> +	return UCODE_OK;
>  }
>  
>  static inline unsigned int get_chunksize(unsigned int totalsize, unsigned int offset)


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 5/6] x86/microcode/intel_staging: Support mailbox data transfer
  2025-02-18 20:54     ` Dave Hansen
@ 2025-03-20 23:42       ` Chang S. Bae
  0 siblings, 0 replies; 44+ messages in thread
From: Chang S. Bae @ 2025-03-20 23:42 UTC (permalink / raw)
  To: Dave Hansen, linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen

On 2/18/2025 12:54 PM, Dave Hansen wrote:
> 
> BTW, is this kind of read/write normal for MMIO reads? It looks really
> goofy to me, but I don't deal with devices much.

Yeah, looking at some device code, functions like memcpy_toio() and 
__iowrite*_copy() seem to assume sufficient MMIO space to copy data from 
kernel memory. I also noticed that some other mailbox code calls back 
one of these functions.

So, I think your assessment is correct:

   > To me, it's much more akin to talking over a serial port than how I
   > think of devices attached via MMIO.

Certainly, this staging mailbox looks somewhat similar to a serial port. 
I suspect the staging portion is quite resource-limited. That appears to 
be one of key characteristics of this interface, which I should 
highlight more comprehensively.

Thanks,
Chang

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 6/6] x86/microcode/intel: Enable staging when available
  2024-12-11  1:42 ` [PATCH 0/6] x86/microcode: Support for Intel Staging Feature Chang S. Bae
                     ` (4 preceding siblings ...)
  2024-12-11  1:42   ` [PATCH 5/6] x86/microcode/intel_staging: Support mailbox data transfer Chang S. Bae
@ 2024-12-11  1:42   ` Chang S. Bae
  2025-02-07 18:37   ` [PATCH 0/6] x86/microcode: Support for Intel Staging Feature Chang S. Bae
  6 siblings, 0 replies; 44+ messages in thread
From: Chang S. Bae @ 2024-12-11  1:42 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen, chang.seok.bae

With the staging code being ready, check the relevant MSRs and set the
feature chicken bit to allow staging to be invoked from the core
microcode update process.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
RFC-V1 -> V1: Massage the enabling message.
---
 arch/x86/kernel/cpu/microcode/intel.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 325068bb5524..c988b6f8672f 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -674,6 +674,18 @@ static __init void calc_llc_size_per_core(struct cpuinfo_x86 *c)
 	llc_size_per_core = (unsigned int)llc_size;
 }
 
+static __init bool staging_available(void)
+{
+	u64 val;
+
+	val = x86_read_arch_cap_msr();
+	if (!(val & ARCH_CAP_MCU_ENUM))
+		return false;
+
+	rdmsrl(MSR_IA32_MCU_ENUMERATION, val);
+	return !!(val & MCU_STAGING);
+}
+
 struct microcode_ops * __init init_intel_microcode(void)
 {
 	struct cpuinfo_x86 *c = &boot_cpu_data;
@@ -684,6 +696,11 @@ struct microcode_ops * __init init_intel_microcode(void)
 		return NULL;
 	}
 
+	if (staging_available()) {
+		microcode_intel_ops.use_staging = true;
+		pr_info("Enabled staging feature.\n");
+	}
+
 	calc_llc_size_per_core(c);
 
 	return &microcode_intel_ops;
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/6] x86/microcode: Support for Intel Staging Feature
  2024-12-11  1:42 ` [PATCH 0/6] x86/microcode: Support for Intel Staging Feature Chang S. Bae
                     ` (5 preceding siblings ...)
  2024-12-11  1:42   ` [PATCH 6/6] x86/microcode/intel: Enable staging when available Chang S. Bae
@ 2025-02-07 18:37   ` Chang S. Bae
  2025-02-28 22:27     ` Colin Mitchell
  6 siblings, 1 reply; 44+ messages in thread
From: Chang S. Bae @ 2025-02-07 18:37 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen

On 12/10/2024 5:42 PM, Chang S. Bae wrote:
> 
> This series is based on 6.13-rc2. You can also find it from this repo:
>      git://github.com/intel-staging/microcode.git staging_v1

I've rebased on 6.14-rc1 and found no conflict, pushing to the repo:
     git://github.com/intel-staging/microcode.git staging_v1-for-6.14-rc1

>    3. Staging Policy (TODO):
> 
>       While staging is always attempted, the system will fall back to the
>       legacy update method if staging fails. There is an open question
>       regarding staging policy: should it be mandatory, without fallback,
>       in certain usage scenarios? This could lead further refinements in
>       the flow depending on feedback and use cases.

While I noted this as "TODO", the policy part may require a separate 
follow-up.

At the moment, I’d like to assess its current state and ensure the 
review process covers details up to the mailbox handling logic, while 
fully respecting the maintainers' schedule.

Also, feedback from those considering this feature is highly appreciated 
here. If the series looks good, I hope to collect ack or review tags to 
help move it forward.

Thanks,
Chang

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/6] x86/microcode: Support for Intel Staging Feature
  2025-02-07 18:37   ` [PATCH 0/6] x86/microcode: Support for Intel Staging Feature Chang S. Bae
@ 2025-02-28 22:27     ` Colin Mitchell
  2025-02-28 22:52       ` Borislav Petkov
  2025-02-28 23:05       ` Dave Hansen
  0 siblings, 2 replies; 44+ messages in thread
From: Colin Mitchell @ 2025-02-28 22:27 UTC (permalink / raw)
  To: chang.seok.bae; +Cc: bp, dave.hansen, linux-kernel, mingo, tglx, x86

As a potential user, I'd advocate for an option to disable legacy
loading if staging is supported.

The difference in quiesce time between staging and legacy loading
can be significant. Since late loading is more of an explicit active
action, I would believe allowing the initiating process freedom of a retry
loop or handling any errors from staging makes sense.

Presumably load_late_locked could return an error on failure
to stage leading to an appropriate print.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/6] x86/microcode: Support for Intel Staging Feature
  2025-02-28 22:27     ` Colin Mitchell
@ 2025-02-28 22:52       ` Borislav Petkov
  2025-02-28 23:23         ` Dave Hansen
  2025-02-28 23:05       ` Dave Hansen
  1 sibling, 1 reply; 44+ messages in thread
From: Borislav Petkov @ 2025-02-28 22:52 UTC (permalink / raw)
  To: Colin Mitchell
  Cc: chang.seok.bae, dave.hansen, linux-kernel, mingo, tglx, x86

On Fri, Feb 28, 2025 at 02:27:15PM -0800, Colin Mitchell wrote:
> As a potential user, I'd advocate for an option to disable legacy
> loading if staging is supported.

What happens if staging is failing indefinitely, for whatever reason?

You can't load any microcode anymore if you've disabled the legacy method
too.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/6] x86/microcode: Support for Intel Staging Feature
  2025-02-28 22:52       ` Borislav Petkov
@ 2025-02-28 23:23         ` Dave Hansen
  2025-03-26 21:29           ` Colin Mitchell
  0 siblings, 1 reply; 44+ messages in thread
From: Dave Hansen @ 2025-02-28 23:23 UTC (permalink / raw)
  To: Borislav Petkov, Colin Mitchell
  Cc: chang.seok.bae, dave.hansen, linux-kernel, mingo, tglx, x86

On 2/28/25 14:52, Borislav Petkov wrote:
> On Fri, Feb 28, 2025 at 02:27:15PM -0800, Colin Mitchell wrote:
>> As a potential user, I'd advocate for an option to disable legacy
>> loading if staging is supported.
> What happens if staging is failing indefinitely, for whatever reason?
> 
> You can't load any microcode anymore if you've disabled the legacy method
> too.

Yeah, I'm perplexed by this choice too.

You seem to be saying that you'd rather be (for instance) insecure
running old microcode than have the latency blip from a legacy microcode
load.

What action would you take if a staging-load fails? Retry again a few
times? Go back to the CPU vendor and get a new image? Or just ignore it?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/6] x86/microcode: Support for Intel Staging Feature
  2025-02-28 23:23         ` Dave Hansen
@ 2025-03-26 21:29           ` Colin Mitchell
  2025-04-02 17:14             ` Dave Hansen
  0 siblings, 1 reply; 44+ messages in thread
From: Colin Mitchell @ 2025-03-26 21:29 UTC (permalink / raw)
  To: dave.hansen
  Cc: bp, chang.seok.bae, colinmitchell, dave.hansen, linux-kernel,
	mingo, tglx, x86

> On 2/28/25 14:52, Borislav Petkov wrote:
> You can't load any microcode anymore if you've disabled the legacy method
> too.

Staging, if I've read the code correctly here, is only used for late loading.
There is performance reason to use staging for early kernel microcode 
loading pre-SMP. Therefore, if staging perpetually fails, it can be applied
without staging on the next reboot.

> On 2/28/25 15:23, Dave Hansen wrote:
> You seem to be saying that you'd rather be (for instance) insecure
> running old microcode than have the latency blip from a legacy microcode
> load.
> What action would you take if a staging-load fails? Retry again a few
> times? Go back to the CPU vendor and get a new image? Or just ignore it?

That's correct, but the latency tradeoff scales with the platform specific
size of the microcode patch. I'd prefer to have a more deterministic
update path and believe the potential latency blip would be significant
enough to justify the option.

Adding configuration would allow me to handle the error as needed.
A retry loop would be a first step but I could also look to migrate VMs
off the machine if the platform specific latency blip would negatively 
affect sensitive guest VMs. While an ideal solution imo would then
allow me to force legacy loading, I could also settle with it being done
through a reboot where early boot would already skip staging.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/6] x86/microcode: Support for Intel Staging Feature
  2025-03-26 21:29           ` Colin Mitchell
@ 2025-04-02 17:14             ` Dave Hansen
  0 siblings, 0 replies; 44+ messages in thread
From: Dave Hansen @ 2025-04-02 17:14 UTC (permalink / raw)
  To: Colin Mitchell
  Cc: bp, chang.seok.bae, dave.hansen, linux-kernel, mingo, tglx, x86

On 3/26/25 14:29, Colin Mitchell wrote:
>> On 2/28/25 15:23, Dave Hansen wrote:
>> You seem to be saying that you'd rather be (for instance) insecure
>> running old microcode than have the latency blip from a legacy microcode
>> load.
>> What action would you take if a staging-load fails? Retry again a few
>> times? Go back to the CPU vendor and get a new image? Or just ignore it?
> That's correct, but the latency tradeoff scales with the platform specific
> size of the microcode patch. I'd prefer to have a more deterministic
> update path and believe the potential latency blip would be significant
> enough to justify the option.
> 
> Adding configuration would allow me to handle the error as needed.
> A retry loop would be a first step but I could also look to migrate VMs
> off the machine if the platform specific latency blip would negatively 
> affect sensitive guest VMs. While an ideal solution imo would then
> allow me to force legacy loading, I could also settle with it being done
> through a reboot where early boot would already skip staging.

There's a lot to unpack there.

But, for the purposes of this series, I think what's here is fine for
now. Let's leave staging _purely_ as an opportunistic optimization.

If folks want to make this more configurable like making staging
*mandatory* and disabling legacy loading then we'll look at the patches
(and their justifications) as folks submit them. A good justification
would be something along these lines:

	Legacy microcode loading causes a 5,000ms latency blip. Our
	customers have been complaining to us for years about those
	legacy loading blips. Migrating a VM causes a 1ms latency blip.
	Those 4,999ms mean a lot to the folks running those VMs. As a
	CSP, we would like the flexibility to avoid the gigantic legacy
	microcode loading blips because they are bad and getting worse.

It becomes less compelling if it's something like this:

	Legacy microcode loading causes a 50ms latency blip. Migrating a
	VM causes a 49ms latency blip. That millisecond is super
	important.

... and increasingly less so as it becomes:

	We like knobs and flexibility for $REASONS.

You don't have to have down-to-the-millisecond numbers here. Orders of
magnitude are fine. But if you can't demonstrate (or don't anticipate)
orders of magnitude improvement from the knob, then it's probably not
worth it. It better be a 10x improvement, not 10%.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/6] x86/microcode: Support for Intel Staging Feature
  2025-02-28 22:27     ` Colin Mitchell
  2025-02-28 22:52       ` Borislav Petkov
@ 2025-02-28 23:05       ` Dave Hansen
  1 sibling, 0 replies; 44+ messages in thread
From: Dave Hansen @ 2025-02-28 23:05 UTC (permalink / raw)
  To: Colin Mitchell, chang.seok.bae
  Cc: bp, dave.hansen, linux-kernel, mingo, tglx, x86

On 2/28/25 14:27, Colin Mitchell wrote:
> As a potential user, I'd advocate for an option to disable legacy
> loading if staging is supported.
> 
> The difference in quiesce time between staging and legacy loading
> can be significant. Since late loading is more of an explicit active
> action, I would believe allowing the initiating process freedom of a retry
> loop or handling any errors from staging makes sense.
> 
> Presumably load_late_locked could return an error on failure
> to stage leading to an appropriate print.

Hey Colin!

You might not have noticed, but this series hasn't even been merged yet.
If this series is an important thing to your employer, I'd really
appreciate some reviews on it from you or anyone else to whom it is
important. That would really speed things along!

Requesting new features to be piled on top of an out-of-tree
under-reviewed patch set is likely to slow things down more than speed
them up.

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2025-04-02 17:14 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-01 16:10 [PATCH RFC 0/7] x86/microcode: Support for Intel Staging Feature Chang S. Bae
2024-10-01 16:10 ` [PATCH RFC 1/7] x86/microcode/intel: Remove unnecessary cache writeback and invalidation Chang S. Bae
2024-10-25 16:24   ` [tip: x86/microcode] " tip-bot2 for Chang S. Bae
2024-10-01 16:10 ` [PATCH RFC 2/7] x86/microcode: Introduce staging option to reduce late-loading latency Chang S. Bae
2024-11-04 10:45   ` Borislav Petkov
2024-10-01 16:10 ` [PATCH RFC 3/7] x86/msr-index: Define MSR index and bit for the microcode staging feature Chang S. Bae
2024-10-01 16:10 ` [PATCH RFC 4/7] x86/microcode/intel: Prepare for microcode staging Chang S. Bae
2024-11-04 11:16   ` Borislav Petkov
2024-11-04 16:08     ` Dave Hansen
2024-11-04 18:34       ` Chang S. Bae
2024-11-04 20:10         ` Chang S. Bae
2024-11-06 18:23           ` [PATCH] cpufreq: Simplify MSR read on the boot CPU Chang S. Bae
2024-11-12 20:44             ` Rafael J. Wysocki
2024-11-06 18:28     ` [PATCH RFC 4/7] x86/microcode/intel: Prepare for microcode staging Chang S. Bae
2024-11-07  1:12       ` Thomas Gleixner
2024-11-08 22:42         ` Chang S. Bae
2024-11-08 22:51         ` Dave Hansen
2024-10-01 16:10 ` [PATCH RFC 5/7] x86/microcode/intel_staging: Implement staging logic Chang S. Bae
2024-10-01 16:10 ` [PATCH RFC 6/7] x86/microcode/intel_staging: Support mailbox data transfer Chang S. Bae
2024-10-01 16:10 ` [PATCH RFC 7/7] x86/microcode/intel: Enable staging when available Chang S. Bae
2024-12-11  1:42 ` [PATCH 0/6] x86/microcode: Support for Intel Staging Feature Chang S. Bae
2024-12-11  1:42   ` [PATCH 1/6] x86/microcode: Introduce staging option to reduce late-loading latency Chang S. Bae
2025-02-17 13:33     ` Borislav Petkov
2025-02-18  7:51       ` Chang S. Bae
2025-02-18 11:36         ` Borislav Petkov
2025-02-18 15:16     ` Dave Hansen
2024-12-11  1:42   ` [PATCH 2/6] x86/msr-index: Define MSR index and bit for the microcode staging feature Chang S. Bae
2025-02-26 17:19     ` Borislav Petkov
2024-12-11  1:42   ` [PATCH 3/6] x86/microcode/intel: Prepare for microcode staging Chang S. Bae
2025-02-26 17:52     ` Borislav Petkov
2024-12-11  1:42   ` [PATCH 4/6] x86/microcode/intel_staging: Implement staging logic Chang S. Bae
2025-02-18 20:16     ` Dave Hansen
2025-02-26 17:56     ` Borislav Petkov
2024-12-11  1:42   ` [PATCH 5/6] x86/microcode/intel_staging: Support mailbox data transfer Chang S. Bae
2025-02-18 20:54     ` Dave Hansen
2025-03-20 23:42       ` Chang S. Bae
2024-12-11  1:42   ` [PATCH 6/6] x86/microcode/intel: Enable staging when available Chang S. Bae
2025-02-07 18:37   ` [PATCH 0/6] x86/microcode: Support for Intel Staging Feature Chang S. Bae
2025-02-28 22:27     ` Colin Mitchell
2025-02-28 22:52       ` Borislav Petkov
2025-02-28 23:23         ` Dave Hansen
2025-03-26 21:29           ` Colin Mitchell
2025-04-02 17:14             ` Dave Hansen
2025-02-28 23:05       ` Dave Hansen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox