* [PATCH RFC 0/7] x86/microcode: Support for Intel Staging Feature
@ 2024-10-01 16:10 Chang S. Bae
From: Chang S. Bae @ 2024-10-01 16:10 UTC
  To: linux-kernel; +Cc: x86, tglx, mingo, bp, dave.hansen, chang.seok.bae

Hi all,

I'd like to ask for initial feedback on this series enabling the staging
feature. Thanks!

== Latency Spike Issue ==

As microcode images have increased in size, a corresponding rise in load
latency has become inevitable. This latency spike significantly impacts
late loading, which remains in use despite the cautions highlighted in
the documentation [1]. The issue is especially critical for continuously
running workloads and virtual machines, where excessive delays can lead
to timeouts.

== Staging for Latency Reduction ==

Currently, writing to MSR_IA32_UCODE_WRITE triggers the entire update
process -- loading, validation, and activation -- all of which contribute
to the latency during CPU halt. The staging feature mitigates this by
refactoring all but the activation step out of the critical path,
allowing CPUs to continue serving workloads while staging takes place.
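
As a rough illustration of the split described above, the sketch below
models which phases run while CPUs are halted. The enum and function
names are hypothetical and exist only for this example; they are not
taken from the series.

```c
/* Hypothetical bit flags for the three phases named above. */
enum ucode_phase {
	PHASE_LOAD     = 1 << 0,
	PHASE_VALIDATE = 1 << 1,
	PHASE_ACTIVATE = 1 << 2,
};

/*
 * Returns the phases that run while CPUs are halted. After a completed
 * staging pass only activation remains; otherwise the MSR write has to
 * do everything.
 */
static unsigned int phases_in_critical_path(int staged)
{
	if (staged)
		return PHASE_ACTIVATE;
	return PHASE_LOAD | PHASE_VALIDATE | PHASE_ACTIVATE;
}
```

This models only the ordering of work, not its cost.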

== Cache Flush Removal ==

Before resolving this latency spike caused by larger images, another
major latency issue -- cache invalidation [2] -- must first be addressed.
Originally introduced to handle a specific erratum, this cache
invalidation is now unnecessary because the problematic microcode images
have been banned. This cache flush has been found to negate the benefits
of staging, so this patch series begins by removing the WBINVD
instruction.

== Validation ==

We internally established pseudocode to clearly define all essential
steps for interacting with the firmware. Any firmware implementation
supporting staging should adhere to this contract. This patch set
incorporates that staging logic, which I successfully tested on one
firmware implementation. Multiple teams at Intel have also validated the
feature across different implementations.

Preliminary results from a pre-production system show a significant
reduction in latency (about 40%) with the staging approach alone.
Further improvements are possible with additional optimizations [*].

== Call for Review ==

This RFC series aims to present the proposed approach for community
review, to assess its soundness, and to discuss potential alternatives
if necessary. There are several key points to highlight for feedback:

  1. Staging Integration Approach

     In the core code, the high-level sequence for late loading is:

     (1) request_microcode_fw(), and
     (2) load_late_stop_cpus()->apply_microcode()

     Staging doesn't fit neatly into either step, as it involves the
     loading process but not the activation. Therefore, a new callback is
     introduced:

       core::load_late_locked()
       -> intel::staging_microcode()
          -> intel_staging::staging_work()
             -> intel_staging::...

  2. Code Abstraction

     The newly added intel_staging.c file contains all staging-related
     code to keep it self-contained. Ideally, the entire firmware
     interaction could eventually be abstracted into a single MSR write,
     which remains a long-term goal. Fortunately, recent protocol
     simplifications have made this more feasible.

  3. Staging Policy (TODO)

     While staging is always attempted, the system will fall back to the
     legacy update method if staging fails. There is an open question
     regarding staging policy: should it be mandatory, without fallback,
     in certain usage scenarios? This could lead to further refinements in
     the flow depending on feedback and use cases.

  4. Specification Updates

     Recent specification updates have simplified the staging protocol
     and clarified the behavior of MSR_IA32_UCODE_WRITE in conjunction
     with staging:

     4.1. Protocol Simplification

     The specification update [3] has significantly reduced the
     complexity of the staging code, trimming the kernel code down from
     the ~1K lines of preliminary implementations. Thanks to Dave for
     guiding this redesign effort.

     4.2. Clarification of Legacy Update Behavior

     Chapter 5 of the specification adds further clarification on
     MSR_IA32_UCODE_WRITE. Key points are summarized below:

     (a) When staging is not performed or has failed, a WRMSR will still load
     the patch image, but with higher latency.

     (b) During an active staging process, MSR_IA32_UCODE_WRITE can
     load a new microcode image, again with higher latency.

     (c) If the versions differ between the staged microcode and the
     version loaded via MSR_IA32_UCODE_WRITE, the version loaded through
     the MSR takes precedence.

     I'd also like to make sure there is no further ambiguity in this documentation
     [3]. Feel free to provide feedback if anything seems unclear or
     unreasonable.
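
To make points 1, 3, and 4.2 concrete, here is a minimal userspace
sketch of the flow: staging is attempted first, the MSR write always
follows, and the written revision always wins. Every type and function
name is hypothetical and is not the code in this series. The "fast
path" condition (a completed staging pass for the same revision) is an
assumption inferred from (a)-(c), not a statement from the
specification.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; not the kernel's types or names. */
enum staging_state {
	STAGING_NONE,
	STAGING_ACTIVE,
	STAGING_DONE,
	STAGING_FAILED,
};

struct cpu_model {
	enum staging_state staging;
	uint32_t staged_rev;	/* meaningful only when staging == STAGING_DONE */
	uint32_t loaded_rev;	/* revision currently active */
};

struct ucode_ops_model {
	/* Optional staging hook; returns true on success. */
	bool (*stage)(struct cpu_model *cpu, uint32_t rev);
};

/*
 * Models the MSR write per points (a)-(c): the written revision always
 * takes effect. Returns true when the lower-latency path applies,
 * assumed here to require a completed staging pass for the same
 * revision.
 */
static bool ucode_write_model(struct cpu_model *cpu, uint32_t rev)
{
	bool fast = cpu->staging == STAGING_DONE && cpu->staged_rev == rev;

	cpu->loaded_rev = rev;
	return fast;
}

/* Late-load flow: attempt staging, then always perform the MSR write. */
static bool load_late_model(const struct ucode_ops_model *ops,
			    struct cpu_model *cpu, uint32_t rev)
{
	if (ops->stage && !ops->stage(cpu, rev))
		cpu->staging = STAGING_FAILED;	/* fall back to the legacy path */

	return ucode_write_model(cpu, rev);
}

/* Stand-in staging hooks for exercising both outcomes. */
static bool stage_ok(struct cpu_model *cpu, uint32_t rev)
{
	cpu->staging = STAGING_DONE;
	cpu->staged_rev = rev;
	return true;
}

static bool stage_fail(struct cpu_model *cpu, uint32_t rev)
{
	(void)cpu;
	(void)rev;
	return false;
}
```

A mandatory-staging policy, as raised in point 3, would amount to
returning an error from load_late_model() instead of falling through to
the write when the hook fails.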

As noted [*], an additional series focused on further latency
optimizations will follow. However, the staging approach was prioritized
due to its significant first-order impact on latency.

This series is based on 6.12-rc1. You can also find it in this repo:
    git://github.com/intel-staging/microcode.git staging_rfc-v1

Thanks,
Chang

[1]: https://docs.kernel.org/arch/x86/microcode.html#why-is-late-loading-dangerous
[2]: https://lore.kernel.org/all/20240701212012.21499-1-chang.seok.bae@intel.com/
[3]: https://cdrdv2.intel.com/v1/dl/getContent/782715
[*]: Further latency improvements will be addressed in the upcoming
     ‘Uniform’ feature series.

Chang S. Bae (7):
  x86/microcode/intel: Remove unnecessary cache writeback and
    invalidation
  x86/microcode: Introduce staging option to reduce late-loading latency
  x86/msr-index: Define MSR index and bit for the microcode staging
    feature
  x86/microcode/intel: Prepare for microcode staging
  x86/microcode/intel_staging: Implement staging logic
  x86/microcode/intel_staging: Support mailbox data transfer
  x86/microcode/intel: Enable staging when available

 arch/x86/include/asm/msr-index.h              |   9 +
 arch/x86/kernel/cpu/microcode/Makefile        |   2 +-
 arch/x86/kernel/cpu/microcode/core.c          |  12 +-
 arch/x86/kernel/cpu/microcode/intel.c         |  77 ++++++++-
 arch/x86/kernel/cpu/microcode/intel_staging.c | 154 ++++++++++++++++++
 arch/x86/kernel/cpu/microcode/internal.h      |   5 +-
 6 files changed, 247 insertions(+), 12 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/microcode/intel_staging.c

-- 
2.43.0



Thread overview: 44+ messages
2024-10-01 16:10 [PATCH RFC 0/7] x86/microcode: Support for Intel Staging Feature Chang S. Bae
2024-10-01 16:10 ` [PATCH RFC 1/7] x86/microcode/intel: Remove unnecessary cache writeback and invalidation Chang S. Bae
2024-10-25 16:24   ` [tip: x86/microcode] " tip-bot2 for Chang S. Bae
2024-10-01 16:10 ` [PATCH RFC 2/7] x86/microcode: Introduce staging option to reduce late-loading latency Chang S. Bae
2024-11-04 10:45   ` Borislav Petkov
2024-10-01 16:10 ` [PATCH RFC 3/7] x86/msr-index: Define MSR index and bit for the microcode staging feature Chang S. Bae
2024-10-01 16:10 ` [PATCH RFC 4/7] x86/microcode/intel: Prepare for microcode staging Chang S. Bae
2024-11-04 11:16   ` Borislav Petkov
2024-11-04 16:08     ` Dave Hansen
2024-11-04 18:34       ` Chang S. Bae
2024-11-04 20:10         ` Chang S. Bae
2024-11-06 18:23           ` [PATCH] cpufreq: Simplify MSR read on the boot CPU Chang S. Bae
2024-11-12 20:44             ` Rafael J. Wysocki
2024-11-06 18:28     ` [PATCH RFC 4/7] x86/microcode/intel: Prepare for microcode staging Chang S. Bae
2024-11-07  1:12       ` Thomas Gleixner
2024-11-08 22:42         ` Chang S. Bae
2024-11-08 22:51         ` Dave Hansen
2024-10-01 16:10 ` [PATCH RFC 5/7] x86/microcode/intel_staging: Implement staging logic Chang S. Bae
2024-10-01 16:10 ` [PATCH RFC 6/7] x86/microcode/intel_staging: Support mailbox data transfer Chang S. Bae
2024-10-01 16:10 ` [PATCH RFC 7/7] x86/microcode/intel: Enable staging when available Chang S. Bae
2024-12-11  1:42 ` [PATCH 0/6] x86/microcode: Support for Intel Staging Feature Chang S. Bae
2024-12-11  1:42   ` [PATCH 1/6] x86/microcode: Introduce staging option to reduce late-loading latency Chang S. Bae
2025-02-17 13:33     ` Borislav Petkov
2025-02-18  7:51       ` Chang S. Bae
2025-02-18 11:36         ` Borislav Petkov
2025-02-18 15:16     ` Dave Hansen
2024-12-11  1:42   ` [PATCH 2/6] x86/msr-index: Define MSR index and bit for the microcode staging feature Chang S. Bae
2025-02-26 17:19     ` Borislav Petkov
2024-12-11  1:42   ` [PATCH 3/6] x86/microcode/intel: Prepare for microcode staging Chang S. Bae
2025-02-26 17:52     ` Borislav Petkov
2024-12-11  1:42   ` [PATCH 4/6] x86/microcode/intel_staging: Implement staging logic Chang S. Bae
2025-02-18 20:16     ` Dave Hansen
2025-02-26 17:56     ` Borislav Petkov
2024-12-11  1:42   ` [PATCH 5/6] x86/microcode/intel_staging: Support mailbox data transfer Chang S. Bae
2025-02-18 20:54     ` Dave Hansen
2025-03-20 23:42       ` Chang S. Bae
2024-12-11  1:42   ` [PATCH 6/6] x86/microcode/intel: Enable staging when available Chang S. Bae
2025-02-07 18:37   ` [PATCH 0/6] x86/microcode: Support for Intel Staging Feature Chang S. Bae
2025-02-28 22:27     ` Colin Mitchell
2025-02-28 22:52       ` Borislav Petkov
2025-02-28 23:23         ` Dave Hansen
2025-03-26 21:29           ` Colin Mitchell
2025-04-02 17:14             ` Dave Hansen
2025-02-28 23:05       ` Dave Hansen
