* [Discussion] x86: Guest Support for APX
@ 2025-09-19 20:14 Chang S. Bae
2025-09-19 22:13 ` Paolo Bonzini
0 siblings, 1 reply; 3+ messages in thread
From: Chang S. Bae @ 2025-09-19 20:14 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini; +Cc: kvm@vger.kernel.org
Dear KVM maintainers,
We'd like to seek clarification on how to approach guest support for a
new feature. Specifically, this concerns Advanced Performance Extensions
(APX). As you may have noticed, host support was merged in v6.16, and we
are now working on the KVM side.
At first glance, guest enablement seemed straightforward: advertise the
feature in CPUID, rely on the existing XSAVE infrastructure in the host,
and ensure that a conflicting MPX configuration is rejected.
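A minimal sketch of that last check, assuming the conflict is expressed
through XCR0 state-component bits. The bit positions (3 and 4 for MPX,
19 for APX) are assumed from the host XSAVE feature numbering, and the
helper name is hypothetical, not an existing KVM function:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed XCR0 state-component bits (MPX: 3 and 4, APX: 19). */
#define XFEATURE_MASK_BNDREGS (1ULL << 3)
#define XFEATURE_MASK_BNDCSR  (1ULL << 4)
#define XFEATURE_MASK_MPX     (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR)
#define XFEATURE_MASK_APX     (1ULL << 19)

/*
 * Hypothetical helper: returns true when the requested guest XCR0 does
 * not enable APX together with any MPX state component.
 */
static bool guest_xcr0_apx_mpx_ok(uint64_t xcr0)
{
	bool has_apx = (xcr0 & XFEATURE_MASK_APX) != 0;
	bool has_mpx = (xcr0 & XFEATURE_MASK_MPX) != 0;

	return !(has_apx && has_mpx);
}
```

Any real check would sit wherever guest CPUID/XCR0 values are already
validated; the sketch only shows the mutual-exclusion rule itself.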
Then we noticed your policy statements [1,2] from the discussion of
supervisor CET guest support, which I think make clear the expectation
that a VM should be architecturally compatible before a feature is
exposed to guests.
Since APX introduces new general-purpose registers (GPRs), legacy
instructions are extended to access them, which may lead to associated
VM exits. For example, MOV may now reference these registers in MMIO
operations for emulated devices. The spec [3] lists other instructions
that may similarly exit.
Now, interpreting your policy in this context, it seems that enabling
APX for guests requires supporting the full set of possible APX-induced
exits.
We may proceed with posting an RFC version that emulates all of them and
gather feedback. But as we internally discussed, we think it would be
better to clarify the scope up front, if possible, to avoid unnecessary
churn.
We also noticed another interesting precedent:
MOVDIR64B/MOVDIRI. These instructions can optimize MMIO operations by
bypassing caches, yet KVM emulation does not support them [4]. It is
unclear if this was a deliberate decision or simply something not
implemented yet -- picking up the set [5]. If it was intentional, that
suggests we may need to define a more selective approach to APX
emulation as well.
In summary, we'd like to clarify:
* Should we target complete emulation coverage for all APX-induced
exits (from the start)?
* Or is a narrower scope (e.g., only MOV) a practical option worth
considering, given the limited likelihood of other exits?
* Alternatively, can we even consider a pragmatic path like MOVDIR* --
supporting instructions only when practically useful?
Thanks for your time and consideration. We'd appreciate your guidance on
this.
Chang
[1] Link:
https://lore.kernel.org/all/2597a87b-1248-b8ce-ce60-94074bc67ea4@intel.com/
On 8/28/2023 2:00 PM, Dave Hansen wrote:
> On 8/10/23 08:15, Paolo Bonzini wrote:
>> On 8/10/23 16:29, Dave Hansen wrote:
>>> What actual OSes need this support?
>>
>> I think Xen could use it when running nested. But KVM cannot expose
>> support for CET in CPUID, and at the same time fake support for
>> MSR_IA32_PL{0,1,2}_SSP (e.g. inject a #GP if it's ever written to a
>> nonzero value).
>>
>> I suppose we could invent our own paravirtualized CPUID bit for
>> "supervisor IBT works but supervisor SHSTK doesn't". Linux could check
>> that but I don't think it's a good idea.
>>
>> So... do, or do not. There is no try. :)
>
> Ahh, that makes sense. This is needed for implementing the
> *architecture*, not because some OS actually wants to _do_ it.
[2] Link: https://lore.kernel.org/all/ZNUETFZK7K5zyr3X@google.com/
On 8/10/2023 8:37 AM, Sean Christopherson wrote:
>
> As Paolo alluded to, this is about KVM faithfully emulating the
> architecture. There is no combination of CPUID bits that allows KVM
> to advertise SHSTK for userspace without advertising SHSTK for
> supervisor.
>
> Whether or not there are any users in the short term is unfortunately
> irrelevant from KVM's perspective.
[3] Architecture Specification for Intel APX: Table 3.10: Intel APX
Interactions with Instruction Execution Info or Exit Qualification
Link: https://cdrdv2.intel.com/v1/dl/getContent/784266
[4] The MOVDIR64B opcode is "66 0F 38 F8 ..." but opcode_table[] in
emulate.c currently appears to be missing it:
/* 0x60 - 0x67 */
I(ImplicitOps | Stack | No64, em_pusha),
I(ImplicitOps | Stack | No64, em_popa),
N, MD(ModRM, &mode_dual_63),
N, N, N, N,
[5]
https://lore.kernel.org/lkml/1541483728-7826-1-git-send-email-jingqi.liu@intel.com/
* Re: [Discussion] x86: Guest Support for APX
2025-09-19 20:14 [Discussion] x86: Guest Support for APX Chang S. Bae
@ 2025-09-19 22:13 ` Paolo Bonzini
2025-09-21 22:52 ` Chang S. Bae
0 siblings, 1 reply; 3+ messages in thread
From: Paolo Bonzini @ 2025-09-19 22:13 UTC (permalink / raw)
To: Chang S. Bae; +Cc: Sean Christopherson, kvm
On Fri, Sep 19, 2025, 22:16 Chang S. Bae <chang.seok.bae@intel.com> wrote:
> Dear KVM maintainers,
>
> Since APX introduces new general-purpose registers (GPRs), legacy
> instructions are extended to access them, which may lead to associated
> VM exits. For example, MOV may now reference these registers in MMIO
> operations for emulated devices. The spec [3] lists other instructions
> that may similarly exit.
You're right that this gets very complicated quickly. At the same time,
most cases of MMIO emulation are for legacy devices, and R16-R31 are
unlikely to appear in MMIO instructions for these legacy devices.
However, at least MOVs should be extended to support APX registers as
source or destination operands, and there should also be support for
base and index in the addresses. This means you have to parse REX2,
but EVEX shouldn't be needed as these instructions are in "legacy map
0" (aka one-byte).
At this point, singling out MOVs is not useful and you might as well
implement REX2 for all instructions. EVEX adds a lot of extra cases,
including three-operand integer instructions and suppressed flag
updates, but REX2 is relatively simple.
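As a rough sketch of what parsing REX2 involves, assuming the prefix is
the byte 0xD5 followed by one payload byte laid out as
M0 R4 X4 B4 W R3 X3 B3 (bit 7 down to bit 0) as described in the APX
specification [3]. The struct and function names are hypothetical and
are not taken from emulate.c:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define REX2_PREFIX 0xd5

/* Hypothetical container for the decoded REX2 payload. */
struct rex2_info {
	bool map1;   /* M0: false = legacy map 0 (one-byte), true = map 1 (0F) */
	bool w;      /* W: 64-bit operand size */
	uint8_t r;   /* R4:R3, extends ModRM.reg */
	uint8_t x;   /* X4:X3, extends SIB.index */
	uint8_t b;   /* B4:B3, extends ModRM.rm or SIB.base */
};

/* Decode a REX2 prefix at the start of insn; false if there is none. */
static bool decode_rex2(const uint8_t *insn, size_t len, struct rex2_info *out)
{
	uint8_t p;

	assert(insn && out);
	if (len < 2 || insn[0] != REX2_PREFIX)
		return false;

	p = insn[1];
	out->map1 = (p & 0x80) != 0;
	out->w    = (p & 0x08) != 0;
	out->r    = (uint8_t)((((p >> 6) & 1) << 1) | ((p >> 2) & 1));
	out->x    = (uint8_t)((((p >> 5) & 1) << 1) | ((p >> 1) & 1));
	out->b    = (uint8_t)((((p >> 4) & 1) << 1) | (p & 1));
	return true;
}

/* Combine a 2-bit extension with a 3-bit ModRM/SIB field: 0..31. */
static unsigned int rex2_reg(uint8_t ext, uint8_t field3)
{
	return ((unsigned int)ext << 3) | (field3 & 7u);
}
```

With the bytes D5 48 8B 07, the payload 0x48 sets R4 and W, so ModRM.reg
000 selects R16 as the destination while ModRM.rm stays at 7. The same
5-bit extension applies to base and index, which is why handling it once
in the decoder covers every map 0/1 instruction rather than MOV alone.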
> In summary, we'd like to clarify:
>
> * Should we target complete emulation coverage for all APX-induced
> exits (from the start)?
>
> * Or is a narrower scope (e.g., only MOV) practically a considerable
> option, given the limited likelihood of other exits?
See above. I hope it answers both questions.
> * Alternatively, can we even consider a pragmatic path like MOVDIR* --
> supporting only when practically useful?
I think pragmatic is fine, but in some cases being too restrictive
makes it harder to track what is implemented and what isn't. Again, see
the above comment about implementing REX2 fully while limiting EVEX
support to the minimum (or hopefully leaving it out altogether).
> [4] The MOVDIR64 opcode is "66 0F 38 F8 ..." but opcode_table[] in
> emulate.c looks currently missing it:
>
> /* 0x60 - 0x67 */
> I(ImplicitOps | Stack | No64, em_pusha),
> I(ImplicitOps | Stack | No64, em_popa),
> N, MD(ModRM, &mode_dual_63),
> N, N, N, N,
0x66 is a prefix so you have to look at F8 in the table for the 0F 38
three-byte opcodes (opcode_map_0f_38) and add a new
three_byte_0f_38_f8 table.
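To illustrate the lookup order, here is a simplified standalone sketch
that skips a 0x66 prefix and then follows the 0F / 0F 38 / 0F 3A escape
bytes to find which table the final opcode byte indexes. All names are
hypothetical, and other prefixes (segment, REP, REX) are deliberately
left out, so this is not how the emulate.c decoder is structured:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum opcode_map { MAP_ONE_BYTE, MAP_0F, MAP_0F_38, MAP_0F_3A };

/* Hypothetical result: which table to index, and with which byte. */
struct opcode_loc {
	enum opcode_map map;
	uint8_t opcode;
	bool opsize_prefix;	/* a 0x66 prefix was seen */
};

static bool locate_opcode(const uint8_t *insn, size_t len,
			  struct opcode_loc *loc)
{
	size_t i = 0;

	loc->opsize_prefix = false;
	/* Only 0x66 is handled as a prefix in this sketch. */
	while (i < len && insn[i] == 0x66) {
		loc->opsize_prefix = true;
		i++;
	}
	if (i >= len)
		return false;

	loc->map = MAP_ONE_BYTE;
	if (insn[i] == 0x0f) {
		/* Two-byte escape; 38 and 3A escape once more. */
		if (++i >= len)
			return false;
		loc->map = MAP_0F;
		if (insn[i] == 0x38 || insn[i] == 0x3a) {
			loc->map = (insn[i] == 0x38) ? MAP_0F_38 : MAP_0F_3A;
			if (++i >= len)
				return false;
		}
	}
	loc->opcode = insn[i];
	return true;
}
```

For "66 0F 38 F8 ..." this reports the 0F 38 map with opcode F8 and the
0x66 prefix recorded separately, so the 0x60 - 0x67 row of the one-byte
table is never consulted.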
MOVDIR* and many other instructions are not implemented because they
are pretty much never used with emulated (legacy) MMIO such as VGA
framebuffers. By the way, MOVDIR* is not a REX2-accepted instruction,
so you would have to implement EVEX in order to support it with APX
registers.
Paolo
* Re: [Discussion] x86: Guest Support for APX
2025-09-19 22:13 ` Paolo Bonzini
@ 2025-09-21 22:52 ` Chang S. Bae
0 siblings, 0 replies; 3+ messages in thread
From: Chang S. Bae @ 2025-09-21 22:52 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Sean Christopherson, kvm
On 9/19/2025 3:13 PM, Paolo Bonzini wrote:
>
> You're right that gets very complicated quickly, while most cases of
> MMIO emulation are for legacy devices and R16-R31 are unlikely to
> appear in MMIO instructions for these legacy devices.
>
> However, at least MOVs should be extended to support APX registers as
> source or destination operands, and there should also be support for
> base and index in the addresses. This means you have to parse REX2,
> but EVEX shouldn't be needed as these instructions are in "legacy map
> 0" (aka one-byte).
>
> At this point, singling out MOVs is not useful and you might as well
> implement REX2 for all instructions. EVEX adds a lot of extra cases
> including three operand integer instructions and no flag update, but
> REX2 is relatively simple.
> ...
> I think pragmatic is fine, but in some cases too restrictive makes it
> harder to track what is implemented and what isn't. Again, see the
> above comment about implementing REX2 fully while limiting EVEX
> support to the minimum (or hopefully leaving it out altogether).
Thanks for the guidance. This makes sense to me.
I think the high-level direction is clear now. I'll prepare and post an
RFC series, once it's ready, to walk through the details.
Thanks,
Chang