From: "Jürgen Groß" <jgross@suse.com>
To: Xin Li <xin@zytor.com>,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
linux-perf-users@vger.kernel.org, linux-hyperv@vger.kernel.org,
virtualization@lists.linux.dev, linux-pm@vger.kernel.org,
linux-edac@vger.kernel.org, xen-devel@lists.xenproject.org,
linux-acpi@vger.kernel.org, linux-hwmon@vger.kernel.org,
netdev@vger.kernel.org, platform-driver-x86@vger.kernel.org
Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
acme@kernel.org, andrew.cooper3@citrix.com, peterz@infradead.org,
namhyung@kernel.org, mark.rutland@arm.com,
alexander.shishkin@linux.intel.com, jolsa@kernel.org,
irogers@google.com, adrian.hunter@intel.com,
kan.liang@linux.intel.com, wei.liu@kernel.org,
ajay.kaher@broadcom.com, bcm-kernel-feedback-list@broadcom.com,
tony.luck@intel.com, pbonzini@redhat.com, vkuznets@redhat.com,
seanjc@google.com, luto@kernel.org, boris.ostrovsky@oracle.com,
kys@microsoft.com, haiyangz@microsoft.com, decui@microsoft.com
Subject: Re: [RFC PATCH v2 21/34] x86/msr: Utilize the alternatives mechanism to write MSR
Date: Wed, 23 Apr 2025 18:05:19 +0200 [thread overview]
Message-ID: <37c88ea3-dd24-4607-9ee1-0f19025aaef3@suse.com> (raw)
In-Reply-To: <f7198308-e8f8-4cc5-b884-24bc5f408a2a@zytor.com>
[-- Attachment #1.1.1: Type: text/plain, Size: 6940 bytes --]
On 23.04.25 10:51, Xin Li wrote:
> On 4/22/2025 2:57 AM, Jürgen Groß wrote:
>> On 22.04.25 10:22, Xin Li (Intel) wrote:
>>> The story started from tglx's reply in [1]:
>>>
>>> For actual performance relevant code the current PV ops mechanics
>>> are a horrorshow when the op defaults to the native instruction.
>>>
>>> look at wrmsrl():
>>>
>>> wrmsrl(msr, val
>>> wrmsr(msr, (u32)val, (u32)val >> 32))
>>> paravirt_write_msr(msr, low, high)
>>> PVOP_VCALL3(cpu.write_msr, msr, low, high)
>>>
>>> Which results in
>>>
>>> mov $msr, %edi
>>> mov $val, %rdx
>>> mov %edx, %esi
>>> shr $0x20, %rdx
>>> call native_write_msr
>>>
>>> and native_write_msr() does at minimum:
>>>
>>> mov %edi,%ecx
>>> mov %esi,%eax
>>> wrmsr
>>> ret
>>>
>>> In the worst case 'ret' is going through the return thunk. Not to
>>> talk about function prologues and whatever.
>>>
>>> This becomes even more silly for trivial instructions like STI/CLI
>>> or in the worst case paravirt_nop().
>>
>> This is nonsense.
>>
>> In the non-Xen case the initial indirect call is directly replaced with
>> STI/CLI via alternative patching, while for Xen it is replaced by a direct
>> call.
>>
>> The paravirt_nop() case is handled in alt_replace_call() by replacing the
>> indirect call with a nop in case the target of the call was paravirt_nop()
>> (which is in fact no_func()).
>>
>>>
>>> The call makes only sense, when the native default is an actual
>>> function, but for the trivial cases it's a blatant engineering
>>> trainwreck.
>>
>> The trivial cases are all handled as stated above: a direct replacement
>> instruction is placed at the indirect call position.
>
> The above comment was given in 2023 IIRC, and you have addressed it.
>
>>
>>> Later a consensus was reached to utilize the alternatives mechanism to
>>> eliminate the indirect call overhead introduced by the pv_ops APIs:
>>>
>>> 1) When built with !CONFIG_XEN_PV, X86_FEATURE_XENPV becomes a
>>> disabled feature, preventing the Xen code from being built
>>> and ensuring the native code is executed unconditionally.
>>
>> This is the case today already. There is no need for any change to have
>> this in place.
>>
>>>
>>> 2) When built with CONFIG_XEN_PV:
>>>
>>> 2.1) If not running on the Xen hypervisor (!X86_FEATURE_XENPV),
>>> the kernel runtime binary is patched to unconditionally
>>> jump to the native MSR write code.
>>>
>>> 2.2) If running on the Xen hypervisor (X86_FEATURE_XENPV), the
>>> kernel runtime binary is patched to unconditionally jump
>>> to the Xen MSR write code.
>>
>> I can't see what is different here compared to today's state.
>>
>>>
>>> The alternatives mechanism is also used to choose the new immediate
>>> form MSR write instruction when it's available.
>>
>> Yes, this needs to be added.
>>
>>> Consequently, remove the pv_ops MSR write APIs and the Xen callbacks.
>>
>> I still don't see a major difference to today's solution.
>
> The existing code generates:
>
> ...
> bf e0 06 00 00 mov $0x6e0,%edi
> 89 d6 mov %edx,%esi
> 48 c1 ea 20 shr $0x20,%rdx
> ff 15 07 48 8c 01 call *0x18c4807(%rip) # <pv_ops+0xb8>
> 31 c0 xor %eax,%eax
> ...
>
> And on native, the indirect call instruction is patched to a direct call
> as you mentioned:
>
> ...
> bf e0 06 00 00 mov $0x6e0,%edi
> 89 d6 mov %edx,%esi
> 48 c1 ea 20 shr $0x20,%rdx
> e8 60 3e 01 00 call <{native,xen}_write_msr> # direct
> 90 nop
> 31 c0 xor %eax,%eax
> ...
>
>
> This patch set generates assembly w/o CALL on native:
>
> ...
> e9 e6 22 c6 01 jmp 1f # on native or nop on Xen
> b9 e0 06 00 00 mov $0x6e0,%ecx
> e8 91 d4 fa ff call ffffffff8134ee80 <asm_xen_write_msr>
> e9 a4 9f eb 00 jmp ffffffff8225b9a0 <__x86_return_thunk>
> ...
> 1: b9 e0 06 00 00 mov $0x6e0,%ecx # immediate form here
> 48 89 c2 mov %rax,%rdx
> 48 c1 ea 20 shr $0x20,%rdx
> 3e 0f 30 ds wrmsr
> ...
>
> It's not a major change, but when it is patched to use the immediate form MSR
> write instruction, it's straightforwardly streamlined.
It should be rather easy to switch the current wrmsr/rdmsr paravirt patching
locations to use the rdmsr/wrmsr instructions instead of doing a call to
native_*msr().
The case of the new immediate form could be handled the same way.
>
>>
>> Only the "paravirt" term has been eliminated.
>
> Yes.
>
> But a PV guest doesn't operate at the highest privilege level, which
> means MSR instructions typically result in a #GP fault. I actually think the
> pv_ops MSR APIs are unnecessary because of this inherent
> limitation.
>
> Looking at the Xen MSR code, except PMU and just a few MSRs, it falls
> back to executes native MSR instructions. As MSR instructions trigger
> #GP, Xen takes control and handles them in 2 ways:
>
> 1) emulate (or ignore) a MSR operation and skip the guest instruction.
>
> 2) inject the #GP back to guest OS and let its #GP handler handle it.
> But Linux MSR exception handler just ignores the MSR instruction
> (MCE MSR exception will panic).
>
> So why not let Xen handle all the details which it already tries to do?
Some MSRs are not handled that way, but via a kernel internal emulation.
And those are handled that way mostly due to performance reasons. And some
need special treatment.
> (Linux w/ such a change may not be able to run on old Xen hypervisors.)
Yes, and this is something to avoid.
And remember that Linux isn't the only PV-mode guest existing.
> BTW, if performance is a concern, writes to MSR_KERNEL_GS_BASE and
> MSR_GS_BASE anyway are hpyercalls into Xen.
Yes, and some other MSR writes are just NOPs with Xen-PV.
Juergen
[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3743 bytes --]
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]
next prev parent reply other threads:[~2025-04-23 16:05 UTC|newest]
Thread overview: 94+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-22 8:21 [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support Xin Li (Intel)
2025-04-22 8:21 ` [RFC PATCH v2 01/34] x86/msr: Move rdtsc{,_ordered}() to <asm/tsc.h> Xin Li (Intel)
2025-04-23 14:13 ` Dave Hansen
2025-04-23 17:12 ` Xin Li
2025-04-22 8:21 ` [RFC PATCH v2 02/34] x86/msr: Remove rdpmc() Xin Li (Intel)
2025-04-23 14:23 ` Dave Hansen
2025-04-22 8:21 ` [RFC PATCH v2 03/34] x86/msr: Rename rdpmcl() to rdpmcq() Xin Li (Intel)
2025-04-23 14:24 ` Dave Hansen
2025-04-23 14:28 ` Sean Christopherson
2025-04-23 15:06 ` Dave Hansen
2025-04-23 17:23 ` Xin Li
2025-04-22 8:21 ` [RFC PATCH v2 04/34] x86/msr: Convert rdpmcq() into a function Xin Li (Intel)
2025-04-23 14:25 ` Dave Hansen
2025-04-22 8:21 ` [RFC PATCH v2 05/34] x86/msr: Return u64 consistently in Xen PMC read functions Xin Li (Intel)
2025-04-22 8:40 ` Jürgen Groß
2025-04-22 8:21 ` [RFC PATCH v2 06/34] x86/msr: Use the alternatives mechanism to read PMC Xin Li (Intel)
2025-04-22 8:38 ` Jürgen Groß
2025-04-22 9:12 ` Xin Li
2025-04-22 9:28 ` Juergen Gross
2025-04-23 7:40 ` Xin Li
2025-04-22 8:21 ` [RFC PATCH v2 07/34] x86/msr: Convert __wrmsr() uses to native_wrmsr{,q}() uses Xin Li (Intel)
2025-04-22 8:21 ` [RFC PATCH v2 08/34] x86/msr: Convert a native_wrmsr() use to native_wrmsrq() Xin Li (Intel)
2025-04-23 15:51 ` Dave Hansen
2025-04-23 17:27 ` Xin Li
2025-04-23 23:23 ` Xin Li
2025-04-22 8:21 ` [RFC PATCH v2 09/34] x86/msr: Add the native_rdmsrq() helper Xin Li (Intel)
2025-04-22 8:21 ` [RFC PATCH v2 10/34] x86/msr: Convert __rdmsr() uses to native_rdmsrq() uses Xin Li (Intel)
2025-04-22 15:09 ` Sean Christopherson
2025-04-23 9:27 ` Xin Li
2025-04-23 13:37 ` Sean Christopherson
2025-04-23 14:02 ` Dave Hansen
2025-04-22 8:21 ` [RFC PATCH v2 11/34] x86/msr: Remove calling native_{read,write}_msr{,_safe}() in pmu_msr_{read,write}() Xin Li (Intel)
2025-04-24 6:25 ` Mi, Dapeng
2025-04-24 7:16 ` Xin Li
2025-04-22 8:21 ` [RFC PATCH v2 12/34] x86/msr: Remove pmu_msr_{read,write}() Xin Li (Intel)
2025-04-24 6:33 ` Mi, Dapeng
2025-04-24 7:21 ` Xin Li
2025-04-24 7:43 ` Mi, Dapeng
2025-04-24 7:50 ` Xin Li
2025-04-24 10:05 ` Jürgen Groß
2025-04-24 17:49 ` Xin Li
2025-04-24 21:14 ` H. Peter Anvin
2025-04-24 22:24 ` Xin Li
2025-04-22 8:21 ` [RFC PATCH v2 13/34] x86/xen/msr: Remove the error pointer argument from set_reg() Xin Li (Intel)
2025-04-24 10:11 ` Jürgen Groß
2025-04-24 17:50 ` Xin Li
2025-04-22 8:21 ` [RFC PATCH v2 14/34] x86/msr: refactor pv_cpu_ops.write_msr{_safe}() Xin Li (Intel)
2025-04-24 10:16 ` Jürgen Groß
2025-04-22 8:21 ` [RFC PATCH v2 15/34] x86/msr: Replace wrmsr(msr, low, 0) with wrmsrq(msr, low) Xin Li (Intel)
2025-04-22 8:21 ` [RFC PATCH v2 16/34] x86/msr: Change function type of native_read_msr_safe() Xin Li (Intel)
2025-04-22 8:21 ` [RFC PATCH v2 17/34] x86/cpufeatures: Add a CPU feature bit for MSR immediate form instructions Xin Li (Intel)
2025-04-22 8:21 ` [RFC PATCH v2 18/34] x86/opcode: Add immediate form MSR instructions Xin Li (Intel)
2025-04-22 8:22 ` [RFC PATCH v2 19/34] x86/extable: Add support for " Xin Li (Intel)
2025-04-22 8:22 ` [RFC PATCH v2 20/34] x86/extable: Implement EX_TYPE_FUNC_REWIND Xin Li (Intel)
2025-04-22 8:22 ` [RFC PATCH v2 21/34] x86/msr: Utilize the alternatives mechanism to write MSR Xin Li (Intel)
2025-04-22 9:57 ` Jürgen Groß
2025-04-23 8:51 ` Xin Li
2025-04-23 16:05 ` Jürgen Groß [this message]
2025-04-24 8:06 ` Xin Li
2025-04-24 8:14 ` Jürgen Groß
2025-04-25 1:15 ` H. Peter Anvin
2025-04-25 3:44 ` H. Peter Anvin
2025-04-25 7:01 ` Jürgen Groß
2025-04-25 15:28 ` H. Peter Anvin
2025-04-25 6:51 ` Jürgen Groß
2025-04-25 12:33 ` Peter Zijlstra
2025-04-25 12:51 ` Jürgen Groß
2025-04-25 20:12 ` H. Peter Anvin
2025-04-25 15:29 ` H. Peter Anvin
2025-04-25 7:11 ` Peter Zijlstra
2025-04-22 8:22 ` [RFC PATCH v2 22/34] x86/msr: Utilize the alternatives mechanism to read MSR Xin Li (Intel)
2025-04-22 8:59 ` Jürgen Groß
2025-04-22 9:20 ` Xin Li
2025-04-22 9:57 ` Jürgen Groß
2025-04-22 11:12 ` Jürgen Groß
2025-04-23 9:03 ` Xin Li
2025-04-23 16:11 ` Jürgen Groß
2025-04-22 8:22 ` [RFC PATCH v2 23/34] x86/extable: Remove new dead code in ex_handler_msr() Xin Li (Intel)
2025-04-22 8:22 ` [RFC PATCH v2 24/34] x86/mce: Use native MSR API __native_{wr,rd}msrq() Xin Li (Intel)
2025-04-22 8:22 ` [RFC PATCH v2 25/34] x86/msr: Rename native_wrmsrq() to native_wrmsrq_no_trace() Xin Li (Intel)
2025-04-22 8:22 ` [RFC PATCH v2 26/34] x86/msr: Rename native_wrmsr() to native_wrmsr_no_trace() Xin Li (Intel)
2025-04-22 8:22 ` [RFC PATCH v2 27/34] x86/msr: Rename native_write_msr() to native_wrmsrq() Xin Li (Intel)
2025-04-22 8:22 ` [RFC PATCH v2 28/34] x86/msr: Rename native_write_msr_safe() to native_wrmsrq_safe() Xin Li (Intel)
2025-04-22 8:22 ` [RFC PATCH v2 29/34] x86/msr: Rename native_rdmsrq() to native_rdmsrq_no_trace() Xin Li (Intel)
2025-04-22 8:22 ` [RFC PATCH v2 30/34] x86/msr: Rename native_rdmsr() to native_rdmsr_no_trace() Xin Li (Intel)
2025-04-22 8:22 ` [RFC PATCH v2 31/34] x86/msr: Rename native_read_msr() to native_rdmsrq() Xin Li (Intel)
2025-04-22 8:22 ` [RFC PATCH v2 32/34] x86/msr: Rename native_read_msr_safe() to native_rdmsrq_safe() Xin Li (Intel)
2025-04-22 8:22 ` [RFC PATCH v2 33/34] x86/msr: Move the ARGS macros after the MSR read/write APIs Xin Li (Intel)
2025-04-22 8:22 ` [RFC PATCH v2 34/34] x86/msr: Convert native_rdmsr_no_trace() uses to native_rdmsrq_no_trace() uses Xin Li (Intel)
2025-04-22 15:03 ` [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support Sean Christopherson
2025-04-22 17:51 ` Xin Li
2025-04-22 18:05 ` Luck, Tony
2025-04-22 19:44 ` Ingo Molnar
2025-04-22 19:51 ` Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=37c88ea3-dd24-4607-9ee1-0f19025aaef3@suse.com \
--to=jgross@suse.com \
--cc=acme@kernel.org \
--cc=adrian.hunter@intel.com \
--cc=ajay.kaher@broadcom.com \
--cc=alexander.shishkin@linux.intel.com \
--cc=andrew.cooper3@citrix.com \
--cc=bcm-kernel-feedback-list@broadcom.com \
--cc=boris.ostrovsky@oracle.com \
--cc=bp@alien8.de \
--cc=dave.hansen@linux.intel.com \
--cc=decui@microsoft.com \
--cc=haiyangz@microsoft.com \
--cc=hpa@zytor.com \
--cc=irogers@google.com \
--cc=jolsa@kernel.org \
--cc=kan.liang@linux.intel.com \
--cc=kvm@vger.kernel.org \
--cc=kys@microsoft.com \
--cc=linux-acpi@vger.kernel.org \
--cc=linux-edac@vger.kernel.org \
--cc=linux-hwmon@vger.kernel.org \
--cc=linux-hyperv@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-perf-users@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=luto@kernel.org \
--cc=mark.rutland@arm.com \
--cc=mingo@redhat.com \
--cc=namhyung@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=peterz@infradead.org \
--cc=platform-driver-x86@vger.kernel.org \
--cc=seanjc@google.com \
--cc=tglx@linutronix.de \
--cc=tony.luck@intel.com \
--cc=virtualization@lists.linux.dev \
--cc=vkuznets@redhat.com \
--cc=wei.liu@kernel.org \
--cc=x86@kernel.org \
--cc=xen-devel@lists.xenproject.org \
--cc=xin@zytor.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).