public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Xin Li <xin@zytor.com>
To: "Jürgen Groß" <jgross@suse.com>,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	linux-perf-users@vger.kernel.org, linux-hyperv@vger.kernel.org,
	virtualization@lists.linux.dev, linux-pm@vger.kernel.org,
	linux-edac@vger.kernel.org, xen-devel@lists.xenproject.org,
	linux-acpi@vger.kernel.org, linux-hwmon@vger.kernel.org,
	netdev@vger.kernel.org, platform-driver-x86@vger.kernel.org
Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
	acme@kernel.org, andrew.cooper3@citrix.com, peterz@infradead.org,
	namhyung@kernel.org, mark.rutland@arm.com,
	alexander.shishkin@linux.intel.com, jolsa@kernel.org,
	irogers@google.com, adrian.hunter@intel.com,
	kan.liang@linux.intel.com, wei.liu@kernel.org,
	ajay.kaher@broadcom.com, bcm-kernel-feedback-list@broadcom.com,
	tony.luck@intel.com, pbonzini@redhat.com, vkuznets@redhat.com,
	seanjc@google.com, luto@kernel.org, boris.ostrovsky@oracle.com,
	kys@microsoft.com, haiyangz@microsoft.com, decui@microsoft.com
Subject: Re: [RFC PATCH v2 21/34] x86/msr: Utilize the alternatives mechanism to write MSR
Date: Wed, 23 Apr 2025 01:51:38 -0700	[thread overview]
Message-ID: <f7198308-e8f8-4cc5-b884-24bc5f408a2a@zytor.com> (raw)
In-Reply-To: <b2624e84-6fab-44a3-affc-ce0847cd3da4@suse.com>

On 4/22/2025 2:57 AM, Jürgen Groß wrote:
> On 22.04.25 10:22, Xin Li (Intel) wrote:
>> The story started from tglx's reply in [1]:
>>
>>    For actual performance relevant code the current PV ops mechanics
>>    are a horrorshow when the op defaults to the native instruction.
>>
>>    look at wrmsrl():
>>
>>    wrmsrl(msr, val
>>     wrmsr(msr, (u32)val, (u32)val >> 32))
>>      paravirt_write_msr(msr, low, high)
>>        PVOP_VCALL3(cpu.write_msr, msr, low, high)
>>
>>    Which results in
>>
>>     mov    $msr, %edi
>>     mov    $val, %rdx
>>     mov    %edx, %esi
>>     shr    $0x20, %rdx
>>     call    native_write_msr
>>
>>    and native_write_msr() does at minimum:
>>
>>     mov    %edi,%ecx
>>     mov    %esi,%eax
>>     wrmsr
>>     ret
>>
>>    In the worst case 'ret' is going through the return thunk. Not to
>>    talk about function prologues and whatever.
>>
>>    This becomes even more silly for trivial instructions like STI/CLI
>>    or in the worst case paravirt_nop().
> 
> This is nonsense.
> 
> In the non-Xen case the initial indirect call is directly replaced with
> STI/CLI via alternative patching, while for Xen it is replaced by a direct
> call.
> 
> The paravirt_nop() case is handled in alt_replace_call() by replacing the
> indirect call with a nop in case the target of the call was paravirt_nop()
> (which is in fact no_func()).
> 
>>
>>    The call makes only sense, when the native default is an actual
>>    function, but for the trivial cases it's a blatant engineering
>>    trainwreck.
> 
> The trivial cases are all handled as stated above: a direct replacement
> instruction is placed at the indirect call position.

The above comment was given in 2023 IIRC, and you have addressed it.

> 
>> Later a consensus was reached to utilize the alternatives mechanism to
>> eliminate the indirect call overhead introduced by the pv_ops APIs:
>>
>>      1) When built with !CONFIG_XEN_PV, X86_FEATURE_XENPV becomes a
>>         disabled feature, preventing the Xen code from being built
>>         and ensuring the native code is executed unconditionally.
> 
> This is the case today already. There is no need for any change to have
> this in place.
> 
>>
>>      2) When built with CONFIG_XEN_PV:
>>
>>         2.1) If not running on the Xen hypervisor (!X86_FEATURE_XENPV),
>>              the kernel runtime binary is patched to unconditionally
>>              jump to the native MSR write code.
>>
>>         2.2) If running on the Xen hypervisor (X86_FEATURE_XENPV), the
>>              kernel runtime binary is patched to unconditionally jump
>>              to the Xen MSR write code.
> 
> I can't see what is different here compared to today's state.
> 
>>
>> The alternatives mechanism is also used to choose the new immediate
>> form MSR write instruction when it's available.
> 
> Yes, this needs to be added.
> 
>> Consequently, remove the pv_ops MSR write APIs and the Xen callbacks.
> 
> I still don't see a major difference to today's solution.

The existing code generates:

     ...
     bf e0 06 00 00          mov    $0x6e0,%edi
     89 d6                   mov    %edx,%esi
     48 c1 ea 20             shr    $0x20,%rdx
     ff 15 07 48 8c 01       call   *0x18c4807(%rip)  # <pv_ops+0xb8>
     31 c0                   xor    %eax,%eax
     ...

And on native, the indirect call instruction is patched to a direct call
as you mentioned:

     ...
     bf e0 06 00 00          mov    $0x6e0,%edi
     89 d6                   mov    %edx,%esi
     48 c1 ea 20             shr    $0x20,%rdx
     e8 60 3e 01 00          call   <{native,xen}_write_msr> # direct
     90                      nop
     31 c0                   xor    %eax,%eax
     ...


This patch set generates assembly w/o CALL on native:

     ...
     e9 e6 22 c6 01          jmp    1f   # on native or nop on Xen
     b9 e0 06 00 00          mov    $0x6e0,%ecx
     e8 91 d4 fa ff          call   ffffffff8134ee80 <asm_xen_write_msr>
     e9 a4 9f eb 00          jmp    ffffffff8225b9a0 <__x86_return_thunk>
         ...
1:  b9 e0 06 00 00          mov    $0x6e0,%ecx   # immediate form here
     48 89 c2                mov    %rax,%rdx
     48 c1 ea 20             shr    $0x20,%rdx
     3e 0f 30                ds wrmsr
     ...

It's not a major change, but when it is patched to use the immediate 
form MSR write instruction, it's straightforwardly streamlined.

> 
> Only the "paravirt" term has been eliminated.

Yes.

But a PV guest doesn't operate at the highest privilege level, which
means MSR instructions typically result in a #GP fault.  I actually 
think the pv_ops MSR APIs are unnecessary because of this inherent
limitation.

Looking at the Xen MSR code, except PMU and just a few MSRs, it falls
back to executes native MSR instructions.  As MSR instructions trigger
#GP, Xen takes control and handles them in 2 ways:

   1) emulate (or ignore) a MSR operation and skip the guest instruction.

   2) inject the #GP back to guest OS and let its #GP handler handle it.
      But Linux MSR exception handler just ignores the MSR instruction
      (MCE MSR exception will panic).

So why not let Xen handle all the details which it already tries to do?
(Linux w/ such a change may not be able to run on old Xen hypervisors.)

BTW, if performance is a concern, writes to MSR_KERNEL_GS_BASE and
MSR_GS_BASE anyway are hpyercalls into Xen.

Thanks!
     Xin

  reply	other threads:[~2025-04-23  8:53 UTC|newest]

Thread overview: 94+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-22  8:21 [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support Xin Li (Intel)
2025-04-22  8:21 ` [RFC PATCH v2 01/34] x86/msr: Move rdtsc{,_ordered}() to <asm/tsc.h> Xin Li (Intel)
2025-04-23 14:13   ` Dave Hansen
2025-04-23 17:12     ` Xin Li
2025-04-22  8:21 ` [RFC PATCH v2 02/34] x86/msr: Remove rdpmc() Xin Li (Intel)
2025-04-23 14:23   ` Dave Hansen
2025-04-22  8:21 ` [RFC PATCH v2 03/34] x86/msr: Rename rdpmcl() to rdpmcq() Xin Li (Intel)
2025-04-23 14:24   ` Dave Hansen
2025-04-23 14:28   ` Sean Christopherson
2025-04-23 15:06     ` Dave Hansen
2025-04-23 17:23       ` Xin Li
2025-04-22  8:21 ` [RFC PATCH v2 04/34] x86/msr: Convert rdpmcq() into a function Xin Li (Intel)
2025-04-23 14:25   ` Dave Hansen
2025-04-22  8:21 ` [RFC PATCH v2 05/34] x86/msr: Return u64 consistently in Xen PMC read functions Xin Li (Intel)
2025-04-22  8:40   ` Jürgen Groß
2025-04-22  8:21 ` [RFC PATCH v2 06/34] x86/msr: Use the alternatives mechanism to read PMC Xin Li (Intel)
2025-04-22  8:38   ` Jürgen Groß
2025-04-22  9:12     ` Xin Li
2025-04-22  9:28       ` Juergen Gross
2025-04-23  7:40         ` Xin Li
2025-04-22  8:21 ` [RFC PATCH v2 07/34] x86/msr: Convert __wrmsr() uses to native_wrmsr{,q}() uses Xin Li (Intel)
2025-04-22  8:21 ` [RFC PATCH v2 08/34] x86/msr: Convert a native_wrmsr() use to native_wrmsrq() Xin Li (Intel)
2025-04-23 15:51   ` Dave Hansen
2025-04-23 17:27     ` Xin Li
2025-04-23 23:23     ` Xin Li
2025-04-22  8:21 ` [RFC PATCH v2 09/34] x86/msr: Add the native_rdmsrq() helper Xin Li (Intel)
2025-04-22  8:21 ` [RFC PATCH v2 10/34] x86/msr: Convert __rdmsr() uses to native_rdmsrq() uses Xin Li (Intel)
2025-04-22 15:09   ` Sean Christopherson
2025-04-23  9:27     ` Xin Li
2025-04-23 13:37       ` Sean Christopherson
2025-04-23 14:02       ` Dave Hansen
2025-04-22  8:21 ` [RFC PATCH v2 11/34] x86/msr: Remove calling native_{read,write}_msr{,_safe}() in pmu_msr_{read,write}() Xin Li (Intel)
2025-04-24  6:25   ` Mi, Dapeng
2025-04-24  7:16     ` Xin Li
2025-04-22  8:21 ` [RFC PATCH v2 12/34] x86/msr: Remove pmu_msr_{read,write}() Xin Li (Intel)
2025-04-24  6:33   ` Mi, Dapeng
2025-04-24  7:21     ` Xin Li
2025-04-24  7:43       ` Mi, Dapeng
2025-04-24  7:50         ` Xin Li
2025-04-24 10:05   ` Jürgen Groß
2025-04-24 17:49     ` Xin Li
2025-04-24 21:14       ` H. Peter Anvin
2025-04-24 22:24         ` Xin Li
2025-04-22  8:21 ` [RFC PATCH v2 13/34] x86/xen/msr: Remove the error pointer argument from set_reg() Xin Li (Intel)
2025-04-24 10:11   ` Jürgen Groß
2025-04-24 17:50     ` Xin Li
2025-04-22  8:21 ` [RFC PATCH v2 14/34] x86/msr: refactor pv_cpu_ops.write_msr{_safe}() Xin Li (Intel)
2025-04-24 10:16   ` Jürgen Groß
2025-04-22  8:21 ` [RFC PATCH v2 15/34] x86/msr: Replace wrmsr(msr, low, 0) with wrmsrq(msr, low) Xin Li (Intel)
2025-04-22  8:21 ` [RFC PATCH v2 16/34] x86/msr: Change function type of native_read_msr_safe() Xin Li (Intel)
2025-04-22  8:21 ` [RFC PATCH v2 17/34] x86/cpufeatures: Add a CPU feature bit for MSR immediate form instructions Xin Li (Intel)
2025-04-22  8:21 ` [RFC PATCH v2 18/34] x86/opcode: Add immediate form MSR instructions Xin Li (Intel)
2025-04-22  8:22 ` [RFC PATCH v2 19/34] x86/extable: Add support for " Xin Li (Intel)
2025-04-22  8:22 ` [RFC PATCH v2 20/34] x86/extable: Implement EX_TYPE_FUNC_REWIND Xin Li (Intel)
2025-04-22  8:22 ` [RFC PATCH v2 21/34] x86/msr: Utilize the alternatives mechanism to write MSR Xin Li (Intel)
2025-04-22  9:57   ` Jürgen Groß
2025-04-23  8:51     ` Xin Li [this message]
2025-04-23 16:05       ` Jürgen Groß
2025-04-24  8:06         ` Xin Li
2025-04-24  8:14           ` Jürgen Groß
2025-04-25  1:15             ` H. Peter Anvin
2025-04-25  3:44               ` H. Peter Anvin
2025-04-25  7:01                 ` Jürgen Groß
2025-04-25 15:28                   ` H. Peter Anvin
2025-04-25  6:51               ` Jürgen Groß
2025-04-25 12:33         ` Peter Zijlstra
2025-04-25 12:51           ` Jürgen Groß
2025-04-25 20:12             ` H. Peter Anvin
2025-04-25 15:29           ` H. Peter Anvin
2025-04-25  7:11     ` Peter Zijlstra
2025-04-22  8:22 ` [RFC PATCH v2 22/34] x86/msr: Utilize the alternatives mechanism to read MSR Xin Li (Intel)
2025-04-22  8:59   ` Jürgen Groß
2025-04-22  9:20     ` Xin Li
2025-04-22  9:57       ` Jürgen Groß
2025-04-22 11:12   ` Jürgen Groß
2025-04-23  9:03     ` Xin Li
2025-04-23 16:11       ` Jürgen Groß
2025-04-22  8:22 ` [RFC PATCH v2 23/34] x86/extable: Remove new dead code in ex_handler_msr() Xin Li (Intel)
2025-04-22  8:22 ` [RFC PATCH v2 24/34] x86/mce: Use native MSR API __native_{wr,rd}msrq() Xin Li (Intel)
2025-04-22  8:22 ` [RFC PATCH v2 25/34] x86/msr: Rename native_wrmsrq() to native_wrmsrq_no_trace() Xin Li (Intel)
2025-04-22  8:22 ` [RFC PATCH v2 26/34] x86/msr: Rename native_wrmsr() to native_wrmsr_no_trace() Xin Li (Intel)
2025-04-22  8:22 ` [RFC PATCH v2 27/34] x86/msr: Rename native_write_msr() to native_wrmsrq() Xin Li (Intel)
2025-04-22  8:22 ` [RFC PATCH v2 28/34] x86/msr: Rename native_write_msr_safe() to native_wrmsrq_safe() Xin Li (Intel)
2025-04-22  8:22 ` [RFC PATCH v2 29/34] x86/msr: Rename native_rdmsrq() to native_rdmsrq_no_trace() Xin Li (Intel)
2025-04-22  8:22 ` [RFC PATCH v2 30/34] x86/msr: Rename native_rdmsr() to native_rdmsr_no_trace() Xin Li (Intel)
2025-04-22  8:22 ` [RFC PATCH v2 31/34] x86/msr: Rename native_read_msr() to native_rdmsrq() Xin Li (Intel)
2025-04-22  8:22 ` [RFC PATCH v2 32/34] x86/msr: Rename native_read_msr_safe() to native_rdmsrq_safe() Xin Li (Intel)
2025-04-22  8:22 ` [RFC PATCH v2 33/34] x86/msr: Move the ARGS macros after the MSR read/write APIs Xin Li (Intel)
2025-04-22  8:22 ` [RFC PATCH v2 34/34] x86/msr: Convert native_rdmsr_no_trace() uses to native_rdmsrq_no_trace() uses Xin Li (Intel)
2025-04-22 15:03 ` [RFC PATCH v2 00/34] MSR refactor with new MSR instructions support Sean Christopherson
2025-04-22 17:51   ` Xin Li
2025-04-22 18:05     ` Luck, Tony
2025-04-22 19:44       ` Ingo Molnar
2025-04-22 19:51         ` Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f7198308-e8f8-4cc5-b884-24bc5f408a2a@zytor.com \
    --to=xin@zytor.com \
    --cc=acme@kernel.org \
    --cc=adrian.hunter@intel.com \
    --cc=ajay.kaher@broadcom.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=bcm-kernel-feedback-list@broadcom.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=decui@microsoft.com \
    --cc=haiyangz@microsoft.com \
    --cc=hpa@zytor.com \
    --cc=irogers@google.com \
    --cc=jgross@suse.com \
    --cc=jolsa@kernel.org \
    --cc=kan.liang@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=kys@microsoft.com \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-hwmon@vger.kernel.org \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=platform-driver-x86@vger.kernel.org \
    --cc=seanjc@google.com \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=virtualization@lists.linux.dev \
    --cc=vkuznets@redhat.com \
    --cc=wei.liu@kernel.org \
    --cc=x86@kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox