From: andrew.cooper3@citrix.com
To: Thomas Gleixner <tglx@linutronix.de>, Xin Li <xin3.li@intel.com>,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-edac@vger.kernel.org, linux-hyperv@vger.kernel.org,
kvm@vger.kernel.org, xen-devel@lists.xenproject.org
Cc: mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
x86@kernel.org, hpa@zytor.com, luto@kernel.org,
pbonzini@redhat.com, seanjc@google.com, peterz@infradead.org,
jgross@suse.com, ravi.v.shankar@intel.com, mhiramat@kernel.org,
jiangshanlai@gmail.com
Subject: Re: [PATCH v10 03/38] x86/msr: Add the WRMSRNS instruction support
Date: Fri, 15 Sep 2023 00:46:30 +0100 [thread overview]
Message-ID: <7ba4ae3e-f75d-66a8-7669-b6eb17c1aa1c@citrix.com> (raw)
In-Reply-To: <87y1h81ht4.ffs@tglx>
On 15/09/2023 12:00 am, Thomas Gleixner wrote:
>> and despite what Juergen said, I'm going to recommend that you do wire
>> this through the paravirt infrastructure, for the benefit of regular
>> users having a nice API, not because XenPV is expecting to do something
>> wildly different here.
> I fundamentaly hate adding this to the PV infrastructure. We don't want
> more PV ops, quite the contrary.
What I meant was "there should be the two top-level APIs, and under the
covers they DTRT". Part of doing the right thing is to wire up paravirt
for configs where that is specified.
Anything else is going to force people to write logic of the form:
if (WRMSRNS && !XENPV)
wrmsr_ns(...)
else
wrmsr(...)
which is going to be worse overall. And there really is one example of
this antipattern already in the series.
> For the initial use case at hand, there is an explicit FRED dependency
> and the code in question really wants to use WRMSRNS directly and not
> through a PV function call.
>
> I agree with your reasoning for the more generic use case where we can
> gain performance independent of FRED by using WRMSRNS for cases where
> the write has no serialization requirements.
>
> But this made me look into PV ops some more. For actual performance
> relevant code the current PV ops mechanics are a horrorshow when the op
> defaults to the native instruction.
>
> Let's look at wrmsrl():
>
> wrmsrl(msr, val
> wrmsr(msr, (u32)val, (u32)val >> 32))
> paravirt_write_msr(msr, low, high)
> PVOP_VCALL3(cpu.write_msr, msr, low, high)
>
> Which results in
>
> mov $msr, %edi
> mov $val, %rdx
> mov %edx, %esi
> shr $0x20, %rdx
> call native_write_msr
>
> and native_write_msr() does at minimum:
>
> mov %edi,%ecx
> mov %esi,%eax
> wrmsr
> ret
Yeah, this is daft. But it can also be fixed irrespective of WRMSRNS.
WRMSR has one complexity that most other PV-ops don't, and that's the
exception table reference for the instruction itself.
In a theoretical future ought to look like:
mov $msr, %ecx
mov $lo, %eax
mov $hi, %edx
1: {call paravirt_blah(%rip) | cs...cs wrmsr | cs...cs wrmsrns }
_ASM_EXTABLE(1b, ...)
In paravirt builds, the CALL needs to be the emitted form, because it
needs to function in very early boot.
But once the paravirt-ness has been chosen and alternatives run, the
as-native paths are fully inline.
The alternative which processes this site wants to conclude that, in the
case it does not alter from the CALL, to clobber the EXTABLE reference.
CALL instructions can #GP, and you don't want to end up thinking you're
handling a WRMSR #GP when in fact it was a non-canonical function pointer.
> In the worst case 'ret' is going through the return thunk. Not to talk
> about function prologues and whatever.
>
> This becomes even more silly for trivial instructions like STI/CLI or in
> the worst case paravirt_nop().
STI/CLI are already magic. Are they not inlined?
> The call makes only sense, when the native default is an actual
> function, but for the trivial cases it's a blatant engineering
> trainwreck.
>
> I wouldn't care at all if CONFIG_PARAVIRT_XXL would be the esoteric use
> case, but AFAICT it's default enabled on all major distros.
>
> So no. I'm fundamentally disagreeing with your recommendation. The way
> forward is:
>
> 1) Provide the native variant for wrmsrns(), i.e. rename the proposed
> wrmsrns() to native_wrmsr_ns() and have the X86_FEATURE_WRMSRNS
> safety net as you pointed out.
>
> That function can be used in code which is guaranteed to be not
> affected by the PV_XXL madness.
>
> 2) Come up with a sensible solution for the PV_XXL horrorshow
>
> 3) Implement a sane general variant of wrmsr_ns() which handles
> both X86_FEATURE_WRMSRNS and X86_MISFEATURE_PV_XXL
>
> 4) Convert other code which benefits from the non-serializing variant
> to wrmsr_ns()
Well - point 1 is mostly work that needs reverting as part of completing
point 3, and point 2 clearly needs doing irrespective of anything else.
Thanks,
~Andrew
next prev parent reply other threads:[~2023-09-14 23:46 UTC|newest]
Thread overview: 88+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-09-14 4:47 [PATCH v10 00/38] x86: enable FRED for x86-64 Xin Li
2023-09-14 4:47 ` [PATCH v10 01/38] x86/cpufeatures: Add the cpu feature bit for WRMSRNS Xin Li
2023-09-14 4:47 ` [PATCH v10 02/38] x86/opcode: Add the WRMSRNS instruction to the x86 opcode map Xin Li
2023-09-15 5:47 ` Masami Hiramatsu
2023-09-14 4:47 ` [PATCH v10 03/38] x86/msr: Add the WRMSRNS instruction support Xin Li
2023-09-14 6:02 ` Juergen Gross
2023-09-14 13:01 ` andrew.cooper3
2023-09-14 14:05 ` andrew.cooper3
2023-09-14 23:00 ` Thomas Gleixner
2023-09-14 23:34 ` H. Peter Anvin
2023-09-14 23:46 ` andrew.cooper3 [this message]
2023-09-15 0:12 ` Thomas Gleixner
2023-09-15 0:33 ` andrew.cooper3
2023-09-15 0:38 ` H. Peter Anvin
2023-09-15 1:46 ` andrew.cooper3
2023-09-15 2:06 ` H. Peter Anvin
2023-09-15 0:42 ` Thomas Gleixner
2023-09-15 1:01 ` H. Peter Anvin
2023-09-15 1:16 ` andrew.cooper3
2023-09-15 5:32 ` Juergen Gross
2023-09-20 15:00 ` Peter Zijlstra
2023-09-20 15:04 ` Juergen Gross
2023-09-20 7:58 ` Nikolay Borisov
2023-09-20 8:18 ` Li, Xin3
2023-09-22 8:16 ` Li, Xin3
2023-09-22 15:00 ` Thomas Gleixner
2023-09-22 23:21 ` Li, Xin3
2023-09-14 4:47 ` [PATCH v10 04/38] x86/entry: Remove idtentry_sysvec from entry_{32,64}.S Xin Li
2023-09-14 4:47 ` [PATCH v10 05/38] x86/trapnr: Add event type macros to <asm/trapnr.h> Xin Li
2023-09-14 14:22 ` andrew.cooper3
2023-09-14 4:47 ` [PATCH v10 06/38] Documentation/x86/64: Add a documentation for FRED Xin Li
2023-09-20 9:44 ` Nikolay Borisov
2023-09-14 4:47 ` [PATCH v10 07/38] x86/fred: Add Kconfig option for FRED (CONFIG_X86_FRED) Xin Li
2023-09-14 4:47 ` [PATCH v10 08/38] x86/cpufeatures: Add the cpu feature bit for FRED Xin Li
2023-09-14 6:03 ` Juergen Gross
2023-09-14 6:09 ` Jan Beulich
2023-09-14 13:15 ` andrew.cooper3
2023-09-15 1:07 ` Thomas Gleixner
2023-09-15 5:27 ` Juergen Gross
2023-09-14 4:47 ` [PATCH v10 09/38] x86/fred: Disable FRED support if CONFIG_X86_FRED is disabled Xin Li
2023-09-20 10:19 ` Nikolay Borisov
2023-09-14 4:47 ` [PATCH v10 10/38] x86/fred: Disable FRED by default in its early stage Xin Li
2023-09-14 4:47 ` [PATCH v10 11/38] x86/opcode: Add ERET[US] instructions to the x86 opcode map Xin Li
2023-09-14 4:47 ` [PATCH v10 12/38] x86/objtool: Teach objtool about ERET[US] Xin Li
2023-09-14 4:47 ` [PATCH v10 13/38] x86/cpu: Add X86_CR4_FRED macro Xin Li
2023-09-20 10:50 ` Nikolay Borisov
2023-09-20 17:25 ` Li, Xin3
2023-09-14 4:47 ` [PATCH v10 14/38] x86/cpu: Add MSR numbers for FRED configuration Xin Li
2023-09-14 4:47 ` [PATCH v10 15/38] x86/ptrace: Cleanup the definition of the pt_regs structure Xin Li
2023-09-14 4:47 ` [PATCH v10 16/38] x86/ptrace: Add FRED additional information to " Xin Li
2023-09-20 12:57 ` Nikolay Borisov
2023-09-20 17:23 ` Li, Xin3
2023-09-21 6:07 ` Nikolay Borisov
2023-09-21 6:24 ` Li, Xin3
2023-09-14 4:47 ` [PATCH v10 17/38] x86/fred: Add a new header file for FRED definitions Xin Li
2023-09-14 4:47 ` [PATCH v10 18/38] x86/fred: Reserve space for the FRED stack frame Xin Li
2023-09-14 4:47 ` [PATCH v10 19/38] x86/fred: Update MSR_IA32_FRED_RSP0 during task switch Xin Li
2023-09-14 4:47 ` [PATCH v10 20/38] x86/fred: Disallow the swapgs instruction when FRED is enabled Xin Li
2023-09-14 4:47 ` [PATCH v10 21/38] x86/fred: No ESPFIX needed " Xin Li
2023-09-14 4:47 ` [PATCH v10 22/38] x86/fred: Allow single-step trap and NMI when starting a new task Xin Li
2023-09-14 4:47 ` [PATCH v10 23/38] x86/fred: Make exc_page_fault() work for FRED Xin Li
2023-09-14 4:47 ` [PATCH v10 24/38] x86/idtentry: Incorporate definitions/declarations of the FRED entries Xin Li
2023-09-14 4:47 ` [PATCH v10 25/38] x86/fred: Add a debug fault entry stub for FRED Xin Li
2023-09-14 4:47 ` [PATCH v10 26/38] x86/fred: Add a NMI " Xin Li
2023-09-14 4:47 ` [PATCH v10 27/38] x86/fred: Add a machine check " Xin Li
2023-09-14 4:47 ` [PATCH v10 28/38] x86/fred: FRED entry/exit and dispatch code Xin Li
2023-09-21 9:48 ` Nikolay Borisov
2023-09-21 10:08 ` Thomas Gleixner
2023-09-21 17:54 ` Li, Xin3
2023-09-14 4:47 ` [PATCH v10 29/38] x86/traps: Add sysvec_install() to install a system interrupt handler Xin Li
2023-09-14 4:47 ` [PATCH v10 30/38] x86/fred: Let ret_from_fork_asm() jmp to asm_fred_exit_user when FRED is enabled Xin Li
2023-09-14 4:47 ` [PATCH v10 31/38] x86/fred: Fixup fault on ERETU by jumping to fred_entrypoint_user Xin Li
2023-09-14 4:47 ` [PATCH v10 32/38] x86/entry/calling: Allow PUSH_AND_CLEAR_REGS being used beyond actual entry code Xin Li
2023-09-14 4:48 ` [PATCH v10 33/38] x86/entry: Add fred_entry_from_kvm() for VMX to handle IRQ/NMI Xin Li
2023-09-20 17:54 ` Paolo Bonzini
2023-09-20 23:10 ` Li, Xin3
2023-09-21 12:11 ` Nikolay Borisov
2023-09-21 12:38 ` Paolo Bonzini
2023-09-14 4:48 ` [PATCH v10 34/38] KVM: VMX: Call fred_entry_from_kvm() for IRQ/NMI handling Xin Li
2023-09-20 17:54 ` Paolo Bonzini
2023-09-14 4:48 ` [PATCH v10 35/38] x86/syscall: Split IDT syscall setup code into idt_syscall_init() Xin Li
2023-09-14 4:48 ` [PATCH v10 36/38] x86/fred: Add fred_syscall_init() Xin Li
2023-09-19 8:28 ` Thomas Gleixner
2023-09-20 4:33 ` Li, Xin3
2023-09-20 8:18 ` Thomas Gleixner
2023-09-21 2:24 ` H. Peter Anvin
2023-09-14 4:48 ` [PATCH v10 37/38] x86/fred: Add FRED initialization functions Xin Li
2023-09-14 4:48 ` [PATCH v10 38/38] x86/fred: Invoke FRED initialization code to enable FRED Xin Li
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=7ba4ae3e-f75d-66a8-7669-b6eb17c1aa1c@citrix.com \
--to=andrew.cooper3@citrix.com \
--cc=bp@alien8.de \
--cc=dave.hansen@linux.intel.com \
--cc=hpa@zytor.com \
--cc=jgross@suse.com \
--cc=jiangshanlai@gmail.com \
--cc=kvm@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-edac@vger.kernel.org \
--cc=linux-hyperv@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=luto@kernel.org \
--cc=mhiramat@kernel.org \
--cc=mingo@redhat.com \
--cc=pbonzini@redhat.com \
--cc=peterz@infradead.org \
--cc=ravi.v.shankar@intel.com \
--cc=seanjc@google.com \
--cc=tglx@linutronix.de \
--cc=x86@kernel.org \
--cc=xen-devel@lists.xenproject.org \
--cc=xin3.li@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).