linux-arch.vger.kernel.org archive mirror
From: Mukesh R <mrathor@linux.microsoft.com>
To: Wei Liu <wei.liu@kernel.org>
Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-arch@vger.kernel.org, kys@microsoft.com,
	haiyangz@microsoft.com, decui@microsoft.com, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	x86@kernel.org, hpa@zytor.com, arnd@arndb.de
Subject: Re: [PATCH v2 4/6] x86/hyperv: Add trampoline asm code to transition from hypervisor
Date: Wed, 1 Oct 2025 14:07:52 -0700
Message-ID: <c74ad4a7-fcf0-6276-ee50-a3cb255e3f5b@linux.microsoft.com>
In-Reply-To: <20251001060002.GA603271@liuwe-devbox-debian-v2.local>

On 9/30/25 23:00, Wei Liu wrote:
> On Tue, Sep 23, 2025 at 02:46:07PM -0700, Mukesh Rathor wrote:
>> Introduce a small asm stub to transition from the hypervisor to Linux
>> after devirtualization. Devirtualization means disabling the hypervisor
>> on the fly, so once it is done, the code runs on the physical processor
>> instead of a virtual one and the hypervisor is gone. Only a root/dom0 VM
>> can do this.
> 
> I want to scrub "dom0" from comments and commit messages. We drew
> parallels to Xen when we first wrote this code, but it's not a useful
> term externally. "root" or "root partition" should be sufficient.
> 
>>
>> At a high level, during a panic of either the hypervisor or dom0 (aka
>> root), the NMI handler asks the hypervisor to devirtualize. One of the
>> arguments to that request is an entry point for returning to Linux. This
>> asm stub implements that entry point.
>>
>> The stub is entered in protected mode and uses a temporary GDT and page
>> table to enable long mode and reach a kernel entry point, which then
>> restores the full kernel context and resumes execution into kexec.
>>
>> Signed-off-by: Mukesh Rathor <mrathor@linux.microsoft.com>
>> ---
>>  arch/x86/hyperv/hv_trampoline.S | 101 ++++++++++++++++++++++++++++++++
>>  1 file changed, 101 insertions(+)
>>  create mode 100644 arch/x86/hyperv/hv_trampoline.S
>>
>> diff --git a/arch/x86/hyperv/hv_trampoline.S b/arch/x86/hyperv/hv_trampoline.S
>> new file mode 100644
>> index 000000000000..25f02ff12286
>> --- /dev/null
>> +++ b/arch/x86/hyperv/hv_trampoline.S
>> @@ -0,0 +1,101 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/*
>> + * X86 specific Hyper-V kdump/crash related code.
>> + *
>> + * Copyright (C) 2025, Microsoft, Inc.
>> + *
>> + */
>> +#include <linux/linkage.h>
>> +#include <asm/alternative.h>
>> +#include <asm/msr.h>
>> +#include <asm/processor-flags.h>
>> +#include <asm/nospec-branch.h>
>> +
>> +/*
>> + * void noreturn hv_crash_asm32(arg1)
>> + *    arg1 == edi == 32bit PA of struct hv_crash_tramp_data
>> + *
>> + * The hypervisor jumps here in protected mode upon devirtualization. This
>> + * code is copied to a page in the low 4G (the 32-bit address space) so it
>> + * can run in protected mode. Hence it cannot use any link-time addresses;
>> + * it restores long mode via a temporary GDT and page tables and eventually
>> + * jumps to the kernel entry point stored at HV_CRASHDATA_OFFS_C_entry.
>> + *
>> + * Preconditions (i.e., the hypervisor callback ABI):
>> + *  o CR0 is set to 0x0021: PE (protected mode) and NE are set, paging is disabled
>> + *  o CR4 is set to 0x0
>> + *  o IA32_EFER is set to 0x901 (SCE, LME and NXE are set)
>> + *  o EDI is set to the argument passed to HVCALL_DISABLE_HYP_EX.
>> + *  o CS, DS, ES, FS, GS are all initialized with a base of 0 and limit 0xFFFF
>> + *  o IDTR, TR and GDTR are initialized with a base of 0 and limit of 0xFFFF
>> + *  o LDTR is initialized as invalid (limit of 0)
>> + *  o MSR PAT is power on default.
>> + *  o Other state/registers are cleared. All TLBs flushed.
>> + */
>> +
>> +#define HV_CRASHDATA_OFFS_TRAMPCR3    0x0    /*  0 */
>> +#define HV_CRASHDATA_OFFS_KERNCR3     0x8    /*  8 */
>> +#define HV_CRASHDATA_OFFS_GDTRLIMIT  0x12    /* 18 */
>> +#define HV_CRASHDATA_OFFS_CS_JMPTGT  0x28    /* 40 */
>> +#define HV_CRASHDATA_OFFS_C_entry    0x30    /* 48 */
>> +
>> +	.text
>> +	.code32
>> +
> 
> I recently learned that instrumentation may be problematic for context
> switching code. I have not studied this code and noinstr usage in tree
> extensively so cannot make a judgement here.
> 
> It is worth checking out the recent discussion on the VTL transition
> code.
> 
> https://lore.kernel.org/linux-hyperv/27e50bb7-7f0e-48fb-bdbc-6c6d606e7113@redhat.com/
> 
> And check out the in-tree document Documentation/core-api/entry.rst.

Thanks. We should be OK here because this code is copied to a separate
page below 4G for the protected-mode transfer and is executed from there,
not from the section it is linked in. For example,

arch/x86/kernel/relocate_kernel_64.S

does not use .noinstr either.
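
To show why link-time addresses are unusable once the stub runs from the
copied page, here is a small user-space sketch in C. It is only a model
under stated assumptions: struct tramp_labels, copy_trampoline() and the
example addresses are hypothetical, not the code from patch 5/6. The only
thing that survives the copy is the distance from hv_crash_asm32 to
hv_crash_asm64, so the far-jump target has to be computed at runtime from
the physical address of the low page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TRAMP_PAGE_SIZE 4096u

/*
 * Hypothetical model: offsets of the three labels from hv_trampoline.S
 * inside a fake kernel text buffer. In the kernel these would be the
 * link-time addresses of hv_crash_asm32, hv_crash_asm64 and
 * hv_crash_asm_end.
 */
struct tramp_labels {
	size_t asm32;
	size_t asm64;
	size_t asm_end;
};

/*
 * Copy the trampoline bytes [asm32, asm_end) to a page below 4G and
 * return the physical address the far jump must target to reach
 * hv_crash_asm64 in the copy. The link-time address of hv_crash_asm64
 * is meaningless there; only its offset from hv_crash_asm32 is kept.
 */
static uint32_t copy_trampoline(const uint8_t *text, struct tramp_labels l,
				uint8_t *low_page, uint32_t low_page_pa)
{
	size_t len = l.asm_end - l.asm32;

	assert(l.asm32 <= l.asm64 && l.asm64 < l.asm_end);
	assert(len <= TRAMP_PAGE_SIZE);	/* the stub must fit in one page */
	memcpy(low_page, text + l.asm32, len);
	return low_page_pa + (uint32_t)(l.asm64 - l.asm32);
}
```

For example, with the stub at offset 0x40 in the text, hv_crash_asm64 at
0x80 and a low page at physical address 0x9000, the returned jump target
is 0x9040.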

Thanks,
-Mukesh


> Wei
> 
>> +SYM_CODE_START(hv_crash_asm32)
>> +	UNWIND_HINT_UNDEFINED
>> +	ENDBR
>> +	movl	$X86_CR4_PAE, %ecx
>> +	movl	%ecx, %cr4
>> +
>> +	movl %edi, %ebx
>> +	add $HV_CRASHDATA_OFFS_TRAMPCR3, %ebx
>> +	movl %cs:(%ebx), %eax
>> +	movl %eax, %cr3
>> +
>> +	/* Setup EFER for long mode now */
>> +	movl	$MSR_EFER, %ecx
>> +	rdmsr
>> +	btsl	$_EFER_LME, %eax
>> +	wrmsr
>> +
>> +	/* Turn paging on using the temp 32bit trampoline page table */
>> +	movl %cr0, %eax
>> +	orl $(X86_CR0_PG), %eax
>> +	movl %eax, %cr0
>> +
>> +	/* Since the kernel CR3 could be above 4G, we must run 64-bit code
>> +	 * before we can load all 64 bits of it. Use a temporary GDT with a
>> +	 * CS descriptor that has CS.L=1 and CS.D=0 for that. */
>> +	mov %edi, %eax
>> +	add $HV_CRASHDATA_OFFS_GDTRLIMIT, %eax
>> +	lgdtl %cs:(%eax)
>> +
>> +	/* Not done yet: reload CS via a far jump to switch to CS.L=1 */
>> +	mov %edi, %eax
>> +	add $HV_CRASHDATA_OFFS_CS_JMPTGT, %eax
>> +	ljmp %cs:*(%eax)
>> +SYM_CODE_END(hv_crash_asm32)
>> +
>> +	/* We now run in 64-bit mode of IA-32e (long) mode: CS.L=1, CS.D=0 */
>> +	.code64
>> +	.balign 8
>> +SYM_CODE_START(hv_crash_asm64)
>> +	UNWIND_HINT_UNDEFINED
>> +	ENDBR
>> +	/* restore kernel page tables so we can jump to kernel code */
>> +	mov %edi, %eax
>> +	add $HV_CRASHDATA_OFFS_KERNCR3, %eax
>> +	movq %cs:(%eax), %rbx
>> +	movq %rbx, %cr3
>> +
>> +	mov %edi, %eax
>> +	add $HV_CRASHDATA_OFFS_C_entry, %eax
>> +	movq %cs:(%eax), %rbx
>> +	ANNOTATE_RETPOLINE_SAFE
>> +	jmp *%rbx
>> +
>> +	int $3
>> +
>> +SYM_INNER_LABEL(hv_crash_asm_end, SYM_L_GLOBAL)
>> +SYM_CODE_END(hv_crash_asm64)
>> -- 
>> 2.36.1.vfs.0.0
>>
>>
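
The CR0 and IA32_EFER values in the stub's precondition list can be
checked against the architectural bit positions from the Intel SDM. The C
sketch below decodes them; decode_bits() and the bit tables are
illustrative helpers, not kernel code. It shows that 0x21 is PE and NE,
and 0x901 is SCE, LME and NXE. If the documented value is accurate, LME
is already set on entry, and the btsl of _EFER_LME in the stub acts as a
harmless safeguard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Architectural bit positions (Intel SDM / AMD APM). */
#define CR0_PE   (1u << 0)
#define CR0_NE   (1u << 5)
#define CR0_PG   (1u << 31)
#define EFER_SCE (1u << 0)
#define EFER_LME (1u << 8)
#define EFER_LMA (1u << 10)
#define EFER_NXE (1u << 11)

struct bit_name {
	uint32_t mask;
	const char *name;
};

static const struct bit_name efer_bits[] = {
	{ EFER_SCE, "SCE" }, { EFER_LME, "LME" },
	{ EFER_LMA, "LMA" }, { EFER_NXE, "NXE" },
};

static const struct bit_name cr0_bits[] = {
	{ CR0_PE, "PE" }, { CR0_NE, "NE" }, { CR0_PG, "PG" },
};

/* Write the names of the bits set in v, space separated, into buf. */
static void decode_bits(uint32_t v, const struct bit_name *tbl, size_t n,
			char *buf, size_t len)
{
	assert(len > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		if (!(v & tbl[i].mask))
			continue;
		if (buf[0] != '\0')
			strncat(buf, " ", len - strlen(buf) - 1);
		strncat(buf, tbl[i].name, len - strlen(buf) - 1);
	}
}
```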

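The hard-coded HV_CRASHDATA_OFFS_* values in hv_trampoline.S must stay in
sync with the C-side struct hv_crash_tramp_data named in the stub's header
comment. One way to guard against drift is a compile-time layout check.
The structure below is a hypothetical layout that is consistent with the
offsets: a 2-byte pad so the 6-byte limit/base pair read by lgdtl starts
at 0x12, then a two-entry GDT, then the m16:32 far pointer (offset first,
then selector) consumed by ljmp. The field names are invented and may not
match the real definition.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stddef.h>	/* offsetof */
#include <stdint.h>

/* Offsets as hard-coded in hv_trampoline.S. */
#define HV_CRASHDATA_OFFS_TRAMPCR3    0x0
#define HV_CRASHDATA_OFFS_KERNCR3     0x8
#define HV_CRASHDATA_OFFS_GDTRLIMIT  0x12
#define HV_CRASHDATA_OFFS_CS_JMPTGT  0x28
#define HV_CRASHDATA_OFFS_C_entry    0x30

/* Hypothetical layout, for illustration only. */
struct demo_tramp_data {
	uint64_t tramp32_cr3;	/* 0x00: 32-bit trampoline page table */
	uint64_t kernel_cr3;	/* 0x08: kernel page table */
	uint16_t gdtr_pad;	/* 0x10: places the GDTR limit at 0x12 */
	uint16_t gdtr_limit;	/* 0x12: lgdtl reads limit ... */
	uint32_t gdtr_base;	/* 0x14: ... then a 32-bit base */
	uint64_t gdt[2];	/* 0x18: null descriptor + 64-bit CS */
	uint32_t jmp_addr;	/* 0x28: far pointer offset */
	uint16_t jmp_cs;	/* 0x2c: far pointer selector */
	uint16_t jmp_rsvd;	/* 0x2e */
	uint64_t c_entry;	/* 0x30: kernel C entry point */
} __attribute__((packed));

static_assert(offsetof(struct demo_tramp_data, tramp32_cr3) ==
	      HV_CRASHDATA_OFFS_TRAMPCR3, "TRAMPCR3 offset drifted");
static_assert(offsetof(struct demo_tramp_data, kernel_cr3) ==
	      HV_CRASHDATA_OFFS_KERNCR3, "KERNCR3 offset drifted");
static_assert(offsetof(struct demo_tramp_data, gdtr_limit) ==
	      HV_CRASHDATA_OFFS_GDTRLIMIT, "GDTRLIMIT offset drifted");
static_assert(offsetof(struct demo_tramp_data, jmp_addr) ==
	      HV_CRASHDATA_OFFS_CS_JMPTGT, "CS_JMPTGT offset drifted");
static_assert(offsetof(struct demo_tramp_data, c_entry) ==
	      HV_CRASHDATA_OFFS_C_entry, "C_entry offset drifted");
```

In kernel code the same idea is usually expressed with BUILD_BUG_ON(), or
by generating the offsets through asm-offsets.c instead of maintaining the
#defines by hand.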

Thread overview: 12+ messages
2025-09-23 21:46 [PATCH v2 0/6] Hyper-V: Implement hypervisor core collection Mukesh Rathor
2025-09-23 21:46 ` [PATCH v2 1/6] x86/hyperv: Rename guest crash shutdown function Mukesh Rathor
2025-09-23 21:46 ` [PATCH v2 2/6] hyperv: Add two new hypercall numbers to guest ABI public header Mukesh Rathor
2025-09-23 21:46 ` [PATCH v2 3/6] hyperv: Add definitions for hypervisor crash dump support Mukesh Rathor
2025-09-23 21:46 ` [PATCH v2 4/6] x86/hyperv: Add trampoline asm code to transition from hypervisor Mukesh Rathor
2025-10-01  6:00   ` Wei Liu
2025-10-01 21:07     ` Mukesh R [this message]
2025-09-23 21:46 ` [PATCH v2 5/6] x86/hyperv: Implement hypervisor RAM collection into vmcore Mukesh Rathor
2025-10-02 21:42   ` Wei Liu
2025-10-02 22:07     ` Mukesh R
2025-09-23 21:46 ` [PATCH v2 6/6] x86/hyperv: Enable build of hypervisor crashdump collection files Mukesh Rathor
2025-09-24 17:07   ` kernel test robot
