Re: [PATCH 1/2] x86/hyperv: Fix hyperv_pcpu_input_arg handling when CPUs go online/offline

From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: Michael Kelley <mikelley@microsoft.com>
Cc: mikelley@microsoft.com, kys@microsoft.com,
	haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com,
	catalin.marinas@arm.com, will@kernel.org, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	hpa@zytor.com, linux-kernel@vger.kernel.org,
	linux-hyperv@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, x86@kernel.org,
	kvm@vger.kernel.org
Subject: Re: [PATCH 1/2] x86/hyperv: Fix hyperv_pcpu_input_arg handling when CPUs go online/offline
Date: Mon, 22 May 2023 10:56:05 +0200	[thread overview]
Message-ID: <87o7mczqvu.fsf@redhat.com> (raw)
In-Reply-To: <1684506832-41392-1-git-send-email-mikelley@microsoft.com>

Michael Kelley <mikelley@microsoft.com> writes:

> These commits
>
> a494aef23dfc ("PCI: hv: Replace retarget_msi_interrupt_params with hyperv_pcpu_input_arg")
> 2c6ba4216844 ("PCI: hv: Enable PCI pass-thru devices in Confidential VMs")
>
> update the Hyper-V virtual PCI driver to use the hyperv_pcpu_input_arg
> because that memory will be correctly marked as decrypted or encrypted
> for all VM types (CoCo or normal). But problems ensue when CPUs in the
> VM go online or offline after virtual PCI devices have been configured.
>
> When a CPU is brought online, the hyperv_pcpu_input_arg for that CPU is
> initialized by hv_cpu_init() running under state CPUHP_AP_ONLINE_DYN.
> But this state occurs after state CPUHP_AP_IRQ_AFFINITY_ONLINE, which
> may call the virtual PCI driver and fault trying to use the as yet
> uninitialized hyperv_pcpu_input_arg. A similar problem occurs in a CoCo
> VM if the MMIO read and write hypercalls are used from state
> CPUHP_AP_IRQ_AFFINITY_ONLINE.
>
> When a CPU is taken offline, IRQs may be reassigned in state
> CPUHP_TEARDOWN_CPU. Again, the virtual PCI driver may fault trying to
> use the hyperv_pcpu_input_arg that has already been freed by a
> higher state.
>
> Fix the onlining problem by adding state CPUHP_AP_HYPERV_ONLINE
> immediately after CPUHP_AP_ONLINE_IDLE (similar to CPUHP_AP_KVM_ONLINE)
> and before CPUHP_AP_IRQ_AFFINITY_ONLINE. Use this new state for
> Hyper-V initialization so that hyperv_pcpu_input_arg is allocated
> early enough.
>
> Fix the offlining problem by not freeing hyperv_pcpu_input_arg when
> a CPU goes offline. Retain the allocated memory, and reuse it if
> the CPU comes back online later.
>
> Signed-off-by: Michael Kelley <mikelley@microsoft.com>
> ---
>  arch/x86/hyperv/hv_init.c  |  2 +-
>  drivers/hv/hv_common.c     | 48 +++++++++++++++++++++++-----------------------
>  include/linux/cpuhotplug.h |  1 +
>  3 files changed, 26 insertions(+), 25 deletions(-)
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index a5f9474..6c04b52 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -416,7 +416,7 @@ void __init hyperv_init(void)
>  			goto free_vp_assist_page;
>  	}
>  
> -	cpuhp = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/hyperv_init:online",
> +	cpuhp = cpuhp_setup_state(CPUHP_AP_HYPERV_ONLINE, "x86/hyperv_init:online",
>  				  hv_cpu_init, hv_cpu_die);
>  	if (cpuhp < 0)
>  		goto free_ghcb_page;
> diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> index 64f9cec..4c5cee4 100644
> --- a/drivers/hv/hv_common.c
> +++ b/drivers/hv/hv_common.c
> @@ -364,13 +364,20 @@ int hv_common_cpu_init(unsigned int cpu)
>  	flags = irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL;
>  
>  	inputarg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
> -	*inputarg = kmalloc(pgcount * HV_HYP_PAGE_SIZE, flags);
> -	if (!(*inputarg))
> -		return -ENOMEM;
>  
> -	if (hv_root_partition) {
> -		outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
> -		*outputarg = (char *)(*inputarg) + HV_HYP_PAGE_SIZE;
> +	/*
> +	 * hyperv_pcpu_input_arg and hyperv_pcpu_output_arg memory is already
> +	 * allocated if this CPU was previously online and then taken offline
> +	 */
> +	if (!*inputarg) {
> +		*inputarg = kmalloc(pgcount * HV_HYP_PAGE_SIZE, flags);
> +		if (!(*inputarg))
> +			return -ENOMEM;
> +
> +		if (hv_root_partition) {
> +			outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
> +			*outputarg = (char *)(*inputarg) + HV_HYP_PAGE_SIZE;
> +		}
>  	}
>  
>  	msr_vp_index = hv_get_register(HV_REGISTER_VP_INDEX);
> @@ -385,24 +392,17 @@ int hv_common_cpu_init(unsigned int cpu)
>  
>  int hv_common_cpu_die(unsigned int cpu)
>  {
> -	unsigned long flags;
> -	void **inputarg, **outputarg;
> -	void *mem;
> -
> -	local_irq_save(flags);
> -
> -	inputarg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
> -	mem = *inputarg;
> -	*inputarg = NULL;
> -
> -	if (hv_root_partition) {
> -		outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
> -		*outputarg = NULL;
> -	}
> -
> -	local_irq_restore(flags);
> -
> -	kfree(mem);
> +	/*
> +	 * The hyperv_pcpu_input_arg and hyperv_pcpu_output_arg memory
> +	 * is not freed when the CPU goes offline as the hyperv_pcpu_input_arg
> +	 * may be used by the Hyper-V vPCI driver in reassigning interrupts
> +	 * as part of the offlining process.  The interrupt reassignment
> +	 * happens *after* the CPUHP_AP_HYPERV_ONLINE state has run and
> +	 * called this function.
> +	 *
> +	 * If a previously offlined CPU is brought back online again, the
> +	 * originally allocated memory is reused in hv_common_cpu_init().
> +	 */
>  
>  	return 0;
>  }
> diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
> index 0f1001d..696004a 100644
> --- a/include/linux/cpuhotplug.h
> +++ b/include/linux/cpuhotplug.h
> @@ -201,6 +201,7 @@ enum cpuhp_state {
>  	/* Online section invoked on the hotplugged CPU from the hotplug thread */
>  	CPUHP_AP_ONLINE_IDLE,
>  	CPUHP_AP_KVM_ONLINE,
> +	CPUHP_AP_HYPERV_ONLINE,

(Cc: KVM)

I would very much prefer we swap the order with KVM here: hv_cpu_init()
allocates and sets vCPU's VP assist page which is used by KVM on
CPUHP_AP_KVM_ONLINE:

kvm_online_cpu() -> __hardware_enable_nolock() ->
kvm_arch_hardware_enable() -> vmx_hardware_enable():

        /*
         * This can happen if we hot-added a CPU but failed to allocate
         * VP assist page for it.
         */
	if (kvm_is_using_evmcs() && !hv_get_vp_assist_page(cpu))
		return -EFAULT;

With the change, this is likely broken.

FWIF, KVM also needs current vCPU's VP index (also set by hv_cpu_init())
through __kvm_x86_vendor_init() -> set_hv_tscchange_cb() call chain but
this happens upon KVM module load so CPU hotplug ordering should not
matter as this always happens on a CPU which is already online.

Generally, as 'KVM on Hyper-V' is a supported scenario, doing Hyper-V
init before KVM's (and vice versa on teardown) makes sense.

>  	CPUHP_AP_SCHED_WAIT_EMPTY,
>  	CPUHP_AP_SMPBOOT_THREADS,
>  	CPUHP_AP_X86_VDSO_VMA_ONLINE,

-- 
Vitaly

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel