linux-hyperv.vger.kernel.org archive mirror
* [PATCH] x86/hyperv: Restore VP assist page after cpu offlining/onlining
@ 2022-11-03 19:06 Vitaly Kuznetsov
  2022-11-04 16:23 ` Michael Kelley (LINUX)
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Vitaly Kuznetsov @ 2022-11-03 19:06 UTC (permalink / raw)
  To: linux-hyperv
  Cc: K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Michael Kelley, linux-kernel, x86

Commit e5d9b714fe40 ("x86/hyperv: fix root partition faults when writing
to VP assist page MSR") moved 'wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE)' under
the 'if (*hvp)' condition. This works for the root partition, as
hv_cpu_die() does memunmap() and sets 'hv_vp_assist_page[cpu]' to NULL,
but breaks non-root partitions, as hv_cpu_die() doesn't free
'hv_vp_assist_page[cpu]' for them. This causes the VP assist page to
remain unset after a CPU offline/online cycle:

$ rdmsr -p 24 0x40000073
  10212f001
$ echo 0 > /sys/devices/system/cpu/cpu24/online
$ echo 1 > /sys/devices/system/cpu/cpu24/online
$ rdmsr -p 24 0x40000073
  0

Fix the issue by always writing to HV_X64_MSR_VP_ASSIST_PAGE in
hv_cpu_init(). Note: checking 'if (!*hvp)' for the root partition is
pointless, as hv_cpu_die() always sets 'hv_vp_assist_page[cpu]' to
NULL (and it's also NULL initially).
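To make the failure mode concrete, here is a userspace C model of the
non-root path (all names, addresses, and values are stand-ins for
illustration only, not kernel code): once the allocation survives
hv_cpu_die(), the old code's '!*hvp' check also skips re-enabling the MSR.

```c
#include <stddef.h>
#include <stdint.h>

static uint64_t msr_vp_assist;	/* stand-in for HV_X64_MSR_VP_ASSIST_PAGE */
static void *assist_page;	/* stand-in for hv_vp_assist_page[cpu]    */

/* Broken non-root init: the MSR write hides under the !page check, so a
 * page surviving from a previous online leaves the MSR at 0. */
void cpu_init_old(void)
{
	if (!assist_page) {
		assist_page = (void *)0x1000;	/* pretend __vmalloc() */
		msr_vp_assist = 0x1000 | 1;	/* pfn | enable bit    */
	}
}

/* Fixed non-root init: allocate only when needed, always re-enable. */
void cpu_init_new(void)
{
	if (!assist_page)
		assist_page = (void *)0x1000;
	if (assist_page)
		msr_vp_assist = 0x1000 | 1;
}

/* Non-root hv_cpu_die(): disables the MSR but keeps the allocation. */
void cpu_die(void)
{
	msr_vp_assist = 0;
}
```

Running init/die/init with the old logic leaves the model MSR at 0, matching
the rdmsr output above; the fixed logic restores it.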

Note: the fact that 'hv_vp_assist_page[cpu]' is reset to NULL may
present a potential issue for KVM. While Hyper-V uses the
CPUHP_AP_ONLINE_DYN stage in CPU hotplug, KVM uses CPUHP_AP_KVM_STARTING,
which comes earlier in the CPU teardown sequence. It is theoretically
possible that an Enlightened VMCS is still in use. It is unclear whether
the issue is real and whether using KVM with a Hyper-V root partition is
even possible.

While at it, drop the unneeded smp_processor_id() call from hv_cpu_init().

Fixes: e5d9b714fe40 ("x86/hyperv: fix root partition faults when writing to VP assist page MSR")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/hyperv/hv_init.c | 54 +++++++++++++++++++--------------------
 1 file changed, 26 insertions(+), 28 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index f49bc3ec76e6..a269049a43ce 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -77,7 +77,7 @@ static int hyperv_init_ghcb(void)
 static int hv_cpu_init(unsigned int cpu)
 {
 	union hv_vp_assist_msr_contents msr = { 0 };
-	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
+	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
 	int ret;
 
 	ret = hv_common_cpu_init(cpu);
@@ -87,34 +87,32 @@ static int hv_cpu_init(unsigned int cpu)
 	if (!hv_vp_assist_page)
 		return 0;
 
-	if (!*hvp) {
-		if (hv_root_partition) {
-			/*
-			 * For root partition we get the hypervisor provided VP assist
-			 * page, instead of allocating a new page.
-			 */
-			rdmsrl(HV_X64_MSR_VP_ASSIST_PAGE, msr.as_uint64);
-			*hvp = memremap(msr.pfn <<
-					HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT,
-					PAGE_SIZE, MEMREMAP_WB);
-		} else {
-			/*
-			 * The VP assist page is an "overlay" page (see Hyper-V TLFS's
-			 * Section 5.2.1 "GPA Overlay Pages"). Here it must be zeroed
-			 * out to make sure we always write the EOI MSR in
-			 * hv_apic_eoi_write() *after* the EOI optimization is disabled
-			 * in hv_cpu_die(), otherwise a CPU may not be stopped in the
-			 * case of CPU offlining and the VM will hang.
-			 */
+	if (hv_root_partition) {
+		/*
+		 * For root partition we get the hypervisor provided VP assist
+		 * page, instead of allocating a new page.
+		 */
+		rdmsrl(HV_X64_MSR_VP_ASSIST_PAGE, msr.as_uint64);
+		*hvp = memremap(msr.pfn << HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT,
+				PAGE_SIZE, MEMREMAP_WB);
+	} else {
+		/*
+		 * The VP assist page is an "overlay" page (see Hyper-V TLFS's
+		 * Section 5.2.1 "GPA Overlay Pages"). Here it must be zeroed
+		 * out to make sure we always write the EOI MSR in
+		 * hv_apic_eoi_write() *after* the EOI optimization is disabled
+		 * in hv_cpu_die(), otherwise a CPU may not be stopped in the
+		 * case of CPU offlining and the VM will hang.
+		 */
+		if (!*hvp)
 			*hvp = __vmalloc(PAGE_SIZE, GFP_KERNEL | __GFP_ZERO);
-			if (*hvp)
-				msr.pfn = vmalloc_to_pfn(*hvp);
-		}
-		WARN_ON(!(*hvp));
-		if (*hvp) {
-			msr.enable = 1;
-			wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, msr.as_uint64);
-		}
+		if (*hvp)
+			msr.pfn = vmalloc_to_pfn(*hvp);
+
+	}
+	if (!WARN_ON(!(*hvp))) {
+		msr.enable = 1;
+		wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, msr.as_uint64);
 	}
 
 	return hyperv_init_ghcb();
-- 
2.38.1



* RE: [PATCH] x86/hyperv: Restore VP assist page after cpu offlining/onlining
  2022-11-03 19:06 [PATCH] x86/hyperv: Restore VP assist page after cpu offlining/onlining Vitaly Kuznetsov
@ 2022-11-04 16:23 ` Michael Kelley (LINUX)
  2022-11-04 16:46   ` Vitaly Kuznetsov
  2022-11-04 20:57 ` Michael Kelley (LINUX)
  2022-11-11 23:34 ` Wei Liu
  2 siblings, 1 reply; 5+ messages in thread
From: Michael Kelley (LINUX) @ 2022-11-04 16:23 UTC (permalink / raw)
  To: Vitaly Kuznetsov, linux-hyperv@vger.kernel.org
  Cc: KY Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	linux-kernel@vger.kernel.org, x86@kernel.org

From: Vitaly Kuznetsov <vkuznets@redhat.com> Sent: Thursday, November 3, 2022 12:06 PM
> 
> Commit e5d9b714fe40 ("x86/hyperv: fix root partition faults when writing
> to VP assist page MSR") moved 'wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE)' under
> 'if (*hvp)' condition. This works for root partition as hv_cpu_die()
> does memunmap() and sets 'hv_vp_assist_page[cpu]' to NULL but breaks
> non-root partitions as hv_cpu_die() doesn't free 'hv_vp_assist_page[cpu]'
> for them. 

Did you consider having hv_cpu_die() free the VP assist page and
set hv_vp_assist_page[cpu] to NULL in the non-root case?  That would
make the root and non-root cases more consistent, and it would make
hv_cpu_init() and hv_cpu_die() more symmetrical.   The hv_cpu_die()
path frees up pretty much all the other per-CPU resources.  I don't
know why it keeps the VP assist page for re-use if the CPU comes back
online later.

You added the original code for allocating the vp_assist_page in
commit a46d15cc1a, so maybe you remember if there are any
gotchas. :-)

Michael

> This causes VP assist page to remain unset after CPU
> offline/online cycle:
> 
> $ rdmsr -p 24 0x40000073
>   10212f001
> $ echo 0 > /sys/devices/system/cpu/cpu24/online
> $ echo 1 > /sys/devices/system/cpu/cpu24/online
> $ rdmsr -p 24 0x40000073
>   0
> 
> Fix the issue by always writing to HV_X64_MSR_VP_ASSIST_PAGE in
> hv_cpu_init(). Note, checking 'if (!*hvp)', for root partition is
> pointless as hv_cpu_die() always sets 'hv_vp_assist_page[cpu]' to
> NULL (and it's also NULL initially).
> 
> Note: the fact that 'hv_vp_assist_page[cpu]' is reset to NULL may
> present a (potential) issue for KVM. While Hyper-V uses
> CPUHP_AP_ONLINE_DYN stage in CPU hotplug, KVM uses
> CPUHP_AP_KVM_STARTING
> which comes earlier in CPU teardown sequence. It is theoretically
> possible that Enlightened VMCS is still in use. It is unclear if the
> issue is real and if using KVM with Hyper-V root partition is even
> possible.
> 
> While on it, drop the unneeded smp_processor_id() call from hv_cpu_init().
> 
> Fixes: e5d9b714fe40 ("x86/hyperv: fix root partition faults when writing to VP assist page MSR")
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/hyperv/hv_init.c | 54 +++++++++++++++++++--------------------
>  1 file changed, 26 insertions(+), 28 deletions(-)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index f49bc3ec76e6..a269049a43ce 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -77,7 +77,7 @@ static int hyperv_init_ghcb(void)
>  static int hv_cpu_init(unsigned int cpu)
>  {
>  	union hv_vp_assist_msr_contents msr = { 0 };
> -	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
> +	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
>  	int ret;
> 
>  	ret = hv_common_cpu_init(cpu);
> @@ -87,34 +87,32 @@ static int hv_cpu_init(unsigned int cpu)
>  	if (!hv_vp_assist_page)
>  		return 0;
> 
> -	if (!*hvp) {
> -		if (hv_root_partition) {
> -			/*
> -			 * For root partition we get the hypervisor provided VP assist
> -			 * page, instead of allocating a new page.
> -			 */
> -			rdmsrl(HV_X64_MSR_VP_ASSIST_PAGE, msr.as_uint64);
> -			*hvp = memremap(msr.pfn <<
> -					HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT,
> -					PAGE_SIZE, MEMREMAP_WB);
> -		} else {
> -			/*
> -			 * The VP assist page is an "overlay" page (see Hyper-V TLFS's
> -			 * Section 5.2.1 "GPA Overlay Pages"). Here it must be zeroed
> -			 * out to make sure we always write the EOI MSR in
> -			 * hv_apic_eoi_write() *after* the EOI optimization is disabled
> -			 * in hv_cpu_die(), otherwise a CPU may not be stopped in the
> -			 * case of CPU offlining and the VM will hang.
> -			 */
> +	if (hv_root_partition) {
> +		/*
> +		 * For root partition we get the hypervisor provided VP assist
> +		 * page, instead of allocating a new page.
> +		 */
> +		rdmsrl(HV_X64_MSR_VP_ASSIST_PAGE, msr.as_uint64);
> +		*hvp = memremap(msr.pfn << HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT,
> +				PAGE_SIZE, MEMREMAP_WB);
> +	} else {
> +		/*
> +		 * The VP assist page is an "overlay" page (see Hyper-V TLFS's
> +		 * Section 5.2.1 "GPA Overlay Pages"). Here it must be zeroed
> +		 * out to make sure we always write the EOI MSR in
> +		 * hv_apic_eoi_write() *after* the EOI optimization is disabled
> +		 * in hv_cpu_die(), otherwise a CPU may not be stopped in the
> +		 * case of CPU offlining and the VM will hang.
> +		 */
> +		if (!*hvp)
>  			*hvp = __vmalloc(PAGE_SIZE, GFP_KERNEL | __GFP_ZERO);
> -			if (*hvp)
> -				msr.pfn = vmalloc_to_pfn(*hvp);
> -		}
> -		WARN_ON(!(*hvp));
> -		if (*hvp) {
> -			msr.enable = 1;
> -			wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, msr.as_uint64);
> -		}
> +		if (*hvp)
> +			msr.pfn = vmalloc_to_pfn(*hvp);
> +
> +	}
> +	if (!WARN_ON(!(*hvp))) {
> +		msr.enable = 1;
> +		wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, msr.as_uint64);
>  	}
> 
>  	return hyperv_init_ghcb();
> --
> 2.38.1



* RE: [PATCH] x86/hyperv: Restore VP assist page after cpu offlining/onlining
  2022-11-04 16:23 ` Michael Kelley (LINUX)
@ 2022-11-04 16:46   ` Vitaly Kuznetsov
  0 siblings, 0 replies; 5+ messages in thread
From: Vitaly Kuznetsov @ 2022-11-04 16:46 UTC (permalink / raw)
  To: Michael Kelley (LINUX), linux-hyperv@vger.kernel.org
  Cc: KY Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	linux-kernel@vger.kernel.org, x86@kernel.org

[-- Attachment #1: Type: text/plain, Size: 5913 bytes --]

"Michael Kelley (LINUX)" <mikelley@microsoft.com> writes:

> From: Vitaly Kuznetsov <vkuznets@redhat.com> Sent: Thursday, November 3, 2022 12:06 PM
>> 
>> Commit e5d9b714fe40 ("x86/hyperv: fix root partition faults when writing
>> to VP assist page MSR") moved 'wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE)' under
>> 'if (*hvp)' condition. This works for root partition as hv_cpu_die()
>> does memunmap() and sets 'hv_vp_assist_page[cpu]' to NULL but breaks
>> non-root partitions as hv_cpu_die() doesn't free 'hv_vp_assist_page[cpu]'
>> for them. 
>
> Did you consider having hv_cpu_die() free the VP assist page and
> set hv_vp_assist_page[cpu] to NULL in the non-root case?

Oh yes, I did, and I even wrote a patch (attached), but it failed in
testing :-( My testing was to run a CPU onlining/offlining loop and
try using KVM on the same CPU at the same time. I was able to see issues
when KVM was not able to reach the Enlightened VMCS because the VP
assist page was NULL.

>  That would
> make the root and non-root cases more consistent, and it would make
> hv_cpu_init() and hv_cpu_die() more symmetrical.   The hv_cpu_die()
> path frees up pretty much all the other per-CPU resources.  I don't
> know why it keeps the VP assist page for re-use if the CPU comes back
> online later.
>
> You added the original code for allocating the vp_assist_page in
> commit a46d15cc1a, so maybe you remember if there are any
> gotchas. :-)

The root cause of the problem I observed seems to be the order of CPU
hotplug stages. Hyper-V uses the last, CPUHP_AP_ONLINE_DYN, stage while
KVM has its own dedicated CPUHP_AP_KVM_STARTING one, so we end up freeing
the VP assist page before turning KVM off on the CPU. It may be
sufficient to introduce a new stage for Hyper-V and put it before KVM's.
It's in my backlog to explore this path but it may take me some time to
get back to it :-(
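In other words (a toy model with an invented stage ordering, not the real
cpuhp machinery): offlining tears states down from the highest hotplug
stage to the lowest, so a Hyper-V teardown callback at CPUHP_AP_ONLINE_DYN
runs before KVM's CPUHP_AP_KVM_STARTING teardown, which may still need the
page:

```c
static int assist_page_present = 1;	/* page mapped while CPU online */
static int kvm_saw_missing_page;

/* Registered at (the stand-in for) CPUHP_AP_ONLINE_DYN. */
static void hyperv_cpu_teardown(void)
{
	assist_page_present = 0;	/* vfree()/memunmap(), NULL pointer */
}

/* Registered at CPUHP_AP_KVM_STARTING; may still touch the eVMCS. */
static void kvm_cpu_teardown(void)
{
	if (!assist_page_present)
		kvm_saw_missing_page = 1;
}

/* Teardown runs from the highest stage down, so the Hyper-V callback
 * frees the page before KVM's teardown gets to run. */
void cpu_offline(void)
{
	hyperv_cpu_teardown();
	kvm_cpu_teardown();
}
```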

>
> Michael
>
>> This causes VP assist page to remain unset after CPU
>> offline/online cycle:
>> 
>> $ rdmsr -p 24 0x40000073
>>   10212f001
>> $ echo 0 > /sys/devices/system/cpu/cpu24/online
>> $ echo 1 > /sys/devices/system/cpu/cpu24/online
>> $ rdmsr -p 24 0x40000073
>>   0
>> 
>> Fix the issue by always writing to HV_X64_MSR_VP_ASSIST_PAGE in
>> hv_cpu_init(). Note, checking 'if (!*hvp)', for root partition is
>> pointless as hv_cpu_die() always sets 'hv_vp_assist_page[cpu]' to
>> NULL (and it's also NULL initially).
>> 
>> Note: the fact that 'hv_vp_assist_page[cpu]' is reset to NULL may
>> present a (potential) issue for KVM. While Hyper-V uses
>> CPUHP_AP_ONLINE_DYN stage in CPU hotplug, KVM uses
>> CPUHP_AP_KVM_STARTING
>> which comes earlier in CPU teardown sequence. It is theoretically
>> possible that Enlightened VMCS is still in use. It is unclear if the
>> issue is real and if using KVM with Hyper-V root partition is even
>> possible.
>> 
>> While on it, drop the unneeded smp_processor_id() call from hv_cpu_init().
>> 
>> Fixes: e5d9b714fe40 ("x86/hyperv: fix root partition faults when writing to VP assist page MSR")
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>>  arch/x86/hyperv/hv_init.c | 54 +++++++++++++++++++--------------------
>>  1 file changed, 26 insertions(+), 28 deletions(-)
>> 
>> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
>> index f49bc3ec76e6..a269049a43ce 100644
>> --- a/arch/x86/hyperv/hv_init.c
>> +++ b/arch/x86/hyperv/hv_init.c
>> @@ -77,7 +77,7 @@ static int hyperv_init_ghcb(void)
>>  static int hv_cpu_init(unsigned int cpu)
>>  {
>>  	union hv_vp_assist_msr_contents msr = { 0 };
>> -	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
>> +	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
>>  	int ret;
>> 
>>  	ret = hv_common_cpu_init(cpu);
>> @@ -87,34 +87,32 @@ static int hv_cpu_init(unsigned int cpu)
>>  	if (!hv_vp_assist_page)
>>  		return 0;
>> 
>> -	if (!*hvp) {
>> -		if (hv_root_partition) {
>> -			/*
>> -			 * For root partition we get the hypervisor provided VP assist
>> -			 * page, instead of allocating a new page.
>> -			 */
>> -			rdmsrl(HV_X64_MSR_VP_ASSIST_PAGE, msr.as_uint64);
>> -			*hvp = memremap(msr.pfn <<
>> -					HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT,
>> -					PAGE_SIZE, MEMREMAP_WB);
>> -		} else {
>> -			/*
>> -			 * The VP assist page is an "overlay" page (see Hyper-V TLFS's
>> -			 * Section 5.2.1 "GPA Overlay Pages"). Here it must be zeroed
>> -			 * out to make sure we always write the EOI MSR in
>> -			 * hv_apic_eoi_write() *after* the EOI optimization is disabled
>> -			 * in hv_cpu_die(), otherwise a CPU may not be stopped in the
>> -			 * case of CPU offlining and the VM will hang.
>> -			 */
>> +	if (hv_root_partition) {
>> +		/*
>> +		 * For root partition we get the hypervisor provided VP assist
>> +		 * page, instead of allocating a new page.
>> +		 */
>> +		rdmsrl(HV_X64_MSR_VP_ASSIST_PAGE, msr.as_uint64);
>> +		*hvp = memremap(msr.pfn << HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT,
>> +				PAGE_SIZE, MEMREMAP_WB);
>> +	} else {
>> +		/*
>> +		 * The VP assist page is an "overlay" page (see Hyper-V TLFS's
>> +		 * Section 5.2.1 "GPA Overlay Pages"). Here it must be zeroed
>> +		 * out to make sure we always write the EOI MSR in
>> +		 * hv_apic_eoi_write() *after* the EOI optimization is disabled
>> +		 * in hv_cpu_die(), otherwise a CPU may not be stopped in the
>> +		 * case of CPU offlining and the VM will hang.
>> +		 */
>> +		if (!*hvp)
>>  			*hvp = __vmalloc(PAGE_SIZE, GFP_KERNEL | __GFP_ZERO);
>> -			if (*hvp)
>> -				msr.pfn = vmalloc_to_pfn(*hvp);
>> -		}
>> -		WARN_ON(!(*hvp));
>> -		if (*hvp) {
>> -			msr.enable = 1;
>> -			wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, msr.as_uint64);
>> -		}
>> +		if (*hvp)
>> +			msr.pfn = vmalloc_to_pfn(*hvp);
>> +
>> +	}
>> +	if (!WARN_ON(!(*hvp))) {
>> +		msr.enable = 1;
>> +		wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, msr.as_uint64);
>>  	}
>> 
>>  	return hyperv_init_ghcb();
>> --
>> 2.38.1
>

-- 
Vitaly


[-- Attachment #2: 0001-x86-hyperv-Free-VP-assist-page-from-hv_cpu_die.patch --]
[-- Type: text/x-patch, Size: 3075 bytes --]

From 9b9e30e3fc182f6a43efca0f93b5851392074a3a Mon Sep 17 00:00:00 2001
From: Vitaly Kuznetsov <vkuznets@redhat.com>
Date: Thu, 3 Nov 2022 16:27:41 +0100
Subject: [PATCH] x86/hyperv: Free VP assist page from hv_cpu_die()
Content-Type: text/plain

Normally, 'hv_vp_assist_page[cpu]' points to the CPU's VP assist page
mapping. In case of the Hyper-V root partition, this is a 'memremap()'
of the PFN given by the hypervisor. In case of a non-root partition,
it's vmalloc()ed. When the CPU goes offline, hv_cpu_die() disables the
VP assist page by writing HV_X64_MSR_VP_ASSIST_PAGE and, in case of the
root partition, does memunmap(). For non-root partitions, the
vmalloc()ed page remains allocated and thus hv_cpu_init() has to check
whether a new allocation is needed. This is unnecessarily complicated.
Let's always free the page from hv_cpu_die() and allocate it back from
hv_cpu_init(). All VP assist page users have to be prepared for
'hv_vp_assist_page[cpu]' becoming NULL anyway, as that's what already
happens for the root partition.

The VP assist page has two users: KVM and APIC PV EOI. When a CPU goes
offline, there cannot be a running guest, and thus KVM's use case
should be safe. As correctly noted in commit e320ab3cec7dd ("x86/hyper-v:
Zero out the VP ASSIST PAGE on allocation"), it is possible to see
interrupts after hv_cpu_die() and before the CPU is fully dead.
hv_apic_eoi_write() is, however, also prepared to see NULL in
'hv_vp_assist_page[smp_processor_id()]'. Moreover, checking a page
which is already unmapped from the hypervisor is incorrect in the
first place.

While at it, adjust VP assist page disabling a bit: always write to
HV_X64_MSR_VP_ASSIST_PAGE first and unmap/free the corresponding page
after, to make sure the hypervisor doesn't write to the already freed
memory in the interim.
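The ordering argument can be sketched with stand-in flags (illustration
only, not kernel code): a hypervisor update can land in freed memory only
if the overlay is still enabled after the backing page has been released,
which the disable-first ordering rules out.

```c
static int page_enabled = 1;	/* msr.enable as seen by the hypervisor */
static int page_freed;		/* vfree()/memunmap() already happened  */
static int wrote_freed_memory;

/* A hypervisor write is a use-after-free only when the overlay is
 * still enabled but its backing page is already gone. */
static void hypervisor_update(void)
{
	if (page_enabled && page_freed)
		wrote_freed_memory = 1;
}

/* Adjusted hv_cpu_die() ordering: disable via the MSR first, then free. */
void cpu_die_fixed(void)
{
	page_enabled = 0;	/* wrmsrl(..., enable = 0) first       */
	hypervisor_update();	/* any in-flight update happens now... */
	page_freed = 1;		/* ...before vfree()/memunmap()        */
}
```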

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/hyperv/hv_init.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index a0165df3c4d8..74be6f145fc4 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -104,8 +104,7 @@ static int hv_cpu_init(unsigned int cpu)
 		 * in hv_cpu_die(), otherwise a CPU may not be stopped in the
 		 * case of CPU offlining and the VM will hang.
 		 */
-		if (!*hvp)
-			*hvp = __vmalloc(PAGE_SIZE, GFP_KERNEL | __GFP_ZERO);
+		*hvp = __vmalloc(PAGE_SIZE, GFP_KERNEL | __GFP_ZERO);
 		if (*hvp)
 			msr.pfn = vmalloc_to_pfn(*hvp);
 
@@ -233,12 +232,17 @@ static int hv_cpu_die(unsigned int cpu)
 			 * page here and nullify it, so that in future we have
 			 * correct page address mapped in hv_cpu_init.
 			 */
-			memunmap(hv_vp_assist_page[cpu]);
-			hv_vp_assist_page[cpu] = NULL;
 			rdmsrl(HV_X64_MSR_VP_ASSIST_PAGE, msr.as_uint64);
 			msr.enable = 0;
 		}
 		wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, msr.as_uint64);
+
+		if (hv_root_partition)
+			memunmap(hv_vp_assist_page[cpu]);
+		else
+			vfree(hv_vp_assist_page[cpu]);
+
+		hv_vp_assist_page[cpu] = NULL;
 	}
 
 	if (hv_reenlightenment_cb == NULL)
-- 
2.38.1



* RE: [PATCH] x86/hyperv: Restore VP assist page after cpu offlining/onlining
  2022-11-03 19:06 [PATCH] x86/hyperv: Restore VP assist page after cpu offlining/onlining Vitaly Kuznetsov
  2022-11-04 16:23 ` Michael Kelley (LINUX)
@ 2022-11-04 20:57 ` Michael Kelley (LINUX)
  2022-11-11 23:34 ` Wei Liu
  2 siblings, 0 replies; 5+ messages in thread
From: Michael Kelley (LINUX) @ 2022-11-04 20:57 UTC (permalink / raw)
  To: Vitaly Kuznetsov, linux-hyperv@vger.kernel.org
  Cc: KY Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	linux-kernel@vger.kernel.org, x86@kernel.org

From: Vitaly Kuznetsov <vkuznets@redhat.com> Sent: Thursday, November 3, 2022 12:06 PM
> 
> Commit e5d9b714fe40 ("x86/hyperv: fix root partition faults when writing
> to VP assist page MSR") moved 'wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE)' under
> 'if (*hvp)' condition. This works for root partition as hv_cpu_die()
> does memunmap() and sets 'hv_vp_assist_page[cpu]' to NULL but breaks
> non-root partitions as hv_cpu_die() doesn't free 'hv_vp_assist_page[cpu]'
> for them. This causes VP assist page to remain unset after CPU
> offline/online cycle:
> 
> $ rdmsr -p 24 0x40000073
>   10212f001
> $ echo 0 > /sys/devices/system/cpu/cpu24/online
> $ echo 1 > /sys/devices/system/cpu/cpu24/online
> $ rdmsr -p 24 0x40000073
>   0
> 
> Fix the issue by always writing to HV_X64_MSR_VP_ASSIST_PAGE in
> hv_cpu_init(). Note, checking 'if (!*hvp)', for root partition is
> pointless as hv_cpu_die() always sets 'hv_vp_assist_page[cpu]' to
> NULL (and it's also NULL initially).
> 
> Note: the fact that 'hv_vp_assist_page[cpu]' is reset to NULL may
> present a (potential) issue for KVM. While Hyper-V uses
> CPUHP_AP_ONLINE_DYN stage in CPU hotplug, KVM uses
> CPUHP_AP_KVM_STARTING
> which comes earlier in CPU teardown sequence. It is theoretically
> possible that Enlightened VMCS is still in use. It is unclear if the
> issue is real and if using KVM with Hyper-V root partition is even
> possible.
> 
> While on it, drop the unneeded smp_processor_id() call from hv_cpu_init().
> 
> Fixes: e5d9b714fe40 ("x86/hyperv: fix root partition faults when writing to VP assist page MSR")
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/hyperv/hv_init.c | 54 +++++++++++++++++++--------------------
>  1 file changed, 26 insertions(+), 28 deletions(-)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index f49bc3ec76e6..a269049a43ce 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -77,7 +77,7 @@ static int hyperv_init_ghcb(void)
>  static int hv_cpu_init(unsigned int cpu)
>  {
>  	union hv_vp_assist_msr_contents msr = { 0 };
> -	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
> +	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
>  	int ret;
> 
>  	ret = hv_common_cpu_init(cpu);
> @@ -87,34 +87,32 @@ static int hv_cpu_init(unsigned int cpu)
>  	if (!hv_vp_assist_page)
>  		return 0;
> 
> -	if (!*hvp) {
> -		if (hv_root_partition) {
> -			/*
> -			 * For root partition we get the hypervisor provided VP assist
> -			 * page, instead of allocating a new page.
> -			 */
> -			rdmsrl(HV_X64_MSR_VP_ASSIST_PAGE, msr.as_uint64);
> -			*hvp = memremap(msr.pfn <<
> -					HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT,
> -					PAGE_SIZE, MEMREMAP_WB);
> -		} else {
> -			/*
> -			 * The VP assist page is an "overlay" page (see Hyper-V TLFS's
> -			 * Section 5.2.1 "GPA Overlay Pages"). Here it must be zeroed
> -			 * out to make sure we always write the EOI MSR in
> -			 * hv_apic_eoi_write() *after* the EOI optimization is disabled
> -			 * in hv_cpu_die(), otherwise a CPU may not be stopped in the
> -			 * case of CPU offlining and the VM will hang.
> -			 */
> +	if (hv_root_partition) {
> +		/*
> +		 * For root partition we get the hypervisor provided VP assist
> +		 * page, instead of allocating a new page.
> +		 */
> +		rdmsrl(HV_X64_MSR_VP_ASSIST_PAGE, msr.as_uint64);
> +		*hvp = memremap(msr.pfn << HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT,
> +				PAGE_SIZE, MEMREMAP_WB);
> +	} else {
> +		/*
> +		 * The VP assist page is an "overlay" page (see Hyper-V TLFS's
> +		 * Section 5.2.1 "GPA Overlay Pages"). Here it must be zeroed
> +		 * out to make sure we always write the EOI MSR in
> +		 * hv_apic_eoi_write() *after* the EOI optimization is disabled
> +		 * in hv_cpu_die(), otherwise a CPU may not be stopped in the
> +		 * case of CPU offlining and the VM will hang.
> +		 */
> +		if (!*hvp)
>  			*hvp = __vmalloc(PAGE_SIZE, GFP_KERNEL | __GFP_ZERO);
> -			if (*hvp)
> -				msr.pfn = vmalloc_to_pfn(*hvp);
> -		}
> -		WARN_ON(!(*hvp));
> -		if (*hvp) {
> -			msr.enable = 1;
> -			wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, msr.as_uint64);
> -		}
> +		if (*hvp)
> +			msr.pfn = vmalloc_to_pfn(*hvp);
> +
> +	}
> +	if (!WARN_ON(!(*hvp))) {
> +		msr.enable = 1;
> +		wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, msr.as_uint64);
>  	}
> 
>  	return hyperv_init_ghcb();
> --
> 2.38.1

Reviewed-by: Michael Kelley <mikelley@microsoft.com>



* Re: [PATCH] x86/hyperv: Restore VP assist page after cpu offlining/onlining
  2022-11-03 19:06 [PATCH] x86/hyperv: Restore VP assist page after cpu offlining/onlining Vitaly Kuznetsov
  2022-11-04 16:23 ` Michael Kelley (LINUX)
  2022-11-04 20:57 ` Michael Kelley (LINUX)
@ 2022-11-11 23:34 ` Wei Liu
  2 siblings, 0 replies; 5+ messages in thread
From: Wei Liu @ 2022-11-11 23:34 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: linux-hyperv, K. Y. Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, Michael Kelley, linux-kernel, x86

On Thu, Nov 03, 2022 at 08:06:01PM +0100, Vitaly Kuznetsov wrote:
> Commit e5d9b714fe40 ("x86/hyperv: fix root partition faults when writing
> to VP assist page MSR") moved 'wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE)' under
> 'if (*hvp)' condition. This works for root partition as hv_cpu_die()
> does memunmap() and sets 'hv_vp_assist_page[cpu]' to NULL but breaks
> non-root partitions as hv_cpu_die() doesn't free 'hv_vp_assist_page[cpu]'
> for them. This causes VP assist page to remain unset after CPU
> offline/online cycle:
> 
> $ rdmsr -p 24 0x40000073
>   10212f001
> $ echo 0 > /sys/devices/system/cpu/cpu24/online
> $ echo 1 > /sys/devices/system/cpu/cpu24/online
> $ rdmsr -p 24 0x40000073
>   0
> 
> Fix the issue by always writing to HV_X64_MSR_VP_ASSIST_PAGE in
> hv_cpu_init(). Note, checking 'if (!*hvp)', for root partition is
> pointless as hv_cpu_die() always sets 'hv_vp_assist_page[cpu]' to
> NULL (and it's also NULL initially).
> 
> Note: the fact that 'hv_vp_assist_page[cpu]' is reset to NULL may
> present a (potential) issue for KVM. While Hyper-V uses
> CPUHP_AP_ONLINE_DYN stage in CPU hotplug, KVM uses CPUHP_AP_KVM_STARTING
> which comes earlier in CPU teardown sequence. It is theoretically
> possible that Enlightened VMCS is still in use. It is unclear if the
> issue is real and if using KVM with Hyper-V root partition is even
> possible.
> 
> While on it, drop the unneeded smp_processor_id() call from hv_cpu_init().
> 
> Fixes: e5d9b714fe40 ("x86/hyperv: fix root partition faults when writing to VP assist page MSR")
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>

Applied to hyperv-fixes. Thanks.


end of thread, other threads:[~2022-11-11 23:34 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-03 19:06 [PATCH] x86/hyperv: Restore VP assist page after cpu offlining/onlining Vitaly Kuznetsov
2022-11-04 16:23 ` Michael Kelley (LINUX)
2022-11-04 16:46   ` Vitaly Kuznetsov
2022-11-04 20:57 ` Michael Kelley (LINUX)
2022-11-11 23:34 ` Wei Liu
