[PATCH v2] x86/sev: Update ghcb_version only once

public inbox for stable@vger.kernel.org

* [PATCH v2] x86/sev: Update ghcb_version only once
@ 2023-11-29 10:40 Ashwin Dayanand Kamat
  2023-11-29 10:42 ` kernel test robot
  2023-11-30  9:30 ` [PATCH] x86/sev: Fix kernel crash due to late update to read-only ghcb_version Ingo Molnar
  0 siblings, 2 replies; 4+ messages in thread
From: Ashwin Dayanand Kamat @ 2023-11-29 10:40 UTC (permalink / raw)
  To: linux-kernel, thomas.lendacky, bp, brijesh.singh
  Cc: kashwindayan, tglx, mingo, dave.hansen, x86, hpa, jroedel, stable,
	ganb, tkundu, vsirnapalli, akaher, amakhalov, namit

From: Ashwin Dayanand Kamat <ashwin.kamat@broadcom.com>

A kernel crash caused by a page fault was observed while running the
cpuhotplug LTP testcases on SEV-ES enabled systems. The crash occurred
during hotplug, after the CPU was offlined and the process was migrated
to a different CPU. setup_ghcb() is then called again and tries to update
ghcb_version in sev_es_negotiate_protocol(). That variable is meant to be
read-only once it has been initialised during boot, so the write results
in a page fault.

From logs,
[  256.447466] BUG: unable to handle page fault for address: ffffffffba556e70
[  256.447476] #PF: supervisor write access in kernel mode
[  256.447478] #PF: error_code(0x0003) - permissions violation
[  256.447479] PGD 8000667c0f067 P4D 8000667c0f067 PUD 8000667c10063 PMD 80080006674001e1
[  256.447483] Oops: 0003 [#1] PREEMPT SMP NOPTI
[  256.447487] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.1.45-8.ph5 #1-photon
[ ... ]
[  256.447511] CR2: ffffffffba556e70 CR3: 0008000667c0a004 CR4: 0000000000770ee0
[  256.447514] PKRU: 55555554
[  256.447515] Call Trace:
[  256.447516]  <TASK>
[  256.447519]  ? __die_body.cold+0x1a/0x1f
[  256.447526]  ? __die+0x2a/0x35
[  256.447528]  ? page_fault_oops+0x10c/0x270
[  256.447531]  ? setup_ghcb+0x71/0x100
[  256.447533]  ? __x86_return_thunk+0x5/0x6
[  256.447537]  ? search_exception_tables+0x60/0x70
[  256.447541]  ? __x86_return_thunk+0x5/0x6
[  256.447543]  ? fixup_exception+0x27/0x320
[  256.447546]  ? kernelmode_fixup_or_oops+0xa2/0x120
[  256.447549]  ? __bad_area_nosemaphore+0x16a/0x1b0
[  256.447551]  ? kernel_exc_vmm_communication+0x60/0xb0
[  256.447556]  ? bad_area_nosemaphore+0x16/0x20
[  256.447558]  ? do_kern_addr_fault+0x7a/0x90
[  256.447560]  ? exc_page_fault+0xbd/0x160
[  256.447563]  ? asm_exc_page_fault+0x27/0x30
[  256.447570]  ? setup_ghcb+0x71/0x100
[  256.447572]  ? setup_ghcb+0xe/0x100
[  256.447574]  cpu_init_exception_handling+0x1b9/0x1f0

The fix is to call sev_es_negotiate_protocol() only in the BSP boot phase;
the negotiation only needs to be done once.

Fixes: 95d33bfaa3e1 ("x86/sev: Register GHCB memory when SEV-SNP is active")
Co-developed-by: Bo Gan <bo.gan@broadcom.com>
Signed-off-by: Bo Gan <bo.gan@broadcom.com>
Signed-off-by: Ashwin Dayanand Kamat <ashwin.kamat@broadcom.com>
---
v2:
Changes in v2, as per the review comments from Tom Lendacky:
 - Moved sev_es_negotiate_protocol() after the initial_vc_handler if-check in setup_ghcb()
 - Added the Signed-off-by of the co-developer
---
 arch/x86/kernel/sev.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 70472eebe719..c67285824e82 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -1234,10 +1234,6 @@ void setup_ghcb(void)
 	if (!cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
 		return;
 
-	/* First make sure the hypervisor talks a supported protocol. */
-	if (!sev_es_negotiate_protocol())
-		sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ);
-
 	/*
 	 * Check whether the runtime #VC exception handler is active. It uses
 	 * the per-CPU GHCB page which is set up by sev_es_init_vc_handling().
@@ -1254,6 +1250,13 @@ void setup_ghcb(void)
 		return;
 	}
 
+	/*
+	 * Make sure the hypervisor talks a supported protocol.
+	 * This gets called only in the BSP boot phase.
+	 */
+	if (!sev_es_negotiate_protocol())
+		sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ);
+
 	/*
 	 * Clear the boot_ghcb. The first exception comes in before the bss
 	 * section is cleared.
-- 
2.39.0
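
For readers who want to see why the reordering in the hunks above is enough,
here is a minimal userspace sketch. It is not kernel code. The struct, its
fields, the helper names and the value 2 are all hypothetical stand-ins, and
it assumes that ghcb_version behaves like a variable that becomes read-only
once boot-time initialisation is finished, as the changelog describes.

```c
#include <stdbool.h>

/* Userspace model only; none of these are the kernel's real definitions. */
struct model {
	int  ghcb_version;     /* stands in for the read-only-after-boot variable */
	bool sealed;           /* true once init-time data is read-only           */
	bool runtime_handler;  /* true once the runtime #VC handler is active     */
	int  faults;           /* writes attempted after sealing                  */
};

/* Models sev_es_negotiate_protocol(): the only writer of ghcb_version. */
static bool negotiate(struct model *m)
{
	if (m->sealed) {
		m->faults++;       /* the real kernel would take a page fault here */
		return false;
	}
	m->ghcb_version = 2;   /* arbitrary illustrative value */
	return true;
}

/* Ordering before the patch: negotiate on every call. */
static void setup_ghcb_before(struct model *m)
{
	negotiate(m);
	if (m->runtime_handler)
		return;
	/* boot-only GHCB setup would follow here */
}

/* Ordering after the patch: return early once the runtime handler is active. */
static void setup_ghcb_after(struct model *m)
{
	if (m->runtime_handler)
		return;
	negotiate(m);
	/* boot-only GHCB setup would follow here */
}

/* Boot once on the BSP, seal init data, then replay a CPU hotplug call. */
static int faults_after_hotplug(void (*setup)(struct model *))
{
	struct model m = {0};

	setup(&m);                 /* BSP boot phase */
	m.sealed = true;
	m.runtime_handler = true;
	setup(&m);                 /* CPU brought back online */
	return m.faults;
}
```

In this model, faults_after_hotplug(setup_ghcb_before) counts one late write,
while faults_after_hotplug(setup_ghcb_after) counts none, because the early
return on the runtime-handler check is reached before the negotiation.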



* Re: [PATCH v2] x86/sev: Update ghcb_version only once
  2023-11-29 10:40 [PATCH v2] x86/sev: Update ghcb_version only once Ashwin Dayanand Kamat
@ 2023-11-29 10:42 ` kernel test robot
  2023-11-30  9:30 ` [PATCH] x86/sev: Fix kernel crash due to late update to read-only ghcb_version Ingo Molnar
  1 sibling, 0 replies; 4+ messages in thread
From: kernel test robot @ 2023-11-29 10:42 UTC (permalink / raw)
  To: Ashwin Dayanand Kamat; +Cc: stable, oe-kbuild-all

Hi,

Thanks for your patch.

FYI: kernel test robot notices the stable kernel rule is not satisfied.

The check is based on https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html#option-1

Rule: add the tag "Cc: stable@vger.kernel.org" in the sign-off area to have the patch automatically included in the stable tree.
Subject: [PATCH v2] x86/sev: Update ghcb_version only once
Link: https://lore.kernel.org/stable/1701254429-18250-1-git-send-email-kashwindayan%40vmware.com

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* [PATCH] x86/sev: Fix kernel crash due to late update to read-only ghcb_version
  2023-11-29 10:40 [PATCH v2] x86/sev: Update ghcb_version only once Ashwin Dayanand Kamat
  2023-11-29 10:42 ` kernel test robot
@ 2023-11-30  9:30 ` Ingo Molnar
  2023-11-30 16:07   ` Tom Lendacky
  1 sibling, 1 reply; 4+ messages in thread
From: Ingo Molnar @ 2023-11-30  9:30 UTC (permalink / raw)
  To: Ashwin Dayanand Kamat
  Cc: linux-kernel, thomas.lendacky, bp, brijesh.singh, tglx, mingo,
	dave.hansen, x86, hpa, jroedel, stable, ganb, tkundu, vsirnapalli,
	akaher, amakhalov, namit


* Ashwin Dayanand Kamat <kashwindayan@vmware.com> wrote:

> From: Ashwin Dayanand Kamat <ashwin.kamat@broadcom.com>
> 
> kernel crash was observed because of page fault, while running
> cpuhotplug ltp testcases on SEV-ES enabled systems. The crash was
> observed during hotplug after the CPU was offlined and the process
> was migrated to different cpu. setup_ghcb() is called again which
> tries to update ghcb_version in sev_es_negotiate_protocol(). Ideally this
> is a read_only variable which is initialised during booting.
> This results in pagefault.

Applied to tip:x86/urgent, thanks.

Tom: I've added your Suggested-by and Acked-by, which appeared to be the 
case given the v1 discussion, let me know if that's not accurate.

I've also tidied up the changelog - final version attached below.

Thanks,

	Ingo

============>
From: Ashwin Dayanand Kamat <ashwin.kamat@broadcom.com>
Date: Wed, 29 Nov 2023 16:10:29 +0530
Subject: [PATCH] x86/sev: Fix kernel crash due to late update to read-only ghcb_version

A kernel crash caused by a write-access violation page fault was observed
while running cpuhotplug LTP testcases on SEV-ES enabled systems. The crash
was observed during hotplug, after the CPU was offlined and the process
was migrated to a different CPU. setup_ghcb() is then called again and
tries to update ghcb_version in sev_es_negotiate_protocol(). Ideally this
is a read-only variable which is initialised during boot.

Trying to write it results in a pagefault:

  BUG: unable to handle page fault for address: ffffffffba556e70
  #PF: supervisor write access in kernel mode
  #PF: error_code(0x0003) - permissions violation
  [ ...]
  Call Trace:
   <TASK>
   ? __die_body.cold+0x1a/0x1f
   ? __die+0x2a/0x35
   ? page_fault_oops+0x10c/0x270
   ? setup_ghcb+0x71/0x100
   ? __x86_return_thunk+0x5/0x6
   ? search_exception_tables+0x60/0x70
   ? __x86_return_thunk+0x5/0x6
   ? fixup_exception+0x27/0x320
   ? kernelmode_fixup_or_oops+0xa2/0x120
   ? __bad_area_nosemaphore+0x16a/0x1b0
   ? kernel_exc_vmm_communication+0x60/0xb0
   ? bad_area_nosemaphore+0x16/0x20
   ? do_kern_addr_fault+0x7a/0x90
   ? exc_page_fault+0xbd/0x160
   ? asm_exc_page_fault+0x27/0x30
   ? setup_ghcb+0x71/0x100
   ? setup_ghcb+0xe/0x100
   cpu_init_exception_handling+0x1b9/0x1f0

The fix is to call sev_es_negotiate_protocol() only in the BSP boot phase,
and it only needs to be done once in any case.

[ mingo: Refined the changelog. ]

Fixes: 95d33bfaa3e1 ("x86/sev: Register GHCB memory when SEV-SNP is active")
Suggested-by: Tom Lendacky <thomas.lendacky@amd.com>
Co-developed-by: Bo Gan <bo.gan@broadcom.com>
Signed-off-by: Bo Gan <bo.gan@broadcom.com>
Signed-off-by: Ashwin Dayanand Kamat <ashwin.kamat@broadcom.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/1701254429-18250-1-git-send-email-kashwindayan@vmware.com
---
 arch/x86/kernel/sev.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 70472eebe719..c67285824e82 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -1234,10 +1234,6 @@ void setup_ghcb(void)
 	if (!cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
 		return;
 
-	/* First make sure the hypervisor talks a supported protocol. */
-	if (!sev_es_negotiate_protocol())
-		sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ);
-
 	/*
 	 * Check whether the runtime #VC exception handler is active. It uses
 	 * the per-CPU GHCB page which is set up by sev_es_init_vc_handling().
@@ -1254,6 +1250,13 @@ void setup_ghcb(void)
 		return;
 	}
 
+	/*
+	 * Make sure the hypervisor talks a supported protocol.
+	 * This gets called only in the BSP boot phase.
+	 */
+	if (!sev_es_negotiate_protocol())
+		sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ);
+
 	/*
 	 * Clear the boot_ghcb. The first exception comes in before the bss
 	 * section is cleared.


* Re: [PATCH] x86/sev: Fix kernel crash due to late update to read-only ghcb_version
  2023-11-30  9:30 ` [PATCH] x86/sev: Fix kernel crash due to late update to read-only ghcb_version Ingo Molnar
@ 2023-11-30 16:07   ` Tom Lendacky
  0 siblings, 0 replies; 4+ messages in thread
From: Tom Lendacky @ 2023-11-30 16:07 UTC (permalink / raw)
  To: Ingo Molnar, Ashwin Dayanand Kamat
  Cc: linux-kernel, bp, brijesh.singh, tglx, mingo, dave.hansen, x86,
	hpa, jroedel, stable, ganb, tkundu, vsirnapalli, akaher,
	amakhalov, namit

On 11/30/23 03:30, Ingo Molnar wrote:
> 
> * Ashwin Dayanand Kamat <kashwindayan@vmware.com> wrote:
> 
>> From: Ashwin Dayanand Kamat <ashwin.kamat@broadcom.com>
>>
>> kernel crash was observed because of page fault, while running
>> cpuhotplug ltp testcases on SEV-ES enabled systems. The crash was
>> observed during hotplug after the CPU was offlined and the process
>> was migrated to different cpu. setup_ghcb() is called again which
>> tries to update ghcb_version in sev_es_negotiate_protocol(). Ideally this
>> is a read_only variable which is initialised during booting.
>> This results in pagefault.
> 
> Applied to tip:x86/urgent, thanks.
> 
> Tom: I've added your Suggested-by and Acked-by, which appeared to be the
> case given the v1 discussion, let me know if that's not accurate.

All good.

Thanks,
Tom

> 
> I've also tidied up the changelog - final version attached below.
> 
> Thanks,
> 
> 	Ingo
> 

