kexec.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH v2 00/12] x86/sev: KEXEC/KDUMP support for SEV-ES guests
       [not found] <20210913155603.28383-1-joro@8bytes.org>
@ 2021-09-13 16:02 ` Dave Hansen
  2021-09-13 16:14   ` Joerg Roedel
       [not found] ` <20210913155603.28383-2-joro@8bytes.org>
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Dave Hansen @ 2021-09-13 16:02 UTC (permalink / raw)
  To: Joerg Roedel, x86
  Cc: Eric Biederman, kexec, Joerg Roedel, hpa, Andy Lutomirski,
	Dave Hansen, Peter Zijlstra, Jiri Slaby, Dan Williams,
	Tom Lendacky, Juergen Gross, Kees Cook, David Rientjes,
	Cfir Cohen, Erdem Aktas, Masami Hiramatsu, Mike Stunes,
	Sean Christopherson, Martin Radev, Arvind Sankar, linux-coco,
	linux-kernel, kvm, virtualization

On 9/13/21 8:55 AM, Joerg Roedel wrote:
> This does not work under SEV-ES, because the hypervisor has no access
> to the vCPU registers and can't make modifications to them. So an
> SEV-ES guest needs to reset the vCPU itself and park it using the
> AP-reset-hold protocol. Upon wakeup the guest needs to jump to
> real-mode and to the reset-vector configured in the AP-Jump-Table.

How does this end up looking to an end user that tries to kexec() from a
an SEV-ES kernel?  Does it just hang?

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 00/12] x86/sev: KEXEC/KDUMP support for SEV-ES guests
  2021-09-13 16:02 ` [PATCH v2 00/12] x86/sev: KEXEC/KDUMP support for SEV-ES guests Dave Hansen
@ 2021-09-13 16:14   ` Joerg Roedel
  2021-09-13 16:21     ` Dave Hansen
  0 siblings, 1 reply; 18+ messages in thread
From: Joerg Roedel @ 2021-09-13 16:14 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Joerg Roedel, x86, Eric Biederman, kexec, hpa, Andy Lutomirski,
	Dave Hansen, Peter Zijlstra, Jiri Slaby, Dan Williams,
	Tom Lendacky, Juergen Gross, Kees Cook, David Rientjes,
	Cfir Cohen, Erdem Aktas, Masami Hiramatsu, Mike Stunes,
	Sean Christopherson, Martin Radev, Arvind Sankar, linux-coco,
	linux-kernel, kvm, virtualization

On Mon, Sep 13, 2021 at 09:02:38AM -0700, Dave Hansen wrote:
> On 9/13/21 8:55 AM, Joerg Roedel wrote:
> > This does not work under SEV-ES, because the hypervisor has no access
> > to the vCPU registers and can't make modifications to them. So an
> > SEV-ES guest needs to reset the vCPU itself and park it using the
> > AP-reset-hold protocol. Upon wakeup the guest needs to jump to
> > real-mode and to the reset-vector configured in the AP-Jump-Table.
> 
> How does this end up looking to an end user that tries to kexec() from a
> an SEV-ES kernel?  Does it just hang?

Yes, the kexec will just hang. This patch-set contains code to disable
the kexec syscalls in situations where it would not work for that
reason.

Actually with the changes to the decompressor in this patch-set the
kexec'ed kernel could boot, but would fail to bring up all the APs.

Regards,

	Joerg

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 00/12] x86/sev: KEXEC/KDUMP support for SEV-ES guests
  2021-09-13 16:14   ` Joerg Roedel
@ 2021-09-13 16:21     ` Dave Hansen
  0 siblings, 0 replies; 18+ messages in thread
From: Dave Hansen @ 2021-09-13 16:21 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Joerg Roedel, x86, Eric Biederman, kexec, hpa, Andy Lutomirski,
	Dave Hansen, Peter Zijlstra, Jiri Slaby, Dan Williams,
	Tom Lendacky, Juergen Gross, Kees Cook, David Rientjes,
	Cfir Cohen, Erdem Aktas, Masami Hiramatsu, Mike Stunes,
	Sean Christopherson, Martin Radev, Arvind Sankar, linux-coco,
	linux-kernel, kvm, virtualization

On 9/13/21 9:14 AM, Joerg Roedel wrote:
> On Mon, Sep 13, 2021 at 09:02:38AM -0700, Dave Hansen wrote:
>> On 9/13/21 8:55 AM, Joerg Roedel wrote:
>>> This does not work under SEV-ES, because the hypervisor has no access
>>> to the vCPU registers and can't make modifications to them. So an
>>> SEV-ES guest needs to reset the vCPU itself and park it using the
>>> AP-reset-hold protocol. Upon wakeup the guest needs to jump to
>>> real-mode and to the reset-vector configured in the AP-Jump-Table.
>> How does this end up looking to an end user that tries to kexec() from a
>> an SEV-ES kernel?  Does it just hang?
> Yes, the kexec will just hang. This patch-set contains code to disable
> the kexec syscalls in situations where it would not work for that
> reason.

Got it.  The end-user-visible symptom just wasn't obvious.  If you
revise these, it might be nice to add that so that folks who cherry-pick
stable patches or update to new stable kernels have an idea what this
might fix.

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 01/12] kexec: Allow architecture code to opt-out at runtime
       [not found] ` <20210913155603.28383-2-joro@8bytes.org>
@ 2021-11-01 16:10   ` Borislav Petkov
  2021-11-01 21:11     ` Eric W. Biederman
  0 siblings, 1 reply; 18+ messages in thread
From: Borislav Petkov @ 2021-11-01 16:10 UTC (permalink / raw)
  To: Eric Biederman
  Cc: Joerg Roedel, x86, kexec, Joerg Roedel, stable, hpa,
	Andy Lutomirski, Dave Hansen, Peter Zijlstra, Jiri Slaby,
	Dan Williams, Tom Lendacky, Juergen Gross, Kees Cook,
	David Rientjes, Cfir Cohen, Erdem Aktas, Masami Hiramatsu,
	Mike Stunes, Sean Christopherson, Martin Radev, Arvind Sankar,
	linux-coco, linux-kernel, kvm, virtualization

On Mon, Sep 13, 2021 at 05:55:52PM +0200, Joerg Roedel wrote:
> From: Joerg Roedel <jroedel@suse.de>
> 
> Allow a runtime opt-out of kexec support for architecture code in case
> the kernel is running in an environment where kexec is not properly
> supported yet.
> 
> This will be used on x86 when the kernel is running as an SEV-ES
> guest. SEV-ES guests need special handling for kexec to hand over all
> CPUs to the new kernel. This requires special hypervisor support and
> handling code in the guest which is not yet implemented.
> 
> Cc: stable@vger.kernel.org # v5.10+
> Signed-off-by: Joerg Roedel <jroedel@suse.de>
> ---
>  include/linux/kexec.h |  1 +
>  kernel/kexec.c        | 14 ++++++++++++++
>  kernel/kexec_file.c   |  9 +++++++++
>  3 files changed, 24 insertions(+)

I guess I can take this through the tip tree along with the next one.

Eric?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 01/12] kexec: Allow architecture code to opt-out at runtime
  2021-11-01 16:10   ` [PATCH v2 01/12] kexec: Allow architecture code to opt-out at runtime Borislav Petkov
@ 2021-11-01 21:11     ` Eric W. Biederman
  2021-11-02 16:37       ` Joerg Roedel
                         ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Eric W. Biederman @ 2021-11-01 21:11 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Joerg Roedel, x86, kexec, Joerg Roedel, stable, hpa,
	Andy Lutomirski, Dave Hansen, Peter Zijlstra, Jiri Slaby,
	Dan Williams, Tom Lendacky, Juergen Gross, Kees Cook,
	David Rientjes, Cfir Cohen, Erdem Aktas, Masami Hiramatsu,
	Mike Stunes, Sean Christopherson, Martin Radev, Arvind Sankar,
	linux-coco, linux-kernel, kvm, virtualization

Borislav Petkov <bp@alien8.de> writes:

> On Mon, Sep 13, 2021 at 05:55:52PM +0200, Joerg Roedel wrote:
>> From: Joerg Roedel <jroedel@suse.de>
>> 
>> Allow a runtime opt-out of kexec support for architecture code in case
>> the kernel is running in an environment where kexec is not properly
>> supported yet.
>> 
>> This will be used on x86 when the kernel is running as an SEV-ES
>> guest. SEV-ES guests need special handling for kexec to hand over all
>> CPUs to the new kernel. This requires special hypervisor support and
>> handling code in the guest which is not yet implemented.
>> 
>> Cc: stable@vger.kernel.org # v5.10+
>> Signed-off-by: Joerg Roedel <jroedel@suse.de>
>> ---
>>  include/linux/kexec.h |  1 +
>>  kernel/kexec.c        | 14 ++++++++++++++
>>  kernel/kexec_file.c   |  9 +++++++++
>>  3 files changed, 24 insertions(+)
>
> I guess I can take this through the tip tree along with the next one.

I seem to remember the consensus when this was reviewed that it was
unnecessary and there is already support for doing something like
this at a more fine grained level so we don't need a new kexec hook.

Eric


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 01/12] kexec: Allow architecture code to opt-out at runtime
  2021-11-01 21:11     ` Eric W. Biederman
@ 2021-11-02 16:37       ` Joerg Roedel
  2021-11-02 17:00       ` Joerg Roedel
  2021-11-02 17:17       ` Borislav Petkov
  2 siblings, 0 replies; 18+ messages in thread
From: Joerg Roedel @ 2021-11-02 16:37 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Borislav Petkov, Joerg Roedel, x86, kexec, stable, hpa,
	Andy Lutomirski, Dave Hansen, Peter Zijlstra, Jiri Slaby,
	Dan Williams, Tom Lendacky, Juergen Gross, Kees Cook,
	David Rientjes, Cfir Cohen, Erdem Aktas, Masami Hiramatsu,
	Mike Stunes, Sean Christopherson, Martin Radev, Arvind Sankar,
	linux-coco, linux-kernel, kvm, virtualization

On Mon, Nov 01, 2021 at 04:11:42PM -0500, Eric W. Biederman wrote:
> I seem to remember the consensus when this was reviewed that it was
> unnecessary and there is already support for doing something like
> this at a more fine grained level so we don't need a new kexec hook.

It was a discussion, no consenus :)

I still think it is better to solve this in generic code for everybody
to re-use than with an hack in the architecture hooks.

More and more platforms which enable confidential computing features
may need this hook in the future.

Regards,

-- 
Jörg Rödel
jroedel@suse.de

SUSE Software Solutions Germany GmbH
Maxfeldstr. 5
90409 Nürnberg
Germany
 
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 01/12] kexec: Allow architecture code to opt-out at runtime
  2021-11-01 21:11     ` Eric W. Biederman
  2021-11-02 16:37       ` Joerg Roedel
@ 2021-11-02 17:00       ` Joerg Roedel
  2021-11-02 18:17         ` Eric W. Biederman
  2021-11-02 17:17       ` Borislav Petkov
  2 siblings, 1 reply; 18+ messages in thread
From: Joerg Roedel @ 2021-11-02 17:00 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Borislav Petkov, Joerg Roedel, x86, kexec, stable, hpa,
	Andy Lutomirski, Dave Hansen, Peter Zijlstra, Jiri Slaby,
	Dan Williams, Tom Lendacky, Juergen Gross, Kees Cook,
	David Rientjes, Cfir Cohen, Erdem Aktas, Masami Hiramatsu,
	Mike Stunes, Sean Christopherson, Martin Radev, Arvind Sankar,
	linux-coco, linux-kernel, kvm, virtualization

Hi again,

On Mon, Nov 01, 2021 at 04:11:42PM -0500, Eric W. Biederman wrote:
> I seem to remember the consensus when this was reviewed that it was
> unnecessary and there is already support for doing something like
> this at a more fine grained level so we don't need a new kexec hook.

Forgot to state to problem again which these patches solve:

Currently a Linux kernel running as an SEV-ES guest has no way to
successfully kexec into a new kernel. The normal SIPI sequence to reset
the non-boot VCPUs does not work in SEV-ES guests and special code is
needed in Linux to safely hand over the VCPUs from one kernel to the
next. What happens currently is that the kexec'ed kernel will just hang.

The code which implements the VCPU hand-over is also included in this
patch-set, but it requires a certain level of Hypervisor support which
is not available everywhere.

To make it clear to the user that kexec will not work in their
environment, it is best to disable the respected syscalls. This is what
the hook is needed for.

Regards,

-- 
Jörg Rödel
jroedel@suse.de

SUSE Software Solutions Germany GmbH
Maxfeldstr. 5
90409 Nürnberg
Germany
 
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 01/12] kexec: Allow architecture code to opt-out at runtime
  2021-11-01 21:11     ` Eric W. Biederman
  2021-11-02 16:37       ` Joerg Roedel
  2021-11-02 17:00       ` Joerg Roedel
@ 2021-11-02 17:17       ` Borislav Petkov
  2 siblings, 0 replies; 18+ messages in thread
From: Borislav Petkov @ 2021-11-02 17:17 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Joerg Roedel, x86, kexec, Joerg Roedel, stable, hpa,
	Andy Lutomirski, Dave Hansen, Peter Zijlstra, Jiri Slaby,
	Dan Williams, Tom Lendacky, Juergen Gross, Kees Cook,
	David Rientjes, Cfir Cohen, Erdem Aktas, Masami Hiramatsu,
	Mike Stunes, Sean Christopherson, Martin Radev, Arvind Sankar,
	linux-coco, linux-kernel, kvm, virtualization

On Mon, Nov 01, 2021 at 04:11:42PM -0500, Eric W. Biederman wrote:
> I seem to remember the consensus when this was reviewed that it was
> unnecessary and there is already support for doing something like
> this at a more fine grained level so we don't need a new kexec hook.

Well, the executive summary is that you have a guest whose memory *and*
registers are encrypted so the hypervisor cannot have a poke inside and
reset the vCPU like it would normally do. So you need to do that dance
differently, i.e, the patchset.

If you try to kexec such a guest now, it'll init only the BSP, as Joerg
said. So I guess a single-threaded kdump.

And yes, one of the prominent use cases is kdumping from such a guest,
as distros love doing kdump for debugging.

I hope that explains it better.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 01/12] kexec: Allow architecture code to opt-out at runtime
  2021-11-02 17:00       ` Joerg Roedel
@ 2021-11-02 18:17         ` Eric W. Biederman
  0 siblings, 0 replies; 18+ messages in thread
From: Eric W. Biederman @ 2021-11-02 18:17 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Borislav Petkov, Joerg Roedel, x86, kexec, stable, hpa,
	Andy Lutomirski, Dave Hansen, Peter Zijlstra, Jiri Slaby,
	Dan Williams, Tom Lendacky, Juergen Gross, Kees Cook,
	David Rientjes, Cfir Cohen, Erdem Aktas, Masami Hiramatsu,
	Mike Stunes, Sean Christopherson, Martin Radev, Arvind Sankar,
	linux-coco, linux-kernel, kvm, virtualization

Joerg Roedel <jroedel@suse.de> writes:

> Hi again,
>
> On Mon, Nov 01, 2021 at 04:11:42PM -0500, Eric W. Biederman wrote:
>> I seem to remember the consensus when this was reviewed that it was
>> unnecessary and there is already support for doing something like
>> this at a more fine grained level so we don't need a new kexec hook.
>
> Forgot to state to problem again which these patches solve:
>
> Currently a Linux kernel running as an SEV-ES guest has no way to
> successfully kexec into a new kernel. The normal SIPI sequence to reset
> the non-boot VCPUs does not work in SEV-ES guests and special code is
> needed in Linux to safely hand over the VCPUs from one kernel to the
> next. What happens currently is that the kexec'ed kernel will just hang.
>
> The code which implements the VCPU hand-over is also included in this
> patch-set, but it requires a certain level of Hypervisor support which
> is not available everywhere.
>
> To make it clear to the user that kexec will not work in their
> environment, it is best to disable the respected syscalls. This is what
> the hook is needed for.

Note this is environmental.  This is the equivalent of a driver for a
device without some feature.

The kernel already has machine_kexec_prepare, which is perfectly capable
of detecting this is a problem and causing kexec_load to fail.  Which
is all that is required.

We don't need a new hook and a new code path to test for one
architecture.

So when we can reliably cause the system call to fail with a specific
error code I don't think it makes sense to make clutter up generic code
because of one architecture's design mistakes.


My honest preference would be to go farther and have a
firmware/hypervisor/platform independent rendezvous for the cpus so we
don't have to worry about what bugs the code under has implemented for
this special case.  Because frankly there when there are layers of
software if a bug can slip through it always seems to and causes
problems.


But definitely there is no reason to add another generic hook when the
existing hook is quite good enough.

Eric


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 03/12] x86/sev: Save and print negotiated GHCB protocol version
       [not found] ` <20210913155603.28383-4-joro@8bytes.org>
@ 2021-11-03 14:27   ` Borislav Petkov
  2022-01-26  9:27     ` Joerg Roedel
  0 siblings, 1 reply; 18+ messages in thread
From: Borislav Petkov @ 2021-11-03 14:27 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: x86, Eric Biederman, kexec, Joerg Roedel, hpa, Andy Lutomirski,
	Dave Hansen, Peter Zijlstra, Jiri Slaby, Dan Williams,
	Tom Lendacky, Juergen Gross, Kees Cook, David Rientjes,
	Cfir Cohen, Erdem Aktas, Masami Hiramatsu, Mike Stunes,
	Sean Christopherson, Martin Radev, Arvind Sankar, linux-coco,
	linux-kernel, kvm, virtualization

On Mon, Sep 13, 2021 at 05:55:54PM +0200, Joerg Roedel wrote:
> From: Joerg Roedel <jroedel@suse.de>
> 
> Save the results of the GHCB protocol negotiation into a data structure
> and print information about versions supported and used to the kernel
> log.

Which is useful for?

> +/*
> + * struct sev_ghcb_protocol_info - Used to return GHCB protocol
> + *				   negotiation details.
> + *
> + * @hv_proto_min:	Minimum GHCB protocol version supported by Hypervisor
> + * @hv_proto_max:	Maximum GHCB protocol version supported by Hypervisor
> + * @vm_proto:		Protocol version the VM (this kernel) will use
> + */
> +struct sev_ghcb_protocol_info {

Too long a name - ghcb_info is perfectly fine.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 05/12] x86/sev: Use GHCB protocol version 2 if supported
       [not found] ` <20210913155603.28383-6-joro@8bytes.org>
@ 2021-11-03 16:05   ` Borislav Petkov
  0 siblings, 0 replies; 18+ messages in thread
From: Borislav Petkov @ 2021-11-03 16:05 UTC (permalink / raw)
  To: Joerg Roedel, Brijesh Singh
  Cc: x86, Eric Biederman, kexec, Joerg Roedel, hpa, Andy Lutomirski,
	Dave Hansen, Peter Zijlstra, Jiri Slaby, Dan Williams,
	Tom Lendacky, Juergen Gross, Kees Cook, David Rientjes,
	Cfir Cohen, Erdem Aktas, Masami Hiramatsu, Mike Stunes,
	Sean Christopherson, Martin Radev, Arvind Sankar, linux-coco,
	linux-kernel, kvm, virtualization

On Mon, Sep 13, 2021 at 05:55:56PM +0200, Joerg Roedel wrote:
> From: Joerg Roedel <jroedel@suse.de>
> 
> Check whether the hypervisor supports GHCB version 2 and use it if
> available.
> 
> Signed-off-by: Joerg Roedel <jroedel@suse.de>
> ---
>  arch/x86/boot/compressed/sev.c | 10 ++++++++--
>  arch/x86/include/asm/sev.h     |  4 ++--
>  arch/x86/kernel/sev-shared.c   | 17 ++++++++++++++---
>  3 files changed, 24 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
> index 101e08c67296..7f8416f76be7 100644
> --- a/arch/x86/boot/compressed/sev.c
> +++ b/arch/x86/boot/compressed/sev.c
> @@ -119,16 +119,22 @@ static enum es_result vc_read_mem(struct es_em_ctxt *ctxt,
>  /* Include code for early handlers */
>  #include "../../kernel/sev-shared.c"
>  
> +static unsigned int ghcb_protocol;

I guess you need to sync up with Brijesh on what to use:

https://lore.kernel.org/r/20211008180453.462291-7-brijesh.singh@amd.com

And if ghcb_version there is __ro_after_init I think that's perfectly
fine and doesn't need an accessor...

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 06/12] x86/sev: Cache AP Jump Table Address
       [not found] ` <20210913155603.28383-7-joro@8bytes.org>
@ 2021-11-08 18:14   ` Borislav Petkov
  0 siblings, 0 replies; 18+ messages in thread
From: Borislav Petkov @ 2021-11-08 18:14 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: x86, Eric Biederman, kexec, Joerg Roedel, hpa, Andy Lutomirski,
	Dave Hansen, Peter Zijlstra, Jiri Slaby, Dan Williams,
	Tom Lendacky, Juergen Gross, Kees Cook, David Rientjes,
	Cfir Cohen, Erdem Aktas, Masami Hiramatsu, Mike Stunes,
	Sean Christopherson, Martin Radev, Arvind Sankar, linux-coco,
	linux-kernel, kvm, virtualization

On Mon, Sep 13, 2021 at 05:55:57PM +0200, Joerg Roedel wrote:
> From: Joerg Roedel <jroedel@suse.de>
> 
> Store the physical address of the AP Jump Table in kernel memory so
> that it does not need to be fetched from the Hypervisor again.
> 
> Signed-off-by: Joerg Roedel <jroedel@suse.de>
> ---
>  arch/x86/kernel/sev.c | 26 ++++++++++++++------------
>  1 file changed, 14 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index 5d3422e8b25e..eedba56b6bac 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -42,6 +42,9 @@ static struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
>   */
>  static struct ghcb __initdata *boot_ghcb;
>  
> +/* Cached AP Jump Table Address */
> +static phys_addr_t sev_es_jump_table_pa;

This is static, so "jump_table_pa" should be enough.

Also, to the prefixes, everything which is not SEV-ES only, should be
simply prefixed with "sev_" if externally visible.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 07/12] x86/sev: Setup code to park APs in the AP Jump Table
       [not found] ` <20210913155603.28383-8-joro@8bytes.org>
@ 2021-11-10 16:37   ` Borislav Petkov
  2022-01-26 14:26     ` Joerg Roedel
  0 siblings, 1 reply; 18+ messages in thread
From: Borislav Petkov @ 2021-11-10 16:37 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: x86, Eric Biederman, kexec, Joerg Roedel, hpa, Andy Lutomirski,
	Dave Hansen, Peter Zijlstra, Jiri Slaby, Dan Williams,
	Tom Lendacky, Juergen Gross, Kees Cook, David Rientjes,
	Cfir Cohen, Erdem Aktas, Masami Hiramatsu, Mike Stunes,
	Sean Christopherson, Martin Radev, Arvind Sankar, linux-coco,
	linux-kernel, kvm, virtualization

On Mon, Sep 13, 2021 at 05:55:58PM +0200, Joerg Roedel wrote:
> From: Joerg Roedel <jroedel@suse.de>
> 
> The AP Jump Table under SEV-ES contains the reset vector where non-boot
> CPUs start executing when coming out of reset. This means that a CPU
> coming out of the AP-reset-hold VMGEXIT also needs to start executing at
> the reset vector stored in the AP Jump Table.
> 
> The problem is to find a safe place to put the real-mode code which
> executes the VMGEXIT and jumps to the reset vector. The code can not be
> in kernel memory, because after kexec that memory is owned by the new
> kernel and the code might have been overwritten.
> 
> Fortunately the AP Jump Table itself is a safe place, because the
> memory is not owned by the OS and will not be overwritten by a new
> kernel started through kexec. The table is 4k in size and only the
> first 4 bytes are used for the reset vector. This leaves enough space
> for some 16-bit code to do the job and even a small stack.

"The AP jump table must be 4K in size, in encrypted memory and it must
be 4K (page) aligned. There can only be one AP jump table and it should
reside in memory that has been marked as reserved by UEFI."

I think we need to state in the spec that some of that space can be used
by the OS so that future changes to the spec do not cause trouble.

> Install 16-bit code into the AP Jump Table under SEV-ES after the APs
> have been brought up. The code will do an AP-reset-hold VMGEXIT and jump
> to the reset vector after being woken up.
> 
> Signed-off-by: Joerg Roedel <jroedel@suse.de>
> ---
>  arch/x86/include/asm/realmode.h         |   2 +
>  arch/x86/include/asm/sev-ap-jumptable.h |  25 +++++
>  arch/x86/kernel/sev.c                   | 105 +++++++++++++++++++
>  arch/x86/realmode/Makefile              |   9 +-
>  arch/x86/realmode/rmpiggy.S             |   6 ++
>  arch/x86/realmode/sev/Makefile          |  41 ++++++++
>  arch/x86/realmode/sev/ap_jump_table.S   | 130 ++++++++++++++++++++++++
>  arch/x86/realmode/sev/ap_jump_table.lds |  24 +++++
>  8 files changed, 341 insertions(+), 1 deletion(-)
>  create mode 100644 arch/x86/include/asm/sev-ap-jumptable.h
>  create mode 100644 arch/x86/realmode/sev/Makefile
>  create mode 100644 arch/x86/realmode/sev/ap_jump_table.S
>  create mode 100644 arch/x86/realmode/sev/ap_jump_table.lds
> 
> diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
> index 5db5d083c873..29590a4ddf24 100644
> --- a/arch/x86/include/asm/realmode.h
> +++ b/arch/x86/include/asm/realmode.h
> @@ -62,6 +62,8 @@ extern unsigned long initial_gs;
>  extern unsigned long initial_stack;
>  #ifdef CONFIG_AMD_MEM_ENCRYPT
>  extern unsigned long initial_vc_handler;
> +extern unsigned char rm_ap_jump_table_blob[];
> +extern unsigned char rm_ap_jump_table_blob_end[];
>  #endif
>  
>  extern unsigned char real_mode_blob[];
> diff --git a/arch/x86/include/asm/sev-ap-jumptable.h b/arch/x86/include/asm/sev-ap-jumptable.h
> new file mode 100644
> index 000000000000..1c8b2ce779e2
> --- /dev/null
> +++ b/arch/x86/include/asm/sev-ap-jumptable.h

Why a separate header? arch/x86/include/asm/sev.h looks small enough.

> @@ -0,0 +1,25 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * AMD Encrypted Register State Support
> + *
> + * Author: Joerg Roedel <jroedel@suse.de>
> + */
> +#ifndef __ASM_SEV_AP_JUMPTABLE_H
> +#define __ASM_SEV_AP_JUMPTABLE_H
> +
> +#define	SEV_APJT_CS16	0x8
> +#define	SEV_APJT_DS16	0x10
> +
> +#define SEV_APJT_ENTRY	0x10
> +
> +#ifndef __ASSEMBLY__
> +
> +struct sev_ap_jump_table_header {
> +	u16	reset_ip;
> +	u16	reset_cs;
> +	u16	gdt_offset;

I guess you should state that the first two members are as the spec
mandates and cannot be moved around or changed or so.

Also, this gdt_offset thing looks like it wants to be ap_jumptable_gdt,
no?

> +};
> +
> +#endif /* !__ASSEMBLY__ */
> +
> +#endif /* __ASM_SEV_AP_JUMPTABLE_H */
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index eedba56b6bac..a98eab926682 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -19,6 +19,7 @@
>  #include <linux/kernel.h>
>  #include <linux/mm.h>
>  
> +#include <asm/sev-ap-jumptable.h>
>  #include <asm/cpu_entry_area.h>
>  #include <asm/stacktrace.h>
>  #include <asm/sev.h>
> @@ -45,6 +46,9 @@ static struct ghcb __initdata *boot_ghcb;
>  /* Cached AP Jump Table Address */
>  static phys_addr_t sev_es_jump_table_pa;
>  
> +/* Whether the AP Jump Table blob was successfully installed */
> +static bool sev_ap_jumptable_blob_installed __ro_after_init;
> +
>  /* #VC handler runtime per-CPU data */
>  struct sev_es_runtime_data {
>  	struct ghcb ghcb_page;
> @@ -749,6 +753,107 @@ static void __init sev_es_setup_play_dead(void)
>  static inline void sev_es_setup_play_dead(void) { }
>  #endif
>  
> +/*
> + * This function make the necessary runtime changes to the AP Jump Table blob.

s/This function make/Make/

Ditto for the other "This function" below.

> + * For now this only sets up the GDT used while the code executes. The GDT needs
> + * to contain 16-bit code and data segments with a base that points to AP Jump
> + * Table page.
> + */
> +void __init sev_es_setup_ap_jump_table_data(void *base, u32 pa)

Why is this a separate function?

It is all part of the jump table setup.

> +	struct sev_ap_jump_table_header *header;
> +	struct desc_ptr *gdt_descr;
> +	u64 *ap_jumptable_gdt;
> +
> +	header = base;
> +
> +	/*
> +	 * Setup 16-bit protected mode code and data segments for AP Jumptable.
> +	 * Set the segment limits to 0xffff to already be compatible with
> +	 * real-mode.
> +	 */
> +	ap_jumptable_gdt = (u64 *)(base + header->gdt_offset);
> +	ap_jumptable_gdt[SEV_APJT_CS16 / 8] = GDT_ENTRY(0x9b, pa, 0xffff);
> +	ap_jumptable_gdt[SEV_APJT_DS16 / 8] = GDT_ENTRY(0x93, pa, 0xffff);
> +
> +	/* Write correct GDT base address into GDT descriptor */
> +	gdt_descr = (struct desc_ptr *)(base + header->gdt_offset);
> +	gdt_descr->address += pa;
> +}
> +
> +/*
> + * This function sets up the AP Jump Table blob which contains code which runs
> + * in 16-bit protected mode to park an AP. After the AP is woken up again the
> + * code will disable protected mode and jump to the reset vector which is also
> + * stored in the AP Jump Table.
> + *
> + * The Jump Table is a safe place to park an AP, because it is owned by the
> + * BIOS and writable by the OS. Putting the code in kernel memory would break
> + * with kexec, because by the time th APs wake up the memory is owned by

				     the

> + * the new kernel, and possibly already overwritten.
> + *
> + * Kexec is also the reason this function is called as an init-call after SMP

s/called as //

> + * bringup. Only after all CPUs are up there is a guarantee that no AP is still
> + * parked in AP jump-table code.
> + */
> +static int __init sev_es_setup_ap_jump_table_blob(void)

Everywhere: use prefix sev_ pls. IOW:

		sev_setup_ap_jump_table()

plain and simple.

> +{
> +	size_t blob_size = rm_ap_jump_table_blob_end - rm_ap_jump_table_blob;
> +	u16 startup_cs, startup_ip;
> +	u16 __iomem *jump_table;
> +	phys_addr_t pa;
> +
> +	if (!sev_es_active())

	if (!cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))

> +		return 0;
> +
> +	if (sev_get_ghcb_proto_ver() < 2) {
> +		pr_info("AP Jump Table parking requires at least GHCB protocol version 2\n");

Not pr_warn?

Also, can we drop everywhere this first-letter capitalized spelling?

		AP jump table parking...

is ok already.

> +		return 0;

Why are you returning 0 here and below?

> +	}
> +
> +	pa = get_jump_table_addr();
> +
> +	/* Overflow and size checks for untrusted Jump Table address */

	/* Check overflow and size...

> +	if (pa + PAGE_SIZE < pa || pa + PAGE_SIZE > SZ_4G) {
> +		pr_info("AP Jump Table is above 4GB - not enabling AP Jump Table parking\n");

That error message needs to say about the overflow too.

> +		return 0;
> +	}
> +
> +	/* On UP guests there is no jump table so this is not a failure */
> +	if (!pa)
> +		return 0;

So this check needs to happen right after the get_ call.

> +
> +	jump_table = ioremap_encrypted(pa, PAGE_SIZE);
> +	if (WARN_ON(!jump_table))

> +		return -EINVAL;
> +
> +	/*
> +	 * Safe reset vector to restore it later because the blob will

	   Save...

> +	 * overwrite it.
> +	 */
> +	startup_ip = jump_table[0];
> +	startup_cs = jump_table[1];
> +
> +	/* Install AP Jump Table Blob with real mode AP parking code */
> +	memcpy_toio(jump_table, rm_ap_jump_table_blob, blob_size);
> +
> +	/* Setup AP Jumptable GDT */
> +	sev_es_setup_ap_jump_table_data(jump_table, (u32)pa);
> +
> +	writew(startup_ip, &jump_table[0]);
> +	writew(startup_cs, &jump_table[1]);
> +
> +	iounmap(jump_table);
> +
> +	pr_info("AP Jump Table Blob successfully set up\n");
> +
> +	/* Mark AP Jump Table blob as available */
> +	sev_ap_jumptable_blob_installed = true;

I don't like those random boolean variables all over the place but at
least it is static.

> +
> +	return 0;
> +}
> +core_initcall(sev_es_setup_ap_jump_table_blob);
> +
>  static void __init alloc_runtime_data(int cpu)
>  {
>  	struct sev_es_runtime_data *data;
> diff --git a/arch/x86/realmode/Makefile b/arch/x86/realmode/Makefile
> index a0b491ae2de8..00f3cceb9580 100644
> --- a/arch/x86/realmode/Makefile
> +++ b/arch/x86/realmode/Makefile
> @@ -11,12 +11,19 @@
>  KASAN_SANITIZE			:= n
>  KCSAN_SANITIZE			:= n
>  
> +RMPIGGY-y				 = $(obj)/rm/realmode.bin
> +RMPIGGY-$(CONFIG_AMD_MEM_ENCRYPT)	+= $(obj)/sev/ap_jump_table.bin
> +
>  subdir- := rm
> +subdir- := sev
>  
>  obj-y += init.o
>  obj-y += rmpiggy.o
>  
> -$(obj)/rmpiggy.o: $(obj)/rm/realmode.bin
> +$(obj)/rmpiggy.o: $(RMPIGGY-y)
>  
>  $(obj)/rm/realmode.bin: FORCE
>  	$(Q)$(MAKE) $(build)=$(obj)/rm $@
> +
> +$(obj)/sev/ap_jump_table.bin: FORCE
> +	$(Q)$(MAKE) $(build)=$(obj)/sev $@
> diff --git a/arch/x86/realmode/rmpiggy.S b/arch/x86/realmode/rmpiggy.S
> index c8fef76743f6..a659f98617ff 100644
> --- a/arch/x86/realmode/rmpiggy.S
> +++ b/arch/x86/realmode/rmpiggy.S
> @@ -17,3 +17,9 @@ SYM_DATA_END_LABEL(real_mode_blob, SYM_L_GLOBAL, real_mode_blob_end)
>  SYM_DATA_START(real_mode_relocs)
>  	.incbin	"arch/x86/realmode/rm/realmode.relocs"
>  SYM_DATA_END(real_mode_relocs)
> +
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +SYM_DATA_START(rm_ap_jump_table_blob)
> +	.incbin "arch/x86/realmode/sev/ap_jump_table.bin"
> +SYM_DATA_END_LABEL(rm_ap_jump_table_blob, SYM_L_GLOBAL, rm_ap_jump_table_blob_end)
> +#endif
> diff --git a/arch/x86/realmode/sev/Makefile b/arch/x86/realmode/sev/Makefile
> new file mode 100644
> index 000000000000..5a96a518ccb3
> --- /dev/null
> +++ b/arch/x86/realmode/sev/Makefile
> @@ -0,0 +1,41 @@

<--- # SPDX-License-Identifier: GPL-2.0

We don't do that GPL text anymore.

> +#
> +# arch/x86/sev/Makefile
> +#
> +# This file is subject to the terms and conditions of the GNU General Public
> +# License.  See the file "COPYING" in the main directory of this archive
> +# for more details.
> +#
> +
> +# Sanitizer runtimes are unavailable and cannot be linked here.
> +KASAN_SANITIZE			:= n
> +KCSAN_SANITIZE			:= n
> +OBJECT_FILES_NON_STANDARD	:= y
> +
> +# Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
> +KCOV_INSTRUMENT		:= n
> +
> +always-y := ap_jump_table.bin
> +
> +ap_jump_table-y				+= ap_jump_table.o

The vertical alignment of those is kinda random. Please unify.

> +
> +targets	+= $(ap_jump_table-y)
> +
> +APJUMPTABLE_OBJS = $(addprefix $(obj)/,$(ap_jump_table-y))
> +
> +LDFLAGS_ap_jump_table.elf := -m elf_i386 -T
> +
> +targets += ap_jump_table.elf
> +$(obj)/ap_jump_table.elf: $(obj)/ap_jump_table.lds $(APJUMPTABLE_OBJS) FORCE
> +	$(call if_changed,ld)
> +
> +OBJCOPYFLAGS_ap_jump_table.bin := -O binary
> +
> +targets += ap_jump_table.bin
> +$(obj)/ap_jump_table.bin: $(obj)/ap_jump_table.elf FORCE
> +	$(call if_changed,objcopy)
> +
> +# ---------------------------------------------------------------------------
> +
> +KBUILD_AFLAGS	:= $(REALMODE_CFLAGS) -D__ASSEMBLY__
> +GCOV_PROFILE := n
> +UBSAN_SANITIZE := n
> diff --git a/arch/x86/realmode/sev/ap_jump_table.S b/arch/x86/realmode/sev/ap_jump_table.S
> new file mode 100644
> index 000000000000..547cb363bb94
> --- /dev/null
> +++ b/arch/x86/realmode/sev/ap_jump_table.S
> @@ -0,0 +1,130 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#include <linux/linkage.h>
> +#include <asm/sev-ap-jumptable.h>
> +
> +/*
> + * This file contains the source code for the binary blob which gets copied to
> + * the SEV-ES AP Jumptable to park APs while offlining CPUs or booting a new

I've seen "Jumptable", "Jump Table" and "jump table" at least. I'd say, do
the last one everywhere pls.

> + * kernel via KEXEC.
> + *
> + * The AP Jumptable is the only safe place to put this code, as any memory the
> + * kernel allocates will be owned (and possibly overwritten) by the new kernel
> + * once the APs are woken up.
> + *
> + * This code runs in 16-bit protected mode, the CS, DS, and SS segment bases are
> + * set to the beginning of the AP Jumptable page.
> + *
> + * Since the GDT will also be gone when the AP wakes up, this blob contains its
> + * own GDT, which is set up by the AP Jumptable setup code with the correct
> + * offsets.
> + *
> + * Author: Joerg Roedel <jroedel@suse.de>
> + */
> +
> +	.text
> +	.org 0x0
> +	.code16
> +SYM_DATA_START(ap_jumptable_header)
> +	.word	0			/* reset IP */
> +	.word	0			/* reset CS */
> +	.word	ap_jumptable_gdt	/* GDT Offset   */
> +SYM_DATA_END(ap_jumptable_header)
> +
> +	.org	SEV_APJT_ENTRY

So this hardcodes the fact that the first 16 bytes are header and the
rest is free game. I think the spec needs to play along here...

> +SYM_CODE_START(ap_park_asm)

This whole file is asm. I guess simply "ap_park" is enough.

> +
> +	/* Switch to AP Jumptable GDT first */
> +	lgdtl	ap_jumptable_gdt
> +
> +	/* Reload CS */
> +	ljmpw	$SEV_APJT_CS16, $1f
> +1:
> +
> +	/* Reload DS and SS */
> +	movl	$SEV_APJT_DS16, %ecx
> +	movl	%ecx, %ds
> +	movl	%ecx, %ss
> +
> +	/*
> +	 * Setup a stack pointing to the end of the AP Jumptable page.
> +	 * The stack is needed ot reset EFLAGS after wakeup.

s/ot/to/

> +	 */
> +	movl	$0x1000, %esp
> +
> +	/* Execute AP reset hold VMGEXIT */
> +2:	xorl	%edx, %edx
> +	movl	$0x6, %eax
> +	movl	$0xc0010130, %ecx

MSR_AMD64_SEV_ES_GHCB

> +	wrmsr
> +	rep; vmmcall
> +	rdmsr
> +	movl	%eax, %ecx
> +	andl	$0xfff, %ecx
> +	cmpl	$0x7, %ecx
> +	jne	2b
> +	shrl	$12, %eax
> +	jnz	3f
> +	testl	%edx, %edx
> +	jnz	3f
> +	jmp	2b

You usually document your asm pretty nicely but those after the RDMSR
are a bit lacking...

> +3:
> +	/*
> +	 * Successfully woken up - Patch the correct target into the far jump at

				   patch

> +	 * the end. An indirect far jump does not work here, because at the time
> +	 * the jump is executed DS is already loaded with real-mode values.
> +	 */
> +
> +	/* Jump target is at address 0x0 - copy it to the far jump instruction */
> +	movl	$0, %ecx
> +	movl	(%ecx), %eax
> +	movl	%eax, jump_target
> +
> +	/* Reset EFLAGS */
> +	pushl	$2

I'm assuming that two is bit 1 in rFLAGS which is always 1? Comment pls.

> +	popfl
> +
> +	/* Setup DS and SS for real-mode */
> +	movl	$0x18, %ecx
> +	movl	%ecx, %ds
> +	movl	%ecx, %ss
> +
> +	/* Reset remaining registers */
> +	movl	$0, %esp
> +	movl	$0, %eax
> +	movl	$0, %ebx
> +	movl	$0, %edx

All 4: use xor

> +
> +	/* Reset CR0 to get out of protected mode */
> +	movl	$0x60000010, %ecx

Another magic naked number.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 08/12] x86/sev: Park APs on AP Jump Table with GHCB protocol version 2
       [not found] ` <20210913155603.28383-9-joro@8bytes.org>
@ 2021-11-12 16:33   ` Borislav Petkov
  2022-01-27  9:01     ` Joerg Roedel
  0 siblings, 1 reply; 18+ messages in thread
From: Borislav Petkov @ 2021-11-12 16:33 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: x86, Eric Biederman, kexec, Joerg Roedel, hpa, Andy Lutomirski,
	Dave Hansen, Peter Zijlstra, Jiri Slaby, Dan Williams,
	Tom Lendacky, Juergen Gross, Kees Cook, David Rientjes,
	Cfir Cohen, Erdem Aktas, Masami Hiramatsu, Mike Stunes,
	Sean Christopherson, Martin Radev, Arvind Sankar, linux-coco,
	linux-kernel, kvm, virtualization

On Mon, Sep 13, 2021 at 05:55:59PM +0200, Joerg Roedel wrote:
> From: Joerg Roedel <jroedel@suse.de>
> 
> GHCB protocol version 2 adds the MSR-based AP-reset-hold VMGEXIT which
> does not need a GHCB. Use that to park APs in 16-bit protected mode on
> the AP Jump Table.
> 
> Signed-off-by: Joerg Roedel <jroedel@suse.de>
> ---
>  arch/x86/include/asm/realmode.h    |  3 +
>  arch/x86/kernel/sev.c              | 48 ++++++++++++++--
>  arch/x86/realmode/rm/Makefile      | 11 ++--
>  arch/x86/realmode/rm/header.S      |  3 +
>  arch/x86/realmode/rm/sev_ap_park.S | 89 ++++++++++++++++++++++++++++++
>  5 files changed, 144 insertions(+), 10 deletions(-)
>  create mode 100644 arch/x86/realmode/rm/sev_ap_park.S
> 
> diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
> index 29590a4ddf24..668de0a8b1ae 100644
> --- a/arch/x86/include/asm/realmode.h
> +++ b/arch/x86/include/asm/realmode.h
> @@ -23,6 +23,9 @@ struct real_mode_header {
>  	u32	trampoline_header;
>  #ifdef CONFIG_AMD_MEM_ENCRYPT
>  	u32	sev_es_trampoline_start;
> +	u32	sev_real_ap_park_asm;

sev_ap_park;

> +	u32	sev_real_ap_park_seg;

sev_ap_park_seg;

> +	u32	sev_ap_park_gdt;

Yap, like thist one.

>  #endif
>  #ifdef CONFIG_X86_64
>  	u32	trampoline_pgd;
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index a98eab926682..20b439986d86 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -27,6 +27,7 @@
>  #include <asm/fpu/internal.h>
>  #include <asm/processor.h>
>  #include <asm/realmode.h>
> +#include <asm/tlbflush.h>
>  #include <asm/traps.h>
>  #include <asm/svm.h>
>  #include <asm/smp.h>
> @@ -695,6 +696,35 @@ static bool __init sev_es_setup_ghcb(void)
>  }
>  
>  #ifdef CONFIG_HOTPLUG_CPU
> +void __noreturn sev_jumptable_ap_park(void)
> +{
> +	local_irq_disable();
> +
> +	write_cr3(real_mode_header->trampoline_pgd);
> +
> +	/* Exiting long mode will fail if CR4.PCIDE is set. */
> +	if (boot_cpu_has(X86_FEATURE_PCID))

cpu_feature_enabled() is what we use everywhere now.

> +		cr4_clear_bits(X86_CR4_PCIDE);
> +
> +	asm volatile("xorq	%%r15, %%r15\n"
> +		     "xorq	%%r14, %%r14\n"
> +		     "xorq	%%r13, %%r13\n"
> +		     "xorq	%%r12, %%r12\n"
> +		     "xorq	%%r11, %%r11\n"
> +		     "xorq	%%r10, %%r10\n"
> +		     "xorq	%%r9,  %%r9\n"
> +		     "xorq	%%r8,  %%r8\n"
> +		     "xorq	%%rsi, %%rsi\n"
> +		     "xorq	%%rdi, %%rdi\n"
> +		     "xorq	%%rsp, %%rsp\n"
> +		     "xorq	%%rbp, %%rbp\n"

Use xorl and the 32-bit regs is enough - zero extension.

> +		     "ljmpl	*%0" : :
> +		     "m" (real_mode_header->sev_real_ap_park_asm),
> +		     "b" (sev_es_jump_table_pa >> 4));

In any case, this asm needs comments: why those regs, why
sev_es_jump_table_pa >> 4 in rbx (I found later in the patch why) and so
on.

> diff --git a/arch/x86/realmode/rm/header.S b/arch/x86/realmode/rm/header.S
> index 8c1db5bf5d78..6c17f8fd1eb4 100644
> --- a/arch/x86/realmode/rm/header.S
> +++ b/arch/x86/realmode/rm/header.S
> @@ -22,6 +22,9 @@ SYM_DATA_START(real_mode_header)
>  	.long	pa_trampoline_header
>  #ifdef CONFIG_AMD_MEM_ENCRYPT
>  	.long	pa_sev_es_trampoline_start
> +	.long	pa_sev_ap_park_asm
> +	.long	__KERNEL32_CS
> +	.long	pa_sev_ap_park_gdt;
>  #endif
>  #ifdef CONFIG_X86_64
>  	.long	pa_trampoline_pgd;
> diff --git a/arch/x86/realmode/rm/sev_ap_park.S b/arch/x86/realmode/rm/sev_ap_park.S

arch/x86/realmode/rm/sev.S

is perfectly fine I guess.

> new file mode 100644
> index 000000000000..0b63d0569d4d
> --- /dev/null
> +++ b/arch/x86/realmode/rm/sev_ap_park.S
> @@ -0,0 +1,89 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#include <linux/linkage.h>
> +#include <asm/segment.h>
> +#include <asm/page_types.h>
> +#include <asm/processor-flags.h>
> +#include <asm/msr-index.h>
> +#include <asm/sev-ap-jumptable.h>
> +#include "realmode.h"
> +
> +	.section ".text32", "ax"
> +	.code32
> +/*

"This is executed by ... when ... "

> + * The following code switches to 16-bit protected mode and sets up the
> + * execution environment for the AP Jump Table blob. Then it jumps to the AP
> + * Jump Table to park the AP.
> + *
> + * The code was copied from reboot.S and modified to fit the SEV-ES requirements
> + * for AP parking.

That sentence belongs at most in the commit message.

> When this code is entered, all registers except %EAX-%EDX are

%eax, etc. Lowercase pls.

> + * in reset state.
> + *
> + * The AP Jump Table physical base address is in %EBX upon entry.
> + *
> + * %EAX, %ECX, %EDX and EFLAGS are undefined. Only use registers %EAX-%EDX and
> + * %ESP in this code.
> + */
> +SYM_CODE_START(sev_ap_park_asm)

sev_ap_park

> +
> +	/* Switch to trampoline GDT as it is guaranteed < 4 GiB */
> +	movl	$__KERNEL_DS, %eax
> +	movl	%eax, %ds
> +	lgdt	pa_tr_gdt
> +
> +	/* Disable paging to drop us out of long mode */
> +	movl	%cr0, %eax
> +	btcl	$X86_CR0_PG_BIT, %eax
> +	movl	%eax, %cr0
> +

	/* Start executing from 32-bit addresses or so, I guess...

> +	ljmpl	$__KERNEL32_CS, $pa_sev_ap_park_paging_off

Please add a comment also about those pa_ things because they look like
magic but they're sed-generated into arch/x86/realmode/rm/pasyms.h by
the Makefile in that same dir.

> +SYM_INNER_LABEL(sev_ap_park_paging_off, SYM_L_GLOBAL)

Global symbol but used only in this file. .L-prefix then?

> +	/* Clear EFER */
> +	movl	$0, %eax
> +	movl	$0, %edx

both:	xorl

> +	movl	$MSR_EFER, %ecx
> +	wrmsr
> +
> +	/* Clear CR3 */
> +	movl	$0, %ecx

ditto

> +	movl	%ecx, %cr3
> +
> +	/* Set up the IDT for real mode. */
> +	lidtl	pa_machine_real_restart_idt
> +
> +	/*
> +	 * Load the GDT with the 16-bit segments for the AP Jump Table
> +	 */

	/* Load the GDT with the 16-bit segments for the AP Jump Table  */

works too.

> +	lgdtl	pa_sev_ap_park_gdt
> +
> +	/* Setup Code and Data segments for AP Jump Table */

	... code and data segments ...

you have been reading too much vendor text where they love to capitalize
everything.

> +	movw	$SEV_APJT_DS16, %ax
> +	movw	%ax, %ds
> +	movw	%ax, %ss
> +
> +	/* Jump to the AP Jump Table into 16 bit protected mode */
> +	ljmpw	$SEV_APJT_CS16, $SEV_APJT_ENTRY
> +SYM_CODE_END(sev_ap_park_asm)
> +
> +	.data
> +	.balign	16
> +SYM_DATA_START(sev_ap_park_gdt)
> +	/* Self-pointer */
> +	.word	sev_ap_park_gdt_end - sev_ap_park_gdt - 1
> +	.long	pa_sev_ap_park_gdt
> +	.word	0
> +
> +	/*
> +	 * Offset 0x8
> +	 * 32 bit code segment descriptor pointing to AP Jump table base
> +	 * Setup at runtime in sev_es_setup_ap_jump_table_data().
> +	 */
> +	.quad	0
> +
> +	/*
> +	 * Offset 0x10
> +	 * 32 bit data segment descriptor pointing to AP Jump table base
> +	 * Setup at runtime in sev_es_setup_ap_jump_table_data().
> +	 */
> +	.quad	0
> +SYM_DATA_END_LABEL(sev_ap_park_gdt, SYM_L_GLOBAL, sev_ap_park_gdt_end)
> -- 
> 2.33.0
> 

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 09/12] x86/sev: Use AP Jump Table blob to stop CPU
       [not found] ` <20210913155603.28383-10-joro@8bytes.org>
@ 2021-11-15 18:44   ` Borislav Petkov
  0 siblings, 0 replies; 18+ messages in thread
From: Borislav Petkov @ 2021-11-15 18:44 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: x86, Eric Biederman, kexec, Joerg Roedel, hpa, Andy Lutomirski,
	Dave Hansen, Peter Zijlstra, Jiri Slaby, Dan Williams,
	Tom Lendacky, Juergen Gross, Kees Cook, David Rientjes,
	Cfir Cohen, Erdem Aktas, Masami Hiramatsu, Mike Stunes,
	Sean Christopherson, Martin Radev, Arvind Sankar, linux-coco,
	linux-kernel, kvm, virtualization

On Mon, Sep 13, 2021 at 05:56:00PM +0200, Joerg Roedel wrote:
> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> index 134a7c9d91b6..cd14b6e10f12 100644
> --- a/arch/x86/include/asm/sev.h
> +++ b/arch/x86/include/asm/sev.h
> @@ -81,12 +81,19 @@ static __always_inline void sev_es_nmi_complete(void)
>  		__sev_es_nmi_complete();
>  }
>  extern int __init sev_es_efi_map_ghcbs(pgd_t *pgd);
> +void __sev_es_stop_this_cpu(void);
> +static __always_inline void sev_es_stop_this_cpu(void)

What's that for?

IOW, the below seems to build too:

---
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 1f16fc907636..398105580862 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -87,12 +87,7 @@ extern enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
 					  struct es_em_ctxt *ctxt,
 					  u64 exit_code, u64 exit_info_1,
 					  u64 exit_info_2);
-void __sev_es_stop_this_cpu(void);
-static __always_inline void sev_es_stop_this_cpu(void)
-{
-	if (static_branch_unlikely(&sev_es_enable_key))
-		__sev_es_stop_this_cpu();
-}
+void sev_es_stop_this_cpu(void);
 #else
 static inline void sev_es_ist_enter(struct pt_regs *regs) { }
 static inline void sev_es_ist_exit(void) { }
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 39378357dc5a..7a74b3273f1a 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -694,8 +694,11 @@ void __noreturn sev_jumptable_ap_park(void)
 }
 STACK_FRAME_NON_STANDARD(sev_jumptable_ap_park);
 
-void __sev_es_stop_this_cpu(void)
+void sev_es_stop_this_cpu(void)
 {
+	if (!static_branch_unlikely(&sev_es_enable_key))
+		return;
+
 	/* Only park in the AP Jump Table when the code has been installed */
 	if (!sev_ap_jumptable_blob_installed)
 		return;

---

And as previously mentioned s/sev_es/sev/ if those are going to be used
on SNP guests too.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 03/12] x86/sev: Save and print negotiated GHCB protocol version
  2021-11-03 14:27   ` [PATCH v2 03/12] x86/sev: Save and print negotiated GHCB protocol version Borislav Petkov
@ 2022-01-26  9:27     ` Joerg Roedel
  0 siblings, 0 replies; 18+ messages in thread
From: Joerg Roedel @ 2022-01-26  9:27 UTC (permalink / raw)
  To: kexec

On Wed, Nov 03, 2021 at 03:27:23PM +0100, Borislav Petkov wrote:
> On Mon, Sep 13, 2021 at 05:55:54PM +0200, Joerg Roedel wrote:
> > From: Joerg Roedel <jroedel@suse.de>
> > 
> > Save the results of the GHCB protocol negotiation into a data structure
> > and print information about versions supported and used to the kernel
> > log.
> 
> Which is useful for?

For easier debugging, I added a sentence about that to the changelog.

> > +struct sev_ghcb_protocol_info {
> 
> Too long a name - ghcb_info is perfectly fine.

Changed, thanks.

-- 
J?rg R?del
jroedel at suse.de

SUSE Software Solutions Germany GmbH
Maxfeldstr. 5
90409 N?rnberg
Germany
 
(HRB 36809, AG N?rnberg)
Gesch?ftsf?hrer: Ivo Totev



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 07/12] x86/sev: Setup code to park APs in the AP Jump Table
  2021-11-10 16:37   ` [PATCH v2 07/12] x86/sev: Setup code to park APs in the AP Jump Table Borislav Petkov
@ 2022-01-26 14:26     ` Joerg Roedel
  0 siblings, 0 replies; 18+ messages in thread
From: Joerg Roedel @ 2022-01-26 14:26 UTC (permalink / raw)
  To: kexec

On Wed, Nov 10, 2021 at 05:37:32PM +0100, Borislav Petkov wrote:
> On Mon, Sep 13, 2021 at 05:55:58PM +0200, Joerg Roedel wrote:
> >  extern unsigned char real_mode_blob[];
> > diff --git a/arch/x86/include/asm/sev-ap-jumptable.h b/arch/x86/include/asm/sev-ap-jumptable.h
> > new file mode 100644
> > index 000000000000..1c8b2ce779e2
> > --- /dev/null
> > +++ b/arch/x86/include/asm/sev-ap-jumptable.h
> 
> Why a separate header? arch/x86/include/asm/sev.h looks small enough.

The header is included in assembly, so I made a separate one.

> > +void __init sev_es_setup_ap_jump_table_data(void *base, u32 pa)
> 
> Why is this a separate function?
> 
> It is all part of the jump table setup.

Right, but the sev_es_setup_ap_jump_table_blob() function is already
pretty big and I wanted to keep things readable.

> 
> > +		return 0;
> 
> Why are you returning 0 here and below?

This is in an initcall and it returns just 0 when the environment is not
ready to setup the ap jump table. Returning non-zero would create a
warning message in the caller for something that is not a bug in the
kernel.

> > + * This file contains the source code for the binary blob which gets copied to
> > + * the SEV-ES AP Jumptable to park APs while offlining CPUs or booting a new
> 
> I've seen "Jumptable", "Jump Table" and "jump table" at least. I'd say, do
> the last one everywhere pls.

Fair, sorry for my english being too german :) I changed everything to
'jump table'.

> > +	/* Reset remaining registers */
> > +	movl	$0, %esp
> > +	movl	$0, %eax
> > +	movl	$0, %ebx
> > +	movl	$0, %edx
> 
> All 4: use xor

XOR changes EFLAGS, can not use them here.

> > +
> > +	/* Reset CR0 to get out of protected mode */
> > +	movl	$0x60000010, %ecx
> 
> Another magic naked number.

This is the CR0 reset value. I have updated the comment to make this
more clear.

Thanks,

-- 
J?rg R?del
jroedel at suse.de

SUSE Software Solutions Germany GmbH
Maxfeldstr. 5
90409 N?rnberg
Germany
 
(HRB 36809, AG N?rnberg)
Gesch?ftsf?hrer: Ivo Totev



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 08/12] x86/sev: Park APs on AP Jump Table with GHCB protocol version 2
  2021-11-12 16:33   ` [PATCH v2 08/12] x86/sev: Park APs on AP Jump Table with GHCB protocol version 2 Borislav Petkov
@ 2022-01-27  9:01     ` Joerg Roedel
  0 siblings, 0 replies; 18+ messages in thread
From: Joerg Roedel @ 2022-01-27  9:01 UTC (permalink / raw)
  To: kexec

On Fri, Nov 12, 2021 at 05:33:05PM +0100, Borislav Petkov wrote:
> On Mon, Sep 13, 2021 at 05:55:59PM +0200, Joerg Roedel wrote:
> > +		     "ljmpl	*%0" : :
> > +		     "m" (real_mode_header->sev_real_ap_park_asm),
> > +		     "b" (sev_es_jump_table_pa >> 4));
> 
> In any case, this asm needs comments: why those regs, why
> sev_es_jump_table_pa >> 4 in rbx (I found later in the patch why) and so
> on.

Turned out the jump_table_pa is not used in asm code anymore. It was a
left-over from a previous version of the patch, it is removed now.

> > +SYM_INNER_LABEL(sev_ap_park_paging_off, SYM_L_GLOBAL)
> 
> Global symbol but used only in this file. .L-prefix then?

It needs to be a global symbol so the pa_ variant can be generated.

Regards,

-- 
J?rg R?del
jroedel at suse.de

SUSE Software Solutions Germany GmbH
Maxfeldstr. 5
90409 N?rnberg
Germany
 
(HRB 36809, AG N?rnberg)
Gesch?ftsf?hrer: Ivo Totev



^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2022-01-27  9:01 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20210913155603.28383-1-joro@8bytes.org>
2021-09-13 16:02 ` [PATCH v2 00/12] x86/sev: KEXEC/KDUMP support for SEV-ES guests Dave Hansen
2021-09-13 16:14   ` Joerg Roedel
2021-09-13 16:21     ` Dave Hansen
     [not found] ` <20210913155603.28383-2-joro@8bytes.org>
2021-11-01 16:10   ` [PATCH v2 01/12] kexec: Allow architecture code to opt-out at runtime Borislav Petkov
2021-11-01 21:11     ` Eric W. Biederman
2021-11-02 16:37       ` Joerg Roedel
2021-11-02 17:00       ` Joerg Roedel
2021-11-02 18:17         ` Eric W. Biederman
2021-11-02 17:17       ` Borislav Petkov
     [not found] ` <20210913155603.28383-4-joro@8bytes.org>
2021-11-03 14:27   ` [PATCH v2 03/12] x86/sev: Save and print negotiated GHCB protocol version Borislav Petkov
2022-01-26  9:27     ` Joerg Roedel
     [not found] ` <20210913155603.28383-6-joro@8bytes.org>
2021-11-03 16:05   ` [PATCH v2 05/12] x86/sev: Use GHCB protocol version 2 if supported Borislav Petkov
     [not found] ` <20210913155603.28383-7-joro@8bytes.org>
2021-11-08 18:14   ` [PATCH v2 06/12] x86/sev: Cache AP Jump Table Address Borislav Petkov
     [not found] ` <20210913155603.28383-8-joro@8bytes.org>
2021-11-10 16:37   ` [PATCH v2 07/12] x86/sev: Setup code to park APs in the AP Jump Table Borislav Petkov
2022-01-26 14:26     ` Joerg Roedel
     [not found] ` <20210913155603.28383-9-joro@8bytes.org>
2021-11-12 16:33   ` [PATCH v2 08/12] x86/sev: Park APs on AP Jump Table with GHCB protocol version 2 Borislav Petkov
2022-01-27  9:01     ` Joerg Roedel
     [not found] ` <20210913155603.28383-10-joro@8bytes.org>
2021-11-15 18:44   ` [PATCH v2 09/12] x86/sev: Use AP Jump Table blob to stop CPU Borislav Petkov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).