qemu-devel.nongnu.org archive mirror
From: Maxim Levitsky <mlevitsk@redhat.com>
To: Michael Roth <michael.roth@amd.com>
Cc: qemu-devel@nongnu.org, Paolo Bonzini <pbonzini@redhat.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	Tom Lendacky <thomas.lendacky@amd.com>,
	Akihiko Odaki <akihiko.odaki@daynix.com>,
	kvm@vger.kernel.org
Subject: Re: [PATCH v2 for-8.2?] i386/sev: Avoid SEV-ES crash due to missing MSR_EFER_LMA bit
Date: Fri, 08 Dec 2023 17:20:08 +0200
Message-ID: <3663aa04f6c3002d47362f3877d96c0e18bc163e.camel@redhat.com>
In-Reply-To: <20231206174235.b7fwrqzko27of7qz@amd.com>

On Wed, 2023-12-06 at 11:42 -0600, Michael Roth wrote:
> On Wed, Dec 06, 2023 at 07:20:14PM +0200, Maxim Levitsky wrote:
> > On Tue, 2023-12-05 at 16:28 -0600, Michael Roth wrote:
> > > Commit 7191f24c7fcf ("accel/kvm/kvm-all: Handle register access errors")
> > > added error checking for KVM_SET_SREGS/KVM_SET_SREGS2. In doing so, it
> > > exposed a long-running bug in current KVM support for SEV-ES where the
> > > kernel assumes that MSR_EFER_LMA will be set explicitly by the guest
> > > kernel, in which case EFER write traps would result in KVM eventually
> > > seeing MSR_EFER_LMA get set and recording it in such a way that it would
> > > be subsequently visible when accessing it via KVM_GET_SREGS/etc.
> > > 
> > > However, guest kernels currently rely on MSR_EFER_LMA getting set
> > > automatically when MSR_EFER_LME is set and paging is enabled via
> > > CR0_PG_MASK. As a result, the EFER write traps don't actually expose the
> > > MSR_EFER_LMA even though it is set internally, and when QEMU
> > > subsequently tries to pass this EFER value back to KVM via
> > > KVM_SET_SREGS* it will fail various sanity checks and return -EINVAL,
> > > which is now considered fatal due to the aforementioned QEMU commit.
> > > 
> > > This can be addressed by inferring the MSR_EFER_LMA bit being set when
> > > paging is enabled and MSR_EFER_LME is set, and synthesizing it to ensure
> > > the expected bits are all present in subsequent handling on the host
> > > side.
> > > 
> > > Ultimately, this handling will be implemented in the host kernel, but to
> > > avoid breaking QEMU's SEV-ES support when using older host kernels, the
> > > same handling can be done in QEMU just after fetching the register
> > > values via KVM_GET_SREGS*. Implement that here.
> > > 
> > > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > > Cc: Marcelo Tosatti <mtosatti@redhat.com>
> > > Cc: Tom Lendacky <thomas.lendacky@amd.com>
> > > Cc: Akihiko Odaki <akihiko.odaki@daynix.com>
> > > Cc: kvm@vger.kernel.org
> > > Fixes: 7191f24c7fcf ("accel/kvm/kvm-all: Handle register access errors")
> > > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > > ---
> > > v2:
> > >   - Add handling for KVM_GET_SREGS, not just KVM_GET_SREGS2
> > > 
> > >  target/i386/kvm/kvm.c | 14 ++++++++++++++
> > >  1 file changed, 14 insertions(+)
> > > 
> > > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> > > index 11b8177eff..8721c1bf8f 100644
> > > --- a/target/i386/kvm/kvm.c
> > > +++ b/target/i386/kvm/kvm.c
> > > @@ -3610,6 +3610,7 @@ static int kvm_get_sregs(X86CPU *cpu)
> > >  {
> > >      CPUX86State *env = &cpu->env;
> > >      struct kvm_sregs sregs;
> > > +    target_ulong cr0_old;
> > >      int ret;
> > >  
> > >      ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_SREGS, &sregs);
> > > @@ -3637,12 +3638,18 @@ static int kvm_get_sregs(X86CPU *cpu)
> > >      env->gdt.limit = sregs.gdt.limit;
> > >      env->gdt.base = sregs.gdt.base;
> > >  
> > > +    cr0_old = env->cr[0];
> > >      env->cr[0] = sregs.cr0;
> > >      env->cr[2] = sregs.cr2;
> > >      env->cr[3] = sregs.cr3;
> > >      env->cr[4] = sregs.cr4;
> > >  
> > >      env->efer = sregs.efer;
> > > +    if (sev_es_enabled() && env->efer & MSR_EFER_LME) {
> > > +        if (!(cr0_old & CR0_PG_MASK) && env->cr[0] & CR0_PG_MASK) {
> > > +            env->efer |= MSR_EFER_LMA;
> > > +        }
> > > +    }
> > 
> > I think that we should not check whether CR0.PG has changed, and should
> > instead just assume that if EFER.LME is set and CR0.PG is set, then
> > EFER.LMA must be set, as defined in the x86 spec.
> > 
> > Otherwise, suppose QEMU calls kvm_get_sregs twice: the first time it will
> > work, but the second time CR0.PG will match the value already stored in
> > env, so the workaround will not be executed, and we will instead revert
> > to the wrong EFER value reported by the kernel.
> > 
> > How about something like this:
> > 
> > 
> > if (sev_es_enabled() && (env->efer & MSR_EFER_LME) &&
> >     (env->cr[0] & CR0_PG_MASK)) {
> >     /*
> >      * Work around a KVM bug that can leave KVM unaware that
> >      * EFER.LMA was toggled by the hardware.
> >      */
> >     env->efer |= MSR_EFER_LMA;
> > }
> 
> Hi Maxim,
> 
> I'd already sent a v3 based on a similar suggestion from Paolo:
> 
>   https://lists.gnu.org/archive/html/qemu-devel/2023-12/msg00751.html
> 
> Does that one look okay to you?

Yep, thanks!

Best regards,
	Maxim Levitsky
> 
> Thanks,
> 
> Mike
> 
> > 
> > Best regards,
> > 	Maxim Levitsky
> > 
> > >  
> > >      /* changes to apic base and cr8/tpr are read back via kvm_arch_post_run */
> > >      x86_update_hflags(env);
> > > @@ -3654,6 +3661,7 @@ static int kvm_get_sregs2(X86CPU *cpu)
> > >  {
> > >      CPUX86State *env = &cpu->env;
> > >      struct kvm_sregs2 sregs;
> > > +    target_ulong cr0_old;
> > >      int i, ret;
> > >  
> > >      ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_SREGS2, &sregs);
> > > @@ -3676,12 +3684,18 @@ static int kvm_get_sregs2(X86CPU *cpu)
> > >      env->gdt.limit = sregs.gdt.limit;
> > >      env->gdt.base = sregs.gdt.base;
> > >  
> > > +    cr0_old = env->cr[0];
> > >      env->cr[0] = sregs.cr0;
> > >      env->cr[2] = sregs.cr2;
> > >      env->cr[3] = sregs.cr3;
> > >      env->cr[4] = sregs.cr4;
> > >  
> > >      env->efer = sregs.efer;
> > > +    if (sev_es_enabled() && env->efer & MSR_EFER_LME) {
> > > +        if (!(cr0_old & CR0_PG_MASK) && env->cr[0] & CR0_PG_MASK) {
> > > +            env->efer |= MSR_EFER_LMA;
> > > +        }
> > > +    }
> > >  
> > >      env->pdptrs_valid = sregs.flags & KVM_SREGS2_FLAGS_PDPTRS_VALID;
> > >  
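
The difference between the v2 check and the check suggested in the review
can be illustrated with a small standalone model. This is only a sketch
under stated assumptions, not QEMU code: struct toy_env, fetch_edge(),
fetch_level() and lma_after_two_fetches() are hypothetical names, the
sev_es_enabled() test is left out, and the "kernel" is modelled as always
reporting CR0.PG and EFER.LME set with EFER.LMA missing. Only the bit
positions (EFER.LME = bit 8, EFER.LMA = bit 10, CR0.PG = bit 31) are
architectural.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural bit positions for the registers discussed in the thread. */
#define MSR_EFER_LME (1ULL << 8)
#define MSR_EFER_LMA (1ULL << 10)
#define CR0_PG_MASK  (1ULL << 31)

/* Hypothetical stand-in for the two CPUX86State fields involved. */
struct toy_env {
    uint64_t cr0;
    uint64_t efer;
};

/* v2 logic: synthesize LMA only when CR0.PG goes 0 -> 1 across this fetch. */
static void fetch_edge(struct toy_env *env, uint64_t kvm_cr0, uint64_t kvm_efer)
{
    assert(env != NULL);
    uint64_t cr0_old = env->cr0;

    env->cr0 = kvm_cr0;
    env->efer = kvm_efer;
    if ((env->efer & MSR_EFER_LME) &&
        !(cr0_old & CR0_PG_MASK) && (env->cr0 & CR0_PG_MASK)) {
        env->efer |= MSR_EFER_LMA;
    }
}

/* Suggested logic: synthesize LMA whenever LME and PG are both set. */
static void fetch_level(struct toy_env *env, uint64_t kvm_cr0, uint64_t kvm_efer)
{
    assert(env != NULL);
    env->cr0 = kvm_cr0;
    env->efer = kvm_efer;
    if ((env->efer & MSR_EFER_LME) && (env->cr0 & CR0_PG_MASK)) {
        env->efer |= MSR_EFER_LMA;
    }
}

/*
 * Fetch the same kernel-reported state twice, starting from a reset env,
 * and report whether EFER.LMA is still set after the second fetch.
 */
static bool lma_after_two_fetches(bool level)
{
    struct toy_env env = { .cr0 = 0, .efer = 0 };
    const uint64_t kvm_cr0 = CR0_PG_MASK;    /* paging enabled */
    const uint64_t kvm_efer = MSR_EFER_LME;  /* LMA not reported */

    for (int i = 0; i < 2; i++) {
        if (level) {
            fetch_level(&env, kvm_cr0, kvm_efer);
        } else {
            fetch_edge(&env, kvm_cr0, kvm_efer);
        }
    }
    return (env.efer & MSR_EFER_LMA) != 0;
}
```

In this model, lma_after_two_fetches(false) returns false, because the
second fetch sees CR0.PG already set in env and overwrites EFER with the
kernel's value, while lma_after_two_fetches(true) returns true because the
level-based check does not depend on the previous env contents.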





