Date: Thu, 26 Aug 2021 16:23:29 +0000
From: Sean Christopherson
To: Maxim Levitsky
Cc: kvm@vger.kernel.org, Thomas Gleixner, Wanpeng Li, Joerg Roedel,
	"H. Peter Anvin", Jim Mattson, Ingo Molnar, Paolo Bonzini,
	Vitaly Kuznetsov,
	"maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)",
	Borislav Petkov,
	"open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)"
Subject: Re: [PATCH 2/2] VMX: nSVM: enter protected mode prior to returning to nested guest from SMM
References: <20210826095750.1650467-1-mlevitsk@redhat.com>
	<20210826095750.1650467-3-mlevitsk@redhat.com>
In-Reply-To: <20210826095750.1650467-3-mlevitsk@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Aug 26, 2021, Maxim Levitsky wrote:
> SMM return code switches CPU to real mode, and
> then the nested_vmx_enter_non_root_mode first switches to vmcs02,
> and then restores CR0 in the KVM register cache.
>
> Unfortunately when it restores the CR0, this enables the protection mode
> which leads us to "restore" the segment registers from
> "real mode segment cache", which is not up to date vs L2 and trips
> 'vmx_guest_state_valid check' later, when the
> unrestricted guest mode is not enabled.

I suspect this is slightly inaccurate.  When loading vmcs02, vmx_switch_vmcs()
will do vmx_register_cache_reset(), which also causes the segment cache to be
reset.  enter_pmode() will still load stale values, but they'll come from vmcs02,
not KVM's segment register cache.

> This happens to work otherwise, because after we enter the nested guest,
> we restore its register state again from SMRAM with correct values
> and that includes the segment values.
>
> As a workaround to this if we enter protected mode first,
> then setting CR0 won't cause this damage.
>
> Signed-off-by: Maxim Levitsky
> ---
>  arch/x86/kvm/vmx/vmx.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 0c2c0d5ae873..805c415494cf 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7507,6 +7507,13 @@ static int vmx_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)
>  	}
>
>  	if (vmx->nested.smm.guest_mode) {
> +
> +		/*
> +		 * Enter protected mode to avoid clobbering L2's segment
> +		 * registers during nested guest entry
> +		 */
> +		vmx_set_cr0(vcpu, vcpu->arch.cr0 | X86_CR0_PE);

I'd really, really, reaaaally like to avoid stuffing state.  All of the instances
I've come across where KVM has stuffed state for something like this were just
papering over one symptom of an underlying bug.

For example, won't this now cause the same bad behavior if L2 is in Real Mode?

Is the problem purely that emulation_required is stale?  If so, how is it stale?
Every segment write as part of RSM emulation should reevaluate emulation_required
via vmx_set_segment().

Oooooh, or are you talking about the explicit vmx_guest_state_valid() in
prepare_vmcs02()?  If that's the case, then we likely should skip that check
entirely.  The only part I'm not 100% clear on is whether or not it can/should be
skipped for vmx_set_nested_state().

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index bc6327950657..20bd84554c1f 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2547,7 +2547,7 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 	 * which means L1 attempted VMEntry to L2 with invalid state.
 	 * Fail the VMEntry.
 	 */
-	if (CC(!vmx_guest_state_valid(vcpu))) {
+	if (from_vmentry && CC(!vmx_guest_state_valid(vcpu))) {
 		*entry_failure_code = ENTRY_FAIL_DEFAULT;
 		return -EINVAL;
 	}

If we want to retain the check for the common vmx_set_nested_state() path, i.e.
when the vCPU is truly being restored to guest mode, then we can simply exempt
the smm.guest_mode case (which also exempts that case when it's set via
vmx_set_nested_state()).  The argument would be that RSM is going to restore L2
state, so whatever happens to be in vmcs12/vmcs02 is stale.

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index bc6327950657..ac30ba6a8592 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2547,7 +2547,7 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 	 * which means L1 attempted VMEntry to L2 with invalid state.
 	 * Fail the VMEntry.
 	 */
-	if (CC(!vmx_guest_state_valid(vcpu))) {
+	if (!vmx->nested.smm.guest_mode && CC(!vmx_guest_state_valid(vcpu))) {
 		*entry_failure_code = ENTRY_FAIL_DEFAULT;
 		return -EINVAL;
 	}
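
To make the difference between the two diffs above concrete, here is a
throwaway userspace model (not kernel code) of which paths into
prepare_vmcs02() would still run the vmx_guest_state_valid() check under each
variant.  The struct, the function names, and the three flag combinations are
assumptions made up for illustration from the discussion above; none of them
are taken from the KVM sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the state the guard looks at. */
struct entry_ctx {
	const char *path;	/* which path reaches prepare_vmcs02() */
	bool from_vmentry;	/* assumed true only for a VM-Enter issued by L1 */
	bool smm_guest_mode;	/* models vmx->nested.smm.guest_mode */
};

/* First diff: run the guest state check only on a VM-Enter from L1. */
bool checks_state_option_a(const struct entry_ctx *c)
{
	return c->from_vmentry;
}

/* Second diff: run the check everywhere except when RSM returns to L2. */
bool checks_state_option_b(const struct entry_ctx *c)
{
	return !c->smm_guest_mode;
}

/* Print one row per path, showing whether each variant checks or skips. */
void print_guard_table(void)
{
	/* Assumed flag combinations, one per path named in the thread. */
	static const struct entry_ctx paths[] = {
		{ "nested VM-Enter from L1", true,  false },
		{ "vmx_set_nested_state()",  false, false },
		{ "RSM back into L2",        false, true  },
	};
	size_t i;

	printf("%-26s %-10s %-10s\n", "path", "option A", "option B");
	for (i = 0; i < sizeof(paths) / sizeof(paths[0]); i++) {
		const struct entry_ctx *c = &paths[i];

		assert(c->path != NULL);
		printf("%-26s %-10s %-10s\n", c->path,
		       checks_state_option_a(c) ? "check" : "skip",
		       checks_state_option_b(c) ? "check" : "skip");
	}
}
```

Calling print_guard_table() from a main() prints the table.  In this model both
variants skip the check on the RSM path and keep it for a VM-Enter from L1; they
differ only on the vmx_set_nested_state() restore path, which is exactly the
open question above.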