From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 208D4EB64DA for ; Fri, 30 Jun 2023 22:56:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231324AbjF3W4T (ORCPT ); Fri, 30 Jun 2023 18:56:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33142 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229484AbjF3W4R (ORCPT ); Fri, 30 Jun 2023 18:56:17 -0400 Received: from mail-pj1-x1049.google.com (mail-pj1-x1049.google.com [IPv6:2607:f8b0:4864:20::1049]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A8D6F2D69 for ; Fri, 30 Jun 2023 15:56:16 -0700 (PDT) Received: by mail-pj1-x1049.google.com with SMTP id 98e67ed59e1d1-260cb94f585so2090629a91.0 for ; Fri, 30 Jun 2023 15:56:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1688165776; x=1690757776; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=Y+QnFBcSRBuOpb7/7NCuCNZ3hM36J3/pYWSX0Cwatlw=; b=S0waF408wufglvxMeJmrbw1kI/mjgD/oT5sEf/q15auZfk2dybtZV4SIAd53uc2cTx jO6Ed7ldXcWOFosmsIjCZ+0C0Ic76XRyuMC8wmnPjcTfNl2AEHXdVFslCM8N+43wP/Qy tVMAwGwYSnuK2V6QPEmgJZqhzDTucNSH8z5LbwRRikUjoiQQ/pZJx0Do/7xrXqBxALPa TAovJI94lsrzduMH/H8lnybCqIsA1bmP+XsEFssM7dMqWGJkN+xHMFv8ptCa1dJ8JuUr SakB0OMT6IwT+4QRca8Krzdf34Z5IDdUi3ZddEIrkd/LSd90ptqRefG5bYGWsm7mCSbf /+zg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1688165776; x=1690757776; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=Y+QnFBcSRBuOpb7/7NCuCNZ3hM36J3/pYWSX0Cwatlw=; b=gk1UVRpJlCvTN1KTd/Zo5gUXGRdISRcFYdgcmDA37JgsIR/oN7FhusspkL8Ehh4UKB 4Xo8j9US2UyFyWY4yWTdLnM7jFkvS3SrOtgvWxrT/DkgXqnAhYpKeNHwUVbdydHpsWPA gTHsOAXQGJoGHxaAc6nnNQ2ku2R1PRXiVkYI1QQRCHcPXbQZsThlSXckb2xP0vd2jX7i B3UWjbWVu2fnco7qIcpvEeFWvYE9GpkvCmaJv1+RZ1ZJ0XlvJEgPF2uMoAWC0WQgTMS2 6f661eyJnTZSdYHjKjrWZeK79t/bnWURacZcaXsV6PRbratSguvMyHr1vM46rHghyp2g 18EQ== X-Gm-Message-State: AC+VfDynaQy+BUi2x75S05RTwOuBxNk/UczrICjxh7d2yjlHXSFxhdP9 WFSmMxaBqDaPH1jug5lIP1OyBBS0x0M= X-Google-Smtp-Source: ACHHUZ7iPNwIykcY15LNzA+wFLMyr203vVFrF8aE2GhoGO8tUKRF6R2KJm2aQ5+nXurVT5iyhkPHkiH6wak= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a17:90b:164f:b0:263:1117:32da with SMTP id il15-20020a17090b164f00b00263111732damr4394158pjb.2.1688165776199; Fri, 30 Jun 2023 15:56:16 -0700 (PDT) Date: Fri, 30 Jun 2023 15:56:14 -0700 In-Reply-To: Mime-Version: 1.0 References: <20230630072612.1106705-1-aiqi.i7@bytedance.com> Message-ID: Subject: Re: [PATCH] kvm/x86: clear hlt for intel cpu when resetting vcpu From: Sean Christopherson To: Chao Gao Cc: Qi Ai , pbonzini@redhat.com, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com, kvm@vger.kernel.org, fengzhimin@bytedance.com, cenjiahui@bytedance.com, fangying.tommy@bytedance.com, dengqiao.joey@bytedance.com Content-Type: text/plain; charset="us-ascii" Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org mn Fri, Jun 30, 2023, Chao Gao wrote: > On Fri, Jun 30, 2023 at 03:26:12PM +0800, Qi Ai wrote: > >+ !is_protmode(vcpu)) > >+ kvm_x86_ops.clear_hlt(vcpu); > > Use static_call_cond(kvm_x86_clear_hlt)(vcpu) instead. > > It looks incorrect that we add this side-effect heuristically here. I Yeah, adding heuristics to KVM_SET_REGS isn't happening. KVM's existing hack for "Older userspace" in __set_sregs_common() is bad enough: /* Older userspace won't unhalt the vcpu on reset. */ if (kvm_vcpu_is_bsp(vcpu) && kvm_rip_read(vcpu) == 0xfff0 && sregs->cs.selector == 0xf000 && sregs->cs.base == 0xffff0000 && !is_protmode(vcpu)) vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; > am wondering if we can link vcpu->arch.mp_state to VMCS activity state, Hrm, maybe. > i.e., when mp_state is set to RUNNABLE in KVM_SET_MP_STATE ioctl, KVM > sets VMCS activity state to active. Not in the ioctl(), there needs to be a proper set of APIs, e.g. so that the existing hack works, and so that KVM actually reports out to userspace that a vCPU is HALTED if userspace gained control of the vCPU, e.g. after an IRQ exit, while the vCPU was HALTED. I.e. mp_state versus vmcs.ACTIVITY_STATE needs to be bidirectional, not one-way. E.g. if a vCPU is live migrated, I'm pretty sure vmcs.ACTIVITY_STATE is lost, which is wrong. The downside is that if KVM propagates vmcs.ACTIVITY_STATE to mp_state for the halted case, then KVM will enter kvm_vcpu_halt() instead of entering the guest in halted state, which is undesirable. Hmm, that can be handled by treating the vCPU as running, e.g. static inline bool kvm_vcpu_running(struct kvm_vcpu *vcpu) { return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE || (vcpu->arch.mp_state == KVM_MP_STATE_HALTED && kvm_hlt_in_guest(vcpu->kvm))) && !vcpu->arch.apf.halted); } but that would have cascading effect to a whole pile of things. I don't *think* they'd be used with kvm_hlt_in_guest(), but we've had weirder stuff. I'm half tempted to solve this particular issue by stuffing vmcs.ACTIVITY_STATE on shutdown, similar to what SVM does on shutdown interception. KVM doesn't come anywhere near faithfully emulating shutdown, so it's unlikely to break anything. And then the mp_state vs. hlt_in_guest coulbe tackled separately. Ugh, but that wouldn't cover a synthesized KVM_REQ_TRIPLE_FAULT. diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 44fb619803b8..ee4bb37067d1 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -5312,6 +5312,8 @@ static __always_inline int handle_external_interrupt(struct kvm_vcpu *vcpu) static int handle_triple_fault(struct kvm_vcpu *vcpu) { + vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE); + vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN; vcpu->mmio_needed = 0; return 0; I don't suppose QEMU can to blast INIT at all vCPUs for this case?