From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp-fw-52003.amazon.com (smtp-fw-52003.amazon.com [52.119.213.152]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DBFF4158550; Mon, 16 Sep 2024 15:33:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=52.119.213.152 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726500834; cv=none; b=uhAJY+n/Q5k+Yh35NQgJFc2VDZLLeYkzOgOXY+Xv6D0aQgzNct+v3T7WJHEecTVBNVmTugf/uilmJi/Xhx4bWKJmAZiPMsn4uUjTFLvPDBdIkMOl6rtjV6+L/hpmYOUcxb49Vm9pdL1eKzeNIRH1lTTjWo45oip18CK3/dhh7Fc= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726500834; c=relaxed/simple; bh=IY8C+StfjFJHJtqw4O3q5t4JZ6R26AYxgS+dshF64Y8=; h=Subject:MIME-Version:Content-Type:Date:Message-ID:From:To:CC: References:In-Reply-To; b=np8Ays1Rywe5EWMAWEaCf58M/my84IY9VZZOVJCbrQ/yJw8Vl/wx6xFElNpRS99CGak7FNQ1PxhhQJQNVi/a++i9zy5MYqCFi//6X1pTV+RWYSMmV4pW7d9ED5rjK8Ab4IpascCutIBCXdjz8G9UAHDHAbJg4qe3YIn7X1iUAUM= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com; spf=pass smtp.mailfrom=amazon.es; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b=kdRYpM8s; arc=none smtp.client-ip=52.119.213.152 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=amazon.es Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="kdRYpM8s" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1726500833; x=1758036833; h=mime-version:content-transfer-encoding:date:message-id: from:to:cc:references:in-reply-to:subject; bh=EEwEknCR7gPqewEKfA7AQyWhoTIg8MMivyoECSmpsEI=; b=kdRYpM8slLpP1+mbE98ePb1MrORmSx6TZzULdNMAqdyuZc8ehl2plGhw MUnQcVhNZvLmkItwz2bz8JnWXc7lj0NccZQk17epiH20PQXgIw3OgFGVq JD8W7Uj5PtT3EYxDSK61E9NsEQ4SEGi1xSEQ9mAaKbKVkkbC1wP+Va8iB w=; X-IronPort-AV: E=Sophos;i="6.10,233,1719878400"; d="scan'208";a="25953235" Subject: Re: [PATCH 05/18] KVM: x86: hyper-v: Introduce MP_STATE_HV_INACTIVE_VTL Received: from iad12-co-svc-p1-lb1-vlan3.amazon.com (HELO smtpout.prod.us-west-2.prod.farcaster.email.amazon.dev) ([10.43.8.6]) by smtp-border-fw-52003.iad7.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Sep 2024 15:33:48 +0000 Received: from EX19MTAEUB002.ant.amazon.com [10.0.10.100:51984] by smtpin.naws.eu-west-1.prod.farcaster.email.amazon.dev [10.0.1.23:2525] with esmtp (Farcaster) id f0c8e00d-30a9-4cd0-acb9-4acbebf05106; Mon, 16 Sep 2024 15:33:46 +0000 (UTC) X-Farcaster-Flow-ID: f0c8e00d-30a9-4cd0-acb9-4acbebf05106 Received: from EX19D004EUC001.ant.amazon.com (10.252.51.190) by EX19MTAEUB002.ant.amazon.com (10.252.51.59) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1258.34; Mon, 16 Sep 2024 15:33:45 +0000 Received: from localhost (10.13.235.138) by EX19D004EUC001.ant.amazon.com (10.252.51.190) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1258.34; Mon, 16 Sep 2024 15:33:39 +0000 Precedence: bulk X-Mailing-List: linux-arch@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="UTF-8" Date: Mon, 16 Sep 2024 15:33:36 +0000 Message-ID: From: Nicolas Saenz Julienne To: Sean Christopherson CC: , , , , , , , , , , , , , , , , , , , X-Mailer: aerc 0.18.2-0-ge037c095a049-dirty References: <20240609154945.55332-1-nsaenz@amazon.com> <20240609154945.55332-6-nsaenz@amazon.com> In-Reply-To: X-ClientProxiedBy: EX19D039UWA003.ant.amazon.com (10.13.139.49) To EX19D004EUC001.ant.amazon.com (10.252.51.190) On Fri Sep 13, 2024 at 7:01 PM UTC, Sean Christopherson wrote: > On Sun, Jun 09, 2024, Nicolas Saenz Julienne wrote: > > Model inactive VTL vCPUs' behaviour with a new MP state. > > > > Inactive VTLs are in an artificial halt state. They enter into this > > state in response to invoking HvCallVtlCall, HvCallVtlReturn. > > User-space, which is VTL aware, can processes the hypercall, and set th= e > > vCPU in MP_STATE_HV_INACTIVE_VTL. When a vCPU is run in this state it'l= l > > block until a wakeup event is received. The rules of what constitutes a= n > > event are analogous to halt's except that VTL's ignore RFLAGS.IF. > > > > When a wakeup event is registered, KVM will exit to user-space with a > > KVM_SYSTEM_EVENT exit, and KVM_SYSTEM_EVENT_WAKEUP event type. > > User-space is responsible of deciding whether the event has precedence > > over the active VTL and will switch the vCPU to KVM_MP_STATE_RUNNABLE > > before resuming execution on it. > > > > Running a KVM_MP_STATE_HV_INACTIVE_VTL vCPU with pending events will > > return immediately to user-space. > > > > Note that by re-using the readily available halt infrastructure in > > KVM_RUN, MP_STATE_HV_INACTIVE_VTL correctly handles (or disables) > > virtualisation features like the VMX preemption timer or APICv before > > blocking. > > IIUC, this is a convoluted and roundabout way to let userspace check if a= vCPU > has a wake event, correct? Even by the end of the series, KVM never sets > MP_STATE_HV_INACTIVE_VTL, i.e. the only use for this is to combine it as: > > KVM_SET_MP_STATE =3D> KVM_RUN =3D> KVM_SET_MP_STATE =3D> KVM_RUN Correct. > The upside to this approach is that it requires minimal uAPI and very few= KVM > changes, but that's about it AFAICT. On the other hand, making this so p= ainfully > specific feels like a missed opportunity, and unnecessarily bleeds VTL de= tails > into KVM. > > Bringing halt-polling into the picture (by going down kvm_vcpu_halt()) is= also > rather bizarre since quite a bit of time has already elapsed since the vC= PU first > did HvCallVtlCall/HvCallVtlReturn. But that doesn't really have anything= to do > with MP_STATE_HV_INACTIVE_VTL, e.g. it'd be just as easy to go to kvm_vcp= u_block(). > > Why not add an ioctl() to very explicitly block until a wake event is rea= dy? > Or probably better, a generic "wait" ioctl() that takes the wait type as = an > argument. > > Kinda like your idea of supporting .poll() on the vCPU FD[*], except it's= very > specifically restricted to a single caller (takes vcpu->mutex). We could= probably > actually implement it via .poll(), but I suspect that would be more confu= sing than > helpful. > > E.g. extract the guts of vcpu_block() to a separate helper, and then wire= that > up to an ioctl(). > > As for the RFLAGS.IF quirk, maybe handle that via a kvm_run flag? That w= ay, > userspace doesn't need to do a round-trip just to set a single bit. E.g.= I think > we should be able to squeeze it into "struct kvm_hyperv_exit". It's things like the RFLAG.IF exemption that deterred me from building a generic interface. We might find out that the generic blocking logic doesn't match the expected VTL semantics and be stuck with a uAPI that isn't enough for VSM, nor useful for any other use-case. We can always introduce 'flags' I guess. Note that I'm just being cautious here, AFAICT the generic approach works, and I'm fine with going the "wait" ioctl. > Actually, speaking of kvm_hyperv_exit, is there a reason we can't simply = wire up > HVCALL_VTL_CALL and/or HVCALL_VTL_RETURN to a dedicated complete_userspac= e_io() > callback that blocks if some flag is set? That would make it _much_ clea= ner to > scope the RFLAGS.IF check to kvm_hyperv_exit, and would require little to= no new > uAPI. So IIUC, the approach is to have complete_userspace_io() block after re-entering HVCALL_VTL_RETURN. Then, have it exit back onto user-space whenever an event is made available (maybe re-using KVM_SYSTEM_EVENT_WAKEUP?). That would work, but will need something extra to be compatible with migration/live-update. > > @@ -3797,6 +3798,10 @@ bool svm_interrupt_blocked(struct kvm_vcpu *vcpu= ) > > if (!gif_set(svm)) > > return true; > > > > + /* > > + * The Hyper-V TLFS states that RFLAGS.IF is ignored when decid= ing > > + * whether to block interrupts targeted at inactive VTLs. > > + */ > > if (is_guest_mode(vcpu)) { > > /* As long as interrupts are being delivered... */ > > if ((svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK) > > @@ -3808,7 +3813,7 @@ bool svm_interrupt_blocked(struct kvm_vcpu *vcpu) > > if (nested_exit_on_intr(svm)) > > return false; > > } else { > > - if (!svm_get_if_flag(vcpu)) > > + if (!svm_get_if_flag(vcpu) && !kvm_hv_vcpu_is_idle_vtl(= vcpu)) > > Speaking of RFLAGS.IF, I think it makes sense to add a common x86 helper = to handle > the RFLAGS.IF vs. idle VTL logic. Naming will be annoying, but that's ab= out it. > > E.g. kvm_is_irq_blocked_by_rflags_if() or so. Noted. Thanks, Nicolas