From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C662A1A2550; Tue, 10 Dec 2024 02:52:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.12 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733799125; cv=none; b=KbzWGeHqYbXthq3th+IDYNmR0uTtAZcND7t2J3Cq0sU2ZRAneZpbdLd1lbtqd3/ES6egDB2VMRyCbWgIk8e1U2yBnaW7yCL66kjtNzlCJoUyg1SGkV6ue/MVmaL2PIF+B4qLh5ldf/J8ndkprAHYovavPRJQYHf8upBNp3qAc1U= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733799125; c=relaxed/simple; bh=ClVyDaGFj9M5uw6bBVAR3oour6eApQIS+xBoSBMZ04o=; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From: In-Reply-To:Content-Type; b=gfQKbVzI/If4fYNQ0wSX5IBamdz6j2YY/YWEO6SRlHlGMt4MkGTI2Yn9JB9bsWfWozrpgyFmvbXoCxWDx+LikG55ydEjZWC+6jjuj0qhZSFuTkE6y4vDH6oEvw33G2ExoMqr/x2cQLMwbkhKYaL8MxiRQ4hW/XugrNYDURu/I7g= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=L7G+jiN1; arc=none smtp.client-ip=192.198.163.12 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="L7G+jiN1" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1733799123; x=1765335123; h=message-id:date:mime-version:subject:to:cc:references: from:in-reply-to:content-transfer-encoding; bh=ClVyDaGFj9M5uw6bBVAR3oour6eApQIS+xBoSBMZ04o=; b=L7G+jiN19/LpTblCnYB6Ye4hOTLy/49ybSs4fEsqZPArkO4ipDWAFBZ2 zEwObFudTbZe2w3kzCNQjZrliQMJ+LpuhxsbG8IKRF9fmJnACcQ1+IpyZ sGYaVSi6NZQdeVT8xW4Yh1ffYT7CpHguA20nXWASoN9gjR2WC5lzhxuNk T2PbX77dUPzy9WU5nQJDO1BLvorbVk43q8vde5sSGEQXdXAFpWS/Dkaq7 vWsT8Gjp/ZV1cJ3sZQr+uIU6npmyFyuO6Fm64WmpLORB/HvzZO8mHDKkW C/d8BzWHPViE9ux9zi8KXoqoyR09w94iUY6YVvCHtQvr1wlwuKcEVoJjd w==; X-CSE-ConnectionGUID: BGWl7ZvIRaunOKfkpVOVzQ== X-CSE-MsgGUID: Mtj+xeS3RUOxtDtHLvzbkQ== X-IronPort-AV: E=McAfee;i="6700,10204,11281"; a="38055956" X-IronPort-AV: E=Sophos;i="6.12,221,1728975600"; d="scan'208";a="38055956" Received: from orviesa010.jf.intel.com ([10.64.159.150]) by fmvoesa106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Dec 2024 18:52:02 -0800 X-CSE-ConnectionGUID: +CoZqUtaQHGGb0XWyvOLdA== X-CSE-MsgGUID: SLv2HwdASUSPMuRAGRpwfQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,221,1728975600"; d="scan'208";a="95078689" Received: from unknown (HELO [10.238.9.154]) ([10.238.9.154]) by orviesa010-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Dec 2024 18:51:59 -0800 Message-ID: Date: Tue, 10 Dec 2024 10:51:56 +0800 Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH 4/7] KVM: TDX: Handle TDG.VP.VMCALL To: Chao Gao Cc: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org, rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@linux.intel.com, isaku.yamahata@intel.com, yan.y.zhao@intel.com, michael.roth@amd.com, linux-kernel@vger.kernel.org References: <20241201035358.2193078-1-binbin.wu@linux.intel.com> <20241201035358.2193078-5-binbin.wu@linux.intel.com> Content-Language: en-US From: Binbin Wu In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit On 12/9/2024 8:45 PM, Chao Gao wrote: >> +/* >> + * Split into chunks and check interrupt pending between chunks. This allows >> + * for timely injection of interrupts to prevent issues with guest lockup >> + * detection. > Would it cause any problems if an (intra-host or inter-host) migration happens > between chunks? > > My understanding is that KVM would lose track of the progress if > map_gpa_next/end are not migrated. I'm not sure if KVM should expose the > state or prevent migration in the middle. Or, we can let the userspace VMM > cut the range into chunks, making it the userspace VMM's responsibility to > ensure necessary state is migrated. > > I am not asking to fix this issue right now. I just want to ensure this issue > can be solved in a clean way when we start to support migration. How about: Before exiting to userspace, KVM always sets the start GPA to r11 and set return code to TDVMCALL_STATUS_RETRY. - If userspace finishes the part, the complete_userspace_io() callback will   be called and the return code will be set to TDVMCALL_STATUS_SUCCESS. - If the live migration interrupts the MapGAP in the userspace, and   complete_userspace_io() is not called, when the vCPU resumes from migration,   TDX guest will see the return code is TDVMCALL_STATUS_RETRY with the failed   GPA, and it can retry the MapGAP with the failed GAP. > >> + */ >> +#define TDX_MAP_GPA_MAX_LEN (2 * 1024 * 1024) >> +static void __tdx_map_gpa(struct vcpu_tdx * tdx); >> + >> +static int tdx_complete_vmcall_map_gpa(struct kvm_vcpu *vcpu) >> +{ >> + struct vcpu_tdx * tdx = to_tdx(vcpu); >> + >> + if(vcpu->run->hypercall.ret) { >> + tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_INVALID_OPERAND); >> + kvm_r11_write(vcpu, tdx->map_gpa_next); > s/kvm_r11_write/tdvmcall_set_return_val > > please fix other call sites in this patch. I didn't use tdvmcall_set_return_val() because it's used for TDVMCALL like MMIO, PIO, etc..., which has a return value. For MapGPA, it's part of error information returned to TDX guest. So, I used the raw register name version. Let's see if others also think tdvmcall_set_return_val() is more clear for this case. > >> + return 1; >> + } >> + >> + tdx->map_gpa_next += TDX_MAP_GPA_MAX_LEN; >> + if (tdx->map_gpa_next >= tdx->map_gpa_end) { >> + tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_SUCCESS); >> + return 1; >> + } >> + >> + /* >> + * Stop processing the remaining part if there is pending interrupt. >> + * Skip checking pending virtual interrupt (reflected by >> + * TDX_VCPU_STATE_DETAILS_INTR_PENDING bit) to save a seamcall because >> + * if guest disabled interrupt, it's OK not returning back to guest >> + * due to non-NMI interrupt. Also it's rare to TDVMCALL_MAP_GPA >> + * immediately after STI or MOV/POP SS. >> + */ >> + if (pi_has_pending_interrupt(vcpu) || >> + kvm_test_request(KVM_REQ_NMI, vcpu) || vcpu->arch.nmi_pending) { >> + tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_RETRY); >> + kvm_r11_write(vcpu, tdx->map_gpa_next); >> + return 1; >> + } >> + >> + __tdx_map_gpa(tdx); >> + /* Forward request to userspace. */ >> + return 0; >> +} >> + >> +static void __tdx_map_gpa(struct vcpu_tdx * tdx) >> +{ >> + u64 gpa = tdx->map_gpa_next; >> + u64 size = tdx->map_gpa_end - tdx->map_gpa_next; >> + >> + if(size > TDX_MAP_GPA_MAX_LEN) >> + size = TDX_MAP_GPA_MAX_LEN; >> + >> + tdx->vcpu.run->exit_reason = KVM_EXIT_HYPERCALL; >> + tdx->vcpu.run->hypercall.nr = KVM_HC_MAP_GPA_RANGE; >> + tdx->vcpu.run->hypercall.args[0] = gpa & ~gfn_to_gpa(kvm_gfn_direct_bits(tdx->vcpu.kvm)); >> + tdx->vcpu.run->hypercall.args[1] = size / PAGE_SIZE; >> + tdx->vcpu.run->hypercall.args[2] = vt_is_tdx_private_gpa(tdx->vcpu.kvm, gpa) ? >> + KVM_MAP_GPA_RANGE_ENCRYPTED : >> + KVM_MAP_GPA_RANGE_DECRYPTED; >> + tdx->vcpu.run->hypercall.flags = KVM_EXIT_HYPERCALL_LONG_MODE; >> + >> + tdx->vcpu.arch.complete_userspace_io = tdx_complete_vmcall_map_gpa; >> +} >> + >> +static int tdx_map_gpa(struct kvm_vcpu *vcpu) >> +{ >> + struct vcpu_tdx * tdx = to_tdx(vcpu); >> + u64 gpa = tdvmcall_a0_read(vcpu); >> + u64 size = tdvmcall_a1_read(vcpu); >> + u64 ret; >> + >> + /* >> + * Converting TDVMCALL_MAP_GPA to KVM_HC_MAP_GPA_RANGE requires >> + * userspace to enable KVM_CAP_EXIT_HYPERCALL with KVM_HC_MAP_GPA_RANGE >> + * bit set. If not, the error code is not defined in GHCI for TDX, use >> + * TDVMCALL_STATUS_INVALID_OPERAND for this case. >> + */ >> + if (!user_exit_on_hypercall(vcpu->kvm, KVM_HC_MAP_GPA_RANGE)) { >> + ret = TDVMCALL_STATUS_INVALID_OPERAND; >> + goto error; >> + } >> + >> + if (gpa + size <= gpa || !kvm_vcpu_is_legal_gpa(vcpu, gpa) || >> + !kvm_vcpu_is_legal_gpa(vcpu, gpa + size -1) || >> + (vt_is_tdx_private_gpa(vcpu->kvm, gpa) != >> + vt_is_tdx_private_gpa(vcpu->kvm, gpa + size -1))) { >> + ret = TDVMCALL_STATUS_INVALID_OPERAND; >> + goto error; >> + } >> + >> + if (!PAGE_ALIGNED(gpa) || !PAGE_ALIGNED(size)) { >> + ret = TDVMCALL_STATUS_ALIGN_ERROR; >> + goto error; >> + } >> + >> + tdx->map_gpa_end = gpa + size; >> + tdx->map_gpa_next = gpa; >> + >> + __tdx_map_gpa(tdx); >> + /* Forward request to userspace. */ >> + return 0; >> + >> +error: >> + tdvmcall_set_return_code(vcpu, ret); >> + kvm_r11_write(vcpu, gpa); >> + return 1; >> +} >> + >> static int handle_tdvmcall(struct kvm_vcpu *vcpu) >> { >> if (tdvmcall_exit_type(vcpu)) >> return tdx_emulate_vmcall(vcpu); >> >> switch (tdvmcall_leaf(vcpu)) { >> + case TDVMCALL_MAP_GPA: >> + return tdx_map_gpa(vcpu); >> default: >> break; >> } >> diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h >> index 1abc94b046a0..bfae70887695 100644 >> --- a/arch/x86/kvm/vmx/tdx.h >> +++ b/arch/x86/kvm/vmx/tdx.h >> @@ -71,6 +71,9 @@ struct vcpu_tdx { >> >> enum tdx_prepare_switch_state prep_switch_state; >> u64 msr_host_kernel_gs_base; >> + >> + u64 map_gpa_next; >> + u64 map_gpa_end; >> }; >> >> void tdh_vp_rd_failed(struct vcpu_tdx *tdx, char *uclass, u32 field, u64 err); >> -- >> 2.46.0 >>