From: Rick Edgecombe
To: seanjc@google.com
Cc: ackerleytng@google.com, anup@brainfault.org, aou@eecs.berkeley.edu,
    binbin.wu@linux.intel.com, borntraeger@linux.ibm.com,
    chenhuacai@kernel.org, frankja@linux.ibm.com, imbrenda@linux.ibm.com,
    ira.weiny@intel.com, kai.huang@intel.com, kas@kernel.org,
    kvm-riscv@lists.infradead.org, kvm@vger.kernel.org,
    kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
    linux-coco@lists.linux.dev, linux-kernel@vger.kernel.org,
    linux-mips@vger.kernel.org, linux-riscv@lists.infradead.org,
    linuxppc-dev@lists.ozlabs.org, loongarch@lists.linux.dev,
    maddy@linux.ibm.com, maobibo@loongson.cn, maz@kernel.org,
    michael.roth@amd.com, oliver.upton@linux.dev, palmer@dabbelt.com,
    pbonzini@redhat.com, pjw@kernel.org, rick.p.edgecombe@intel.com,
    vannapurve@google.com, x86@kernel.org, yan.y.zhao@intel.com,
    zhaotianrui@loongson.cn
Subject: [PATCH] KVM: TDX: Take MMU lock around tdh_vp_init()
Date: Mon, 27 Oct 2025 17:28:24 -0700
Message-ID: <20251028002824.1470939-1-rick.p.edgecombe@intel.com>

Take MMU lock around tdh_vp_init() in KVM_TDX_INIT_VCPU to prevent meeting
contention during retries in some no-fail MMU paths.

The TDX module takes various try-locks internally, which can cause
SEAMCALLs to return an error code when contention is met. Dealing with an
error in some of the MMU paths that make SEAMCALLs is not straightforward,
so KVM takes steps to ensure that these will meet no contention during a
single BUSY error retry. The whole scheme relies on KVM to take
appropriate steps to avoid making any SEAMCALLs that could contend while
the retry is happening.

Unfortunately, there is a case where contention could be met if userspace
does something unusual: specifically, hole punching a gmem fd while
initializing the TD vCPU. The impact would be triggering a KVM_BUG_ON().

The resource being contended is called the "TDR resource" in TDX docs
parlance. tdh_vp_init() can take this resource as exclusive if the
'version' passed is 1, which happens to be the version the kernel passes.
The various MMU operations (tdh_mem_range_block(), tdh_mem_track() and
tdh_mem_page_remove()) take it as shared.

There isn't a KVM lock that maps conceptually and in a lock order friendly
way to the TDR lock. So to minimize infrastructure, just take MMU lock
around tdh_vp_init(). This makes the operations we care about mutually
exclusive.

Since the other operations are under a write mmu_lock, the code could just
take the lock for read. However, this is weirdly inverted from the actual
underlying resource being contended. Since this is covering an edge case
that shouldn't be hit in normal usage, be a little less weird and take the
mmu_lock for write around the call.

Fixes: 02ab57707bdb ("KVM: TDX: Implement hooks to propagate changes of TDP MMU mirror page table")
Reported-by: Yan Zhao
Suggested-by: Yan Zhao
Signed-off-by: Rick Edgecombe
---
Hi,

It was indeed awkward, as Sean must have sniffed. But seems ok enough to
close the issue. Yan, can you give it a look?

Posted here, but applies on top of this series.

Thanks,

Rick
---
 arch/x86/kvm/vmx/tdx.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index daec88d4b88d..8bf5d2624152 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -2938,9 +2938,18 @@ static int tdx_td_vcpu_init(struct kvm_vcpu *vcpu, u64 vcpu_rcx)
 		}
 	}
 
-	err = tdh_vp_init(&tdx->vp, vcpu_rcx, vcpu->vcpu_id);
-	if (TDX_BUG_ON(err, TDH_VP_INIT, vcpu->kvm))
-		return -EIO;
+	/*
+	 * tdh_vp_init() can take an exclusive lock of the TDR resource inside
+	 * the TDX module. This resource is also taken as shared in several
+	 * no-fail MMU paths, which could return TDX_OPERAND_BUSY on contention.
+	 * A read lock here would be enough to exclude the contention, but take
+	 * a write lock to avoid the weird inversion.
+	 */
+	scoped_guard(write_lock, &vcpu->kvm->mmu_lock) {
+		err = tdh_vp_init(&tdx->vp, vcpu_rcx, vcpu->vcpu_id);
+		if (TDX_BUG_ON(err, TDH_VP_INIT, vcpu->kvm))
+			return -EIO;
+	}
 
 	vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
 
-- 
2.51.1