From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 6 Feb 2026 07:01:14 -0800
X-Mailing-List: linux-kernel@vger.kernel.org
References: <20260129011517.3545883-1-seanjc@google.com>
 <20260129011517.3545883-21-seanjc@google.com>
Subject: Re: [RFC PATCH v5 20/45] KVM: x86/mmu: Allocate/free S-EPT pages using tdx_{alloc,free}_control_page()
From: Sean Christopherson
To: Yan Zhao
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	x86@kernel.org, Kiryl Shutsemau, Paolo Bonzini,
	linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev,
	kvm@vger.kernel.org, Kai Huang, Rick Edgecombe, Vishal Annapurve,
	Ackerley Tng, Sagi Shahar, Binbin Wu, Xiaoyao Li, Isaku Yamahata
Content-Type: text/plain; charset="us-ascii"

On Fri, Feb 06, 2026, Yan Zhao wrote:
> > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > index 18764dbc97ea..01e3e4f4baa5 100644
> > --- a/arch/x86/kvm/mmu/tdp_mmu.c
> > +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> > @@ -55,7 +55,8 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
> >  
> >  static void tdp_mmu_free_sp(struct kvm_mmu_page *sp)
> >  {
> > -	free_page((unsigned long)sp->external_spt);
> > +	if (sp->external_spt)
> > +		kvm_x86_call(free_external_sp)((unsigned long)sp->external_spt);
> >  	free_page((unsigned long)sp->spt);
> >  	kmem_cache_free(mmu_page_header_cache, sp);
> >  }
> Strictly speaking, external_spt is not a control page. Its alloc/free are
> different from normal control pages managed by TDX's code.

Yeah, I called that out in the changelog.  I'm definitely not wedded to
tdx_{alloc,free}_control_page(), but I am very much against
tdx_{alloc,free}_page().

  (arguably S-EPT pages aren't "control" pages, but they're not guest pages
   either)

> (1) alloc
> tdx_alloc_control_page
>   __tdx_alloc_control_page
>     __tdx_pamt_get
>       spin_lock(&pamt_lock)   ==> under process context
>       spin_unlock(&pamt_lock)
>
> (2) free
> tdp_mmu_free_sp_rcu_callback
>   tdp_mmu_free_sp
>     kvm_x86_call(free_external_sp)
>       tdx_free_control_page
>         __tdx_free_control_page
>           __tdx_pamt_put
>             spin_lock(&pamt_lock)   ==> under softirq context
>             spin_unlock(&pamt_lock)
>
> So, invoking __tdx_pamt_put() in the RCU callback triggers deadlock warning
> (see the bottom for details).

Hrm.  I can think of two options.  Option #1 would be to use a raw spinlock and
disable IRQs:

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 823ec092b4e4..6348085d7dcb 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -2246,7 +2246,7 @@ static u64 tdh_phymem_pamt_remove(u64 pfn, u64 *pamt_pa_array)
 }
 
 /* Serializes adding/removing PAMT memory */
-static DEFINE_SPINLOCK(pamt_lock);
+static DEFINE_RAW_SPINLOCK(pamt_lock);
 
 /* Bump PAMT refcount for the given page and allocate PAMT memory if needed */
 int __tdx_pamt_get(u64 pfn, struct tdx_pamt_cache *cache)
@@ -2272,7 +2272,7 @@ int __tdx_pamt_get(u64 pfn, struct tdx_pamt_cache *cache)
 	if (ret)
 		goto out_free;
 
-	scoped_guard(spinlock, &pamt_lock) {
+	scoped_guard(raw_spinlock_irqsave, &pamt_lock) {
 		/*
 		 * Lost race to other tdx_pamt_add().  Other task has already allocated
 		 * PAMT memory for the HPA.
@@ -2348,7 +2348,7 @@ void __tdx_pamt_put(u64 pfn)
 	if (!atomic_dec_and_test(pamt_refcount))
 		return;
 
-	scoped_guard(spinlock, &pamt_lock) {
+	scoped_guard(raw_spinlock_irqsave, &pamt_lock) {
 		/* Lost race with tdx_pamt_get(). */
 		if (atomic_read(pamt_refcount))
 			return;
--

Option #2 would be to immediately free the page in tdx_sept_reclaim_private_sp(),
so that pages that are freed via handle_removed_pt() don't defer freeing the
S-EPT page table (which, IIUC, is safe since the TDX-Module forces TLB flushes
and exits).

I really, really don't like this option (if it even works).

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index ae7b9beb3249..4726011ad624 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -2014,7 +2014,15 @@ static void tdx_sept_reclaim_private_sp(struct kvm *kvm, gfn_t gfn,
 	 */
 	if (KVM_BUG_ON(is_hkid_assigned(to_kvm_tdx(kvm)), kvm) ||
 	    tdx_reclaim_page(virt_to_page(sp->external_spt)))
-		sp->external_spt = NULL;
+		goto out;
+
+	/*
+	 * Immediately free the control page, as the TDX subsystem doesn't
+	 * support freeing pages from RCU callbacks.
+	 */
+	tdx_free_control_page((unsigned long)sp->external_spt);
+out:
+	sp->external_spt = NULL;
 }
 
 void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
--
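
Illustrative aside, not part of either patch: the "Lost race" comments in the
Option #1 hunks refer to a refcount-then-lock pattern, where the count is
changed atomically and then rechecked under the lock before any setup or
teardown.  Below is a rough userspace sketch of that pattern.  Everything in it
is an assumption made for illustration: struct backing and the backing_*()
helpers are hypothetical names, a pthread mutex stands in for pamt_lock, and a
bool stands in for the PAMT memory.  It does not model process vs. softirq
context, so it says nothing about the lockdep splat itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for refcounted backing memory. */
struct backing {
	atomic_int refcount;	/* number of current users */
	bool allocated;		/* protected by lock */
	pthread_mutex_t lock;	/* plays the role of pamt_lock */
};

static void backing_init(struct backing *b)
{
	atomic_init(&b->refcount, 0);
	b->allocated = false;
	pthread_mutex_init(&b->lock, NULL);
}

/* Lockless fast path: take a reference only if one already exists. */
static bool backing_get_fast(struct backing *b)
{
	int old = atomic_load(&b->refcount);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&b->refcount, &old, old + 1))
			return true;
	}
	return false;
}

static void backing_get(struct backing *b)
{
	if (backing_get_fast(b))
		return;

	pthread_mutex_lock(&b->lock);
	/* Another getter may have won the race and already set things up. */
	if (!b->allocated)
		b->allocated = true;
	atomic_fetch_add(&b->refcount, 1);
	pthread_mutex_unlock(&b->lock);
}

static void backing_put(struct backing *b)
{
	int old = atomic_fetch_sub(&b->refcount, 1);

	assert(old > 0);	/* catch refcount underflow */
	if (old != 1)
		return;

	pthread_mutex_lock(&b->lock);
	/*
	 * Recheck under the lock: a getter may have taken a new reference
	 * (lost race), or another putter may have already torn things down.
	 */
	if (atomic_load(&b->refcount) == 0 && b->allocated)
		b->allocated = false;
	pthread_mutex_unlock(&b->lock);
}
```

In this sketch the lock is only ever taken on the 0 <-> 1 transitions, so
whatever context the final put runs in is the context the lock gets taken in.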