Date: Thu, 4 Sep 2025 01:45:09 -0700
In-Reply-To:
 <97a422c0ba7a5d68b35b5327d3bf0cd11429c300.camel@intel.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
Mime-Version: 1.0
References: <20250829000618.351013-1-seanjc@google.com>
 <20250829000618.351013-6-seanjc@google.com>
 <49c337d247940e8bd3920e5723c2fa710cd0dd83.camel@intel.com>
 <97a422c0ba7a5d68b35b5327d3bf0cd11429c300.camel@intel.com>
Subject: Re: [RFC PATCH v2 05/18] KVM: TDX: Drop superfluous page pinning in S-EPT management
From: Sean Christopherson
To: Rick P Edgecombe
Cc: Yan Y Zhao, Kai Huang, "ackerleytng@google.com", Vishal Annapurve,
 "linux-kernel@vger.kernel.org", Ira Weiny, "kvm@vger.kernel.org",
 "michael.roth@amd.com", "pbonzini@redhat.com"
Content-Type: text/plain; charset="utf-8"

On Tue, Sep 02, 2025, Rick P Edgecombe wrote:
> On Tue, 2025-09-02 at 10:33 -0700, Sean Christopherson wrote:
> > > Besides, a cache flush after 2 can essentially cause a memory write to the
> > > page.
> > > Though we could invoke tdh_phymem_page_wbinvd_hkid() after the KVM_BUG_ON(),
> > > the SEAMCALL itself can fail.
> >
> > I think this falls into the category of "don't screw up" flows.  Failure to
> > remove a private SPTE is a near-catastrophic error.  Going out of our way to
> > reduce the impact of such errors increases complexity without providing much
> > in the way of value.
> >
> > E.g. if VMCLEAR fails, KVM WARNs but continues on and hopes for the best, even
> > though there's a decent chance failure to purge the VMCS cache entry could
> > lead to UAF-like problems.  To me, this is largely the same.
> >
> > If anything, we should try to prevent #2, e.g. by marking the entire
> > guest_memfd as broken or something, and then deliberately leaking _all_ pages.
>
> There was a marathon thread on this subject.

Holy moly, you weren't kidding.

> We did discuss this option (link to
> most relevant part I could find):
> https://lore.kernel.org/kvm/a9affa03c7cdc8109d0ed6b5ca30ec69269e2f34.camel@intel.com/
>
> The high level summary is that pinning the pages wrinkles guestmemfd's plans to
> use refcount for other tracking purposes. Dropping refcounts interferes with the
> error handling safety.

It also bakes even more assumptions into TDX about guest_memfd being backed with
"struct page", which I would like to avoid doing whenever possible.

> I strongly agree that we should not optimize for the error path at all. If we
> could bug the guestmemfd (kind of what we were discussing in that link) I think
> it would be appropriate to use in these cases. I guess the question is are we ok
> dropping the safety before we have a solution like that.

Definitely a "yes" from me.  For this to actually cause real world problems, we'd
need a critical KVM, hardware, or TDX-Module bug, and several unlikely events to
all line up.

If someone encounters any of these KVM_BUG_ON()s _and_ has observed that the
probability of data corruption is meaningful, then we can always convert one or
more of these to full BUG_ON() conditions, but I don't see any reason to do that
without strong evidence that it's necessary.

> In that thread I was advocating for yes, partly to close it because the
> conversation was getting stuck. But there is probably a long tail of
> potential issues or ways of looking at it that could put it in the grey area.