Date: Thu, 15 Jan 2026 08:26:21 -0800
Subject: Re: [PATCH v3 00/24] KVM: TDX huge page support for private memory
From: Sean Christopherson
To: Yan Zhao
Cc: Ackerley Tng, Vishal Annapurve, pbonzini@redhat.com, linux-kernel@vger.kernel.org,
    kvm@vger.kernel.org, x86@kernel.org, rick.p.edgecombe@intel.com, dave.hansen@intel.com,
    kas@kernel.org, tabba@google.com, michael.roth@amd.com, david@kernel.org,
    sagis@google.com, vbabka@suse.cz, thomas.lendacky@amd.com, nik.borisov@suse.com,
    pgonda@google.com, fan.du@intel.com, jun.miao@intel.com, francescolavra.fl@gmail.com,
    jgross@suse.com, ira.weiny@intel.com, isaku.yamahata@intel.com, xiaoyao.li@intel.com,
    kai.huang@intel.com, binbin.wu@linux.intel.com, chao.p.peng@intel.com, chao.gao@intel.com

On Thu, Jan 15, 2026, Yan Zhao wrote:
> On Wed, Jan 14, 2026 at 07:26:44AM -0800, Sean Christopherson wrote:
> > Ok, with the disclaimer that I hadn't actually looked at the patches in this
> > series before now...
> >
> > TDX absolutely should not be doing _anything_ with folios. I am *very* strongly
> > opposed to TDX assuming that memory is backed by refcounted "struct page", and
> > thus can use folios to glean the maximum mapping size.
> >
> > guest_memfd is _the_ owner of that information. guest_memfd needs to explicitly
> > _tell_ the rest of KVM what the maximum mapping size is; arch code should not
> > infer that size from a folio.
> >
> > And that code+behavior already exists in the form of kvm_gmem_mapping_order() and
> > its users, _and_ is plumbed all the way into tdx_mem_page_aug() as @level. IIUC,
> > the _only_ reason tdx_mem_page_aug() retrieves the page+folio is because
> > tdx_clflush_page() ultimately requires a "struct page". That is absolutely
> > ridiculous and not acceptable. CLFLUSH takes a virtual address, there is *zero*
> > reason tdh_mem_page_aug() needs to require/assume a struct page.
> Not really.
>
> Per my understanding, tdx_mem_page_aug() requires "struct page" (and checks
> folios for huge pages) because the SEAMCALL wrapper APIs are not currently built
> into KVM. Since they may have callers other than KVM, some sanity checking in
> case the caller does something incorrect seems necessary (e.g., in case the
> caller provides an out-of-range struct page or a page with a !pfn_valid() PFN).

As I mentioned in my reply to Dave, I don't object to reasonable sanity checks.

> This is similar to "VM_WARN_ON_ONCE_FOLIO(!folio_test_large(folio), folio)" in
> __folio_split().

No, it's not. __folio_split() is verifying that its input matches the exact one
thing it's doing, splitting a huge folio, i.e. what the function is being asked
to do.

TDX requiring guest_memfd to back everything with struct page, and to only use
single, huge folios to map hugepages into the guest, is making completely
unnecessary assumptions about guest_memfd and KVM MMU implementation details.

> With tdx_mem_page_aug() ensuring page validity and contiguity,

It absolutely does not.

 - If guest_memfd unmaps the direct map[*], CLFLUSH will fault and panic the kernel.
 - If the PFN isn't backed by a struct page, tdx_mem_page_aug() will hit a NULL
   pointer deref.
 - If the PFN is backed by a struct page, but the page is managed by something
   other than guest_memfd or core MM, all bets are off.

[*] https://lore.kernel.org/all/20260114134510.1835-1-kalyazin@amazon.com
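To make that concrete, something along these lines is the shape I have in mind,
purely as an illustrative sketch: the exact signature, the level encoding in RCX,
and the tdx_clflush_phys_range() helper (sketched at the end of this mail) are
approximations, not the actual code in this series.

u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, unsigned long pfn, int level,
		     u64 *ext_err1, u64 *ext_err2)
{
	/* TDX level 0 == 4KiB, 1 == 2MiB, 2 == 1GiB; the span is implied by @level. */
	unsigned long size = PAGE_SIZE << (9 * level);
	struct tdx_module_args args = {
		.rcx = gpa | level,
		.rdx = tdx_tdr_pa(td),
		.r8  = PFN_PHYS(pfn),
	};
	u64 ret;

	/* Write back dirty cache lines before the TDX module takes ownership. */
	tdx_clflush_phys_range(PFN_PHYS(pfn), size);

	ret = seamcall_ret(TDH_MEM_PAGE_AUG, &args);

	*ext_err1 = args.rcx;
	*ext_err2 = args.rdx;

	return ret;
}

The point being that the wrapper consumes a PFN plus the @level that guest_memfd/KVM
already computed, and never dereferences a folio or a struct page.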
> invoking the local static function tdx_clflush_page() page-by-page looks good to
> me. Alternatively, we could convert tdx_clflush_page() to
> tdx_clflush_cache_range(), which receives a VA.
>
> However, I'm not sure if my understanding is correct now, especially since it
> seems like everyone thinks the SEAMCALL wrapper APIs should trust the caller,
> assuming they are KVM-specific.

It's all kernel code. Implying that KVM is somehow untrusted is absurd.
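FWIW, flushing by VA is exactly what I'm asking for. A rough, untested sketch of
the helper assumed in the AUG sketch above; the name is made up here,
clflush_cache_range() is the existing x86 helper, and this assumes the range is
still covered by the direct map:

static void tdx_clflush_phys_range(phys_addr_t pa, unsigned long size)
{
	/*
	 * Flush by kernel virtual address; works for any PFN the kernel can
	 * address via the direct map, with no struct page requirement.  If
	 * guest_memfd ever removes direct map entries for private memory,
	 * this (like the current tdx_clflush_page()) needs a different
	 * strategy entirely.
	 */
	clflush_cache_range(__va(pa), size);
}

Any sanity checks the wrapper wants, e.g. on the PFN range, can live alongside the
flush without forcing every caller to hand over a struct page.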