From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 3 Feb 2026 18:16:13 -0800
Precedence: bulk
X-Mailing-List: linux-coco@lists.linux.dev
Mime-Version: 1.0
References: <20260129011517.3545883-1-seanjc@google.com>
	<20260129011517.3545883-20-seanjc@google.com>
Subject: Re: [RFC PATCH v5 19/45] KVM: Allow owner of kvm_mmu_memory_cache to provide a custom page allocator
From: Sean Christopherson
To: Kai Huang
Cc: kvm@vger.kernel.org, linux-coco@lists.linux.dev, Xiaoyao Li, Yan Y Zhao,
	dave.hansen@linux.intel.com, kas@kernel.org, mingo@redhat.com,
	binbin.wu@linux.intel.com, pbonzini@redhat.com, ackerleytng@google.com,
	linux-kernel@vger.kernel.org, Isaku Yamahata, sagis@google.com,
	tglx@kernel.org, Rick P Edgecombe, bp@alien8.de, Vishal Annapurve,
	x86@kernel.org
Content-Type: text/plain; charset="us-ascii"

On Tue, Feb 03, 2026, Kai Huang wrote:
> On Tue, 2026-02-03 at 12:12 -0800, Sean Christopherson wrote:
> > On Tue, Feb 03, 2026, Kai Huang wrote:
> > > On Wed, 2026-01-28 at 17:14 -0800, Sean Christopherson wrote:
> > > > Extend "struct kvm_mmu_memory_cache" to support a custom page allocator
> > > > so that x86's TDX can update per-page metadata on allocation and free().
> > > >
> > > > Name the allocator page_get() to align with __get_free_page(), e.g. to
> > > > communicate that it returns an "unsigned long", not a "struct page", and
> > > > to avoid collisions with macros, e.g. with alloc_page.
> > > >
> > > > Suggested-by: Kai Huang
> > > > Signed-off-by: Sean Christopherson
> > >
> > > I thought it could be more generic for allocating an object, not just a
> > > page.
> > >
> > > E.g., I thought we might be able to use it to allocate a structure which
> > > has a "pair of DPAMT pages" so it could be assigned to 'struct
> > > kvm_mmu_page'.  But it seems you abandoned this idea.  May I ask why?
> > > Just want to understand the reasoning here.
> >
> > Because that requires more complexity and there's no known use case, and I
> > don't see an obvious way for a use case to come along.
> > All of the motivations for a custom allocation scheme that I can think of
> > apply only to full pages, or fit nicely in a kmem_cache.
> >
> > Specifically, the "cache" logic is already bifurcated between "kmem_cache"
> > and "page" usage.  Further splitting the "page" case doesn't require
> > modifications to the "kmem_cache" case, whereas providing a fully generic
> > solution would require additional changes, e.g. to handle this code:
> >
> > 	page = (void *)__get_free_page(gfp_flags);
> > 	if (page && mc->init_value)
> > 		memset64(page, mc->init_value, PAGE_SIZE / sizeof(u64));
> >
> > It certainly wouldn't be much complexity, but this code is already a bit
> > awkward, so I don't think it makes sense to add support for something that
> > will probably never be used.
>
> For this particular piece of code, we can add a helper for allocating normal
> page table pages, get rid of mc->init_value completely and hook
> mc->page_get() to that helper.

Hmm, I like the idea, but I don't think it would be a net positive.  In
practice, x86's "normal" page tables stop being normal, because KVM now
initializes all SPTEs with BIT(63)=1 on x86-64.  And that would also incur an
extra RETPOLINE on all those allocations.

> A bonus is we can then call that helper in all places where KVM needs to
> allocate a page for a normal page table, instead of just calling
> get_zeroed_page() directly, e.g., like the one in
> tdp_mmu_alloc_sp_for_split(),

Huh.  Actually, that's a bug, but not the one you probably expect.  At a
glance, it looks like KVM is incorrectly zeroing the page instead of
initializing it with SHADOW_NONPRESENT_VALUE.  But it's actually a
"performance" bug, because KVM doesn't actually need to pre-initialize the
page: either the page will never be used, or every SPTE will be initialized as
a child SPTE.

So that one _should_ be different, e.g.
should be:

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index a32192c35099..36afd67601fc 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1456,7 +1456,7 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(struct kvm *kvm,
 	if (!sp)
 		return NULL;
 
-	sp->spt = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
+	sp->spt = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
 	if (!sp->spt)
 		goto err_spt;

> so that we can have a consistent way for allocating normal page table pages.