Date: Tue, 3 Feb 2026 13:17:25 -0800
X-Mailing-List: linux-coco@lists.linux.dev
References: <20260129011517.3545883-1-seanjc@google.com> <20260129011517.3545883-20-seanjc@google.com>
Subject: Re: [RFC PATCH v5 19/45] KVM: Allow owner of kvm_mmu_memory_cache to provide a custom page allocator
From: Sean Christopherson
To: Rick P Edgecombe
Cc: Kai Huang, kvm@vger.kernel.org, linux-coco@lists.linux.dev, Xiaoyao Li,
	Yan Y Zhao, dave.hansen@linux.intel.com, kas@kernel.org, mingo@redhat.com,
	binbin.wu@linux.intel.com, pbonzini@redhat.com, Isaku Yamahata,
	ackerleytng@google.com, linux-kernel@vger.kernel.org, sagis@google.com,
	tglx@kernel.org, bp@alien8.de, Vishal Annapurve, x86@kernel.org
Content-Type: text/plain; charset="utf-8"

On Tue, Feb 03, 2026, Rick P Edgecombe wrote:
> On Tue, 2026-02-03 at 12:12 -0800, Sean Christopherson wrote:
> > > E.g., I thought we might be able to use it to allocate a structure which
> > > has "pair of DPAMT pages" so it could be assigned to 'struct
> > > kvm_mmu_page'.  But it seems you abandoned this idea.  May I ask why?
> > > Just want to understand the reasoning here.
> > 
> > Because that requires more complexity and there's no known use case, and I
> > don't see an obvious way for a use case to come along.  All of the
> > motivations for a custom allocation scheme that I can think of apply only
> > to full pages, or fit nicely in a kmem_cache.
> > 
> > Specifically, the "cache" logic is already bifurcated between "kmem_cache"
> > and "page" usage.  Further splitting the "page" case doesn't require
> > modifications to the "kmem_cache" case, whereas providing a fully generic
> > solution would require additional changes, e.g.
> > to handle this code:
> > 
> > 	page = (void *)__get_free_page(gfp_flags);
> > 	if (page && mc->init_value)
> > 		memset64(page, mc->init_value, PAGE_SIZE / sizeof(u64));
> > 
> > It certainly wouldn't be much complexity, but this code is already a bit
> > awkward, so I don't think it makes sense to add support for something that
> > will probably never be used.
> 
> The thing that the design needlessly works around is that we can rely on
> there being only two DPAMT pages per 2MB range. We don't need the dynamic
> page count allocations.
> 
> This means we don't need to pass around the list of pages that lets
> arch/x86 take as many pages as it needs. We can maybe just pass in a struct
> like Kai was suggesting to the get/put helpers. So I was in the process of
> trying to morph this series in that direction to get rid of the complexity
> resulting from the dynamic assumption.
> 
> This was what I had done in response to v4 discussions, so now retrofitting
> it into this new ops scheme. Care to warn me off of this before I have
> something to show?

That's largely orthogonal to this change.  This change is about preparing the
DPAMT when the S-EPT page is allocated versus when it is installed.  The fact
that DPAMT requires at most two pages versus a more dynamic maximum is
irrelevant.

The caches aren't about dynamic sizes (though they play nicely with them),
they're about:

  (a) not having to deal with allocating under a spinlock
  (b) not having to free memory that goes unused (for a single page fault)
  (c) batching allocations for performance reasons (with the caveat that I
      doubt anyone has measured the performance impact in many, many years)

None of those talking points change at all if KVM needs to provide 2 pages
versus N pages.  The max number of pages needed for page tables is pretty
much the same thing as for DPAMT, just with a higher max (4/5 vs. 2).  In
both cases, the allocated pages may or may not be consumed for any given
fault.
For the leaf pages (including the hugepage splitting cases), which don't
utilize KVM's kvm_mmu_memory_cache, I wouldn't expect the KVM details to
change all that much.  In fact, they shouldn't change at all, because
tracking 2 pages versus N pages in "struct tdx_pamt_cache" is a detail that
is 100% buried in the TDX subsystem (which was pretty much the entire goal
of my design).

Though maybe I'm misunderstanding what you have in mind?