From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 3 Feb 2026 13:17:25 -0800
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20260129011517.3545883-1-seanjc@google.com> <20260129011517.3545883-20-seanjc@google.com>
Subject: Re: [RFC PATCH v5 19/45] KVM: Allow owner of kvm_mmu_memory_cache to provide a custom page allocator
From: Sean Christopherson
To: Rick P Edgecombe
Cc: Kai Huang, kvm@vger.kernel.org, linux-coco@lists.linux.dev, Xiaoyao Li, Yan Y Zhao, dave.hansen@linux.intel.com, kas@kernel.org, mingo@redhat.com, binbin.wu@linux.intel.com, pbonzini@redhat.com, Isaku Yamahata, ackerleytng@google.com, linux-kernel@vger.kernel.org, sagis@google.com, tglx@kernel.org, bp@alien8.de, Vishal Annapurve, x86@kernel.org
Content-Type: text/plain; charset="utf-8"

On Tue, Feb 03, 2026, Rick P Edgecombe wrote:
> On Tue, 2026-02-03 at 12:12 -0800, Sean Christopherson wrote:
> > > E.g., I thought we might be able to use it to allocate a structure which has
> > > "pair of DPAMT pages" so it could be assigned to 'struct kvm_mmu_page'.  But
> > > it seems you abandoned this idea.  May I ask why?  Just want to understand
> > > the reasoning here.
> > 
> > Because that requires more complexity and there's no known use case, and I
> > don't see an obvious way for a use case to come along.  All of the
> > motivations for a custom allocation scheme that I can think of apply only to
> > full pages, or fit nicely in a kmem_cache.
> > 
> > Specifically, the "cache" logic is already bifurcated between "kmem_cache" and
> > "page" usage.  Further splitting the "page" case doesn't require modifications
> > to the "kmem_cache" case, whereas providing a fully generic solution would
> > require additional changes, e.g. 
> > to handle this code:
> > 
> > 	page = (void *)__get_free_page(gfp_flags);
> > 	if (page && mc->init_value)
> > 		memset64(page, mc->init_value, PAGE_SIZE / sizeof(u64));
> > 
> > It certainly wouldn't be much complexity, but this code is already a bit
> > awkward, so I don't think it makes sense to add support for something that
> > will probably never be used.
> 
> The thing that the design needlessly works around is that we can rely on
> there being only two DPAMT pages per 2MB range. We don't need the dynamic
> page count allocations.
> 
> This means we don't need to pass around the list of pages that lets arch/x86
> take as many pages as it needs. We can maybe just pass in a struct like Kai
> was suggesting to the get/put helpers. So I was in the process of trying to
> morph this series in that direction to get rid of the complexity resulting
> from the dynamic assumption.
> 
> This was what I had done in response to v4 discussions, so now retrofitting
> it into this new ops scheme. Care to warn me off of this before I have
> something to show?

That's largely orthogonal to this change.  This change is about preparing the
DPAMT when an S-EPT page is allocated versus when it is installed.  The fact
that DPAMT requires at most two pages versus a more dynamic maximum is
irrelevant.

The caches aren't about dynamic sizes (though they play nicely with them),
they're about:

  (a) not having to deal with allocating under a spinlock
  (b) not having to free memory that goes unused (for a single page fault)
  (c) batching allocations for performance reasons (with the caveat that I
      doubt anyone has measured the performance impact in many, many years)

None of those talking points change at all if KVM needs to provide 2 pages
versus N pages.  The max number of pages needed for page tables is pretty much
the same thing as DPAMT, just with a higher max (4/5 vs. 2).  In both cases,
the allocated pages may or may not be consumed for any given fault.
For the leaf pages (including the hugepage splitting cases), which don't
utilize KVM's kvm_mmu_memory_cache, I wouldn't expect the KVM details to change
all that much.  In fact, they shouldn't change at all, because tracking 2 pages
versus N pages in "struct tdx_pamt_cache" is a detail that is 100% buried in
the TDX subsystem (which was pretty much the entire goal of my design).

Though maybe I'm misunderstanding what you have in mind?