Date: Thu, 12 Mar 2026 10:07:57 -0700
In-Reply-To: <20260123090304.32286-2-jiangshanlai@gmail.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
Mime-Version: 1.0
References:
 <20260123090304.32286-1-jiangshanlai@gmail.com>
 <20260123090304.32286-2-jiangshanlai@gmail.com>
Subject: Re: [PATCH 2/2] KVM: x86/mmu: Skip unsync when large pages are allowed
From: Sean Christopherson
To: Lai Jiangshan
Cc: linux-kernel@vger.kernel.org, Lai Jiangshan, Paolo Bonzini,
    Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
    x86@kernel.org, "H. Peter Anvin", kvm@vger.kernel.org
Content-Type: text/plain; charset="us-ascii"

On Fri, Jan 23, 2026, Lai Jiangshan wrote:
> From: Lai Jiangshan
>
> Use the large-page metadata to avoid pointless attempts to search for a SP.
>
> If the target GFN falls within a range where a large page is allowed,
> then there cannot be a shadow page for that GFN; a shadow page in the
> range would itself disallow using a large page.  In that case, there
> is nothing to unsync and mmu_try_to_unsync_pages() can return
> immediately.
>
> This is always true for TDP MMU without nested TDP,

I wouldn't expect this to be much of a performance optimization for this
case though, as kvm_get_mmu_page_hash() will return an empty list, i.e.
for_each_gfn_valid_sp_with_gptes() won't do meaningful work anyways.

> and holds for a significant fraction of cases with shadow paging, even
> when all SPs are 4K.
>
> For shadow paging, this optimization theoretically avoids work for about
> 1/e ~= 37% of GFNs, assuming one guest page table per 2M of memory and
> that each GPT falls randomly into the 2M memory buckets.  In a simple
> test setup, it skipped unsync in a much higher percentage of cases,
> mainly because the guest buddy allocator clusters GPTs into fewer
> buckets.
>
> Signed-off-by: Lai Jiangshan
> ---
>  arch/x86/kvm/mmu/mmu.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 4535d2836004..555075fb63d9 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2932,6 +2932,14 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
>  	struct kvm_mmu_page *sp;
>  	bool locked = false;
>  
> +	/*
> +	 * If large page is allowed, there is no shadow page in the GFN range,
> +	 * because the presence of a shadow page in that range would prevent
> +	 * using a large page.
> +	 */
> +	if (!lpage_info_slot(gfn, slot, PG_LEVEL_2M)->disallow_lpage)
> +		return 0;
> +

Hmm, I'd like to move this to after the write-tracking check, even though
as implemented in code today, the two are mutually exclusive.

Specifically, I don't want to rely on KVM not supporting write-tracking
at 2MiB granularity, and also to avoid confusing readers.  E.g. a shallow
read of account_shadowed() would lead people to believe this code is
wrong:

	/* the non-leaf shadow pages are keeping readonly. */
	if (sp->role.level > PG_LEVEL_4K)
		return __kvm_write_track_add_gfn(kvm, slot, gfn);

	kvm_mmu_gfn_disallow_lpage(slot, gfn);

if they didn't follow __kvm_write_track_add_gfn() to see:

	/*
	 * new track stops large page mapping for the
	 * tracked page.
	 */
	kvm_mmu_gfn_disallow_lpage(slot, gfn);

From a performance perspective, kvm_gfn_is_write_tracked() is O(1) time,
and should be very fast for the "pure" TDP MMU case, so I don't think
that's a concern.

This is what I have locally, please holler if you object to landing the
code after the write-tracked check.
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 363967a17069..3d0e0c1b5332 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2940,6 +2940,15 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
 	if (kvm_gfn_is_write_tracked(kvm, slot, gfn))
 		return -EPERM;
 
+	/*
+	 * Only 4KiB mappings can become unsync, and KVM disallows hugepages
+	 * for unsync gfns.  Upper-level gPTEs (leaf or non-leaf) are always
+	 * write-protected (see above), thus if the gfn can be mapped with a
+	 * hugepage and isn't write-tracked, it can't be unsync.
+	 */
+	if (!lpage_info_slot(gfn, slot, PG_LEVEL_2M)->disallow_lpage)
+		return 0;
+
 	/*
 	 * The page is not write-tracked, mark existing shadow pages unsync
 	 * unless KVM is synchronizing an unsync SP. In that case, KVM must

> 	/*
> 	 * Force write-protection if the page is being tracked. Note, the page
> 	 * track machinery is used to write-protect upper-level shadow pages,
> --
> 2.19.1.6.gb485710b