Date: Tue, 29 Oct 2024 08:14:50 -0700
In-Reply-To: <20241029031400.622854-3-alexyonghe@tencent.com>
X-Mailing-List: kvm@vger.kernel.org
References: <20241029031400.622854-1-alexyonghe@tencent.com> <20241029031400.622854-3-alexyonghe@tencent.com>
Subject: Re: [PATCH 2/2] KVM: x86: introduce cache configurations for previous CR3s
From: Sean Christopherson
To: Yong He
Cc: pbonzini@redhat.com, kvm@vger.kernel.org, wanpengli@tencent.com, alexyonghe@tencent.com, junaids@google.com

On Tue, Oct 29, 2024, Yong He wrote:
> From: Yong He
>
> Introduce prev_roots_num param, so that we use more cache of
> previous CR3/root_hpa pairs, which help us to reduce shadow
> page table evict and rebuild overhead.
>
> Signed-off-by: Yong He
> ---

...

> +uint __read_mostly prev_roots_num = KVM_MMU_NUM_PREV_ROOTS;
> +EXPORT_SYMBOL_GPL(prev_roots_num);
> +module_param_cb(prev_roots_num, &prev_roots_num_ops,
> +		&prev_roots_num, 0644);

Allowing the variable to be changed while KVM is running is unsafe.

I also think a module param is the wrong way to try to allow for bigger
caches. The caches themselves are relatively cheap, at 16 bytes per
entry. And I doubt the cost of searching a larger cache in
fast_pgd_switch() would have a measurable impact, since the most
recently used roots will be at the front of the cache, i.e. only
near-misses and misses will be affected.

The only potential downside to larger caches I can think of is that
keeping root_count elevated would make it more difficult to reclaim
shadow pages from roots that are no longer relevant to the guest.
kvm_mmu_zap_oldest_mmu_pages() in particular would refuse to reclaim
roots. That shouldn't be problematic for legacy shadow paging, because
KVM doesn't recursively zap shadow pages. But for nested TDP,
mmu_page_zap_pte() frees the entire tree, in the common case that child
SPTEs aren't shared across multiple trees (common in legacy shadow
paging, extremely uncommon in nested TDP).
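As a rough illustration of the earlier point about search cost, here is
a minimal userspace sketch of a most-recently-used cache of CR3/root_hpa
pairs. It is not KVM's actual fast_pgd_switch() code: the names
root_info, root_cache, root_cache_lookup(), root_cache_insert(),
NUM_PREV_ROOTS and INVALID_ROOT are all hypothetical, and the size of 16
is only an assumption taken from the "~16" suggestion. It shows that an
entry is two 64-bit fields (16 bytes), and that a hit on a recently used
root stops after very few comparisons regardless of how large the array
is.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_PREV_ROOTS 16          /* hypothetical size, not KVM's value */
#define INVALID_ROOT   UINT64_MAX  /* hypothetical "empty slot" marker */

/* Two 64-bit fields, i.e. 16 bytes per cached entry. */
struct root_info {
	uint64_t pgd;  /* guest CR3 value */
	uint64_t hpa;  /* host physical address of the shadow root */
};

_Static_assert(sizeof(struct root_info) == 16, "entry should be 16 bytes");

struct root_cache {
	struct root_info prev[NUM_PREV_ROOTS];  /* index 0 = most recent */
};

static void root_cache_init(struct root_cache *c)
{
	for (size_t i = 0; i < NUM_PREV_ROOTS; i++) {
		c->prev[i].pgd = INVALID_ROOT;
		c->prev[i].hpa = INVALID_ROOT;
	}
}

/* Push a new pair to the front; the oldest entry falls off the end. */
static void root_cache_insert(struct root_cache *c, uint64_t pgd, uint64_t hpa)
{
	memmove(&c->prev[1], &c->prev[0],
		(NUM_PREV_ROOTS - 1) * sizeof(c->prev[0]));
	c->prev[0].pgd = pgd;
	c->prev[0].hpa = hpa;
}

/*
 * Linear search from the front. On a hit the entry is moved to index 0,
 * so hot roots stay cheap to find. *probes reports how many slots were
 * examined, which is the only cost that grows with the cache size.
 */
static bool root_cache_lookup(struct root_cache *c, uint64_t pgd,
			      uint64_t *hpa, size_t *probes)
{
	assert(c && hpa && probes);
	*probes = 0;

	for (size_t i = 0; i < NUM_PREV_ROOTS; i++) {
		*probes = i + 1;
		if (c->prev[i].hpa == INVALID_ROOT)
			break;  /* valid entries are packed at the front */
		if (c->prev[i].pgd != pgd)
			continue;

		struct root_info hit = c->prev[i];

		/* Shift entries 0..i-1 down by one, then reinsert at 0. */
		memmove(&c->prev[1], &c->prev[0], i * sizeof(hit));
		c->prev[0] = hit;
		*hpa = hit.hpa;
		return true;
	}
	return false;
}
```

In this model, growing NUM_PREV_ROOTS only adds comparisons for lookups
that land deep in the array or miss entirely, which is the "near-misses
and misses" case described above.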
And for the nested TDP issue, if it's actually a problem, I would *love*
to solve that problem by making KVM's forced reclaim more sophisticated.
E.g. one idea would be to kick all vCPUs if the maximum number of pages
has been reached, have each vCPU purge old roots from prev_roots, and
then reclaim unused roots. It would be a bit more complicated than that,
as KVM would need a way to ensure forward progress, e.g. if the shadow
pages limit has been reached with a single root. But even then,
kvm_mmu_zap_oldest_mmu_pages() could be made a _lot_ smarter.

TL;DR: what if we simply bump the number of cached roots to ~16?
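
The purge-then-reclaim idea above could be modeled very roughly as
follows. This is a toy userspace sketch under stated assumptions, not a
proposed KVM implementation: toy_root, toy_vcpu, NR_PREV, NO_ROOT and
all three functions are hypothetical, reference counting is reduced to a
plain int, and the "kick all vCPUs" step is modeled as a simple loop
rather than a real cross-CPU request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PREV 4     /* hypothetical cache size for the toy model */
#define NO_ROOT (-1)  /* hypothetical "empty slot" marker */

struct toy_root {
	int  root_count;  /* references held by vCPUs (current + cached) */
	bool allocated;
};

struct toy_vcpu {
	int cur_root;             /* index into roots[], always kept */
	int prev_roots[NR_PREV];  /* indices into roots[], or NO_ROOT */
};

/* Step 1: a "kicked" vCPU drops every cached previous root. */
static void toy_vcpu_purge_prev_roots(struct toy_vcpu *v,
				      struct toy_root *roots)
{
	for (size_t i = 0; i < NR_PREV; i++) {
		int r = v->prev_roots[i];

		if (r == NO_ROOT)
			continue;
		assert(roots[r].root_count > 0);
		roots[r].root_count--;
		v->prev_roots[i] = NO_ROOT;
	}
}

/* Step 2: free every root that no vCPU references anymore. */
static size_t toy_reclaim_unused_roots(struct toy_root *roots,
				       size_t nr_roots)
{
	size_t freed = 0;

	for (size_t i = 0; i < nr_roots; i++) {
		if (roots[i].allocated && roots[i].root_count == 0) {
			roots[i].allocated = false;
			freed++;
		}
	}
	return freed;
}

/*
 * Returns the number of roots reclaimed. A return value of 0 means only
 * in-use roots remain, so the caller would need some other way to make
 * forward progress (the single-root case mentioned above); this sketch
 * does not attempt to solve that.
 */
static size_t toy_forced_reclaim(struct toy_vcpu *vcpus, size_t nr_vcpus,
				 struct toy_root *roots, size_t nr_roots)
{
	for (size_t i = 0; i < nr_vcpus; i++)
		toy_vcpu_purge_prev_roots(&vcpus[i], roots);

	return toy_reclaim_unused_roots(roots, nr_roots);
}
```

The model only captures the ordering (drop cached references first,
then reclaim); locking and the interaction with the shadow page limit
are deliberately left out.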