Date: Fri, 17 Jan 2025 16:03:30 -0800
Subject: Re: [PATCH] KVM: nVMX: Always use TLB_FLUSH_GUEST for nested VM-Enter/VM-Exit
From: Sean Christopherson
To: Yosry Ahmed
Cc: Jim Mattson, Paolo Bonzini, kvm@vger.kernel.org, linux-kernel@vger.kernel.org

On Fri, Jan 17, 2025, Yosry Ahmed wrote:
> On Fri, Jan 17, 2025 at 10:01 AM Sean Christopherson wrote:
> > Yep. I suspect the issue is lack of documentation for TLB_FLUSH_GUEST and
> > TLB_FLUSH_CURRENT. I'm not entirely sure where it would be best to document
> > them. I guess maybe where they are #defined?
>
> I guess at the #define we can just mention that they result in calling
> kvm_vcpu_flush_tlb_{guest/current}() before entering the guest, if
> anything.

Yeah, a "See xx for details" redirect is probably the best option.

> The specific documentation about what they do could be above the
> functions themselves, and describing the potential MMU sync is
> naturally part of documenting kvm_vcpu_flush_tlb_guest() (kinda
> already there).
>
> The flush_tlb_guest() callback is documented in kvm_host.h, but not
> flush_tlb_current(). I was going to suggest just documenting that. But
> kvm_vcpu_flush_tlb_guest() does not only call flush_tlb_guest(), but
> it also potentially synchronizes the MMU. So only documenting the
> callbacks does not paint a full picture.
>
> FTR, I initially confused myself because all kvm_vcpu_flush_tlb_*()
> functions are more-or-less thin wrappers around the per-vendor
> callbacks -- except kvm_vcpu_flush_tlb_guest().
>
> >
> > TLB_FLUSH_GUEST is used when a flush of the guest's TLB, from the guest's
> > perspective, is architecturally required. The one oddity with TLB_FLUSH_GUEST
> > is that it does NOT include guest-physical mappings, i.e.
> > TLB entries that are associated with an EPT root.
>
> The way I think about this is how it's documented above the per-vendor
> callback. It flushes translations created by the guest. The guest does
> not (directly) create guest-physical translations, only linear and
> combined translations.

That's not accurate either. When L1 is using nested TDP, it does create
guest-physical translations. The lack of any form of handling in
TLB_FLUSH_GUEST is a reflection of two things: EPT is weird, and nested SVM
doesn't yet support precise flushing on transitions, i.e. nested NPT handling
is missing because KVM unconditionally flushes and synchronizes.

EPT is "weird" because the _only_ time guest-physical translations are flushed
is when the "wrong" KVM MMU is loaded. The only way to flush guest-physical
translations (short of RESET :-D) is via INVEPT, and INVEPT is a root-only
(VMX terminology) instruction, i.e. can only be executed by L1. And because L1
can't itself be using EPT[*], INVEPT can never target/flush the current
context.

Furthermore, INVEPT isn't strictly tied to a VMCS, e.g. deferring the emulated
flush until the next time KVM runs a vmcs12 isn't viable. Rather than add
dedicated tracking, KVM simply unloads the roots and lets the normal root
"allocation" handle the flush+sync the next time the vCPU uses the associated
MMU.

Nested NPT is different, as there is no INVNPT. Instead, there's the ASID
itself and a flushing control, both of which are properties of the VMCB. As a
result, NPT TLB flushes that are initiated by a hypervisor always take effect
at VMRUN, e.g. by bumping the ASID, or via the dedicated flushing control.

So when proper handling of TLB flushing on nested SVM transitions comes along,
I do expect that kvm_vcpu_flush_tlb_guest() will grow.
Or maybe we'll add yet another TLB_FLUSH_XXX flavor :-)

One thing that could be helpful would be to document that KVM doesn't use
TLB_FLUSH_GUEST to handle INVEPT, and so there's no need to sync nested TDP
MMUs.

[*] Even in a deprivileged scenario like pKVM, the guest kernel would become L2
    from KVM's perspective.