Date: Mon, 3 Feb 2025 11:41:20 -0800
X-Mailing-List: linux-doc@vger.kernel.org
Subject: Re: [PATCH 2/3] KVM: x86: Add support for VMware guest specific hypercalls
From: Sean Christopherson
To: Doug Covelli
Cc: Paolo Bonzini, Zack Rusin, kvm, Jonathan Corbet, Thomas Gleixner,
 Ingo Molnar, Borislav Petkov, Dave Hansen, "the arch/x86 maintainers",
 "H. Peter Anvin", Shuah Khan, Namhyung Kim, Arnaldo Carvalho de Melo,
 Isaku Yamahata, Joel Stanley, Linux Doc Mailing List,
 linux-kernel@vger.kernel.org, linux-kselftest

On Mon, Feb 03, 2025, Doug Covelli wrote:
> On Mon, Feb 3, 2025 at 1:22 PM Paolo Bonzini wrote:
> >
> > On Mon, Feb 3, 2025 at 5:35 PM Doug Covelli wrote:
> > > OK. It seems like fully embracing the in-kernel APIC is the way to go
> > > especially considering it really simplifies using KVM's support for nested
> > > virtualization. Speaking of nested virtualization we have been working on
> > > adding support for that and would like to propose a couple of changes:
> > >
> > > - Add an option for L0 to handle backdoor accesses from CPL3 code running in L2.
> > > On a #GP nested_vmx_l0_wants_exit can check if this option is enabled and KVM
> > > can handle the #GP like it would if it had been from L1 (exit to userlevel iff
> > > it is a backdoor access otherwise deliver the fault to L2). When combined with
> > > enable_vmware_backdoor this will allow L0 to optionally handle backdoor accesses
> > > from CPL3 code running in L2. This is needed for cases such as running VMware
> > > tools in a Windows VM with VBS enabled. For other cases such as running tools
> > > in a Windows VM in an ESX VM we still want L1 to handle the backdoor accesses
> > > from L2.
> >
> > I think this makes sense and could be an argument to KVM_ENABLE_CAP.
> >
> > > - Extend KVM_EXIT_MEMORY_FAULT for permission faults (e.g. the guest attempting
> > > to write to a page that has been protected by userlevel calling mprotect). This
> > > is useful for cases where we want synchronous detection of guest writes such as
> > > lazy snapshots (dirty page tracking is no good for this case). Currently
> > > permission faults result in KVM_RUN returning EFAULT which we handle by
> > > interpreting the instruction as we do not know the guest physical address
> > > associated with the fault.
> >
> > Yes, this makes sense too, though you might want to look into
> > userfaultfd as well.
> >
> > We had something planned using attributes, but I don't see any issue
> > extending it to EFAULT. Maybe it would have to be yet another
> > KVM_ENABLE_CAP; considering that it would break your existing code,
> > there might be someone else in the wild doing it.
>
> It looks like KVM_EXIT_MEMORY_FAULT was implemented in such a way that it
> won't break existing code:
>
> Note! KVM_EXIT_MEMORY_FAULT is unique among all KVM exit reasons in
> that it accompanies a return code of ‘-1’, not ‘0’! errno will always
> be set to EFAULT or EHWPOISON when KVM exits with
> KVM_EXIT_MEMORY_FAULT, userspace should assume kvm_run.exit_reason is
> stale/undefined for all other error numbers.
>
> That being said we could certainly make this opt-in if that is preferable.

-EFAULT isn't the problem; KVM not being able to return useful information in
all situations is the issue. Specifically, "guest" accesses that are emulated
by KVM are problematic, because the -EFAULT from e.g. __kvm_write_guest_page()
is disconnected from the code that actually kicks out to userspace. In that
case, userspace will get KVM_EXIT_MMIO, not -EFAULT. There are more problems
beyond KVM_EXIT_MMIO vs. -EFAULT, e.g.
instructions that perform multiple memory
accesses, "failures" that are squashed and never propagated to userspace (PV
features tend to do this), page splits, etc.

In general, I don't expect most KVM access to guest memory to Just Work, as I
doubt KVM will behave as you want.

We spent a lot of time trying to sort out a viable approach in the context of the
USERFAULT_ON_MISSING series[1], and ultimately gave up (ignoring that we postponed
the entire series)[2], because we decided that fully solving KVM accesses would
require an absurd amount of effort and churn, and wasn't at all necessary for the
userfault use case.

What exactly needs to happen on "synchronous detection of guest writes"? One
idea (which may be horribly flawed as I have put *very* little thought into it)
would be to implement a module (or KVM extension) that utilizes KVM's "external"
write-tracking APIs to get the synchronous notifications (see
arch/x86/include/asm/kvm_page_track.h).

[1] https://lore.kernel.org/all/ZIn6VQSebTRN1jtX@google.com
[2] https://lore.kernel.org/all/ZR88w9W62qsZDro-@google.com
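
To make the userspace-visible outcomes discussed above concrete, here is a rough sketch (not code from KVM or from any VMM) of how a run loop might classify what comes back from KVM_RUN. The struct sketch_run type, the SKETCH_* names and classify_kvm_run() are hypothetical and exist only for illustration; the two exit-reason values are copied from <linux/kvm.h> so the sketch builds without kernel headers. It assumes exit_reason can be trusted only when errno is EFAULT or EHWPOISON, per the documentation quoted above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#ifndef EHWPOISON
#define EHWPOISON 133 /* common Linux value; fallback for other libcs (assumption) */
#endif

/* Exit-reason values as in <linux/kvm.h>, redefined so this builds standalone. */
enum {
	SKETCH_EXIT_MMIO = 6,          /* KVM_EXIT_MMIO */
	SKETCH_EXIT_MEMORY_FAULT = 39, /* KVM_EXIT_MEMORY_FAULT */
};

/* Hypothetical, trimmed-down stand-in for the struct kvm_run fields used here. */
struct sketch_run {
	uint32_t exit_reason;
	struct {
		uint64_t flags;
		uint64_t gpa;
		uint64_t size;
	} memory_fault;
};

enum sketch_outcome {
	SKETCH_FAULT_WITH_GPA, /* -1 + EFAULT/EHWPOISON + MEMORY_FAULT: gpa/size usable */
	SKETCH_ERROR_NO_INFO,  /* -1 without a usable exit reason: no GPA available */
	SKETCH_MMIO,           /* 0 + MMIO: may be an emulated access that hit the page */
	SKETCH_OTHER_EXIT,     /* any other successful exit */
};

/* ret and err are the return value and errno observed after ioctl(KVM_RUN). */
static enum sketch_outcome classify_kvm_run(int ret, int err,
					    const struct sketch_run *run)
{
	assert(run != NULL);

	if (ret < 0) {
		/* exit_reason is stale/undefined unless errno is EFAULT/EHWPOISON. */
		if ((err == EFAULT || err == EHWPOISON) &&
		    run->exit_reason == SKETCH_EXIT_MEMORY_FAULT)
			return SKETCH_FAULT_WITH_GPA;
		return SKETCH_ERROR_NO_INFO;
	}

	if (run->exit_reason == SKETCH_EXIT_MMIO)
		return SKETCH_MMIO;

	return SKETCH_OTHER_EXIT;
}
```

Only the first outcome gives userspace a guest physical address to act on. The SKETCH_MMIO case is the problematic one described above: an access emulated by KVM that hits a protected page surfaces as KVM_EXIT_MMIO rather than as -EFAULT, so a classifier like this cannot tell it apart from ordinary MMIO.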