Date: Thu, 9 Nov 2023 15:54:51 -0800
Subject: Re: RFC: A KVM-specific alternative to UserfaultFD
From: Sean Christopherson
To: David Matlack
Cc: Peter Xu, Paolo Bonzini, kvm list, James Houghton, Oliver Upton, Axel Rasmussen, Mike Kravetz, Andrea Arcangeli
X-Mailing-List: kvm@vger.kernel.org

On Thu, Nov 09, 2023, David Matlack wrote:
> On Thu, Nov 9, 2023 at 10:33 AM David Matlack wrote:
> > On Thu, Nov 9, 2023 at 9:58 AM Sean Christopherson wrote:
> > > For both cases, KVM will need choke points on all accesses to guest memory. Once
> > > the choke points exist and we have signed up to maintain them, the extra burden of
> > > gracefully handling "missing" memory versus frozen memory should be relatively
> > > small, e.g. it'll mainly be the notify-and-wait uAPI.
> >
> > To be honest, the choke points are a relatively small part of any
> > KVM-based demand paging scheme. We still need (a)-(e) from my original
> > email.
>
> Another small thing here: I think we can find clean choke point(s)
> that fit both freezing and demand paging (aka "missing" pages), but
> there is a difference to keep in mind. To freeze guest memory KVM only
> needs to return an error at the choke point(s). Whereas handling
> "missing" pages may require blocking, which adds constraints on where
> the choke point(s) can be placed.

Rats, I didn't think about not being able to block. Luckily, that's *almost* a
non-issue as user accesses already might_sleep(). At a glance, it's only x86's
shadow paging that uses kvm_vcpu_read_guest_atomic(), everything else either
can sleep or uses a gfn_to_pfn_cache or kvm_host_map cache. Aha! And all of x86's
usage can fail gracefully (for some definitions of gracefully), i.e. will either
result in the access being retried after dropping mmu_lock or will cause KVM to
zap a SPTE instead of doing something more optimal.
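
To make the freeze-versus-missing distinction concrete, here is a minimal user-space sketch of a single choke point. It is not KVM code: `toy_vm`, `toy_access_guest_page()`, the `page_state` enum, the `may_block` flag, and the `fault_in` callback (a stand-in for the notify-and-wait uAPI) are all hypothetical names invented for illustration, and the choice of error codes is an assumption. A frozen page only needs an error return, a missing page needs to block when the caller may sleep, and an atomic caller gets a retryable failure instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-page state; none of these names exist in KVM. */
enum page_state { PAGE_PRESENT, PAGE_FROZEN, PAGE_MISSING };

#define NR_PAGES 4

struct toy_vm {
	enum page_state state[NR_PAGES];
	/* Stand-in for a notify-and-wait uAPI: asks "userspace" to supply a page. */
	bool (*fault_in)(struct toy_vm *vm, size_t gfn);
};

/*
 * Toy choke point for every guest memory access.
 * may_block is false for callers that cannot sleep (atomic context).
 */
static int toy_access_guest_page(struct toy_vm *vm, size_t gfn, bool may_block)
{
	assert(gfn < NR_PAGES);

	switch (vm->state[gfn]) {
	case PAGE_PRESENT:
		return 0;
	case PAGE_FROZEN:
		/* Freezing only needs an error at the choke point. */
		return -EFAULT;
	case PAGE_MISSING:
		/* Atomic caller: fail gracefully so the access can be retried. */
		if (!may_block)
			return -EAGAIN;
		/* Sleepable caller: "block" until the page has been supplied. */
		if (!vm->fault_in || !vm->fault_in(vm, gfn))
			return -EFAULT;
		return 0;
	}
	return -EINVAL;
}

/* Example handler that supplies the page immediately. */
static bool toy_fault_in_now(struct toy_vm *vm, size_t gfn)
{
	vm->state[gfn] = PAGE_PRESENT;
	return true;
}
```

In this model, an atomic caller that hits a missing page sees `-EAGAIN` and leaves the page state untouched, loosely mirroring the "retry after dropping mmu_lock" fallback described above, while the same access with `may_block` set goes through the `fault_in` callback and succeeds.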