From: Fuad Tabba <tabba@google.com>
Date: Fri, 21 Jun 2024 09:23:41 +0100
Subject: Re: [PATCH RFC 0/5] mm/gup: Introduce exclusive GUP pinning
To: Sean Christopherson
Cc: Jason Gunthorpe, David Hildenbrand, John Hubbard, Elliot Berman,
	Andrew Morton, Shuah Khan, Matthew Wilcox, maz@kernel.org,
	kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org, pbonzini@redhat.com
Hi Sean,

On Thu, Jun 20, 2024 at 4:37 PM Sean Christopherson wrote:
>
> On Wed, Jun 19, 2024, Fuad Tabba wrote:
> > Hi Jason,
> >
> > On Wed, Jun 19, 2024 at 12:51 PM Jason Gunthorpe wrote:
> > >
> > > On Wed, Jun 19, 2024 at 10:11:35AM +0100, Fuad Tabba wrote:
> > >
> > > > To be honest, personally (speaking only for myself, not necessarily
> > > > for Elliot and not for anyone else in the pKVM team), I still would
> > > > prefer to use guest_memfd(). I think that having one solution for
> > > > confidential computing that rules them all would be best. But we do
> > > > need to be able to share memory in place, have a plan for supporting
> > > > huge pages in the near future, and migration in the not-too-distant
> > > > future.
> > >
> > > I think using a FD to control this special lifetime stuff is
> > > dramatically better than trying to force the MM to do it with struct
> > > page hacks.
> > >
> > > If you can't agree with the guest_memfd people on how to get there
> > > then maybe you need a guest_memfd2 for this slightly different special
> > > stuff instead of intruding on the core mm so much. (though that would
> > > be sad)
> > >
> > > We really need to be thinking more about containing these special
> > > things and not just sprinkling them everywhere.
> >
> > I agree that we need to agree :) This discussion has been going on
> > since before LPC last year, and the consensus from the guest_memfd()
> > folks (if I understood it correctly) is that guest_memfd() is what it
> > is: designed for a specific type of confidential computing, in the
> > style of TDX and CCA perhaps, and that it cannot (or will not) perform
> > the role of being a general solution for all confidential computing.
>
> That isn't remotely accurate. I have stated multiple times that I want
> guest_memfd to be a vehicle for all VM types, i.e. not just CoCo VMs,
> and most definitely not just TDX/SNP/CCA VMs.

I think there might have been a slight misunderstanding between us. I
just thought that that's what you meant by:

: And I'm saying we should stand firm in what guest_memfd _won't_
: support, e.g. swap/reclaim and probably page migration should get a
: hard "no".

https://lore.kernel.org/all/Zfmpby6i3PfBEcCV@google.com/

> What I am staunchly against is piling features onto guest_memfd that
> will cause it to eventually become virtually indistinguishable from any
> other file-based backing store. I.e. while I want to make guest_memfd
> usable for all VM *types*, making guest_memfd the preferred backing
> store for all *VMs* and use cases is very much a non-goal.
>
> From an earlier conversation[1]:
>
> : In other words, ditch the complexity for features that are well served
> : by existing general purpose solutions, so that guest_memfd can take on
> : a bit of complexity to serve use cases that are unique to KVM guests,
> : without becoming an unmaintainable mess due to cross-products.
>
> > > > Also, since pin is already overloading the refcount, having the
> > > > exclusive pin there helps in ensuring atomic accesses and avoiding
> > > > races.
> > >
> > > Yeah, but every time someone does this and then links it to a uAPI it
> > > becomes utterly baked in concrete for the MM forever.
> >
> > I agree. But if we can't modify guest_memfd() to fit our needs (pKVM,
> > Gunyah), then we don't really have that many other options.
>
> What _are_ your needs? There are multiple unanswered questions from our
> last conversation[2]. And by "needs" I don't mean "what changes do you
> want to make to guest_memfd?", I mean "what are the use cases, patterns,
> and scenarios that you want to support?".

I think Quentin's reply in this thread outlines what it is pKVM would
like to do, and why it's different from, e.g., TDX:

https://lore.kernel.org/all/ZnUsmFFslBWZxGIq@google.com/

To summarize, our requirements are the same as those of other CC
implementations, except that we don't want to pay a penalty for
operations that pKVM (and Gunyah) can do more efficiently than
encryption-based CC, e.g., in-place conversion of private -> shared.
Apart from that, we are happy to use any interface that can support
our needs, or at least one that we can extend in the (near) future to
do so, whether it's guest_memfd() or something else.
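(For concreteness, the conversion we care about looks roughly like the
sketch below, written against the KVM_SET_MEMORY_ATTRIBUTES uAPI that
guest_memfd already has upstream. This is an untested illustration,
not a proposal, and set_range_private()/vm_fd/gpa/len are placeholder
names of mine. The point is that for pKVM the *same* physical pages
should simply become (in)accessible to the host, with no copying,
whereas encryption-based designs typically back the shared view with
different memory.)

	/*
	 * Sketch: flip a GPA range between private and shared in place.
	 * KVM_SET_MEMORY_ATTRIBUTES is a VM ioctl; clearing
	 * KVM_MEMORY_ATTRIBUTE_PRIVATE marks the range shared. No page
	 * contents are (or should need to be) copied for pKVM.
	 */
	#include <linux/kvm.h>
	#include <stdbool.h>
	#include <sys/ioctl.h>

	static int set_range_private(int vm_fd, __u64 gpa, __u64 len,
				     bool private)
	{
		struct kvm_memory_attributes attrs = {
			.address    = gpa,
			.size       = len,
			.attributes = private ? KVM_MEMORY_ATTRIBUTE_PRIVATE : 0,
		};

		return ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attrs);
	}

(Ideally, for pKVM that attribute flip and the stage 2 update it
triggers would be the entire cost of a conversion.)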
> : What's "hypervisor-assisted page migration"? More specifically,
> : what's the mechanism that drives it?

I believe what Will specifically meant by this is that we can add
hypervisor support for migration in pKVM at the stage 2 page tables.
We don't have a detailed implementation for this yet, of course, since
there's no point until we know whether we're going with guest_memfd()
or another alternative.

> : Do you happen to have a list of exactly what you mean by "normal mm
> : stuff"? I am not at all opposed to supporting .mmap(), because long
> : term I also want to use guest_memfd for non-CoCo VMs. But I want to
> : be very conservative with respect to what is allowed for guest_memfd.
> : E.g. host userspace can map guest_memfd, and do operations that are
> : directly related to its mapping, but that's about it.
>
> That distinction matters, because as I have stated in that thread, I am
> not opposed to page migration itself:
>
> : I am not opposed to page migration itself, what I am opposed to is
> : adding deep integration with core MM to do some of the fancy/complex
> : things that lead to page migration.

So it's not a "hard no"? :)

> I am generally aware of the core pKVM use cases, but AFAIK I haven't
> seen a complete picture of everything you want to do, and _why_.
>
> E.g. if one of your requirements is that guest memory is managed by
> core-mm the same as all other memory in the system, then yeah,
> guest_memfd isn't for you. Integrating guest_memfd deeply into core-mm
> simply isn't realistic, at least not without *massive* changes to
> core-mm, as the whole point of guest_memfd is that it is guest-first
> memory, i.e. it is NOT memory that is managed by core-mm (primary MMU)
> and optionally mapped into KVM (secondary MMU).

It's not a requirement that guest memory be managed by the core-mm.
But, as we mentioned, support for in-place conversion from
shared -> private, huge pages, and eventually migration is.

> Again from that thread, one of the most important aspects of
> guest_memfd is that VMAs are not required. Stating the obvious, the
> lack of VMAs makes it really hard to drive swap, reclaim, migration,
> etc. from code that fundamentally operates on VMAs.
>
> : More broadly, no VMAs are required. The lack of stage-1 page tables is
> : nice to have; the lack of VMAs means that guest_memfd isn't playing
> : second fiddle, e.g. it's not subject to VMA protections, isn't
> : restricted to host mapping size, etc.
>
> [1] https://lore.kernel.org/all/Zfmpby6i3PfBEcCV@google.com
> [2] https://lore.kernel.org/all/Zg3xF7dTtx6hbmZj@google.com
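(As an aside, that "no VMAs required" property is visible directly in
the existing uAPI: a VMM can hand a guest memory that it never mmap()s
at all, along the lines of the untested sketch below. The function and
parameter names are placeholders of mine, and error handling is mostly
elided. I believe userspace_addr is only consulted for shared accesses,
so it is left unset here for a purely private slot.)

	/*
	 * Sketch: back a memslot with guest_memfd, referenced by
	 * (fd, offset) rather than by a host virtual address, i.e. no
	 * VMA is involved on the private side.
	 */
	#include <linux/kvm.h>
	#include <sys/ioctl.h>

	static int add_private_slot(int vm_fd, __u32 slot, __u64 gpa,
				    __u64 size)
	{
		struct kvm_create_guest_memfd gmem = { .size = size };
		int gmem_fd = ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &gmem);

		if (gmem_fd < 0)
			return gmem_fd;

		struct kvm_userspace_memory_region2 region = {
			.slot               = slot,
			.flags              = KVM_MEM_GUEST_MEMFD,
			.guest_phys_addr    = gpa,
			.memory_size        = size,
			.guest_memfd        = (__u32)gmem_fd,
			.guest_memfd_offset = 0,
		};

		return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION2, &region);
	}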
I wonder if it might be more productive to also discuss this in one of
the PUCKs, ahead of LPC, in addition to going over it at LPC itself.

Cheers,
/fuad