Date: Thu, 20 Jun 2024 08:37:03 -0700
References: <20240618-exclusive-gup-v1-0-30472a19c5d1@quicinc.com> <7fb8cc2c-916a-43e1-9edf-23ed35e42f51@nvidia.com> <14bd145a-039f-4fb9-8598-384d6a051737@redhat.com> <20240619115135.GE2494510@nvidia.com>
Subject: Re: [PATCH RFC 0/5] mm/gup: Introduce exclusive GUP pinning
From: Sean Christopherson
To: Fuad Tabba
Cc: Jason Gunthorpe, David Hildenbrand, John Hubbard, Elliot Berman, Andrew Morton, Shuah Khan, Matthew Wilcox, maz@kernel.org, kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, pbonzini@redhat.com

On Wed, Jun 19, 2024, Fuad
Tabba wrote:
> Hi Jason,
>
> On Wed, Jun 19, 2024 at 12:51 PM Jason Gunthorpe wrote:
> >
> > On Wed, Jun 19, 2024 at 10:11:35AM +0100, Fuad Tabba wrote:
> >
> > > To be honest, personally (speaking only for myself, not necessarily
> > > for Elliot and not for anyone else in the pKVM team), I still would
> > > prefer to use guest_memfd(). I think that having one solution for
> > > confidential computing that rules them all would be best. But we do
> > > need to be able to share memory in place, have a plan for supporting
> > > huge pages in the near future, and migration in the not-too-distant
> > > future.
> >
> > I think using a FD to control this special lifetime stuff is
> > dramatically better than trying to force the MM to do it with struct
> > page hacks.
> >
> > If you can't agree with the guest_memfd people on how to get there
> > then maybe you need a guest_memfd2 for this slightly different special
> > stuff instead of intruding on the core mm so much. (though that would
> > be sad)
> >
> > We really need to be thinking more about containing these special
> > things and not just sprinkling them everywhere.
>
> I agree that we need to agree :) This discussion has been going on
> since before LPC last year, and the consensus from the guest_memfd()
> folks (if I understood it correctly) is that guest_memfd() is what it
> is: designed for a specific type of confidential computing, in the
> style of TDX and CCA perhaps, and that it cannot (or will not) perform
> the role of being a general solution for all confidential computing.

That isn't remotely accurate.  I have stated multiple times that I want guest_memfd
to be a vehicle for all VM types, i.e. not just CoCo VMs, and most definitely not
just TDX/SNP/CCA VMs.

What I am staunchly against is piling features onto guest_memfd that will cause
it to eventually become virtually indistinguishable from any other file-based
backing store.  I.e.
while I want to make guest_memfd usable for all VM *types*, making
guest_memfd the preferred backing store for all *VMs* and use cases is very much
a non-goal.

From an earlier conversation[1]:

 : In other words, ditch the complexity for features that are well served by existing
 : general purpose solutions, so that guest_memfd can take on a bit of complexity to
 : serve use cases that are unique to KVM guests, without becoming an unmaintainable
 : mess due to cross-products.

> > > Also, since pin is already overloading the refcount, having the
> > > exclusive pin there helps in ensuring atomic accesses and avoiding
> > > races.
> >
> > Yeah, but every time someone does this and then links it to a uAPI it
> > becomes utterly baked in concrete for the MM forever.
>
> I agree. But if we can't modify guest_memfd() to fit our needs (pKVM,
> Gunyah), then we don't really have that many other options.

What _are_ your needs?  There are multiple unanswered questions from our last
conversation[2].  And by "needs" I don't mean "what changes do you want to make
to guest_memfd?", I mean "what are the use cases, patterns, and scenarios that
you want to support?".

 : What's "hypervisor-assisted page migration"?  More specifically, what's the
 : mechanism that drives it?

 : Do you happen to have a list of exactly what you mean by "normal mm stuff"?  I
 : am not at all opposed to supporting .mmap(), because long term I also want to
 : use guest_memfd for non-CoCo VMs.  But I want to be very conservative with respect
 : to what is allowed for guest_memfd.   E.g. host userspace can map guest_memfd,
 : and do operations that are directly related to its mapping, but that's about it.

That distinction matters, because as I have stated in that thread, I am not
opposed to page migration itself:

 : I am not opposed to page migration itself, what I am opposed to is adding deep
 : integration with core MM to do some of the fancy/complex things that lead to page
 : migration.

I am generally aware of the core pKVM use cases, but AFAIK I haven't seen a
complete picture of everything you want to do, and _why_.

E.g. if one of your requirements is that guest memory is managed by core-mm the
same as all other memory in the system, then yeah, guest_memfd isn't for you.
Integrating guest_memfd deeply into core-mm simply isn't realistic, at least not
without *massive* changes to core-mm, as the whole point of guest_memfd is that
it is guest-first memory, i.e. it is NOT memory that is managed by core-mm (primary
MMU) and optionally mapped into KVM (secondary MMU).

Again from that thread, one of the most important aspects of guest_memfd is that
VMAs are not required.  Stating the obvious, lack of VMAs makes it really hard to
drive swap, reclaim, migration, etc. from code that fundamentally operates on VMAs.

 : More broadly, no VMAs are required.  The lack of stage-1 page tables are nice to
 : have; the lack of VMAs means that guest_memfd isn't playing second fiddle, e.g.
 : it's not subject to VMA protections, isn't restricted to host mapping size, etc.

[1] https://lore.kernel.org/all/Zfmpby6i3PfBEcCV@google.com
[2] https://lore.kernel.org/all/Zg3xF7dTtx6hbmZj@google.com