Date: Thu, 20 Jun 2024 15:32:27 -0700
In-Reply-To: <53d1e7c5-3e77-467b-be33-a618c3bb6cb3@redhat.com>
References: <20240620135540.GG2494510@nvidia.com>
 <6d7b180a-9f80-43a4-a4cc-fd79a45d7571@redhat.com>
 <20240620142956.GI2494510@nvidia.com>
 <385a5692-ffc8-455e-b371-0449b828b637@redhat.com>
 <20240620163626.GK2494510@nvidia.com>
 <66a285fc-e54e-4247-8801-e7e17ad795a6@redhat.com>
 <53d1e7c5-3e77-467b-be33-a618c3bb6cb3@redhat.com>
Subject: Re: [PATCH RFC 0/5] mm/gup: Introduce exclusive GUP pinning
From: Sean Christopherson
To: David Hildenbrand
Cc: Jason Gunthorpe, Fuad Tabba, Christoph Hellwig, John Hubbard,
 Elliot Berman, Andrew Morton, Shuah Khan, Matthew Wilcox, maz@kernel.org,
 kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
 pbonzini@redhat.com

On Thu, Jun 20, 2024, David Hildenbrand wrote:
> On 20.06.24 22:30, Sean Christopherson wrote:
> > On Thu, Jun 20, 2024, David Hildenbrand wrote:
> > > On 20.06.24 18:36, Jason Gunthorpe wrote:
> > > > On Thu, Jun 20, 2024 at 04:45:08PM +0200, David Hildenbrand wrote:
> > > >
> > > > > If we could disallow pinning any shared pages, that would make life a lot
> > > > > easier, but I think there were reasons for why we might require it. To
> > > > > convert shared->private, simply unmap that folio (only the shared parts
> > > > > could possibly be mapped) from all user page tables.
> > > >
> > > > IMHO it should be reasonable to make it work like ZONE_MOVABLE and
> > > > FOLL_LONGTERM. Making a shared page private is really no different
> > > > from moving it.
> > > >
> > > > And if you have built a VMM that uses VMA mapped shared pages and
> > > > short-term pinning then you should really also ensure that the VM is
> > > > aware when the pins go away. For instance if you are doing some virtio
> > > > thing with O_DIRECT pinning then the guest will know the pins are gone
> > > > when it observes virtio completions.
> > > >
> > > > In this way making private is just like moving, we unmap the page and
> > > > then drive the refcount to zero, then move it.
> > > Yes, but here is the catch: what if a single shared subpage of a large folio
> > > is (validly) longterm pinned and you want to convert another shared subpage
> > > to private?
> > >
> > > Sure, we can unmap the whole large folio (including all shared parts) before
> > > the conversion, just like we would do for migration. But we cannot detect
> > > that nobody pinned that subpage that we want to convert to private.
> > >
> > > Core-mm is not, and will not, track pins per subpage.
> > >
> > > So I only see two options:
> > >
> > > a) Disallow long-term pinning. That means, we can, with a bit of wait,
> > >    always convert subpages shared->private after unmapping them and
> > >    waiting for the short-term pin to go away. Not too bad, and we
> > >    already have other mechanisms disallow long-term pinnings (especially
> > >    writable fs ones!).
> >
> > I don't think disallowing _just_ long-term GUP will suffice, if we go the "disallow
> > GUP" route than I think it needs to disallow GUP, period. Like the whole "GUP
> > writes to file-back memory" issue[*], which I think you're alluding to, short-term
> > GUP is also problematic. But unlike file-backed memory, for TDX and SNP (and I
> > think pKVM), a single rogue access has a high probability of being fatal to the
> > entire system.
>
> Disallowing short-term should work, in theory, because the

By "short-term", I assume you mean "long-term"?  Or am I more lost than I realize?

> writes-to-fileback has different issues (the PIN is not the problem but the
> dirtying).
>
> It's more related us not allowing long-term pins for FSDAX pages, because
> the lifetime of these pages is determined by the FS.
>
> What we would do is
>
> 1) Unmap the large folio completely and make any refaults block.
> -> No new pins can pop up
>
> 2) If the folio is pinned, busy-wait until all the short-term pins are
> gone.

This is the step that concerns me.  "Relatively short time" is, well, relative.
Hmm, though I suppose if userspace managed to map a shared page into something
that pins the page, and can't force an unpin, e.g. by stopping I/O?, then either
there's a host userspace bug or a guest bug, and so effectively hanging the vCPU
that is waiting for the conversion to complete is ok.

> 3) Safely convert the relevant subpage from shared -> private
>
> Not saying it's the best approach, but it should be doable.
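
For anyone following along, the three quoted steps can be sketched as a toy
userspace model. This is not kernel code and uses no real mm or KVM API: every
name below (toy_folio, toy_convert_to_private, toy_drop_one_pin, max_polls) is
hypothetical. It assumes a single pin count for the whole folio, mirroring the
point that core-mm does not track pins per subpage, and it bounds the wait in
step 2 purely so the sketch terminates, whereas the thread describes an open
ended busy-wait.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SUBPAGES 4

enum toy_state { TOY_SHARED = 0, TOY_PRIVATE };

/* Hypothetical stand-in for a large folio; not the kernel's struct folio. */
struct toy_folio {
	int pins;                           /* one count for the whole folio */
	bool mapped;                        /* present in user page tables? */
	bool faults_blocked;                /* refaults held off during conversion */
	enum toy_state state[TOY_SUBPAGES]; /* per-subpage shared/private state */
};

/* Models a short-term pin holder completing its I/O and unpinning. */
void toy_drop_one_pin(struct toy_folio *f)
{
	if (f->pins > 0)
		f->pins--;
}

/*
 * Convert subpage idx from shared to private.
 * Returns 0 on success, -1 if idx is out of range or the pins did not
 * drain within max_polls polls (the bound is an assumption of this sketch).
 */
int toy_convert_to_private(struct toy_folio *f, size_t idx,
			   void (*poll)(struct toy_folio *),
			   unsigned int max_polls)
{
	unsigned int polls = 0;

	assert(f != NULL);
	if (idx >= TOY_SUBPAGES)
		return -1;

	/* 1) Unmap the whole folio and block refaults: no new pins appear. */
	f->mapped = false;
	f->faults_blocked = true;

	/* 2) Wait for existing pins to go away. */
	while (f->pins > 0) {
		if (polls++ >= max_polls) {
			/* Give up and let faults proceed again. */
			f->faults_blocked = false;
			return -1;
		}
		if (poll)
			poll(f);
	}

	/* 3) Nothing can touch the folio now; convert the one subpage. */
	f->state[idx] = TOY_PRIVATE;
	f->faults_blocked = false;
	return 0;
}
```

In this model a pin that never drains makes the conversion fail after max_polls
polls. With the unbounded wait discussed above, the same situation would
instead hang the vCPU waiting for the conversion.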