Date: Fri, 21 Jun 2024 09:25:10 +0000
From: Quentin Perret
To: David Hildenbrand
Cc: Jason Gunthorpe, Elliot Berman, Fuad Tabba, Christoph Hellwig,
    John Hubbard, Andrew Morton, Shuah Khan, Matthew Wilcox,
    maz@kernel.org, kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    linux-kselftest@vger.kernel.org, pbonzini@redhat.com
Subject: Re: [PATCH RFC 0/5] mm/gup: Introduce exclusive GUP pinning

On Friday 21 Jun 2024 at 10:02:08 (+0200), David Hildenbrand wrote:
> Thanks for the information. IMHO we really should try to find a common
> ground here, and FOLL_EXCLUSIVE is likely not it :)

That's OK, IMO at least :-).

> Thanks for reviving this discussion with your patch set!
>
> pKVM is interested in in-place conversion, I believe there are valid use
> cases for in-place conversion for TDX and friends as well (as discussed, I
> think that might be a clean way to get huge/gigantic page support in).
>
> This implies the option to:
>
> 1) Have shared+private memory in guest_memfd
> 2) Be able to mmap shared parts
> 3) Be able to convert shared<->private in place
>
> and later in my interest
>
> 4) Have huge/gigantic page support in guest_memfd with the option of
> converting individual subpages
>
> We might not want to make use of that model for all of CC -- as you state,
> sometimes the destructive approach might be better performance wise -- but
> having that option doesn't sound crazy to me (and maybe would solve real
> issues as well).

Cool.

> After all, the common requirement here is that "private" pages are not
> mapped/pinned/accessible.
>
> Sure, there might be cases like "pKVM can handle access to private pages in
> user page mappings", "AMD-SNP will not crash the host if writing to private
> pages" but there are not factors that really make a difference for a common
> solution.

Sure, there isn't much value in differentiating on these things. One
might argue that we could save one mmap() on the private->shared
conversion path by keeping all of guest_memfd mapped in userspace
including private memory, but that's most probably not worth the
effort of re-designing the whole thing just for that, so let's forget
that.

The ability to handle stage-2 faults in the kernel has implications in
other places however. It means we don't need to punch holes in the
kernel linear map when donating memory to a guest for example, even
with 'crazy' access patterns like load_unaligned_zeropad(). So that's
good.

> private memory: not mapped, not pinned
> shared memory: maybe mapped, maybe pinned
> granularity of conversion: single pages
>
> Anything I am missing?

That looks good to me.
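
For illustration only, the three-line summary quoted above can be
written down as a toy per-page state machine. This is a userspace C
sketch, not kernel code: struct gmem_page, the gmem_*() helpers, the
explicit counters and the choice of error codes are all made-up names
and assumptions for this example. How guest_memfd would actually track
mappings and pins is not specified in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-page state; none of these names exist in the kernel. */
enum gmem_page_state { GMEM_SHARED, GMEM_PRIVATE };

struct gmem_page {
	enum gmem_page_state state;
	unsigned int mapcount;	/* assumed count of userspace mappings */
	unsigned int pincount;	/* assumed count of pins on the page */
};

/* Invariant from the summary: private pages are not mapped, not pinned. */
static bool gmem_page_valid(const struct gmem_page *p)
{
	return p->state == GMEM_SHARED ||
	       (p->mapcount == 0 && p->pincount == 0);
}

/* Shared -> private, in place: refused while a mapping or pin remains. */
static int gmem_make_private(struct gmem_page *p)
{
	if (p->mapcount || p->pincount)
		return -EBUSY;	/* caller has to unmap/unpin first */
	p->state = GMEM_PRIVATE;
	assert(gmem_page_valid(p));
	return 0;
}

/* Private -> shared, in place: always allowed in this toy model. */
static void gmem_make_shared(struct gmem_page *p)
{
	p->state = GMEM_SHARED;
}

/* Mapping is only permitted on shared pages. */
static int gmem_map(struct gmem_page *p)
{
	if (p->state != GMEM_SHARED)
		return -EACCES;
	p->mapcount++;
	return 0;
}

/* Drop one mapping, if there is one. */
static void gmem_unmap(struct gmem_page *p)
{
	if (p->mapcount)
		p->mapcount--;
}
```

Each helper acts on a single struct gmem_page, which mirrors the
"granularity of conversion: single pages" line; pinning would follow
the same pattern as gmem_map() in this model.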

And as discussed in previous threads, we have the ambition of getting
page-migration to work, including for private memory, mostly to get
kcompactd to work better when pVMs are running. Android makes
extensive use of compaction, and pVMs currently stick out like a sore
thumb.

We can trivially implement a hypercall to have pKVM swap a private
page with another without the guest having to know. The difficulty is
obviously to hook that in Linux, and I've personally not looked into
it properly, so that is clearly longer term. We don't want to take
anybody by surprise if there is a need for some added complexity in
guest_memfd to support this use-case though. I don't expect folks on
the receiving end of that to agree to it blindly without knowing
_what_ this complexity is FWIW. But at least our intentions are clear
:-)

Thanks,
Quentin