Date: Tue, 19 Mar 2024 09:47:40 +0000
From: Quentin Perret
To: David Hildenbrand
Cc: Sean Christopherson, Matthew Wilcox, Fuad Tabba, kvm@vger.kernel.org,
 kvmarm@lists.linux.dev, pbonzini@redhat.com, chenhuacai@kernel.org,
 mpe@ellerman.id.au, anup@brainfault.org, paul.walmsley@sifive.com,
 palmer@dabbelt.com, aou@eecs.berkeley.edu, viro@zeniv.linux.org.uk,
 brauner@kernel.org, akpm@linux-foundation.org, xiaoyao.li@intel.com,
 yilun.xu@intel.com, chao.p.peng@linux.intel.com, jarkko@kernel.org,
 amoorthy@google.com, dmatlack@google.com, yu.c.zhang@linux.intel.com,
 isaku.yamahata@intel.com, mic@digikod.net, vbabka@suse.cz,
 vannapurve@google.com, ackerleytng@google.com, mail@maciej.szmigiero.name,
 michael.roth@amd.com, wei.w.wang@intel.com, liam.merwick@oracle.com,
 isaku.yamahata@gmail.com, kirill.shutemov@linux.intel.com,
 suzuki.poulose@arm.com, steven.price@arm.com, quic_mnalajal@quicinc.com,
 quic_tsoni@quicinc.com, quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com,
 quic_pderrin@quicinc.com, quic_pheragu@quicinc.com, catalin.marinas@arm.com,
 james.morse@arm.com, yuzenghui@huawei.com, oliver.upton@linux.dev,
 maz@kernel.org, will@kernel.org, keirf@google.com, linux-mm@kvack.org
Subject: Re: folio_mmapped
References: <755911e5-8d4a-4e24-89c7-a087a26ec5f6@redhat.com>
 <99a94a42-2781-4d48-8b8c-004e95db6bb5@redhat.com>
 <20240304132732963-0800.eberman@hu-eberman-lv.qualcomm.com>
 <4b0fd46a-cc4f-4cb7-9f6f-ce19a2d3064e@redhat.com>
In-Reply-To: <4b0fd46a-cc4f-4cb7-9f6f-ce19a2d3064e@redhat.com>

On Monday 04 Mar 2024 at 22:58:49 (+0100), David Hildenbrand wrote:
> On 04.03.24 22:43, Elliot Berman wrote:
> > On Mon, Mar 04, 2024 at 09:17:05PM +0100, David Hildenbrand wrote:
> > > On 04.03.24 20:04, Sean Christopherson wrote:
> > > > On Mon, Mar 04, 2024, Quentin Perret wrote:
> > > > > > As discussed in the sub-thread, that might still be required.
> > > > > >
> > > > > > One could think about completely forbidding GUP on these mmap'ed
> > > > > > guest-memfds. But likely, there might be use cases in the future where you
> > > > > > want to use GUP on shared memory inside a guest_memfd.
> > > > > >
> > > > > > (the iouring example I gave might currently not work because
> > > > > > FOLL_PIN|FOLL_LONGTERM|FOLL_WRITE only works on shmem+hugetlb, and
> > > > > > guest_memfd will likely not be detected as shmem; 8ac268436e6d contains some
> > > > > > details)
> > > > >
> > > > > Perhaps it would be wise to start with GUP being forbidden if the
> > > > > current users do not need it (not sure if that is the case in Android,
> > > > > I'll check) ? We can always relax this constraint later when/if the
> > > > > use-cases arise, which is obviously much harder to do the other way
> > > > > around.
> > > >
> > > > +1000. At least on the KVM side, I would like to be as conservative as possible
> > > > when it comes to letting anything other than the guest access guest_memfd.
> > >
> > > So we'll have to do it similar to any occurrences of "secretmem" in gup.c.
> > > We'll have to see how to marry KVM guest_memfd with core-mm code similar to
> > > e.g., folio_is_secretmem().
> > >
> > > IIRC, we might not be able to de-reference the actual mapping because it
> > > could get freed concurrently ...
> > >
> > > That will then prohibit any kind of GUP access to these pages, including
> > > reading/writing for ptrace/debugging purposes, for core dumping purposes
> > > etc. But at least, you know that nobody was able to obtain page references
> > > using GUP that might be used for reading/writing later.
> >
> > Do you have any concerns to add to enum mapping_flags, AS_NOGUP, and
> > replacing folio_is_secretmem() with a test of this bit instead of
> > comparing the a_ops? I think it scales better.
>
> The only concern I have are races, but let's look into the details:
>
> In GUP-fast, we can essentially race with unmap of folios, munmap() of VMAs
> etc.
>
> We had a similar discussion recently about possible races. It's documented
> in folio_fast_pin_allowed() regarding disabled IRQs and RCU grace periods.
>
> "inodes and thus their mappings are freed under RCU, which means the mapping
> cannot be freed beneath us and thus we can safely dereference it."
>
> So if we follow the same rules as folio_fast_pin_allowed(), we can
> de-reference folio->mapping, for example comparing mapping->a_ops.
>
> [folio_is_secretmem should better follow the same approach]

Resurrecting this discussion: I had discussions internally and, as it
turns out, Android makes extensive use of vhost/vsock when communicating
with guest VMs, which requires GUP. So, my bad, not supporting GUP for
the pKVM variant of guest_memfd is a bit of a non-starter; we'll need to
support it from the start. But again this should be a matter of 'simply'
having a dedicated KVM exit reason, so hopefully it's not too bad.

Thanks,
Quentin