From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CFEC91F4CA4; Sun, 14 Sep 2025 07:35:54 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1757835355; cv=none; b=rBWqtMpEvt1wnRYyhmRYS4ZcmyVq/uJX9OOrOKgF4+MxKd96betR99ahup1DathNgmBTf/lDi/2L55OElcX/M0IOPgK94g69DUqEVY/9kN0t6YgSBrKU8+iHhXhlsDePVPn5+XAJXTSs/wDJbooxQl7ntlO5RjGeazPnHDYZ3t0= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1757835355; c=relaxed/simple; bh=vFnfGTkj5/KhQQ0QRoZ2pu4iIv0mM+KWgCa9F0xbdl0=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=WpglhYwD0lkKSPye87Yjr31hZww5y2gM/o6F0gu2j3FFRmjfMo32UvvEQuPhB07I8L8rf+kW/U+rXbeoelr9PzVmnlQqtG/RVtKK2lOGtHPRCz7d0OZa+lG0i8EW0CeS8yCIJidDnGiAjt+qX0QUmZmnAuD5RNOICWRNACB57Xs= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=KKfr8C/P; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="KKfr8C/P" Received: by smtp.kernel.org (Postfix) with ESMTPSA id B99EBC4CEF1; Sun, 14 Sep 2025 07:35:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1757835354; bh=vFnfGTkj5/KhQQ0QRoZ2pu4iIv0mM+KWgCa9F0xbdl0=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=KKfr8C/PuOOvlqMXgt2nNqevFc6crDsLfQefigkMxEmHUQqKThFaSj7Kt7tgJN4Ql tqwGBz8U9I31FemsXLNBesItokumnSgVNDEzsAQrwzbqWwJwnQIDazhbA1Q/3KP+VP AQqh9YXlhPpv9/6VgKrt9NrHUMC2s8z6wl4SAPtpDk0BMTpCk+1VBM2F7dJPjMiCgx sOI4Dv09T0ldqRZdEnwtXoIPVkkM09/pR0S7gXJwMOp2kmZ/LcHM4PYRqtqXpfzW6h UcRqfaCUNbA3FMvl6sJbGdZ3ojqu1Nas1iYqlaPOanydB0APtSr/WcoEu4gvcaosEK LUw3//H6uFjmQ== Date: Sun, 14 Sep 2025 10:35:23 +0300 From: Mike Rapoport To: "Roy, Patrick" Cc: "Thomson, Jack" , "Kalyazin, Nikita" , "Cali, Marco" , "derekmn@amazon.co.uk" , "willy@infradead.org" , "corbet@lwn.net" , "pbonzini@redhat.com" , "maz@kernel.org" , "oliver.upton@linux.dev" , "joey.gouly@arm.com" , "suzuki.poulose@arm.com" , "yuzenghui@huawei.com" , "catalin.marinas@arm.com" , "will@kernel.org" , "chenhuacai@kernel.org" , "kernel@xen0n.name" , "paul.walmsley@sifive.com" , "palmer@dabbelt.com" , "aou@eecs.berkeley.edu" , "alex@ghiti.fr" , "agordeev@linux.ibm.com" , "gerald.schaefer@linux.ibm.com" , "hca@linux.ibm.com" , "gor@linux.ibm.com" , "borntraeger@linux.ibm.com" , "svens@linux.ibm.com" , "dave.hansen@linux.intel.com" , "luto@kernel.org" , "peterz@infradead.org" , "tglx@linutronix.de" , "mingo@redhat.com" , "bp@alien8.de" , "x86@kernel.org" , "hpa@zytor.com" , "trondmy@kernel.org" , "anna@kernel.org" , "hubcap@omnibond.com" , "martin@omnibond.com" , "viro@zeniv.linux.org.uk" , "brauner@kernel.org" , "jack@suse.cz" , "akpm@linux-foundation.org" , "david@redhat.com" , "lorenzo.stoakes@oracle.com" , "Liam.Howlett@oracle.com" , "vbabka@suse.cz" , "surenb@google.com" , "mhocko@suse.com" , "ast@kernel.org" , "daniel@iogearbox.net" , "andrii@kernel.org" , "martin.lau@linux.dev" , "eddyz87@gmail.com" , "song@kernel.org" , "yonghong.song@linux.dev" , "john.fastabend@gmail.com" , "kpsingh@kernel.org" , "sdf@fomichev.me" , "haoluo@google.com" , "jolsa@kernel.org" , "jgg@ziepe.ca" , "jhubbard@nvidia.com" , "peterx@redhat.com" , "jannh@google.com" , "pfalcato@suse.de" , "axelrasmussen@google.com" , "yuanchu@google.com" , "weixugc@google.com" , "hannes@cmpxchg.org" , "zhengqi.arch@bytedance.com" , "shakeel.butt@linux.dev" , "shuah@kernel.org" , "seanjc@google.com" , "linux-fsdevel@vger.kernel.org" , "linux-doc@vger.kernel.org" , "linux-kernel@vger.kernel.org" , "kvm@vger.kernel.org" , "linux-arm-kernel@lists.infradead.org" , "kvmarm@lists.linux.dev" , "loongarch@lists.linux.dev" , "linux-riscv@lists.infradead.org" , "linux-s390@vger.kernel.org" , "linux-nfs@vger.kernel.org" , "devel@lists.orangefs.org" , "linux-mm@kvack.org" , "bpf@vger.kernel.org" , "linux-kselftest@vger.kernel.org" Subject: Re: [PATCH v6 03/11] mm: introduce AS_NO_DIRECT_MAP Message-ID: References: <20250912091708.17502-1-roypat@amazon.co.uk> <20250912091708.17502-4-roypat@amazon.co.uk> Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20250912091708.17502-4-roypat@amazon.co.uk> On Fri, Sep 12, 2025 at 09:17:34AM +0000, Roy, Patrick wrote: > Add AS_NO_DIRECT_MAP for mappings where direct map entries of folios are > set to not present . Currently, mappings that match this description are > secretmem mappings (memfd_secret()). Later, some guest_memfd > configurations will also fall into this category. > > Reject this new type of mappings in all locations that currently reject > secretmem mappings, on the assumption that if secretmem mappings are > rejected somewhere, it is precisely because of an inability to deal with > folios without direct map entries, and then make memfd_secret() use > AS_NO_DIRECT_MAP on its address_space to drop its special > vma_is_secretmem()/secretmem_mapping() checks. > > This drops a optimization in gup_fast_folio_allowed() where > secretmem_mapping() was only called if CONFIG_SECRETMEM=y. secretmem is > enabled by default since commit b758fe6df50d ("mm/secretmem: make it on > by default"), so the secretmem check did not actually end up elided in > most cases anymore anyway. > > Use a new flag instead of overloading AS_INACCESSIBLE (which is already > set by guest_memfd) because not all guest_memfd mappings will end up > being direct map removed (e.g. in pKVM setups, parts of guest_memfd that > can be mapped to userspace should also be GUP-able, and generally not > have restrictions on who can access it). > > Signed-off-by: Patrick Roy Acked-by: Mike Rapoport (Microsoft) > --- > include/linux/pagemap.h | 16 ++++++++++++++++ > include/linux/secretmem.h | 18 ------------------ > lib/buildid.c | 4 ++-- > mm/gup.c | 19 +++++-------------- > mm/mlock.c | 2 +- > mm/secretmem.c | 8 ++------ > 6 files changed, 26 insertions(+), 41 deletions(-) > > diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h > index 12a12dae727d..1f5739f6a9f5 100644 > --- a/include/linux/pagemap.h > +++ b/include/linux/pagemap.h > @@ -211,6 +211,7 @@ enum mapping_flags { > folio contents */ > AS_INACCESSIBLE = 8, /* Do not attempt direct R/W access to the mapping */ > AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM = 9, > + AS_NO_DIRECT_MAP = 10, /* Folios in the mapping are not in the direct map */ > /* Bits 16-25 are used for FOLIO_ORDER */ > AS_FOLIO_ORDER_BITS = 5, > AS_FOLIO_ORDER_MIN = 16, > @@ -346,6 +347,21 @@ static inline bool mapping_writeback_may_deadlock_on_reclaim(struct address_spac > return test_bit(AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM, &mapping->flags); > } > > +static inline void mapping_set_no_direct_map(struct address_space *mapping) > +{ > + set_bit(AS_NO_DIRECT_MAP, &mapping->flags); > +} > + > +static inline bool mapping_no_direct_map(const struct address_space *mapping) > +{ > + return test_bit(AS_NO_DIRECT_MAP, &mapping->flags); > +} > + > +static inline bool vma_has_no_direct_map(const struct vm_area_struct *vma) > +{ > + return vma->vm_file && mapping_no_direct_map(vma->vm_file->f_mapping); > +} > + > static inline gfp_t mapping_gfp_mask(struct address_space * mapping) > { > return mapping->gfp_mask; > diff --git a/include/linux/secretmem.h b/include/linux/secretmem.h > index e918f96881f5..0ae1fb057b3d 100644 > --- a/include/linux/secretmem.h > +++ b/include/linux/secretmem.h > @@ -4,28 +4,10 @@ > > #ifdef CONFIG_SECRETMEM > > -extern const struct address_space_operations secretmem_aops; > - > -static inline bool secretmem_mapping(struct address_space *mapping) > -{ > - return mapping->a_ops == &secretmem_aops; > -} > - > -bool vma_is_secretmem(struct vm_area_struct *vma); > bool secretmem_active(void); > > #else > > -static inline bool vma_is_secretmem(struct vm_area_struct *vma) > -{ > - return false; > -} > - > -static inline bool secretmem_mapping(struct address_space *mapping) > -{ > - return false; > -} > - > static inline bool secretmem_active(void) > { > return false; > diff --git a/lib/buildid.c b/lib/buildid.c > index c4b0f376fb34..89e567954284 100644 > --- a/lib/buildid.c > +++ b/lib/buildid.c > @@ -65,8 +65,8 @@ static int freader_get_folio(struct freader *r, loff_t file_off) > > freader_put_folio(r); > > - /* reject secretmem folios created with memfd_secret() */ > - if (secretmem_mapping(r->file->f_mapping)) > + /* reject folios without direct map entries (e.g. from memfd_secret() or guest_memfd()) */ > + if (mapping_no_direct_map(r->file->f_mapping)) > return -EFAULT; > > r->folio = filemap_get_folio(r->file->f_mapping, file_off >> PAGE_SHIFT); > diff --git a/mm/gup.c b/mm/gup.c > index adffe663594d..75a0cffdf37d 100644 > --- a/mm/gup.c > +++ b/mm/gup.c > @@ -11,7 +11,6 @@ > #include > #include > #include > -#include > > #include > #include > @@ -1234,7 +1233,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags) > if ((gup_flags & FOLL_SPLIT_PMD) && is_vm_hugetlb_page(vma)) > return -EOPNOTSUPP; > > - if (vma_is_secretmem(vma)) > + if (vma_has_no_direct_map(vma)) > return -EFAULT; > > if (write) { > @@ -2736,7 +2735,7 @@ EXPORT_SYMBOL(get_user_pages_unlocked); > * This call assumes the caller has pinned the folio, that the lowest page table > * level still points to this folio, and that interrupts have been disabled. > * > - * GUP-fast must reject all secretmem folios. > + * GUP-fast must reject all folios without direct map entries (such as secretmem). > * > * Writing to pinned file-backed dirty tracked folios is inherently problematic > * (see comment describing the writable_file_mapping_allowed() function). We > @@ -2751,7 +2750,6 @@ static bool gup_fast_folio_allowed(struct folio *folio, unsigned int flags) > { > bool reject_file_backed = false; > struct address_space *mapping; > - bool check_secretmem = false; > unsigned long mapping_flags; > > /* > @@ -2763,18 +2761,10 @@ static bool gup_fast_folio_allowed(struct folio *folio, unsigned int flags) > reject_file_backed = true; > > /* We hold a folio reference, so we can safely access folio fields. */ > - > - /* secretmem folios are always order-0 folios. */ > - if (IS_ENABLED(CONFIG_SECRETMEM) && !folio_test_large(folio)) > - check_secretmem = true; > - > - if (!reject_file_backed && !check_secretmem) > - return true; > - > if (WARN_ON_ONCE(folio_test_slab(folio))) > return false; > > - /* hugetlb neither requires dirty-tracking nor can be secretmem. */ > + /* hugetlb neither requires dirty-tracking nor can be without direct map. */ > if (folio_test_hugetlb(folio)) > return true; > > @@ -2812,8 +2802,9 @@ static bool gup_fast_folio_allowed(struct folio *folio, unsigned int flags) > * At this point, we know the mapping is non-null and points to an > * address_space object. > */ > - if (check_secretmem && secretmem_mapping(mapping)) > + if (mapping_no_direct_map(mapping)) > return false; > + > /* The only remaining allowed file system is shmem. */ > return !reject_file_backed || shmem_mapping(mapping); > } > diff --git a/mm/mlock.c b/mm/mlock.c > index a1d93ad33c6d..36f5e70faeb0 100644 > --- a/mm/mlock.c > +++ b/mm/mlock.c > @@ -474,7 +474,7 @@ static int mlock_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma, > > if (newflags == oldflags || (oldflags & VM_SPECIAL) || > is_vm_hugetlb_page(vma) || vma == get_gate_vma(current->mm) || > - vma_is_dax(vma) || vma_is_secretmem(vma) || (oldflags & VM_DROPPABLE)) > + vma_is_dax(vma) || vma_has_no_direct_map(vma) || (oldflags & VM_DROPPABLE)) > /* don't set VM_LOCKED or VM_LOCKONFAULT and don't count */ > goto out; > > diff --git a/mm/secretmem.c b/mm/secretmem.c > index 422dcaa32506..b5ce55079695 100644 > --- a/mm/secretmem.c > +++ b/mm/secretmem.c > @@ -134,11 +134,6 @@ static int secretmem_mmap_prepare(struct vm_area_desc *desc) > return 0; > } > > -bool vma_is_secretmem(struct vm_area_struct *vma) > -{ > - return vma->vm_ops == &secretmem_vm_ops; > -} > - > static const struct file_operations secretmem_fops = { > .release = secretmem_release, > .mmap_prepare = secretmem_mmap_prepare, > @@ -157,7 +152,7 @@ static void secretmem_free_folio(struct address_space *mapping, > folio_zero_segment(folio, 0, folio_size(folio)); > } > > -const struct address_space_operations secretmem_aops = { > +static const struct address_space_operations secretmem_aops = { > .dirty_folio = noop_dirty_folio, > .free_folio = secretmem_free_folio, > .migrate_folio = secretmem_migrate_folio, > @@ -206,6 +201,7 @@ static struct file *secretmem_file_create(unsigned long flags) > > mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER); > mapping_set_unevictable(inode->i_mapping); > + mapping_set_no_direct_map(inode->i_mapping); > > inode->i_op = &secretmem_iops; > inode->i_mapping->a_ops = &secretmem_aops; > -- > 2.50.1 > -- Sincerely yours, Mike.