From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from fra-out-003.esa.eu-central-1.outbound.mail-perimeter.amazon.com (fra-out-003.esa.eu-central-1.outbound.mail-perimeter.amazon.com [3.72.182.33]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 02E7F286D66; Fri, 12 Sep 2025 09:17:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=3.72.182.33 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1757668671; cv=none; b=jce4LrCmb5gaSUmO73WaFDer9c+sBgv+zJTIb60Z9zgU+wANG6kPlpgKFbRYH0FT9mGfk9jkSfK+pWB6Q6WHb+++XxT9hzOlnOTg2hmGP3lBrKZ6rB3S2ut9LI6lEG+yId4Exi8RBsDI3oBH1Pk61V7fNqBREiUmgEfio3UuOro= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1757668671; c=relaxed/simple; bh=mnV1rcfIJWTdk+Az9VL1UEwVFY3zfcMptbsHZrfh2x8=; h=From:CC:Subject:Date:Message-ID:References:In-Reply-To: Content-Type:MIME-Version; b=Mc6AjVGDHj03+RVvRN33wdHrskrynv1jCDqjKUVwfZ/wwu4JsfGbJw73ahECG42hannkRjDUTnaAKjGlKTLgbTR14gSDm/y+WNO9H+vq/5yqTqL57jXcMWNYGH7BVayJ1hpPT0aok25wXyUps0A5EKMzTI6Qj1xrjKmtdngtrIQ= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.co.uk; spf=pass smtp.mailfrom=amazon.co.uk; dkim=pass (2048-bit key) header.d=amazon.co.uk header.i=@amazon.co.uk header.b=nJ7cet3q; arc=none smtp.client-ip=3.72.182.33 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.co.uk Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=amazon.co.uk Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=amazon.co.uk header.i=@amazon.co.uk header.b="nJ7cet3q" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.co.uk; i=@amazon.co.uk; q=dns/txt; s=amazoncorp2; t=1757668668; x=1789204668; h=from:cc:subject:date:message-id:references:in-reply-to: content-transfer-encoding:mime-version; bh=L1uOJJu3rEt1RZKXmiQj9GuMwXC6NRFndNhKGAKSBZI=; b=nJ7cet3qDITO/PLPYyS06y/3crFazGas8F58+/PnOwRKovlbSZ2mADF7 UN2xjJBbcl0LypXltaXCGA+HcRRjV8oKBbHy+N7p6ylz/3ZZZccZ6JaOa DIUy9poic996QSQmOqHtK4YoDACR8ZbWDVt5Jncw3NoM4MIZMQ3gUW/6G TH0hemvca4gpBlU1MIJT+nv1f6twZ0djnjsAVFX0oc/2aMoMqcf+9MiZz muSN+PQ6SBlO23YQTu5lxRoweCMjrCzK5ZmMl3qdocQlB60lsKQmLF0An VffkX4IQewDEZR/mcFbNObk6td5t5oUHbqcgmDod4ipXCGQcyVDA/BmHr Q==; X-CSE-ConnectionGUID: RQKw5bjlSfejfpAhVlOeLg== X-CSE-MsgGUID: 2RPJ5/9NQ9m/oFIybgUfMA== X-IronPort-AV: E=Sophos;i="6.18,259,1751241600"; d="scan'208";a="2006961" Received: from ip-10-6-6-97.eu-central-1.compute.internal (HELO smtpout.naws.eu-central-1.prod.farcaster.email.amazon.dev) ([10.6.6.97]) by internal-fra-out-003.esa.eu-central-1.outbound.mail-perimeter.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Sep 2025 09:17:36 +0000 Received: from EX19MTAEUB001.ant.amazon.com [54.240.197.226:21336] by smtpin.naws.eu-central-1.prod.farcaster.email.amazon.dev [10.0.2.154:2525] with esmtp (Farcaster) id c1f0eae9-f42c-47d6-84b3-1325a62adc4b; Fri, 12 Sep 2025 09:17:36 +0000 (UTC) X-Farcaster-Flow-ID: c1f0eae9-f42c-47d6-84b3-1325a62adc4b Received: from EX19D015EUB002.ant.amazon.com (10.252.51.123) by EX19MTAEUB001.ant.amazon.com (10.252.51.26) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.2562.20; Fri, 12 Sep 2025 09:17:35 +0000 Received: from EX19D015EUB004.ant.amazon.com (10.252.51.13) by EX19D015EUB002.ant.amazon.com (10.252.51.123) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.2562.20; Fri, 12 Sep 2025 09:17:35 +0000 Received: from EX19D015EUB004.ant.amazon.com ([fe80::2dc9:7aa9:9cd3:fc8a]) by EX19D015EUB004.ant.amazon.com ([fe80::2dc9:7aa9:9cd3:fc8a%3]) with mapi id 15.02.2562.020; Fri, 12 Sep 2025 09:17:34 +0000 From: "Roy, Patrick" CC: "Thomson, Jack" , "Kalyazin, Nikita" , "Cali, Marco" , "derekmn@amazon.co.uk" , "Roy, Patrick" , "willy@infradead.org" , "corbet@lwn.net" , "pbonzini@redhat.com" , "maz@kernel.org" , "oliver.upton@linux.dev" , "joey.gouly@arm.com" , "suzuki.poulose@arm.com" , "yuzenghui@huawei.com" , "catalin.marinas@arm.com" , "will@kernel.org" , "chenhuacai@kernel.org" , "kernel@xen0n.name" , "paul.walmsley@sifive.com" , "palmer@dabbelt.com" , "aou@eecs.berkeley.edu" , "alex@ghiti.fr" , "agordeev@linux.ibm.com" , "gerald.schaefer@linux.ibm.com" , "hca@linux.ibm.com" , "gor@linux.ibm.com" , "borntraeger@linux.ibm.com" , "svens@linux.ibm.com" , "dave.hansen@linux.intel.com" , "luto@kernel.org" , "peterz@infradead.org" , "tglx@linutronix.de" , "mingo@redhat.com" , "bp@alien8.de" , "x86@kernel.org" , "hpa@zytor.com" , "trondmy@kernel.org" , "anna@kernel.org" , "hubcap@omnibond.com" , "martin@omnibond.com" , "viro@zeniv.linux.org.uk" , "brauner@kernel.org" , "jack@suse.cz" , "akpm@linux-foundation.org" , "david@redhat.com" , "lorenzo.stoakes@oracle.com" , "Liam.Howlett@oracle.com" , "vbabka@suse.cz" , "rppt@kernel.org" , "surenb@google.com" , "mhocko@suse.com" , "ast@kernel.org" , "daniel@iogearbox.net" , "andrii@kernel.org" , "martin.lau@linux.dev" , "eddyz87@gmail.com" , "song@kernel.org" , "yonghong.song@linux.dev" , "john.fastabend@gmail.com" , "kpsingh@kernel.org" , "sdf@fomichev.me" , "haoluo@google.com" , "jolsa@kernel.org" , "jgg@ziepe.ca" , "jhubbard@nvidia.com" , "peterx@redhat.com" , "jannh@google.com" , "pfalcato@suse.de" , "axelrasmussen@google.com" , "yuanchu@google.com" , "weixugc@google.com" , "hannes@cmpxchg.org" , "zhengqi.arch@bytedance.com" , "shakeel.butt@linux.dev" , "shuah@kernel.org" , "seanjc@google.com" , "linux-fsdevel@vger.kernel.org" , "linux-doc@vger.kernel.org" , "linux-kernel@vger.kernel.org" , "kvm@vger.kernel.org" , "linux-arm-kernel@lists.infradead.org" , "kvmarm@lists.linux.dev" , "loongarch@lists.linux.dev" , "linux-riscv@lists.infradead.org" , "linux-s390@vger.kernel.org" , "linux-nfs@vger.kernel.org" , "devel@lists.orangefs.org" , "linux-mm@kvack.org" , "bpf@vger.kernel.org" , "linux-kselftest@vger.kernel.org" Subject: [PATCH v6 03/11] mm: introduce AS_NO_DIRECT_MAP Thread-Topic: [PATCH v6 03/11] mm: introduce AS_NO_DIRECT_MAP Thread-Index: AQHcI8YTZR0JftjwNkq9EBotomJZxg== Date: Fri, 12 Sep 2025 09:17:34 +0000 Message-ID: <20250912091708.17502-4-roypat@amazon.co.uk> References: <20250912091708.17502-1-roypat@amazon.co.uk> In-Reply-To: <20250912091708.17502-1-roypat@amazon.co.uk> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Precedence: bulk X-Mailing-List: linux-kselftest@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Add AS_NO_DIRECT_MAP for mappings where direct map entries of folios are=0A= set to not present . Currently, mappings that match this description are=0A= secretmem mappings (memfd_secret()). Later, some guest_memfd=0A= configurations will also fall into this category.=0A= =0A= Reject this new type of mappings in all locations that currently reject=0A= secretmem mappings, on the assumption that if secretmem mappings are=0A= rejected somewhere, it is precisely because of an inability to deal with=0A= folios without direct map entries, and then make memfd_secret() use=0A= AS_NO_DIRECT_MAP on its address_space to drop its special=0A= vma_is_secretmem()/secretmem_mapping() checks.=0A= =0A= This drops a optimization in gup_fast_folio_allowed() where=0A= secretmem_mapping() was only called if CONFIG_SECRETMEM=3Dy. secretmem is= =0A= enabled by default since commit b758fe6df50d ("mm/secretmem: make it on=0A= by default"), so the secretmem check did not actually end up elided in=0A= most cases anymore anyway.=0A= =0A= Use a new flag instead of overloading AS_INACCESSIBLE (which is already=0A= set by guest_memfd) because not all guest_memfd mappings will end up=0A= being direct map removed (e.g. in pKVM setups, parts of guest_memfd that=0A= can be mapped to userspace should also be GUP-able, and generally not=0A= have restrictions on who can access it).=0A= =0A= Signed-off-by: Patrick Roy =0A= ---=0A= include/linux/pagemap.h | 16 ++++++++++++++++=0A= include/linux/secretmem.h | 18 ------------------=0A= lib/buildid.c | 4 ++--=0A= mm/gup.c | 19 +++++--------------=0A= mm/mlock.c | 2 +-=0A= mm/secretmem.c | 8 ++------=0A= 6 files changed, 26 insertions(+), 41 deletions(-)=0A= =0A= diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h=0A= index 12a12dae727d..1f5739f6a9f5 100644=0A= --- a/include/linux/pagemap.h=0A= +++ b/include/linux/pagemap.h=0A= @@ -211,6 +211,7 @@ enum mapping_flags {=0A= folio contents */=0A= AS_INACCESSIBLE =3D 8, /* Do not attempt direct R/W access to the mapping= */=0A= AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM =3D 9,=0A= + AS_NO_DIRECT_MAP =3D 10, /* Folios in the mapping are not in the direct m= ap */=0A= /* Bits 16-25 are used for FOLIO_ORDER */=0A= AS_FOLIO_ORDER_BITS =3D 5,=0A= AS_FOLIO_ORDER_MIN =3D 16,=0A= @@ -346,6 +347,21 @@ static inline bool mapping_writeback_may_deadlock_on_r= eclaim(struct address_spac=0A= return test_bit(AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM, &mapping->flags);= =0A= }=0A= =0A= +static inline void mapping_set_no_direct_map(struct address_space *mapping= )=0A= +{=0A= + set_bit(AS_NO_DIRECT_MAP, &mapping->flags);=0A= +}=0A= +=0A= +static inline bool mapping_no_direct_map(const struct address_space *mappi= ng)=0A= +{=0A= + return test_bit(AS_NO_DIRECT_MAP, &mapping->flags);=0A= +}=0A= +=0A= +static inline bool vma_has_no_direct_map(const struct vm_area_struct *vma)= =0A= +{=0A= + return vma->vm_file && mapping_no_direct_map(vma->vm_file->f_mapping);=0A= +}=0A= +=0A= static inline gfp_t mapping_gfp_mask(struct address_space * mapping)=0A= {=0A= return mapping->gfp_mask;=0A= diff --git a/include/linux/secretmem.h b/include/linux/secretmem.h=0A= index e918f96881f5..0ae1fb057b3d 100644=0A= --- a/include/linux/secretmem.h=0A= +++ b/include/linux/secretmem.h=0A= @@ -4,28 +4,10 @@=0A= =0A= #ifdef CONFIG_SECRETMEM=0A= =0A= -extern const struct address_space_operations secretmem_aops;=0A= -=0A= -static inline bool secretmem_mapping(struct address_space *mapping)=0A= -{=0A= - return mapping->a_ops =3D=3D &secretmem_aops;=0A= -}=0A= -=0A= -bool vma_is_secretmem(struct vm_area_struct *vma);=0A= bool secretmem_active(void);=0A= =0A= #else=0A= =0A= -static inline bool vma_is_secretmem(struct vm_area_struct *vma)=0A= -{=0A= - return false;=0A= -}=0A= -=0A= -static inline bool secretmem_mapping(struct address_space *mapping)=0A= -{=0A= - return false;=0A= -}=0A= -=0A= static inline bool secretmem_active(void)=0A= {=0A= return false;=0A= diff --git a/lib/buildid.c b/lib/buildid.c=0A= index c4b0f376fb34..89e567954284 100644=0A= --- a/lib/buildid.c=0A= +++ b/lib/buildid.c=0A= @@ -65,8 +65,8 @@ static int freader_get_folio(struct freader *r, loff_t fi= le_off)=0A= =0A= freader_put_folio(r);=0A= =0A= - /* reject secretmem folios created with memfd_secret() */=0A= - if (secretmem_mapping(r->file->f_mapping))=0A= + /* reject folios without direct map entries (e.g. from memfd_secret() or = guest_memfd()) */=0A= + if (mapping_no_direct_map(r->file->f_mapping))=0A= return -EFAULT;=0A= =0A= r->folio =3D filemap_get_folio(r->file->f_mapping, file_off >> PAGE_SHIFT= );=0A= diff --git a/mm/gup.c b/mm/gup.c=0A= index adffe663594d..75a0cffdf37d 100644=0A= --- a/mm/gup.c=0A= +++ b/mm/gup.c=0A= @@ -11,7 +11,6 @@=0A= #include =0A= #include =0A= #include =0A= -#include =0A= =0A= #include =0A= #include =0A= @@ -1234,7 +1233,7 @@ static int check_vma_flags(struct vm_area_struct *vma= , unsigned long gup_flags)=0A= if ((gup_flags & FOLL_SPLIT_PMD) && is_vm_hugetlb_page(vma))=0A= return -EOPNOTSUPP;=0A= =0A= - if (vma_is_secretmem(vma))=0A= + if (vma_has_no_direct_map(vma))=0A= return -EFAULT;=0A= =0A= if (write) {=0A= @@ -2736,7 +2735,7 @@ EXPORT_SYMBOL(get_user_pages_unlocked);=0A= * This call assumes the caller has pinned the folio, that the lowest page= table=0A= * level still points to this folio, and that interrupts have been disable= d.=0A= *=0A= - * GUP-fast must reject all secretmem folios.=0A= + * GUP-fast must reject all folios without direct map entries (such as sec= retmem).=0A= *=0A= * Writing to pinned file-backed dirty tracked folios is inherently proble= matic=0A= * (see comment describing the writable_file_mapping_allowed() function). = We=0A= @@ -2751,7 +2750,6 @@ static bool gup_fast_folio_allowed(struct folio *foli= o, unsigned int flags)=0A= {=0A= bool reject_file_backed =3D false;=0A= struct address_space *mapping;=0A= - bool check_secretmem =3D false;=0A= unsigned long mapping_flags;=0A= =0A= /*=0A= @@ -2763,18 +2761,10 @@ static bool gup_fast_folio_allowed(struct folio *fo= lio, unsigned int flags)=0A= reject_file_backed =3D true;=0A= =0A= /* We hold a folio reference, so we can safely access folio fields. */=0A= -=0A= - /* secretmem folios are always order-0 folios. */=0A= - if (IS_ENABLED(CONFIG_SECRETMEM) && !folio_test_large(folio))=0A= - check_secretmem =3D true;=0A= -=0A= - if (!reject_file_backed && !check_secretmem)=0A= - return true;=0A= -=0A= if (WARN_ON_ONCE(folio_test_slab(folio)))=0A= return false;=0A= =0A= - /* hugetlb neither requires dirty-tracking nor can be secretmem. */=0A= + /* hugetlb neither requires dirty-tracking nor can be without direct map.= */=0A= if (folio_test_hugetlb(folio))=0A= return true;=0A= =0A= @@ -2812,8 +2802,9 @@ static bool gup_fast_folio_allowed(struct folio *foli= o, unsigned int flags)=0A= * At this point, we know the mapping is non-null and points to an=0A= * address_space object.=0A= */=0A= - if (check_secretmem && secretmem_mapping(mapping))=0A= + if (mapping_no_direct_map(mapping))=0A= return false;=0A= +=0A= /* The only remaining allowed file system is shmem. */=0A= return !reject_file_backed || shmem_mapping(mapping);=0A= }=0A= diff --git a/mm/mlock.c b/mm/mlock.c=0A= index a1d93ad33c6d..36f5e70faeb0 100644=0A= --- a/mm/mlock.c=0A= +++ b/mm/mlock.c=0A= @@ -474,7 +474,7 @@ static int mlock_fixup(struct vma_iterator *vmi, struct= vm_area_struct *vma,=0A= =0A= if (newflags =3D=3D oldflags || (oldflags & VM_SPECIAL) ||=0A= is_vm_hugetlb_page(vma) || vma =3D=3D get_gate_vma(current->mm) ||=0A= - vma_is_dax(vma) || vma_is_secretmem(vma) || (oldflags & VM_DROPPABLE)= )=0A= + vma_is_dax(vma) || vma_has_no_direct_map(vma) || (oldflags & VM_DROPP= ABLE))=0A= /* don't set VM_LOCKED or VM_LOCKONFAULT and don't count */=0A= goto out;=0A= =0A= diff --git a/mm/secretmem.c b/mm/secretmem.c=0A= index 422dcaa32506..b5ce55079695 100644=0A= --- a/mm/secretmem.c=0A= +++ b/mm/secretmem.c=0A= @@ -134,11 +134,6 @@ static int secretmem_mmap_prepare(struct vm_area_desc = *desc)=0A= return 0;=0A= }=0A= =0A= -bool vma_is_secretmem(struct vm_area_struct *vma)=0A= -{=0A= - return vma->vm_ops =3D=3D &secretmem_vm_ops;=0A= -}=0A= -=0A= static const struct file_operations secretmem_fops =3D {=0A= .release =3D secretmem_release,=0A= .mmap_prepare =3D secretmem_mmap_prepare,=0A= @@ -157,7 +152,7 @@ static void secretmem_free_folio(struct address_space *= mapping,=0A= folio_zero_segment(folio, 0, folio_size(folio));=0A= }=0A= =0A= -const struct address_space_operations secretmem_aops =3D {=0A= +static const struct address_space_operations secretmem_aops =3D {=0A= .dirty_folio =3D noop_dirty_folio,=0A= .free_folio =3D secretmem_free_folio,=0A= .migrate_folio =3D secretmem_migrate_folio,=0A= @@ -206,6 +201,7 @@ static struct file *secretmem_file_create(unsigned long= flags)=0A= =0A= mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);=0A= mapping_set_unevictable(inode->i_mapping);=0A= + mapping_set_no_direct_map(inode->i_mapping);=0A= =0A= inode->i_op =3D &secretmem_iops;=0A= inode->i_mapping->a_ops =3D &secretmem_aops;=0A= -- =0A= 2.50.1=0A= =0A=