From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from fra-out-006.esa.eu-central-1.outbound.mail-perimeter.amazon.com (fra-out-006.esa.eu-central-1.outbound.mail-perimeter.amazon.com [18.197.217.180]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7FC4C3451A6; Fri, 5 Dec 2025 16:58:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=18.197.217.180 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1764953931; cv=none; b=OXnkXpdH73vueTUJ2McH4cFxHkuI3r5CSPxmfVj9/unCYcvt6jxW9XTbt8HdOGOZE/b7Xb2GT9VnreoDZTfTG1Dg36Jreh2GEiCT/lPdUk+a7HknDr+Qc3/SdN6LZajRci2oLITIMWWyoBgkhvTb8f4Mw7S/ae8XxOtWro9LnXA= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1764953931; c=relaxed/simple; bh=4bavK0iph9koAlnHX2Wy2xe770aVguNdisa0TR0XACY=; h=From:To:CC:Subject:Date:Message-ID:References:In-Reply-To: Content-Type:MIME-Version; b=vD0UOK+i4XGbAVmxTIBl6HpH20WVVi0xke0uBqxszjz/DngBRa2AiTk6/29e1vJAQqPXQqjHxFkc9FHx7nZJO7HMnSoIgkGh53IYxbJgOHTf0Aa583dWUM7pRnUk51l5D2wuswBnWRM5sArEbvpOkRKDQDyMEZ+1SbkZ3UMO90E= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.co.uk; spf=pass smtp.mailfrom=amazon.co.uk; dkim=pass (2048-bit key) header.d=amazon.co.uk header.i=@amazon.co.uk header.b=M9UhwKuZ; arc=none smtp.client-ip=18.197.217.180 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.co.uk Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=amazon.co.uk Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=amazon.co.uk header.i=@amazon.co.uk header.b="M9UhwKuZ" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.co.uk; i=@amazon.co.uk; q=dns/txt; s=amazoncorp2; t=1764953924; x=1796489924; h=from:to:cc:subject:date:message-id:references: in-reply-to:content-transfer-encoding:mime-version; bh=XlbB/AHh0YQTX9rpyDTJlTKE99RJ4c4rYvfLyG5A7fY=; b=M9UhwKuZQrxbLRkABkd56gJKDSO7Vy0jFO/ty6wPW8SLUuAYFAZxykRX Cfh+FArrC13u4fOL4ZbFmgqilYNwF+d/q5j6FsATyl6lqCxn0LzPBC9sI 3gJA3C+/6QykNX43OUUgtxkWWIvdOjdqWLue6y8w/SK+8HYtCE7K7pmR5 SDEVbMBYpfkG2n7m4bmAIZVcqsUFiM/IWMhnf4CCECw4SW92//Js15Xoy HAaBrW+JHQ8dVMKjzc1h0eOsBc/0GQ63T1N+wKiHX0YhTBbT/E5COR+uk DN7BIz7LU+CN7fZfEL9D79yDcriL4bUv2iX/09x1cIIZlUqftCN9FKFB/ g==; X-CSE-ConnectionGUID: tZu3dFNkQvqA1ZyZ7qwk7w== X-CSE-MsgGUID: cc5HHGcuQSOXn+liRVfvJw== X-IronPort-AV: E=Sophos;i="6.20,252,1758585600"; d="scan'208";a="6301792" Received: from ip-10-6-11-83.eu-central-1.compute.internal (HELO smtpout.naws.eu-central-1.prod.farcaster.email.amazon.dev) ([10.6.11.83]) by internal-fra-out-006.esa.eu-central-1.outbound.mail-perimeter.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Dec 2025 16:58:21 +0000 Received: from EX19MTAEUA001.ant.amazon.com [54.240.197.233:7206] by smtpin.naws.eu-central-1.prod.farcaster.email.amazon.dev [10.0.0.231:2525] with esmtp (Farcaster) id b66dbefc-1774-4ce5-8285-0803812e64ba; Fri, 5 Dec 2025 16:58:20 +0000 (UTC) X-Farcaster-Flow-ID: b66dbefc-1774-4ce5-8285-0803812e64ba Received: from EX19D005EUB001.ant.amazon.com (10.252.51.12) by EX19MTAEUA001.ant.amazon.com (10.252.50.223) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.2562.29; Fri, 5 Dec 2025 16:58:20 +0000 Received: from EX19D005EUB003.ant.amazon.com (10.252.51.31) by EX19D005EUB001.ant.amazon.com (10.252.51.12) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.2562.29; Fri, 5 Dec 2025 16:58:20 +0000 Received: from EX19D005EUB003.ant.amazon.com ([fe80::b825:becb:4b38:da0c]) by EX19D005EUB003.ant.amazon.com ([fe80::b825:becb:4b38:da0c%3]) with mapi id 15.02.2562.029; Fri, 5 Dec 2025 16:58:20 +0000 From: "Kalyazin, Nikita" To: "kvm@vger.kernel.org" , "linux-doc@vger.kernel.org" , "linux-kernel@vger.kernel.org" , "kvmarm@lists.linux.dev" , "linux-fsdevel@vger.kernel.org" , "linux-mm@kvack.org" , "bpf@vger.kernel.org" , "linux-kselftest@vger.kernel.org" CC: "pbonzini@redhat.com" , "corbet@lwn.net" , "maz@kernel.org" , "oupton@kernel.org" , "joey.gouly@arm.com" , "suzuki.poulose@arm.com" , "yuzenghui@huawei.com" , "catalin.marinas@arm.com" , "will@kernel.org" , "seanjc@google.com" , "tglx@linutronix.de" , "mingo@redhat.com" , "bp@alien8.de" , "dave.hansen@linux.intel.com" , "x86@kernel.org" , "hpa@zytor.com" , "luto@kernel.org" , "peterz@infradead.org" , "willy@infradead.org" , "akpm@linux-foundation.org" , "david@kernel.org" , "lorenzo.stoakes@oracle.com" , "Liam.Howlett@oracle.com" , "vbabka@suse.cz" , "rppt@kernel.org" , "surenb@google.com" , "mhocko@suse.com" , "ast@kernel.org" , "daniel@iogearbox.net" , "andrii@kernel.org" , "martin.lau@linux.dev" , "eddyz87@gmail.com" , "song@kernel.org" , "yonghong.song@linux.dev" , "john.fastabend@gmail.com" , "kpsingh@kernel.org" , "sdf@fomichev.me" , "haoluo@google.com" , "jolsa@kernel.org" , "jgg@ziepe.ca" , "jhubbard@nvidia.com" , "peterx@redhat.com" , "jannh@google.com" , "pfalcato@suse.de" , "shuah@kernel.org" , "riel@surriel.com" , "baohua@kernel.org" , "ryan.roberts@arm.com" , "jgross@suse.com" , "yu-cheng.yu@intel.com" , "kas@kernel.org" , "coxu@redhat.com" , "kevin.brodsky@arm.com" , "ackerleytng@google.com" , "maobibo@loongson.cn" , "prsampat@amd.com" , "mlevitsk@redhat.com" , "isaku.yamahata@intel.com" , "jmattson@google.com" , "jthoughton@google.com" , "linux-arm-kernel@lists.infradead.org" , "vannapurve@google.com" , "jackmanb@google.com" , "aneesh.kumar@kernel.org" , "patrick.roy@linux.dev" , "Thomson, Jack" , "Itazuri, Takahiro" , "Manwaring, Derek" , "Cali, Marco" , "Kalyazin, Nikita" Subject: [PATCH v8 03/13] mm: introduce AS_NO_DIRECT_MAP Thread-Topic: [PATCH v8 03/13] mm: introduce AS_NO_DIRECT_MAP Thread-Index: AQHcZghb1bjlwJbbGEq/dQeHI0u86A== Date: Fri, 5 Dec 2025 16:58:19 +0000 Message-ID: <20251205165743.9341-4-kalyazin@amazon.com> References: <20251205165743.9341-1-kalyazin@amazon.com> In-Reply-To: <20251205165743.9341-1-kalyazin@amazon.com> Accept-Language: en-GB, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Patrick Roy =0A= =0A= Add AS_NO_DIRECT_MAP for mappings where direct map entries of folios are=0A= set to not present. Currently, mappings that match this description are=0A= secretmem mappings (memfd_secret()). Later, some guest_memfd=0A= configurations will also fall into this category.=0A= =0A= Reject this new type of mappings in all locations that currently reject=0A= secretmem mappings, on the assumption that if secretmem mappings are=0A= rejected somewhere, it is precisely because of an inability to deal with=0A= folios without direct map entries, and then make memfd_secret() use=0A= AS_NO_DIRECT_MAP on its address_space to drop its special=0A= vma_is_secretmem()/secretmem_mapping() checks.=0A= =0A= This drops a optimization in gup_fast_folio_allowed() where=0A= secretmem_mapping() was only called if CONFIG_SECRETMEM=3Dy. secretmem is= =0A= enabled by default since commit b758fe6df50d ("mm/secretmem: make it on=0A= by default"), so the secretmem check did not actually end up elided in=0A= most cases anymore anyway.=0A= =0A= Use a new flag instead of overloading AS_INACCESSIBLE (which is already=0A= set by guest_memfd) because not all guest_memfd mappings will end up=0A= being direct map removed (e.g. in pKVM setups, parts of guest_memfd that=0A= can be mapped to userspace should also be GUP-able, and generally not=0A= have restrictions on who can access it).=0A= =0A= Acked-by: Mike Rapoport (Microsoft) =0A= Acked-by: David Hildenbrand (Red Hat)" =0A= Signed-off-by: Patrick Roy =0A= Signed-off-by: Nikita Kalyazin =0A= ---=0A= include/linux/pagemap.h | 16 ++++++++++++++++=0A= include/linux/secretmem.h | 18 ------------------=0A= lib/buildid.c | 4 ++--=0A= mm/gup.c | 19 +++++--------------=0A= mm/mlock.c | 2 +-=0A= mm/secretmem.c | 8 ++------=0A= 6 files changed, 26 insertions(+), 41 deletions(-)=0A= =0A= diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h=0A= index a17fabbc0269..d51e0c0404e2 100644=0A= --- a/include/linux/pagemap.h=0A= +++ b/include/linux/pagemap.h=0A= @@ -213,6 +213,7 @@ enum mapping_flags {=0A= AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM =3D 9,=0A= AS_KERNEL_FILE =3D 10, /* mapping for a fake kernel file that shouldn't= =0A= account usage to user cgroups */=0A= + AS_NO_DIRECT_MAP =3D 11, /* Folios in the mapping are not in the direct m= ap */=0A= /* Bits 16-25 are used for FOLIO_ORDER */=0A= AS_FOLIO_ORDER_BITS =3D 5,=0A= AS_FOLIO_ORDER_MIN =3D 16,=0A= @@ -348,6 +349,21 @@ static inline bool mapping_writeback_may_deadlock_on_r= eclaim(const struct addres=0A= return test_bit(AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM, &mapping->flags);= =0A= }=0A= =0A= +static inline void mapping_set_no_direct_map(struct address_space *mapping= )=0A= +{=0A= + set_bit(AS_NO_DIRECT_MAP, &mapping->flags);=0A= +}=0A= +=0A= +static inline bool mapping_no_direct_map(const struct address_space *mappi= ng)=0A= +{=0A= + return test_bit(AS_NO_DIRECT_MAP, &mapping->flags);=0A= +}=0A= +=0A= +static inline bool vma_has_no_direct_map(const struct vm_area_struct *vma)= =0A= +{=0A= + return vma->vm_file && mapping_no_direct_map(vma->vm_file->f_mapping);=0A= +}=0A= +=0A= static inline gfp_t mapping_gfp_mask(const struct address_space *mapping)= =0A= {=0A= return mapping->gfp_mask;=0A= diff --git a/include/linux/secretmem.h b/include/linux/secretmem.h=0A= index e918f96881f5..0ae1fb057b3d 100644=0A= --- a/include/linux/secretmem.h=0A= +++ b/include/linux/secretmem.h=0A= @@ -4,28 +4,10 @@=0A= =0A= #ifdef CONFIG_SECRETMEM=0A= =0A= -extern const struct address_space_operations secretmem_aops;=0A= -=0A= -static inline bool secretmem_mapping(struct address_space *mapping)=0A= -{=0A= - return mapping->a_ops =3D=3D &secretmem_aops;=0A= -}=0A= -=0A= -bool vma_is_secretmem(struct vm_area_struct *vma);=0A= bool secretmem_active(void);=0A= =0A= #else=0A= =0A= -static inline bool vma_is_secretmem(struct vm_area_struct *vma)=0A= -{=0A= - return false;=0A= -}=0A= -=0A= -static inline bool secretmem_mapping(struct address_space *mapping)=0A= -{=0A= - return false;=0A= -}=0A= -=0A= static inline bool secretmem_active(void)=0A= {=0A= return false;=0A= diff --git a/lib/buildid.c b/lib/buildid.c=0A= index c4b0f376fb34..89e567954284 100644=0A= --- a/lib/buildid.c=0A= +++ b/lib/buildid.c=0A= @@ -65,8 +65,8 @@ static int freader_get_folio(struct freader *r, loff_t fi= le_off)=0A= =0A= freader_put_folio(r);=0A= =0A= - /* reject secretmem folios created with memfd_secret() */=0A= - if (secretmem_mapping(r->file->f_mapping))=0A= + /* reject folios without direct map entries (e.g. from memfd_secret() or = guest_memfd()) */=0A= + if (mapping_no_direct_map(r->file->f_mapping))=0A= return -EFAULT;=0A= =0A= r->folio =3D filemap_get_folio(r->file->f_mapping, file_off >> PAGE_SHIFT= );=0A= diff --git a/mm/gup.c b/mm/gup.c=0A= index a8ba5112e4d0..719e4dbecc09 100644=0A= --- a/mm/gup.c=0A= +++ b/mm/gup.c=0A= @@ -11,7 +11,6 @@=0A= #include =0A= #include =0A= #include =0A= -#include =0A= =0A= #include =0A= #include =0A= @@ -1216,7 +1215,7 @@ static int check_vma_flags(struct vm_area_struct *vma= , unsigned long gup_flags)=0A= if ((gup_flags & FOLL_SPLIT_PMD) && is_vm_hugetlb_page(vma))=0A= return -EOPNOTSUPP;=0A= =0A= - if (vma_is_secretmem(vma))=0A= + if (vma_has_no_direct_map(vma))=0A= return -EFAULT;=0A= =0A= if (write) {=0A= @@ -2724,7 +2723,7 @@ EXPORT_SYMBOL(get_user_pages_unlocked);=0A= * This call assumes the caller has pinned the folio, that the lowest page= table=0A= * level still points to this folio, and that interrupts have been disable= d.=0A= *=0A= - * GUP-fast must reject all secretmem folios.=0A= + * GUP-fast must reject all folios without direct map entries (such as sec= retmem).=0A= *=0A= * Writing to pinned file-backed dirty tracked folios is inherently proble= matic=0A= * (see comment describing the writable_file_mapping_allowed() function). = We=0A= @@ -2739,7 +2738,6 @@ static bool gup_fast_folio_allowed(struct folio *foli= o, unsigned int flags)=0A= {=0A= bool reject_file_backed =3D false;=0A= struct address_space *mapping;=0A= - bool check_secretmem =3D false;=0A= unsigned long mapping_flags;=0A= =0A= /*=0A= @@ -2751,18 +2749,10 @@ static bool gup_fast_folio_allowed(struct folio *fo= lio, unsigned int flags)=0A= reject_file_backed =3D true;=0A= =0A= /* We hold a folio reference, so we can safely access folio fields. */=0A= -=0A= - /* secretmem folios are always order-0 folios. */=0A= - if (IS_ENABLED(CONFIG_SECRETMEM) && !folio_test_large(folio))=0A= - check_secretmem =3D true;=0A= -=0A= - if (!reject_file_backed && !check_secretmem)=0A= - return true;=0A= -=0A= if (WARN_ON_ONCE(folio_test_slab(folio)))=0A= return false;=0A= =0A= - /* hugetlb neither requires dirty-tracking nor can be secretmem. */=0A= + /* hugetlb neither requires dirty-tracking nor can be without direct map.= */=0A= if (folio_test_hugetlb(folio))=0A= return true;=0A= =0A= @@ -2800,8 +2790,9 @@ static bool gup_fast_folio_allowed(struct folio *foli= o, unsigned int flags)=0A= * At this point, we know the mapping is non-null and points to an=0A= * address_space object.=0A= */=0A= - if (check_secretmem && secretmem_mapping(mapping))=0A= + if (mapping_no_direct_map(mapping))=0A= return false;=0A= +=0A= /* The only remaining allowed file system is shmem. */=0A= return !reject_file_backed || shmem_mapping(mapping);=0A= }=0A= diff --git a/mm/mlock.c b/mm/mlock.c=0A= index bb0776f5ef7c..506452ee8cad 100644=0A= --- a/mm/mlock.c=0A= +++ b/mm/mlock.c=0A= @@ -474,7 +474,7 @@ static int mlock_fixup(struct vma_iterator *vmi, struct= vm_area_struct *vma,=0A= =0A= if (newflags =3D=3D oldflags || (oldflags & VM_SPECIAL) ||=0A= is_vm_hugetlb_page(vma) || vma =3D=3D get_gate_vma(current->mm) ||=0A= - vma_is_dax(vma) || vma_is_secretmem(vma) || (oldflags & VM_DROPPABLE)= )=0A= + vma_is_dax(vma) || vma_has_no_direct_map(vma) || (oldflags & VM_DROPP= ABLE))=0A= /* don't set VM_LOCKED or VM_LOCKONFAULT and don't count */=0A= goto out;=0A= =0A= diff --git a/mm/secretmem.c b/mm/secretmem.c=0A= index b59350daffe3..c1eafe29c3e8 100644=0A= --- a/mm/secretmem.c=0A= +++ b/mm/secretmem.c=0A= @@ -134,11 +134,6 @@ static int secretmem_mmap_prepare(struct vm_area_desc = *desc)=0A= return 0;=0A= }=0A= =0A= -bool vma_is_secretmem(struct vm_area_struct *vma)=0A= -{=0A= - return vma->vm_ops =3D=3D &secretmem_vm_ops;=0A= -}=0A= -=0A= static const struct file_operations secretmem_fops =3D {=0A= .release =3D secretmem_release,=0A= .mmap_prepare =3D secretmem_mmap_prepare,=0A= @@ -156,7 +151,7 @@ static void secretmem_free_folio(struct folio *folio)= =0A= folio_zero_segment(folio, 0, folio_size(folio));=0A= }=0A= =0A= -const struct address_space_operations secretmem_aops =3D {=0A= +static const struct address_space_operations secretmem_aops =3D {=0A= .dirty_folio =3D noop_dirty_folio,=0A= .free_folio =3D secretmem_free_folio,=0A= .migrate_folio =3D secretmem_migrate_folio,=0A= @@ -205,6 +200,7 @@ static struct file *secretmem_file_create(unsigned long= flags)=0A= =0A= mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);=0A= mapping_set_unevictable(inode->i_mapping);=0A= + mapping_set_no_direct_map(inode->i_mapping);=0A= =0A= inode->i_op =3D &secretmem_iops;=0A= inode->i_mapping->a_ops =3D &secretmem_aops;=0A= -- =0A= 2.50.1=0A= =0A=