From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from iad-out-004.esa.us-east-1.outbound.mail-perimeter.amazon.com (iad-out-004.esa.us-east-1.outbound.mail-perimeter.amazon.com [18.207.52.234]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C013F3DD50E; Fri, 10 Apr 2026 15:18:54 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=18.207.52.234 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775834336; cv=none; b=uDTB73FATTQ/nvQCwZisWPYHYitEs4UYzpvAnm28SqtCK4op+0UJf26iOcd0EOwQjzB+9JK6Pm+nxGWf/h7MBRqyNurlitx9zBfjeaOuCmguzsCxjU5478hHu5R5pNH9SqrOGdKpAaEBdHkiPLJ7Th20jz1iX92PiTqvbUePzk8= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775834336; c=relaxed/simple; bh=mgtQXp5f5L8+/ktN6U1wKtJnXXeukg/g5Sa1bjlZMo0=; h=From:To:CC:Subject:Date:Message-ID:References:In-Reply-To: Content-Type:MIME-Version; b=MnEuJ4Kqm7heBSi4urnAi5/Ke/5heWVlyHPY6ATPEDgSN38V2q6Y9k11ZZgFBhta48AZ50eTxQ4MG7co1A+xvRg25QXjogkmJrRpvv+63/cwaxUjYAGqUQN+ErfbJEmSZORjz56G+3CBT5ah9hrSJH/ZVmJlYKf8t1t79hx6CF0= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.co.uk; spf=pass smtp.mailfrom=amazon.co.uk; dkim=pass (2048-bit key) header.d=amazon.co.uk header.i=@amazon.co.uk header.b=No1zT8hR; arc=none smtp.client-ip=18.207.52.234 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.co.uk Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=amazon.co.uk Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=amazon.co.uk header.i=@amazon.co.uk header.b="No1zT8hR" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.co.uk; i=@amazon.co.uk; q=dns/txt; s=amazoncorp2; t=1775834335; x=1807370335; h=from:to:cc:subject:date:message-id:references: in-reply-to:content-transfer-encoding:mime-version; bh=Vlrq9+scq5tX8BT0/42NQPY8nqOPb+ph46OhWZ+yV6Q=; b=No1zT8hRyhsab2Ftygkpi0D+zfRMW8iWix5lwdv97if6S1k3AKiTvs+L Ej4EUCt11K17P5DpIhehhb0Ycz8zJRKvtSYOMnuT9K7CCQY3RRkQSeD/q hz5UZMbfWDyoWFd27DCkVUGV86H2WOFZzVQUQsh5Kiy92HPSeGHsCLu68 HCE1z6QVMybyXgrg4Sx+n7yx+qkLaKuzdh+5mTjIg+/hhE/6iDsRYFKJl rftTrJJ79ZauOkJL+x6DvuQF7igOsF3qAhQrp3SKs9i6QgYf4ONhJYwPV 8VmDFXGVvhjFHlCyyDk3EnVUopwaoi4Uh+YrNOjdfajeD7lvQKjk0aeeD Q==; X-CSE-ConnectionGUID: Omtz2rITRA2rAwYZ2NuU0Q== X-CSE-MsgGUID: EsqiVVUbSU6NBGiW7eGB2Q== X-IronPort-AV: E=Sophos;i="6.23,171,1770595200"; d="scan'208";a="15970650" Received: from ip-10-4-10-75.ec2.internal (HELO smtpout.naws.us-east-1.prod.farcaster.email.amazon.dev) ([10.4.10.75]) by internal-iad-out-004.esa.us-east-1.outbound.mail-perimeter.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Apr 2026 15:18:53 +0000 Received: from EX19MTAUEA001.ant.amazon.com [72.21.196.67:15977] by smtpin.naws.us-east-1.prod.farcaster.email.amazon.dev [10.0.32.14:2525] with esmtp (Farcaster) id bb09835d-2a64-45b6-8ade-fc65d90d3c16; Fri, 10 Apr 2026 15:18:53 +0000 (UTC) X-Farcaster-Flow-ID: bb09835d-2a64-45b6-8ade-fc65d90d3c16 Received: from EX19D027UEC001.ant.amazon.com (10.252.137.156) by EX19MTAUEA001.ant.amazon.com (10.252.134.203) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.2562.37; Fri, 10 Apr 2026 15:18:53 +0000 Received: from EX19D027UEC003.ant.amazon.com (10.252.137.250) by EX19D027UEC001.ant.amazon.com (10.252.137.156) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.2562.37; Fri, 10 Apr 2026 15:18:52 +0000 Received: from EX19D027UEC003.ant.amazon.com ([fe80::887f:519b:ba73:21d]) by EX19D027UEC003.ant.amazon.com ([fe80::887f:519b:ba73:21d%3]) with mapi id 15.02.2562.037; Fri, 10 Apr 2026 15:18:52 +0000 From: "Kalyazin, Nikita" To: "kvm@vger.kernel.org" , "linux-doc@vger.kernel.org" , "linux-kernel@vger.kernel.org" , "linux-arm-kernel@lists.infradead.org" , "kvmarm@lists.linux.dev" , "linux-fsdevel@vger.kernel.org" , "linux-mm@kvack.org" , "bpf@vger.kernel.org" , "linux-kselftest@vger.kernel.org" , "kernel@xen0n.name" , "linux-riscv@lists.infradead.org" , "linux-s390@vger.kernel.org" , "loongarch@lists.linux.dev" , "linux-pm@vger.kernel.org" CC: "pbonzini@redhat.com" , "corbet@lwn.net" , "maz@kernel.org" , "oupton@kernel.org" , "joey.gouly@arm.com" , "suzuki.poulose@arm.com" , "yuzenghui@huawei.com" , "catalin.marinas@arm.com" , "will@kernel.org" , "seanjc@google.com" , "tglx@kernel.org" , "mingo@redhat.com" , "bp@alien8.de" , "dave.hansen@linux.intel.com" , "x86@kernel.org" , "hpa@zytor.com" , "luto@kernel.org" , "peterz@infradead.org" , "willy@infradead.org" , "akpm@linux-foundation.org" , "david@kernel.org" , "lorenzo.stoakes@oracle.com" , "vbabka@kernel.org" , "rppt@kernel.org" , "surenb@google.com" , "mhocko@suse.com" , "ast@kernel.org" , "daniel@iogearbox.net" , "andrii@kernel.org" , "martin.lau@linux.dev" , "eddyz87@gmail.com" , "song@kernel.org" , "yonghong.song@linux.dev" , "john.fastabend@gmail.com" , "kpsingh@kernel.org" , "sdf@fomichev.me" , "haoluo@google.com" , "jolsa@kernel.org" , "jgg@ziepe.ca" , "jhubbard@nvidia.com" , "peterx@redhat.com" , "jannh@google.com" , "pfalcato@suse.de" , "skhan@linuxfoundation.org" , "riel@surriel.com" , "ryan.roberts@arm.com" , "jgross@suse.com" , "yu-cheng.yu@intel.com" , "kas@kernel.org" , "coxu@redhat.com" , "ackerleytng@google.com" , "yosry@kernel.org" , "ajones@ventanamicro.com" , "maobibo@loongson.cn" , "tabba@google.com" , "prsampat@amd.com" , "wu.fei9@sanechips.com.cn" , "mlevitsk@redhat.com" , "jmattson@google.com" , "jthoughton@google.com" , "agordeev@linux.ibm.com" , "alex@ghiti.fr" , "aou@eecs.berkeley.edu" , "borntraeger@linux.ibm.com" , "chenhuacai@kernel.org" , "baolu.lu@linux.intel.com" , "dev.jain@arm.com" , "gor@linux.ibm.com" , "hca@linux.ibm.com" , "palmer@dabbelt.com" , "pjw@kernel.org" , "shijie@os.amperecomputing.com" , "svens@linux.ibm.com" , "thuth@redhat.com" , "yang@os.amperecomputing.com" , "Liam.Howlett@oracle.com" , "urezki@gmail.com" , "zhengqi.arch@bytedance.com" , "gerald.schaefer@linux.ibm.com" , "jiayuan.chen@shopee.com" , "lenb@kernel.org" , "pavel@kernel.org" , "rafael@kernel.org" , "yangyicong@hisilicon.com" , "vannapurve@google.com" , "jackmanb@google.com" , "patrick.roy@linux.dev" , "Thomson, Jack" , "Itazuri, Takahiro" , "Manwaring, Derek" , "Kalyazin, Nikita" , Vlastimil Babka Subject: [PATCH v12 06/16] mm: introduce AS_NO_DIRECT_MAP Thread-Topic: [PATCH v12 06/16] mm: introduce AS_NO_DIRECT_MAP Thread-Index: AQHcyP1WbHr5tPTWgUuW3xsaOzDaaQ== Date: Fri, 10 Apr 2026 15:18:52 +0000 Message-ID: <20260410151746.61150-7-kalyazin@amazon.com> References: <20260410151746.61150-1-kalyazin@amazon.com> In-Reply-To: <20260410151746.61150-1-kalyazin@amazon.com> Accept-Language: en-GB, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Precedence: bulk X-Mailing-List: linux-doc@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Patrick Roy =0A= =0A= Add AS_NO_DIRECT_MAP for mappings where direct map entries of folios are=0A= set to not present. Currently, mappings that match this description are=0A= secretmem mappings (memfd_secret()). Later, some guest_memfd=0A= configurations will also fall into this category.=0A= =0A= Reject this new type of mappings in all locations that currently reject=0A= secretmem mappings, on the assumption that if secretmem mappings are=0A= rejected somewhere, it is precisely because of an inability to deal with=0A= folios without direct map entries, and then make memfd_secret() use=0A= AS_NO_DIRECT_MAP on its address_space to drop its special=0A= vma_is_secretmem()/secretmem_mapping() checks.=0A= =0A= Use a new flag instead of overloading AS_INACCESSIBLE (which is already=0A= set by guest_memfd) because not all guest_memfd mappings will end up=0A= being direct map removed (e.g. in pKVM setups, parts of guest_memfd that=0A= can be mapped to userspace should also be GUP-able, and generally not=0A= have restrictions on who can access it).=0A= =0A= Acked-by: Mike Rapoport (Microsoft) =0A= Acked-by: David Hildenbrand (Red Hat) =0A= Signed-off-by: Patrick Roy =0A= Acked-by: Vlastimil Babka =0A= Reviewed-by: Ackerley Tng =0A= Signed-off-by: Nikita Kalyazin =0A= ---=0A= include/linux/pagemap.h | 16 ++++++++++++++++=0A= include/linux/secretmem.h | 18 ------------------=0A= lib/buildid.c | 8 ++++++--=0A= mm/gup.c | 9 ++++-----=0A= mm/mlock.c | 2 +-=0A= mm/secretmem.c | 8 ++------=0A= 6 files changed, 29 insertions(+), 32 deletions(-)=0A= =0A= diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h=0A= index ec442af3f886..68c075502d91 100644=0A= --- a/include/linux/pagemap.h=0A= +++ b/include/linux/pagemap.h=0A= @@ -211,6 +211,7 @@ enum mapping_flags {=0A= AS_KERNEL_FILE =3D 10, /* mapping for a fake kernel file that shouldn't= =0A= account usage to user cgroups */=0A= AS_NO_DATA_INTEGRITY =3D 11, /* no data integrity guarantees */=0A= + AS_NO_DIRECT_MAP =3D 12, /* Folios in the mapping are not in the direct m= ap */=0A= /* Bits 16-25 are used for FOLIO_ORDER */=0A= AS_FOLIO_ORDER_BITS =3D 5,=0A= AS_FOLIO_ORDER_MIN =3D 16,=0A= @@ -356,6 +357,21 @@ static inline bool mapping_no_data_integrity(const str= uct address_space *mapping=0A= return test_bit(AS_NO_DATA_INTEGRITY, &mapping->flags);=0A= }=0A= =0A= +static inline void mapping_set_no_direct_map(struct address_space *mapping= )=0A= +{=0A= + set_bit(AS_NO_DIRECT_MAP, &mapping->flags);=0A= +}=0A= +=0A= +static inline bool mapping_no_direct_map(const struct address_space *mappi= ng)=0A= +{=0A= + return test_bit(AS_NO_DIRECT_MAP, &mapping->flags);=0A= +}=0A= +=0A= +static inline bool vma_has_no_direct_map(const struct vm_area_struct *vma)= =0A= +{=0A= + return vma->vm_file && mapping_no_direct_map(vma->vm_file->f_mapping);=0A= +}=0A= +=0A= static inline gfp_t mapping_gfp_mask(const struct address_space *mapping)= =0A= {=0A= return mapping->gfp_mask;=0A= diff --git a/include/linux/secretmem.h b/include/linux/secretmem.h=0A= index e918f96881f5..0ae1fb057b3d 100644=0A= --- a/include/linux/secretmem.h=0A= +++ b/include/linux/secretmem.h=0A= @@ -4,28 +4,10 @@=0A= =0A= #ifdef CONFIG_SECRETMEM=0A= =0A= -extern const struct address_space_operations secretmem_aops;=0A= -=0A= -static inline bool secretmem_mapping(struct address_space *mapping)=0A= -{=0A= - return mapping->a_ops =3D=3D &secretmem_aops;=0A= -}=0A= -=0A= -bool vma_is_secretmem(struct vm_area_struct *vma);=0A= bool secretmem_active(void);=0A= =0A= #else=0A= =0A= -static inline bool vma_is_secretmem(struct vm_area_struct *vma)=0A= -{=0A= - return false;=0A= -}=0A= -=0A= -static inline bool secretmem_mapping(struct address_space *mapping)=0A= -{=0A= - return false;=0A= -}=0A= -=0A= static inline bool secretmem_active(void)=0A= {=0A= return false;=0A= diff --git a/lib/buildid.c b/lib/buildid.c=0A= index c4b737640621..ba79bf28f7e6 100644=0A= --- a/lib/buildid.c=0A= +++ b/lib/buildid.c=0A= @@ -47,6 +47,10 @@ static int freader_get_folio(struct freader *r, loff_t f= ile_off)=0A= =0A= freader_put_folio(r);=0A= =0A= + /* reject folios without direct map entries (e.g. from memfd_secret() or = guest_memfd()) */=0A= + if (mapping_no_direct_map(r->file->f_mapping))=0A= + return -EFAULT;=0A= +=0A= /* only use page cache lookup - fail if not already cached */=0A= r->folio =3D filemap_get_folio(r->file->f_mapping, file_off >> PAGE_SHIFT= );=0A= =0A= @@ -87,8 +91,8 @@ const void *freader_fetch(struct freader *r, loff_t file_= off, size_t sz)=0A= return r->data + file_off;=0A= }=0A= =0A= - /* reject secretmem folios created with memfd_secret() */=0A= - if (secretmem_mapping(r->file->f_mapping)) {=0A= + /* reject folios without direct map entries (e.g. from memfd_secret() or = guest_memfd()) */=0A= + if (mapping_no_direct_map(r->file->f_mapping)) {=0A= r->err =3D -EFAULT;=0A= return NULL;=0A= }=0A= diff --git a/mm/gup.c b/mm/gup.c=0A= index 41eb64783e03..c1b4fb1eaee7 100644=0A= --- a/mm/gup.c=0A= +++ b/mm/gup.c=0A= @@ -11,7 +11,6 @@=0A= #include =0A= #include =0A= #include =0A= -#include =0A= =0A= #include =0A= #include =0A= @@ -1216,7 +1215,7 @@ static int check_vma_flags(struct vm_area_struct *vma= , unsigned long gup_flags)=0A= if ((gup_flags & FOLL_SPLIT_PMD) && is_vm_hugetlb_page(vma))=0A= return -EOPNOTSUPP;=0A= =0A= - if (vma_is_secretmem(vma))=0A= + if (vma_has_no_direct_map(vma))=0A= return -EFAULT;=0A= =0A= if (write) {=0A= @@ -2724,7 +2723,7 @@ EXPORT_SYMBOL(get_user_pages_unlocked);=0A= * This call assumes the caller has pinned the folio, that the lowest page= table=0A= * level still points to this folio, and that interrupts have been disable= d.=0A= *=0A= - * GUP-fast must reject all secretmem folios.=0A= + * GUP-fast must reject all folios without direct map entries (such as sec= retmem).=0A= *=0A= * Writing to pinned file-backed dirty tracked folios is inherently proble= matic=0A= * (see comment describing the writable_file_mapping_allowed() function). = We=0A= @@ -2744,7 +2743,7 @@ static bool gup_fast_folio_allowed(struct folio *foli= o, unsigned int flags)=0A= if (WARN_ON_ONCE(folio_test_slab(folio)))=0A= return false;=0A= =0A= - /* hugetlb neither requires dirty-tracking nor can be secretmem. */=0A= + /* hugetlb neither requires dirty-tracking nor can be without direct map.= */=0A= if (folio_test_hugetlb(folio))=0A= return true;=0A= =0A= @@ -2786,7 +2785,7 @@ static bool gup_fast_folio_allowed(struct folio *foli= o, unsigned int flags)=0A= * At this point, we know the mapping is non-null and points to an=0A= * address_space object.=0A= */=0A= - if (secretmem_mapping(mapping))=0A= + if (mapping_no_direct_map(mapping))=0A= return false;=0A= =0A= /*=0A= diff --git a/mm/mlock.c b/mm/mlock.c=0A= index 2f699c3497a5..a6f4b3df4f3f 100644=0A= --- a/mm/mlock.c=0A= +++ b/mm/mlock.c=0A= @@ -474,7 +474,7 @@ static int mlock_fixup(struct vma_iterator *vmi, struct= vm_area_struct *vma,=0A= =0A= if (newflags =3D=3D oldflags || (oldflags & VM_SPECIAL) ||=0A= is_vm_hugetlb_page(vma) || vma =3D=3D get_gate_vma(current->mm) ||=0A= - vma_is_dax(vma) || vma_is_secretmem(vma) || (oldflags & VM_DROPPABLE)= )=0A= + vma_is_dax(vma) || vma_has_no_direct_map(vma) || (oldflags & VM_DROPP= ABLE))=0A= /* don't set VM_LOCKED or VM_LOCKONFAULT and don't count */=0A= goto out;=0A= =0A= diff --git a/mm/secretmem.c b/mm/secretmem.c=0A= index 27b176af8fc4..d32e1be1eb35 100644=0A= --- a/mm/secretmem.c=0A= +++ b/mm/secretmem.c=0A= @@ -129,11 +129,6 @@ static int secretmem_mmap_prepare(struct vm_area_desc = *desc)=0A= return 0;=0A= }=0A= =0A= -bool vma_is_secretmem(struct vm_area_struct *vma)=0A= -{=0A= - return vma->vm_ops =3D=3D &secretmem_vm_ops;=0A= -}=0A= -=0A= static const struct file_operations secretmem_fops =3D {=0A= .release =3D secretmem_release,=0A= .mmap_prepare =3D secretmem_mmap_prepare,=0A= @@ -151,7 +146,7 @@ static void secretmem_free_folio(struct folio *folio)= =0A= folio_zero_segment(folio, 0, folio_size(folio));=0A= }=0A= =0A= -const struct address_space_operations secretmem_aops =3D {=0A= +static const struct address_space_operations secretmem_aops =3D {=0A= .dirty_folio =3D noop_dirty_folio,=0A= .free_folio =3D secretmem_free_folio,=0A= .migrate_folio =3D secretmem_migrate_folio,=0A= @@ -200,6 +195,7 @@ static struct file *secretmem_file_create(unsigned long= flags)=0A= =0A= mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);=0A= mapping_set_unevictable(inode->i_mapping);=0A= + mapping_set_no_direct_map(inode->i_mapping);=0A= =0A= inode->i_op =3D &secretmem_iops;=0A= inode->i_mapping->a_ops =3D &secretmem_aops;=0A= -- =0A= 2.50.1=0A= =0A=