From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <373186fb-5000-47ba-85a3-4085658a7101@kernel.org>
Date: Fri, 20 Mar 2026 12:56:56 +0100
MIME-Version: 1.0
Subject: Re: [PATCH v3 22/23] mm/vma: convert vma_modify_flags[_uffd]() to use vma_flags_t
To: "Lorenzo Stoakes (Oracle)"
Cc: Andrew Morton, David Hildenbrand, "Liam R . Howlett", Jann Horn, Pedro Falcato, Mike Rapoport, Suren Baghdasaryan, Kees Cook, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Vineet Gupta, Russell King, Catalin Marinas, Will Deacon, Brian Cain, Huacai Chen, WANG Xuerui, Thomas Bogendoerfer, Dinh Nguyen, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin, Christophe Leroy, Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti, Heiko Carstens, Vasily Gorbik, Alexander Gordeev, Christian Borntraeger, Sven Schnelle, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, "H . Peter Anvin", Richard Weinberger, Anton Ivanov, Johannes Berg, Alexander Viro, Christian Brauner, Jan Kara, Xu Xin, Chengming Zhou, Michal Hocko, Paul Moore, Stephen Smalley, Ondrej Mosnacek, linux-snps-arc@lists.infradead.org, linux-arm-kernel@lists.infradead.org, linux-hexagon@vger.kernel.org, loongarch@lists.linux.dev, linux-mips@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org, linux-um@lists.infradead.org, linux-fsdevel@vger.kernel.org, selinux@vger.kernel.org
References: <98a004bf89227ea9abaef5fef06ea7e584f77bcf.1773846935.git.ljs@kernel.org> <0b5765da-67e9-4e2e-99d8-08501730bf76@kernel.org> <7e22cc48-aa04-406d-b4d0-8ebb182b34b9@lucifer.local>
From: "Vlastimil Babka (SUSE)"
In-Reply-To: <7e22cc48-aa04-406d-b4d0-8ebb182b34b9@lucifer.local>
Content-Type: text/plain; charset=UTF-8

On 3/20/26 12:08, Lorenzo Stoakes (Oracle) wrote:
> On Fri, Mar 20, 2026 at 11:39:58AM +0100, Vlastimil Babka (SUSE) wrote:
>> On 3/18/26 16:50, Lorenzo Stoakes (Oracle) wrote:
>> > Update the vma_modify_flags() and vma_modify_flags_uffd() functions to
>> > accept a vma_flags_t parameter rather than a vm_flags_t one, and propagate
>> > the changes as needed to implement this change.
>> >
>> > Finally, update the VMA tests to reflect this.
>> >
>> > Signed-off-by: Lorenzo Stoakes (Oracle)
>>
>> > --- a/mm/mlock.c
>> > +++ b/mm/mlock.c
>> > @@ -415,13 +415,14 @@ static int mlock_pte_range(pmd_t *pmd, unsigned long addr,
>> >   * @vma - vma containing range to be mlock()ed or munlock()ed
>> >   * @start - start address in @vma of the range
>> >   * @end - end of range in @vma
>> > - * @newflags - the new set of flags for @vma.
>> > + * @new_vma_flags - the new set of flags for @vma.
>> >   *
>> >   * Called for mlock(), mlock2() and mlockall(), to set @vma VM_LOCKED;
>> >   * called for munlock() and munlockall(), to clear VM_LOCKED from @vma.
>> >   */
>> >  static void mlock_vma_pages_range(struct vm_area_struct *vma,
>> > -		unsigned long start, unsigned long end, vm_flags_t newflags)
>> > +		unsigned long start, unsigned long end,
>> > +		vma_flags_t *new_vma_flags)
>> >  {
>> >  	static const struct mm_walk_ops mlock_walk_ops = {
>> >  		.pmd_entry = mlock_pte_range,
>> > @@ -439,18 +440,18 @@ static void mlock_vma_pages_range(struct vm_area_struct *vma,
>> >  	 * combination should not be visible to other mmap_lock users;
>> >  	 * but WRITE_ONCE so rmap walkers must see VM_IO if VM_LOCKED.
>> >  	 */
>> > -	if (newflags & VM_LOCKED)
>> > -		newflags |= VM_IO;
>> > +	if (vma_flags_test(new_vma_flags, VMA_LOCKED_BIT))
>> > +		vma_flags_set(new_vma_flags, VMA_IO_BIT);
>> >  	vma_start_write(vma);
>> > -	vm_flags_reset_once(vma, newflags);
>> > +	WRITE_ONCE(vma->flags, *new_vma_flags);
>>
>> It's not clear to me how switching from vm_flags_t to vma_flags_t allows
>> us to simply do WRITE_ONCE() instead of the full logic of
>> vm_flags_reset_once(). Won't it fail to compile once flags are more
>> than a single word? Or worse, will it compile but silently allow tearing?
>
> We only care about tearing in the flags that can be contained within a
> system word, but true, we should probably do this more carefully, as I did
> for vm_flags_reset_once().
>
> I will reimplement this as a new, hideous, helper function.
>
> I am not a fan of this being a thing to handle races, it's a hack. But I
> guess that should be addressed separately.

Right, thanks!

>>
>> >  	}
>> >  }
>> >
>> > @@ -467,20 +468,22 @@ static int mlock_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma,
>> >  		struct vm_area_struct **prev, unsigned long start,
>> >  		unsigned long end, vm_flags_t newflags)
>> >  {
>> > +	vma_flags_t new_vma_flags = legacy_to_vma_flags(newflags);
>> > +	const vma_flags_t old_vma_flags = vma->flags;
>> >  	struct mm_struct *mm = vma->vm_mm;
>> >  	int nr_pages;
>> >  	int ret = 0;
>> > -	vm_flags_t oldflags = vma->vm_flags;
>> >
>> > -	if (newflags == oldflags || vma_is_secretmem(vma) ||
>> > -	    !vma_supports_mlock(vma))
>> > +	if (vma_flags_same_pair(&old_vma_flags, &new_vma_flags) ||
>> > +	    vma_is_secretmem(vma) || !vma_supports_mlock(vma)) {
>> >  		/*
>> >  		 * Don't set VM_LOCKED or VM_LOCKONFAULT and don't count.
>> >  		 * For secretmem, don't allow the memory to be unlocked.
>> >  		 */
>> >  		goto out;
>> > +	}
>> >
>> > -	vma = vma_modify_flags(vmi, *prev, vma, start, end, &newflags);
>> > +	vma = vma_modify_flags(vmi, *prev, vma, start, end, &new_vma_flags);
>> >  	if (IS_ERR(vma)) {
>> >  		ret = PTR_ERR(vma);
>> >  		goto out;
>> >  	}
>> > @@ -490,9 +493,9 @@ static int mlock_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma,
>> >  	 * Keep track of amount of locked VM.
>> >  	 */
>> >  	nr_pages = (end - start) >> PAGE_SHIFT;
>> > -	if (!(newflags & VM_LOCKED))
>> > +	if (!vma_flags_test(&new_vma_flags, VMA_LOCKED_BIT))
>> >  		nr_pages = -nr_pages;
>> > -	else if (oldflags & VM_LOCKED)
>> > +	else if (vma_flags_test(&old_vma_flags, VMA_LOCKED_BIT))
>> >  		nr_pages = 0;
>> >  	mm->locked_vm += nr_pages;
>> >
>> > @@ -501,12 +504,13 @@ static int mlock_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma,
>> >  	 * It's okay if try_to_unmap_one unmaps a page just after we
>> >  	 * set VM_LOCKED, populate_vma_page_range will bring it back.
>> >  	 */
>> > -	if ((newflags & VM_LOCKED) && (oldflags & VM_LOCKED)) {
>> > +	if (vma_flags_test(&new_vma_flags, VMA_LOCKED_BIT) &&
>> > +	    vma_flags_test(&old_vma_flags, VMA_LOCKED_BIT)) {
>> >  		/* No work to do, and mlocking twice would be wrong */
>> >  		vma_start_write(vma);
>> > -		vm_flags_reset(vma, newflags);
>> > +		vma->flags = new_vma_flags;
>>
>> This also does a lot less than vm_flags_reset()?
>
> Well let's look:
>
> VM_WARN_ON_ONCE(!pgtable_supports_soft_dirty() && (flags & VM_SOFTDIRTY));
>
> Are we really at a point where this is problematic? Do we hit this? Why are
> we specifically checking only this case on every single instance of
> resetting VMA flags?

I'll admit I don't know, but it sounds like we can stop being so paranoid
when converting code to the new API.

> vma_assert_write_locked(vma);
>
> Note the vma_start_write() line above. I want to separate vma_flags_t
> helpers from asserts like that, because:
>
> 1. We might be operating on a VMA that is not yet added to the tree
> 2. We might be operating on a VMA that is now detached
> 3. Really in all but core code, you should be using vma_desc_xxx().
> 4. Other VMA fields are manipulated with no such checks.
> 5. It'd be egregious to have to add variants of flag functions just to
>    account for cases such as the above, especially when we don't do so for
>    other VMA fields. Drivers are the problematic cases and why it was
>    especially important (and also for debug as VMA locks were introduced),
>    the mmap_prepare work is solving this generally.

Perfectly reasonable!

> vm_flags_init(vma, flags);
>
> Analysing what's in this function:
>
> VM_WARN_ON_ONCE(!pgtable_supports_soft_dirty() && (flags & VM_SOFTDIRTY));
>
> Duplicated.
>
> vma_flags_clear_all(&vma->flags);
>
> Only necessary while you're only setting the first system word of
> vma->flags.
>
> vma_flags_overwrite_word(&vma->flags, flags);
>
> Again only necessary when you're only setting the first system word.
>
> So yeah it's doing the equivalent and (intentionally) eliminating some
> noise.

Ack.

> But I'll add the S/D check back I guess.

Thanks for the detailed explanation. It's fine to drop legacy stuff, it
just wasn't obvious whether it was intentional or a mistake. Maybe just
mention the intention in the changelog?

Thanks!

>>
>> >  	} else {
>> > -		mlock_vma_pages_range(vma, start, end, newflags);
>> > +		mlock_vma_pages_range(vma, start, end, &new_vma_flags);
>> >  	}
>> >  out:
>> >  	*prev = vma;
>> > diff --git a/mm/mprotect.c b/mm/mprotect.c
>> > index eaa724b99908..2b8a85689ab7 100644
>> > --- a/mm/mprotect.c
>> > +++ b/mm/mprotect.c
>> > @@ -756,13 +756,11 @@ mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb,
>> >  		vma_flags_clear(&new_vma_flags, VMA_ACCOUNT_BIT);
>> >  	}
>> >
>> > -	newflags = vma_flags_to_legacy(new_vma_flags);
>> > -	vma = vma_modify_flags(vmi, *pprev, vma, start, end, &newflags);
>> > +	vma = vma_modify_flags(vmi, *pprev, vma, start, end, &new_vma_flags);
>> >  	if (IS_ERR(vma)) {
>> >  		error = PTR_ERR(vma);
>> >  		goto fail;
>> >  	}
>> > -	new_vma_flags = legacy_to_vma_flags(newflags);
>> >
>> >  	*pprev = vma;
>> >
>> > @@ -771,7 +769,7 @@ mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb,
>> >  	 * held in write mode.
>> >  	 */
>> >  	vma_start_write(vma);
>> > -	vm_flags_reset_once(vma, newflags);
>> > +	WRITE_ONCE(vma->flags, new_vma_flags);
>>
>> Ditto.
>
> I mean overall these cases are hacks in my opinion to work around code that
> should have solved their problem another way.
>
> But sure, as above I'll add a helper function or such.

Great.
>>
>> >  	if (vma_wants_manual_pte_write_upgrade(vma))
>> >  		mm_cp_flags |= MM_CP_TRY_CHANGE_WRITABLE;
>> >  	vma_set_page_prot(vma);
>> > @@ -796,6 +794,7 @@ mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb,
>> >  	}
>> >
>> >  	vm_stat_account(mm, vma_flags_to_legacy(old_vma_flags), -nrpages);
>> > +	newflags = vma_flags_to_legacy(new_vma_flags);
>> >  	vm_stat_account(mm, newflags, nrpages);
>> >  	perf_event_mmap(vma);
>> >  	return 0;
>> > diff --git a/mm/mseal.c b/mm/mseal.c
>> > index 316b5e1dec78..603df53ad267 100644