Message-ID: <3a42e195-9392-442f-aba7-fdd2c186b98f@arm.com>
Date: Wed, 19 Jun 2024 16:58:32 +0100
Subject: Re: [PATCH v1] arm64: mm: Permit PTE SW bits to change in live mappings
From: Ryan Roberts
To: Peter Xu
Cc: Catalin Marinas, Will Deacon, Mark Rutland, linux-arm-kernel@lists.infradead.org
References: <20240619121859.4153966-1-ryan.roberts@arm.com>

On 19/06/2024 15:54, Peter Xu wrote:
> Hi, Ryan,
>
> On Wed, Jun 19, 2024 at 01:18:56PM +0100, Ryan Roberts wrote:
>> Previously pgattr_change_is_safe() was overly-strict and complained
>> (e.g. "[  116.262743] __check_safe_pte_update: unsafe attribute change:
>> 0x0560000043768fc3 -> 0x0160000043768fc3") if it saw any SW bits change
>> in a live PTE. There is no such restriction on SW bits in the Arm ARM.
>>
>> Until now, no SW bits have been updated in live mappings via the
>> set_ptes() route. PTE_DIRTY would be updated live, but this is handled
>> by ptep_set_access_flags() which does not call pgattr_change_is_safe().
>> However, with the introduction of uffd-wp for arm64, there is core-mm
>> code that does ptep_get(); pte_clear_uffd_wp(); set_ptes(); which
>> triggers this false warning.
>>
>> Silence this warning by masking out the SW bits during checks.
>>
>> The bug isn't technically in the highlighted commit below, but that's
>> where bisecting would likely lead, as it's what made the bug
>> user-visible.
>>
>> Signed-off-by: Ryan Roberts
>> Fixes: 5b32510af77b ("arm64/mm: Add uffd write-protect support")
>> ---
>>
>> Hi All,
>>
>> This applies on top of v6.10-rc4 and it would be good to land this as
>> a hotfix for v6.10 since it's effectively fixing a bug in
>> 5b32510af77b which was merged for v6.10.
>>
>> I've only been able to trigger this occasionally by running the mm
>> uffd selftests, when swap is configured to use a small (64M) zRam
>> disk. With this fix applied I can no longer trigger it.
>
> Totally not familiar with the arm64 pgtable checker here, but I'm just
> wondering how the swap affected this, as I see there's:
>
>	/* creating or taking down mappings is always safe */
>	if (!pte_valid(__pte(old)) || !pte_valid(__pte(new)))
>		return true;
>
> Should pte_valid() always report false on swap entries? Does it mean
> that it'll always report PASS for anything switching from/to a swap
> entry for the checker?

Yes, that's correct; swap ptes are invalid from the HW's point of view,
so you can always safely change their values from anything to anything
(as long as the valid bit remains 0).

>
> I assume that's also why you didn't cover bit 3 (uffd-wp swap bit on
> arm64, per my read in your previous series), but I don't think I'm
> confident on my understanding yet. It might be nice to mention how
> that was triggered in the commit message from that regard.

Bit 3 is the uffd-wp bit in swap ptes. Bit 58 is the uffd-wp bit for
valid ptes. Here we are only concerned with valid ptes.
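That early exit can be sketched as a minimal user-space model (the
`MODEL_*` names and `model_change_is_safe()` are hypothetical, not
kernel symbols; only the descriptor valid bit at bit 0 is modeled):

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 is the descriptor valid bit. Swap ptes keep it clear, so they
 * are invalid from the HW's point of view. */
#define MODEL_PTE_VALID (UINT64_C(1) << 0)

/* Mirrors the early exit in pgattr_change_is_safe(): creating or taking
 * down a mapping (either side invalid) is always considered safe. */
static bool model_change_is_safe(uint64_t old, uint64_t new)
{
	if (!(old & MODEL_PTE_VALID) || !(new & MODEL_PTE_VALID))
		return true;
	/* ... attribute checks for live (valid -> valid) changes ... */
	return false;
}
```

So any change from or to a swap pte passes the checker before the
attribute mask is ever consulted.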
Yes, it's a mess ;-)

>
>>
>> Thanks,
>> Ryan
>>
>>  arch/arm64/include/asm/pgtable-hwdef.h | 1 +
>>  arch/arm64/mm/mmu.c                    | 3 ++-
>>  2 files changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
>> index 9943ff0af4c9..1f60aa1bc750 100644
>> --- a/arch/arm64/include/asm/pgtable-hwdef.h
>> +++ b/arch/arm64/include/asm/pgtable-hwdef.h
>> @@ -170,6 +170,7 @@
>>  #define PTE_CONT		(_AT(pteval_t, 1) << 52)	/* Contiguous range */
>>  #define PTE_PXN		(_AT(pteval_t, 1) << 53)	/* Privileged XN */
>>  #define PTE_UXN		(_AT(pteval_t, 1) << 54)	/* User XN */
>> +#define PTE_SWBITS_MASK	_AT(pteval_t, (BIT(63) | GENMASK(58, 55)))
>>
>>  #define PTE_ADDR_LOW		(((_AT(pteval_t, 1) << (50 - PAGE_SHIFT)) - 1) << PAGE_SHIFT)
>>  #ifdef CONFIG_ARM64_PA_BITS_52
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index c927e9312f10..353ea5dc32b8 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -124,7 +124,8 @@ bool pgattr_change_is_safe(u64 old, u64 new)
>>  	 * The following mapping attributes may be updated in live
>>  	 * kernel mappings without the need for break-before-make.
>>  	 */
>> -	pteval_t mask = PTE_PXN | PTE_RDONLY | PTE_WRITE | PTE_NG;
>> +	pteval_t mask = PTE_PXN | PTE_RDONLY | PTE_WRITE | PTE_NG |
>> +			PTE_SWBITS_MASK;
>
> When applying the uffd-wp bit, normally we shouldn't need this as
> we'll need to do BBM-alike ops to avoid concurrent HW A/D updates.
> E.g. change_pte_range() uses the ptep_modify_prot_* APIs.
>
> But indeed at least unprotect / clear-uffd-bit doesn't logically need
> that; we already do that in e.g. do_wp_page(). From that POV it makes
> sense to me, as I also don't see why soft-bits are forbidden to be
> updated on ptes if HWs ignore them as a pretty generic concept. Just
> want to double check with you.
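The effect of the new mask can be checked with a small user-space
sketch (the `MODEL_*` names are hypothetical stand-ins for the patch's
PTE_SWBITS_MASK, which is bit 63 plus bits 58:55):

```c
#include <stdint.h>

/* User-space model of the patch's PTE_SWBITS_MASK: software-defined
 * bits are bit 63 and bits 58:55 of the descriptor. */
#define MODEL_SWBITS_MASK ((UINT64_C(1) << 63) | (UINT64_C(0xf) << 55))

/* Clear the SW bits before comparing old and new attribute values, so
 * a SW-bit-only change no longer trips the checker. */
static uint64_t model_strip_sw_bits(uint64_t pte)
{
	return pte & ~MODEL_SWBITS_MASK;
}
```

The two descriptors from the false warning in the commit message
(0x0560000043768fc3 and 0x0160000043768fc3) differ only in bit 58, the
uffd-wp bit, so with the SW bits stripped they compare equal.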
This bug was indeed triggering from do_wp_page() as you say, and I was
considering sending out a separate patch to change that code to use the
ptep_modify_prot_start()/ptep_modify_prot_commit() pattern, which
transitions the pte through 0 so that we guarantee not to lose any A/D
updates. In the end I convinced myself that while ptep_get();
pte_clear_uffd_wp(); set_ptes(); is a troubling pattern, it is safe in
this instance because the page is write-protected so the HW can't race
to set the dirty bit.

The code in question is:

	if (userfaultfd_pte_wp(vma, ptep_get(vmf->pte))) {
		if (!userfaultfd_wp_async(vma)) {
			pte_unmap_unlock(vmf->pte, vmf->ptl);
			return handle_userfault(vmf, VM_UFFD_WP);
		}

		/*
		 * Nothing needed (cache flush, TLB invalidations,
		 * etc.) because we're only removing the uffd-wp bit,
		 * which is completely invisible to the user.
		 */
		pte = pte_clear_uffd_wp(ptep_get(vmf->pte));

		set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte);
		/*
		 * Update this to be prepared for following up CoW
		 * handling
		 */
		vmf->orig_pte = pte;
	}

Perhaps we should consider a change to the following style as a cleanup?

	old_pte = ptep_modify_prot_start(vma, addr, pte);
	ptent = pte_clear_uffd_wp(old_pte);
	ptep_modify_prot_commit(vma, addr, pte, old_pte, ptent);

Regardless, this patch is still a correct and valuable change; the
arm64 architecture doesn't care if SW bits are modified in valid
mappings, so we shouldn't be checking for it.

>
>>
>>  	/* creating or taking down mappings is always safe */
>>  	if (!pte_valid(__pte(old)) || !pte_valid(__pte(new)))
>> --
>> 2.43.0
>>
>
> When looking at this function I found this and caught my attention too:
>
>	/* live contiguous mappings may not be manipulated at all */
>	if ((old | new) & PTE_CONT)
>		return false;
>
> I'm now wondering how cont-ptes work with uffd-wp now for arm64, from
> either hugetlb or mTHP pov. This check may be relevant here as a start.
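The transition-through-0 pattern suggested above can be sketched as a
single-threaded user-space model (the `MODEL_*`/`model_*` names are
hypothetical, and the "HW race" exists only in the comments here):

```c
#include <stdint.h>

#define MODEL_PTE_UFFD_WP (UINT64_C(1) << 58)	/* uffd-wp bit in valid ptes */

/* Model of ptep_modify_prot_start(): swap the live entry for 0
 * (invalid) and hand back the old value. While the entry is 0, HW
 * walkers see an invalid pte and cannot set access/dirty bits behind
 * our back, so no A/D update can be lost. */
static uint64_t model_modify_prot_start(uint64_t *ptep)
{
	uint64_t old = *ptep;

	*ptep = 0;
	return old;
}

/* Model of ptep_modify_prot_commit(): install the final value. */
static void model_modify_prot_commit(uint64_t *ptep, uint64_t newval)
{
	*ptep = newval;
}
```

Clearing the uffd-wp bit would then be: start, clear the bit on the
returned old value, commit the result.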
When transitioning a block of ptes between cont and non-cont, we
transition the block through invalid with tlb invalidation. See
contpte_convert().

>
> The other thing is since x86 doesn't have cont-ptes yet, uffd-wp
> didn't consider that, and there may be things overlooked at least from
> my side. E.g., consider wr-protecting one cont-pte huge page on
> hugetlb:
>
>	static inline pte_t huge_pte_mkuffd_wp(pte_t pte)
>	{
>		return huge_pte_wrprotect(pte_mkuffd_wp(pte));
>	}
>
> I think it means so far it won't touch the rest of the cont-ptes, only
> the 1st. Not sure whether it'll work if a write happens on the rest.

I'm not completely sure I follow your point, but I think this should
work correctly. The arm64 huge_pte code knows what size (and level) the
huge pte is and spreads the passed-in pte across all the HW ptes.

>
> For mTHPs, they should still be done in change_pte_range() which
> doesn't understand mTHPs yet, so it should loop over all ptes and
> looks good so far, but I didn't further check other than that.

For mTHP, it will JustWork (TM). PTEs are exposed to core-mm with the
same semantics they had before; they all appear independent. The code
determines when it needs to apply or remove the PTE_CONT bit, and in
that case, the block is transitioned through invalid state + tlbi. See
contpte_try_fold() and contpte_try_unfold().

Hope that helps!

Thanks,
Ryan

>
> Thanks,
>
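[Editor's sketch: the checker's contiguous-mapping rule discussed in
this thread, as a user-space model. The `MODEL_*` names and
`model_cont_change_is_safe()` are hypothetical, not kernel symbols;
bit 52 mirrors PTE_CONT from the hwdef header quoted earlier.]

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PTE_VALID (UINT64_C(1) << 0)
#define MODEL_PTE_CONT  (UINT64_C(1) << 52)	/* contiguous-range hint */

/* A change is safe if either side is invalid (the contpte_convert()
 * path transitions the block through invalid + tlbi), but a live
 * mapping may never be touched while PTE_CONT is set on either value. */
static bool model_cont_change_is_safe(uint64_t old, uint64_t new)
{
	if (!(old & MODEL_PTE_VALID) || !(new & MODEL_PTE_VALID))
		return true;
	if ((old | new) & MODEL_PTE_CONT)
		return false;
	return true;
}
```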