From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 26 Feb 2025 17:41:27 +0000
From: Catalin Marinas
To: Ryan Roberts
Cc: Will Deacon, Huacai Chen, WANG Xuerui, Thomas Bogendoerfer,
	"James E.J. Bottomley", Helge Deller, Madhavan Srinivasan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy, Naveen N Rao,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Heiko Carstens,
	Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
	Sven Schnelle, Gerald Schaefer, "David S. Miller", Andreas Larsson,
	Arnd Bergmann, Muchun Song, Andrew Morton, Uladzislau Rezki,
	Christoph Hellwig, David Hildenbrand, "Matthew Wilcox (Oracle)",
	Mark Rutland, Anshuman Khandual, Dev Jain, Kevin Brodsky,
	Alexandre Ghiti, linux-arm-kernel@lists.infradead.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	stable@vger.kernel.org
Subject: Re: [PATCH v3 2/3] arm64: hugetlb: Fix huge_ptep_get_and_clear()
 for non-present ptes
Message-ID:
References: <20250226120656.2400136-1-ryan.roberts@arm.com>
 <20250226120656.2400136-3-ryan.roberts@arm.com>
In-Reply-To: <20250226120656.2400136-3-ryan.roberts@arm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii

On Wed, Feb 26, 2025 at 12:06:52PM +0000, Ryan Roberts wrote:
> arm64 supports multiple huge_pte sizes. Some of the sizes are covered by
> a single pte entry at a particular level (PMD_SIZE, PUD_SIZE), and some
> are covered by multiple ptes at a particular level (CONT_PTE_SIZE,
> CONT_PMD_SIZE). So the function has to figure out the size from the
> huge_pte pointer. This was previously done by walking the pgtable to
> determine the level and by using the PTE_CONT bit to determine the
> number of ptes at the level.
>
> But the PTE_CONT bit is only valid when the pte is present. For
> non-present pte values (e.g. markers, migration entries), the previous
> implementation was therefore erroneously determining the size. There is
> at least one known caller in core-mm, move_huge_pte(), which may call
> huge_ptep_get_and_clear() for a non-present pte. So we must be robust to
> this case. Additionally the "regular" ptep_get_and_clear() is robust to
> being called for non-present ptes so it makes sense to follow the
> behavior.
>
> Fix this by using the new sz parameter which is now provided to the
> function.
> Additionally when clearing each pte in a contig range, don't gather
> the access and dirty bits if the pte is not present.
>
> An alternative approach that would not require API changes would be to
> store the PTE_CONT bit in a spare bit in the swap entry pte for the
> non-present case. But it felt cleaner to follow other APIs' lead and
> just pass in the size.
>
> As an aside, PTE_CONT is bit 52, which corresponds to bit 40 in the swap
> entry offset field (layout of non-present pte). Since hugetlb is never
> swapped to disk, this field will only be populated for markers, which
> always set this bit to 0, and hwpoison swap entries, which set the
> offset field to a PFN. So it would only ever be 1 for a 52-bit PVA
> system where memory in that high half was poisoned (I think!). So in
> practice, this bit would almost always be zero for non-present ptes and
> we would only clear the first entry if it was actually a contiguous
> block. That's probably a less severe symptom than if it was always
> interpreted as 1 and cleared out potentially-present neighboring PTEs.
>
> Cc: stable@vger.kernel.org
> Fixes: 66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit")
> Reviewed-by: Catalin Marinas
> Signed-off-by: Ryan Roberts
>
> tmp
> ---

Random "tmp" here, otherwise the patch looks fine (can be removed when
applying).

-- 
Catalin