From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id D696E50A65; Wed, 1 May 2024 11:38:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.140.110.172 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714563520; cv=none; b=WXZQebJEUkAayoWRvfuLhqOn+4Sj5wMXikXu8x9onH1eheiGaG5CDoNp9SsQVIa5Ih3luJ70lQILzpM1If2LiSPgZIz282RErHf9gzRt7u2R8bWvH2ZyVSjbeyXmlW/ycFI0yViUU6Kwl+LaAHTb7WtH2o6H5MM9C2ZgJM/cPBg= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714563520; c=relaxed/simple; bh=fowAljGGPuLX/Fuzbpc6F7s7qHiuRtHS8TPcVV4cvcM=; h=Message-ID:Date:MIME-Version:Subject:From:To:Cc:References: In-Reply-To:Content-Type; b=JoQDd3wHz4ZxuF4nLL07aj0QfYN2frbBKFQbsgfJEbFGI6GdJ8fggB1zSWTcdMCwTXvIv6Bm3hpQM5QMUIPJjbqR+SbUCaC9klEFAmRHrDPk8B6+5cPiK7/1bnjNSivixFQDlZtbTnkpzBm5x0zGHZ4RDUzPxTcRiFNiqzwazGQ= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com; spf=pass smtp.mailfrom=arm.com; arc=none smtp.client-ip=217.140.110.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 73416339; Wed, 1 May 2024 04:39:04 -0700 (PDT) Received: from [10.57.65.146] (unknown [10.57.65.146]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id ACD493F793; Wed, 1 May 2024 04:38:36 -0700 (PDT) Message-ID: <0dd7827a-6334-439a-8fd0-43c98e6af22b@arm.com> Date: Wed, 1 May 2024 12:38:35 +0100 Precedence: bulk X-Mailing-List: stable@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds Content-Language: en-GB From: Ryan Roberts To: Catalin Marinas , Will Deacon , Mark Rutland , Anshuman Khandual , Andrew Morton , Zi Yan , "Aneesh Kumar K.V" , David Hildenbrand Cc: linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org References: <20240430133138.732088-1-ryan.roberts@arm.com> <9fb15315-6317-4bf4-a736-a8a44288b0c2@arm.com> In-Reply-To: <9fb15315-6317-4bf4-a736-a8a44288b0c2@arm.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Pulling in David, who may be able to advise... On 01/05/2024 12:35, Ryan Roberts wrote: > Zi Yan, I'm hoping you might have some input on the below... > > > On 30/04/2024 14:31, Ryan Roberts wrote: >> __split_huge_pmd_locked() can be called for a present THP, devmap or >> (non-present) migration entry. It calls pmdp_invalidate() >> unconditionally on the pmdp and only determines if it is present or not >> based on the returned old pmd. >> >> But arm64's pmd_mkinvalid(), called by pmdp_invalidate(), >> unconditionally sets the PMD_PRESENT_INVALID flag, which causes future >> pmd_present() calls to return true - even for a swap pmd. Therefore any >> lockless pgtable walker could see the migration entry pmd in this state >> and start interpretting the fields (e.g. pmd_pfn()) as if it were >> present, leading to BadThings (TM). GUP-fast appears to be one such >> lockless pgtable walker. >> >> While the obvious fix is for core-mm to avoid such calls for non-present >> pmds (pmdp_invalidate() will also issue TLBI which is not necessary for >> this case either), all other arches that implement pmd_mkinvalid() do it >> in such a way that it is robust to being called with a non-present pmd. > > OK the plot thickens; The tests I wrote to check that pmd_mkinvalid() is safe for swap entries fails on x86_64. See below... > >> So it is simpler and safer to make arm64 robust too. This approach means >> we can even add tests to debug_vm_pgtable.c to validate the required >> behaviour. >> >> This is a theoretical bug found during code review. I don't have any >> test case to trigger it in practice. >> >> Cc: stable@vger.kernel.org >> Fixes: 53fa117bb33c ("arm64/mm: Enable THP migration") >> Signed-off-by: Ryan Roberts >> --- >> >> Hi all, >> >> v1 of this fix [1] took the approach of fixing core-mm to never call >> pmdp_invalidate() on a non-present pmd. But Zi Yan highlighted that only arm64 >> suffers this problem; all other arches are robust. So his suggestion was to >> instead make arm64 robust in the same way and add tests to validate it. Despite >> my stated reservations in the context of the v1 discussion, having thought on it >> for a bit, I now agree with Zi Yan. Hence this post. >> >> Andrew has v1 in mm-unstable at the moment, so probably the best thing to do is >> remove it from there and have this go in through the arm64 tree? Assuming there >> is agreement that this approach is right one. >> >> This applies on top of v6.9-rc5. Passes all the mm selftests on arm64. >> >> [1] https://lore.kernel.org/linux-mm/20240425170704.3379492-1-ryan.roberts@arm.com/ >> >> Thanks, >> Ryan >> >> >> arch/arm64/include/asm/pgtable.h | 12 +++++-- >> mm/debug_vm_pgtable.c | 61 ++++++++++++++++++++++++++++++++ >> 2 files changed, 71 insertions(+), 2 deletions(-) >> >> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h >> index afdd56d26ad7..7d580271a46d 100644 >> --- a/arch/arm64/include/asm/pgtable.h >> +++ b/arch/arm64/include/asm/pgtable.h >> @@ -511,8 +511,16 @@ static inline int pmd_trans_huge(pmd_t pmd) >> >> static inline pmd_t pmd_mkinvalid(pmd_t pmd) >> { >> - pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID)); >> - pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID)); >> + /* >> + * If not valid then either we are already present-invalid or we are >> + * not-present (i.e. none or swap entry). We must not convert >> + * not-present to present-invalid. Unbelievably, the core-mm may call >> + * pmd_mkinvalid() for a swap entry and all other arches can handle it. >> + */ >> + if (pmd_valid(pmd)) { >> + pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID)); >> + pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID)); >> + } >> >> return pmd; >> } >> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c >> index 65c19025da3d..7e9c387d06b0 100644 >> --- a/mm/debug_vm_pgtable.c >> +++ b/mm/debug_vm_pgtable.c >> @@ -956,6 +956,65 @@ static void __init hugetlb_basic_tests(struct pgtable_debug_args *args) { } >> #endif /* CONFIG_HUGETLB_PAGE */ >> >> #ifdef CONFIG_TRANSPARENT_HUGEPAGE >> +#if !defined(__HAVE_ARCH_PMDP_INVALIDATE) && defined(CONFIG_ARCH_ENABLE_THP_MIGRATION) >> +static void __init swp_pmd_mkinvalid_tests(struct pgtable_debug_args *args) >> +{ > > Printing various values at different locations in this function for debug: > >> + unsigned long max_swap_offset; >> + swp_entry_t swp_set, swp_clear, swp_convert; >> + pmd_t pmd_set, pmd_clear; >> + >> + /* >> + * See generic_max_swapfile_size(): probe the maximum offset, then >> + * create swap entry will all possible bits set and a swap entry will >> + * all bits clear. >> + */ >> + max_swap_offset = swp_offset(pmd_to_swp_entry(swp_entry_to_pmd(swp_entry(0, ~0UL)))); >> + swp_set = swp_entry((1 << MAX_SWAPFILES_SHIFT) - 1, max_swap_offset); >> + swp_clear = swp_entry(0, 0); >> + >> + /* Convert to pmd. */ >> + pmd_set = swp_entry_to_pmd(swp_set); >> + pmd_clear = swp_entry_to_pmd(swp_clear); > > [ 0.702163] debug_vm_pgtable: [swp_pmd_mkinvalid_tests ]: valid: pmd_set=f800000000000000, pmd_clear=7fffffffffffe00 > >> + >> + /* >> + * Sanity check that the pmds are not-present, not-huge and swap entry >> + * is recoverable without corruption. >> + */ >> + WARN_ON(pmd_present(pmd_set)); >> + WARN_ON(pmd_trans_huge(pmd_set)); >> + swp_convert = pmd_to_swp_entry(pmd_set); >> + WARN_ON(swp_type(swp_set) != swp_type(swp_convert)); >> + WARN_ON(swp_offset(swp_set) != swp_offset(swp_convert)); >> + WARN_ON(pmd_present(pmd_clear)); >> + WARN_ON(pmd_trans_huge(pmd_clear)); >> + swp_convert = pmd_to_swp_entry(pmd_clear); >> + WARN_ON(swp_type(swp_clear) != swp_type(swp_convert)); >> + WARN_ON(swp_offset(swp_clear) != swp_offset(swp_convert)); >> + >> + /* Now invalidate the pmd. */ >> + pmd_set = pmd_mkinvalid(pmd_set); >> + pmd_clear = pmd_mkinvalid(pmd_clear); > > [ 0.704452] debug_vm_pgtable: [swp_pmd_mkinvalid_tests ]: invalid: pmd_set=f800000000000000, pmd_clear=7ffffffffe00e00 > >> + >> + /* >> + * Since its a swap pmd, invalidation should effectively be a noop and >> + * the checks we already did should give the same answer. Check the >> + * invalidation didn't corrupt any fields. >> + */ >> + WARN_ON(pmd_present(pmd_set)); >> + WARN_ON(pmd_trans_huge(pmd_set)); >> + swp_convert = pmd_to_swp_entry(pmd_set); > > [ 0.706461] debug_vm_pgtable: [swp_pmd_mkinvalid_tests ]: set: swp=7c03ffffffffffff (1f, 3ffffffffffff), convert=7c03ffffffffffff (1f, 3ffffffffffff) > >> + WARN_ON(swp_type(swp_set) != swp_type(swp_convert)); >> + WARN_ON(swp_offset(swp_set) != swp_offset(swp_convert)); >> + WARN_ON(pmd_present(pmd_clear)); >> + WARN_ON(pmd_trans_huge(pmd_clear)); >> + swp_convert = pmd_to_swp_entry(pmd_clear); > > [ 0.708841] debug_vm_pgtable: [swp_pmd_mkinvalid_tests ]: clear: swp=0 (0, 0), convert=ff8 (0, ff8) > >> + WARN_ON(swp_type(swp_clear) != swp_type(swp_convert)); >> + WARN_ON(swp_offset(swp_clear) != swp_offset(swp_convert)); > > This line fails on x86_64. > > The logs show that the offset is indeed being corrupted by pmd_mkinvalid(); 0 -> 0xff8. > > I think this is due to x86's pmd_mkinvalid() assuming the pmd is present; pmd_flags() and pmd_pfn() do all sorts of weird and wonderful things. > > So does this take us full circle? Are we now back to modifying the core-mm to never call pmd_mkinvalid() on a non-present entry? If so, then I guess we should remove the arm64 fix from for-next/fixes. > >> +} >> +#else >> +static void __init swp_pmd_mkinvalid_tests(struct pgtable_debug_args *args) { } >> +#endif /* !__HAVE_ARCH_PMDP_INVALIDATE && CONFIG_ARCH_ENABLE_THP_MIGRATION */ >> + >> static void __init pmd_thp_tests(struct pgtable_debug_args *args) >> { >> pmd_t pmd; >> @@ -982,6 +1041,8 @@ static void __init pmd_thp_tests(struct pgtable_debug_args *args) >> WARN_ON(!pmd_trans_huge(pmd_mkinvalid(pmd_mkhuge(pmd)))); >> WARN_ON(!pmd_present(pmd_mkinvalid(pmd_mkhuge(pmd)))); >> #endif /* __HAVE_ARCH_PMDP_INVALIDATE */ >> + >> + swp_pmd_mkinvalid_tests(args); >> } >> >> #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD >> -- >> 2.25.1 >> >