From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 0BFDEC7EE25
	for ; Fri, 9 Jun 2023 20:12:39 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S231439AbjFIUMi (ORCPT ); Fri, 9 Jun 2023 16:12:38 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57194 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S231319AbjFIUMO (ORCPT );
	Fri, 9 Jun 2023 16:12:14 -0400
Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9768E2D5F
	for ; Fri, 9 Jun 2023 13:11:57 -0700 (PDT)
Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by dfw.source.kernel.org (Postfix) with ESMTPS id 2E61365BB1
	for ; Fri, 9 Jun 2023 20:11:57 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 78A5AC433EF;
	Fri, 9 Jun 2023 20:11:56 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org;
	s=korg; t=1686341516;
	bh=f7ExTZLRw0w3VEi5CNhZpKDd1TeoIDmAoiNxrnrROgo=;
	h=Date:To:From:Subject:From;
	b=ahdiRkWOI3qjA7lJGOqxtY0CUUeOqYVdEpxnGQHiTmR9MKIOu4e7YAuPmb/NBbn1U
	 S7Z3Xs43MARaxHS6itkL+K8URHzBKA4s3rATWqJhLhyH9jfpFUlPzGNk6I3YLVaT7k
	 tKrAbQ5jjiRTLJ87JLYs3F9EBWwSXr2WGE1E8z40=
Date: Fri, 09 Jun 2023 13:11:55 -0700
To: mm-commits@vger.kernel.org, zhengqi.arch@bytedance.com,
	zackr@vmware.com, yuzhao@google.com, ying.huang@intel.com,
	willy@infradead.org, will@kernel.org,
	thomas.hellstrom@linux.intel.com, surenb@google.com,
	steven.price@arm.com, song@kernel.org, sj@kernel.org,
	shy828301@gmail.com, ryan.roberts@arm.com, rppt@kernel.org,
	rcampbell@nvidia.com, peterz@infradead.org, peterx@redhat.com,
	pasha.tatashin@soleen.com, naoya.horiguchi@nec.com,
	minchan@kernel.org, mike.kravetz@oracle.com,
	mgorman@techsingularity.net, lstoakes@gmail.com,
	linmiaohe@huawei.com, kirill.shutemov@linux.intel.com, jgg@ziepe.ca,
	ira.weiny@intel.com, hch@infradead.org, david@redhat.com,
	christophe.leroy@csgroup.eu, axelrasmussen@google.com,
	apopple@nvidia.com, anshuman.khandual@arm.com, hughd@google.com,
	akpm@linux-foundation.org
From: Andrew Morton
Subject: + mm-memory-handle_pte_fault-use-pte_offset_map_nolock.patch added to mm-unstable branch
Message-Id: <20230609201156.78A5AC433EF@smtp.kernel.org>
Precedence: bulk
Reply-To: linux-kernel@vger.kernel.org
List-ID:
X-Mailing-List: mm-commits@vger.kernel.org


The patch titled
     Subject: mm/memory: handle_pte_fault() use pte_offset_map_nolock()
has been added to the -mm mm-unstable branch. Its filename is
     mm-memory-handle_pte_fault-use-pte_offset_map_nolock.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-memory-handle_pte_fault-use-pte_offset_map_nolock.patch

This patch will later appear in the mm-unstable branch at
    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days

------------------------------------------------------
From: Hugh Dickins
Subject: mm/memory: handle_pte_fault() use pte_offset_map_nolock()
Date: Thu, 8 Jun 2023 18:45:05 -0700 (PDT)

handle_pte_fault() use pte_offset_map_nolock() to get the vmf.ptl which
corresponds to vmf.pte, instead of pte_lockptr() being used later, when
there's a chance that the pmd entry might have changed, perhaps to none,
or to a huge pmd, with no split ptlock in its struct page.

Remove its pmd_devmap_trans_unstable() call: pte_offset_map_nolock() will
handle that case by failing. Update the "morph" comment above, looking
forward to when shmem or file collapse to THP may not take mmap_lock for
write (or not at all).

do_numa_page() use the vmf->ptl from handle_pte_fault() at first, but
refresh it when refreshing vmf->pte.

do_swap_page()'s pte_unmap_same() (the thing that takes ptl to verify a
two-part PAE orig_pte) use the vmf->ptl from handle_pte_fault() too; but
do_swap_page() is also used by anon THP's __collapse_huge_page_swapin(),
so adjust that to set vmf->ptl by pte_offset_map_nolock().

Link: https://lkml.kernel.org/r/c1107654-3929-60ac-223e-6877cbb86065@google.com
Signed-off-by: Hugh Dickins
Cc: Alistair Popple
Cc: Anshuman Khandual
Cc: Axel Rasmussen
Cc: Christophe Leroy
Cc: Christoph Hellwig
Cc: David Hildenbrand
Cc: "Huang, Ying"
Cc: Ira Weiny
Cc: Jason Gunthorpe
Cc: Kirill A. Shutemov
Cc: Lorenzo Stoakes
Cc: Matthew Wilcox
Cc: Mel Gorman
Cc: Miaohe Lin
Cc: Mike Kravetz
Cc: Mike Rapoport (IBM)
Cc: Minchan Kim
Cc: Naoya Horiguchi
Cc: Pavel Tatashin
Cc: Peter Xu
Cc: Peter Zijlstra
Cc: Qi Zheng
Cc: Ralph Campbell
Cc: Ryan Roberts
Cc: SeongJae Park
Cc: Song Liu
Cc: Steven Price
Cc: Suren Baghdasaryan
Cc: Thomas Hellström
Cc: Will Deacon
Cc: Yang Shi
Cc: Yu Zhao
Cc: Zack Rusin
Signed-off-by: Andrew Morton
---

 mm/khugepaged.c |    6 ++++--
 mm/memory.c     |   38 +++++++++++++-------------------------
 2 files changed, 17 insertions(+), 27 deletions(-)

--- a/mm/khugepaged.c~mm-memory-handle_pte_fault-use-pte_offset_map_nolock
+++ a/mm/khugepaged.c
@@ -1003,6 +1003,7 @@ static int __collapse_huge_page_swapin(s
 	unsigned long address, end = haddr + (HPAGE_PMD_NR * PAGE_SIZE);
 	int result;
 	pte_t *pte = NULL;
+	spinlock_t *ptl;
 
 	for (address = haddr; address < end; address += PAGE_SIZE) {
 		struct vm_fault vmf = {
@@ -1014,7 +1015,7 @@ static int __collapse_huge_page_swapin(s
 		};
 
 		if (!pte++) {
-			pte = pte_offset_map(pmd, address);
+			pte = pte_offset_map_nolock(mm, pmd, address, &ptl);
 			if (!pte) {
 				mmap_read_unlock(mm);
 				result = SCAN_PMD_NULL;
@@ -1022,11 +1023,12 @@ static int __collapse_huge_page_swapin(s
 			}
 		}
 
-		vmf.orig_pte = *pte;
+		vmf.orig_pte = ptep_get_lockless(pte);
 		if (!is_swap_pte(vmf.orig_pte))
 			continue;
 
 		vmf.pte = pte;
+		vmf.ptl = ptl;
 		ret = do_swap_page(&vmf);
 		/* Which unmaps pte (after perhaps re-checking the entry) */
 		pte = NULL;
--- a/mm/memory.c~mm-memory-handle_pte_fault-use-pte_offset_map_nolock
+++ a/mm/memory.c
@@ -2787,10 +2787,9 @@ static inline int pte_unmap_same(struct
 	int same = 1;
 #if defined(CONFIG_SMP) || defined(CONFIG_PREEMPTION)
 	if (sizeof(pte_t) > sizeof(unsigned long)) {
-		spinlock_t *ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
-		spin_lock(ptl);
+		spin_lock(vmf->ptl);
 		same = pte_same(*vmf->pte, vmf->orig_pte);
-		spin_unlock(ptl);
+		spin_unlock(vmf->ptl);
 	}
 #endif
 	pte_unmap(vmf->pte);
@@ -4698,7 +4697,6 @@ static vm_fault_t do_numa_page(struct vm
 	 * validation through pte_unmap_same(). It's of NUMA type but
 	 * the pfn may be screwed if the read is non atomic.
 	 */
-	vmf->ptl = pte_lockptr(vma->vm_mm, vmf->pmd);
 	spin_lock(vmf->ptl);
 	if (unlikely(!pte_same(*vmf->pte, vmf->orig_pte))) {
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
@@ -4769,8 +4767,10 @@ static vm_fault_t do_numa_page(struct vm
 		flags |= TNF_MIGRATED;
 	} else {
 		flags |= TNF_MIGRATE_FAIL;
-		vmf->pte = pte_offset_map(vmf->pmd, vmf->address);
-		spin_lock(vmf->ptl);
+		vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
+					       vmf->address, &vmf->ptl);
+		if (unlikely(!vmf->pte))
+			goto out;
 		if (unlikely(!pte_same(*vmf->pte, vmf->orig_pte))) {
 			pte_unmap_unlock(vmf->pte, vmf->ptl);
 			goto out;
@@ -4900,26 +4900,15 @@ static vm_fault_t handle_pte_fault(struc
 		vmf->flags &= ~FAULT_FLAG_ORIG_PTE_VALID;
 	} else {
 		/*
-		 * If a huge pmd materialized under us just retry later. Use
-		 * pmd_trans_unstable() via pmd_devmap_trans_unstable() instead
-		 * of pmd_trans_huge() to ensure the pmd didn't become
-		 * pmd_trans_huge under us and then back to pmd_none, as a
-		 * result of MADV_DONTNEED running immediately after a huge pmd
-		 * fault in a different thread of this mm, in turn leading to a
-		 * misleading pmd_trans_huge() retval. All we have to ensure is
-		 * that it is a regular pmd that we can walk with
-		 * pte_offset_map() and we can do that through an atomic read
-		 * in C, which is what pmd_trans_unstable() provides.
-		 */
-		if (pmd_devmap_trans_unstable(vmf->pmd))
-			return 0;
-		/*
 		 * A regular pmd is established and it can't morph into a huge
-		 * pmd from under us anymore at this point because we hold the
-		 * mmap_lock read mode and khugepaged takes it in write mode.
-		 * So now it's safe to run pte_offset_map().
+		 * pmd by anon khugepaged, since that takes mmap_lock in write
+		 * mode; but shmem or file collapse to THP could still morph
+		 * it into a huge pmd: just retry later if so.
 		 */
-		vmf->pte = pte_offset_map(vmf->pmd, vmf->address);
+		vmf->pte = pte_offset_map_nolock(vmf->vma->vm_mm, vmf->pmd,
+						 vmf->address, &vmf->ptl);
+		if (unlikely(!vmf->pte))
+			return 0;
 		vmf->orig_pte = ptep_get_lockless(vmf->pte);
 		vmf->flags |= FAULT_FLAG_ORIG_PTE_VALID;
 
@@ -4938,7 +4927,6 @@ static vm_fault_t handle_pte_fault(struc
 	if (pte_protnone(vmf->orig_pte) && vma_is_accessible(vmf->vma))
 		return do_numa_page(vmf);
 
-	vmf->ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
 	spin_lock(vmf->ptl);
 	entry = vmf->orig_pte;
 	if (unlikely(!pte_same(*vmf->pte, entry))) {
_

Patches currently in -mm which might be from hughd@google.com are

arm-allow-pte_offset_map-to-fail.patch
arm64-allow-pte_offset_map-to-fail.patch
arm64-hugetlb-pte_alloc_huge-pte_offset_huge.patch
ia64-hugetlb-pte_alloc_huge-pte_offset_huge.patch
m68k-allow-pte_offset_map-to-fail.patch
microblaze-allow-pte_offset_map-to-fail.patch
mips-update_mmu_cache-can-replace-__update_tlb.patch
mips-update_mmu_cache-can-replace-__update_tlb-fix.patch
parisc-add-pte_unmap-to-balance-get_ptep.patch
parisc-unmap_uncached_pte-use-pte_offset_kernel.patch
parisc-hugetlb-pte_alloc_huge-pte_offset_huge.patch
powerpc-kvmppc_unmap_free_pmd-pte_offset_kernel.patch
powerpc-allow-pte_offset_map-to-fail.patch
powerpc-hugetlb-pte_alloc_huge.patch
riscv-hugetlb-pte_alloc_huge-pte_offset_huge.patch
s390-allow-pte_offset_map_lock-to-fail.patch
s390-gmap-use-pte_unmap_unlock-not-spin_unlock.patch
sh-hugetlb-pte_alloc_huge-pte_offset_huge.patch
sparc-hugetlb-pte_alloc_huge-pte_offset_huge.patch
sparc-allow-pte_offset_map-to-fail.patch
sparc-iounit-and-iommu-use-pte_offset_kernel.patch
x86-allow-get_locked_pte-to-fail.patch
x86-sme_populate_pgd-use-pte_offset_kernel.patch
xtensa-add-pte_unmap-to-balance-pte_offset_map.patch
mm-use-pmdp_get_lockless-without-surplus-barrier.patch
mm-migrate-remove-cruft-from-migration_entry_waits.patch
mm-pgtable-kmap_local_page-instead-of-kmap_atomic.patch
mm-pgtable-allow-pte_offset_map-to-fail.patch
mm-filemap-allow-pte_offset_map_lock-to-fail.patch
mm-page_vma_mapped-delete-bogosity-in-page_vma_mapped_walk.patch
mm-page_vma_mapped-reformat-map_pte-with-less-indentation.patch
mm-page_vma_mapped-pte_offset_map_nolock-not-pte_lockptr.patch
mm-pagewalkers-action_again-if-pte_offset_map_lock-fails.patch
mm-pagewalk-walk_pte_range-allow-for-pte_offset_map.patch
mm-vmwgfx-simplify-pmd-pud-mapping-dirty-helpers.patch
mm-vmalloc-vmalloc_to_page-use-pte_offset_kernel.patch
mm-hmm-retry-if-pte_offset_map-fails.patch
mm-userfaultfd-retry-if-pte_offset_map-fails.patch
mm-userfaultfd-allow-pte_offset_map_lock-to-fail.patch
mm-debug_vm_pgtablepage_table_check-warn-pte-map-fails.patch
mm-various-give-up-if-pte_offset_map-fails.patch
mm-mprotect-delete-pmd_none_or_clear_bad_unless_trans_huge.patch
mm-mremap-retry-if-either-pte_offset_map_lock-fails.patch
mm-madvise-clean-up-pte_offset_map_lock-scans.patch
mm-madvise-clean-up-force_shm_swapin_readahead.patch
mm-swapoff-allow-pte_offset_map-to-fail.patch
mm-mglru-allow-pte_offset_map_nolock-to-fail.patch
mm-migrate_device-allow-pte_offset_map_lock-to-fail.patch
mm-gup-remove-foll_split_pmd-use-of-pmd_trans_unstable.patch
mm-huge_memory-split-huge-pmd-under-one-pte_offset_map.patch
mm-khugepaged-allow-pte_offset_map-to-fail.patch
mm-memory-allow-pte_offset_map-to-fail.patch
mm-memory-handle_pte_fault-use-pte_offset_map_nolock.patch
mm-pgtable-delete-pmd_trans_unstable-and-friends.patch
mm-swap-swap_vma_readahead-do-the-pte_offset_map.patch
perf-core-allow-pte_offset_map-to-fail.patch
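
Illustrative note (not part of the patch): the changelog's central idea is that the pte pointer and its lock should come from one lookup of the pmd, with the snapshot re-checked under that lock, rather than the lock being derived from the pmd later when it may have changed. Below is a rough userspace sketch of that pattern in plain C. Every toy_* name is hypothetical and invented for this illustration; none of them is a kernel API, the "lock" is a single-threaded flag rather than a spinlock, and the race hook only imitates a concurrent update.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; no toy_* name exists in the kernel. */
struct toy_lock { int held; };

static void toy_lock_acquire(struct toy_lock *l) { assert(!l->held); l->held = 1; }
static void toy_lock_release(struct toy_lock *l) { assert(l->held); l->held = 0; }

struct toy_table {
	struct toy_lock lock;		/* stands in for the split ptlock */
	unsigned long entries[4];	/* stand in for ptes */
};

struct toy_dir {
	struct toy_table *table;	/* NULL models a none or huge pmd */
};

enum toy_result { TOY_RETRY, TOY_HANDLED };

/*
 * Loosely models the role of pte_offset_map_nolock(): the entry pointer
 * and the lock pointer are both derived from one read of the directory,
 * so they always belong to the same table. Fails with NULL if there is
 * no table to walk.
 */
static unsigned long *toy_map_nolock(struct toy_dir *dir, size_t idx,
				     struct toy_lock **lockp)
{
	struct toy_table *t = dir->table;	/* single read */

	if (!t)
		return NULL;
	*lockp = &t->lock;
	return &t->entries[idx];
}

/* Test hook imitating a concurrent change to the entry. */
static void toy_race_change_entry(struct toy_dir *dir, size_t idx)
{
	dir->table->entries[idx] += 2;
}

/*
 * Loosely models the fault path: snapshot the entry without the lock,
 * then take the lock found at lookup time and re-check the snapshot
 * (the role pte_same() plays in the patch) before modifying anything.
 */
static enum toy_result toy_fault(struct toy_dir *dir, size_t idx,
				 void (*race)(struct toy_dir *, size_t))
{
	struct toy_lock *lock;
	unsigned long *entry = toy_map_nolock(dir, idx, &lock);
	unsigned long snapshot;

	if (!entry)
		return TOY_RETRY;	/* nothing to walk: retry later */

	snapshot = *entry;		/* lockless read */
	if (race)
		race(dir, idx);		/* simulated concurrent update */

	toy_lock_acquire(lock);
	if (*entry != snapshot) {	/* entry changed under us */
		toy_lock_release(lock);
		return TOY_RETRY;
	}
	*entry = snapshot | 1UL;	/* "handle" the fault */
	toy_lock_release(lock);
	return TOY_HANDLED;
}
```

Calling toy_fault() with a NULL race hook on a populated toy_dir takes the handled path; passing toy_race_change_entry, or a toy_dir whose table is NULL, takes the retry path. The model deliberately leaves out what keeps the page table memory valid between lookup and lock in the real kernel.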