From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
        aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
        by smtp.lore.kernel.org (Postfix) with ESMTP id EFD65C001B2
        for ; Fri, 16 Dec 2022 17:18:20 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S231738AbiLPRSU (ORCPT );
        Fri, 16 Dec 2022 12:18:20 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37868 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S231739AbiLPRSM (ORCPT );
        Fri, 16 Dec 2022 12:18:12 -0500
Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1EB3B70B8A
        for ; Fri, 16 Dec 2022 09:18:05 -0800 (PST)
Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140])
        (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
        (No client certificate requested)
        by dfw.source.kernel.org (Postfix) with ESMTPS id ADC2D62116
        for ; Fri, 16 Dec 2022 17:18:04 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0FE22C433D2;
        Fri, 16 Dec 2022 17:18:04 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org;
        s=korg; t=1671211084;
        bh=2WoLELey9jbgMTvKRzJ4x05vvSAKqGlehzCueBWIHTw=;
        h=Date:To:From:Subject:From;
        b=bbgjSl8A6Fn4WCl/sYi290mCFl9wgxumYgDsmrw93kfZ8oJkKqS/4KZkpRMQIA1UZ
         6VLINYTZCy+MykwVJYgV4Tp9TgCOdv4ba36jNnpVjpfF0NoCezIgnma55Mry7B8HN1
         o6YhxNORf2EPXaGNqiZJv5GekZjm0pkEyrHWxYLM=
Date: Fri, 16 Dec 2022 09:18:03 -0800
To: mm-commits@vger.kernel.org, songmuchun@bytedance.com, riel@surriel.com,
        nadav.amit@gmail.com, mike.kravetz@oracle.com, linmiaohe@huawei.com,
        jthoughton@google.com, jhubbard@nvidia.com, jannh@google.com,
        david@redhat.com, aarcange@redhat.com, peterx@redhat.com,
        akpm@linux-foundation.org
From: Andrew Morton
Subject: + mm-hugetlb-make-walk_hugetlb_range-safe-to-pmd-unshare.patch added to mm-unstable branch
Message-Id: <20221216171804.0FE22C433D2@smtp.kernel.org>
Precedence: bulk
Reply-To: linux-kernel@vger.kernel.org
List-ID:
X-Mailing-List: mm-commits@vger.kernel.org


The patch titled
     Subject: mm/hugetlb: make walk_hugetlb_range() safe to pmd unshare
has been added to the -mm mm-unstable branch.  Its filename is
     mm-hugetlb-make-walk_hugetlb_range-safe-to-pmd-unshare.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-hugetlb-make-walk_hugetlb_range-safe-to-pmd-unshare.patch

This patch will later appear in the mm-unstable branch at
    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days

------------------------------------------------------
From: Peter Xu
Subject: mm/hugetlb: make walk_hugetlb_range() safe to pmd unshare
Date: Fri, 16 Dec 2022 10:52:26 -0500

Since walk_hugetlb_range() walks the pgtable, it needs the vma lock to make
sure the pgtable page will not be freed concurrently.

Link: https://lkml.kernel.org/r/20221216155226.2043738-1-peterx@redhat.com
Signed-off-by: Peter Xu
Reviewed-by: Mike Kravetz
Reviewed-by: John Hubbard
Cc: Andrea Arcangeli
Cc: David Hildenbrand
Cc: James Houghton
Cc: Jann Horn
Cc: Miaohe Lin
Cc: Muchun Song
Cc: Nadav Amit
Cc: Rik van Riel
Signed-off-by: Andrew Morton
---

 include/linux/pagewalk.h |   11 ++++++++++-
 mm/hmm.c                 |   15 ++++++++++++++-
 mm/pagewalk.c            |    2 ++
 3 files changed, 26 insertions(+), 2 deletions(-)

--- a/include/linux/pagewalk.h~mm-hugetlb-make-walk_hugetlb_range-safe-to-pmd-unshare
+++ a/include/linux/pagewalk.h
@@ -21,7 +21,16 @@ struct mm_walk;
  *			depth is -1 if not known, 0:PGD, 1:P4D, 2:PUD, 3:PMD.
  *			Any folded depths (where PTRS_PER_P?D is equal to 1)
  *			are skipped.
- * @hugetlb_entry:	if set, called for each hugetlb entry
+ * @hugetlb_entry:	if set, called for each hugetlb entry. This hook
+ *			function is called with the vma lock held, in order to
+ *			protect against a concurrent freeing of the pte_t* or
+ *			the ptl. In some cases, the hook function needs to drop
+ *			and retake the vma lock in order to avoid deadlocks
+ *			while calling other functions. In such cases the hook
+ *			function must either refrain from accessing the pte or
+ *			ptl after dropping the vma lock, or else revalidate
+ *			those items after re-acquiring the vma lock and before
+ *			accessing them.
  * @test_walk:		caller specific callback function to determine whether
  *			we walk over the current vma or not. Returning 0 means
  *			"do page table walk over the current vma", returning
--- a/mm/hmm.c~mm-hugetlb-make-walk_hugetlb_range-safe-to-pmd-unshare
+++ a/mm/hmm.c
@@ -493,8 +493,21 @@ static int hmm_vma_walk_hugetlb_entry(pt
 	required_fault =
 		hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, cpu_flags);
 	if (required_fault) {
+		int ret;
+
 		spin_unlock(ptl);
-		return hmm_vma_fault(addr, end, required_fault, walk);
+		hugetlb_vma_unlock_read(vma);
+		/*
+		 * Avoid deadlock: drop the vma lock before calling
+		 * hmm_vma_fault(), which will itself potentially take and
+		 * drop the vma lock. This is also correct from a
+		 * protection point of view, because there is no further
+		 * use here of either pte or ptl after dropping the vma
+		 * lock.
+		 */
+		ret = hmm_vma_fault(addr, end, required_fault, walk);
+		hugetlb_vma_lock_read(vma);
+		return ret;
 	}
 
 	pfn = pte_pfn(entry) + ((start & ~hmask) >> PAGE_SHIFT);
--- a/mm/pagewalk.c~mm-hugetlb-make-walk_hugetlb_range-safe-to-pmd-unshare
+++ a/mm/pagewalk.c
@@ -302,6 +302,7 @@ static int walk_hugetlb_range(unsigned l
 	const struct mm_walk_ops *ops = walk->ops;
 	int err = 0;
 
+	hugetlb_vma_lock_read(vma);
 	do {
 		next = hugetlb_entry_end(h, addr, end);
 		pte = huge_pte_offset(walk->mm, addr & hmask, sz);
@@ -314,6 +315,7 @@ static int walk_hugetlb_range(unsigned l
 		if (err)
 			break;
 	} while (addr = next, addr != end);
+	hugetlb_vma_unlock_read(vma);
 
 	return err;
 }
_

Patches currently in -mm which might be from peterx@redhat.com are

mm-uffd-fix-pte-marker-when-fork-without-fork-event.patch
mm-fix-a-few-rare-cases-of-using-swapin-error-pte-marker.patch
mm-uffd-always-wr-protect-pte-in-ptepmd_mkuffd_wp.patch
mm-hugetlb-let-vma_offset_start-to-return-start.patch
mm-hugetlb-dont-wait-for-migration-entry-during-follow-page.patch
mm-hugetlb-document-huge_pte_offset-usage.patch
mm-hugetlb-move-swap-entry-handling-into-vma-lock-when-faulted.patch
mm-hugetlb-make-userfaultfd_huge_must_wait-safe-to-pmd-unshare.patch
mm-hugetlb-make-hugetlb_follow_page_mask-safe-to-pmd-unshare.patch
mm-hugetlb-make-follow_hugetlb_page-safe-to-pmd-unshare.patch
mm-hugetlb-make-walk_hugetlb_range-safe-to-pmd-unshare.patch
mm-hugetlb-introduce-hugetlb_walk.patch