From: Mike Kravetz <mike.kravetz@oracle.com>
To: Peter Xu <peterx@redhat.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
James Houghton <jthoughton@google.com>,
Jann Horn <jannh@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Andrea Arcangeli <aarcange@redhat.com>,
Rik van Riel <riel@surriel.com>,
Nadav Amit <nadav.amit@gmail.com>,
Miaohe Lin <linmiaohe@huawei.com>,
Muchun Song <songmuchun@bytedance.com>,
David Hildenbrand <david@redhat.com>
Subject: Re: [PATCH 04/10] mm/hugetlb: Move swap entry handling into vma lock when faulted
Date: Mon, 5 Dec 2022 14:14:38 -0800 [thread overview]
Message-ID: <Y45tTl3tleVP8fA6@monkey> (raw)
In-Reply-To: <20221129193526.3588187-5-peterx@redhat.com>
On 11/29/22 14:35, Peter Xu wrote:
> In hugetlb_fault(), there used to have a special path to handle swap entry
> at the entrance using huge_pte_offset(). That's unsafe because
> huge_pte_offset() for a pmd sharable range can access freed pgtables if
> without any lock to protect the pgtable from being freed after pmd unshare.
Thanks. That special path has been there for a long time. I should have given
it more thought when adding lock. However, I was only considering the 'stale'
pte case as opposed to freed.
> Here the simplest solution to make it safe is to move the swap handling to
> be after the vma lock being held. We may need to take the fault mutex on
> either migration or hwpoison entries now (also the vma lock, but that's
> really needed), however neither of them is hot path.
>
> Note that the vma lock cannot be released in hugetlb_fault() when the
> migration entry is detected, because in migration_entry_wait_huge() the
> pgtable page will be used again (by taking the pgtable lock), so that also
> need to be protected by the vma lock. Modify migration_entry_wait_huge()
> so that it must be called with vma read lock held, and properly release the
> lock in __migration_entry_wait_huge().
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> include/linux/swapops.h | 6 ++++--
> mm/hugetlb.c | 32 +++++++++++++++-----------------
> mm/migrate.c | 25 +++++++++++++++++++++----
> 3 files changed, 40 insertions(+), 23 deletions(-)
Looks good with one small change noted below,
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
>
> diff --git a/include/linux/swapops.h b/include/linux/swapops.h
> index 27ade4f22abb..09b22b169a71 100644
> --- a/include/linux/swapops.h
> +++ b/include/linux/swapops.h
> @@ -335,7 +335,8 @@ extern void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
> extern void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
> unsigned long address);
> #ifdef CONFIG_HUGETLB_PAGE
> -extern void __migration_entry_wait_huge(pte_t *ptep, spinlock_t *ptl);
> +extern void __migration_entry_wait_huge(struct vm_area_struct *vma,
> + pte_t *ptep, spinlock_t *ptl);
> extern void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte);
> #endif /* CONFIG_HUGETLB_PAGE */
> #else /* CONFIG_MIGRATION */
> @@ -364,7 +365,8 @@ static inline void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
> static inline void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
> unsigned long address) { }
> #ifdef CONFIG_HUGETLB_PAGE
> -static inline void __migration_entry_wait_huge(pte_t *ptep, spinlock_t *ptl) { }
> +static inline void __migration_entry_wait_huge(struct vm_area_struct *vma,
> + pte_t *ptep, spinlock_t *ptl) { }
> static inline void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte) { }
> #endif /* CONFIG_HUGETLB_PAGE */
> static inline int is_writable_migration_entry(swp_entry_t entry)
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index dfe677fadaf8..776e34ccf029 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5826,22 +5826,6 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> int need_wait_lock = 0;
> unsigned long haddr = address & huge_page_mask(h);
>
> - ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
> - if (ptep) {
> - /*
> - * Since we hold no locks, ptep could be stale. That is
> - * OK as we are only making decisions based on content and
> - * not actually modifying content here.
> - */
> - entry = huge_ptep_get(ptep);
> - if (unlikely(is_hugetlb_entry_migration(entry))) {
> - migration_entry_wait_huge(vma, ptep);
> - return 0;
> - } else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
> - return VM_FAULT_HWPOISON_LARGE |
> - VM_FAULT_SET_HINDEX(hstate_index(h));
> - }
> -
Before acquiring the vma_lock, there is this comment:
/*
* Acquire vma lock before calling huge_pte_alloc and hold
* until finished with ptep. This prevents huge_pmd_unshare from
* being called elsewhere and making the ptep no longer valid.
*
* ptep could have already be assigned via hugetlb_walk(). That
* is OK, as huge_pte_alloc will return the same value unless
* something has changed.
*/
The second sentence in that comment about ptep being already assigned no
longer applies and can be deleted.
--
Mike Kravetz
next prev parent reply other threads:[~2022-12-05 22:14 UTC|newest]
Thread overview: 60+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-29 19:35 [PATCH 00/10] mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare Peter Xu
2022-11-29 19:35 ` [PATCH 01/10] mm/hugetlb: Let vma_offset_start() to return start Peter Xu
2022-11-30 10:11 ` David Hildenbrand
2022-11-29 19:35 ` [PATCH 02/10] mm/hugetlb: Don't wait for migration entry during follow page Peter Xu
2022-11-30 4:37 ` Mike Kravetz
2022-11-30 10:15 ` David Hildenbrand
2022-11-29 19:35 ` [PATCH 03/10] mm/hugetlb: Document huge_pte_offset usage Peter Xu
2022-11-30 4:55 ` Mike Kravetz
2022-11-30 15:58 ` Peter Xu
2022-12-05 21:47 ` Mike Kravetz
2022-11-30 10:21 ` David Hildenbrand
2022-11-30 10:24 ` David Hildenbrand
2022-11-30 16:09 ` Peter Xu
2022-11-30 16:11 ` David Hildenbrand
2022-11-30 16:25 ` Peter Xu
2022-11-30 16:31 ` David Hildenbrand
2022-11-29 19:35 ` [PATCH 04/10] mm/hugetlb: Move swap entry handling into vma lock when faulted Peter Xu
2022-12-05 22:14 ` Mike Kravetz [this message]
2022-12-05 23:36 ` Peter Xu
2022-11-29 19:35 ` [PATCH 05/10] mm/hugetlb: Make userfaultfd_huge_must_wait() safe to pmd unshare Peter Xu
2022-11-30 16:08 ` David Hildenbrand
2022-12-05 22:23 ` Mike Kravetz
2022-11-29 19:35 ` [PATCH 06/10] mm/hugetlb: Make hugetlb_follow_page_mask() " Peter Xu
2022-11-30 16:09 ` David Hildenbrand
2022-12-05 22:29 ` Mike Kravetz
2022-11-29 19:35 ` [PATCH 07/10] mm/hugetlb: Make follow_hugetlb_page() " Peter Xu
2022-11-30 16:09 ` David Hildenbrand
2022-12-05 22:45 ` Mike Kravetz
2022-11-29 19:35 ` [PATCH 08/10] mm/hugetlb: Make walk_hugetlb_range() " Peter Xu
2022-11-30 16:11 ` David Hildenbrand
2022-12-05 23:33 ` Mike Kravetz
2022-12-05 23:52 ` John Hubbard
2022-12-06 16:45 ` Peter Xu
2022-12-06 18:50 ` Mike Kravetz
2022-12-06 21:03 ` John Hubbard
2022-12-06 21:51 ` Peter Xu
2022-12-06 22:31 ` John Hubbard
2022-12-07 0:07 ` Peter Xu
2022-12-07 2:38 ` John Hubbard
2022-12-07 14:58 ` Peter Xu
2022-11-29 19:35 ` [PATCH 09/10] mm/hugetlb: Make page_vma_mapped_walk() " Peter Xu
2022-11-30 16:18 ` David Hildenbrand
2022-11-30 16:32 ` Peter Xu
2022-11-30 16:39 ` David Hildenbrand
2022-12-05 23:52 ` Mike Kravetz
2022-12-06 17:10 ` Mike Kravetz
2022-12-06 17:39 ` Peter Xu
2022-12-06 17:43 ` Peter Xu
2022-12-06 19:58 ` Mike Kravetz
2022-11-29 19:35 ` [PATCH 10/10] mm/hugetlb: Introduce hugetlb_walk() Peter Xu
2022-11-30 5:18 ` Eric Biggers
2022-11-30 15:37 ` Peter Xu
2022-12-06 0:21 ` Mike Kravetz
2022-11-29 20:49 ` [PATCH 00/10] mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare Andrew Morton
2022-11-29 21:19 ` Peter Xu
2022-11-29 21:26 ` Andrew Morton
2022-11-29 20:51 ` Andrew Morton
2022-11-29 21:36 ` Peter Xu
2022-11-30 9:46 ` David Hildenbrand
2022-11-30 16:23 ` Peter Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y45tTl3tleVP8fA6@monkey \
--to=mike.kravetz@oracle.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=david@redhat.com \
--cc=jannh@google.com \
--cc=jthoughton@google.com \
--cc=linmiaohe@huawei.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=nadav.amit@gmail.com \
--cc=peterx@redhat.com \
--cc=riel@surriel.com \
--cc=songmuchun@bytedance.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).