From: Peter Xu <peterx@redhat.com>
To: David Hildenbrand <david@redhat.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
James Houghton <jthoughton@google.com>,
Jann Horn <jannh@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Andrea Arcangeli <aarcange@redhat.com>,
Rik van Riel <riel@surriel.com>,
Nadav Amit <nadav.amit@gmail.com>,
Miaohe Lin <linmiaohe@huawei.com>,
Muchun Song <songmuchun@bytedance.com>,
Mike Kravetz <mike.kravetz@oracle.com>
Subject: Re: [PATCH 03/10] mm/hugetlb: Document huge_pte_offset usage
Date: Wed, 30 Nov 2022 11:25:04 -0500
Message-ID: <Y4eD4EW2sAlb00RO@x1n>
In-Reply-To: <b4bad424-9ae3-41e2-d844-6fa63f44be62@redhat.com>
On Wed, Nov 30, 2022 at 05:11:36PM +0100, David Hildenbrand wrote:
> On 30.11.22 17:09, Peter Xu wrote:
> > On Wed, Nov 30, 2022 at 11:24:34AM +0100, David Hildenbrand wrote:
> > > On 29.11.22 20:35, Peter Xu wrote:
> > > > huge_pte_offset() is potentially a pgtable walker, looking up pte_t* for a
> > > > hugetlb address.
> > > >
> > > > Normally, it's always safe to walk a generic pgtable as long as we're with
> > > > the mmap lock held for either read or write, because that guarantees the
> > > > pgtable pages will always be valid during the process.
> > > >
> > > > But it's not true for hugetlbfs, especially shared: hugetlbfs can have its
> > > > pgtable freed by pmd unsharing, it means that even with mmap lock held for
> > > > current mm, the PMD pgtable page can still go away from under us if pmd
> > > > unsharing is possible during the walk.
> > > >
> > > > So we have two ways to make it safe even for a shared mapping:
> > > >
> > > > (1) If we're with the hugetlb vma lock held for either read/write, it's
> > > > okay because pmd unshare cannot happen at all.
> > > >
> > > > (2) If we're with the i_mmap_rwsem lock held for either read/write, it's
> > > > okay because even if pmd unshare can happen, the pgtable page cannot
> > > > be freed from under us.
> > > >
> > > > Document it.
> > > >
> > > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > > ---
> > > > include/linux/hugetlb.h | 32 ++++++++++++++++++++++++++++++++
> > > > 1 file changed, 32 insertions(+)
> > > >
> > > > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> > > > index 551834cd5299..81efd9b9baa2 100644
> > > > --- a/include/linux/hugetlb.h
> > > > +++ b/include/linux/hugetlb.h
> > > > @@ -192,6 +192,38 @@ extern struct list_head huge_boot_pages;
> > > > pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
> > > > unsigned long addr, unsigned long sz);
> > > > +/*
> > > > + * huge_pte_offset(): Walk the hugetlb pgtable until the last level PTE.
> > > > + * Returns the pte_t* if found, or NULL if the address is not mapped.
> > > > + *
> > > > + * Since this function will walk all the pgtable pages (including not only
> > > > + * high-level pgtable page, but also PUD entry that can be unshared
> > > > + * concurrently for VM_SHARED), the caller of this function should be
> > > > + * responsible for its thread safety. One can follow this rule:
> > > > + *
> > > > + * (1) For private mappings: pmd unsharing is not possible, so it'll
> > > > + * always be safe if we're with the mmap sem for either read or write.
> > > > + * This is normally always the case, IOW we don't need to do anything
> > > > + * special.
> > >
> > > Maybe worth mentioning that hugetlb_vma_lock_read() and friends already
> > > optimize for private mappings, to not take the VMA lock if not required.
> >
> > Yes we can. I assume this is not super urgent so I'll hold a while to see
> > whether there's anything else that needs amending for the documents.
> >
> > Btw, even with hugetlb_vma_lock_read() checking SHARED, for a private-only
> > code path it's still better to not take the lock at all, because that still
> > contains a function jump which will be unnecessary.
>
> IMHO it makes coding a lot more consistent and less error-prone when we
> don't have to care about whether to take the lock or not (as an
> optimization) and just have this handled "automatically".
>
> Optimizing a jump out would rather smell like a micro-optimization.
Or we can move the lock helpers into the headers, too.
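To make that a bit more concrete, below is a minimal userspace sketch (not
kernel code, and not the actual hugetlb implementation) of lock helpers
defined as static inline in a header, so that the shared check is visible
at each call site and a private-only path does not pay for a function call.
Every name in it (struct mock_vma, MOCK_VM_MAYSHARE, the mock_* helpers) is
hypothetical, and a plain reader counter stands in for the real vma lock:

```c
/*
 * Userspace mock only. All types, flags and helpers below are stand-ins
 * invented for illustration; none of them are real kernel definitions.
 */
#include <stdbool.h>

#define MOCK_VM_MAYSHARE 0x1UL	/* stand-in for a "mapping may be shared" flag */

struct mock_vma {
	unsigned long vm_flags;
	int readers;		/* stand-in for the vma lock: counts read holders */
};

/* Only shared mappings can have their PMD pgtable unshared. */
static inline bool mock_vma_needs_lock(const struct mock_vma *vma)
{
	return (vma->vm_flags & MOCK_VM_MAYSHARE) != 0;
}

/*
 * Because this lives in the header as static inline, the check above can
 * be inlined into the caller, and a private mapping returns immediately
 * without an out-of-line call.
 */
static inline void mock_hugetlb_vma_lock_read(struct mock_vma *vma)
{
	if (!mock_vma_needs_lock(vma))
		return;
	vma->readers++;		/* a real helper would take the lock for read */
}

static inline void mock_hugetlb_vma_unlock_read(struct mock_vma *vma)
{
	if (!mock_vma_needs_lock(vma))
		return;
	vma->readers--;		/* a real helper would release the read lock */
}
```

With helpers of this shape, callers could keep calling the lock/unlock pair
unconditionally (the "automatic" handling), while the private case reduces
to a flag test at the call site.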
--
Peter Xu