From: Mike Kravetz <mike.kravetz@oracle.com>
To: Peter Xu <peterx@redhat.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org, linux-ia64@vger.kernel.org,
Baolin Wang <baolin.wang@linux.alibaba.com>,
David Hildenbrand <david@redhat.com>,
Christophe Leroy <christophe.leroy@csgroup.eu>,
"Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>,
Naoya Horiguchi <naoya.horiguchi@linux.dev>,
Michael Ellerman <mpe@ellerman.id.au>,
Muchun Song <songmuchun@bytedance.com>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH v3] hugetlb: simplify hugetlb handling in follow_page_mask
Date: Fri, 28 Oct 2022 15:27:57 +0000 [thread overview]
Message-ID: <Y1v0/Y4Xiut2FWx4@monkey> (raw)
In-Reply-To: <Y1rdVLMDD4PMt3s3@x1n>
On 10/27/22 15:34, Peter Xu wrote:
> On Wed, Oct 26, 2022 at 05:34:04PM -0700, Mike Kravetz wrote:
> > On 10/26/22 17:59, Peter Xu wrote:
>
> If we want to use the vma read lock to protect here as the slow gup path,
> then please check again with below [1] - I think we'll also need to protect
> it with fast-gup (probably with trylock only, because fast-gup cannot
> sleep) or it'll encounter the same race, iiuc.
>
> Actually, instead of using vma lock, I really think this is another problem
> and needs standalone fixing. The problem is we allows huge_pte_offset() to
> walk the process pgtable without any protection, while pmd unsharing can
> drop a page anytime. huge_pte_offset() is always facing use-after-free
> when walking the PUD page.
>
> We may want RCU lock to protect the pgtable pages from getting away when
> huge_pte_offset() is walking it, it'll be safe then because pgtable pages
> are released in RCU fashion only (e.g. in above example, process [2] will
> munmap() and release the last ref to the "used to be shared" pmd and the
> PUD that maps the shared pmds will be released only after a RCU grace
> period), and afaict that's also what's protecting fast-gup from accessing
> freed pgtable pages.
>
> If with all huge_pte_offset() callers becoming RCU-safe, then IIUC we can
> drop the vma lock in all GUP code, aka, in hugetlb_follow_page_mask() here,
> because both slow and fast gup should be safe too in the same manner.
>
> Thanks,
>
> > > IIUC it's also the same as fast-gup - afaiu we don't take the read vma lock
> > > in fast-gup too but I also think it's safe. But I hope I didn't miss
> > > something.
>
> [1]
Thanks Peter! I think the best thing would be to eliminate the vma_lock
calls in this patch. The code it is replacing/simplifying does not do any
locking, so no real regression.
I think a scheme like you describe above is going to require some more
thought/work. It might be better as a follow on patch.
--
Mike Kravetz
WARNING: multiple messages have this Message-ID (diff)
From: Mike Kravetz <mike.kravetz@oracle.com>
To: Peter Xu <peterx@redhat.com>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>,
linux-ia64@vger.kernel.org, David Hildenbrand <david@redhat.com>,
"Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>,
Muchun Song <songmuchun@bytedance.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Baolin Wang <baolin.wang@linux.alibaba.com>,
Andrew Morton <akpm@linux-foundation.org>,
linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v3] hugetlb: simplify hugetlb handling in follow_page_mask
Date: Fri, 28 Oct 2022 08:27:57 -0700 [thread overview]
Message-ID: <Y1v0/Y4Xiut2FWx4@monkey> (raw)
In-Reply-To: <Y1rdVLMDD4PMt3s3@x1n>
On 10/27/22 15:34, Peter Xu wrote:
> On Wed, Oct 26, 2022 at 05:34:04PM -0700, Mike Kravetz wrote:
> > On 10/26/22 17:59, Peter Xu wrote:
>
> If we want to use the vma read lock to protect here as the slow gup path,
> then please check again with below [1] - I think we'll also need to protect
> it with fast-gup (probably with trylock only, because fast-gup cannot
> sleep) or it'll encounter the same race, iiuc.
>
> Actually, instead of using vma lock, I really think this is another problem
> and needs standalone fixing. The problem is we allows huge_pte_offset() to
> walk the process pgtable without any protection, while pmd unsharing can
> drop a page anytime. huge_pte_offset() is always facing use-after-free
> when walking the PUD page.
>
> We may want RCU lock to protect the pgtable pages from getting away when
> huge_pte_offset() is walking it, it'll be safe then because pgtable pages
> are released in RCU fashion only (e.g. in above example, process [2] will
> munmap() and release the last ref to the "used to be shared" pmd and the
> PUD that maps the shared pmds will be released only after a RCU grace
> period), and afaict that's also what's protecting fast-gup from accessing
> freed pgtable pages.
>
> If with all huge_pte_offset() callers becoming RCU-safe, then IIUC we can
> drop the vma lock in all GUP code, aka, in hugetlb_follow_page_mask() here,
> because both slow and fast gup should be safe too in the same manner.
>
> Thanks,
>
> > > IIUC it's also the same as fast-gup - afaiu we don't take the read vma lock
> > > in fast-gup too but I also think it's safe. But I hope I didn't miss
> > > something.
>
> [1]
Thanks Peter! I think the best thing would be to eliminate the vma_lock
calls in this patch. The code it is replacing/simplifying does not do any
locking, so no real regression.
I think a scheme like you describe above is going to require some more
thought/work. It might be better as a follow on patch.
--
Mike Kravetz
WARNING: multiple messages have this Message-ID (diff)
From: Mike Kravetz <mike.kravetz@oracle.com>
To: Peter Xu <peterx@redhat.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org, linux-ia64@vger.kernel.org,
Baolin Wang <baolin.wang@linux.alibaba.com>,
David Hildenbrand <david@redhat.com>,
Christophe Leroy <christophe.leroy@csgroup.eu>,
"Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>,
Naoya Horiguchi <naoya.horiguchi@linux.dev>,
Michael Ellerman <mpe@ellerman.id.au>,
Muchun Song <songmuchun@bytedance.com>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH v3] hugetlb: simplify hugetlb handling in follow_page_mask
Date: Fri, 28 Oct 2022 08:27:57 -0700 [thread overview]
Message-ID: <Y1v0/Y4Xiut2FWx4@monkey> (raw)
In-Reply-To: <Y1rdVLMDD4PMt3s3@x1n>
On 10/27/22 15:34, Peter Xu wrote:
> On Wed, Oct 26, 2022 at 05:34:04PM -0700, Mike Kravetz wrote:
> > On 10/26/22 17:59, Peter Xu wrote:
>
> If we want to use the vma read lock to protect here as the slow gup path,
> then please check again with below [1] - I think we'll also need to protect
> it with fast-gup (probably with trylock only, because fast-gup cannot
> sleep) or it'll encounter the same race, iiuc.
>
> Actually, instead of using vma lock, I really think this is another problem
> and needs standalone fixing. The problem is we allows huge_pte_offset() to
> walk the process pgtable without any protection, while pmd unsharing can
> drop a page anytime. huge_pte_offset() is always facing use-after-free
> when walking the PUD page.
>
> We may want RCU lock to protect the pgtable pages from getting away when
> huge_pte_offset() is walking it, it'll be safe then because pgtable pages
> are released in RCU fashion only (e.g. in above example, process [2] will
> munmap() and release the last ref to the "used to be shared" pmd and the
> PUD that maps the shared pmds will be released only after a RCU grace
> period), and afaict that's also what's protecting fast-gup from accessing
> freed pgtable pages.
>
> If with all huge_pte_offset() callers becoming RCU-safe, then IIUC we can
> drop the vma lock in all GUP code, aka, in hugetlb_follow_page_mask() here,
> because both slow and fast gup should be safe too in the same manner.
>
> Thanks,
>
> > > IIUC it's also the same as fast-gup - afaiu we don't take the read vma lock
> > > in fast-gup too but I also think it's safe. But I hope I didn't miss
> > > something.
>
> [1]
Thanks Peter! I think the best thing would be to eliminate the vma_lock
calls in this patch. The code it is replacing/simplifying does not do any
locking, so no real regression.
I think a scheme like you describe above is going to require some more
thought/work. It might be better as a follow on patch.
--
Mike Kravetz
next prev parent reply other threads:[~2022-10-28 15:27 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-09-19 2:13 [PATCH v3] hugetlb: simplify hugetlb handling in follow_page_mask Mike Kravetz
2022-09-19 2:13 ` Mike Kravetz
2022-09-19 2:13 ` Mike Kravetz
2022-09-19 7:50 ` David Hildenbrand
2022-09-19 7:50 ` David Hildenbrand
2022-09-19 7:50 ` David Hildenbrand
2022-09-21 9:30 ` Baolin Wang
2022-09-21 9:30 ` Baolin Wang
2022-09-21 9:30 ` Baolin Wang
2022-10-26 21:59 ` Peter Xu
2022-10-26 21:59 ` Peter Xu
2022-10-26 21:59 ` Peter Xu
2022-10-27 0:34 ` Mike Kravetz
2022-10-27 0:34 ` Mike Kravetz
2022-10-27 0:34 ` Mike Kravetz
2022-10-27 19:34 ` Peter Xu
2022-10-27 19:34 ` Peter Xu
2022-10-27 19:34 ` Peter Xu
2022-10-28 15:27 ` Mike Kravetz [this message]
2022-10-28 15:27 ` Mike Kravetz
2022-10-28 15:27 ` Mike Kravetz
2022-10-28 15:57 ` Peter Xu
2022-10-28 15:57 ` Peter Xu
2022-10-28 15:57 ` Peter Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y1v0/Y4Xiut2FWx4@monkey \
--to=mike.kravetz@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=aneesh.kumar@linux.ibm.com \
--cc=baolin.wang@linux.alibaba.com \
--cc=christophe.leroy@csgroup.eu \
--cc=david@redhat.com \
--cc=linux-ia64@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=mpe@ellerman.id.au \
--cc=naoya.horiguchi@linux.dev \
--cc=peterx@redhat.com \
--cc=songmuchun@bytedance.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.