* [PATCH 01/11] mm: page_vma_mapped_walk(): use page for pvmw->page
2021-06-10 6:31 [PATCH 00/11] mm: page_vma_mapped_walk() cleanup and THP fixes Hugh Dickins
@ 2021-06-10 6:34 ` Hugh Dickins
2021-06-10 8:12 ` Alistair Popple
2021-06-10 8:55 ` Kirill A. Shutemov
2021-06-10 6:36 ` [PATCH 02/11] mm: page_vma_mapped_walk(): settle PageHuge on entry Hugh Dickins
` (9 subsequent siblings)
10 siblings, 2 replies; 37+ messages in thread
From: Hugh Dickins @ 2021-06-10 6:34 UTC (permalink / raw)
To: Andrew Morton
Cc: Hugh Dickins, Kirill A. Shutemov, Yang Shi, Wang Yugui,
Matthew Wilcox, Alistair Popple, Ralph Campbell, Zi Yan, Peter Xu,
Will Deacon, linux-mm, linux-kernel
page_vma_mapped_walk() cleanup: sometimes the local copy of pvmw->page was
used, sometimes pvmw->page itself: use the local copy "page" throughout.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
---
mm/page_vma_mapped.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index e37bd43904af..a6dbf714ca15 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -156,7 +156,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
if (pvmw->pte)
goto next_pte;
- if (unlikely(PageHuge(pvmw->page))) {
+ if (unlikely(PageHuge(page))) {
/* when pud is not present, pte will be NULL */
pvmw->pte = huge_pte_offset(mm, pvmw->address, page_size(page));
if (!pvmw->pte)
@@ -217,8 +217,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
* cannot return prematurely, while zap_huge_pmd() has
* cleared *pmd but not decremented compound_mapcount().
*/
- if ((pvmw->flags & PVMW_SYNC) &&
- PageTransCompound(pvmw->page)) {
+ if ((pvmw->flags & PVMW_SYNC) && PageTransCompound(page)) {
spinlock_t *ptl = pmd_lock(mm, pvmw->pmd);
spin_unlock(ptl);
@@ -234,9 +233,9 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
return true;
next_pte:
/* Seek to next pte only makes sense for THP */
- if (!PageTransHuge(pvmw->page) || PageHuge(pvmw->page))
+ if (!PageTransHuge(page) || PageHuge(page))
return not_found(pvmw);
- end = vma_address_end(pvmw->page, pvmw->vma);
+ end = vma_address_end(page, pvmw->vma);
do {
pvmw->address += PAGE_SIZE;
if (pvmw->address >= end)
--
2.26.2
* Re: [PATCH 01/11] mm: page_vma_mapped_walk(): use page for pvmw->page
2021-06-10 6:34 ` [PATCH 01/11] mm: page_vma_mapped_walk(): use page for pvmw->page Hugh Dickins
@ 2021-06-10 8:12 ` Alistair Popple
2021-06-10 8:55 ` Kirill A. Shutemov
1 sibling, 0 replies; 37+ messages in thread
From: Alistair Popple @ 2021-06-10 8:12 UTC (permalink / raw)
To: Hugh Dickins
Cc: Andrew Morton, Kirill A. Shutemov, Yang Shi, Wang Yugui,
Matthew Wilcox, Ralph Campbell, Zi Yan, Peter Xu, Will Deacon,
linux-mm, linux-kernel
Thanks for this. I've read through page_vma_mapped_walk() a few too many times
recently, and it annoyed me enough to start writing some cleanup patches of my
own, but I'm happy to see this instead.
I confirmed pvmw->page isn't modified here or in map_pte()/check_pte(), so the
patch looks good to me:
Reviewed-by: Alistair Popple <apopple@nvidia.com>
On Thursday, 10 June 2021 4:34:40 PM AEST Hugh Dickins wrote:
> page_vma_mapped_walk() cleanup: sometimes the local copy of pvmw->page was
> used, sometimes pvmw->page itself: use the local copy "page" throughout.
>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: <stable@vger.kernel.org>
> ---
> mm/page_vma_mapped.c | 9 ++++-----
> 1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> index e37bd43904af..a6dbf714ca15 100644
> --- a/mm/page_vma_mapped.c
> +++ b/mm/page_vma_mapped.c
> @@ -156,7 +156,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> if (pvmw->pte)
> goto next_pte;
>
> - if (unlikely(PageHuge(pvmw->page))) {
> + if (unlikely(PageHuge(page))) {
> /* when pud is not present, pte will be NULL */
> pvmw->pte = huge_pte_offset(mm, pvmw->address, page_size(page));
> if (!pvmw->pte)
> @@ -217,8 +217,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> * cannot return prematurely, while zap_huge_pmd() has
> * cleared *pmd but not decremented compound_mapcount().
> */
> - if ((pvmw->flags & PVMW_SYNC) &&
> - PageTransCompound(pvmw->page)) {
> + if ((pvmw->flags & PVMW_SYNC) && PageTransCompound(page)) {
> spinlock_t *ptl = pmd_lock(mm, pvmw->pmd);
>
> spin_unlock(ptl);
> @@ -234,9 +233,9 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> return true;
> next_pte:
> /* Seek to next pte only makes sense for THP */
> - if (!PageTransHuge(pvmw->page) || PageHuge(pvmw->page))
> + if (!PageTransHuge(page) || PageHuge(page))
> return not_found(pvmw);
> - end = vma_address_end(pvmw->page, pvmw->vma);
> + end = vma_address_end(page, pvmw->vma);
> do {
> pvmw->address += PAGE_SIZE;
> if (pvmw->address >= end)
> --
> 2.26.2
>
* Re: [PATCH 01/11] mm: page_vma_mapped_walk(): use page for pvmw->page
2021-06-10 6:34 ` [PATCH 01/11] mm: page_vma_mapped_walk(): use page for pvmw->page Hugh Dickins
2021-06-10 8:12 ` Alistair Popple
@ 2021-06-10 8:55 ` Kirill A. Shutemov
2021-06-10 14:14 ` Peter Xu
1 sibling, 1 reply; 37+ messages in thread
From: Kirill A. Shutemov @ 2021-06-10 8:55 UTC (permalink / raw)
To: Hugh Dickins
Cc: Andrew Morton, Kirill A. Shutemov, Yang Shi, Wang Yugui,
Matthew Wilcox, Alistair Popple, Ralph Campbell, Zi Yan, Peter Xu,
Will Deacon, linux-mm, linux-kernel
On Wed, Jun 09, 2021 at 11:34:40PM -0700, Hugh Dickins wrote:
> page_vma_mapped_walk() cleanup: sometimes the local copy of pvmw->page was
> used, sometimes pvmw->page itself: use the local copy "page" throughout.
>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: <stable@vger.kernel.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
A question below.
> ---
> mm/page_vma_mapped.c | 9 ++++-----
> 1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> index e37bd43904af..a6dbf714ca15 100644
> --- a/mm/page_vma_mapped.c
> +++ b/mm/page_vma_mapped.c
> @@ -156,7 +156,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> if (pvmw->pte)
> goto next_pte;
>
> - if (unlikely(PageHuge(pvmw->page))) {
> + if (unlikely(PageHuge(page))) {
> /* when pud is not present, pte will be NULL */
> pvmw->pte = huge_pte_offset(mm, pvmw->address, page_size(page));
> if (!pvmw->pte)
> @@ -217,8 +217,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> * cannot return prematurely, while zap_huge_pmd() has
> * cleared *pmd but not decremented compound_mapcount().
> */
> - if ((pvmw->flags & PVMW_SYNC) &&
> - PageTransCompound(pvmw->page)) {
> + if ((pvmw->flags & PVMW_SYNC) && PageTransCompound(page)) {
> spinlock_t *ptl = pmd_lock(mm, pvmw->pmd);
>
> spin_unlock(ptl);
> @@ -234,9 +233,9 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> return true;
> next_pte:
> /* Seek to next pte only makes sense for THP */
> - if (!PageTransHuge(pvmw->page) || PageHuge(pvmw->page))
> + if (!PageTransHuge(page) || PageHuge(page))
> return not_found(pvmw);
> - end = vma_address_end(pvmw->page, pvmw->vma);
> + end = vma_address_end(page, pvmw->vma);
> do {
> pvmw->address += PAGE_SIZE;
> if (pvmw->address >= end)
I see two more uses of pvmw->page in this loop. Are you leaving them here
because the code will be rewritten later in the patchset?
--
Kirill A. Shutemov
* Re: [PATCH 01/11] mm: page_vma_mapped_walk(): use page for pvmw->page
2021-06-10 8:55 ` Kirill A. Shutemov
@ 2021-06-10 14:14 ` Peter Xu
0 siblings, 0 replies; 37+ messages in thread
From: Peter Xu @ 2021-06-10 14:14 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Hugh Dickins, Andrew Morton, Kirill A. Shutemov, Yang Shi,
Wang Yugui, Matthew Wilcox, Alistair Popple, Ralph Campbell,
Zi Yan, Will Deacon, linux-mm, linux-kernel
On Thu, Jun 10, 2021 at 11:55:22AM +0300, Kirill A. Shutemov wrote:
> > @@ -234,9 +233,9 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> > return true;
> > next_pte:
> > /* Seek to next pte only makes sense for THP */
> > - if (!PageTransHuge(pvmw->page) || PageHuge(pvmw->page))
> > + if (!PageTransHuge(page) || PageHuge(page))
> > return not_found(pvmw);
> > - end = vma_address_end(pvmw->page, pvmw->vma);
> > + end = vma_address_end(page, pvmw->vma);
> > do {
> > pvmw->address += PAGE_SIZE;
> > if (pvmw->address >= end)
>
> I see two more uses of pvmw->page in this loop. Are you leaving them here
> because the code will be rewritten later in the patchset?
I think they got removed in the previous series ("[PATCH v2 04/10] mm/thp: fix
vma_address() if virtual address below file offset").
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Peter Xu
* [PATCH 02/11] mm: page_vma_mapped_walk(): settle PageHuge on entry
2021-06-10 6:31 [PATCH 00/11] mm: page_vma_mapped_walk() cleanup and THP fixes Hugh Dickins
2021-06-10 6:34 ` [PATCH 01/11] mm: page_vma_mapped_walk(): use page for pvmw->page Hugh Dickins
@ 2021-06-10 6:36 ` Hugh Dickins
2021-06-10 8:57 ` Kirill A. Shutemov
2021-06-10 14:17 ` Peter Xu
2021-06-10 6:38 ` [PATCH 03/11] mm: page_vma_mapped_walk(): use pmd_read_atomic() Hugh Dickins
` (8 subsequent siblings)
10 siblings, 2 replies; 37+ messages in thread
From: Hugh Dickins @ 2021-06-10 6:36 UTC (permalink / raw)
To: Andrew Morton
Cc: Hugh Dickins, Kirill A. Shutemov, Yang Shi, Wang Yugui,
Matthew Wilcox, Alistair Popple, Ralph Campbell, Zi Yan, Peter Xu,
Will Deacon, linux-mm, linux-kernel
page_vma_mapped_walk() cleanup: get the hugetlbfs PageHuge case
out of the way at the start, so no need to worry about it later.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
---
mm/page_vma_mapped.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index a6dbf714ca15..7c0504641fb8 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -153,10 +153,11 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
if (pvmw->pmd && !pvmw->pte)
return not_found(pvmw);
- if (pvmw->pte)
- goto next_pte;
-
if (unlikely(PageHuge(page))) {
+ /* The only possible mapping was handled on last iteration */
+ if (pvmw->pte)
+ return not_found(pvmw);
+
/* when pud is not present, pte will be NULL */
pvmw->pte = huge_pte_offset(mm, pvmw->address, page_size(page));
if (!pvmw->pte)
@@ -168,6 +169,9 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
return not_found(pvmw);
return true;
}
+
+ if (pvmw->pte)
+ goto next_pte;
restart:
pgd = pgd_offset(mm, pvmw->address);
if (!pgd_present(*pgd))
@@ -233,7 +237,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
return true;
next_pte:
/* Seek to next pte only makes sense for THP */
- if (!PageTransHuge(page) || PageHuge(page))
+ if (!PageTransHuge(page))
return not_found(pvmw);
end = vma_address_end(page, pvmw->vma);
do {
--
2.26.2
* Re: [PATCH 02/11] mm: page_vma_mapped_walk(): settle PageHuge on entry
2021-06-10 6:36 ` [PATCH 02/11] mm: page_vma_mapped_walk(): settle PageHuge on entry Hugh Dickins
@ 2021-06-10 8:57 ` Kirill A. Shutemov
2021-06-10 14:17 ` Peter Xu
1 sibling, 0 replies; 37+ messages in thread
From: Kirill A. Shutemov @ 2021-06-10 8:57 UTC (permalink / raw)
To: Hugh Dickins
Cc: Andrew Morton, Kirill A. Shutemov, Yang Shi, Wang Yugui,
Matthew Wilcox, Alistair Popple, Ralph Campbell, Zi Yan, Peter Xu,
Will Deacon, linux-mm, linux-kernel
On Wed, Jun 09, 2021 at 11:36:36PM -0700, Hugh Dickins wrote:
> page_vma_mapped_walk() cleanup: get the hugetlbfs PageHuge case
> out of the way at the start, so no need to worry about it later.
>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: <stable@vger.kernel.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
--
Kirill A. Shutemov
* Re: [PATCH 02/11] mm: page_vma_mapped_walk(): settle PageHuge on entry
2021-06-10 6:36 ` [PATCH 02/11] mm: page_vma_mapped_walk(): settle PageHuge on entry Hugh Dickins
2021-06-10 8:57 ` Kirill A. Shutemov
@ 2021-06-10 14:17 ` Peter Xu
1 sibling, 0 replies; 37+ messages in thread
From: Peter Xu @ 2021-06-10 14:17 UTC (permalink / raw)
To: Hugh Dickins
Cc: Andrew Morton, Kirill A. Shutemov, Yang Shi, Wang Yugui,
Matthew Wilcox, Alistair Popple, Ralph Campbell, Zi Yan,
Will Deacon, linux-mm, linux-kernel
On Wed, Jun 09, 2021 at 11:36:36PM -0700, Hugh Dickins wrote:
> page_vma_mapped_walk() cleanup: get the hugetlbfs PageHuge case
> out of the way at the start, so no need to worry about it later.
>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: <stable@vger.kernel.org>
> ---
> mm/page_vma_mapped.c | 12 ++++++++----
> 1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> index a6dbf714ca15..7c0504641fb8 100644
> --- a/mm/page_vma_mapped.c
> +++ b/mm/page_vma_mapped.c
> @@ -153,10 +153,11 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> if (pvmw->pmd && !pvmw->pte)
> return not_found(pvmw);
>
> - if (pvmw->pte)
> - goto next_pte;
> -
> if (unlikely(PageHuge(page))) {
> + /* The only possible mapping was handled on last iteration */
> + if (pvmw->pte)
> + return not_found(pvmw);
> +
> /* when pud is not present, pte will be NULL */
> pvmw->pte = huge_pte_offset(mm, pvmw->address, page_size(page));
> if (!pvmw->pte)
Would it be even nicer to move this initial check to after the PageHuge() block too?
if (pvmw->pmd && !pvmw->pte)
return not_found(pvmw);
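That is, roughly this ordering (just a sketch of the suggestion, untested and
not part of the patch):

	if (unlikely(PageHuge(page))) {
		/* The only possible mapping was handled on last iteration */
		if (pvmw->pte)
			return not_found(pvmw);
		...
		return true;
	}

	/* A pmd-level mapping was handled on the last iteration */
	if (pvmw->pmd && !pvmw->pte)
		return not_found(pvmw);

	if (pvmw->pte)
		goto next_pte;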
It already looks better as it is, so no strong opinion.
Reviewed-by: Peter Xu <peterx@redhat.com>
Thanks,
--
Peter Xu
* [PATCH 03/11] mm: page_vma_mapped_walk(): use pmd_read_atomic()
2021-06-10 6:31 [PATCH 00/11] mm: page_vma_mapped_walk() cleanup and THP fixes Hugh Dickins
2021-06-10 6:34 ` [PATCH 01/11] mm: page_vma_mapped_walk(): use page for pvmw->page Hugh Dickins
2021-06-10 6:36 ` [PATCH 02/11] mm: page_vma_mapped_walk(): settle PageHuge on entry Hugh Dickins
@ 2021-06-10 6:38 ` Hugh Dickins
2021-06-10 9:06 ` Kirill A. Shutemov
2021-06-10 6:40 ` [PATCH 04/11] mm: page_vma_mapped_walk(): use pmde for *pvmw->pmd Hugh Dickins
` (7 subsequent siblings)
10 siblings, 1 reply; 37+ messages in thread
From: Hugh Dickins @ 2021-06-10 6:38 UTC (permalink / raw)
To: Andrew Morton
Cc: Hugh Dickins, Kirill A. Shutemov, Yang Shi, Wang Yugui,
Matthew Wilcox, Alistair Popple, Ralph Campbell, Zi Yan, Peter Xu,
Will Deacon, linux-mm, linux-kernel
page_vma_mapped_walk() cleanup: use pmd_read_atomic() with barrier()
instead of READ_ONCE() for pmde: some architectures (e.g. i386 with PAE)
have a multi-word pmd entry, for which READ_ONCE() is not good enough.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
---
mm/page_vma_mapped.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 7c0504641fb8..973c3c4e72cc 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -182,13 +182,16 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
pud = pud_offset(p4d, pvmw->address);
if (!pud_present(*pud))
return false;
+
pvmw->pmd = pmd_offset(pud, pvmw->address);
/*
* Make sure the pmd value isn't cached in a register by the
* compiler and used as a stale value after we've observed a
* subsequent update.
*/
- pmde = READ_ONCE(*pvmw->pmd);
+ pmde = pmd_read_atomic(pvmw->pmd);
+ barrier();
+
if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) {
pvmw->ptl = pmd_lock(mm, pvmw->pmd);
if (likely(pmd_trans_huge(*pvmw->pmd))) {
--
2.26.2
* Re: [PATCH 03/11] mm: page_vma_mapped_walk(): use pmd_read_atomic()
2021-06-10 6:38 ` [PATCH 03/11] mm: page_vma_mapped_walk(): use pmd_read_atomic() Hugh Dickins
@ 2021-06-10 9:06 ` Kirill A. Shutemov
2021-06-10 12:15 ` Jason Gunthorpe
0 siblings, 1 reply; 37+ messages in thread
From: Kirill A. Shutemov @ 2021-06-10 9:06 UTC (permalink / raw)
To: Hugh Dickins
Cc: Andrew Morton, Kirill A. Shutemov, Yang Shi, Wang Yugui,
Matthew Wilcox, Alistair Popple, Ralph Campbell, Zi Yan, Peter Xu,
Will Deacon, linux-mm, linux-kernel
On Wed, Jun 09, 2021 at 11:38:11PM -0700, Hugh Dickins wrote:
> page_vma_mapped_walk() cleanup: use pmd_read_atomic() with barrier()
> instead of READ_ONCE() for pmde: some architectures (e.g. i386 with PAE)
> have a multi-word pmd entry, for which READ_ONCE() is not good enough.
>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: <stable@vger.kernel.org>
> ---
> mm/page_vma_mapped.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> index 7c0504641fb8..973c3c4e72cc 100644
> --- a/mm/page_vma_mapped.c
> +++ b/mm/page_vma_mapped.c
> @@ -182,13 +182,16 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> pud = pud_offset(p4d, pvmw->address);
> if (!pud_present(*pud))
> return false;
> +
> pvmw->pmd = pmd_offset(pud, pvmw->address);
> /*
> * Make sure the pmd value isn't cached in a register by the
> * compiler and used as a stale value after we've observed a
> * subsequent update.
> */
> - pmde = READ_ONCE(*pvmw->pmd);
> + pmde = pmd_read_atomic(pvmw->pmd);
> + barrier();
> +
Hm. It makes me wonder if barrier() has to be part of pmd_read_atomic().
mm/hmm.c uses the same pattern as you do, and I tend to think that the
rest of the pmd_read_atomic() users may be broken.
Am I wrong?
--
Kirill A. Shutemov
* Re: [PATCH 03/11] mm: page_vma_mapped_walk(): use pmd_read_atomic()
2021-06-10 9:06 ` Kirill A. Shutemov
@ 2021-06-10 12:15 ` Jason Gunthorpe
[not found] ` <aebb6b96-153e-7d7-59da-f6bad4337aa7@google.com>
0 siblings, 1 reply; 37+ messages in thread
From: Jason Gunthorpe @ 2021-06-10 12:15 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Hugh Dickins, Andrew Morton, Kirill A. Shutemov, Yang Shi,
Wang Yugui, Matthew Wilcox, Alistair Popple, Ralph Campbell,
Zi Yan, Peter Xu, Will Deacon, linux-mm, linux-kernel
On Thu, Jun 10, 2021 at 12:06:17PM +0300, Kirill A. Shutemov wrote:
> On Wed, Jun 09, 2021 at 11:38:11PM -0700, Hugh Dickins wrote:
> > page_vma_mapped_walk() cleanup: use pmd_read_atomic() with barrier()
> > instead of READ_ONCE() for pmde: some architectures (e.g. i386 with PAE)
> > have a multi-word pmd entry, for which READ_ONCE() is not good enough.
> >
> > Signed-off-by: Hugh Dickins <hughd@google.com>
> > Cc: <stable@vger.kernel.org>
> > mm/page_vma_mapped.c | 5 ++++-
> > 1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> > index 7c0504641fb8..973c3c4e72cc 100644
> > +++ b/mm/page_vma_mapped.c
> > @@ -182,13 +182,16 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> > pud = pud_offset(p4d, pvmw->address);
> > if (!pud_present(*pud))
> > return false;
> > +
> > pvmw->pmd = pmd_offset(pud, pvmw->address);
> > /*
> > * Make sure the pmd value isn't cached in a register by the
> > * compiler and used as a stale value after we've observed a
> > * subsequent update.
> > */
> > - pmde = READ_ONCE(*pvmw->pmd);
> > + pmde = pmd_read_atomic(pvmw->pmd);
> > + barrier();
> > +
>
> Hm. It makes me wonder if barrier() has to be part of pmd_read_atomic().
> mm/hmm.c uses the same pattern as you do, and I tend to think that the
> rest of the pmd_read_atomic() users may be broken.
>
> Am I wrong?
I agree with you, something called _atomic should not require the
caller to provide barriers.
I think the issue is simply that the two implementations of
pmd_read_atomic() should use READ_ONCE() internally, no?
Jason
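(For context, the generic fallback at that time was roughly the following -
paraphrased from include/linux/pgtable.h, so treat it as a sketch rather than
the exact source; the i386 PAE version instead assembles the 64-bit entry from
two 32-bit halves:)

	#ifndef pmd_read_atomic
	static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
	{
		/*
		 * A plain dereference: the compiler is trusted to emit a
		 * single load and not to re-read it later, which is exactly
		 * what the barrier()/READ_ONCE() discussion above is about.
		 */
		return *pmdp;
	}
	#endif

Jason's point would then amount to wrapping that dereference (and the per-word
reads in the PAE variant) in READ_ONCE().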
* [PATCH 04/11] mm: page_vma_mapped_walk(): use pmde for *pvmw->pmd
2021-06-10 6:31 [PATCH 00/11] mm: page_vma_mapped_walk() cleanup and THP fixes Hugh Dickins
` (2 preceding siblings ...)
2021-06-10 6:38 ` [PATCH 03/11] mm: page_vma_mapped_walk(): use pmd_read_atomic() Hugh Dickins
@ 2021-06-10 6:40 ` Hugh Dickins
2021-06-10 9:10 ` Kirill A. Shutemov
2021-06-10 14:31 ` Peter Xu
2021-06-10 6:42 ` [PATCH 05/11] mm: page_vma_mapped_walk(): prettify PVMW_MIGRATION block Hugh Dickins
` (6 subsequent siblings)
10 siblings, 2 replies; 37+ messages in thread
From: Hugh Dickins @ 2021-06-10 6:40 UTC (permalink / raw)
To: Andrew Morton
Cc: Hugh Dickins, Kirill A. Shutemov, Yang Shi, Wang Yugui,
Matthew Wilcox, Alistair Popple, Ralph Campbell, Zi Yan, Peter Xu,
Will Deacon, linux-mm, linux-kernel
page_vma_mapped_walk() cleanup: re-evaluate pmde after taking lock, then
use it in subsequent tests, instead of repeatedly dereferencing pointer.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
---
mm/page_vma_mapped.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 973c3c4e72cc..81000dd0b5da 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -194,18 +194,19 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) {
pvmw->ptl = pmd_lock(mm, pvmw->pmd);
- if (likely(pmd_trans_huge(*pvmw->pmd))) {
+ pmde = *pvmw->pmd;
+ if (likely(pmd_trans_huge(pmde))) {
if (pvmw->flags & PVMW_MIGRATION)
return not_found(pvmw);
- if (pmd_page(*pvmw->pmd) != page)
+ if (pmd_page(pmde) != page)
return not_found(pvmw);
return true;
- } else if (!pmd_present(*pvmw->pmd)) {
+ } else if (!pmd_present(pmde)) {
if (thp_migration_supported()) {
if (!(pvmw->flags & PVMW_MIGRATION))
return not_found(pvmw);
- if (is_migration_entry(pmd_to_swp_entry(*pvmw->pmd))) {
- swp_entry_t entry = pmd_to_swp_entry(*pvmw->pmd);
+ if (is_migration_entry(pmd_to_swp_entry(pmde))) {
+ swp_entry_t entry = pmd_to_swp_entry(pmde);
if (migration_entry_to_page(entry) != page)
return not_found(pvmw);
--
2.26.2
* Re: [PATCH 04/11] mm: page_vma_mapped_walk(): use pmde for *pvmw->pmd
2021-06-10 6:40 ` [PATCH 04/11] mm: page_vma_mapped_walk(): use pmde for *pvmw->pmd Hugh Dickins
@ 2021-06-10 9:10 ` Kirill A. Shutemov
2021-06-10 14:31 ` Peter Xu
1 sibling, 0 replies; 37+ messages in thread
From: Kirill A. Shutemov @ 2021-06-10 9:10 UTC (permalink / raw)
To: Hugh Dickins
Cc: Andrew Morton, Kirill A. Shutemov, Yang Shi, Wang Yugui,
Matthew Wilcox, Alistair Popple, Ralph Campbell, Zi Yan, Peter Xu,
Will Deacon, linux-mm, linux-kernel
On Wed, Jun 09, 2021 at 11:40:08PM -0700, Hugh Dickins wrote:
> page_vma_mapped_walk() cleanup: re-evaluate pmde after taking lock, then
> use it in subsequent tests, instead of repeatedly dereferencing pointer.
>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: <stable@vger.kernel.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
--
Kirill A. Shutemov
* Re: [PATCH 04/11] mm: page_vma_mapped_walk(): use pmde for *pvmw->pmd
2021-06-10 6:40 ` [PATCH 04/11] mm: page_vma_mapped_walk(): use pmde for *pvmw->pmd Hugh Dickins
2021-06-10 9:10 ` Kirill A. Shutemov
@ 2021-06-10 14:31 ` Peter Xu
1 sibling, 0 replies; 37+ messages in thread
From: Peter Xu @ 2021-06-10 14:31 UTC (permalink / raw)
To: Hugh Dickins
Cc: Andrew Morton, Kirill A. Shutemov, Yang Shi, Wang Yugui,
Matthew Wilcox, Alistair Popple, Ralph Campbell, Zi Yan,
Will Deacon, linux-mm, linux-kernel
On Wed, Jun 09, 2021 at 11:40:08PM -0700, Hugh Dickins wrote:
> page_vma_mapped_walk() cleanup: re-evaluate pmde after taking lock, then
> use it in subsequent tests, instead of repeatedly dereferencing pointer.
>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: <stable@vger.kernel.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Peter Xu
* [PATCH 05/11] mm: page_vma_mapped_walk(): prettify PVMW_MIGRATION block
2021-06-10 6:31 [PATCH 00/11] mm: page_vma_mapped_walk() cleanup and THP fixes Hugh Dickins
` (3 preceding siblings ...)
2021-06-10 6:40 ` [PATCH 04/11] mm: page_vma_mapped_walk(): use pmde for *pvmw->pmd Hugh Dickins
@ 2021-06-10 6:42 ` Hugh Dickins
2021-06-10 9:16 ` Kirill A. Shutemov
2021-06-10 14:48 ` Peter Xu
2021-06-10 6:44 ` [PATCH 06/11] mm: page_vma_mapped_walk(): crossing page table boundary Hugh Dickins
` (5 subsequent siblings)
10 siblings, 2 replies; 37+ messages in thread
From: Hugh Dickins @ 2021-06-10 6:42 UTC (permalink / raw)
To: Andrew Morton
Cc: Hugh Dickins, Kirill A. Shutemov, Yang Shi, Wang Yugui,
Matthew Wilcox, Alistair Popple, Ralph Campbell, Zi Yan, Peter Xu,
Will Deacon, linux-mm, linux-kernel
page_vma_mapped_walk() cleanup: rearrange the !pmd_present() block to
follow the same "return not_found, return not_found, return true" pattern
as the block above it (note: returning not_found there is never premature,
since existence or prior existence of huge pmd guarantees good alignment).
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
---
mm/page_vma_mapped.c | 30 ++++++++++++++----------------
1 file changed, 14 insertions(+), 16 deletions(-)
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 81000dd0b5da..b96fae568bc2 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -201,24 +201,22 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
if (pmd_page(pmde) != page)
return not_found(pvmw);
return true;
- } else if (!pmd_present(pmde)) {
- if (thp_migration_supported()) {
- if (!(pvmw->flags & PVMW_MIGRATION))
- return not_found(pvmw);
- if (is_migration_entry(pmd_to_swp_entry(pmde))) {
- swp_entry_t entry = pmd_to_swp_entry(pmde);
+ }
+ if (!pmd_present(pmde)) {
+ swp_entry_t entry;
- if (migration_entry_to_page(entry) != page)
- return not_found(pvmw);
- return true;
- }
- }
- return not_found(pvmw);
- } else {
- /* THP pmd was split under us: handle on pte level */
- spin_unlock(pvmw->ptl);
- pvmw->ptl = NULL;
+ if (!thp_migration_supported() ||
+ !(pvmw->flags & PVMW_MIGRATION))
+ return not_found(pvmw);
+ entry = pmd_to_swp_entry(pmde);
+ if (!is_migration_entry(entry) ||
+ migration_entry_to_page(entry) != page)
+ return not_found(pvmw);
+ return true;
}
+ /* THP pmd was split under us: handle on pte level */
+ spin_unlock(pvmw->ptl);
+ pvmw->ptl = NULL;
} else if (!pmd_present(pmde)) {
/*
* If PVMW_SYNC, take and drop THP pmd lock so that we
--
2.26.2
* Re: [PATCH 05/11] mm: page_vma_mapped_walk(): prettify PVMW_MIGRATION block
2021-06-10 6:42 ` [PATCH 05/11] mm: page_vma_mapped_walk(): prettify PVMW_MIGRATION block Hugh Dickins
@ 2021-06-10 9:16 ` Kirill A. Shutemov
2021-06-10 14:48 ` Peter Xu
1 sibling, 0 replies; 37+ messages in thread
From: Kirill A. Shutemov @ 2021-06-10 9:16 UTC (permalink / raw)
To: Hugh Dickins
Cc: Andrew Morton, Kirill A. Shutemov, Yang Shi, Wang Yugui,
Matthew Wilcox, Alistair Popple, Ralph Campbell, Zi Yan, Peter Xu,
Will Deacon, linux-mm, linux-kernel
On Wed, Jun 09, 2021 at 11:42:12PM -0700, Hugh Dickins wrote:
> page_vma_mapped_walk() cleanup: rearrange the !pmd_present() block to
> follow the same "return not_found, return not_found, return true" pattern
> as the block above it (note: returning not_found there is never premature,
> since existence or prior existence of huge pmd guarantees good alignment).
>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: <stable@vger.kernel.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
--
Kirill A. Shutemov
* Re: [PATCH 05/11] mm: page_vma_mapped_walk(): prettify PVMW_MIGRATION block
2021-06-10 6:42 ` [PATCH 05/11] mm: page_vma_mapped_walk(): prettify PVMW_MIGRATION block Hugh Dickins
2021-06-10 9:16 ` Kirill A. Shutemov
@ 2021-06-10 14:48 ` Peter Xu
1 sibling, 0 replies; 37+ messages in thread
From: Peter Xu @ 2021-06-10 14:48 UTC (permalink / raw)
To: Hugh Dickins
Cc: Andrew Morton, Kirill A. Shutemov, Yang Shi, Wang Yugui,
Matthew Wilcox, Alistair Popple, Ralph Campbell, Zi Yan,
Will Deacon, linux-mm, linux-kernel
On Wed, Jun 09, 2021 at 11:42:12PM -0700, Hugh Dickins wrote:
> page_vma_mapped_walk() cleanup: rearrange the !pmd_present() block to
> follow the same "return not_found, return not_found, return true" pattern
> as the block above it (note: returning not_found there is never premature,
> since existence or prior existence of huge pmd guarantees good alignment).
>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: <stable@vger.kernel.org>
> ---
> mm/page_vma_mapped.c | 30 ++++++++++++++----------------
> 1 file changed, 14 insertions(+), 16 deletions(-)
>
> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> index 81000dd0b5da..b96fae568bc2 100644
> --- a/mm/page_vma_mapped.c
> +++ b/mm/page_vma_mapped.c
> @@ -201,24 +201,22 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> if (pmd_page(pmde) != page)
> return not_found(pvmw);
> return true;
> - } else if (!pmd_present(pmde)) {
> - if (thp_migration_supported()) {
> - if (!(pvmw->flags & PVMW_MIGRATION))
> - return not_found(pvmw);
> - if (is_migration_entry(pmd_to_swp_entry(pmde))) {
> - swp_entry_t entry = pmd_to_swp_entry(pmde);
> + }
> + if (!pmd_present(pmde)) {
> + swp_entry_t entry;
>
> - if (migration_entry_to_page(entry) != page)
> - return not_found(pvmw);
> - return true;
> - }
> - }
> - return not_found(pvmw);
> - } else {
> - /* THP pmd was split under us: handle on pte level */
> - spin_unlock(pvmw->ptl);
> - pvmw->ptl = NULL;
> + if (!thp_migration_supported() ||
> + !(pvmw->flags & PVMW_MIGRATION))
> + return not_found(pvmw);
> + entry = pmd_to_swp_entry(pmde);
> + if (!is_migration_entry(entry) ||
> + migration_entry_to_page(entry) != page)
We'll need to do s/migration_entry_to_page/pfn_swap_entry_to_page/ here, I
guess, depending on whether Alistair's series lands first (as you mentioned in
the cover letter).
Thanks for the change; it does look much better.
Reviewed-by: Peter Xu <peterx@redhat.com>
> + return not_found(pvmw);
> + return true;
> }
> + /* THP pmd was split under us: handle on pte level */
> + spin_unlock(pvmw->ptl);
> + pvmw->ptl = NULL;
> } else if (!pmd_present(pmde)) {
> /*
> * If PVMW_SYNC, take and drop THP pmd lock so that we
> --
> 2.26.2
>
--
Peter Xu
* [PATCH 06/11] mm: page_vma_mapped_walk(): crossing page table boundary
2021-06-10 6:31 [PATCH 00/11] mm: page_vma_mapped_walk() cleanup and THP fixes Hugh Dickins
` (4 preceding siblings ...)
2021-06-10 6:42 ` [PATCH 05/11] mm: page_vma_mapped_walk(): prettify PVMW_MIGRATION block Hugh Dickins
@ 2021-06-10 6:44 ` Hugh Dickins
2021-06-10 9:32 ` Kirill A. Shutemov
2021-06-10 6:46 ` [PATCH 07/11] mm: page_vma_mapped_walk(): add a level of indentation Hugh Dickins
` (4 subsequent siblings)
10 siblings, 1 reply; 37+ messages in thread
From: Hugh Dickins @ 2021-06-10 6:44 UTC (permalink / raw)
To: Andrew Morton
Cc: Hugh Dickins, Kirill A. Shutemov, Yang Shi, Wang Yugui,
Matthew Wilcox, Alistair Popple, Ralph Campbell, Zi Yan, Peter Xu,
Will Deacon, linux-mm, linux-kernel
page_vma_mapped_walk() cleanup: adjust the test for crossing page table
boundary - I believe pvmw->address is always page-aligned, but nothing
else here assumed that; and remember to reset pvmw->pte to NULL after
unmapping the page table, though I never saw any bug from that.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
---
mm/page_vma_mapped.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index b96fae568bc2..0fe6e558d336 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -247,16 +247,16 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
if (pvmw->address >= end)
return not_found(pvmw);
/* Did we cross page table boundary? */
- if (pvmw->address % PMD_SIZE == 0) {
- pte_unmap(pvmw->pte);
+ if ((pvmw->address & (PMD_SIZE - PAGE_SIZE)) == 0) {
if (pvmw->ptl) {
spin_unlock(pvmw->ptl);
pvmw->ptl = NULL;
}
+ pte_unmap(pvmw->pte);
+ pvmw->pte = NULL;
goto restart;
- } else {
- pvmw->pte++;
}
+ pvmw->pte++;
} while (pte_none(*pvmw->pte));
if (!pvmw->ptl) {
--
2.26.2
* Re: [PATCH 06/11] mm: page_vma_mapped_walk(): crossing page table boundary
2021-06-10 6:44 ` [PATCH 06/11] mm: page_vma_mapped_walk(): crossing page table boundary Hugh Dickins
@ 2021-06-10 9:32 ` Kirill A. Shutemov
0 siblings, 0 replies; 37+ messages in thread
From: Kirill A. Shutemov @ 2021-06-10 9:32 UTC (permalink / raw)
To: Hugh Dickins
Cc: Andrew Morton, Kirill A. Shutemov, Yang Shi, Wang Yugui,
Matthew Wilcox, Alistair Popple, Ralph Campbell, Zi Yan, Peter Xu,
Will Deacon, linux-mm, linux-kernel
On Wed, Jun 09, 2021 at 11:44:10PM -0700, Hugh Dickins wrote:
> page_vma_mapped_walk() cleanup: adjust the test for crossing page table
> boundary - I believe pvmw->address is always page-aligned, but nothing
> else here assumed that;
Maybe we should just align the address instead? (PMD_SIZE - PAGE_SIZE) is not
the most obvious mask calculation.
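(A worked example of what that mask does, with common x86-64 values;
illustration only, not kernel code:)

	/*
	 * PAGE_SIZE = 0x1000, PMD_SIZE = 0x200000, so
	 * PMD_SIZE - PAGE_SIZE = 0x1ff000: the bits from PAGE_SHIFT up to,
	 * but not including, PMD_SHIFT.
	 *
	 * (address & 0x1ff000) == 0   "first page of a pmd", ignoring the
	 *                             sub-page bits of address
	 * address % PMD_SIZE == 0     the same, but only when address is
	 *                             also page-aligned (low 12 bits clear)
	 */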
> and remember to reset pvmw->pte to NULL after
> unmapping the page table, though I never saw any bug from that.
Okay, fair enough.
--
Kirill A. Shutemov
* [PATCH 07/11] mm: page_vma_mapped_walk(): add a level of indentation
2021-06-10 6:31 [PATCH 00/11] mm: page_vma_mapped_walk() cleanup and THP fixes Hugh Dickins
` (5 preceding siblings ...)
2021-06-10 6:44 ` [PATCH 06/11] mm: page_vma_mapped_walk(): crossing page table boundary Hugh Dickins
@ 2021-06-10 6:46 ` Hugh Dickins
2021-06-10 9:34 ` Kirill A. Shutemov
2021-06-10 6:48 ` [PATCH 08/11] mm: page_vma_mapped_walk(): use goto instead of while (1) Hugh Dickins
` (3 subsequent siblings)
10 siblings, 1 reply; 37+ messages in thread
From: Hugh Dickins @ 2021-06-10 6:46 UTC (permalink / raw)
To: Andrew Morton
Cc: Hugh Dickins, Kirill A. Shutemov, Yang Shi, Wang Yugui,
Matthew Wilcox, Alistair Popple, Ralph Campbell, Zi Yan, Peter Xu,
Will Deacon, linux-mm, linux-kernel
page_vma_mapped_walk() cleanup: add a level of indentation to much of
the body, making no functional change in this commit, but reducing the
later diff when this is all converted to a loop.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
---
mm/page_vma_mapped.c | 109 +++++++++++++++++++++++----------------------
1 file changed, 56 insertions(+), 53 deletions(-)
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 0fe6e558d336..0840079ef7d2 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -173,65 +173,68 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
if (pvmw->pte)
goto next_pte;
restart:
- pgd = pgd_offset(mm, pvmw->address);
- if (!pgd_present(*pgd))
- return false;
- p4d = p4d_offset(pgd, pvmw->address);
- if (!p4d_present(*p4d))
- return false;
- pud = pud_offset(p4d, pvmw->address);
- if (!pud_present(*pud))
- return false;
-
- pvmw->pmd = pmd_offset(pud, pvmw->address);
- /*
- * Make sure the pmd value isn't cached in a register by the
- * compiler and used as a stale value after we've observed a
- * subsequent update.
- */
- pmde = pmd_read_atomic(pvmw->pmd);
- barrier();
-
- if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) {
- pvmw->ptl = pmd_lock(mm, pvmw->pmd);
- pmde = *pvmw->pmd;
- if (likely(pmd_trans_huge(pmde))) {
- if (pvmw->flags & PVMW_MIGRATION)
- return not_found(pvmw);
- if (pmd_page(pmde) != page)
- return not_found(pvmw);
- return true;
- }
- if (!pmd_present(pmde)) {
- swp_entry_t entry;
+ {
+ pgd = pgd_offset(mm, pvmw->address);
+ if (!pgd_present(*pgd))
+ return false;
+ p4d = p4d_offset(pgd, pvmw->address);
+ if (!p4d_present(*p4d))
+ return false;
+ pud = pud_offset(p4d, pvmw->address);
+ if (!pud_present(*pud))
+ return false;
- if (!thp_migration_supported() ||
- !(pvmw->flags & PVMW_MIGRATION))
- return not_found(pvmw);
- entry = pmd_to_swp_entry(pmde);
- if (!is_migration_entry(entry) ||
- migration_entry_to_page(entry) != page)
- return not_found(pvmw);
- return true;
- }
- /* THP pmd was split under us: handle on pte level */
- spin_unlock(pvmw->ptl);
- pvmw->ptl = NULL;
- } else if (!pmd_present(pmde)) {
+ pvmw->pmd = pmd_offset(pud, pvmw->address);
/*
- * If PVMW_SYNC, take and drop THP pmd lock so that we
- * cannot return prematurely, while zap_huge_pmd() has
- * cleared *pmd but not decremented compound_mapcount().
+ * Make sure the pmd value isn't cached in a register by the
+ * compiler and used as a stale value after we've observed a
+ * subsequent update.
*/
- if ((pvmw->flags & PVMW_SYNC) && PageTransCompound(page)) {
- spinlock_t *ptl = pmd_lock(mm, pvmw->pmd);
+ pmde = pmd_read_atomic(pvmw->pmd);
+ barrier();
+
+ if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) {
+ pvmw->ptl = pmd_lock(mm, pvmw->pmd);
+ pmde = *pvmw->pmd;
+ if (likely(pmd_trans_huge(pmde))) {
+ if (pvmw->flags & PVMW_MIGRATION)
+ return not_found(pvmw);
+ if (pmd_page(pmde) != page)
+ return not_found(pvmw);
+ return true;
+ }
+ if (!pmd_present(pmde)) {
+ swp_entry_t entry;
- spin_unlock(ptl);
+ if (!thp_migration_supported() ||
+ !(pvmw->flags & PVMW_MIGRATION))
+ return not_found(pvmw);
+ entry = pmd_to_swp_entry(pmde);
+ if (!is_migration_entry(entry) ||
+ migration_entry_to_page(entry) != page)
+ return not_found(pvmw);
+ return true;
+ }
+ /* THP pmd was split under us: handle on pte level */
+ spin_unlock(pvmw->ptl);
+ pvmw->ptl = NULL;
+ } else if (!pmd_present(pmde)) {
+ /*
+ * If PVMW_SYNC, take and drop THP pmd lock so that we
+ * cannot return prematurely, while zap_huge_pmd() has
+ * cleared *pmd but not decremented compound_mapcount().
+ */
+ if ((pvmw->flags & PVMW_SYNC) &&
+ PageTransCompound(page)) {
+ spinlock_t *ptl = pmd_lock(mm, pvmw->pmd);
+
+ spin_unlock(ptl);
+ }
+ return false;
}
- return false;
+ if (!map_pte(pvmw))
+ goto next_pte;
}
- if (!map_pte(pvmw))
- goto next_pte;
while (1) {
unsigned long end;
--
2.26.2
* Re: [PATCH 07/11] mm: page_vma_mapped_walk(): add a level of indentation
2021-06-10 6:46 ` [PATCH 07/11] mm: page_vma_mapped_walk(): add a level of indentation Hugh Dickins
@ 2021-06-10 9:34 ` Kirill A. Shutemov
0 siblings, 0 replies; 37+ messages in thread
From: Kirill A. Shutemov @ 2021-06-10 9:34 UTC (permalink / raw)
To: Hugh Dickins
Cc: Andrew Morton, Kirill A. Shutemov, Yang Shi, Wang Yugui,
Matthew Wilcox, Alistair Popple, Ralph Campbell, Zi Yan, Peter Xu,
Will Deacon, linux-mm, linux-kernel
On Wed, Jun 09, 2021 at 11:46:30PM -0700, Hugh Dickins wrote:
> page_vma_mapped_walk() cleanup: add a level of indentation to much of
> the body, making no functional change in this commit, but reducing the
> later diff when this is all converted to a loop.
>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: <stable@vger.kernel.org>
Thanks, it helps a lot.
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
--
Kirill A. Shutemov
* [PATCH 08/11] mm: page_vma_mapped_walk(): use goto instead of while (1)
2021-06-10 6:31 [PATCH 00/11] mm: page_vma_mapped_walk() cleanup and THP fixes Hugh Dickins
` (6 preceding siblings ...)
2021-06-10 6:46 ` [PATCH 07/11] mm: page_vma_mapped_walk(): add a level of indentation Hugh Dickins
@ 2021-06-10 6:48 ` Hugh Dickins
2021-06-10 9:39 ` Kirill A. Shutemov
2021-06-10 6:50 ` [PATCH 09/11] mm: page_vma_mapped_walk(): get vma_address_end() earlier Hugh Dickins
` (2 subsequent siblings)
10 siblings, 1 reply; 37+ messages in thread
From: Hugh Dickins @ 2021-06-10 6:48 UTC (permalink / raw)
To: Andrew Morton
Cc: Hugh Dickins, Kirill A. Shutemov, Yang Shi, Wang Yugui,
Matthew Wilcox, Alistair Popple, Ralph Campbell, Zi Yan, Peter Xu,
Will Deacon, linux-mm, linux-kernel
page_vma_mapped_walk() cleanup: add a label this_pte, matching next_pte,
and use "goto this_pte", in place of the "while (1)" loop at the end.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
---
mm/page_vma_mapped.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 0840079ef7d2..006f4814abbc 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -144,6 +144,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
{
struct mm_struct *mm = pvmw->vma->vm_mm;
struct page *page = pvmw->page;
+ unsigned long end;
pgd_t *pgd;
p4d_t *p4d;
pud_t *pud;
@@ -234,10 +235,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
}
if (!map_pte(pvmw))
goto next_pte;
- }
- while (1) {
- unsigned long end;
-
+this_pte:
if (check_pte(pvmw))
return true;
next_pte:
@@ -266,6 +264,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
pvmw->ptl = pte_lockptr(mm, pvmw->pmd);
spin_lock(pvmw->ptl);
}
+ goto this_pte;
}
}
--
2.26.2
* Re: [PATCH 08/11] mm: page_vma_mapped_walk(): use goto instead of while (1)
2021-06-10 6:48 ` [PATCH 08/11] mm: page_vma_mapped_walk(): use goto instead of while (1) Hugh Dickins
@ 2021-06-10 9:39 ` Kirill A. Shutemov
0 siblings, 0 replies; 37+ messages in thread
From: Kirill A. Shutemov @ 2021-06-10 9:39 UTC (permalink / raw)
To: Hugh Dickins
Cc: Andrew Morton, Kirill A. Shutemov, Yang Shi, Wang Yugui,
Matthew Wilcox, Alistair Popple, Ralph Campbell, Zi Yan, Peter Xu,
Will Deacon, linux-mm, linux-kernel
On Wed, Jun 09, 2021 at 11:48:27PM -0700, Hugh Dickins wrote:
> page_vma_mapped_walk() cleanup: add a label this_pte, matching next_pte,
> and use "goto this_pte", in place of the "while (1)" loop at the end.
>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: <stable@vger.kernel.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
--
Kirill A. Shutemov
* [PATCH 09/11] mm: page_vma_mapped_walk(): get vma_address_end() earlier
2021-06-10 6:31 [PATCH 00/11] mm: page_vma_mapped_walk() cleanup and THP fixes Hugh Dickins
` (7 preceding siblings ...)
2021-06-10 6:48 ` [PATCH 08/11] mm: page_vma_mapped_walk(): use goto instead of while (1) Hugh Dickins
@ 2021-06-10 6:50 ` Hugh Dickins
2021-06-10 9:40 ` Kirill A. Shutemov
2021-06-10 6:52 ` [PATCH 10/11] mm/thp: fix page_vma_mapped_walk() if THP mapped by ptes Hugh Dickins
2021-06-10 6:54 ` [PATCH 11/11] mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk() Hugh Dickins
10 siblings, 1 reply; 37+ messages in thread
From: Hugh Dickins @ 2021-06-10 6:50 UTC (permalink / raw)
To: Andrew Morton
Cc: Hugh Dickins, Kirill A. Shutemov, Yang Shi, Wang Yugui,
Matthew Wilcox, Alistair Popple, Ralph Campbell, Zi Yan, Peter Xu,
Will Deacon, linux-mm, linux-kernel
page_vma_mapped_walk() cleanup: get THP's vma_address_end() at the start,
rather than later at next_pte. It's a little unnecessary overhead on the
first call, but makes for a simpler loop in the following commit.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
---
mm/page_vma_mapped.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 006f4814abbc..f6839f536645 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -171,6 +171,15 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
return true;
}
+ /*
+ * Seek to next pte only makes sense for THP.
+ * But more important than that optimization, is to filter out
+ * any PageKsm page: whose page->index misleads vma_address()
+ * and vma_address_end() to disaster.
+ */
+ end = PageTransCompound(page) ?
+ vma_address_end(page, pvmw->vma) :
+ pvmw->address + PAGE_SIZE;
if (pvmw->pte)
goto next_pte;
restart:
@@ -239,10 +248,6 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
if (check_pte(pvmw))
return true;
next_pte:
- /* Seek to next pte only makes sense for THP */
- if (!PageTransHuge(page))
- return not_found(pvmw);
- end = vma_address_end(page, pvmw->vma);
do {
pvmw->address += PAGE_SIZE;
if (pvmw->address >= end)
--
2.26.2
* Re: [PATCH 09/11] mm: page_vma_mapped_walk(): get vma_address_end() earlier
2021-06-10 6:50 ` [PATCH 09/11] mm: page_vma_mapped_walk(): get vma_address_end() earlier Hugh Dickins
@ 2021-06-10 9:40 ` Kirill A. Shutemov
0 siblings, 0 replies; 37+ messages in thread
From: Kirill A. Shutemov @ 2021-06-10 9:40 UTC (permalink / raw)
To: Hugh Dickins
Cc: Andrew Morton, Kirill A. Shutemov, Yang Shi, Wang Yugui,
Matthew Wilcox, Alistair Popple, Ralph Campbell, Zi Yan, Peter Xu,
Will Deacon, linux-mm, linux-kernel
On Wed, Jun 09, 2021 at 11:50:23PM -0700, Hugh Dickins wrote:
> page_vma_mapped_walk() cleanup: get THP's vma_address_end() at the start,
> rather than later at next_pte. It's a little unnecessary overhead on the
> first call, but makes for a simpler loop in the following commit.
>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: <stable@vger.kernel.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
--
Kirill A. Shutemov
* [PATCH 10/11] mm/thp: fix page_vma_mapped_walk() if THP mapped by ptes
2021-06-10 6:31 [PATCH 00/11] mm: page_vma_mapped_walk() cleanup and THP fixes Hugh Dickins
` (8 preceding siblings ...)
2021-06-10 6:50 ` [PATCH 09/11] mm: page_vma_mapped_walk(): get vma_address_end() earlier Hugh Dickins
@ 2021-06-10 6:52 ` Hugh Dickins
2021-06-10 9:42 ` Kirill A. Shutemov
2021-06-10 6:54 ` [PATCH 11/11] mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk() Hugh Dickins
10 siblings, 1 reply; 37+ messages in thread
From: Hugh Dickins @ 2021-06-10 6:52 UTC (permalink / raw)
To: Andrew Morton
Cc: Hugh Dickins, Kirill A. Shutemov, Yang Shi, Wang Yugui,
Matthew Wilcox, Alistair Popple, Ralph Campbell, Zi Yan, Peter Xu,
Will Deacon, linux-mm, linux-kernel
Running certain tests with a DEBUG_VM kernel would crash within hours,
on the total_mapcount BUG() in split_huge_page_to_list(), while trying
to free up some memory by punching a hole in a shmem huge page: split's
try_to_unmap() was unable to find all the mappings of the page (which,
on a !DEBUG_VM kernel, would then keep the huge page pinned in memory).
Crash dumps showed two tail pages of a shmem huge page remained mapped
by pte: ptes in a non-huge-aligned vma of a gVisor process, at the end
of a long unmapped range; and no page table had yet been allocated for
the head of the huge page to be mapped into.
Although designed to handle these odd misaligned huge-page-mapped-by-pte
cases, page_vma_mapped_walk() falls short by returning false prematurely
when !pmd_present or !pud_present or !p4d_present or !pgd_present: there
are cases when a huge page may span the boundary, with ptes present in
the next.
Restructure page_vma_mapped_walk() as a loop to continue in these cases,
while keeping its layout much as before. Add a step_forward() helper to
advance pvmw->address across those boundaries: originally I tried to use
mm's standard p?d_addr_end() macros, but hit the same crash 512 times
less often: because of the way redundant levels are folded together,
but folded differently in different configurations, it was just too
difficult to use them correctly; and step_forward() is simpler anyway.
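(For comparison, the stock boundary helpers look roughly like this -
paraphrased, so treat it as a sketch: they clamp to an end address, and fold
away differently depending on configuration, whereas step_forward() in the diff
below just rounds the address up to the next boundary and saturates at
ULONG_MAX, leaving the end check to the loop condition.)

	/* Roughly the generic helper: advance to the next pgd boundary,
	 * but never past 'end'. */
	#define pgd_addr_end(addr, end)						\
	({	unsigned long __boundary = ((addr) + PGDIR_SIZE) & PGDIR_MASK;	\
		(__boundary - 1 < (end) - 1) ? __boundary : (end);		\
	})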
Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
---
mm/page_vma_mapped.c | 34 +++++++++++++++++++++++++---------
1 file changed, 25 insertions(+), 9 deletions(-)
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index f6839f536645..6eb2f1863506 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -116,6 +116,13 @@ static bool check_pte(struct page_vma_mapped_walk *pvmw)
return pfn_is_match(pvmw->page, pfn);
}
+static void step_forward(struct page_vma_mapped_walk *pvmw, unsigned long size)
+{
+ pvmw->address = (pvmw->address + size) & ~(size - 1);
+ if (!pvmw->address)
+ pvmw->address = ULONG_MAX;
+}
+
/**
* page_vma_mapped_walk - check if @pvmw->page is mapped in @pvmw->vma at
* @pvmw->address
@@ -183,16 +190,22 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
if (pvmw->pte)
goto next_pte;
restart:
- {
+ do {
pgd = pgd_offset(mm, pvmw->address);
- if (!pgd_present(*pgd))
- return false;
+ if (!pgd_present(*pgd)) {
+ step_forward(pvmw, PGDIR_SIZE);
+ continue;
+ }
p4d = p4d_offset(pgd, pvmw->address);
- if (!p4d_present(*p4d))
- return false;
+ if (!p4d_present(*p4d)) {
+ step_forward(pvmw, P4D_SIZE);
+ continue;
+ }
pud = pud_offset(p4d, pvmw->address);
- if (!pud_present(*pud))
- return false;
+ if (!pud_present(*pud)) {
+ step_forward(pvmw, PUD_SIZE);
+ continue;
+ }
pvmw->pmd = pmd_offset(pud, pvmw->address);
/*
@@ -240,7 +253,8 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
spin_unlock(ptl);
}
- return false;
+ step_forward(pvmw, PMD_SIZE);
+ continue;
}
if (!map_pte(pvmw))
goto next_pte;
@@ -270,7 +284,9 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
spin_lock(pvmw->ptl);
}
goto this_pte;
- }
+ } while (pvmw->address < end);
+
+ return false;
}
/**
--
2.26.2
* Re: [PATCH 10/11] mm/thp: fix page_vma_mapped_walk() if THP mapped by ptes
2021-06-10 6:52 ` [PATCH 10/11] mm/thp: fix page_vma_mapped_walk() if THP mapped by ptes Hugh Dickins
@ 2021-06-10 9:42 ` Kirill A. Shutemov
0 siblings, 0 replies; 37+ messages in thread
From: Kirill A. Shutemov @ 2021-06-10 9:42 UTC (permalink / raw)
To: Hugh Dickins
Cc: Andrew Morton, Kirill A. Shutemov, Yang Shi, Wang Yugui,
Matthew Wilcox, Alistair Popple, Ralph Campbell, Zi Yan, Peter Xu,
Will Deacon, linux-mm, linux-kernel
On Wed, Jun 09, 2021 at 11:52:37PM -0700, Hugh Dickins wrote:
> Running certain tests with a DEBUG_VM kernel would crash within hours,
> on the total_mapcount BUG() in split_huge_page_to_list(), while trying
> to free up some memory by punching a hole in a shmem huge page: split's
> try_to_unmap() was unable to find all the mappings of the page (which,
> on a !DEBUG_VM kernel, would then keep the huge page pinned in memory).
>
> Crash dumps showed two tail pages of a shmem huge page remained mapped
> by pte: ptes in a non-huge-aligned vma of a gVisor process, at the end
> of a long unmapped range; and no page table had yet been allocated for
> the head of the huge page to be mapped into.
>
> Although designed to handle these odd misaligned huge-page-mapped-by-pte
> cases, page_vma_mapped_walk() falls short by returning false prematurely
> when !pmd_present or !pud_present or !p4d_present or !pgd_present: there
> are cases when a huge page may span the boundary, with ptes present in
> the next.
>
> Restructure page_vma_mapped_walk() as a loop to continue in these cases,
> while keeping its layout much as before. Add a step_forward() helper to
> advance pvmw->address across those boundaries: originally I tried to use
> mm's standard p?d_addr_end() macros, but hit the same crash 512 times
> less often: because of the way redundant levels are folded together,
> but folded differently in different configurations, it was just too
> difficult to use them correctly; and step_forward() is simpler anyway.
>
> Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()")
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: <stable@vger.kernel.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
--
Kirill A. Shutemov
* [PATCH 11/11] mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk()
2021-06-10 6:31 [PATCH 00/11] mm: page_vma_mapped_walk() cleanup and THP fixes Hugh Dickins
` (9 preceding siblings ...)
2021-06-10 6:52 ` [PATCH 10/11] mm/thp: fix page_vma_mapped_walk() if THP mapped by ptes Hugh Dickins
@ 2021-06-10 6:54 ` Hugh Dickins
2021-06-10 9:43 ` Kirill A. Shutemov
10 siblings, 1 reply; 37+ messages in thread
From: Hugh Dickins @ 2021-06-10 6:54 UTC (permalink / raw)
To: Andrew Morton
Cc: Hugh Dickins, Kirill A. Shutemov, Yang Shi, Wang Yugui,
Matthew Wilcox, Alistair Popple, Ralph Campbell, Zi Yan, Peter Xu,
Will Deacon, linux-mm, linux-kernel
Aha! Shouldn't that quick scan over pte_none()s make sure that it holds
ptlock in the PVMW_SYNC case? That too might have been responsible for
BUGs or WARNs in split_huge_page_to_list() or its unmap_page(), though
I've never seen any.
Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
---
mm/page_vma_mapped.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 6eb2f1863506..7ae4a016304b 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -277,6 +277,10 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
goto restart;
}
pvmw->pte++;
+ if ((pvmw->flags & PVMW_SYNC) && !pvmw->ptl) {
+ pvmw->ptl = pte_lockptr(mm, pvmw->pmd);
+ spin_lock(pvmw->ptl);
+ }
} while (pte_none(*pvmw->pte));
if (!pvmw->ptl) {
--
2.26.2
* Re: [PATCH 11/11] mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk()
2021-06-10 6:54 ` [PATCH 11/11] mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk() Hugh Dickins
@ 2021-06-10 9:43 ` Kirill A. Shutemov
2021-06-11 18:29 ` Hugh Dickins
0 siblings, 1 reply; 37+ messages in thread
From: Kirill A. Shutemov @ 2021-06-10 9:43 UTC (permalink / raw)
To: Hugh Dickins
Cc: Andrew Morton, Kirill A. Shutemov, Yang Shi, Wang Yugui,
Matthew Wilcox, Alistair Popple, Ralph Campbell, Zi Yan, Peter Xu,
Will Deacon, linux-mm, linux-kernel
On Wed, Jun 09, 2021 at 11:54:46PM -0700, Hugh Dickins wrote:
> Aha! Shouldn't that quick scan over pte_none()s make sure that it holds
> ptlock in the PVMW_SYNC case? That too might have been responsible for
> BUGs or WARNs in split_huge_page_to_list() or its unmap_page(), though
> I've never seen any.
>
> Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()")
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: <stable@vger.kernel.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
--
Kirill A. Shutemov
* Re: [PATCH 11/11] mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk()
2021-06-10 9:43 ` Kirill A. Shutemov
@ 2021-06-11 18:29 ` Hugh Dickins
0 siblings, 0 replies; 37+ messages in thread
From: Hugh Dickins @ 2021-06-11 18:29 UTC (permalink / raw)
To: Andrew Morton
Cc: Hugh Dickins, Kirill A. Shutemov, Kirill A. Shutemov, Yang Shi,
Wang Yugui, Matthew Wilcox, Alistair Popple, Ralph Campbell,
Zi Yan, Peter Xu, Will Deacon, linux-mm, linux-kernel
On Thu, 10 Jun 2021, Kirill A. Shutemov wrote:
> On Wed, Jun 09, 2021 at 11:54:46PM -0700, Hugh Dickins wrote:
> > Aha! Shouldn't that quick scan over pte_none()s make sure that it holds
> > ptlock in the PVMW_SYNC case? That too might have been responsible for
> > BUGs or WARNs in split_huge_page_to_list() or its unmap_page(), though
> > I've never seen any.
> >
> > Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()")
> > Signed-off-by: Hugh Dickins <hughd@google.com>
> > Cc: <stable@vger.kernel.org>
>
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Thanks Kirill.
And Wang Yugui has now reported the good news that this afterthought patch
finally fixes the unmap_page() BUGs they were hitting on 5.10.
Andrew, please add a link to
https://lore.kernel.org/linux-mm/20210412180659.B9E3.409509F4@e16-tech.com/
and
Tested-by: Wang Yugui <wangyugui@e16-tech.com>
Thanks,
Hugh