* [PATCH v2 0/6] A few fixup patches for hugetlb
@ 2022-08-23 3:02 Miaohe Lin
2022-08-23 3:02 ` [PATCH v2 1/6] mm/hugetlb: fix incorrect update of max_huge_pages Miaohe Lin
` (5 more replies)
0 siblings, 6 replies; 8+ messages in thread
From: Miaohe Lin @ 2022-08-23 3:02 UTC (permalink / raw)
To: akpm, mike.kravetz, songmuchun
Cc: lukas.bulwahn, linux-mm, linux-kernel, linmiaohe
Hi everyone,
This series contains a few fixup patches to fix incorrect update of
max_huge_pages, fix WARN_ON(!kobj) in sysfs_create_group() and so on.
More details can be found in the respective changelogs.
Thanks!
---
v2:
Collect Reviewed-by tag per Mike, Muchun and Fengwei. Thanks!
Remove err == -EEXIST check and retry logic in 3/6.
Remove unused local variable dst_entry per Lukas. Thanks!
---
Miaohe Lin (6):
mm/hugetlb: fix incorrect update of max_huge_pages
mm/hugetlb: fix WARN_ON(!kobj) in sysfs_create_group()
mm/hugetlb: fix missing call to restore_reserve_on_error()
mm: hugetlb_vmemmap: add missing smp_wmb() before set_pte_at()
mm/hugetlb: fix sysfs group leak in hugetlb_unregister_node()
mm/hugetlb: make detecting shared pte more reliable
mm/hugetlb.c | 61 +++++++++++++++++++++++++++-----------------
mm/hugetlb_vmemmap.c | 5 ++++
2 files changed, 42 insertions(+), 24 deletions(-)
--
2.23.0
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v2 1/6] mm/hugetlb: fix incorrect update of max_huge_pages
2022-08-23 3:02 [PATCH v2 0/6] A few fixup patches for hugetlb Miaohe Lin
@ 2022-08-23 3:02 ` Miaohe Lin
2022-08-23 3:02 ` [PATCH v2 2/6] mm/hugetlb: fix WARN_ON(!kobj) in sysfs_create_group() Miaohe Lin
` (4 subsequent siblings)
5 siblings, 0 replies; 8+ messages in thread
From: Miaohe Lin @ 2022-08-23 3:02 UTC (permalink / raw)
To: akpm, mike.kravetz, songmuchun
Cc: lukas.bulwahn, linux-mm, linux-kernel, linmiaohe
There should be pages_per_huge_page(h) / pages_per_huge_page(target_hstate)
pages incremented for target_hstate->max_huge_pages when page is demoted.
Update max_huge_pages accordingly for consistency.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
---
mm/hugetlb.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f8f1540108ab..19a7a83af569 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3473,7 +3473,8 @@ static int demote_free_huge_page(struct hstate *h, struct page *page)
* based on pool changes for the demoted page.
*/
h->max_huge_pages--;
- target_hstate->max_huge_pages += pages_per_huge_page(h);
+ target_hstate->max_huge_pages +=
+ pages_per_huge_page(h) / pages_per_huge_page(target_hstate);
return rc;
}
--
2.23.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v2 2/6] mm/hugetlb: fix WARN_ON(!kobj) in sysfs_create_group()
2022-08-23 3:02 [PATCH v2 0/6] A few fixup patches for hugetlb Miaohe Lin
2022-08-23 3:02 ` [PATCH v2 1/6] mm/hugetlb: fix incorrect update of max_huge_pages Miaohe Lin
@ 2022-08-23 3:02 ` Miaohe Lin
2022-08-23 3:02 ` [PATCH v2 3/6] mm/hugetlb: fix missing call to restore_reserve_on_error() Miaohe Lin
` (3 subsequent siblings)
5 siblings, 0 replies; 8+ messages in thread
From: Miaohe Lin @ 2022-08-23 3:02 UTC (permalink / raw)
To: akpm, mike.kravetz, songmuchun
Cc: lukas.bulwahn, linux-mm, linux-kernel, linmiaohe
If sysfs_create_group() fails with hstate_attr_group, hstate_kobjs[hi]
will be set to NULL. Then it will be passed to sysfs_create_group() if
h->demote_order != 0 thus triggering WARN_ON(!kobj) check. Fix this by
making sure hstate_kobjs[hi] != NULL when calling sysfs_create_group.
Fixes: 79dfc695525f ("hugetlb: add demote hugetlb page sysfs interfaces")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
---
mm/hugetlb.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 19a7a83af569..d46dfe5ba62c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3847,6 +3847,7 @@ static int hugetlb_sysfs_add_hstate(struct hstate *h, struct kobject *parent,
if (retval) {
kobject_put(hstate_kobjs[hi]);
hstate_kobjs[hi] = NULL;
+ return retval;
}
if (h->demote_order) {
--
2.23.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v2 3/6] mm/hugetlb: fix missing call to restore_reserve_on_error()
2022-08-23 3:02 [PATCH v2 0/6] A few fixup patches for hugetlb Miaohe Lin
2022-08-23 3:02 ` [PATCH v2 1/6] mm/hugetlb: fix incorrect update of max_huge_pages Miaohe Lin
2022-08-23 3:02 ` [PATCH v2 2/6] mm/hugetlb: fix WARN_ON(!kobj) in sysfs_create_group() Miaohe Lin
@ 2022-08-23 3:02 ` Miaohe Lin
2022-08-24 18:21 ` Mike Kravetz
2022-08-23 3:02 ` [PATCH v2 4/6] mm: hugetlb_vmemmap: add missing smp_wmb() before set_pte_at() Miaohe Lin
` (2 subsequent siblings)
5 siblings, 1 reply; 8+ messages in thread
From: Miaohe Lin @ 2022-08-23 3:02 UTC (permalink / raw)
To: akpm, mike.kravetz, songmuchun
Cc: lukas.bulwahn, linux-mm, linux-kernel, linmiaohe
When huge_add_to_page_cache() fails, the page is freed directly without
calling restore_reserve_on_error() to restore reserve for newly allocated
pages not in page cache. Fix this by calling restore_reserve_on_error()
when huge_add_to_page_cache fails.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
mm/hugetlb.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d46dfe5ba62c..8e62da153c64 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5576,7 +5576,6 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
if (idx >= size)
goto out;
-retry:
new_page = false;
page = find_lock_page(mapping, idx);
if (!page) {
@@ -5616,9 +5615,15 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
if (vma->vm_flags & VM_MAYSHARE) {
int err = huge_add_to_page_cache(page, mapping, idx);
if (err) {
+ /*
+ * err can't be -EEXIST which implies someone
+ * else consumed the reservation since hugetlb
+ * fault mutex is held when add a hugetlb page
+ * to the page cache. So it's safe to call
+ * restore_reserve_on_error() here.
+ */
+ restore_reserve_on_error(h, vma, haddr, page);
put_page(page);
- if (err == -EEXIST)
- goto retry;
goto out;
}
new_pagecache_page = true;
--
2.23.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v2 4/6] mm: hugetlb_vmemmap: add missing smp_wmb() before set_pte_at()
2022-08-23 3:02 [PATCH v2 0/6] A few fixup patches for hugetlb Miaohe Lin
` (2 preceding siblings ...)
2022-08-23 3:02 ` [PATCH v2 3/6] mm/hugetlb: fix missing call to restore_reserve_on_error() Miaohe Lin
@ 2022-08-23 3:02 ` Miaohe Lin
2022-08-23 3:02 ` [PATCH v2 5/6] mm/hugetlb: fix sysfs group leak in hugetlb_unregister_node() Miaohe Lin
2022-08-23 3:02 ` [PATCH v2 6/6] mm/hugetlb: make detecting shared pte more reliable Miaohe Lin
5 siblings, 0 replies; 8+ messages in thread
From: Miaohe Lin @ 2022-08-23 3:02 UTC (permalink / raw)
To: akpm, mike.kravetz, songmuchun
Cc: lukas.bulwahn, linux-mm, linux-kernel, linmiaohe
The memory barrier smp_wmb() is needed to make sure that preceding stores
to the page contents become visible before the below set_pte_at() write.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
---
mm/hugetlb_vmemmap.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 196159000897..ba2a2596fb4e 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -286,6 +286,11 @@ static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
copy_page(to, (void *)walk->reuse_addr);
reset_struct_pages(to);
+ /*
+ * Makes sure that preceding stores to the page contents become visible
+ * before the set_pte_at() write.
+ */
+ smp_wmb();
set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot));
}
--
2.23.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v2 5/6] mm/hugetlb: fix sysfs group leak in hugetlb_unregister_node()
2022-08-23 3:02 [PATCH v2 0/6] A few fixup patches for hugetlb Miaohe Lin
` (3 preceding siblings ...)
2022-08-23 3:02 ` [PATCH v2 4/6] mm: hugetlb_vmemmap: add missing smp_wmb() before set_pte_at() Miaohe Lin
@ 2022-08-23 3:02 ` Miaohe Lin
2022-08-23 3:02 ` [PATCH v2 6/6] mm/hugetlb: make detecting shared pte more reliable Miaohe Lin
5 siblings, 0 replies; 8+ messages in thread
From: Miaohe Lin @ 2022-08-23 3:02 UTC (permalink / raw)
To: akpm, mike.kravetz, songmuchun
Cc: lukas.bulwahn, linux-mm, linux-kernel, linmiaohe
The sysfs group per_node_hstate_attr_group and hstate_demote_attr_group
when h->demote_order != 0 are created in hugetlb_register_node(). But
these sysfs groups are not removed when unregister the node, thus sysfs
group is leaked. Using sysfs_remove_group() to fix this issue.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Fengwei Yin <fengwei.yin@intel.com>
---
mm/hugetlb.c | 25 ++++++++++++++++++-------
1 file changed, 18 insertions(+), 7 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8e62da153c64..2dfd10599f98 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3851,12 +3851,18 @@ static int hugetlb_sysfs_add_hstate(struct hstate *h, struct kobject *parent,
}
if (h->demote_order) {
- if (sysfs_create_group(hstate_kobjs[hi],
- &hstate_demote_attr_group))
+ retval = sysfs_create_group(hstate_kobjs[hi],
+ &hstate_demote_attr_group);
+ if (retval) {
pr_warn("HugeTLB unable to create demote interfaces for %s\n", h->name);
+ sysfs_remove_group(hstate_kobjs[hi], hstate_attr_group);
+ kobject_put(hstate_kobjs[hi]);
+ hstate_kobjs[hi] = NULL;
+ return retval;
+ }
}
- return retval;
+ return 0;
}
static void __init hugetlb_sysfs_init(void)
@@ -3942,10 +3948,15 @@ static void hugetlb_unregister_node(struct node *node)
for_each_hstate(h) {
int idx = hstate_index(h);
- if (nhs->hstate_kobjs[idx]) {
- kobject_put(nhs->hstate_kobjs[idx]);
- nhs->hstate_kobjs[idx] = NULL;
- }
+ struct kobject *hstate_kobj = nhs->hstate_kobjs[idx];
+
+ if (!hstate_kobj)
+ continue;
+ if (h->demote_order)
+ sysfs_remove_group(hstate_kobj, &hstate_demote_attr_group);
+ sysfs_remove_group(hstate_kobj, &per_node_hstate_attr_group);
+ kobject_put(hstate_kobj);
+ nhs->hstate_kobjs[idx] = NULL;
}
kobject_put(nhs->hugepages_kobj);
--
2.23.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v2 6/6] mm/hugetlb: make detecting shared pte more reliable
2022-08-23 3:02 [PATCH v2 0/6] A few fixup patches for hugetlb Miaohe Lin
` (4 preceding siblings ...)
2022-08-23 3:02 ` [PATCH v2 5/6] mm/hugetlb: fix sysfs group leak in hugetlb_unregister_node() Miaohe Lin
@ 2022-08-23 3:02 ` Miaohe Lin
5 siblings, 0 replies; 8+ messages in thread
From: Miaohe Lin @ 2022-08-23 3:02 UTC (permalink / raw)
To: akpm, mike.kravetz, songmuchun
Cc: lukas.bulwahn, linux-mm, linux-kernel, linmiaohe
If the pagetables are shared, we shouldn't copy or take references. Since
src could have unshared and dst shares with another vma, huge_pte_none()
is thus used to determine whether dst_pte is shared. But this check isn't
reliable. A shared pte could have pte none in pagetable in fact. The page
count of ptep page should be checked here in order to reliably determine
whether pte is shared.
[Thanks Lukas for cleanup unused local variable dst_entry.]
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
---
mm/hugetlb.c | 21 ++++++++-------------
1 file changed, 8 insertions(+), 13 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 2dfd10599f98..8aa62765a055 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4763,7 +4763,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
struct vm_area_struct *dst_vma,
struct vm_area_struct *src_vma)
{
- pte_t *src_pte, *dst_pte, entry, dst_entry;
+ pte_t *src_pte, *dst_pte, entry;
struct page *ptepage;
unsigned long addr;
bool cow = is_cow_mapping(src_vma->vm_flags);
@@ -4808,15 +4808,13 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
/*
* If the pagetables are shared don't copy or take references.
- * dst_pte == src_pte is the common case of src/dest sharing.
*
+ * dst_pte == src_pte is the common case of src/dest sharing.
* However, src could have 'unshared' and dst shares with
- * another vma. If dst_pte !none, this implies sharing.
- * Check here before taking page table lock, and once again
- * after taking the lock below.
+ * another vma. So page_count of ptep page is checked instead
+ * to reliably determine whether pte is shared.
*/
- dst_entry = huge_ptep_get(dst_pte);
- if ((dst_pte == src_pte) || !huge_pte_none(dst_entry)) {
+ if (page_count(virt_to_page(dst_pte)) > 1) {
addr |= last_addr_mask;
continue;
}
@@ -4825,13 +4823,10 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
src_ptl = huge_pte_lockptr(h, src, src_pte);
spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
entry = huge_ptep_get(src_pte);
- dst_entry = huge_ptep_get(dst_pte);
again:
- if (huge_pte_none(entry) || !huge_pte_none(dst_entry)) {
+ if (huge_pte_none(entry)) {
/*
- * Skip if src entry none. Also, skip in the
- * unlikely case dst entry !none as this implies
- * sharing with another vma.
+ * Skip if src entry none.
*/
;
} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry))) {
@@ -4910,7 +4905,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
restore_reserve_on_error(h, dst_vma, addr,
new);
put_page(new);
- /* dst_entry won't change as in child */
+ /* huge_ptep of dst_pte won't change as in child */
goto again;
}
hugetlb_install_page(dst_vma, dst_pte, addr, new);
--
2.23.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v2 3/6] mm/hugetlb: fix missing call to restore_reserve_on_error()
2022-08-23 3:02 ` [PATCH v2 3/6] mm/hugetlb: fix missing call to restore_reserve_on_error() Miaohe Lin
@ 2022-08-24 18:21 ` Mike Kravetz
0 siblings, 0 replies; 8+ messages in thread
From: Mike Kravetz @ 2022-08-24 18:21 UTC (permalink / raw)
To: Miaohe Lin; +Cc: akpm, songmuchun, lukas.bulwahn, linux-mm, linux-kernel
On 08/23/22 11:02, Miaohe Lin wrote:
> When huge_add_to_page_cache() fails, the page is freed directly without
> calling restore_reserve_on_error() to restore reserve for newly allocated
> pages not in page cache. Fix this by calling restore_reserve_on_error()
> when huge_add_to_page_cache fails.
>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
> mm/hugetlb.c | 11 ++++++++---
> 1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index d46dfe5ba62c..8e62da153c64 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5576,7 +5576,6 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
> if (idx >= size)
> goto out;
>
> -retry:
> new_page = false;
> page = find_lock_page(mapping, idx);
> if (!page) {
> @@ -5616,9 +5615,15 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
> if (vma->vm_flags & VM_MAYSHARE) {
> int err = huge_add_to_page_cache(page, mapping, idx);
> if (err) {
> + /*
> + * err can't be -EEXIST which implies someone
> + * else consumed the reservation since hugetlb
> + * fault mutex is held when add a hugetlb page
> + * to the page cache. So it's safe to call
> + * restore_reserve_on_error() here.
> + */
> + restore_reserve_on_error(h, vma, haddr, page);
> put_page(page);
> - if (err == -EEXIST)
> - goto retry;
Thanks for removing this check and adding the comment.
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
--
Mike Kravetz
> goto out;
> }
> new_pagecache_page = true;
> --
> 2.23.0
>
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2022-08-24 18:22 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-08-23 3:02 [PATCH v2 0/6] A few fixup patches for hugetlb Miaohe Lin
2022-08-23 3:02 ` [PATCH v2 1/6] mm/hugetlb: fix incorrect update of max_huge_pages Miaohe Lin
2022-08-23 3:02 ` [PATCH v2 2/6] mm/hugetlb: fix WARN_ON(!kobj) in sysfs_create_group() Miaohe Lin
2022-08-23 3:02 ` [PATCH v2 3/6] mm/hugetlb: fix missing call to restore_reserve_on_error() Miaohe Lin
2022-08-24 18:21 ` Mike Kravetz
2022-08-23 3:02 ` [PATCH v2 4/6] mm: hugetlb_vmemmap: add missing smp_wmb() before set_pte_at() Miaohe Lin
2022-08-23 3:02 ` [PATCH v2 5/6] mm/hugetlb: fix sysfs group leak in hugetlb_unregister_node() Miaohe Lin
2022-08-23 3:02 ` [PATCH v2 6/6] mm/hugetlb: make detecting shared pte more reliable Miaohe Lin
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.