From: Andrew Morton <akpm@linux-foundation.org>
To: mm-commits@vger.kernel.org,osalvador@suse.de,naoya.horiguchi@nec.com,muchun.song@linux.dev,mhocko@kernel.org,linmiaohe@huawei.com,david@redhat.com,baolin.wang@linux.alibaba.com,akpm@linux-foundation.org
Subject: + mm-hugetlb-make-the-hugetlb-migration-strategy-consistent.patch added to mm-unstable branch
Date: Thu, 21 Mar 2024 18:04:07 -0700 [thread overview]
Message-ID: <20240322010407.E2AD2C433F1@smtp.kernel.org> (raw)
The patch titled
Subject: mm: hugetlb: make the hugetlb migration strategy consistent
has been added to the -mm mm-unstable branch. Its filename is
mm-hugetlb-make-the-hugetlb-migration-strategy-consistent.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-hugetlb-make-the-hugetlb-migration-strategy-consistent.patch
This patch will later appear in the mm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Baolin Wang <baolin.wang@linux.alibaba.com>
Subject: mm: hugetlb: make the hugetlb migration strategy consistent
Date: Wed, 6 Mar 2024 18:13:27 +0800
As discussed in previous thread [1], there is an inconsistency when
handing hugetlb migration. When handling the migration of freed hugetlb,
it prevents fallback to other NUMA nodes in
alloc_and_dissolve_hugetlb_folio(). However, when dealing with in-use
hugetlb, it allows fallback to other NUMA nodes in
alloc_hugetlb_folio_nodemask(), which can break the per-node hugetlb pool
and might result in unexpected failures when node bound workloads doesn't
get what is asssumed available.
To make hugetlb migration strategy more clear, we should list all the scenarios
of hugetlb migration and analyze whether allocation fallback is permitted:
1) Memory offline: will call dissolve_free_huge_pages() to free the
freed hugetlb, and call do_migrate_range() to migrate the in-use
hugetlb. Both can break the per-node hugetlb pool, but as this is an
explicit offlining operation, no better choice. So should allow the
hugetlb allocation fallback.
2) Memory failure: same as memory offline. Should allow fallback to a
different node might be the only option to handle it, otherwise the
impact of poisoned memory can be amplified.
3) Longterm pinning: will call migrate_longterm_unpinnable_pages() to
migrate in-use and not-longterm-pinnable hugetlb, which can break the
per-node pool. But we should fail to longterm pinning if can not
allocate on current node to avoid breaking the per-node pool.
4) Syscalls (mbind, migrate_pages, move_pages): these are explicit
users operation to move pages to other nodes, so fallback to other
nodes should not be prohibited.
5) alloc_contig_range: used by CMA allocation and virtio-mem
fake-offline to allocate given range of pages. Now the freed hugetlb
migration is not allowed to fallback, to keep consistency, the in-use
hugetlb migration should be also not allowed to fallback.
6) alloc_contig_pages: used by kfence, pgtable_debug etc. The strategy
should be consistent with that of alloc_contig_range().
Based on the analysis of the various scenarios above, introducing a new
helper to determine whether fallback is permitted according to the
migration reason..
[1] https://lore.kernel.org/all/6f26ce22d2fcd523418a085f2c588fe0776d46e7.1706794035.git.baolin.wang@linux.alibaba.com/
Link: https://lkml.kernel.org/r/3519fcd41522817307a05b40fb551e2e17e68101.1709719720.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/hugetlb.h | 35 +++++++++++++++++++++++++++++++++--
mm/hugetlb.c | 14 ++++++++++++--
mm/mempolicy.c | 3 ++-
mm/migrate.c | 3 ++-
4 files changed, 49 insertions(+), 6 deletions(-)
--- a/include/linux/hugetlb.h~mm-hugetlb-make-the-hugetlb-migration-strategy-consistent
+++ a/include/linux/hugetlb.h
@@ -719,7 +719,8 @@ int isolate_or_dissolve_huge_page(struct
struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
unsigned long addr, int avoid_reserve);
struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
- nodemask_t *nmask, gfp_t gfp_mask);
+ nodemask_t *nmask, gfp_t gfp_mask,
+ bool allow_alloc_fallback);
int hugetlb_add_to_page_cache(struct folio *folio, struct address_space *mapping,
pgoff_t idx);
void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma,
@@ -942,6 +943,30 @@ static inline gfp_t htlb_modify_alloc_ma
return modified_mask;
}
+static inline bool htlb_allow_alloc_fallback(int reason)
+{
+ bool allowed_fallback = false;
+
+ /*
+ * Note: the memory offline, memory failure and migration syscalls will
+ * be allowed to fallback to other nodes due to lack of a better chioce,
+ * that might break the per-node hugetlb pool. While other cases will
+ * set the __GFP_THISNODE to avoid breaking the per-node hugetlb pool.
+ */
+ switch (reason) {
+ case MR_MEMORY_HOTPLUG:
+ case MR_MEMORY_FAILURE:
+ case MR_SYSCALL:
+ case MR_MEMPOLICY_MBIND:
+ allowed_fallback = true;
+ break;
+ default:
+ break;
+ }
+
+ return allowed_fallback;
+}
+
static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
struct mm_struct *mm, pte_t *pte)
{
@@ -1037,7 +1062,8 @@ static inline struct folio *alloc_hugetl
static inline struct folio *
alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
- nodemask_t *nmask, gfp_t gfp_mask)
+ nodemask_t *nmask, gfp_t gfp_mask,
+ bool allow_alloc_fallback)
{
return NULL;
}
@@ -1153,6 +1179,11 @@ static inline gfp_t htlb_modify_alloc_ma
return 0;
}
+static inline bool htlb_allow_alloc_fallback(int reason)
+{
+ return false;
+}
+
static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
struct mm_struct *mm, pte_t *pte)
{
--- a/mm/hugetlb.c~mm-hugetlb-make-the-hugetlb-migration-strategy-consistent
+++ a/mm/hugetlb.c
@@ -2598,7 +2598,7 @@ struct folio *alloc_buddy_hugetlb_folio_
/* folio migration callback function */
struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
- nodemask_t *nmask, gfp_t gfp_mask)
+ nodemask_t *nmask, gfp_t gfp_mask, bool allow_alloc_fallback)
{
spin_lock_irq(&hugetlb_lock);
if (available_huge_pages(h)) {
@@ -2613,6 +2613,10 @@ struct folio *alloc_hugetlb_folio_nodema
}
spin_unlock_irq(&hugetlb_lock);
+ /* We cannot fallback to other nodes, as we could break the per-node pool. */
+ if (!allow_alloc_fallback)
+ gfp_mask |= __GFP_THISNODE;
+
return alloc_migrate_hugetlb_folio(h, gfp_mask, preferred_nid, nmask);
}
@@ -6630,7 +6634,13 @@ static struct folio *alloc_hugetlb_folio
gfp_mask = htlb_alloc_mask(h);
node = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
- folio = alloc_hugetlb_folio_nodemask(h, node, nodemask, gfp_mask);
+ /*
+ * This is used to allocate a temporary hugetlb to hold the copied
+ * content, which will then be copied again to the final hugetlb
+ * consuming a reservation. Set the alloc_fallback to false to indicate
+ * that breaking the per-node hugetlb pool is not allowed in this case.
+ */
+ folio = alloc_hugetlb_folio_nodemask(h, node, nodemask, gfp_mask, false);
mpol_cond_put(mpol);
return folio;
--- a/mm/mempolicy.c~mm-hugetlb-make-the-hugetlb-migration-strategy-consistent
+++ a/mm/mempolicy.c
@@ -1228,7 +1228,8 @@ static struct folio *alloc_migration_tar
h = folio_hstate(src);
gfp = htlb_alloc_mask(h);
nodemask = policy_nodemask(gfp, pol, ilx, &nid);
- return alloc_hugetlb_folio_nodemask(h, nid, nodemask, gfp);
+ return alloc_hugetlb_folio_nodemask(h, nid, nodemask, gfp,
+ htlb_allow_alloc_fallback(MR_MEMPOLICY_MBIND));
}
if (folio_test_large(src))
--- a/mm/migrate.c~mm-hugetlb-make-the-hugetlb-migration-strategy-consistent
+++ a/mm/migrate.c
@@ -2022,7 +2022,8 @@ struct folio *alloc_migration_target(str
gfp_mask = htlb_modify_alloc_mask(h, gfp_mask);
return alloc_hugetlb_folio_nodemask(h, nid,
- mtc->nmask, gfp_mask);
+ mtc->nmask, gfp_mask,
+ htlb_allow_alloc_fallback(mtc->reason));
}
if (folio_test_large(src)) {
_
Patches currently in -mm which might be from baolin.wang@linux.alibaba.com are
mm-record-the-migration-reason-for-struct-migration_target_control.patch
mm-hugetlb-make-the-hugetlb-migration-strategy-consistent.patch
docs-hugetlbpagerst-add-hugetlb-migration-description.patch
reply other threads:[~2024-03-22 1:04 UTC|newest]
Thread overview: [no followups] expand[flat|nested] mbox.gz Atom feed
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240322010407.E2AD2C433F1@smtp.kernel.org \
--to=akpm@linux-foundation.org \
--cc=baolin.wang@linux.alibaba.com \
--cc=david@redhat.com \
--cc=linmiaohe@huawei.com \
--cc=mhocko@kernel.org \
--cc=mm-commits@vger.kernel.org \
--cc=muchun.song@linux.dev \
--cc=naoya.horiguchi@nec.com \
--cc=osalvador@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.