From: Mike Kravetz <mike.kravetz@oracle.com>
To: Muchun Song <songmuchun@bytedance.com>,
corbet@lwn.net, tglx@linutronix.de, mingo@redhat.com,
bp@alien8.de, x86@kernel.org, hpa@zytor.com,
dave.hansen@linux.intel.com, luto@kernel.org,
peterz@infradead.org, viro@zeniv.linux.org.uk,
akpm@linux-foundation.org, paulmck@kernel.org,
pawan.kumar.gupta@linux.intel.com, rdunlap@infradead.org,
oneukum@suse.com, anshuman.khandual@arm.com, jroedel@suse.de,
almasrymina@google.com, rientjes@google.com, willy@infradead.org,
osalvador@suse.de, mhocko@suse.com, song.bao.hua@hisilicon.com,
david@redhat.com, naoya.horiguchi@nec.com,
joao.m.martins@oracle.com
Cc: duanxiongchun@bytedance.com, fam.zheng@bytedance.com,
zhengqi.arch@bytedance.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v22 6/9] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page
Date: Wed, 5 May 2021 15:21:11 -0700
Message-ID: <c2e8bc43-44dc-825d-9f59-0de300815fa4@oracle.com>
In-Reply-To: <20210430031352.45379-7-songmuchun@bytedance.com>
On 4/29/21 8:13 PM, Muchun Song wrote:
> When we free a HugeTLB page to the buddy allocator, we need to allocate
> the vmemmap pages associated with it. However, we may not be able to
> allocate the vmemmap pages when the system is under memory pressure. In
> this case, we just refuse to free the HugeTLB page. This changes behavior
> in some corner cases as listed below:
>
> 1) Failing to free a huge page triggered by the user (decrease nr_pages).
>
> User needs to try again later.
>
> 2) Failing to free a surplus huge page when freed by the application.
>
> Try again later when freeing a huge page next time.
>
> 3) Failing to dissolve a free huge page on ZONE_MOVABLE via
> offline_pages().
>
> This can happen when we have plenty of ZONE_MOVABLE memory, but
> not enough kernel memory to allocate vmemmap pages. We may even
> be able to migrate huge page contents, but will not be able to
> dissolve the source huge page. This will prevent an offline
> operation and is unfortunate as memory offlining is expected to
> succeed on movable zones. Users that depend on memory hotplug
> to succeed for movable zones should carefully consider whether the
> memory savings gained from this feature are worth the risk of
> possibly not being able to offline memory in certain situations.
>
> 4) Failing to dissolve a huge page on CMA/ZONE_MOVABLE via
> alloc_contig_range() - once we have that handling in place. Mainly
> affects CMA and virtio-mem.
>
> Similar to 3). virtio-mem will handle migration errors gracefully.
> CMA might be able to fall back on other free areas within the CMA
> region.
>
> Vmemmap pages are allocated from the page freeing context. To keep
> those allocations from being disruptive (e.g. triggering the OOM killer),
> __GFP_NORETRY is used. hugetlb_lock is dropped for the allocation
> because a non-sleeping allocation would be too fragile and could fail
> too easily under memory pressure. GFP_ATOMIC and other modes that access
> memory reserves are not used because we want to avoid consuming
> reserves under heavy hugetlb freeing.
>
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
> Documentation/admin-guide/mm/hugetlbpage.rst | 8 ++
> Documentation/admin-guide/mm/memory-hotplug.rst | 13 ++++
> include/linux/hugetlb.h | 3 +
> include/linux/mm.h | 2 +
> mm/hugetlb.c | 98 +++++++++++++++++++++----
> mm/hugetlb_vmemmap.c | 34 +++++++++
> mm/hugetlb_vmemmap.h | 6 ++
> mm/migrate.c | 5 +-
> mm/sparse-vmemmap.c | 75 ++++++++++++++++++-
> 9 files changed, 227 insertions(+), 17 deletions(-)
>
> diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst b/Documentation/admin-guide/mm/hugetlbpage.rst
> index f7b1c7462991..6988895d09a8 100644
> --- a/Documentation/admin-guide/mm/hugetlbpage.rst
> +++ b/Documentation/admin-guide/mm/hugetlbpage.rst
> @@ -60,6 +60,10 @@ HugePages_Surp
> the pool above the value in ``/proc/sys/vm/nr_hugepages``. The
> maximum number of surplus huge pages is controlled by
> ``/proc/sys/vm/nr_overcommit_hugepages``.
> + Note: When the feature of freeing unused vmemmap pages associated
> + with each hugetlb page is enabled, the number of surplus huge pages
> + may be temporarily larger than the maximum number of surplus huge
> + pages when the system is under memory pressure.
> Hugepagesize
> is the default hugepage size (in Kb).
> Hugetlb
> @@ -80,6 +84,10 @@ returned to the huge page pool when freed by a task. A user with root
> privileges can dynamically allocate more or free some persistent huge pages
> by increasing or decreasing the value of ``nr_hugepages``.
>
> +Note: When the feature of freeing unused vmemmap pages associated with each
> +hugetlb page is enabled, we can fail to free the huge pages triggered by
> +the user when the system is under memory pressure. Please try again later.
> +
> Pages that are used as huge pages are reserved inside the kernel and cannot
> be used for other purposes. Huge pages cannot be swapped out under
> memory pressure.
> diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst
> index 05d51d2d8beb..c6bae2d77160 100644
> --- a/Documentation/admin-guide/mm/memory-hotplug.rst
> +++ b/Documentation/admin-guide/mm/memory-hotplug.rst
> @@ -357,6 +357,19 @@ creates ZONE_MOVABLE as following.
> Unfortunately, there is no information to show which memory block belongs
> to ZONE_MOVABLE. This is TBD.
>
> + Memory offlining can fail when dissolving a free huge page on ZONE_MOVABLE
> + and the feature of freeing unused vmemmap pages associated with each hugetlb
> + page is enabled.
> +
> + This can happen when we have plenty of ZONE_MOVABLE memory, but not enough
> + kernel memory to allocate vmemmap pages. We may even be able to migrate
> + huge page contents, but will not be able to dissolve the source huge page.
> + This will prevent an offline operation and is unfortunate as memory offlining
> + is expected to succeed on movable zones. Users that depend on memory hotplug
> + to succeed for movable zones should carefully consider whether the memory
> + savings gained from this feature are worth the risk of possibly not being
> + able to offline memory in certain situations.
> +
> .. note::
> Techniques that rely on long-term pinnings of memory (especially, RDMA and
> vfio) are fundamentally problematic with ZONE_MOVABLE and, therefore, memory
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index d523a345dc86..d3abaaec2a22 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -525,6 +525,7 @@ unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
> * code knows it has only reference. All other examinations and
> * modifications require hugetlb_lock.
> * HPG_freed - Set when page is on the free lists.
> + * HPG_vmemmap_optimized - Set when the vmemmap pages of the page are freed.
> * Synchronization: hugetlb_lock held for examination and modification.
You just moved the Synchronization comment so that it applies to both
HPG_freed and HPG_vmemmap_optimized. However, HPG_vmemmap_optimized is
checked/modified both with and without hugetlb_lock. Nothing wrong with
that, just need to update/fix the comment.
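One possible layout for the comment, as a rough sketch only: the
Synchronization line stays attached to HPG_freed, and HPG_vmemmap_optimized
is listed after it with no locking claim. The enum name below is
hypothetical and lists only the two flags quoted above; the wording is an
assumption, not text taken from the patch.

```c
/*
 * Sketch of the flag descriptions (illustrative, not the patch text):
 *
 * HPG_freed - Set when page is on the free lists.
 *	Synchronization: hugetlb_lock held for examination and modification.
 * HPG_vmemmap_optimized - Set when the vmemmap pages of the page are freed.
 *	No hugetlb_lock requirement is stated here, because the flag is
 *	checked/modified both with and without hugetlb_lock.
 */
enum hugetlb_page_flags_sketch {	/* hypothetical name, for illustration */
	HPG_freed,
	HPG_vmemmap_optimized,
};
```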
Everything else looks good to me,
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
--
Mike Kravetz
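
For case 1) in the quoted changelog ("User needs to try again later"), a
userspace retry could look roughly like the sketch below. It writes the
target to a pool-size file such as /proc/sys/vm/nr_hugepages, reads the
value back, and retries if the pool is still larger than requested. The
function names, retry count, and delay are assumptions made for
illustration. Treating the read-back value as the success signal is also an
assumption; the patch does not specify how a failed shrink is reported.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Write the requested pool size to `path`, then read back the value the
 * file now reports. Returns the value read, or -1 on any I/O error.
 */
long set_and_read(const char *path, long target)
{
	long actual = -1;
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	fprintf(f, "%ld\n", target);
	if (fclose(f) != 0)	/* buffered write errors surface on close */
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%ld", &actual) != 1)
		actual = -1;
	fclose(f);
	return actual;
}

/*
 * Try to shrink the pool to at most `target` pages, retrying up to
 * `max_tries` times with `delay_sec` seconds between attempts.
 * Returns 0 once the reported size is at or below target, -1 otherwise.
 */
int shrink_hugepages(const char *path, long target, int max_tries,
		     unsigned int delay_sec)
{
	if (!path || max_tries <= 0)
		return -1;

	for (int i = 0; i < max_tries; i++) {
		long actual = set_and_read(path, target);

		if (actual >= 0 && actual <= target)
			return 0;
		if (i + 1 < max_tries)
			sleep(delay_sec);	/* give memory pressure time to ease */
	}
	return -1;
}
```

A caller with sufficient privileges might use
shrink_hugepages("/proc/sys/vm/nr_hugepages", 0, 5, 2) and report failure
to the administrator if it returns -1.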