From: David Hildenbrand <david@redhat.com>
To: Oscar Salvador <osalvador@suse.de>,
	Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>,
	Michal Hocko <mhocko@kernel.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 2/2] mm,page_alloc: Make alloc_contig_range handle free hugetlb pages
Date: Wed, 10 Feb 2021 09:23:59 +0100
Message-ID: <9ed946df-9c6c-9a4d-4be9-2f32809974f7@redhat.com>
In-Reply-To: <20210208103812.32056-3-osalvador@suse.de>

On 08.02.21 11:38, Oscar Salvador wrote:
> Free hugetlb pages are trickier to handle because, in order to guarantee
> no userspace application disruption, we need to replace the
> current free hugepage with a new one.
> 
> In order to do that, a new function called alloc_and_dissolve_huge_page
> is introduced.
> This function will first try to get a fresh hugetlb page, and if it
> succeeds, it will dissolve the old one.
>

Thanks for looking into this! Can we move this patch to #1 in the 
series? It is the easier case.

I also wonder if we should at least try, on the memory unplug path, to 
keep nr_hugepages stable by attempting to allocate a new page if 
required, and print a warning if that fails (after all, we're messing 
with something configured by the admin - "nr_hugepages"). Note that 
gigantic pages are special (below).
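
Completely untested sketch of that idea (the helper name and the 
warning text are made up; dissolve_free_huge_page(), page_hstate() and 
hstate_is_gigantic() already exist in hugetlb code, and 
alloc_and_dissolve_huge_page() is what this patch adds):

static int dissolve_free_huge_page_keep_pool(struct page *page)
{
	struct hstate *h = page_hstate(page);

	/* Leave gigantic pages alone entirely (see below). */
	if (hstate_is_gigantic(h))
		return -EBUSY;

	/*
	 * Best effort: allocate a replacement elsewhere first, so the
	 * admin-configured pool size does not silently shrink. On success,
	 * alloc_and_dissolve_huge_page() has already dissolved the old page.
	 */
	if (alloc_and_dissolve_huge_page(page))
		return 0;

	pr_warn("HugeTLB: no replacement for free %lu KiB page, nr_hugepages will shrink\n",
		huge_page_size(h) / SZ_1K);
	return dissolve_free_huge_page(page);
}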

> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> ---
>   include/linux/hugetlb.h |  6 ++++++
>   mm/compaction.c         | 11 +++++++++++
>   mm/hugetlb.c            | 35 +++++++++++++++++++++++++++++++++++
>   3 files changed, 52 insertions(+)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index ebca2ef02212..f81afcb86e89 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -505,6 +505,7 @@ struct huge_bootmem_page {
>   	struct hstate *hstate;
>   };
>   
> +bool alloc_and_dissolve_huge_page(struct page *page);
>   struct page *alloc_huge_page(struct vm_area_struct *vma,
>   				unsigned long addr, int avoid_reserve);
>   struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
> @@ -773,6 +774,11 @@ static inline void huge_ptep_modify_prot_commit(struct vm_area_struct *vma,
>   #else	/* CONFIG_HUGETLB_PAGE */
>   struct hstate {};
>   
> +static inline bool alloc_and_dissolve_huge_page(struct page *page)
> +{
> +	return false;
> +}
> +
>   static inline struct page *alloc_huge_page(struct vm_area_struct *vma,
>   					   unsigned long addr,
>   					   int avoid_reserve)
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 89cd2e60da29..7969ddc10856 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -952,6 +952,17 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>   					low_pfn += compound_nr(page) - 1;
>   					goto isolate_success_no_list;
>   				}
> +			} else {

} else if (alloc_and_dissolve_huge_page(page)) {

...

> +				/*
> +				 * Free hugetlb page. Allocate a new one and
> +				 * dissolve this one if that succeeds.
> +				 */
> +				if (alloc_and_dissolve_huge_page(page)) {
> +					unsigned long order = buddy_order_unsafe(page);
> +
> +					low_pfn += (1UL << order) - 1;
> +					continue;
> +				}
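
IOW, the whole branch would then look roughly like this (untested, 
logic taken from the quoted hunk):

} else if (alloc_and_dissolve_huge_page(page)) {
	/*
	 * Free hugetlb page: we allocated a fresh replacement and
	 * dissolved this one, so we can skip over the whole range.
	 */
	unsigned long order = buddy_order_unsafe(page);

	low_pfn += (1UL << order) - 1;
	continue;
}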

Note that there is a very ugly corner case we will have to handle 
gracefully (I think also in patch #1):

Assume you allocated a gigantic page (and assume that we are not using 
CMA for gigantic pages for simplicity). Assume you want to allocate 
another one. alloc_pool_huge_page()->...->alloc_contig_pages() will 
stumble over the first allocated page. It will try to 
alloc_and_dissolve_huge_page() the existing gigantic page. To do that, 
it will alloc_pool_huge_page()->...->alloc_contig_pages() ... and so on. 
Bad.

We really don't want to mess with gigantic pages (migrate, dissolve) 
while allocating a gigantic page. I think the easiest (and cleanest) way 
forward is to not mess (isolate, migrate, dissolve) with gigantic pages 
at all.

Gigantic pages are not movable, so they won't be placed on random CMA / 
ZONE_MOVABLE.

Some hstate_is_gigantic(h) calls (maybe inside 
alloc_and_dissolve_huge_page()?) along with a nice comment might be 
good enough to avoid having to pass down some kind of alloc_contig 
context. I even think that check belongs inside 
alloc_and_dissolve_huge_page() itself.
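
E.g., something like this at the start of alloc_and_dissolve_huge_page() 
(untested fragment only, the rest of the function stays as in this 
patch):

	struct hstate *h = page_hstate(page);

	/*
	 * Do not mess with gigantic pages: replacing one would mean yet
	 * another alloc_contig_pages() call, which could stumble over this
	 * very page again and recurse. Gigantic pages are also unmovable,
	 * so they never end up on CMA / ZONE_MOVABLE in the first place.
	 */
	if (hstate_is_gigantic(h))
		return false;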

(The main issue is that, in contrast to CMA, plain alloc_contig_pages() 
has no memory of which parts were allocated and will simply try 
re-allocating what it previously allocated and never freed - which is 
usually fine, unless we're dealing with such special cases.)

Apart from that, not messing with gigantic pages feels like the right 
approach (allocating/migrating gigantic pages is just horribly slow and 
most probably not worth it anyway).

-- 
Thanks,

David / dhildenb


