All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oscar Salvador <osalvador@suse.de>
To: David Hildenbrand <david@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>,
	Muchun Song <songmuchun@bytedance.com>,
	Michal Hocko <mhocko@kernel.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 2/2] mm,page_alloc: Make alloc_contig_range handle free hugetlb pages
Date: Wed, 10 Feb 2021 15:24:24 +0100	[thread overview]
Message-ID: <20210210142424.GC3636@localhost.localdomain> (raw)
In-Reply-To: <9ed946df-9c6c-9a4d-4be9-2f32809974f7@redhat.com>

On Wed, Feb 10, 2021 at 09:23:59AM +0100, David Hildenbrand wrote:
> On 08.02.21 11:38, Oscar Salvador wrote:
> > Free hugetlb pages are trickier to handle as to in order to guarantee
> > no userspace appplication disruption, we need to replace the
> > current free hugepage with a new one.
> > 
> > In order to do that, a new function called alloc_and_dissolve_huge_page
> > in introduced.
> > This function will first try to get a new fresh hugetlb page, and if it
> > succeeds, it will dissolve the old one.
> > 
> 
> Thanks for looking into this! Can we move this patch to #1 in the series? It
> is the easier case.
> 
> I also wonder if we should at least try on the memory unplug path to keep
> nr_pages by at least trying to allocate at new one if required, and printing
> a warning if that fails (after all, we're messing with something configured
> by the admin - "nr_pages"). Note that gigantic pages are special (below).

So, do you mean to allocate a new fresh hugepage in case we have a free
hugetlb page within the range we are trying to offline? That makes some
sense I guess.

I can have a look at that, and make hotplug code use the new
alloc_and_dissolve().

Thanks for bringing this up, it is somsething I did not think about.

> > +				/*
> > +				 * Free hugetlb page. Allocate a new one and
> > +				 * dissolve this is if succeed.
> > +				 */
> > +				if (alloc_and_dissolve_huge_page(page)) {
> > +					unsigned long order = buddy_order_unsafe(page);
> > +
> > +					low_pfn += (1UL << order) - 1;
> > +					continue;
> > +				}
> 
> 
> 
> Note that there is a very ugly corner case we will have to handle gracefully
> (I think also in patch #1):
> 
> Assume you allocated a gigantic page (and assume that we are not using CMA
> for gigantic pages for simplicity). Assume you want to allocate another one.
> alloc_pool_huge_page()->...->alloc_contig_pages() will stumble over the
> first allocated page. It will try to alloc_and_dissolve_huge_page() the
> existing gigantic page. To do that, it will
> alloc_pool_huge_page()->...->alloc_contig_pages() ... and so on. Bad.

Heh, I was too naive. I have to confess I completely forgot about
gigantic pages and this cyclic dependency.

> We really don't want to mess with gigantic pages (migrate, dissolve) while
> allocating a gigantic page. I think the easiest (and cleanest) way forward
> is to not mess (isolate, migrate, dissolve) with gigantic pages at all.
> 
> Gigantic pages are not movable, so they won't be placed on random CMA /
> ZONE_MOVABLE.
> 
> Some hstate_is_gigantic(h) calls (maybe inside
> alloc_and_dissolve_huge_page() ? ) along with a nice comment might be good
> enough to avoid having to pass down some kind of alloc_contig context. I
> even think that should be handled inside
> 
> (the main issue is that in contrast to CMA, plain alloc_contig_pages() has
> no memory about which parts were allocated and will simply try re-allocating
> what it previously allocated and never freed - which is usually fine, unless
> we're dealing with such special cases)
> 
> Apart from that, not messing with gigantic pages feels like the right
> approach (allocating/migrating gigantic pages is just horribly slow and most
> probably not worth it anyway).

Yes, I also agree that we should leave out gigantic pages, at least for
now.
We might make it work in the future but I cannot come up with a fancy
way to work around that right now, so it makes sense to cut down the
complexity here.

Thanks David for the insight!

-- 
Oscar Salvador
SUSE L3


  reply	other threads:[~2021-02-10 14:24 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-08 10:38 [RFC PATCH 0/2] Make alloc_contig_range handle Hugetlb pages Oscar Salvador
2021-02-08 10:38 ` [RFC PATCH 1/2] mm,page_alloc: Make alloc_contig_range handle in-use hugetlb pages Oscar Salvador
2021-02-10  8:56   ` David Hildenbrand
2021-02-10 14:09     ` Oscar Salvador
2021-02-10 14:11       ` David Hildenbrand
2021-02-10 14:14         ` Oscar Salvador
2021-02-10 14:22           ` David Hildenbrand
2021-02-11  0:56   ` Mike Kravetz
2021-02-11 10:40     ` David Hildenbrand
2021-02-08 10:38 ` [RFC PATCH 2/2] mm,page_alloc: Make alloc_contig_range handle free " Oscar Salvador
2021-02-10  8:23   ` David Hildenbrand
2021-02-10 14:24     ` Oscar Salvador [this message]
2021-02-10 14:36       ` David Hildenbrand
2021-02-25 21:43     ` Mike Kravetz
2021-02-25 22:15       ` David Hildenbrand
2021-02-11  1:16   ` Mike Kravetz
2021-02-11 21:38     ` Oscar Salvador
2021-02-08 10:39 ` [RFC PATCH 0/2] Make alloc_contig_range handle Hugetlb pages Oscar Salvador

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210210142424.GC3636@localhost.localdomain \
    --to=osalvador@suse.de \
    --cc=david@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=mike.kravetz@oracle.com \
    --cc=songmuchun@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.