linux-mm.kvack.org archive mirror
From: Oscar Salvador <osalvador@suse.de>
To: Jinjiang Tu <tujinjiang@huawei.com>
Cc: muchun.song@linux.dev, akpm@linux-foundation.org,
	david@redhat.com, linux-mm@kvack.org, wangkefeng.wang@huawei.com
Subject: Re: [PATCH v2] mm/hugetlb: fix set_max_huge_pages() when there are surplus pages
Date: Thu, 3 Apr 2025 16:26:36 +0200
Message-ID: <Z-6anMyttLDIMWLy@localhost.localdomain>
In-Reply-To: <20250401082339.676723-1-tujinjiang@huawei.com>

On Tue, Apr 01, 2025 at 04:23:39PM +0800, Jinjiang Tu wrote:
> In set_max_huge_pages(), min_count should mean the acquired persistent
> huge pages, but it contains surplus huge pages. This leads to a failure
> to free the free huge pages of a node.
> 
> Steps to reproduce:
> 1) create 5 hugetlb folios in Node0
> 2) run a program to use all the hugetlb folios
> 3) echo 0 > nr_hugepages for Node0 to free the hugetlb folios. Thus the 5
> hugetlb folios in Node0 are accounted as surplus.
> 4) create 5 hugetlb folios in Node1
> 5) echo 0 > nr_hugepages for Node1 to free the hugetlb folios

> 
> The result:
>         Node0    Node1
> Total     5         5
> Free      0         5
> Surp      5         5
> 
> We can't simply subtract surplus_huge_pages from min_count, since free
> hugetlb folios may be surplus due to HVO. In
> __update_and_free_hugetlb_folio(), hugetlb_vmemmap_restore_folio() may
> fail, add the folio back to the pool and treat it as surplus. If we
> directly subtract surplus_huge_pages from min_count, some free folios
> will be subtracted twice.
> 
> To fix it, check if count is less than the number of free huge pages
> that could be destroyed (i.e., available_huge_pages(h)), and remove
> hugetlb folios if so.
> 
> Since there may exist free surplus hugetlb folios, we should remove
> surplus folios first to make surplus count correct.
> 
> The result with this patch:
>         Node0    Node1
> Total     5         0
> Free      0         0
> Surp      5         0

Unfortunately, this will not fly.


Assume this:

# echo 3 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
# numactl -m 0 ./hugetlb_use_2_pages &
# echo 2 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages

Because you are now checking available_huge_pages() in the
remove_pool_hugetlb_folio() block, you will fail to free that free page;
it will be accounted as surplus instead, so the pool is not reduced.
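
To make the numbers in this scenario concrete, here is a toy model of the
counters. It is a sketch, not kernel code: the field names mirror
struct hstate, but the Pool class and both shrink functions are
hypothetical simplifications. In particular, shrink_with_available_check()
is only a reading of the "count < available_huge_pages(h)" check described
in the changelog, not the patch itself.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """Toy hugetlb pool counters; names mirror fields of struct hstate."""
    nr_huge_pages: int            # all pages in the pool, surplus included
    free_huge_pages: int          # pages not currently in use
    resv_huge_pages: int = 0      # reserved but not yet faulted in
    surplus_huge_pages: int = 0   # pages to be freed once they are unused

    def persistent(self) -> int:
        return self.nr_huge_pages - self.surplus_huge_pages

    def available(self) -> int:
        return self.free_huge_pages - self.resv_huge_pages

    def remove_free_page(self) -> None:
        self.nr_huge_pages -= 1
        self.free_huge_pages -= 1

    def mark_rest_surplus(self, count: int) -> None:
        # Whatever could not be freed is accounted as surplus.
        while count < self.persistent():
            self.surplus_huge_pages += 1


def shrink_with_min_count(pool: Pool, count: int) -> Pool:
    """Hypothetical simplification of the min_count-based shrink."""
    in_use_or_reserved = (pool.resv_huge_pages + pool.nr_huge_pages
                          - pool.free_huge_pages)
    min_count = max(count, in_use_or_reserved)
    while min_count < pool.persistent() and pool.free_huge_pages > 0:
        pool.remove_free_page()
    pool.mark_rest_surplus(count)
    return pool


def shrink_with_available_check(pool: Pool, count: int) -> Pool:
    """Assumed form of the proposed check: free pages only while
    count < available_huge_pages."""
    while count < pool.available():
        pool.remove_free_page()
    pool.mark_rest_surplus(count)
    return pool


# 3 pages on the node, 2 in use by the program, then nr_hugepages is set to 2.
print(shrink_with_min_count(Pool(nr_huge_pages=3, free_huge_pages=1), 2))
print(shrink_with_available_check(Pool(nr_huge_pages=3, free_huge_pages=1), 2))
```

In this model the min_count variant frees the one free page and ends with
2 pages and no surplus, while the available-based variant frees nothing
(2 < 1 is false) and ends with 3 pages, 1 free and 1 surplus.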

Of course, once the program terminates, everything will be cleaned up,
but in the meantime that page (or more, depending on your setup) will
not be freed into the buddy allocator, which is unexpected.

And there are other scenarios where the same will happen.

So it seems to me that we have to find a more deterministic approach.
Almost the whole operation is covered by the lock, other than when we
free the pages we collected, so I am sure we can find a way that does
not need special-casing.


-- 
Oscar Salvador
SUSE Labs

