From: Matthew Wilcox <willy@infradead.org>
To: Harry Yoo <harry.yoo@oracle.com>
Cc: Jiaqi Yan <jiaqiyan@google.com>,
	nao.horiguchi@gmail.com, linmiaohe@huawei.com, ziy@nvidia.com,
	david@redhat.com, lorenzo.stoakes@oracle.com,
	william.roche@oracle.com, tony.luck@intel.com,
	wangkefeng.wang@huawei.com, jane.chu@oracle.com,
	akpm@linux-foundation.org, osalvador@suse.de,
	muchun.song@linux.dev, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	Vlastimil Babka <vbabka@suse.cz>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>,
	Brendan Jackman <jackmanb@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [PATCH v1 1/2] mm/huge_memory: introduce uniform_split_unmapped_folio_to_zero_order
Date: Mon, 17 Nov 2025 13:43:04 +0000
Message-ID: <aRsmaIfCAGy-DRcx@casper.infradead.org>
In-Reply-To: <aRqTLmJBuvBcLYMx@hyeyoo>

On Mon, Nov 17, 2025 at 12:15:23PM +0900, Harry Yoo wrote:
> On Sun, Nov 16, 2025 at 11:51:14AM +0000, Matthew Wilcox wrote:
> > But since we're only doing this on free, we won't need to do folio
> > allocations at all; we'll just be able to release the good pages to the
> > page allocator and sequester the hwpoison pages.
> 
> [+Cc PAGE ALLOCATOR folks]
> 
> So we need an interface to free only the healthy portion of a hwpoison folio.
> 
> I think a proper approach to this should be to "free a hwpoison folio
> just like freeing a normal folio via folio_put() or free_frozen_pages(),
> then the page allocator will add only healthy pages to the freelist and
> isolate the hwpoison pages". Otherwise we'll end up open coding a lot,
> which is too fragile.

Yes, I think it should be handled by the page allocator.  There may be
some complexity to this that I've missed, eg if hugetlb wants to retain
the good 2MB chunks of a 1GB allocation.  I'm not sure whether that's a
useful thing to do.
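
As a rough illustration of the idea above (not code from this patch
series), here is a minimal userspace sketch of "free the folio, give the
healthy pages to the allocator, sequester the hwpoison pages".  It assumes
a naive recursive halving, so healthy naturally-aligned blocks are released
at the largest possible order.  The names free_healthy(), struct
freed_block and struct free_result are hypothetical; nothing here is a
kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_BLOCKS 64	/* arbitrary cap for this model */

/* Hypothetical record of one block handed back to the modelled allocator. */
struct freed_block {
	size_t start;		/* index of the first page within the folio */
	unsigned int order;	/* the block covers 1 << order pages */
};

struct free_result {
	struct freed_block blocks[MAX_BLOCKS];
	size_t nr_blocks;
	size_t nr_sequestered;	/* poisoned pages withheld from the freelist */
};

static bool range_has_poison(const bool *poisoned, size_t start, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		if (poisoned[start + i])
			return true;
	return false;
}

/*
 * Free the range [start, start + (1 << order)).  A range with no poisoned
 * page is released whole; otherwise it is halved and each half is handled
 * separately, until a single poisoned page is isolated.
 */
static void free_healthy(const bool *poisoned, size_t start,
			 unsigned int order, struct free_result *res)
{
	if (!range_has_poison(poisoned, start, (size_t)1 << order)) {
		assert(res->nr_blocks < MAX_BLOCKS);
		res->blocks[res->nr_blocks++] =
			(struct freed_block){ start, order };
		return;
	}
	if (order == 0) {
		res->nr_sequestered++;	/* never reaches the freelist */
		return;
	}
	free_healthy(poisoned, start, order - 1, res);
	free_healthy(poisoned, start + ((size_t)1 << (order - 1)),
		     order - 1, res);
}
```

For an order-3 folio with only page 5 poisoned, this model releases an
order-2 block at page 0, an order-0 block at page 4 and an order-1 block at
page 6, and sequesters page 5.  A real implementation would live in the
page allocator's free path and would have to deal with locking, migratetypes
and buddy merging, none of which is modelled here.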

> In fact, that can be done by teaching free_pages_prepare() how to handle
> the case where one or more subpages of a folio are hwpoison pages.
> 
> How should this be implemented in the page allocator in the memdescs world?
> Hmm, we'll want to do some kind of non-uniform split, without actually
> splitting the folio but allocating struct buddy?

Let me sketch that out, realising that it's subject to change.

A page in buddy state must not need a memdesc allocated.  Otherwise we're
allocating memory to free memory, and that way lies madness.  We can't
do the hack of "embed struct buddy in the page that we're freeing"
because of HIGHMEM, where the page being freed may not be mapped into the
kernel's address space.  So we'll never shrink struct page smaller than
struct buddy (which is fine because I've laid out how to get to a 64-bit
struct buddy, and we're probably two years from getting there anyway).

My design for handling hwpoison is that we do allocate a struct hwpoison
for a page.  It looks like this (for now, in my head):

struct hwpoison {
	memdesc_t original;
	... other things ...
};

So we can replace the memdesc in a page with a hwpoison memdesc when we
encounter the error.  We still need a folio flag to indicate that "this
folio contains a page with hwpoison".  I haven't yet put much thought
into the interaction with HUGETLB_PAGE_OPTIMIZE_VMEMMAP; maybe "other
things" includes an index of where the actually poisoned page is in the folio,
so it doesn't matter if the pages alias with each other as we can recover
the information when it becomes useful to do so.
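
To make the memdesc replacement concrete, here is a small userspace model,
offered only as a sketch under several assumptions.  memdesc_t is modelled
as a pointer-sized word with a type tag in the low bits; the tag value,
the index field (one possible reading of "other things") and the helpers
page_set_hwpoison() and page_hwpoison() are all hypothetical and do not
exist in the kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for the planned memdesc_t: a tagged pointer-sized word. */
typedef struct { uintptr_t v; } memdesc_t;

#define MEMDESC_TYPE_MASK	((uintptr_t)0x7)
#define MEMDESC_TYPE_HWPOISON	((uintptr_t)0x1)	/* hypothetical tag */

struct hwpoison {
	memdesc_t original;	/* memdesc the page had before the error */
	unsigned long index;	/* hypothetical: poisoned page's index in the folio */
};

/* Drastically reduced struct page: just the memdesc word. */
struct page {
	memdesc_t memdesc;
};

/* Swap the page's memdesc for a hwpoison memdesc that remembers the old one. */
static int page_set_hwpoison(struct page *page, unsigned long index)
{
	struct hwpoison *hwp = malloc(sizeof(*hwp));

	if (!hwp)
		return -1;
	/* The low bits must be free to carry the type tag. */
	assert(((uintptr_t)hwp & MEMDESC_TYPE_MASK) == 0);

	hwp->original = page->memdesc;
	hwp->index = index;
	page->memdesc.v = (uintptr_t)hwp | MEMDESC_TYPE_HWPOISON;
	return 0;
}

/* Return the hwpoison descriptor, or NULL if the page is not poisoned. */
static struct hwpoison *page_hwpoison(const struct page *page)
{
	if ((page->memdesc.v & MEMDESC_TYPE_MASK) != MEMDESC_TYPE_HWPOISON)
		return NULL;
	return (struct hwpoison *)(page->memdesc.v & ~MEMDESC_TYPE_MASK);
}
```

Because the original memdesc is saved inside struct hwpoison, the owner of
the page can still be recovered later, and a stored index would let the
poisoned page be identified even when the struct pages alias each other.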

> But... for now I think hiding this complexity inside the page allocator
> is good enough. For now this would just mean splitting a frozen page
> inside the page allocator (probably non-uniform?). We can later re-implement
> this to provide better support for memdescs.

Yes, I like this approach.  But then I'm not the page allocator
maintainer ;-)
