From: Mike Rapoport <rppt@kernel.org>
To: Pratyush Yadav <pratyush@kernel.org>
Cc: Alexander Graf <graf@amazon.com>,
Changyuan Lyu <changyuanl@google.com>,
Pasha Tatashin <pasha.tatashin@soleen.com>,
Andrew Morton <akpm@linux-foundation.org>,
Baoquan He <bhe@redhat.com>, Pratyush Yadav <ptyadav@amazon.de>,
kexec@lists.infradead.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org
Subject: Re: [PATCH] kho: initialize tail pages for higher order folios properly
Date: Fri, 6 Jun 2025 11:04:39 +0300
Message-ID: <aEKhF3HcrvG77Ogb@kernel.org>
In-Reply-To: <20250605171143.76963-1-pratyush@kernel.org>
On Thu, Jun 05, 2025 at 07:11:41PM +0200, Pratyush Yadav wrote:
> From: Pratyush Yadav <ptyadav@amazon.de>
>
> Currently, when restoring higher order folios, kho_restore_folio() only
> calls prep_compound_page() on all the pages. That is not enough to
> properly initialize the folios. The managed page count does not
> get updated, the reserved flag does not get dropped, and page count does
> not get initialized properly.
>
> Restoring a higher order folio this way results in the following BUG with
> CONFIG_DEBUG_VM when attempting to free the folio:
>
> BUG: Bad page state in process test pfn:104e2b
> page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffffffffffffffff pfn:0x104e2b
> flags: 0x2fffff80000000(node=0|zone=2|lastcpupid=0x1fffff)
> raw: 002fffff80000000 0000000000000000 00000000ffffffff 0000000000000000
> raw: ffffffffffffffff 0000000000000000 00000001ffffffff 0000000000000000
> page dumped because: nonzero _refcount
> [...]
> Call Trace:
> <TASK>
> dump_stack_lvl+0x4b/0x70
> bad_page.cold+0x97/0xb2
> __free_frozen_pages+0x616/0x850
> [...]
>
> Combine the path for 0-order and higher order folios, initialize the
> tail pages with a count of zero, and call adjust_managed_page_count() to
> account for all the pages instead of just missing them.
>
> In addition, since all the KHO-preserved pages get marked with
> MEMBLOCK_RSRV_NOINIT by deserialize_bitmap(), the reserved flag is not
> actually set (as can also be seen from the flags of the dumped page in
> the logs above). So drop the ClearPageReserved() calls.
>
> Fixes: fc33e4b44b271 ("kexec: enable KHO support for memory preservation")
> Signed-off-by: Pratyush Yadav <ptyadav@amazon.de>
> ---
>
> Side note: get_maintainers.pl for KHO only lists kexec@ as the mailing list.
> Since KHO has a bunch of MM bits as well, should we also add linux-mm@ to its
> MAINTAINERS entry?
>
> Adding linux-mm@ to this patch at least, in case MM people have an opinion on
> this.
>
> kernel/kexec_handover.c | 29 +++++++++++++++++------------
> 1 file changed, 17 insertions(+), 12 deletions(-)
>
> diff --git a/kernel/kexec_handover.c b/kernel/kexec_handover.c
> index eb305e7e61296..5214ab27d1f8d 100644
> --- a/kernel/kexec_handover.c
> +++ b/kernel/kexec_handover.c
> @@ -157,11 +157,21 @@ static int __kho_preserve_order(struct kho_mem_track *track, unsigned long pfn,
> }
>
> /* almost as free_reserved_page(), just don't free the page */
> -static void kho_restore_page(struct page *page)
> +static void kho_restore_page(struct page *page, unsigned int order)
> {
> - ClearPageReserved(page);
So now we don't clear PG_reserved even on order-0 pages? ;-)
> - init_page_count(page);
> - adjust_managed_page_count(page, 1);
> + unsigned int i, nr_pages = (1 << order);
Can you please declare 'i' inside the loop? It looks nicer IMHO.
> +
> + /* Head page gets refcount of 1. */
> + set_page_count(page, 1);
ClearPageReserved(page) here?
> +
> + /* For higher order folios, tail pages get a page count of zero. */
> + for (i = 1; i < nr_pages; i++)
> + set_page_count(page + i, 0);
and here?
> +
> + if (order > 0)
> + prep_compound_page(page, order);
> +
> + adjust_managed_page_count(page, nr_pages);
> }
>
> /**
> @@ -179,15 +189,10 @@ struct folio *kho_restore_folio(phys_addr_t phys)
> return NULL;
>
> order = page->private;
> - if (order) {
> - if (order > MAX_PAGE_ORDER)
> - return NULL;
> -
> - prep_compound_page(page, order);
> - } else {
> - kho_restore_page(page);
> - }
> + if (order > MAX_PAGE_ORDER)
> + return NULL;
>
> + kho_restore_page(page, order);
> return page_folio(page);
> }
> EXPORT_SYMBOL_GPL(kho_restore_folio);
> --
> 2.47.1
>
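To make the comments above concrete, here is a rough userspace model of the
ordering being suggested, not kernel code and not a tested patch. struct
model_page and every model_* helper are made up for illustration; they only
mimic set_page_count(), ClearPageReserved(), prep_compound_page() and
adjust_managed_page_count(). Whether clearing PG_reserved is needed at all
depends on whether the flag is ever set for MEMBLOCK_RSRV_NOINIT ranges, so
treat those calls as an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for struct page; NOT the kernel's definition. */
struct model_page {
	int refcount;
	bool reserved;		/* models PG_reserved */
	bool head;		/* models the compound head marker */
	unsigned int order;	/* only meaningful on the head page */
};

/* Models the managed page counter that adjust_managed_page_count() updates. */
static unsigned long model_managed_pages;

/* Hypothetical stand-ins for the kernel helpers named in the patch. */
static void model_set_page_count(struct model_page *p, int v)
{
	p->refcount = v;
}

static void model_clear_page_reserved(struct model_page *p)
{
	p->reserved = false;
}

static void model_prep_compound_page(struct model_page *p, unsigned int order)
{
	p->head = true;
	p->order = order;
}

static void model_adjust_managed_page_count(unsigned long nr)
{
	model_managed_pages += nr;
}

/* Model of kho_restore_page() with the review comments applied. */
static void model_kho_restore_page(struct model_page *page, unsigned int order)
{
	unsigned int nr_pages = 1U << order;

	assert(page != NULL);

	/* Head page gets a refcount of 1. */
	model_set_page_count(page, 1);
	model_clear_page_reserved(page);

	/* Tail pages get a refcount of 0; 'i' is declared inside the loop. */
	for (unsigned int i = 1; i < nr_pages; i++) {
		model_set_page_count(page + i, 0);
		model_clear_page_reserved(page + i);
	}

	if (order > 0)
		model_prep_compound_page(page, order);

	model_adjust_managed_page_count(nr_pages);
}
```

With order 0 the loop body never runs and the compound step is skipped, so the
single-page case falls out of the same path as the higher order one.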
--
Sincerely yours,
Mike.