All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] xen/mm: Fix offlining pages to avoid corrupting the heap
@ 2026-05-28 14:47 Bernhard Kaindl
  2026-05-28 14:47 ` [PATCH 1/2] tools/tests/native: Test for Xen Panic after memory offlining Bernhard Kaindl
  2026-05-28 14:47 ` [PATCH 2/2] xen/mm: Fix offlining pages only make aligned buddies, fixes Xen crash Bernhard Kaindl
  0 siblings, 2 replies; 5+ messages in thread
From: Bernhard Kaindl @ 2026-05-28 14:47 UTC (permalink / raw)
  To: xen-devel
  Cc: Bernhard Kaindl, Anthony PERARD, Andrew Cooper, Michal Orzel,
	Jan Beulich, Julien Grall, Roger Pau Monné,
	Stefano Stabellini

This series fixes a bug where offlining pages could lead to unaligned
buddies being merged back onto the free list. The result is a chain of
events that can corrupt the heap and trigger a Xen panic after a few
allocations and frees.

For example, an MCE caused by faulty RAM may mark pages as offline.
When a buddy containing offlined pages is freed, those pages
are moved to dedicated isolated page lists.

reserve_offline_page() lacks alignment checks and may grow adjacent
healthy spans into unaligned buddies that violate the fundamental buddy
invariant: buddies of a given order must be aligned to their size.

Consider a valid order-2 buddy (4 pages) with this layout:

   +---------------+-----------------+-----------------+----------------+
   | head page     | tail page 1     | tail page 2     | tail page 3    |
   +---------------+-----------------+-----------------+----------------+

reserve_offline_page() then merges unaligned tail pages:

   +---------------+-----------------+-----------------+----------------+
   | offlined page |     head page with a tail page    | single page    |
   +---------------+-----------------+-----------------+----------------+

This leads to a Xen panic, demonstrated by the test case:

1. When a single page is allocated from this buddy, MFN 7 is allocated:

        MFN 4             MFN 5             MFN 6             MFN 7
  +---------------+-----------------+-----------------+----------------+
  | offlined page |    head page        tail page     | allocated page |
  |               |       Unaligned buddies are       |                |
  |               |      an invariant violation!      |                |
  +---------------+-----------------+-----------------+----------------+

2. When MFN 7 is freed, the predecessor merge in free_heap_pages()
   kicks in, merging MFN 7 with its naturally aligned predecessor MFN 6:

        MFN 4             MFN 5             MFN 6            MFN 7
  +---------------+-----------------+-----------------+
  | offlined page |    head page         tail page    |
  |               |       Unaligned buddies are       |
  |               |      an invariant violation!      |
  +---------------+-----------------+-----------------+----------------+
                                    |    head page        tail page    |
                                    +-----------------+----------------+

  As shown, MFN 6 is double-freed. It is in two buddies:
  - As the tail page of the unaligned order-1 buddy starting at MFN 5.
  - As the head page of the aligned order-1 buddy starting at MFN 6.

3. The next allocations would allocate MFN 7 again, and MFN 6 as well:

   Due to the double-free, after the first allocation, MFN 6 remains on
   the free list even though its PGC_status is set to in-use.

        MFN 4             MFN 5             MFN 6            MFN 7
  +---------------+-----------------+-----------------+
  | offlined page |    head page         tail page    |
  |               |       Unaligned buddies are       |
  |               |      an invariant violation!      |
  +---------------+-----------------+-----------------+----------------+
                                    |   in-use page   |   in-use page  |
                                    +-----------------+----------------+

4. When the next page from this buddy is allocated, get_free_page()
   returns the buddy head MFN 5.  If the allocation is for order-0,
   alloc_heap_pages() splits page 6; otherwise, it keeps the buddy.
   Either way, the allocator checks the pages' PGC_status values and
   expects them not to be in-use. Because MFN 6 is already in-use,
   Xen panics (example panic log):

   pg[0] MFN 842adc c=0x4000000000000000 o=0 v=0 t=0
   Xen BUG at common/page_alloc.c:1324

I reproduced this while running intensive NUMA claim tests combined
with page offlining. The test case in this series demonstrates the
cascading corruption that leads to the panic without intentionally
having to crash a Xen instance to test for the bug.

Running the test produces the following output (trimmed):

   $ make -C tools/tests/native test TARGETS=offline-unaligned |
     grep -v ' xen/'
   |   The buddy #5 is not aligned to order-1!
   | <0>pg[0] MFN 00006 c=0x8000000000000001 o=1213 v=0 t=0
   | xen/common/page_alloc.c:1324: WE INVOKED a XEN BUG in alloc_heap_pages()

The second patch fixes the root cause and updates the test case to
serve as a regression test.

This series is based on the native test environment v3 for NUMA claims:
https://lists.xen.org/archives/html/xen-devel/2026-05/msg01163.html

It in turn depends on the NUMA claim sets v7 series:
https://lists.xen.org/archives/html/xen-devel/2026-05/msg00363.html

You can pull the series with dependencies for review and testing:

$ git pull git@gitlab.com:bernhardkaindl/xen.git offline-unaligned-buddies-v1
$ make -C tools/tests/native TARGETS=offline-unaligned test

Fixes: e4865c2315 ('Page offline support in Xen side')
Signed-off-by: Bernhard Kaindl <bernhard.kaindl@citrix.com>

Bernhard Kaindl (2):
  tools/tests/native: Test for Xen Panic after memory offlining
  xen/mm: Fix offlining pages only make aligned buddies, fixes Xen crash

 tools/tests/native/offline-unaligned.c | 79 ++++++++++++++++++++++++++
 xen/common/page_alloc.c                |  5 ++
 2 files changed, 84 insertions(+)
 create mode 100644 tools/tests/native/offline-unaligned.c

-- 
2.39.5



^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/2] tools/tests/native: Test for Xen Panic after memory offlining
  2026-05-28 14:47 [PATCH 0/2] xen/mm: Fix offlining pages to avoid corrupting the heap Bernhard Kaindl
@ 2026-05-28 14:47 ` Bernhard Kaindl
  2026-05-28 14:47 ` [PATCH 2/2] xen/mm: Fix offlining pages only make aligned buddies, fixes Xen crash Bernhard Kaindl
  1 sibling, 0 replies; 5+ messages in thread
From: Bernhard Kaindl @ 2026-05-28 14:47 UTC (permalink / raw)
  To: xen-devel; +Cc: Bernhard Kaindl, Anthony PERARD

Add a test case testing for heap corruption bug caused by reserving
offlined pages from partially healthy buddies without alignment check.

For example, due to bad RAM, an MCE could cause pages marked as offline.
When a buddy containing offlined pages is freed, those pages
are moved to dedicated isolated page lists.

reserve_offline_page() lacks alignment checks and may grow adjacent
healthy spans into unaligned buddies that violate the fundamental buddy
invariant: buddies of a given order must be aligned to their size.

Consider a valid order-2 buddy (4 pages) with this layout:

   +---------------+-----------------+-----------------+----------------+
   |   head page   |   tail page 1   |   tail page 2   |   tail page 3  |
   +---------------+-----------------+-----------------+----------------+

reserve_offline_page() then merges unaligned tail pages:

   +---------------+-----------------+-----------------+----------------+
   | offlined page |     head page with a tail page    | single page    |
   +---------------+-----------------+-----------------+----------------+

This leads to a Xen panic, demonstrated by the test case:

1. When a single page is allocated from this buddy, MFN 7 is allocated:

        MFN 4             MFN 5             MFN 6             MFN 7
  +---------------+-----------------+-----------------+----------------+
  | offlined page |    head page        tail page     | allocated page |
  |               |       Unaligned buddies are       |                |
  |               |      an invariant violation!      |                |
  +---------------+-----------------+-----------------+----------------+

2. When MFN 7 is freed, the predecessor merge in free_heap_pages()
   kicks in, merging MFN 7 with its naturally aligned predecessor MFN 6:

        MFN 4             MFN 5             MFN 6            MFN 7
  +---------------+-----------------+-----------------+
  | offlined page |    head page         tail page    |
  |               |       Unaligned buddies are       |
  |               |      an invariant violation!      |
  +---------------+-----------------+-----------------+----------------+
                                    |    head page        tail page    |
                                    +-----------------+----------------+

  As shown, MFN 6 is double-freed. It is in two buddies:
  - As the tail page of the unaligned order-1 buddy starting at MFN 5.
  - As the head page of the aligned order-1 buddy starting at MFN 6.

3. The next allocations would allocate MFN 7 again, and MFN 6 as well:

   Due to the double-free, after the first allocation, MFN 6 remains on
   the free list even though its PGC_status is set to in-use.

        MFN 4             MFN 5             MFN 6            MFN 7
  +---------------+-----------------+-----------------+
  | offlined page |    head page         tail page    |
  |               |       Unaligned buddies are       |
  |               |      an invariant violation!      |
  +---------------+-----------------+-----------------+----------------+
                                    |   in-use page   |   in-use page  |
                                    +-----------------+----------------+

4. When the next page from this buddy is allocated, get_free_page()
   returns the buddy head MFN 5.  If the allocation is for order-0,
   alloc_heap_pages() splits page 6; otherwise, it keeps the buddy.
   Either way, the allocator checks the pages' PGC_status values and
   expects them not to be in-use. Because MFN 6 is already in-use,
   Xen panics (example panic log):

   pg[0] MFN 842adc c=0x4000000000000000 o=0 v=0 t=0
   Xen BUG at common/page_alloc.c:1324

I reproduced this while running intensive NUMA claim tests combined
with page offlining. The test case in this series demonstrates the
cascading corruption that leads to the panic without intentionally
having to crash a Xen instance to test for the bug.

Using the test case, the final lines are:

$ make -C tools/tests/native test TARGETS=offline-unaligned|grep -v ' xen/'
|   The buddy #5 is not aligned to order-1!
| <0>pg[0] MFN 00006 c=0x8000000000000001 o=1213 v=0 t=0
| xen/common/page_alloc.c:1324: WE INVOKED a XEN BUG in alloc_heap_pages()

Signed-off-by: Bernhard Kaindl <bernhard.kaindl@citrix.com>
---
 tools/tests/native/offline-unaligned.c | 171 +++++++++++++++++++++++++
 1 file changed, 171 insertions(+)
 create mode 100644 tools/tests/native/offline-unaligned.c

diff --git a/tools/tests/native/offline-unaligned.c b/tools/tests/native/offline-unaligned.c
new file mode 100644
index 000000000000..1186b1763bef
--- /dev/null
+++ b/tools/tests/native/offline-unaligned.c
@@ -0,0 +1,171 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Test that offlining a predecessor of pages that must not be merged
+ * into an unaligned buddy is handled correctly. Specifically, verify
+ * that the Xen page allocator does not merge such unaligned buddies
+ * back onto the free list, which can produce a chain of events that
+ * leads to a Xen panic after a few allocations and frees.
+ *
+ * This test reproduces the scenario in isolation by offlining a page
+ * with an even MFN that has more than two following tail pages.
+ *
+ * If an unaligned buddy is returned to the free list, a sequence of
+ * allocations and a subsequent free can corrupt the free list state
+ * so that a later allocation triggers BUG() and crashes the instance.
+ * The test checks the free list behavior and, if the bug is present,
+ * confirms the resulting BUG().
+ *
+ * Copyright (C) 2026 Cloud Software Group
+ */
+#include "harness/common.h"
+
+/* test_bss_start must be first in the BSS segment */
+void __aligned(PAGE_SIZE) *test_bss_start;
+
+/* Include xen/mm.h so we can wrap page_list_del() to assert the corruption. */
+#define TEST_WRAP_XEN_INCLUDE_XEN_MM_H
+#include "harness/mm-wrapper.h"
+
+static bool expect_free_list_corruption;
+
+/*
+ * Wrap page_list_del() to not fail the test by virtue of the prepared
+ * free list state but continue the test like a running Xen instance
+ * would in many cases. Assert and expect the corruption, and continue.
+ */
+static inline void wrap_page_list_del(struct page_info *page,
+                                      struct page_list_head *head)
+{
+    printf("page_list_del: page MFN %lu, order %u\n",
+           mfn_x(page_to_mfn(page)), PFN_ORDER(page));
+
+    if ( expect_free_list_corruption )
+        EXPECT_FAIL_BEGIN();
+    CHECK(page->list.next && page->list.prev, "The free list is corrupt now!");
+    if ( expect_free_list_corruption )
+        EXPECT_FAIL_END(1);
+
+    if ( page->list.next && page->list.prev )
+        page_list_del(page, head);
+}
+#define page_list_del(page, head) wrap_page_list_del(page, head)
+
+/*
+ * Include the main test library that sets up scenarios, asserts
+ * allocator state, and provides the definitions and shims needed
+ * to call the Xen page allocator code in this test program.
+ */
+#define TEST_ENABLE_XC_DOMAIN_C
+#include "harness/native.h"
+
+/* Verify the behavior of buddy merging after offlining a page */
+static void test_unaligned_buddy_merge(int start_mfn)
+{
+    struct page_info *pg = frame_table + start_mfn;
+    uint32_t status = 0;
+
+    /*
+     * Prepare a valid order-2 buddy (4 pages) with this layout:
+     * +---------------+-----------------+-----------------+----------------+
+     * | head page     | tail page 1     | tail page 2     | tail page 3    |
+     * +---------------+-----------------+-----------------+----------------+
+     */
+    test_page_list_add_buddy(pg, order2); /* Seed the heap with this buddy */
+
+    /* Act */
+    offline_page(page_to_mfn(pg), 0, &status);
+    CHECK(status & PG_OFFLINE_OFFLINED, "Page should be offlined");
+
+    /*
+     * The correct free list state after offlining the head page of the buddy
+     * is the the healthy pages are merged back onto the free list using a
+     * single page and a size-aligned buddy of the remaining pages:
+     * +---------------+-----------------+-----------------+----------------+
+     * | offlined page | single page     |    head page with a tail page    |
+     * +---------------+-----------------+-----------------+----------------+
+     */
+    EXPECT_FAIL_BEGIN();
+    /*
+     * Due to a bug in reserve_offlined_page(), we get an unaligned buddy:
+     * +---------------+-----------------+-----------------+----------------+
+     * | offlined page |     head page with a tail page    | single page    |
+     * +---------------+-----------------+-----------------+----------------+
+     */
+    CHECK(page_aligned(pg + 1), "The buddy #%lu is not aligned to order-%d",
+          mfn_x(page_to_mfn(pg + 1)), PFN_ORDER(pg + 1));
+    EXPECT_FAIL_END(1);
+
+    /* Allocate and free a page to trigger buddy merging on free. */
+
+    /*
+     * After allocating and freeing MFN 7, we get a double-freed MFN 6 due
+     * to aligned predecessor merging in free_heap_pages():
+     *
+     *         MFN 4             MFN 5             MFN 6            MFN 7
+     *   +---------------+-----------------+-----------------+
+     *   | offlined page |    head page         tail page    |
+     *   |               |       Unaligned buddies are       |
+     *   |               |      an invariant violation!      |
+     *   +---------------+-----------------+-----------------+----------------+
+     *                                     |    head page        tail page    |
+     *                                     +-----------------+----------------+
+     */
+    expect_free_list_corruption = true;
+    free_domheap_pages(alloc_domheap_pages(dom1, order0, 0), order0);
+
+    /*
+     * At this point, the free list is already corrupt. In free_heap_pages(),
+     * the tail of the unaligned buddy was added to the free list a 2nd time
+     * as the page of an overlapping aligned buddy. This is per design of the
+     * algorithm: These pages are free and thus the merging occurs as expected.
+     *
+     * The next allocation allocates the tail of the unaligned buddy, which
+     * is now, due to the merge, also the head of the new aligned buddy.
+     */
+    CHECK((pg = alloc_domheap_pages(dom1, order1, 0)), "Alloc the order-1 pg");
+
+    /* Inspect the predecessor (pg is the tail of the unaligned buddy) */
+    EXPECT_FAIL_BEGIN();
+    /*
+     * After allocating two more pages, MFN 6 is free AND in-use:
+     *
+     *         MFN 4             MFN 5             MFN 6            MFN 7
+     *   +---------------+-----------------+-----------------+
+     *   | offlined page |    head page         tail page    |
+     *   +---------------+-----------------+-----------------+----------------+
+     *                                     |    in-use page      in-use page  |
+     *                                     +-----------------+----------------+
+     */
+    CHECK(page_aligned(pg - 1), "The buddy #%lu is not aligned to order-%d!",
+          mfn_x(page_to_mfn(pg - 1)), PFN_ORDER(pg - 1));
+    EXPECT_FAIL_END(1);
+
+    /* Allocate the remaining page; a clean heap should not hit BUG(). */
+    testcase_assert_expect_to_hit_bug = true;
+    /*
+     * As described above, if pg is the tail of an unaligned order-1 buddy,
+     * the unaligned buddy is still on the free list and this allocation will
+     * remove it from the free list and check alloc_heap_pages() checks the
+     * buddies to have a reference count of zero, and the already allocated
+     * page is returned as the tail of the unaligned buddy, causing the BUG().
+     *
+     *         MFN 4             MFN 5             MFN 6            MFN 7
+     *   +---------------+-----------------+-----------------+
+     *   | offlined page |    head page         tail page    | <- panic's Xen
+     *   +---------------+-----------------+-----------------+----------------+
+     *                                     |    in-use page      in-use page  |
+     *                                     +-----------------+----------------+
+     */
+    alloc_domheap_pages(dom1, order0, 0); /* Triggers BUG() */
+}
+
+int main(int argc, char *argv[])
+{
+    if ( !parse_args(argc, argv, "Test not growing unaligned buddies") )
+        return EXIT_FAILURE;
+
+    init_page_alloc_tests();
+    RUN_TESTCASE("TUBM", test_unaligned_buddy_merge, 4);
+
+    return test_complete();
+}
-- 
2.39.5



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/2] xen/mm: Fix offlining pages only make aligned buddies, fixes Xen crash
  2026-05-28 14:47 [PATCH 0/2] xen/mm: Fix offlining pages to avoid corrupting the heap Bernhard Kaindl
  2026-05-28 14:47 ` [PATCH 1/2] tools/tests/native: Test for Xen Panic after memory offlining Bernhard Kaindl
@ 2026-05-28 14:47 ` Bernhard Kaindl
  2026-06-02 14:37   ` Jan Beulich
  1 sibling, 1 reply; 5+ messages in thread
From: Bernhard Kaindl @ 2026-05-28 14:47 UTC (permalink / raw)
  To: xen-devel
  Cc: Bernhard Kaindl, Anthony PERARD, Andrew Cooper, Michal Orzel,
	Jan Beulich, Julien Grall, Roger Pau Monné,
	Stefano Stabellini

reserve_offline_pages() is missing an alignment check and thus
has a relatively high probability of growing unaligned buddies.

Fix this by checking alignment before growing spans to the next order.
Update the test case to verify the fix and prevent future regressions.

Fixes: e4865c2315 ('Page offline support in Xen side')
Signed-off-by: Bernhard Kaindl <bernhard.kaindl@citrix.com>
---
 tools/tests/native/offline-unaligned.c | 92 --------------------------
 xen/common/page_alloc.c                |  5 ++
 2 files changed, 5 insertions(+), 92 deletions(-)

diff --git a/tools/tests/native/offline-unaligned.c b/tools/tests/native/offline-unaligned.c
index 1186b1763bef..593135722a3f 100644
--- a/tools/tests/native/offline-unaligned.c
+++ b/tools/tests/native/offline-unaligned.c
@@ -17,38 +17,6 @@
  *
  * Copyright (C) 2026 Cloud Software Group
  */
-#include "harness/common.h"
-
-/* test_bss_start must be first in the BSS segment */
-void __aligned(PAGE_SIZE) *test_bss_start;
-
-/* Include xen/mm.h so we can wrap page_list_del() to assert the corruption. */
-#define TEST_WRAP_XEN_INCLUDE_XEN_MM_H
-#include "harness/mm-wrapper.h"
-
-static bool expect_free_list_corruption;
-
-/*
- * Wrap page_list_del() to not fail the test by virtue of the prepared
- * free list state but continue the test like a running Xen instance
- * would in many cases. Assert and expect the corruption, and continue.
- */
-static inline void wrap_page_list_del(struct page_info *page,
-                                      struct page_list_head *head)
-{
-    printf("page_list_del: page MFN %lu, order %u\n",
-           mfn_x(page_to_mfn(page)), PFN_ORDER(page));
-
-    if ( expect_free_list_corruption )
-        EXPECT_FAIL_BEGIN();
-    CHECK(page->list.next && page->list.prev, "The free list is corrupt now!");
-    if ( expect_free_list_corruption )
-        EXPECT_FAIL_END(1);
-
-    if ( page->list.next && page->list.prev )
-        page_list_del(page, head);
-}
-#define page_list_del(page, head) wrap_page_list_del(page, head)
 
 /*
  * Include the main test library that sets up scenarios, asserts
@@ -84,78 +52,18 @@ static void test_unaligned_buddy_merge(int start_mfn)
      * | offlined page | single page     |    head page with a tail page    |
      * +---------------+-----------------+-----------------+----------------+
      */
-    EXPECT_FAIL_BEGIN();
-    /*
-     * Due to a bug in reserve_offlined_page(), we get an unaligned buddy:
-     * +---------------+-----------------+-----------------+----------------+
-     * | offlined page |     head page with a tail page    | single page    |
-     * +---------------+-----------------+-----------------+----------------+
-     */
     CHECK(page_aligned(pg + 1), "The buddy #%lu is not aligned to order-%d",
           mfn_x(page_to_mfn(pg + 1)), PFN_ORDER(pg + 1));
-    EXPECT_FAIL_END(1);
 
     /* Allocate and free a page to trigger buddy merging on free. */
-
-    /*
-     * After allocating and freeing MFN 7, we get a double-freed MFN 6 due
-     * to aligned predecessor merging in free_heap_pages():
-     *
-     *         MFN 4             MFN 5             MFN 6            MFN 7
-     *   +---------------+-----------------+-----------------+
-     *   | offlined page |    head page         tail page    |
-     *   |               |       Unaligned buddies are       |
-     *   |               |      an invariant violation!      |
-     *   +---------------+-----------------+-----------------+----------------+
-     *                                     |    head page        tail page    |
-     *                                     +-----------------+----------------+
-     */
-    expect_free_list_corruption = true;
     free_domheap_pages(alloc_domheap_pages(dom1, order0, 0), order0);
-
-    /*
-     * At this point, the free list is already corrupt. In free_heap_pages(),
-     * the tail of the unaligned buddy was added to the free list a 2nd time
-     * as the page of an overlapping aligned buddy. This is per design of the
-     * algorithm: These pages are free and thus the merging occurs as expected.
-     *
-     * The next allocation allocates the tail of the unaligned buddy, which
-     * is now, due to the merge, also the head of the new aligned buddy.
-     */
     CHECK((pg = alloc_domheap_pages(dom1, order1, 0)), "Alloc the order-1 pg");
 
     /* Inspect the predecessor (pg is the tail of the unaligned buddy) */
-    EXPECT_FAIL_BEGIN();
-    /*
-     * After allocating two more pages, MFN 6 is free AND in-use:
-     *
-     *         MFN 4             MFN 5             MFN 6            MFN 7
-     *   +---------------+-----------------+-----------------+
-     *   | offlined page |    head page         tail page    |
-     *   +---------------+-----------------+-----------------+----------------+
-     *                                     |    in-use page      in-use page  |
-     *                                     +-----------------+----------------+
-     */
     CHECK(page_aligned(pg - 1), "The buddy #%lu is not aligned to order-%d!",
           mfn_x(page_to_mfn(pg - 1)), PFN_ORDER(pg - 1));
-    EXPECT_FAIL_END(1);
 
     /* Allocate the remaining page; a clean heap should not hit BUG(). */
-    testcase_assert_expect_to_hit_bug = true;
-    /*
-     * As described above, if pg is the tail of an unaligned order-1 buddy,
-     * the unaligned buddy is still on the free list and this allocation will
-     * remove it from the free list and check alloc_heap_pages() checks the
-     * buddies to have a reference count of zero, and the already allocated
-     * page is returned as the tail of the unaligned buddy, causing the BUG().
-     *
-     *         MFN 4             MFN 5             MFN 6            MFN 7
-     *   +---------------+-----------------+-----------------+
-     *   | offlined page |    head page         tail page    | <- panic's Xen
-     *   +---------------+-----------------+-----------------+----------------+
-     *                                     |    in-use page      in-use page  |
-     *                                     +-----------------+----------------+
-     */
     alloc_domheap_pages(dom1, order0, 0); /* Triggers BUG() */
 }
 
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 46c01a9fca2a..d923ae02ade0 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1434,6 +1434,11 @@ static int reserve_offlined_page(struct page_info *head)
             if ( (cur_head + (1 << next_order)) >= (head + ( 1 << head_order)) )
                 goto merge;
 
+            /* Do not grow to next_order if cur_head is not aligned to it. */
+            if ( (mfn_x(page_to_mfn(cur_head)) & ((1UL << next_order) - 1)) )
+                goto merge;
+
+            /* Check if any page in the next_order range is offlined. */
             for ( i = (1 << cur_order), pg = cur_head + (1 << cur_order );
                   i < (1 << next_order);
                   i++, pg++ )
-- 
2.39.5



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 2/2] xen/mm: Fix offlining pages only make aligned buddies, fixes Xen crash
  2026-05-28 14:47 ` [PATCH 2/2] xen/mm: Fix offlining pages only make aligned buddies, fixes Xen crash Bernhard Kaindl
@ 2026-06-02 14:37   ` Jan Beulich
  2026-06-03 14:19     ` Bernhard Kaindl
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Beulich @ 2026-06-02 14:37 UTC (permalink / raw)
  To: Bernhard Kaindl
  Cc: Anthony PERARD, Andrew Cooper, Michal Orzel, Julien Grall,
	Roger Pau Monné, Stefano Stabellini, xen-devel

On 28.05.2026 16:47, Bernhard Kaindl wrote:
> reserve_offline_pages() is missing an alignment check and thus
> has a relatively high probability of growing unaligned buddies.
> 
> Fix this by checking alignment before growing spans to the next order.
> Update the test case to verify the fix and prevent future regressions.
> 
> Fixes: e4865c2315 ('Page offline support in Xen side')
> Signed-off-by: Bernhard Kaindl <bernhard.kaindl@citrix.com>
> ---
>  tools/tests/native/offline-unaligned.c | 92 --------------------------
>  xen/common/page_alloc.c                |  5 ++
>  2 files changed, 5 insertions(+), 92 deletions(-)

As we will want to backport the bugfix (without the test), and as it makes
little sense ...

> --- a/tools/tests/native/offline-unaligned.c
> +++ b/tools/tests/native/offline-unaligned.c
> @@ -17,38 +17,6 @@
>   *
>   * Copyright (C) 2026 Cloud Software Group
>   */
> -#include "harness/common.h"
> -
> -/* test_bss_start must be first in the BSS segment */
> -void __aligned(PAGE_SIZE) *test_bss_start;
> -
> -/* Include xen/mm.h so we can wrap page_list_del() to assert the corruption. */
> -#define TEST_WRAP_XEN_INCLUDE_XEN_MM_H
> -#include "harness/mm-wrapper.h"
> -
> -static bool expect_free_list_corruption;
> -
> -/*
> - * Wrap page_list_del() to not fail the test by virtue of the prepared
> - * free list state but continue the test like a running Xen instance
> - * would in many cases. Assert and expect the corruption, and continue.
> - */
> -static inline void wrap_page_list_del(struct page_info *page,
> -                                      struct page_list_head *head)
> -{
> -    printf("page_list_del: page MFN %lu, order %u\n",
> -           mfn_x(page_to_mfn(page)), PFN_ORDER(page));
> -
> -    if ( expect_free_list_corruption )
> -        EXPECT_FAIL_BEGIN();
> -    CHECK(page->list.next && page->list.prev, "The free list is corrupt now!");
> -    if ( expect_free_list_corruption )
> -        EXPECT_FAIL_END(1);
> -
> -    if ( page->list.next && page->list.prev )
> -        page_list_del(page, head);
> -}
> -#define page_list_del(page, head) wrap_page_list_del(page, head)
>  
>  /*
>   * Include the main test library that sets up scenarios, asserts
> @@ -84,78 +52,18 @@ static void test_unaligned_buddy_merge(int start_mfn)
>       * | offlined page | single page     |    head page with a tail page    |
>       * +---------------+-----------------+-----------------+----------------+
>       */
> -    EXPECT_FAIL_BEGIN();
> -    /*
> -     * Due to a bug in reserve_offlined_page(), we get an unaligned buddy:
> -     * +---------------+-----------------+-----------------+----------------+
> -     * | offlined page |     head page with a tail page    | single page    |
> -     * +---------------+-----------------+-----------------+----------------+
> -     */
>      CHECK(page_aligned(pg + 1), "The buddy #%lu is not aligned to order-%d",
>            mfn_x(page_to_mfn(pg + 1)), PFN_ORDER(pg + 1));
> -    EXPECT_FAIL_END(1);
>  
>      /* Allocate and free a page to trigger buddy merging on free. */
> -
> -    /*
> -     * After allocating and freeing MFN 7, we get a double-freed MFN 6 due
> -     * to aligned predecessor merging in free_heap_pages():
> -     *
> -     *         MFN 4             MFN 5             MFN 6            MFN 7
> -     *   +---------------+-----------------+-----------------+
> -     *   | offlined page |    head page         tail page    |
> -     *   |               |       Unaligned buddies are       |
> -     *   |               |      an invariant violation!      |
> -     *   +---------------+-----------------+-----------------+----------------+
> -     *                                     |    head page        tail page    |
> -     *                                     +-----------------+----------------+
> -     */
> -    expect_free_list_corruption = true;
>      free_domheap_pages(alloc_domheap_pages(dom1, order0, 0), order0);
> -
> -    /*
> -     * At this point, the free list is already corrupt. In free_heap_pages(),
> -     * the tail of the unaligned buddy was added to the free list a 2nd time
> -     * as the page of an overlapping aligned buddy. This is per design of the
> -     * algorithm: These pages are free and thus the merging occurs as expected.
> -     *
> -     * The next allocation allocates the tail of the unaligned buddy, which
> -     * is now, due to the merge, also the head of the new aligned buddy.
> -     */
>      CHECK((pg = alloc_domheap_pages(dom1, order1, 0)), "Alloc the order-1 pg");
>  
>      /* Inspect the predecessor (pg is the tail of the unaligned buddy) */
> -    EXPECT_FAIL_BEGIN();
> -    /*
> -     * After allocating two more pages, MFN 6 is free AND in-use:
> -     *
> -     *         MFN 4             MFN 5             MFN 6            MFN 7
> -     *   +---------------+-----------------+-----------------+
> -     *   | offlined page |    head page         tail page    |
> -     *   +---------------+-----------------+-----------------+----------------+
> -     *                                     |    in-use page      in-use page  |
> -     *                                     +-----------------+----------------+
> -     */
>      CHECK(page_aligned(pg - 1), "The buddy #%lu is not aligned to order-%d!",
>            mfn_x(page_to_mfn(pg - 1)), PFN_ORDER(pg - 1));
> -    EXPECT_FAIL_END(1);
>  
>      /* Allocate the remaining page; a clean heap should not hit BUG(). */
> -    testcase_assert_expect_to_hit_bug = true;
> -    /*
> -     * As described above, if pg is the tail of an unaligned order-1 buddy,
> -     * the unaligned buddy is still on the free list and this allocation will
> -     * remove it from the free list and check alloc_heap_pages() checks the
> -     * buddies to have a reference count of zero, and the already allocated
> -     * page is returned as the tail of the unaligned buddy, causing the BUG().
> -     *
> -     *         MFN 4             MFN 5             MFN 6            MFN 7
> -     *   +---------------+-----------------+-----------------+
> -     *   | offlined page |    head page         tail page    | <- panic's Xen
> -     *   +---------------+-----------------+-----------------+----------------+
> -     *                                     |    in-use page      in-use page  |
> -     *                                     +-----------------+----------------+
> -     */
>      alloc_domheap_pages(dom1, order0, 0); /* Triggers BUG() */
>  }

... to first add a test covering the bad behavior (reporting it as good, by way
of the test succeeding), I think the actual bugfix (below wants to come first,
with the new test then being added to check for correct behavior right away.

> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -1434,6 +1434,11 @@ static int reserve_offlined_page(struct page_info *head)
>              if ( (cur_head + (1 << next_order)) >= (head + ( 1 << head_order)) )
>                  goto merge;
>  
> +            /* Do not grow to next_order if cur_head is not aligned to it. */
> +            if ( (mfn_x(page_to_mfn(cur_head)) & ((1UL << next_order) - 1)) )
> +                goto merge;

With this isolated, with the title changed to something which can be parsed
and doesn't duplicate "fix" as a word, and with the excess parentheses removed
from the if()'s expression:
Reviewed-by: Jan Beulich <jbeulich@suse.com>

However, I'd like to suggest a possible simplification: Inductively we know
that cur_head is aligned to cur_order. Since next_order == cur_order + 1

            if ( mfn_x(page_to_mfn(cur_head)) & (1UL << cur_order) )
                goto merge;

ought to suffice? Of course if desired this could be written more explicitly
as

            if ( mfn_x(page_to_mfn(cur_head)) & (1UL << (next_order - 1)) )
                goto merge;

Yet overall I'd be tempted to drop the next_order variable altogether anyway
(not in this patch of course).

> +            /* Check if any page in the next_order range is offlined. */

This isn't quite accurate, as ...

>              for ( i = (1 << cur_order), pg = cur_head + (1 << cur_order );

... we start at 1 << cur_order. I.e. it's only the upper half of the range
covered by next_order which is being checked.

>                    i < (1 << next_order);
>                    i++, pg++ )

Jan


^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: [PATCH 2/2] xen/mm: Fix offlining pages only make aligned buddies, fixes Xen crash
  2026-06-02 14:37   ` Jan Beulich
@ 2026-06-03 14:19     ` Bernhard Kaindl
  0 siblings, 0 replies; 5+ messages in thread
From: Bernhard Kaindl @ 2026-06-03 14:19 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Anthony PERARD, Andrew Cooper, Michal Orzel, Julien Grall,
	Roger Pau Monne, Stefano Stabellini,
	xen-devel@lists.xenproject.org


[...]
> With this isolated, with the title changed to something which can be parsed
> and doesn't duplicate "fix" as a word, and with the excess parentheses removed
> from the if()'s expression:
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Ack, thanks for the review!

I'll submit v2 with all changes and the simplification momentarily.

   Bernhard

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-06-03 14:19 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-28 14:47 [PATCH 0/2] xen/mm: Fix offlining pages to avoid corrupting the heap Bernhard Kaindl
2026-05-28 14:47 ` [PATCH 1/2] tools/tests/native: Test for Xen Panic after memory offlining Bernhard Kaindl
2026-05-28 14:47 ` [PATCH 2/2] xen/mm: Fix offlining pages only make aligned buddies, fixes Xen crash Bernhard Kaindl
2026-06-02 14:37   ` Jan Beulich
2026-06-03 14:19     ` Bernhard Kaindl

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.