public inbox for linux-doc@vger.kernel.org
* [RFC 0/7] mm: dual-bitmap page allocator consistency checker
@ 2026-04-24 14:00 Sasha Levin
  2026-04-24 14:00 ` [RFC 1/7] mm: add generic dual-bitmap consistency primitives Sasha Levin
                   ` (8 more replies)
  0 siblings, 9 replies; 22+ messages in thread
From: Sasha Levin @ 2026-04-24 14:00 UTC (permalink / raw)
  To: akpm, david, corbet
  Cc: ljs, Liam.Howlett, vbabka, rppt, surenb, mhocko, skhan, jackmanb,
	hannes, ziy, linux-mm, linux-doc, linux-kernel, Sasha Levin

Existing memory debugging tools - KASAN, KFENCE, page_poisoning - detect
access violations and content corruption, but none of them can detect
silent corruption in the page allocator's own metadata. If a hardware
bit flip corrupts an allocation bitmap, the allocator hands out a page
that is already in use (or fails to hand out a free one), and nothing
in the kernel notices. This series adds a dual-bitmap consistency checker
that maintains the invariant primary == ~secondary across two independently
allocated bitmaps, so that any single-bit corruption in either bitmap is
immediately detectable. The approach is based on NVIDIA safety research.
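
As a rough illustration of that invariant, here is a minimal userspace
sketch (not code from the series). struct word_pair, pair_consistent()
and flip_bit() are hypothetical names chosen for this example, and a
single 64-bit word stands in for each bitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model: one word taken from each of the two bitmaps. */
struct word_pair {
	uint64_t primary;	/* 1 = allocated */
	uint64_t secondary;	/* 1 = free (complement of primary) */
};

/* The invariant from the cover letter: primary == ~secondary. */
static bool pair_consistent(const struct word_pair *p)
{
	return p->primary == (uint64_t)~p->secondary;
}

/* Hypothetical fault injector: flip a single bit in one word. */
static void flip_bit(uint64_t *word, unsigned int bit)
{
	assert(bit < 64);
	*word ^= (uint64_t)1 << bit;
}
```

Flipping any one bit in either word makes pair_consistent() return
false. Flipping the same bit position in both words restores the
invariant, which is why the guarantee is stated for single-bit faults.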

Field studies consistently show that DRAM errors at scale are far more
common than textbook assumptions suggest, even with ECC. Schroeder et al.
(SIGMETRICS 2009) found 8% of DIMMs experienced errors per year in
Google's fleet; Sridharan and Liberty (SC 2012) reported similar rates
at LANL; Meta's 2021-2022 work documented silent data corruption at
scale, including memory-related faults. The critical property of
allocator metadata corruption is that it doesn't trigger an invalid
memory access - the corrupted data is structurally valid, just wrong.
KASAN instruments accesses, not metadata integrity, so it cannot see
this class of fault.

Functional safety is a discipline distinct from security: it aims
to reduce the risk of hardware and software misbehaving to an
acceptable level. Security hardens against adversaries; safety hardens
against random hardware failures (cosmic rays, cell wear-out, thermal
noise) and systematic software failures (bugs). ISO 26262 (automotive
functional safety) defines four Automotive Safety Integrity Levels,
ASIL A through D. The level is derived from the severity, exposure,
and controllability of the hazard; ASIL-D is the most stringent.
IEC 61508 defines similar
levels (SIL-1 through SIL-4) for industrial systems, and there are
equivalent standards for avionics and medical devices. ISO 26262
requires Freedom From Interference (FFI): a safety element must not
be corrupted by faults in other elements. For an OS kernel, this means
the memory allocator's metadata must either be immune to corruption or
corruption must be detected before it propagates. The dual-bitmap
scheme takes the detection route, for corruption coming from either
hardware or software: it keeps two complementary representations of
page allocation state, allocated independently via memblock, so that
any single-bit fault in either bitmap is detectable. Performance is
secondary to
correctness in this context. A safety mechanism must be simple enough
to audit and certify, must fail deterministically (panic, not
log-and-hope), and its correctness matters more than its throughput.
The dual-bitmap adds two atomic bitops per alloc/free, but for
safety-critical deployments this cost is acceptable because the
alternative - undetected corruption propagating silently - violates
the system's safety case. The static key ensures zero cost for kernels
that don't need it.

The natural question is why not use page_ext. The key objection from a
safety perspective is that page_ext stores per-page metadata in memory
that is itself subject to the same hardware faults we're trying to
detect. The dual-bitmap approach works because the two bitmaps are
independent allocations - corruption in one is caught by comparison
with the other. Embedding both in page_ext means a single fault could
corrupt both the tracking data and its redundant copy in the same
allocation region. ISO 26262 recommends this kind of independent
redundancy as protection against hardware faults, and the same
separation also helps against software faults; co-locating both
bitmaps in page_ext violates this principle. Beyond
the safety argument, there are practical issues: page_ext adds
8-100+ bytes per page depending on enabled features while the
dual-bitmap uses 2 bits per page total, and page_ext initializes
after the buddy allocator while the checker must be active before
memblock_free_all() hands pages to buddy.

Sasha Levin (7):
  mm: add generic dual-bitmap consistency primitives
  mm: add page consistency checker header
  mm: add Kconfig options for page consistency checker
  mm: add page consistency checker implementation
  mm/page_alloc: integrate page consistency hooks
  Documentation/mm: add page consistency checker documentation
  mm/page_consistency: add KUnit tests for dual-bitmap primitives

 Documentation/mm/index.rst            |   1 +
 Documentation/mm/page_consistency.rst | 211 +++++++++++++++
 MAINTAINERS                           |  10 +
 include/linux/dual_bitmap.h           | 216 ++++++++++++++++
 include/linux/page_consistency.h      |  84 ++++++
 mm/Kconfig.debug                      |  59 +++++
 mm/Makefile                           |   2 +
 mm/mm_init.c                          |   9 +
 mm/page_alloc.c                       |   4 +
 mm/page_consistency.c                 | 360 ++++++++++++++++++++++++++
 mm/page_consistency_test.c            | 274 ++++++++++++++++++++
 11 files changed, 1230 insertions(+)
 create mode 100644 Documentation/mm/page_consistency.rst
 create mode 100644 include/linux/dual_bitmap.h
 create mode 100644 include/linux/page_consistency.h
 create mode 100644 mm/page_consistency.c
 create mode 100644 mm/page_consistency_test.c

-- 
2.53.0



* [RFC 1/7] mm: add generic dual-bitmap consistency primitives
  2026-04-24 14:00 [RFC 0/7] mm: dual-bitmap page allocator consistency checker Sasha Levin
@ 2026-04-24 14:00 ` Sasha Levin
  2026-04-24 14:00 ` [RFC 2/7] mm: add page consistency checker header Sasha Levin
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: Sasha Levin @ 2026-04-24 14:00 UTC (permalink / raw)
  To: akpm, david, corbet
  Cc: ljs, Liam.Howlett, vbabka, rppt, surenb, mhocko, skhan, jackmanb,
	hannes, ziy, linux-mm, linux-doc, linux-kernel, Sasha Levin,
	Sanif Veeras, Claude:claude-opus-4-7

From: Sasha Levin <sashal@nvidia.com>

Add a header-only library implementing a pair-of-complementary-bitmaps
integrity primitive: maintain two bitmaps where primary[i] == !secondary[i]
for every bit i, and detect corruption by checking that invariant.

The motivation (silent metadata corruption that KASAN/KFENCE cannot see,
plus the functional-safety argument for wanting this in the kernel) is
described in the cover letter; this patch only introduces the building
block.

The primary bitmap uses 1 for "allocated" and 0 for "free"; the secondary
uses the opposite convention. dual_bitmap_set() and dual_bitmap_clear()
update both bitmaps and return the previous primary bit so callers can
distinguish a real state transition from a double-alloc or double-free.
dual_bitmap_validate() walks every word and returns the number of words
that fail the invariant.
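
The return-value convention can be sketched in userspace as follows.
This is an illustration only, not the header from this patch: struct
model_db and the model_*() helpers are hypothetical names, and plain
(non-atomic) bit operations replace test_and_set_bit() and
test_and_clear_bit().

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Userspace model of the dual bitmap; plain, non-atomic bit ops. */
struct model_db {
	unsigned long *primary;		/* 1 = allocated */
	unsigned long *secondary;	/* 1 = free */
	unsigned int nbits;
};

/* Set or clear one bit and return its previous value. */
static bool model_test_and_change(unsigned long *map, unsigned long bit,
				  bool set)
{
	unsigned long mask = 1UL << (bit % WORD_BITS);
	unsigned long *word = &map[bit / WORD_BITS];
	bool old = (*word & mask) != 0;

	if (set)
		*word |= mask;
	else
		*word &= ~mask;
	return old;
}

/* Mark allocated. A true return means the bit was already set. */
static bool model_set(struct model_db *db, unsigned long bit)
{
	bool was_set;

	assert(bit < db->nbits);
	was_set = model_test_and_change(db->primary, bit, true);
	model_test_and_change(db->secondary, bit, false);
	return was_set;
}

/* Mark free. A false return means the bit was already clear. */
static bool model_clear(struct model_db *db, unsigned long bit)
{
	bool was_set;

	assert(bit < db->nbits);
	was_set = model_test_and_change(db->primary, bit, false);
	model_test_and_change(db->secondary, bit, true);
	return was_set;
}
```

A caller treats model_set() returning true as a double-alloc and
model_clear() returning false as a double-free; in both cases the two
words still end up complementary.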

Concurrency note: set and clear perform two independent atomic bit
operations against the primary and secondary bitmaps, so the invariant
is transiently violated between those two ops. A concurrent reader can
observe an inconsistent pair on a healthy kernel. The validation
helpers absorb this by retrying a small number of times with cpu_relax()
and, after the retries, issuing an smp_rmb() and re-reading. Real
corruption is persistent and survives the retries; transient races
resolve within a few cpu_relax() loops. This keeps the update path
lock-free at the cost of a bounded false-positive probability under
extreme write rates, which is acceptable for a fail-stop integrity
check.
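
The retry scheme described in this note might look roughly like the
following in userspace C11. It is a sketch under assumptions, not the
kernel helper: read_pair_once() and bit_pair_consistent() are
hypothetical names, atomic_thread_fence() stands in for smp_rmb(), and
the cpu_relax() pause between retries is omitted.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define RETRY_COUNT 3	/* mirrors DUAL_BITMAP_RETRY_COUNT in the patch */

/* Read one bit from both words once; consistent means the bits differ. */
static bool read_pair_once(atomic_ulong *primary, atomic_ulong *secondary,
			   unsigned long mask)
{
	bool p = (atomic_load_explicit(primary, memory_order_relaxed) & mask) != 0;
	bool s = (atomic_load_explicit(secondary, memory_order_relaxed) & mask) != 0;

	return p != s;
}

static bool bit_pair_consistent(atomic_ulong *primary, atomic_ulong *secondary,
				unsigned int bit)
{
	unsigned long mask;
	int retries = RETRY_COUNT;

	assert(bit < sizeof(unsigned long) * CHAR_BIT);
	mask = 1UL << bit;

	/* A racing writer sits between its two bit ops only briefly: retry. */
	do {
		if (read_pair_once(primary, secondary, mask))
			return true;
	} while (--retries > 0);

	/* Userspace stand-in for smp_rmb(), followed by one final read. */
	atomic_thread_fence(memory_order_acquire);
	return read_pair_once(primary, secondary, mask);
}
```

A persistent mismatch fails every attempt and is reported, while a
mismatch caused by a writer caught between its two updates is expected
to clear within the retries.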

Based-on-patch-by: Sanif Veeras <sveeras@nvidia.com>
Assisted-by: Claude:claude-opus-4-7 <noreply@anthropic.com>
Signed-off-by: Sasha Levin <sashal@nvidia.com>
---
 MAINTAINERS                 |  10 ++
 include/linux/dual_bitmap.h | 216 ++++++++++++++++++++++++++++++++++++
 2 files changed, 226 insertions(+)
 create mode 100644 include/linux/dual_bitmap.h

diff --git a/MAINTAINERS b/MAINTAINERS
index d1cc0e12fe1f..81b1f44215b3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -19972,6 +19972,16 @@ F:	mm/page-writeback.c
 F:	mm/readahead.c
 F:	mm/truncate.c
 
+PAGE CONSISTENCY CHECKER
+M:	Sasha Levin <sashal@kernel.org>
+L:	linux-mm@kvack.org
+S:	Maintained
+F:	Documentation/mm/page_consistency.rst
+F:	include/linux/dual_bitmap.h
+F:	include/linux/page_consistency.h
+F:	mm/page_consistency.c
+F:	mm/page_consistency_test.c
+
 PAGE POOL
 M:	Jesper Dangaard Brouer <hawk@kernel.org>
 M:	Ilias Apalodimas <ilias.apalodimas@linaro.org>
diff --git a/include/linux/dual_bitmap.h b/include/linux/dual_bitmap.h
new file mode 100644
index 000000000000..136822267be1
--- /dev/null
+++ b/include/linux/dual_bitmap.h
@@ -0,0 +1,216 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Dual-bitmap consistency primitives
+ *
+ * Provides a generic library for maintaining dual bitmaps with the invariant
+ * that (primary == ~secondary). This pattern is useful for detecting
+ * single-bit memory corruption in bitmap-based data structures.
+ *
+ * Based on NVIDIA safety research.
+ */
+#ifndef _LINUX_DUAL_BITMAP_H
+#define _LINUX_DUAL_BITMAP_H
+
+#include <linux/types.h>
+#include <linux/bitops.h>
+#include <linux/bitmap.h>
+#include <linux/bug.h>
+#include <asm/barrier.h>
+#include <linux/processor.h>
+
+/* Number of retries for transient inconsistencies from concurrent updates */
+#define DUAL_BITMAP_RETRY_COUNT 3
+
+/* Bitmap indices */
+enum dual_bitmap_index {
+	DUAL_BITMAP_PRIMARY = 0,	/* 0=free, 1=allocated */
+	DUAL_BITMAP_SECONDARY = 1,	/* 0=allocated, 1=free (complement) */
+	DUAL_BITMAP_COUNT = 2
+};
+
+/**
+ * struct dual_bitmap - Dual bitmap structure
+ * @bitmap: Array of two bitmap pointers [PRIMARY, SECONDARY]
+ * @nbits: Number of bits in each bitmap
+ */
+struct dual_bitmap {
+	unsigned long *bitmap[DUAL_BITMAP_COUNT];
+	unsigned int nbits;
+};
+
+/**
+ * dual_bitmap_consistent_word - Check if a word pair maintains the invariant
+ * @primary: Primary bitmap word
+ * @secondary: Secondary bitmap word
+ *
+ * Returns true if primary == ~secondary
+ */
+static inline bool dual_bitmap_consistent_word(unsigned long primary,
+					       unsigned long secondary)
+{
+	return primary == ~secondary;
+}
+
+/**
+ * dual_bitmap_set - Set bit in dual bitmap (mark as allocated)
+ * @db: Dual bitmap structure
+ * @bit: Bit position to set
+ *
+ * Sets bit in primary and clears corresponding bit in secondary.
+ * Returns the old value of the primary bit (true if was already set).
+ */
+static inline bool dual_bitmap_set(struct dual_bitmap *db, unsigned long bit)
+{
+	bool was_set;
+
+	if (WARN_ON_ONCE(bit >= db->nbits))
+		return false;
+
+	was_set = test_and_set_bit(bit, db->bitmap[DUAL_BITMAP_PRIMARY]);
+	test_and_clear_bit(bit, db->bitmap[DUAL_BITMAP_SECONDARY]);
+
+	return was_set;
+}
+
+/**
+ * dual_bitmap_clear - Clear bit in dual bitmap (mark as free)
+ * @db: Dual bitmap structure
+ * @bit: Bit position to clear
+ *
+ * Clears bit in primary and sets corresponding bit in secondary.
+ * Returns the old value of the primary bit (true if was set).
+ */
+static inline bool dual_bitmap_clear(struct dual_bitmap *db, unsigned long bit)
+{
+	bool was_set;
+
+	if (WARN_ON_ONCE(bit >= db->nbits))
+		return false;
+
+	was_set = test_and_clear_bit(bit, db->bitmap[DUAL_BITMAP_PRIMARY]);
+	test_and_set_bit(bit, db->bitmap[DUAL_BITMAP_SECONDARY]);
+
+	return was_set;
+}
+
+/**
+ * dual_bitmap_test - Test if bit is set in primary bitmap
+ * @db: Dual bitmap structure
+ * @bit: Bit position to test
+ *
+ * Returns true if bit is set in primary (allocated), false if clear (free).
+ */
+static inline bool dual_bitmap_test(const struct dual_bitmap *db,
+				    unsigned long bit)
+{
+	if (WARN_ON_ONCE(bit >= db->nbits))
+		return false;
+
+	return test_bit(bit, db->bitmap[DUAL_BITMAP_PRIMARY]);
+}
+
+/**
+ * dual_bitmap_consistent - Check consistency of a single bit
+ * @db: Dual bitmap structure
+ * @bit: Bit position to check
+ *
+ * Returns true if the bit values are consistent (primary != secondary).
+ * Uses retry logic to handle transient inconsistencies from concurrent
+ * updates - real corruption persists while races resolve quickly.
+ */
+static inline bool dual_bitmap_consistent(const struct dual_bitmap *db,
+					  unsigned long bit)
+{
+	int retries = DUAL_BITMAP_RETRY_COUNT;
+
+	if (WARN_ON_ONCE(bit >= db->nbits))
+		return false;
+
+	do {
+		bool primary = test_bit(bit, db->bitmap[DUAL_BITMAP_PRIMARY]);
+		bool secondary = test_bit(bit, db->bitmap[DUAL_BITMAP_SECONDARY]);
+
+		if (primary != secondary)
+			return true;  /* Consistent */
+
+		/* Inconsistent - could be transient race, retry */
+		cpu_relax();
+	} while (--retries > 0);
+
+	/*
+	 * Inconsistent after retries. Issue a read barrier and check
+	 * one last time to rule out stale/reordered reads.
+	 *
+	 * Note: the two test_bit() calls are still non-atomic w.r.t.
+	 * each other, so a concurrent set/clear between them can cause
+	 * a transient false positive. This is acceptable because real
+	 * corruption is persistent and will be caught on the next check.
+	 */
+	smp_rmb();
+	return test_bit(bit, db->bitmap[DUAL_BITMAP_PRIMARY]) !=
+	       test_bit(bit, db->bitmap[DUAL_BITMAP_SECONDARY]);
+}
+
+/**
+ * dual_bitmap_validate - Validate entire dual bitmap
+ * @db: Dual bitmap structure
+ *
+ * Checks that the invariant (primary == ~secondary) holds for all words.
+ * Uses retry logic to handle transient inconsistencies from concurrent
+ * updates - real corruption persists while races resolve quickly.
+ * Returns the number of inconsistent words found (0 = all consistent).
+ *
+ * Note: this is a cold-path diagnostic function kept inline for
+ * header-only library simplicity. It should not be called in hot paths.
+ */
+static inline unsigned long dual_bitmap_validate(const struct dual_bitmap *db)
+{
+	unsigned int words = BITS_TO_LONGS(db->nbits);
+	unsigned long violations = 0;
+	unsigned int i;
+
+	for (i = 0; i < words; i++) {
+		unsigned long primary, secondary;
+		int retries = DUAL_BITMAP_RETRY_COUNT;
+
+		do {
+			primary = READ_ONCE(db->bitmap[DUAL_BITMAP_PRIMARY][i]);
+			secondary = READ_ONCE(db->bitmap[DUAL_BITMAP_SECONDARY][i]);
+
+			if (dual_bitmap_consistent_word(primary, secondary))
+				break;  /* Consistent, move to next word */
+
+			cpu_relax();
+		} while (--retries > 0);
+
+		if (retries == 0) {
+			/*
+			 * Inconsistent after retries. Issue a read
+			 * barrier and re-read to rule out stale/reordered
+			 * memory views before declaring corruption.
+			 */
+			smp_rmb();
+			primary = READ_ONCE(db->bitmap[DUAL_BITMAP_PRIMARY][i]);
+			secondary = READ_ONCE(db->bitmap[DUAL_BITMAP_SECONDARY][i]);
+			if (!dual_bitmap_consistent_word(primary, secondary))
+				violations++;
+		}
+	}
+
+	return violations;
+}
+
+/**
+ * dual_bitmap_init - Initialize dual bitmap to empty state
+ * @db: Dual bitmap structure
+ *
+ * Sets primary to all zeros (nothing allocated) and secondary to all ones.
+ * The bitmaps must already be allocated before calling this.
+ */
+static inline void dual_bitmap_init(struct dual_bitmap *db)
+{
+	bitmap_zero(db->bitmap[DUAL_BITMAP_PRIMARY], db->nbits);
+	bitmap_fill(db->bitmap[DUAL_BITMAP_SECONDARY], db->nbits);
+}
+
+#endif /* _LINUX_DUAL_BITMAP_H */
-- 
2.53.0



* [RFC 2/7] mm: add page consistency checker header
  2026-04-24 14:00 [RFC 0/7] mm: dual-bitmap page allocator consistency checker Sasha Levin
  2026-04-24 14:00 ` [RFC 1/7] mm: add generic dual-bitmap consistency primitives Sasha Levin
@ 2026-04-24 14:00 ` Sasha Levin
  2026-04-24 14:00 ` [RFC 3/7] mm: add Kconfig options for page consistency checker Sasha Levin
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: Sasha Levin @ 2026-04-24 14:00 UTC (permalink / raw)
  To: akpm, david, corbet
  Cc: ljs, Liam.Howlett, vbabka, rppt, surenb, mhocko, skhan, jackmanb,
	hannes, ziy, linux-mm, linux-doc, linux-kernel, Sasha Levin,
	Sanif Veeras, Claude:claude-opus-4-7

From: Sasha Levin <sashal@nvidia.com>

Define the interface for CONFIG_DEBUG_PAGE_CONSISTENCY. The API mirrors
the pattern used by page_table_check: inline wrapper functions check a
static key before calling the out-of-line tracking implementation, so
that callers in the allocator hot path pay only for a patched-out static
branch (a NOP where jump labels are supported) when the feature is built
in but not active.

The header is kept separate from the implementation so the hooks in
page_alloc.c can be added without pulling in implementation details.

Based-on-patch-by: Sanif Veeras <sveeras@nvidia.com>
Assisted-by: Claude:claude-opus-4-7 <noreply@anthropic.com>
Signed-off-by: Sasha Levin <sashal@nvidia.com>
---
 include/linux/page_consistency.h | 84 ++++++++++++++++++++++++++++++++
 1 file changed, 84 insertions(+)
 create mode 100644 include/linux/page_consistency.h

diff --git a/include/linux/page_consistency.h b/include/linux/page_consistency.h
new file mode 100644
index 000000000000..f335fa3d6c5d
--- /dev/null
+++ b/include/linux/page_consistency.h
@@ -0,0 +1,84 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Dual-bitmap page consistency checking
+ *
+ * Provides corruption detection for page allocations using complementary
+ * bitmaps where the invariant (primary == ~secondary) must hold.
+ *
+ * Based on NVIDIA safety research.
+ */
+#ifndef _LINUX_PAGE_CONSISTENCY_H
+#define _LINUX_PAGE_CONSISTENCY_H
+
+#include <linux/types.h>
+#include <linux/mm_types.h>
+
+/* Return codes for page consistency checking */
+enum page_consistency_result {
+	PAGE_CONSISTENCY_OK = 0,
+	PAGE_CONSISTENCY_MISMATCH,
+	PAGE_CONSISTENCY_NOT_TRACKED,
+};
+
+#ifdef CONFIG_DEBUG_PAGE_CONSISTENCY
+
+#include <linux/jump_label.h>
+DECLARE_STATIC_KEY_FALSE(page_consistency_enabled);
+
+/* Initialization - called during mm_core_init() */
+void __init page_consistency_init(void);
+
+/* Core tracking functions */
+void __page_consistency_alloc(struct page *page, unsigned int order);
+void __page_consistency_free(struct page *page, unsigned int order);
+
+/* Validation functions */
+enum page_consistency_result page_consistency_check_page(struct page *page);
+enum page_consistency_result page_consistency_validate_all(void);
+
+/**
+ * page_consistency_alloc - Track page allocation
+ * @page: Allocated page
+ * @order: Allocation order
+ *
+ * Called from post_alloc_hook() to track page allocations.
+ * The static key avoids the out-of-line tracking call until initialization.
+ */
+static inline void page_consistency_alloc(struct page *page, unsigned int order)
+{
+	if (static_branch_unlikely(&page_consistency_enabled))
+		__page_consistency_alloc(page, order);
+}
+
+/**
+ * page_consistency_free - Track page free
+ * @page: Page being freed
+ * @order: Free order
+ *
+ * Called from free_pages_prepare() to track page frees.
+ * The static key avoids the out-of-line tracking call until initialization.
+ */
+static inline void page_consistency_free(struct page *page, unsigned int order)
+{
+	if (static_branch_unlikely(&page_consistency_enabled))
+		__page_consistency_free(page, order);
+}
+
+#else /* !CONFIG_DEBUG_PAGE_CONSISTENCY */
+
+static inline void __init page_consistency_init(void) {}
+static inline void page_consistency_alloc(struct page *page, unsigned int order) {}
+static inline void page_consistency_free(struct page *page, unsigned int order) {}
+
+static inline enum page_consistency_result page_consistency_check_page(struct page *page)
+{
+	return PAGE_CONSISTENCY_OK;
+}
+
+static inline enum page_consistency_result page_consistency_validate_all(void)
+{
+	return PAGE_CONSISTENCY_OK;
+}
+
+#endif /* CONFIG_DEBUG_PAGE_CONSISTENCY */
+#endif /* _LINUX_PAGE_CONSISTENCY_H */
-- 
2.53.0



* [RFC 3/7] mm: add Kconfig options for page consistency checker
  2026-04-24 14:00 [RFC 0/7] mm: dual-bitmap page allocator consistency checker Sasha Levin
  2026-04-24 14:00 ` [RFC 1/7] mm: add generic dual-bitmap consistency primitives Sasha Levin
  2026-04-24 14:00 ` [RFC 2/7] mm: add page consistency checker header Sasha Levin
@ 2026-04-24 14:00 ` Sasha Levin
  2026-04-24 14:00 ` [RFC 4/7] mm: add page consistency checker implementation Sasha Levin
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: Sasha Levin @ 2026-04-24 14:00 UTC (permalink / raw)
  To: akpm, david, corbet
  Cc: ljs, Liam.Howlett, vbabka, rppt, surenb, mhocko, skhan, jackmanb,
	hannes, ziy, linux-mm, linux-doc, linux-kernel, Sasha Levin,
	Sanif Veeras, Claude:claude-opus-4-7

From: Sasha Levin <sashal@nvidia.com>

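Add three configuration options for the dual-bitmap page consistency
checker: DEBUG_PAGE_CONSISTENCY, DEBUG_PAGE_CONSISTENCY_PANIC, and
DEBUG_PAGE_CONSISTENCY_KUNIT_TEST for the KUnit tests of the dual-bitmap
primitives.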

DEBUG_PAGE_CONSISTENCY enables the feature itself. It depends on
DEBUG_KERNEL since this is a debugging tool, and selects DEBUG_FS to
provide the statistics interface. Memory overhead is two bits per
physical page frame across two bitmaps, so about 1 MB for a 16 GB
system. The bitmaps are statically sized at boot from memblock, so
memory hotplug is not supported and the option depends on
!MEMORY_HOTPLUG.
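
The overhead figure can be checked with a few lines of arithmetic. The
helper below is a hypothetical userspace sketch; it assumes 4 KiB
pages when called as shown and ignores the rounding of each bitmap up
to a whole number of longs.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes needed for both bitmaps when one bit per page frame is kept in
 * each of them. span_bytes is the spanned physical range, holes included.
 */
static uint64_t dual_bitmap_overhead_bytes(uint64_t span_bytes,
					   uint64_t page_size)
{
	uint64_t pages, per_bitmap;

	assert(page_size != 0);
	pages = span_bytes / page_size;
	per_bitmap = (pages + 7) / 8;	/* one bit per page, rounded up */
	return 2 * per_bitmap;
}
```

With 4 KiB pages, a 16 GiB span has 4194304 page frames, so each
bitmap takes 512 KiB and the pair takes 1 MiB; a 64 GiB span gives
4 MiB, matching the figure in the Kconfig help text.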

DEBUG_PAGE_CONSISTENCY_PANIC controls the response to a detected
violation. When enabled (the default) the kernel panics on
double-alloc, double-free, or bitmap corruption; when disabled it
logs a warning and continues.

Based-on-patch-by: Sanif Veeras <sveeras@nvidia.com>
Assisted-by: Claude:claude-opus-4-7 <noreply@anthropic.com>
Signed-off-by: Sasha Levin <sashal@nvidia.com>
---
 mm/Kconfig.debug | 59 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
index 7638d75b27db..a005c904677c 100644
--- a/mm/Kconfig.debug
+++ b/mm/Kconfig.debug
@@ -144,6 +144,65 @@ config PAGE_TABLE_CHECK_ENFORCED
 
 	  If unsure say "n".
 
+config DEBUG_PAGE_CONSISTENCY
+	bool "Debug page allocator with dual-bitmap consistency checking"
+	depends on DEBUG_KERNEL
+	depends on !MEMORY_HOTPLUG
+	select DEBUG_FS
+	help
+	  Enable dual-bitmap tracking of page allocations for corruption
+	  detection. Uses two complementary bitmaps where the invariant
+	  (primary == ~secondary) must hold. Any bit flip in either bitmap
+	  will be detected.
+
+	  This is useful for safety-critical systems requiring Freedom From
+	  Interference (FFI) guarantees per ISO 26262 (ASIL-D) and IEC 61508
+	  (SIL-3).
+
+	  When disabled, the hooks compile away. When enabled, a static key
+	  gates tracking until initialization succeeds. The bitmaps are flat,
+	  covering the entire PFN range from memblock_start_of_DRAM() to
+	  memblock_end_of_DRAM() including any holes. This is deliberate:
+	  simple (pfn - min_pfn) indexing is trivially auditable and avoids
+	  auxiliary data structures that could themselves be subject to
+	  corruption. Memory overhead is two bits per PFN in the spanned
+	  range, e.g. ~4 MB total for a 64 GB system. Waste from holes is
+	  typically under 2%.
+
+	  Based on NVIDIA safety research.
+
+	  If unsure, say N.
+
+config DEBUG_PAGE_CONSISTENCY_PANIC
+	bool "Panic on page consistency failure"
+	depends on DEBUG_PAGE_CONSISTENCY
+	default y
+	help
+	  If enabled, the kernel will panic when a page consistency
+	  violation is detected, such as double-alloc or double-free.
+
+	  If disabled, a WARN with a stack trace is emitted and execution
+	  continues.
+
+	  For safety-critical systems, say Y.
+	  For debugging/development, say N.
+
+config DEBUG_PAGE_CONSISTENCY_KUNIT_TEST
+	tristate "KUnit tests for dual-bitmap consistency primitives" if !KUNIT_ALL_TESTS
+	depends on KUNIT
+	default KUNIT_ALL_TESTS
+	help
+	  Enable KUnit tests for the dual-bitmap primitives defined in
+	  <linux/dual_bitmap.h>. These tests verify the core algorithm:
+	  setting and clearing bits in complementary bitmaps, detecting
+	  double-set and double-clear conditions, and detecting simulated
+	  corruption.
+
+	  The tests exercise only the header-only dual_bitmap library and
+	  do not require CONFIG_DEBUG_PAGE_CONSISTENCY.
+
+	  If unsure, say N.
+
 config PAGE_POISONING
 	bool "Poison pages after freeing"
 	help
-- 
2.53.0



* [RFC 4/7] mm: add page consistency checker implementation
  2026-04-24 14:00 [RFC 0/7] mm: dual-bitmap page allocator consistency checker Sasha Levin
                   ` (2 preceding siblings ...)
  2026-04-24 14:00 ` [RFC 3/7] mm: add Kconfig options for page consistency checker Sasha Levin
@ 2026-04-24 14:00 ` Sasha Levin
  2026-04-24 14:25   ` David Hildenbrand (Arm)
  2026-04-24 14:00 ` [RFC 5/7] mm/page_alloc: integrate page consistency hooks Sasha Levin
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Sasha Levin @ 2026-04-24 14:00 UTC (permalink / raw)
  To: akpm, david, corbet
  Cc: ljs, Liam.Howlett, vbabka, rppt, surenb, mhocko, skhan, jackmanb,
	hannes, ziy, linux-mm, linux-doc, linux-kernel, Sasha Levin,
	Sanif Veeras, Claude:claude-opus-4-7

From: Sasha Levin <sashal@nvidia.com>

This is the core implementation of the dual-bitmap page allocator
consistency checker.

During initialization, two bitmaps are allocated covering the physical
memory range reported by memblock. The primary bitmap starts zeroed
(all pages free) and the secondary bitmap starts filled with ones
(the complement). As pages are allocated and freed, both bitmaps are
updated with test_and_set_bit / test_and_clear_bit; each bit operation
is atomic, but the pair of updates is not.

The mark_page_state() helper is the heart of the checker. When
allocating, it sets the bit in primary and clears it in secondary.
If the primary bit was already set, that indicates a double-alloc
and triggers a panic (or warning, depending on config). The free
path updates the bitmaps but defers double-free detection until
after boot completes, since reserved boot memory pages are
legitimately freed via free_reserved_area() and free_initmem()
without ever being allocated through the buddy allocator.

mark_page_state() returns whether the bitmap state actually changed,
and the pages_tracked counter is only updated for real transitions.
This keeps the counter an accurate reflection of the number of bits
currently set in the primary bitmap, rather than a signed delta that
can go negative during boot because of the reserved-area / initmem
frees described above. The same property means post-boot "freeing of
untracked pages" (e.g. a driver unloading a region it received via
memblock) is detected as a real violation; by construction this code
path remains a very small surface.
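
The counter rule can be illustrated with a small userspace model. This
is a sketch, not the code in this patch: struct tracker, mark_state()
and track() are hypothetical names, a single word stands in for each
bitmap, the "running" flag stands in for system_state >= SYSTEM_RUNNING,
and violations are only counted rather than triggering a panic or WARN.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical single-word tracker; field names are illustrative only. */
struct tracker {
	unsigned long primary;		/* 1 = allocated */
	unsigned long secondary;	/* complement of primary */
	long pages_tracked;
	unsigned int violations;
	bool running;			/* stand-in for SYSTEM_RUNNING */
};

/* Returns true only if the primary bit really changed state. */
static bool mark_state(struct tracker *t, unsigned int bit, bool is_alloc)
{
	unsigned long mask;
	bool was_allocated;

	assert(bit < sizeof(unsigned long) * CHAR_BIT);
	mask = 1UL << bit;
	was_allocated = (t->primary & mask) != 0;

	if (is_alloc) {
		t->primary |= mask;
		t->secondary &= ~mask;
		if (was_allocated)
			t->violations++;	/* double-alloc */
	} else {
		t->primary &= ~mask;
		t->secondary |= mask;
		if (!was_allocated && t->running)
			t->violations++;	/* double-free, post-boot only */
	}
	return is_alloc != was_allocated;
}

/* Adjust the counter only for real transitions. */
static void track(struct tracker *t, unsigned int bit, bool is_alloc)
{
	if (mark_state(t, bit, is_alloc))
		t->pages_tracked += is_alloc ? 1 : -1;
}
```

In this model a boot-time free of a page that was never allocated
leaves pages_tracked at zero instead of driving it negative, and the
same free after "running" is set is counted as a violation.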

Initialization validates that the spanned PFN range fits in an unsigned
int (bitmap_bytes and the bitmap APIs are bounded by that) and disables
the feature if it does not; a zero-span memblock is treated the same
way.

A debugfs interface is provided at /sys/kernel/debug/page_consistency/
with two files: "stats" shows counters for allocations, frees, and
violations detected, while writing to "validate" triggers a full scan
that checks the complement invariant holds for every bitmap word.

The debugfs late_initcall checks whether the feature is enabled with
static_key_enabled(), the idiomatic form for a cold init path, rather
than with the static_branch_unlikely() hot-path helper.

Based-on-patch-by: Sanif Veeras <sveeras@nvidia.com>
Assisted-by: Claude:claude-opus-4-7 <noreply@anthropic.com>
Signed-off-by: Sasha Levin <sashal@nvidia.com>
---
 mm/Makefile           |   1 +
 mm/page_consistency.c | 360 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 361 insertions(+)
 create mode 100644 mm/page_consistency.c

diff --git a/mm/Makefile b/mm/Makefile
index 8ad2ab08244e..2ee360001456 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -128,6 +128,7 @@ obj-$(CONFIG_NUMA_EMU) += numa_emulation.o
 obj-$(CONFIG_BALLOON) += balloon.o
 obj-$(CONFIG_PAGE_EXTENSION) += page_ext.o
 obj-$(CONFIG_PAGE_TABLE_CHECK) += page_table_check.o
+obj-$(CONFIG_DEBUG_PAGE_CONSISTENCY) += page_consistency.o
 obj-$(CONFIG_CMA_DEBUGFS) += cma_debug.o
 obj-$(CONFIG_SECRETMEM) += secretmem.o
 obj-$(CONFIG_CMA_SYSFS) += cma_sysfs.o
diff --git a/mm/page_consistency.c b/mm/page_consistency.c
new file mode 100644
index 000000000000..f98059a1dcc0
--- /dev/null
+++ b/mm/page_consistency.c
@@ -0,0 +1,360 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Dual-bitmap page allocation consistency checker
+ *
+ * Provides corruption detection for page allocations using complementary
+ * bitmaps. The invariant (primary == ~secondary) detects any single-bit
+ * corruption in either bitmap.
+ *
+ * Based on NVIDIA safety research.
+ */
+
+#define pr_fmt(fmt) "page_consistency: " fmt
+
+#include <linux/page_consistency.h>
+#include <linux/dual_bitmap.h>
+#include <linux/mm.h>
+#include <linux/memblock.h>
+#include <linux/bitmap.h>
+#include <linux/atomic.h>
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>
+#include <linux/gfp.h>
+#include <linux/kernel.h>
+#include <linux/limits.h>
+
+DEFINE_STATIC_KEY_FALSE(page_consistency_enabled);
+
+struct page_consistency_stats {
+	atomic64_t pages_tracked;
+	atomic64_t alloc_count;
+	atomic64_t free_count;
+	atomic64_t violations_detected;
+};
+
+static struct page_consistency_stats page_consistency_stats;
+
+/* Internal state */
+static struct {
+	struct dual_bitmap db;
+	unsigned long min_pfn;
+	unsigned long max_pfn;
+} pc_state __ro_after_init;
+
+/**
+ * pfn_to_bit - Convert PFN to bitmap bit index
+ * @pfn: Page frame number
+ *
+ * Returns the bit index in the bitmap for the given PFN.
+ */
+static inline unsigned long pfn_to_bit(unsigned long pfn)
+{
+	return pfn - pc_state.min_pfn;
+}
+
+/**
+ * pfn_in_range - Check if PFN is within tracked range
+ * @pfn: Page frame number to check
+ *
+ * Returns true if the PFN is within the range being tracked.
+ */
+static inline bool pfn_in_range(unsigned long pfn)
+{
+	return pfn >= pc_state.min_pfn && pfn < pc_state.max_pfn;
+}
+
+/**
+ * mark_page_state - Update both bitmaps for a page state change
+ * @pfn: Page frame number
+ * @is_alloc: true for allocation, false for free
+ *
+ * Updates both bitmaps atomically and detects double-alloc/double-free.
+ * Double-free detection is deferred until system_state reaches SYSTEM_RUNNING
+ * because reserved boot memory pages may be freed via free_reserved_area()
+ * and free_initmem() without ever being allocated through the buddy allocator.
+ *
+ * Returns true if the primary bit actually transitioned to the requested
+ * state (0->1 for alloc, 1->0 for free), false if it was already in that
+ * state. Callers use this to keep pages_tracked an accurate reflection of
+ * the number of bits set in the primary bitmap.
+ */
+static bool mark_page_state(unsigned long pfn, bool is_alloc)
+{
+	unsigned long bit = pfn_to_bit(pfn);
+	bool was_allocated;
+
+	/*
+	 * Check the complement invariant before the update. The dual bitops
+	 * below unconditionally write the secondary bit, so a corruption
+	 * confined to the secondary bitmap would be silently erased by the
+	 * very next alloc/free on that PFN. Primary-only corruption is still
+	 * caught via the was_allocated check; this pre-check closes the gap
+	 * for the secondary side so that corruption is reported symmetrically.
+	 */
+	if (unlikely(!dual_bitmap_consistent(&pc_state.db, bit))) {
+		atomic64_inc(&page_consistency_stats.violations_detected);
+#ifdef CONFIG_DEBUG_PAGE_CONSISTENCY_PANIC
+		panic("page_consistency: bitmap corruption at PFN %lu before %s\n",
+		      pfn, is_alloc ? "alloc" : "free");
+#else
+		WARN(1, "page_consistency: bitmap corruption at PFN %lu before %s\n",
+		     pfn, is_alloc ? "alloc" : "free");
+#endif
+	}
+
+	if (is_alloc) {
+		was_allocated = dual_bitmap_set(&pc_state.db, bit);
+		if (unlikely(was_allocated)) {
+			atomic64_inc(&page_consistency_stats.violations_detected);
+#ifdef CONFIG_DEBUG_PAGE_CONSISTENCY_PANIC
+			panic("page_consistency: DOUBLE-ALLOC detected: PFN %lu\n",
+			      pfn);
+#else
+			WARN(1, "page_consistency: DOUBLE-ALLOC detected: PFN %lu\n",
+			     pfn);
+#endif
+			return false;
+		}
+		return true;
+	}
+
+	was_allocated = dual_bitmap_clear(&pc_state.db, bit);
+	if (!was_allocated) {
+		/*
+		 * Only flag double-free after system is fully running.
+		 * During boot, free_reserved_area() and free_initmem() free
+		 * pages never allocated through the buddy allocator - these
+		 * are not bugs. system_state reaches SYSTEM_RUNNING only after
+		 * all such freeing is complete.
+		 */
+		if (unlikely(system_state >= SYSTEM_RUNNING)) {
+			atomic64_inc(&page_consistency_stats.violations_detected);
+#ifdef CONFIG_DEBUG_PAGE_CONSISTENCY_PANIC
+			panic("page_consistency: DOUBLE-FREE detected: PFN %lu\n",
+			      pfn);
+#else
+			WARN(1, "page_consistency: DOUBLE-FREE detected: PFN %lu\n",
+			     pfn);
+#endif
+		}
+		return false;
+	}
+	return true;
+}
+
+/**
+ * __page_consistency_alloc - Track page allocation
+ * @page: Allocated page
+ * @order: Allocation order
+ *
+ * Called from post_alloc_hook() when page_consistency_enabled is true.
+ */
+void __page_consistency_alloc(struct page *page, unsigned int order)
+{
+	unsigned long pfn = page_to_pfn(page);
+	unsigned int nr_pages = 1U << order;
+	unsigned long last_pfn = pfn + nr_pages - 1;
+	unsigned int i, transitions = 0;
+
+	if (!pfn_in_range(pfn) || !pfn_in_range(last_pfn))
+		return;
+
+	for (i = 0; i < nr_pages; i++)
+		if (mark_page_state(pfn + i, true))
+			transitions++;
+
+	atomic64_add(transitions, &page_consistency_stats.pages_tracked);
+	atomic64_inc(&page_consistency_stats.alloc_count);
+}
+
+/**
+ * __page_consistency_free - Track page free
+ * @page: Page being freed
+ * @order: Free order
+ *
+ * Called from __free_pages_prepare() when page_consistency_enabled is true.
+ */
+void __page_consistency_free(struct page *page, unsigned int order)
+{
+	unsigned long pfn = page_to_pfn(page);
+	unsigned int nr_pages = 1U << order;
+	unsigned long last_pfn = pfn + nr_pages - 1;
+	unsigned int i, transitions = 0;
+
+	if (!pfn_in_range(pfn) || !pfn_in_range(last_pfn))
+		return;
+
+	for (i = 0; i < nr_pages; i++)
+		if (mark_page_state(pfn + i, false))
+			transitions++;
+
+	atomic64_sub(transitions, &page_consistency_stats.pages_tracked);
+	atomic64_inc(&page_consistency_stats.free_count);
+}
+
+/**
+ * page_consistency_check_page - Check consistency for a single page
+ * @page: Page to check
+ *
+ * Returns PAGE_CONSISTENCY_OK if consistent, PAGE_CONSISTENCY_MISMATCH
+ * if corruption detected, or PAGE_CONSISTENCY_NOT_TRACKED if outside range.
+ */
+enum page_consistency_result page_consistency_check_page(struct page *page)
+{
+	unsigned long pfn = page_to_pfn(page);
+	unsigned long bit;
+
+	if (!pfn_in_range(pfn))
+		return PAGE_CONSISTENCY_NOT_TRACKED;
+
+	bit = pfn_to_bit(pfn);
+
+	if (!dual_bitmap_consistent(&pc_state.db, bit)) {
+		atomic64_inc(&page_consistency_stats.violations_detected);
+		pr_err("Consistency violation for PFN %lu\n", pfn);
+		return PAGE_CONSISTENCY_MISMATCH;
+	}
+
+	return PAGE_CONSISTENCY_OK;
+}
+
+/**
+ * page_consistency_validate_all - Validate entire bitmap
+ *
+ * Performs a full consistency check of all bitmap words.
+ * Returns PAGE_CONSISTENCY_OK if all consistent, PAGE_CONSISTENCY_MISMATCH
+ * if any violations found.
+ */
+enum page_consistency_result page_consistency_validate_all(void)
+{
+	unsigned long violations;
+
+	violations = dual_bitmap_validate(&pc_state.db);
+
+	if (violations) {
+		/*
+		 * violations counts inconsistent words, not bits. One word
+		 * could contain up to BITS_PER_LONG corrupted bits.
+		 */
+		atomic64_add(violations, &page_consistency_stats.violations_detected);
+		pr_err("Validation found %lu inconsistent words\n", violations);
+		return PAGE_CONSISTENCY_MISMATCH;
+	}
+
+	pr_info("Validation passed: %u bits checked\n", pc_state.db.nbits);
+	return PAGE_CONSISTENCY_OK;
+}
+
+#ifdef CONFIG_DEBUG_FS
+/* Debugfs interface */
+
+static int stats_show(struct seq_file *m, void *v)
+{
+	seq_printf(m, "pages_tracked:       %lld\n",
+		   atomic64_read(&page_consistency_stats.pages_tracked));
+	seq_printf(m, "alloc_count:         %lld\n",
+		   atomic64_read(&page_consistency_stats.alloc_count));
+	seq_printf(m, "free_count:          %lld\n",
+		   atomic64_read(&page_consistency_stats.free_count));
+	seq_printf(m, "violations_detected: %lld\n",
+		   atomic64_read(&page_consistency_stats.violations_detected));
+	seq_printf(m, "bitmap_size_bits:    %u\n", pc_state.db.nbits);
+	seq_printf(m, "pfn_range:           [%lu-%lu)\n",
+		   pc_state.min_pfn, pc_state.max_pfn);
+	return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(stats);
+
+static ssize_t validate_write(struct file *file, const char __user *buf,
+			      size_t count, loff_t *ppos)
+{
+	int result = page_consistency_validate_all();
+
+	return result == PAGE_CONSISTENCY_OK ? count : -EIO;
+}
+
+static const struct file_operations validate_fops = {
+	.write = validate_write,
+	.llseek = noop_llseek,
+};
+
+static int __init page_consistency_debugfs_init(void)
+{
+	struct dentry *dir;
+
+	if (!static_key_enabled(&page_consistency_enabled.key))
+		return 0;
+
+	dir = debugfs_create_dir("page_consistency", NULL);
+	debugfs_create_file("stats", 0444, dir, NULL, &stats_fops);
+	debugfs_create_file("validate", 0200, dir, NULL, &validate_fops);
+
+	return 0;
+}
+late_initcall(page_consistency_debugfs_init);
+#endif /* CONFIG_DEBUG_FS */
+
+/**
+ * page_consistency_init - Initialize the page consistency checker
+ *
+ * Called during mm initialization to set up the dual bitmap tracking.
+ * Must be called while memblock is still active (before memblock_free_all()).
+ */
+void __init page_consistency_init(void)
+{
+	unsigned long spanned_pfns;
+	size_t bitmap_bytes;
+
+	/*
+	 * Size bitmaps to cover the full PFN range including any holes.
+	 * Holes waste a few bits but a flat bitmap keeps the indexing
+	 * trivial (pfn - min_pfn) and avoids additional data structures
+	 * that would themselves be subject to corruption.  This matches
+	 * the approach used by pageblock_flags.
+	 */
+	pc_state.min_pfn = PHYS_PFN(memblock_start_of_DRAM());
+	pc_state.max_pfn = PHYS_PFN(memblock_end_of_DRAM());
+	spanned_pfns = pc_state.max_pfn - pc_state.min_pfn;
+	if (!spanned_pfns || spanned_pfns > UINT_MAX) {
+		pr_err("PFN span %lu cannot be represented by bitmap APIs, feature disabled\n",
+		       spanned_pfns);
+		return;
+	}
+
+	pc_state.db.nbits = spanned_pfns;
+
+	bitmap_bytes = BITS_TO_LONGS(pc_state.db.nbits) * sizeof(unsigned long);
+
+	pr_info("Initializing: PFN range [%lu-%lu), %u bits (%zu KB per bitmap)\n",
+		pc_state.min_pfn, pc_state.max_pfn, pc_state.db.nbits,
+		bitmap_bytes / 1024);
+
+	/* Allocate primary bitmap (zeroed by memblock_alloc) */
+	pc_state.db.bitmap[DUAL_BITMAP_PRIMARY] =
+		memblock_alloc(bitmap_bytes, SMP_CACHE_BYTES);
+	if (!pc_state.db.bitmap[DUAL_BITMAP_PRIMARY]) {
+		pr_err("Failed to allocate primary bitmap, feature disabled\n");
+		return;
+	}
+
+	/* Allocate secondary bitmap */
+	pc_state.db.bitmap[DUAL_BITMAP_SECONDARY] =
+		memblock_alloc(bitmap_bytes, SMP_CACHE_BYTES);
+	if (!pc_state.db.bitmap[DUAL_BITMAP_SECONDARY]) {
+		pr_err("Failed to allocate secondary bitmap, feature disabled\n");
+		memblock_free(pc_state.db.bitmap[DUAL_BITMAP_PRIMARY],
+			      bitmap_bytes);
+		pc_state.db.bitmap[DUAL_BITMAP_PRIMARY] = NULL;
+		return;
+	}
+
+	/*
+	 * Initialize: primary all zeros (already done by memblock_alloc),
+	 * secondary all ones. Use dual_bitmap_init() for consistency.
+	 */
+	dual_bitmap_init(&pc_state.db);
+
+	/* Enable tracking */
+	static_branch_enable(&page_consistency_enabled);
+	pr_info("Initialized successfully, tracking enabled\n");
+}
-- 
2.53.0



* [RFC 5/7] mm/page_alloc: integrate page consistency hooks
  2026-04-24 14:00 [RFC 0/7] mm: dual-bitmap page allocator consistency checker Sasha Levin
                   ` (3 preceding siblings ...)
  2026-04-24 14:00 ` [RFC 4/7] mm: add page consistency checker implementation Sasha Levin
@ 2026-04-24 14:00 ` Sasha Levin
  2026-04-24 14:00 ` [RFC 6/7] Documentation/mm: add page consistency checker documentation Sasha Levin
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: Sasha Levin @ 2026-04-24 14:00 UTC (permalink / raw)
  To: akpm, david, corbet
  Cc: ljs, Liam.Howlett, vbabka, rppt, surenb, mhocko, skhan, jackmanb,
	hannes, ziy, linux-mm, linux-doc, linux-kernel, Sasha Levin,
	Sanif Veeras, Claude:claude-opus-4-7

From: Sasha Levin <sashal@nvidia.com>

Wire up the page consistency checker with the page allocator by adding
tracking hooks in the allocation and free paths. The hooks follow the
same pattern already established by page_owner and page_table_check,
inserting calls at the points where page state transitions occur.

In post_alloc_hook(), a call to page_consistency_alloc() is added after
the page_table_check_alloc() call. This records the allocation in both
bitmaps, setting the primary bit and clearing the secondary bit for each
page in the allocation.
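
For readers following along without the implementation patch open, the
per-order bookkeeping described above can be modeled in userspace. This
is only a sketch under stated assumptions: the names (model_alloc,
MODEL_MIN_PFN, pages_tracked as a plain long) are hypothetical, byte
arrays stand in for the real bitmaps, and there is no concurrency. It
shows the two-endpoint range check and the "count only real transitions"
accounting, nothing more.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model values, not taken from any real system. */
#define MODEL_MIN_PFN 256UL
#define MODEL_MAX_PFN 512UL /* exclusive, like pc_state.max_pfn */
#define MODEL_NPAGES  (MODEL_MAX_PFN - MODEL_MIN_PFN)

static bool primary[MODEL_NPAGES];   /* true = allocated */
static bool secondary[MODEL_NPAGES]; /* true = free */
static long pages_tracked;

/* Every page starts out free: primary clear, secondary set. */
static void model_reset(void)
{
	for (unsigned long i = 0; i < MODEL_NPAGES; i++) {
		primary[i] = false;
		secondary[i] = true;
	}
	pages_tracked = 0;
}

static bool model_pfn_in_range(unsigned long pfn)
{
	return pfn >= MODEL_MIN_PFN && pfn < MODEL_MAX_PFN;
}

/* Returns how many pages actually went from free to allocated. */
static unsigned int model_alloc(unsigned long pfn, unsigned int order)
{
	unsigned long nr_pages, last_pfn;
	unsigned int transitions = 0;

	assert(order <= 10); /* keeps the shift small in this model */
	nr_pages = 1UL << order;
	last_pfn = pfn + nr_pages - 1;

	/* Skip the whole block unless both endpoints are tracked. */
	if (!model_pfn_in_range(pfn) || !model_pfn_in_range(last_pfn))
		return 0;

	for (unsigned long i = 0; i < nr_pages; i++) {
		unsigned long idx = pfn + i - MODEL_MIN_PFN;

		if (!primary[idx])
			transitions++; /* an already-set bit is a double-alloc */
		primary[idx] = true;
		secondary[idx] = false;
	}
	pages_tracked += transitions;
	return transitions;
}
```

In this model an order-3 allocation at PFN 256 records eight
transitions, while an order-2 allocation at PFN 510 is skipped entirely
because its last page (PFN 513) lies outside the tracked range.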

In __free_pages_prepare(), calls to page_consistency_free() are added
in both the early-return path for the hwpoison check and the normal
exit path. These calls clear the primary bitmap and set the secondary
bitmap, maintaining the complementary relationship that enables
corruption detection. The free hook lives in the internal
__free_pages_prepare() rather than its free_pages_prepare() wrapper so
that every free path (including the bulk folio free path) is observed
exactly once.

Initialization is hooked into mm_core_init() immediately before
memblock_free_all(), while memblock is still active so it can use
memblock_alloc() for the bitmaps, and after kho_memory_init() so that
any memory handed back by a kexec-handover source has already been
accounted.

Based-on-patch-by: Sanif Veeras <sveeras@nvidia.com>
Assisted-by: Claude:claude-opus-4-7 <noreply@anthropic.com>
Signed-off-by: Sasha Levin <sashal@nvidia.com>
---
 mm/mm_init.c    | 9 +++++++++
 mm/page_alloc.c | 4 ++++
 2 files changed, 13 insertions(+)

diff --git a/mm/mm_init.c b/mm/mm_init.c
index df34797691bd..4d9495fb8789 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -22,6 +22,7 @@
 #include <linux/kmemleak.h>
 #include <linux/kfence.h>
 #include <linux/page_ext.h>
+#include <linux/page_consistency.h>
 #include <linux/pti.h>
 #include <linux/pgtable.h>
 #include <linux/stackdepot.h>
@@ -2717,6 +2718,14 @@ void __init mm_core_init(void)
 	 */
 	kho_memory_init();
 
+	/*
+	 * page_consistency_init() must run while memblock is active so it
+	 * can use memblock_alloc() for the bitmaps.  Boot-time reserved pages
+	 * may be freed before SYSTEM_RUNNING without ever having been allocated
+	 * through the buddy allocator, so the checker suppresses double-free
+	 * reports until boot has completed.
+	 */
+	page_consistency_init();
 	memblock_free_all();
 	mem_init();
 	kmem_cache_init();
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2d4b6f1a554e..ae8f619875e9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -46,6 +46,7 @@
 #include <linux/sched/mm.h>
 #include <linux/page_owner.h>
 #include <linux/page_table_check.h>
+#include <linux/page_consistency.h>
 #include <linux/memcontrol.h>
 #include <linux/ftrace.h>
 #include <linux/lockdep.h>
@@ -1374,6 +1375,7 @@ __always_inline bool __free_pages_prepare(struct page *page,
 		/* Do not let hwpoison pages hit pcplists/buddy */
 		reset_page_owner(page, order);
 		page_table_check_free(page, order);
+		page_consistency_free(page, order);
 		pgalloc_tag_sub(page, 1 << order);
 
 		/*
@@ -1432,6 +1434,7 @@ __always_inline bool __free_pages_prepare(struct page *page,
 	page->private = 0;
 	reset_page_owner(page, order);
 	page_table_check_free(page, order);
+	page_consistency_free(page, order);
 	pgalloc_tag_sub(page, 1 << order);
 
 	if (!PageHighMem(page) && !(fpi_flags & FPI_TRYLOCK)) {
@@ -1888,6 +1891,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
 
 	set_page_owner(page, order, gfp_flags);
 	page_table_check_alloc(page, order);
+	page_consistency_alloc(page, order);
 	pgalloc_tag_add(page, current, 1 << order);
 }
 
-- 
2.53.0



* [RFC 6/7] Documentation/mm: add page consistency checker documentation
  2026-04-24 14:00 [RFC 0/7] mm: dual-bitmap page allocator consistency checker Sasha Levin
                   ` (4 preceding siblings ...)
  2026-04-24 14:00 ` [RFC 5/7] mm/page_alloc: integrate page consistency hooks Sasha Levin
@ 2026-04-24 14:00 ` Sasha Levin
  2026-04-24 14:00 ` [RFC 7/7] mm/page_consistency: add KUnit tests for dual-bitmap primitives Sasha Levin
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: Sasha Levin @ 2026-04-24 14:00 UTC (permalink / raw)
  To: akpm, david, corbet
  Cc: ljs, Liam.Howlett, vbabka, rppt, surenb, mhocko, skhan, jackmanb,
	hannes, ziy, linux-mm, linux-doc, linux-kernel, Sasha Levin,
	Sanif Veeras, Claude:claude-opus-4-7

From: Sasha Levin <sashal@nvidia.com>

Add documentation for the page consistency checker feature. The document
explains the dual-bitmap algorithm, describes the configuration options,
and covers the debugfs interface for monitoring and validation.

The algorithm section explains how the complementary bitmaps work: the
primary bitmap uses 1 for allocated and 0 for free, while the secondary
bitmap uses the opposite convention. This redundancy means any single-bit
corruption in either bitmap will cause a detectable violation of the
invariant that primary[bit] must equal ~secondary[bit].
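
As an illustration of the invariant, here is a minimal userspace model
of the two bitmaps. It is a sketch, not the kernel primitives: the
model_* names and the fixed 256-bit size are assumptions made for the
example, the updates are plain (non-atomic) stores, and there is no
retry logic for concurrent updaters.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model; names are illustrative only. */
#define MODEL_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define MODEL_NBITS 256 /* a whole number of words on 32- and 64-bit */
#define MODEL_NWORDS (MODEL_NBITS / MODEL_BITS_PER_WORD)

struct model_dual_bitmap {
	unsigned long primary[MODEL_NWORDS];   /* 1 = allocated */
	unsigned long secondary[MODEL_NWORDS]; /* 1 = free */
};

/* Everything free: primary all zeros, secondary all ones. */
static void model_init(struct model_dual_bitmap *db)
{
	memset(db->primary, 0, sizeof(db->primary));
	memset(db->secondary, 0xff, sizeof(db->secondary));
}

/* Mark allocated; returns the old primary bit (true means double-set). */
static bool model_set(struct model_dual_bitmap *db, size_t bit)
{
	size_t word = bit / MODEL_BITS_PER_WORD;
	unsigned long mask = 1UL << (bit % MODEL_BITS_PER_WORD);
	bool was_set;

	assert(bit < MODEL_NBITS);
	was_set = (db->primary[word] & mask) != 0;
	db->primary[word] |= mask;
	db->secondary[word] &= ~mask;
	return was_set;
}

/* Mark free; returns the old primary bit (false means double-clear). */
static bool model_clear(struct model_dual_bitmap *db, size_t bit)
{
	size_t word = bit / MODEL_BITS_PER_WORD;
	unsigned long mask = 1UL << (bit % MODEL_BITS_PER_WORD);
	bool was_set;

	assert(bit < MODEL_NBITS);
	was_set = (db->primary[word] & mask) != 0;
	db->primary[word] &= ~mask;
	db->secondary[word] |= mask;
	return was_set;
}

/* A single position is consistent when the two bits differ. */
static bool model_consistent(const struct model_dual_bitmap *db, size_t bit)
{
	size_t word = bit / MODEL_BITS_PER_WORD;
	unsigned long mask = 1UL << (bit % MODEL_BITS_PER_WORD);

	assert(bit < MODEL_NBITS);
	return ((db->primary[word] & mask) != 0) !=
	       ((db->secondary[word] & mask) != 0);
}

/* Full validation: count words where primary != ~secondary. */
static size_t model_validate(const struct model_dual_bitmap *db)
{
	size_t violations = 0;

	for (size_t i = 0; i < MODEL_NWORDS; i++)
		if (db->primary[i] != ~db->secondary[i])
			violations++;
	return violations;
}
```

Flipping any one bit in either array (for example
db.secondary[0] ^= 1UL << 7) makes model_consistent() fail for that
position and makes model_validate() report one inconsistent word.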

The document also explains the intentional limitation around double-free
detection. During boot, free_reserved_area() releases pages that were
never allocated through the buddy allocator. Flagging these as errors
would generate many false positives, so double-free detection is
deferred until after boot completes.
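
The deferral rule can be stated as a small decision function. The
sketch below is a simplified userspace model with hypothetical names
(model_free, enum model_free_result); a plain bool stands in for the
primary bit and the system_running flag stands in for the
system_state >= SYSTEM_RUNNING test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes for this model only. */
enum model_free_result {
	MODEL_FREE_OK,      /* page was allocated, now free */
	MODEL_FREE_IGNORED, /* already free, but still booting */
	MODEL_FREE_DOUBLE,  /* already free after boot: violation */
};

/*
 * Free one page. The state is cleared unconditionally, as in the
 * checker; only the reporting depends on whether boot has finished.
 */
static enum model_free_result model_free(bool *allocated, bool system_running)
{
	bool was_allocated;

	assert(allocated != NULL);
	was_allocated = *allocated;
	*allocated = false;

	if (was_allocated)
		return MODEL_FREE_OK;
	return system_running ? MODEL_FREE_DOUBLE : MODEL_FREE_IGNORED;
}
```

A reserved page that was never marked allocated yields
MODEL_FREE_IGNORED while booting and MODEL_FREE_DOUBLE once the
running flag is set, which mirrors the limitation described above.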

Based-on-patch-by: Sanif Veeras <sveeras@nvidia.com>
Assisted-by: Claude:claude-opus-4-7 <noreply@anthropic.com>
Signed-off-by: Sasha Levin <sashal@nvidia.com>
---
 Documentation/mm/index.rst            |   1 +
 Documentation/mm/page_consistency.rst | 211 ++++++++++++++++++++++++++
 2 files changed, 212 insertions(+)
 create mode 100644 Documentation/mm/page_consistency.rst

diff --git a/Documentation/mm/index.rst b/Documentation/mm/index.rst
index 7aa2a8886908..bef6c9bbc976 100644
--- a/Documentation/mm/index.rst
+++ b/Documentation/mm/index.rst
@@ -57,6 +57,7 @@ documentation, or deleted if it has served its purpose.
    page_frags
    page_owner
    page_table_check
+   page_consistency
    remap_file_pages
    split_page_table_lock
    transhuge
diff --git a/Documentation/mm/page_consistency.rst b/Documentation/mm/page_consistency.rst
new file mode 100644
index 000000000000..dd1bde68f1a5
--- /dev/null
+++ b/Documentation/mm/page_consistency.rst
@@ -0,0 +1,211 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+========================
+Page Consistency Checker
+========================
+
+The page consistency checker is a debugging feature that uses dual
+complementary bitmaps to detect corruption in page allocation tracking.
+It maintains the invariant that for every bit position, the primary
+bitmap value equals the bitwise complement of the secondary bitmap value.
+
+Overview
+========
+
+Memory corruption can silently flip bits in kernel data structures,
+leading to difficult-to-diagnose failures. The page consistency checker
+addresses this by maintaining redundant tracking of page allocation
+state. Any single-bit corruption in either bitmap will cause a detectable
+inconsistency, allowing the corruption to be caught rather than causing
+silent data corruption or mysterious crashes later.
+
+The bitmaps are flat, covering the entire PFN range from
+``memblock_start_of_DRAM()`` to ``memblock_end_of_DRAM()`` including any
+holes in physical memory. This is a deliberate design choice: simple
+``pfn - min_pfn`` indexing is trivially auditable, which matters for a
+safety mechanism. Sparse or section-aware indexing would add auxiliary
+data structures that could themselves be subject to corruption. See
+`Limitations`_ for a detailed analysis of memory overhead including
+holes.
+
+The approach is based on NVIDIA safety research and is
+particularly useful for safety-critical systems requiring Freedom From
+Interference (FFI) guarantees per ISO 26262 (ASIL-D) and IEC 61508
+(SIL-3).
+
+Algorithm
+=========
+
+The checker maintains two bitmaps tracking page allocation state:
+
+Primary bitmap
+  Bit set to 1 when page is allocated, 0 when free.
+
+Secondary bitmap
+  Bit set to 0 when page is allocated, 1 when free.
+
+The invariant that must always hold is::
+
+    primary[bit] == ~secondary[bit]
+
+When a page is allocated, the checker sets the bit in the primary bitmap
+and clears it in the secondary bitmap. When freed, it clears in primary
+and sets in secondary. If the operation finds the bit already in the
+expected final state, a double-allocation or double-free has occurred.
+
+Full validation can be performed by checking that every word in the
+primary bitmap equals the bitwise complement of the corresponding word
+in the secondary bitmap.
+
+Concurrency Handling
+====================
+
+The dual-bitmap update operations (set/clear) modify both bitmaps with
+separate atomic operations. This creates a brief window where a concurrent
+validation could observe a transient inconsistency.
+
+The implementation handles this by retrying validation when an inconsistency
+is detected. Real memory corruption is persistent and will fail all retries.
+Transient inconsistencies from concurrent updates resolve quickly and pass
+on retry.
+
+Double-Free Detection
+=====================
+
+Double-free detection is deferred until the system is fully running. During
+boot, ``free_reserved_area()`` and ``free_initmem()`` release pages that were
+never allocated through the buddy allocator. These would appear as double-frees
+but are expected behavior.
+
+The checker uses ``system_state >= SYSTEM_RUNNING`` to determine when boot
+is complete. This state is reached only after all init memory has been freed,
+ensuring no false positives from legitimate boot-time freeing. Any attempt to
+free a page that is not marked as allocated after this point will be flagged
+as a violation.
+
+Configuration
+=============
+
+The feature is controlled by two Kconfig options:
+
+``CONFIG_DEBUG_PAGE_CONSISTENCY``
+  Enable the page consistency checker. Memory overhead is two bits per
+  PFN in the spanned range (start to end of DRAM, including holes),
+  roughly 4 MB total for a 64 GB system with 4 KB pages. When this
+  option is disabled, the allocator hooks compile away. When enabled,
+  a static key gates tracking until initialization succeeds.
+
+``CONFIG_DEBUG_PAGE_CONSISTENCY_PANIC``
+  When enabled, the kernel will panic immediately upon detecting a
+  consistency violation. When disabled, a warning with a stack trace
+  is emitted and execution continues. Safety-critical systems should
+  enable this option.
+
+Debugfs Interface
+=================
+
+When ``CONFIG_DEBUG_FS`` is enabled, the checker exposes files under
+``/sys/kernel/debug/page_consistency/``:
+
+``stats``
+  Read-only file showing tracking statistics::
+
+    pages_tracked:       12345
+    alloc_count:         67890
+    free_count:          55545
+    violations_detected: 0
+    bitmap_size_bits:    1048576
+    pfn_range:           [256-1048832)
+
+``validate``
+  Write-only file. Writing any value triggers a full validation of
+  all bitmap words. Returns success if all words are consistent,
+  or -EIO if any violations are found.
+
+Usage
+=====
+
+To use the page consistency checker:
+
+1. Enable ``CONFIG_DEBUG_PAGE_CONSISTENCY`` in your kernel configuration.
+
+2. Optionally enable ``CONFIG_DEBUG_PAGE_CONSISTENCY_PANIC`` if you want
+   the kernel to halt immediately upon detecting corruption.
+
+3. Boot the kernel. The checker will automatically initialize and begin
+   tracking page allocations.
+
+4. Monitor statistics via debugfs::
+
+     cat /sys/kernel/debug/page_consistency/stats
+
+5. Trigger manual validation::
+
+     echo 1 > /sys/kernel/debug/page_consistency/validate
+
+Limitations
+===========
+
+As described in `Overview`_, the bitmaps use a flat layout covering the
+entire spanned PFN range, including any holes. Bits corresponding to
+holes are initialized to the free state and remain inert; they maintain
+the complement invariant and never trigger false positives. The kernel's
+own ``pageblock_flags`` bitmaps use the same flat approach, sizing to
+``zone->spanned_pages`` which includes holes.
+
+Memory overhead
+---------------
+
+The cost is 2 bits per PFN in the range (1 bit per bitmap x 2 bitmaps),
+allocated via ``memblock_alloc()`` before the buddy allocator is
+available. A hole wastes ``hole_size / PAGE_SIZE / 8`` bytes per bitmap.
+In practice the waste from holes is negligible::
+
+  System         Holes    Per-bitmap size   Hole waste   Waste/bitmap
+  -----------    ------   ---------------   ----------   ------------
+  64 GB, flat    none     2 MB              0            0%
+  256 GB, flat   none     8 MB              0            0%
+  256 GB         4 GB     8.1 MB            128 KB       1.5%
+  1 TB           16 GB    32.5 MB           512 KB       1.5%
+
+On x86_64 the typical hole between low memory (below 4 GB) and high
+memory is the largest source of waste. On arm64 with
+``memblock_start_of_DRAM()`` typically at 0x80000000 (2 GB), holes
+within the DRAM range are generally small or absent.
+
+Other limitations
+-----------------
+
+The feature is incompatible with ``CONFIG_MEMORY_HOTPLUG`` because the
+bitmaps are sized at boot based on the initial physical memory range.
+Hot-added memory would fall outside the tracked PFN range and be silently
+ignored.
+
+Boot-time reserved pages are not tracked as allocations. Freeing such a
+page before ``SYSTEM_RUNNING`` is expected and is ignored by the
+double-free detector. Freeing an untracked reserved page after boot is
+reported as a double-free.
+
+The feature detects corruption in the tracking bitmaps themselves, not
+corruption in the actual page contents. For page content verification,
+see ``CONFIG_PAGE_POISONING``.
+
+Implementation Details
+======================
+
+The checker hooks into the page allocator at two points:
+
+- ``post_alloc_hook()`` calls ``page_consistency_alloc()`` after a
+  successful allocation.
+
+- ``__free_pages_prepare()`` calls ``page_consistency_free()`` when pages
+  are being returned to the allocator.
+
+Both hooks use static keys (``static_branch_unlikely``) so the overhead
+is a single no-op while the static key is disabled.
+
+The bitmaps are allocated during ``mm_core_init()`` using
+``memblock_alloc()`` before ``memblock_free_all()`` releases memblock
+memory to the buddy allocator. The secondary bitmap is initialized with
+all bits set to 1, establishing the initial complementary relationship
+with the zeroed primary bitmap.
-- 
2.53.0



* [RFC 7/7] mm/page_consistency: add KUnit tests for dual-bitmap primitives
  2026-04-24 14:00 [RFC 0/7] mm: dual-bitmap page allocator consistency checker Sasha Levin
                   ` (5 preceding siblings ...)
  2026-04-24 14:00 ` [RFC 6/7] Documentation/mm: add page consistency checker documentation Sasha Levin
@ 2026-04-24 14:00 ` Sasha Levin
  2026-04-24 15:34 ` [RFC 0/7] mm: dual-bitmap page allocator consistency checker Matthew Wilcox
  2026-04-24 15:42 ` Vlastimil Babka (SUSE)
  8 siblings, 0 replies; 22+ messages in thread
From: Sasha Levin @ 2026-04-24 14:00 UTC (permalink / raw)
  To: akpm, david, corbet
  Cc: ljs, Liam.Howlett, vbabka, rppt, surenb, mhocko, skhan, jackmanb,
	hannes, ziy, linux-mm, linux-doc, linux-kernel, Sasha Levin,
	Sanif Veeras, Claude:claude-opus-4-7

From: Sasha Levin <sashal@nvidia.com>

Add a KUnit test suite that exercises the dual-bitmap algorithm used by
the page consistency checker. The tests verify that the core invariant
is maintained through various operations and that corruption can be
reliably detected.

The test suite covers several scenarios. The initial-state test confirms
that a freshly initialized dual bitmap with zeroed primary and filled
secondary passes validation. The set and clear tests verify that normal
operations maintain the complementary relationship between bitmaps. The
double-set and double-clear tests confirm that attempts to set an
already-set bit or clear an already-clear bit are properly detected and
reported through the return value.

The corruption detection tests are particularly important for validating
the safety guarantees. These tests directly manipulate one bitmap without
updating its complement, simulating what would happen if a memory error
flipped a bit. Both primary and secondary corruption scenarios are
tested, confirming that either type is caught by validation.

The suite also includes boundary condition tests covering the first bit,
last bit, and word boundaries to ensure the bit manipulation logic
handles edge cases correctly.

Based-on-patch-by: Sanif Veeras <sveeras@nvidia.com>
Assisted-by: Claude:claude-opus-4-7 <noreply@anthropic.com>
Signed-off-by: Sasha Levin <sashal@nvidia.com>
---
 mm/Makefile                |   1 +
 mm/page_consistency_test.c | 274 +++++++++++++++++++++++++++++++++++++
 2 files changed, 275 insertions(+)
 create mode 100644 mm/page_consistency_test.c

diff --git a/mm/Makefile b/mm/Makefile
index 2ee360001456..7106aeb79cf5 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -129,6 +129,7 @@ obj-$(CONFIG_BALLOON) += balloon.o
 obj-$(CONFIG_PAGE_EXTENSION) += page_ext.o
 obj-$(CONFIG_PAGE_TABLE_CHECK) += page_table_check.o
 obj-$(CONFIG_DEBUG_PAGE_CONSISTENCY) += page_consistency.o
+obj-$(CONFIG_DEBUG_PAGE_CONSISTENCY_KUNIT_TEST) += page_consistency_test.o
 obj-$(CONFIG_CMA_DEBUGFS) += cma_debug.o
 obj-$(CONFIG_SECRETMEM) += secretmem.o
 obj-$(CONFIG_CMA_SYSFS) += cma_sysfs.o
diff --git a/mm/page_consistency_test.c b/mm/page_consistency_test.c
new file mode 100644
index 000000000000..6cd587f8146f
--- /dev/null
+++ b/mm/page_consistency_test.c
@@ -0,0 +1,274 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KUnit tests for dual-bitmap primitives
+ *
+ * Tests the dual-bitmap consistency checking algorithm used by the page
+ * consistency checker. These tests verify the core invariant maintenance
+ * and corruption detection logic.
+ */
+
+#include <kunit/test.h>
+#include <linux/dual_bitmap.h>
+
+#define TEST_BITMAP_BITS 256
+
+struct dual_bitmap_test_context {
+	struct dual_bitmap db;
+	unsigned long primary[BITS_TO_LONGS(TEST_BITMAP_BITS)];
+	unsigned long secondary[BITS_TO_LONGS(TEST_BITMAP_BITS)];
+};
+
+static int dual_bitmap_test_init(struct kunit *test)
+{
+	struct dual_bitmap_test_context *ctx;
+
+	ctx = kunit_kzalloc(test, sizeof(*ctx), GFP_KERNEL);
+	if (!ctx)
+		return -ENOMEM;
+
+	ctx->db.bitmap[DUAL_BITMAP_PRIMARY] = ctx->primary;
+	ctx->db.bitmap[DUAL_BITMAP_SECONDARY] = ctx->secondary;
+	ctx->db.nbits = TEST_BITMAP_BITS;
+
+	/* Initialize: primary all zeros, secondary all ones */
+	dual_bitmap_init(&ctx->db);
+
+	test->priv = ctx;
+	return 0;
+}
+
+static void test_initial_state_consistent(struct kunit *test)
+{
+	struct dual_bitmap_test_context *ctx = test->priv;
+	unsigned long violations;
+
+	violations = dual_bitmap_validate(&ctx->db);
+	KUNIT_EXPECT_EQ(test, violations, 0UL);
+}
+
+static void test_set_maintains_consistency(struct kunit *test)
+{
+	struct dual_bitmap_test_context *ctx = test->priv;
+	unsigned long violations;
+	bool was_set;
+
+	/* Set bit 42 */
+	was_set = dual_bitmap_set(&ctx->db, 42);
+	KUNIT_EXPECT_FALSE(test, was_set);
+
+	/* Verify consistency */
+	violations = dual_bitmap_validate(&ctx->db);
+	KUNIT_EXPECT_EQ(test, violations, 0UL);
+
+	/* Verify individual bit consistency */
+	KUNIT_EXPECT_TRUE(test, dual_bitmap_consistent(&ctx->db, 42));
+}
+
+static void test_clear_maintains_consistency(struct kunit *test)
+{
+	struct dual_bitmap_test_context *ctx = test->priv;
+	unsigned long violations;
+	bool was_set;
+
+	/* First set the bit */
+	dual_bitmap_set(&ctx->db, 100);
+
+	/* Now clear it */
+	was_set = dual_bitmap_clear(&ctx->db, 100);
+	KUNIT_EXPECT_TRUE(test, was_set);
+
+	/* Verify consistency */
+	violations = dual_bitmap_validate(&ctx->db);
+	KUNIT_EXPECT_EQ(test, violations, 0UL);
+}
+
+static void test_double_set_detected(struct kunit *test)
+{
+	struct dual_bitmap_test_context *ctx = test->priv;
+	bool was_set;
+
+	/* Set bit 50 */
+	was_set = dual_bitmap_set(&ctx->db, 50);
+	KUNIT_EXPECT_FALSE(test, was_set);
+
+	/* Try to set it again - should report it was already set */
+	was_set = dual_bitmap_set(&ctx->db, 50);
+	KUNIT_EXPECT_TRUE(test, was_set);
+}
+
+static void test_double_clear_detected(struct kunit *test)
+{
+	struct dual_bitmap_test_context *ctx = test->priv;
+	bool was_set;
+
+	/* Clear bit 60 which is already clear (never set) */
+	was_set = dual_bitmap_clear(&ctx->db, 60);
+	KUNIT_EXPECT_FALSE(test, was_set);
+}
+
+static void test_corruption_in_primary_detected(struct kunit *test)
+{
+	struct dual_bitmap_test_context *ctx = test->priv;
+	unsigned long violations;
+
+	/* Corrupt the primary bitmap directly */
+	set_bit(75, ctx->primary);
+
+	/* Validation should detect the corruption */
+	violations = dual_bitmap_validate(&ctx->db);
+	KUNIT_EXPECT_GT(test, violations, 0UL);
+
+	/* Individual bit check should also fail */
+	KUNIT_EXPECT_FALSE(test, dual_bitmap_consistent(&ctx->db, 75));
+}
+
+static void test_corruption_in_secondary_detected(struct kunit *test)
+{
+	struct dual_bitmap_test_context *ctx = test->priv;
+	unsigned long violations;
+
+	/* Corrupt the secondary bitmap directly */
+	clear_bit(80, ctx->secondary);
+
+	/* Validation should detect the corruption */
+	violations = dual_bitmap_validate(&ctx->db);
+	KUNIT_EXPECT_GT(test, violations, 0UL);
+
+	/* Individual bit check should also fail */
+	KUNIT_EXPECT_FALSE(test, dual_bitmap_consistent(&ctx->db, 80));
+}
+
+static void test_multiple_operations(struct kunit *test)
+{
+	struct dual_bitmap_test_context *ctx = test->priv;
+	unsigned long violations;
+	unsigned long i;
+
+	/* Set bits 0-63 */
+	for (i = 0; i < 64; i++)
+		dual_bitmap_set(&ctx->db, i);
+
+	/* Clear bits 32-63 */
+	for (i = 32; i < 64; i++)
+		dual_bitmap_clear(&ctx->db, i);
+
+	/* Validate entire bitmap */
+	violations = dual_bitmap_validate(&ctx->db);
+	KUNIT_EXPECT_EQ(test, violations, 0UL);
+
+	/* Verify expected state: bits 0-31 set, rest clear */
+	for (i = 0; i < 32; i++)
+		KUNIT_EXPECT_TRUE(test, test_bit(i, ctx->primary));
+	for (i = 32; i < TEST_BITMAP_BITS; i++)
+		KUNIT_EXPECT_FALSE(test, test_bit(i, ctx->primary));
+}
+
+static void test_boundary_bits(struct kunit *test)
+{
+	struct dual_bitmap_test_context *ctx = test->priv;
+	unsigned long violations;
+
+	/* Test first bit */
+	dual_bitmap_set(&ctx->db, 0);
+	KUNIT_EXPECT_TRUE(test, dual_bitmap_consistent(&ctx->db, 0));
+
+	/* Test last bit */
+	dual_bitmap_set(&ctx->db, TEST_BITMAP_BITS - 1);
+	KUNIT_EXPECT_TRUE(test, dual_bitmap_consistent(&ctx->db, TEST_BITMAP_BITS - 1));
+
+	/* Test word boundary (last bit of first word / first bit of second word) */
+	dual_bitmap_set(&ctx->db, BITS_PER_LONG - 1);
+	dual_bitmap_set(&ctx->db, BITS_PER_LONG);
+	KUNIT_EXPECT_TRUE(test, dual_bitmap_consistent(&ctx->db, BITS_PER_LONG - 1));
+	KUNIT_EXPECT_TRUE(test, dual_bitmap_consistent(&ctx->db, BITS_PER_LONG));
+
+	violations = dual_bitmap_validate(&ctx->db);
+	KUNIT_EXPECT_EQ(test, violations, 0UL);
+}
+
+static void test_dual_bitmap_test_func(struct kunit *test)
+{
+	struct dual_bitmap_test_context *ctx = test->priv;
+
+	/* Initially all bits should be clear (not allocated) */
+	KUNIT_EXPECT_FALSE(test, dual_bitmap_test(&ctx->db, 10));
+
+	/* After setting, bit should be set */
+	dual_bitmap_set(&ctx->db, 10);
+	KUNIT_EXPECT_TRUE(test, dual_bitmap_test(&ctx->db, 10));
+
+	/* After clearing, bit should be clear again */
+	dual_bitmap_clear(&ctx->db, 10);
+	KUNIT_EXPECT_FALSE(test, dual_bitmap_test(&ctx->db, 10));
+}
+
+/* Test with non-word-aligned nbits to exercise partial-word handling */
+#define TEST_UNALIGNED_BITS 100  /* not a multiple of BITS_PER_LONG */
+
+struct dual_bitmap_unaligned_context {
+	struct dual_bitmap db;
+	unsigned long primary[BITS_TO_LONGS(TEST_UNALIGNED_BITS)];
+	unsigned long secondary[BITS_TO_LONGS(TEST_UNALIGNED_BITS)];
+};
+
+static void test_non_aligned_nbits(struct kunit *test)
+{
+	struct dual_bitmap_unaligned_context *ctx;
+	unsigned long violations;
+	unsigned long i;
+
+	ctx = kunit_kzalloc(test, sizeof(*ctx), GFP_KERNEL);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ctx);
+
+	ctx->db.bitmap[DUAL_BITMAP_PRIMARY] = ctx->primary;
+	ctx->db.bitmap[DUAL_BITMAP_SECONDARY] = ctx->secondary;
+	ctx->db.nbits = TEST_UNALIGNED_BITS;
+
+	dual_bitmap_init(&ctx->db);
+
+	/* Initial state should be consistent */
+	violations = dual_bitmap_validate(&ctx->db);
+	KUNIT_EXPECT_EQ(test, violations, 0UL);
+
+	/* Set bits near the non-aligned boundary */
+	for (i = TEST_UNALIGNED_BITS - 5; i < TEST_UNALIGNED_BITS; i++) {
+		dual_bitmap_set(&ctx->db, i);
+		KUNIT_EXPECT_TRUE(test, dual_bitmap_consistent(&ctx->db, i));
+	}
+
+	violations = dual_bitmap_validate(&ctx->db);
+	KUNIT_EXPECT_EQ(test, violations, 0UL);
+
+	/* Clear them back */
+	for (i = TEST_UNALIGNED_BITS - 5; i < TEST_UNALIGNED_BITS; i++)
+		dual_bitmap_clear(&ctx->db, i);
+
+	violations = dual_bitmap_validate(&ctx->db);
+	KUNIT_EXPECT_EQ(test, violations, 0UL);
+}
+
+static struct kunit_case dual_bitmap_test_cases[] = {
+	KUNIT_CASE(test_initial_state_consistent),
+	KUNIT_CASE(test_set_maintains_consistency),
+	KUNIT_CASE(test_clear_maintains_consistency),
+	KUNIT_CASE(test_double_set_detected),
+	KUNIT_CASE(test_double_clear_detected),
+	KUNIT_CASE(test_corruption_in_primary_detected),
+	KUNIT_CASE(test_corruption_in_secondary_detected),
+	KUNIT_CASE(test_multiple_operations),
+	KUNIT_CASE(test_boundary_bits),
+	KUNIT_CASE(test_dual_bitmap_test_func),
+	KUNIT_CASE(test_non_aligned_nbits),
+	{},
+};
+
+static struct kunit_suite dual_bitmap_test_suite = {
+	.name = "dual_bitmap",
+	.init = dual_bitmap_test_init,
+	.test_cases = dual_bitmap_test_cases,
+};
+
+kunit_test_suites(&dual_bitmap_test_suite);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("KUnit tests for dual-bitmap consistency primitives");
-- 
2.53.0



* Re: [RFC 4/7] mm: add page consistency checker implementation
  2026-04-24 14:00 ` [RFC 4/7] mm: add page consistency checker implementation Sasha Levin
@ 2026-04-24 14:25   ` David Hildenbrand (Arm)
  2026-04-24 14:49     ` Sasha Levin
  0 siblings, 1 reply; 22+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-24 14:25 UTC (permalink / raw)
  To: Sasha Levin, akpm, corbet
  Cc: ljs, Liam.Howlett, vbabka, rppt, surenb, mhocko, skhan, jackmanb,
	hannes, ziy, linux-mm, linux-doc, linux-kernel, Sasha Levin,
	Sanif Veeras, Claude:claude-opus-4-7

> +	/*
> +	 * Size bitmaps to cover the full PFN range including any holes.
> +	 * Holes waste a few bits but a flat bitmap keeps the indexing
> +	 * trivial (pfn - min_pfn) and avoids additional data structures
> +	 * that would themselves be subject to corruption.  This matches
> +	 * the approach used by pageblock_flags.
> +	 */
> +	pc_state.min_pfn = PHYS_PFN(memblock_start_of_DRAM());
> +	pc_state.max_pfn = PHYS_PFN(memblock_end_of_DRAM());
> +	spanned_pfns = pc_state.max_pfn - pc_state.min_pfn;
> +	if (!spanned_pfns || spanned_pfns > UINT_MAX) {
> +		pr_err("PFN span %lu cannot be represented by bitmap APIs, feature disabled\n",
> +		       spanned_pfns);
> +		return;
> +	}
> +
> +	pc_state.db.nbits = spanned_pfns;
> +
> +	bitmap_bytes = BITS_TO_LONGS(pc_state.db.nbits) * sizeof(unsigned long);
> +
> +	pr_info("Initializing: PFN range [%lu-%lu), %u bits (%zu KB per bitmap)\n",
> +		pc_state.min_pfn, pc_state.max_pfn, pc_state.db.nbits,
> +		bitmap_bytes / 1024);
> +
> +	/* Allocate primary bitmap (zeroed by memblock_alloc) */
> +	pc_state.db.bitmap[DUAL_BITMAP_PRIMARY] =
> +		memblock_alloc(bitmap_bytes, SMP_CACHE_BYTES);
> +	if (!pc_state.db.bitmap[DUAL_BITMAP_PRIMARY]) {
> +		pr_err("Failed to allocate primary bitmap, feature disabled\n");
> +		return;
> +	}
> 

One bitmap that covers all sparse memory available at boot.

Conclusion: Just horrible.

-- 
Cheers,

David


* Re: [RFC 4/7] mm: add page consistency checker implementation
  2026-04-24 14:25   ` David Hildenbrand (Arm)
@ 2026-04-24 14:49     ` Sasha Levin
  2026-04-24 15:06       ` Pasha Tatashin
  2026-04-24 18:26       ` David Hildenbrand (Arm)
  0 siblings, 2 replies; 22+ messages in thread
From: Sasha Levin @ 2026-04-24 14:49 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: akpm, corbet, ljs, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	skhan, jackmanb, hannes, ziy, linux-mm, linux-doc, linux-kernel,
	Sasha Levin, Sanif Veeras, Claude:claude-opus-4-7

On Fri, Apr 24, 2026 at 04:25:41PM +0200, David Hildenbrand (Arm) wrote:
>> +	/*
>> +	 * Size bitmaps to cover the full PFN range including any holes.
>> +	 * Holes waste a few bits but a flat bitmap keeps the indexing
>> +	 * trivial (pfn - min_pfn) and avoids additional data structures
>> +	 * that would themselves be subject to corruption.  This matches
>> +	 * the approach used by pageblock_flags.
>> +	 */
>> +	pc_state.min_pfn = PHYS_PFN(memblock_start_of_DRAM());
>> +	pc_state.max_pfn = PHYS_PFN(memblock_end_of_DRAM());
>> +	spanned_pfns = pc_state.max_pfn - pc_state.min_pfn;
>> +	if (!spanned_pfns || spanned_pfns > UINT_MAX) {
>> +		pr_err("PFN span %lu cannot be represented by bitmap APIs, feature disabled\n",
>> +		       spanned_pfns);
>> +		return;
>> +	}
>> +
>> +	pc_state.db.nbits = spanned_pfns;
>> +
>> +	bitmap_bytes = BITS_TO_LONGS(pc_state.db.nbits) * sizeof(unsigned long);
>> +
>> +	pr_info("Initializing: PFN range [%lu-%lu), %u bits (%zu KB per bitmap)\n",
>> +		pc_state.min_pfn, pc_state.max_pfn, pc_state.db.nbits,
>> +		bitmap_bytes / 1024);
>> +
>> +	/* Allocate primary bitmap (zeroed by memblock_alloc) */
>> +	pc_state.db.bitmap[DUAL_BITMAP_PRIMARY] =
>> +		memblock_alloc(bitmap_bytes, SMP_CACHE_BYTES);
>> +	if (!pc_state.db.bitmap[DUAL_BITMAP_PRIMARY]) {
>> +		pr_err("Failed to allocate primary bitmap, feature disabled\n");
>> +		return;
>> +	}
>>
>
>One bitmap that covers all sparse memory available at boot.
>
>Conclusion: Just horrible.

Depends on who's looking at the code :)

I picked it for auditability: covering the whole range with two
memblock_alloc'd arrays means the only thing on the lookup path is the bitmap
words themselves, which is what the dual-bitmap invariant already checks.

We could go with per-section bitmaps which will fix the waste but pull
mem_section[] into the trust boundary, so we'd have to start validating it too.
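
For illustration, the flat layout boils down to the arithmetic below. This
is a minimal userspace sketch, not code from the series: the helper names
flat_bitmap_bytes() and flat_bitmap_index() are made up for this example,
and BITS_TO_LONGS is redefined locally so the snippet builds outside the
kernel.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Local stand-ins for the kernel macros of the same name. */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/*
 * Bytes needed for ONE flat bitmap covering [min_pfn, max_pfn).
 * Holes inside the range still consume one bit per PFN.
 * (Hypothetical helper, not part of the patch.)
 */
static size_t flat_bitmap_bytes(unsigned long min_pfn, unsigned long max_pfn)
{
	assert(max_pfn > min_pfn);
	return BITS_TO_LONGS(max_pfn - min_pfn) * sizeof(unsigned long);
}

/*
 * Bit index for a PFN. This subtraction is the only translation on the
 * lookup path, which is the auditability argument for the flat layout.
 * (Hypothetical helper, not part of the patch.)
 */
static unsigned long flat_bitmap_index(unsigned long min_pfn, unsigned long pfn)
{
	assert(pfn >= min_pfn);
	return pfn - min_pfn;
}
```

Assuming 4 KiB pages, a contiguous 64 GiB span has 2^24 PFNs and needs 2 MiB
per bitmap. A hypothetical machine with 4 GiB at address 0 and another 4 GiB
at 1 TiB spans a little over 2^28 PFNs, so each bitmap grows to roughly
32 MiB even though only 256 KiB of it tracks real pages.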

-- 
Thanks,
Sasha


* Re: [RFC 4/7] mm: add page consistency checker implementation
  2026-04-24 14:49     ` Sasha Levin
@ 2026-04-24 15:06       ` Pasha Tatashin
  2026-04-24 18:28         ` David Hildenbrand (Arm)
  2026-04-24 18:26       ` David Hildenbrand (Arm)
  1 sibling, 1 reply; 22+ messages in thread
From: Pasha Tatashin @ 2026-04-24 15:06 UTC (permalink / raw)
  To: Sasha Levin
  Cc: David Hildenbrand (Arm), akpm, corbet, ljs, Liam.Howlett, vbabka,
	rppt, surenb, mhocko, skhan, jackmanb, hannes, ziy, linux-mm,
	linux-doc, linux-kernel, Sasha Levin, Sanif Veeras,
	Claude:claude-opus-4-7

On 04-24 10:49, Sasha Levin wrote:
> On Fri, Apr 24, 2026 at 04:25:41PM +0200, David Hildenbrand (Arm) wrote:
> > > +	/*
> > > +	 * Size bitmaps to cover the full PFN range including any holes.
> > > +	 * Holes waste a few bits but a flat bitmap keeps the indexing
> > > +	 * trivial (pfn - min_pfn) and avoids additional data structures
> > > +	 * that would themselves be subject to corruption.  This matches
> > > +	 * the approach used by pageblock_flags.
> > > +	 */
> > > +	pc_state.min_pfn = PHYS_PFN(memblock_start_of_DRAM());
> > > +	pc_state.max_pfn = PHYS_PFN(memblock_end_of_DRAM());
> > > +	spanned_pfns = pc_state.max_pfn - pc_state.min_pfn;
> > > +	if (!spanned_pfns || spanned_pfns > UINT_MAX) {
> > > +		pr_err("PFN span %lu cannot be represented by bitmap APIs, feature disabled\n",
> > > +		       spanned_pfns);
> > > +		return;
> > > +	}
> > > +
> > > +	pc_state.db.nbits = spanned_pfns;
> > > +
> > > +	bitmap_bytes = BITS_TO_LONGS(pc_state.db.nbits) * sizeof(unsigned long);
> > > +
> > > +	pr_info("Initializing: PFN range [%lu-%lu), %u bits (%zu KB per bitmap)\n",
> > > +		pc_state.min_pfn, pc_state.max_pfn, pc_state.db.nbits,
> > > +		bitmap_bytes / 1024);
> > > +
> > > +	/* Allocate primary bitmap (zeroed by memblock_alloc) */
> > > +	pc_state.db.bitmap[DUAL_BITMAP_PRIMARY] =
> > > +		memblock_alloc(bitmap_bytes, SMP_CACHE_BYTES);
> > > +	if (!pc_state.db.bitmap[DUAL_BITMAP_PRIMARY]) {
> > > +		pr_err("Failed to allocate primary bitmap, feature disabled\n");
> > > +		return;
> > > +	}
> > > 
> > 
> > One bitmap that covers all sparse memory available at boot.
> > 
> > Conclusion: Just horrible.
> 
> Depends on who's looking at the code :)
> 
> I picked it for auditability: covering the whole range with two
> memblock_alloc'd arrays means the only thing on the lookup path is the bitmap
> words themselves, which is what the dual-bitmap invariant already checks.

The issue is that we are going back in time to a flat memory,
without NUMA or hotplug support. We need an abstraction that avoids
allocating this memory in enormous contiguous chunks, as this approach
will not work on modern hardware.

> 
> We could go with per-section bitmaps which will fix the waste but pull
> mem_section[] into the trust boundary, so we'd have to start validating it too.

Page-ext provides all of these capabilities, but as you described in the
cover letter, it does not meet your requirements. Therefore, I believe
a new abstraction layer is needed.

Pasha


* Re: [RFC 0/7] mm: dual-bitmap page allocator consistency checker
  2026-04-24 14:00 [RFC 0/7] mm: dual-bitmap page allocator consistency checker Sasha Levin
                   ` (6 preceding siblings ...)
  2026-04-24 14:00 ` [RFC 7/7] mm/page_consistency: add KUnit tests for dual-bitmap primitives Sasha Levin
@ 2026-04-24 15:34 ` Matthew Wilcox
  2026-04-24 15:53   ` Sasha Levin
  2026-04-24 15:42 ` Vlastimil Babka (SUSE)
  8 siblings, 1 reply; 22+ messages in thread
From: Matthew Wilcox @ 2026-04-24 15:34 UTC (permalink / raw)
  To: Sasha Levin
  Cc: akpm, david, corbet, ljs, Liam.Howlett, vbabka, rppt, surenb,
	mhocko, skhan, jackmanb, hannes, ziy, linux-mm, linux-doc,
	linux-kernel

On Fri, Apr 24, 2026 at 10:00:49AM -0400, Sasha Levin wrote:
> corruption must be detected before it propagates. The dual-bitmap
> implements a way to protect from corruption coming from hardware or
> software - two complementary representations of page allocation state,
> allocated independently via memblock, where any single-bit fault in
> either bitmap is immediately detectable. Performance is secondary to
> correctness in this context. A safety mechanism must be simple enough
> to audit and certify, must fail deterministically (panic, not
> log-and-hope), and its correctness matters more than its throughput.
> The dual-bitmap adds two atomic bitops per alloc/free, but for
> safety-critical deployments this cost is acceptable because the
> alternative - undetected corruption propagating silently - violates
> the system's safety case. The static key ensures zero cost for kernels
> that don't need it.

But doubling the storage requirement in order to achieve merely detection
is significantly worse than state-of-the-art in 1950 (when Richard
Hamming invented Hamming codes).  If we used a (7,3) code, we'd have
SECDED at a lower cost.  Of course, there are far better codes available
than that today.



* Re: [RFC 0/7] mm: dual-bitmap page allocator consistency checker
  2026-04-24 14:00 [RFC 0/7] mm: dual-bitmap page allocator consistency checker Sasha Levin
                   ` (7 preceding siblings ...)
  2026-04-24 15:34 ` [RFC 0/7] mm: dual-bitmap page allocator consistency checker Matthew Wilcox
@ 2026-04-24 15:42 ` Vlastimil Babka (SUSE)
  2026-04-24 16:25   ` Sasha Levin
  8 siblings, 1 reply; 22+ messages in thread
From: Vlastimil Babka (SUSE) @ 2026-04-24 15:42 UTC (permalink / raw)
  To: Sasha Levin, akpm, david, corbet
  Cc: ljs, Liam.Howlett, rppt, surenb, mhocko, skhan, jackmanb, hannes,
	ziy, linux-mm, linux-doc, linux-kernel

On 4/24/26 16:00, Sasha Levin wrote:
> Existing memory debugging tools - KASAN, KFENCE, page_poisoning - detect
> access violations and content corruption, but none of them can detect
> silent corruption in the page allocator's own metadata. If a hardware
> bit flip corrupts an allocation bitmap, the allocator hands out a page

An allocation what? The page allocator is a buddy allocator, it has no
bitmap to track free/allocated state of pages?




* Re: [RFC 0/7] mm: dual-bitmap page allocator consistency checker
  2026-04-24 15:34 ` [RFC 0/7] mm: dual-bitmap page allocator consistency checker Matthew Wilcox
@ 2026-04-24 15:53   ` Sasha Levin
  0 siblings, 0 replies; 22+ messages in thread
From: Sasha Levin @ 2026-04-24 15:53 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: akpm, david, corbet, ljs, Liam.Howlett, vbabka, rppt, surenb,
	mhocko, skhan, jackmanb, hannes, ziy, linux-mm, linux-doc,
	linux-kernel

On Fri, Apr 24, 2026 at 04:34:12PM +0100, Matthew Wilcox wrote:
>On Fri, Apr 24, 2026 at 10:00:49AM -0400, Sasha Levin wrote:
>> corruption must be detected before it propagates. The dual-bitmap
>> implements a way to protect from corruption coming from hardware or
>> software - two complementary representations of page allocation state,
>> allocated independently via memblock, where any single-bit fault in
>> either bitmap is immediately detectable. Performance is secondary to
>> correctness in this context. A safety mechanism must be simple enough
>> to audit and certify, must fail deterministically (panic, not
>> log-and-hope), and its correctness matters more than its throughput.
>> The dual-bitmap adds two atomic bitops per alloc/free, but for
>> safety-critical deployments this cost is acceptable because the
>> alternative - undetected corruption propagating silently - violates
>> the system's safety case. The static key ensures zero cost for kernels
>> that don't need it.
>
>But doubling the storage requirement in order to achieve merely detection
>is significantly worse than state-of-the-art in 1950 (when Richard
>Hamming invented Hamming codes).  If we used a (7,3) code, we'd have
>SECDED at a lower cost.  Of course, there are far better codes available
>than that today.

I agree with the density concern. I went with two bitmaps anyway, for two reasons:

1. Update cost. On the alloc/free hot path the dual-bitmap update is two
independent test_and_set_bit. A Hamming/SECDED codeword needs a
read-modify-write of the whole word with locking on every state change.

2. Correlated faults. The two copies need to sit in different physical memory
so a multi-bit fault (row, column, bank, row-hammer) can only hit one of them.
See this paper which has some numbers:
https://dl.acm.org/doi/epdf/10.1145/2786763.2694348 - About 21% of DRAM faults
span more than one bit, plain SECDED can leave up to 20 FIT per device of
undetected errors from those, and it only helps at all if data and parity bits
are spread across physically separate cells.

Two memblock_alloc'd bitmaps give that separation for free. You could
interleave a code across two independent regions instead, but then
the invariant check stops being a one-line complement check, which is
what I was trying to keep simple for the audit side.
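
To make the "one-line complement check" concrete, here is a minimal
userspace sketch of a whole-bitmap validation pass. It is not the code from
the series: struct dual_bitmap_sketch and sketch_validate() are names
invented for this example, and masking off the padding bits in the last
word is a choice made in this sketch, not something taken from the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for the series' struct dual_bitmap. */
struct dual_bitmap_sketch {
	unsigned long *primary;   /* bit set   => tracked as allocated */
	unsigned long *secondary; /* bit clear => tracked as allocated */
	size_t nbits;
};

/* Count bit positions where primary == ~secondary does not hold. */
static unsigned long sketch_validate(const struct dual_bitmap_sketch *db)
{
	size_t words = (db->nbits + BITS_PER_LONG - 1) / BITS_PER_LONG;
	unsigned long violations = 0;

	for (size_t i = 0; i < words; i++) {
		/* The invariant itself: any set bit in diff is a violation. */
		unsigned long diff = db->primary[i] ^ ~db->secondary[i];

		/* Ignore padding bits past nbits in a partial last word. */
		if (i == words - 1 && db->nbits % BITS_PER_LONG)
			diff &= (1UL << (db->nbits % BITS_PER_LONG)) - 1;

		/* Count the set bits, clearing the lowest one each pass. */
		for (; diff; diff &= diff - 1)
			violations++;
	}
	return violations;
}
```

A single flipped bit in either array shows up as one violation. A fault that
flips the same bit in both arrays restores the invariant and goes unnoticed,
which is why the physical separation of the two allocations matters for the
argument above.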

-- 
Thanks,
Sasha


* Re: [RFC 0/7] mm: dual-bitmap page allocator consistency checker
  2026-04-24 15:42 ` Vlastimil Babka (SUSE)
@ 2026-04-24 16:25   ` Sasha Levin
  2026-04-25  5:51     ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 22+ messages in thread
From: Sasha Levin @ 2026-04-24 16:25 UTC (permalink / raw)
  To: Vlastimil Babka (SUSE)
  Cc: akpm, david, corbet, ljs, Liam.Howlett, rppt, surenb, mhocko,
	skhan, jackmanb, hannes, ziy, linux-mm, linux-doc, linux-kernel

On Fri, Apr 24, 2026 at 05:42:53PM +0200, Vlastimil Babka (SUSE) wrote:
>On 4/24/26 16:00, Sasha Levin wrote:
>> Existing memory debugging tools - KASAN, KFENCE, page_poisoning - detect
>> access violations and content corruption, but none of them can detect
>> silent corruption in the page allocator's own metadata. If a hardware
>> bit flip corrupts an allocation bitmap, the allocator hands out a page
>
>An allocation what? The page allocator is a buddy allocator, it has no
>bitmap to track free/allocated state of pages?

You're right, the cover letter is misleading there. Buddy doesn't use a bitmap:
PageBuddy lives in page_type, the free list is a list, and page->private holds
the order. The dual-bitmap is new metadata the feature adds, maintained from
the alloc/free hooks.

What it actually catches is the same PFN being handed out twice before it's
freed, or freed without having been allocated. Not every kind of buddy
corruption shows up that way, but the common bad ones do. Corruption of the
bitmap itself shows up through the complement invariant.

I'll fix the wording in v2.
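
As a rough illustration of those two cases, the sketch below models the
alloc/free hooks in userspace. It is an assumption-laden example, not the
series' implementation: sketch_mark_allocated() and sketch_mark_freed() are
hypothetical names, the bit helpers are non-atomic stand-ins for the
kernel's test_and_set_bit() and test_and_clear_bit(), and a detected
violation is returned as false here where the series would panic.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Non-atomic stand-in for test_and_set_bit(): returns the old bit value. */
static bool sketch_test_and_set(unsigned long *map, unsigned long nr)
{
	unsigned long mask = 1UL << (nr % BITS_PER_LONG);
	unsigned long *word = &map[nr / BITS_PER_LONG];
	bool old = (*word & mask) != 0;

	*word |= mask;
	return old;
}

/* Non-atomic stand-in for test_and_clear_bit(): returns the old bit value. */
static bool sketch_test_and_clear(unsigned long *map, unsigned long nr)
{
	unsigned long mask = 1UL << (nr % BITS_PER_LONG);
	unsigned long *word = &map[nr / BITS_PER_LONG];
	bool old = (*word & mask) != 0;

	*word &= ~mask;
	return old;
}

/* Alloc hook: false means the PFN was already tracked as allocated. */
static bool sketch_mark_allocated(unsigned long *primary,
				  unsigned long *secondary, unsigned long idx)
{
	bool primary_was_set = sketch_test_and_set(primary, idx);
	bool secondary_was_set = sketch_test_and_clear(secondary, idx);

	/* A free page has primary = 0 and secondary = 1 beforehand. */
	return !primary_was_set && secondary_was_set;
}

/* Free hook: false means the PFN was not tracked as allocated. */
static bool sketch_mark_freed(unsigned long *primary,
			      unsigned long *secondary, unsigned long idx)
{
	bool primary_was_set = sketch_test_and_clear(primary, idx);
	bool secondary_was_set = sketch_test_and_set(secondary, idx);

	/* An allocated page has primary = 1 and secondary = 0 beforehand. */
	return primary_was_set && !secondary_was_set;
}
```

Starting from primary all zeros and secondary all ones, a second
sketch_mark_allocated() on the same index, or a sketch_mark_freed() on an
index that was never allocated, returns false while leaving the two bitmaps
complementary.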

-- 
Thanks,
Sasha


* Re: [RFC 4/7] mm: add page consistency checker implementation
  2026-04-24 14:49     ` Sasha Levin
  2026-04-24 15:06       ` Pasha Tatashin
@ 2026-04-24 18:26       ` David Hildenbrand (Arm)
  1 sibling, 0 replies; 22+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-24 18:26 UTC (permalink / raw)
  To: Sasha Levin
  Cc: akpm, corbet, ljs, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	skhan, jackmanb, hannes, ziy, linux-mm, linux-doc, linux-kernel,
	Sasha Levin, Sanif Veeras, Claude:claude-opus-4-7

On 4/24/26 16:49, Sasha Levin wrote:
> On Fri, Apr 24, 2026 at 04:25:41PM +0200, David Hildenbrand (Arm) wrote:
>>> +    /*
>>> +     * Size bitmaps to cover the full PFN range including any holes.
>>> +     * Holes waste a few bits but a flat bitmap keeps the indexing
>>> +     * trivial (pfn - min_pfn) and avoids additional data structures
>>> +     * that would themselves be subject to corruption.  This matches
>>> +     * the approach used by pageblock_flags.
>>> +     */
>>> +    pc_state.min_pfn = PHYS_PFN(memblock_start_of_DRAM());
>>> +    pc_state.max_pfn = PHYS_PFN(memblock_end_of_DRAM());
>>> +    spanned_pfns = pc_state.max_pfn - pc_state.min_pfn;
>>> +    if (!spanned_pfns || spanned_pfns > UINT_MAX) {
>>> +        pr_err("PFN span %lu cannot be represented by bitmap APIs, feature disabled\n",
>>> +               spanned_pfns);
>>> +        return;
>>> +    }
>>> +
>>> +    pc_state.db.nbits = spanned_pfns;
>>> +
>>> +    bitmap_bytes = BITS_TO_LONGS(pc_state.db.nbits) * sizeof(unsigned long);
>>> +
>>> +    pr_info("Initializing: PFN range [%lu-%lu), %u bits (%zu KB per bitmap)\n",
>>> +        pc_state.min_pfn, pc_state.max_pfn, pc_state.db.nbits,
>>> +        bitmap_bytes / 1024);
>>> +
>>> +    /* Allocate primary bitmap (zeroed by memblock_alloc) */
>>> +    pc_state.db.bitmap[DUAL_BITMAP_PRIMARY] =
>>> +        memblock_alloc(bitmap_bytes, SMP_CACHE_BYTES);
>>> +    if (!pc_state.db.bitmap[DUAL_BITMAP_PRIMARY]) {
>>> +        pr_err("Failed to allocate primary bitmap, feature disabled\n");
>>> +        return;
>>> +    }
>>>
>>
>> One bitmap that covers all sparse memory available at boot.
>>
>> Conclusion: Just horrible.
> 
> Depends on who's looking at the code :)

Or who generated that code ;)

-- 
Cheers,

David


* Re: [RFC 4/7] mm: add page consistency checker implementation
  2026-04-24 15:06       ` Pasha Tatashin
@ 2026-04-24 18:28         ` David Hildenbrand (Arm)
  2026-04-24 23:34           ` Sasha Levin
  0 siblings, 1 reply; 22+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-24 18:28 UTC (permalink / raw)
  To: Pasha Tatashin, Sasha Levin
  Cc: akpm, corbet, ljs, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	skhan, jackmanb, hannes, ziy, linux-mm, linux-doc, linux-kernel,
	Sasha Levin, Sanif Veeras, Claude:claude-opus-4-7

On 4/24/26 17:06, Pasha Tatashin wrote:
> On 04-24 10:49, Sasha Levin wrote:
>> On Fri, Apr 24, 2026 at 04:25:41PM +0200, David Hildenbrand (Arm) wrote:
>>>
>>> One bitmap that covers all sparse memory available at boot.
>>>
>>> Conclusion: Just horrible.
>>
>> Depends on who's looking at the code :)
>>
>> I picked it for auditability: covering the whole range with two
>> memblock_alloc'd arrays means the only thing on the lookup path is the bitmap
>> words themselves, which is what the dual-bitmap invariant already checks.
> 
> The issue is that we are going back in time to a flat memory,
> without NUMA or hotplug support. We need an abstraction that avoids
> allocating this memory in enormous contiguous chunks, as this approach
> will not work on modern hardware.
> 
>>
>> We could go with per-section bitmaps which will fix the waste but pull
>> mem_section[] into the trust boundary, so we'd have to start validating it too.
> 
> Page-ext provides all of these capabilities, but as you described in the
> cover letter, it does not meet your requirements. Therefore, I believe
> a new abstraction layer is needed.

If we decided that we want this (and I am not convinced), we definitely want
something that supports sparsity and, in particular, something that supports
memory hotplug.

-- 
Cheers,

David


* Re: [RFC 4/7] mm: add page consistency checker implementation
  2026-04-24 18:28         ` David Hildenbrand (Arm)
@ 2026-04-24 23:34           ` Sasha Levin
  2026-04-25  5:30             ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 22+ messages in thread
From: Sasha Levin @ 2026-04-24 23:34 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Pasha Tatashin, akpm, corbet, ljs, Liam.Howlett, vbabka, rppt,
	surenb, mhocko, skhan, jackmanb, hannes, ziy, linux-mm, linux-doc,
	linux-kernel, Sasha Levin, Sanif Veeras, Claude:claude-opus-4-7

On Fri, Apr 24, 2026 at 08:28:14PM +0200, David Hildenbrand (Arm) wrote:
>On 4/24/26 17:06, Pasha Tatashin wrote:
>> On 04-24 10:49, Sasha Levin wrote:
>>> On Fri, Apr 24, 2026 at 04:25:41PM +0200, David Hildenbrand (Arm) wrote:
>>>>
>>>> One bitmap that covers all sparse memory available at boot.
>>>>
>>>> Conclusion: Just horrible.
>>>
>>> Depends on who's looking at the code :)
>>>
>>> I picked it for auditability: covering the whole range with two
>>> memblock_alloc'd arrays means the only thing on the lookup path is the bitmap
>>> words themselves, which is what the dual-bitmap invariant already checks.
>>
>> The issue is that we are going back in time to a flat memory,
>> without NUMA or hotplug support. We need an abstraction that avoids
>> allocating this memory in enormous contiguous chunks, as this approach
>> will not work on modern hardware.
>>
>>>
>>> We could go with per-section bitmaps which will fix the waste but pull
>>> mem_section[] into the trust boundary, so we'd have to start validating it too.
>>
>> Page-ext provides all of these capabilities, but as you described in the
>> cover letter, it does not meet your requirements. Therefore, I believe
>> a new abstraction layer is needed.
>
>If we decided that we want this (and I am not convinced), we definitely want
>something that supports sparsity and, in particular, something that supports
>memory hotplug.

Makes sense. Let me take a few days and see if I can find some middle ground
here.

-- 
Thanks,
Sasha


* Re: [RFC 4/7] mm: add page consistency checker implementation
  2026-04-24 23:34           ` Sasha Levin
@ 2026-04-25  5:30             ` David Hildenbrand (Arm)
  2026-04-25 16:38               ` Sasha Levin
  0 siblings, 1 reply; 22+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-25  5:30 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Pasha Tatashin, akpm, corbet, ljs, Liam.Howlett, vbabka, rppt,
	surenb, mhocko, skhan, jackmanb, hannes, ziy, linux-mm, linux-doc,
	linux-kernel, Sasha Levin, Sanif Veeras, Claude:claude-opus-4-7

On 4/25/26 01:34, Sasha Levin wrote:
> On Fri, Apr 24, 2026 at 08:28:14PM +0200, David Hildenbrand (Arm) wrote:
>> On 4/24/26 17:06, Pasha Tatashin wrote:
>>>
>>> The issue is that we are going back in time to a flat memory,
>>> without NUMA or hotplug support. We need an abstraction that avoids
>>> allocating this memory in enormous contiguous chunks, as this approach
>>> will not work on modern hardware.
>>>
>>>
>>> Page-ext provides all of these capabilities, but as you described in the
>>> cover letter, it does not meet your requirements. Therefore, I believe
>>> a new abstraction layer is needed.
>>
>> If we decided that we want this (and I am not convinced), we definitely want
>> something that supports sparsity and, in particular, something that supports
>> memory hotplug.
> 
> Makes sense. Let me take a few days and see if I can find some middle ground
> here.
> 

"The natural question is why not use page_ext. The key objection from a
safety perspective is that page_ext stores per-page metadata in memory
that is itself subject to the same hardware faults we're trying to
detect. The dual-bitmap approach works because the two bitmaps are
independent allocations - corruption in one is caught by comparison
with the other."

So you want to have two bits per page, whereby both bits come from independent
pages I assume?

Storing one bit in page_ext and one bit in page flags would be possible if we
had a spare bit in page flags ...  We could allocate two bitmaps per memory section.

But the real question is: how far away do these bits have to be in memory to be
considered "independent" and not prone to the same corruption?

1 bit?
1 byte?
64 byte?
4096 byte?
???

"Embedding both in page_ext means a single fault could
corrupt both the tracking data and its redundant copy in the same
allocation region."

I might be wrong, but isn't that the case for any such fault, as you don't 100%
know how the DIMM is organized internally?

Do we really expect that a MCE event would, for example, very likely corrupt two
neighboring bits, or two bits in the same byte etc? What are the odds that we care?

It's hard to tell here which part of this work is "too research focused". For
example, if I were to write a paper about that, I would make such claims to make
it sound more complicated than it needs to be :)

-- 
Cheers,

David


* Re: [RFC 0/7] mm: dual-bitmap page allocator consistency checker
  2026-04-24 16:25   ` Sasha Levin
@ 2026-04-25  5:51     ` David Hildenbrand (Arm)
  2026-04-25 16:09       ` Sasha Levin
  0 siblings, 1 reply; 22+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-25  5:51 UTC (permalink / raw)
  To: Sasha Levin, Vlastimil Babka (SUSE)
  Cc: akpm, corbet, ljs, Liam.Howlett, rppt, surenb, mhocko, skhan,
	jackmanb, hannes, ziy, linux-mm, linux-doc, linux-kernel

On 4/24/26 18:25, Sasha Levin wrote:
> On Fri, Apr 24, 2026 at 05:42:53PM +0200, Vlastimil Babka (SUSE) wrote:
>> On 4/24/26 16:00, Sasha Levin wrote:
>>> Existing memory debugging tools - KASAN, KFENCE, page_poisoning - detect
>>> access violations and content corruption, but none of them can detect
>>> silent corruption in the page allocator's own metadata. If a hardware
>>> bit flip corrupts an allocation bitmap, the allocator hands out a page
>>
>> An allocation what? The page allocator is a buddy allocator, it has no
>> bitmap to track free/allocated state of pages?
> 
> You're right, the cover letter is misleading there. Buddy doesn't use a bitmap:
> PageBuddy lives in page_type, the free list is a list, and page->private holds
> the order. The dual-bitmap is new metadata the feature adds, maintained from
> the alloc/free hooks.

Given that you have PageBuddy (first "bit"), could we use a second bit in page_ext?

-- 
Cheers,

David


* Re: [RFC 0/7] mm: dual-bitmap page allocator consistency checker
  2026-04-25  5:51     ` David Hildenbrand (Arm)
@ 2026-04-25 16:09       ` Sasha Levin
  0 siblings, 0 replies; 22+ messages in thread
From: Sasha Levin @ 2026-04-25 16:09 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Vlastimil Babka (SUSE), akpm, corbet, ljs, Liam.Howlett, rppt,
	surenb, mhocko, skhan, jackmanb, hannes, ziy, linux-mm, linux-doc,
	linux-kernel

On Sat, Apr 25, 2026 at 07:51:10AM +0200, David Hildenbrand (Arm) wrote:
>On 4/24/26 18:25, Sasha Levin wrote:
>> On Fri, Apr 24, 2026 at 05:42:53PM +0200, Vlastimil Babka (SUSE) wrote:
>>> On 4/24/26 16:00, Sasha Levin wrote:
>>>> Existing memory debugging tools - KASAN, KFENCE, page_poisoning - detect
>>>> access violations and content corruption, but none of them can detect
>>>> silent corruption in the page allocator's own metadata. If a hardware
>>>> bit flip corrupts an allocation bitmap, the allocator hands out a page
>>>
>>> An allocation what? The page allocator is a buddy allocator, it has no
>>> bitmap to track free/allocated state of pages?
>>
>> You're right, the cover letter is misleading there. Buddy doesn't use a bitmap:
>> PageBuddy lives in page_type, the free list is a list, and page->private holds
>> the order. The dual-bitmap is new metadata the feature adds, maintained from
>> the alloc/free hooks.
>
>Given that you have PageBuddy (first "bit"), could we use a second bit in page_ext?

Hmm... That's an interesting idea.

I can see two concerns with something like this:

1. The checker has to be live before memblock_free_all() hands pages to buddy.
page_ext isn't fully up that early I think.

2. page_type encodes buddy, offline, slab tags, etc... and a page that isn't
PageBuddy isn't necessarily allocated through alloc_pages. The invariant gets
case-y.

But let me think about it a bit more.

-- 
Thanks,
Sasha


* Re: [RFC 4/7] mm: add page consistency checker implementation
  2026-04-25  5:30             ` David Hildenbrand (Arm)
@ 2026-04-25 16:38               ` Sasha Levin
  0 siblings, 0 replies; 22+ messages in thread
From: Sasha Levin @ 2026-04-25 16:38 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Pasha Tatashin, akpm, corbet, ljs, Liam.Howlett, vbabka, rppt,
	surenb, mhocko, skhan, jackmanb, hannes, ziy, linux-mm, linux-doc,
	linux-kernel, Sasha Levin, Sanif Veeras, Claude:claude-opus-4-7

On Sat, Apr 25, 2026 at 07:30:56AM +0200, David Hildenbrand (Arm) wrote:
>On 4/25/26 01:34, Sasha Levin wrote:
>> On Fri, Apr 24, 2026 at 08:28:14PM +0200, David Hildenbrand (Arm) wrote:
>>> On 4/24/26 17:06, Pasha Tatashin wrote:
>>>>
>>>> The issue is that we are going back in time to flat memory,
>>>> without NUMA or hotplug support. We need an abstraction that avoids
>>>> allocating this memory in enormous contiguous chunks, as this approach
>>>> will not work on modern hardware.
>>>>
>>>>
>>>> Page-ext provides all of these capabilities, but as you described in the
>>>> cover letter, it does not meet your requirements. Therefore, I believe
>>>> a new abstraction layer is needed.
>>>
>>> If we decided that we want this (and I am not convinced), we definitely want
>>> something that supports sparsity and, in particular, something that supports
>>> memory hotplug.
>>
>> Makes sense. Let me take a few days and see if I can find some middle ground
>> here.
>>
>
>"The natural question is why not use page_ext. The key objection from a
>safety perspective is that page_ext stores per-page metadata in memory
>that is itself subject to the same hardware faults we're trying to
>detect. The dual-bitmap approach works because the two bitmaps are
>independent allocations - corruption in one is caught by comparison
>with the other."
>
>So you want to have two bits per page, whereby both bits come from independent
>pages, I assume?

Ideally completely different DIMMs (though as you point out later, we can't
easily make this happen).

>Storing one bit in page_ext and one bit in page flags would be possible if we
>had a spare bit in page flags ...  We could allocate two bitmaps per memory section.

Right, I think that the approach you proposed is roughly equivalent spatially
(though I'd need to check with the safety folks here).

>But the real question is: how far away do these bits have to be in memory to be
>considered "independent" and not prone to the same corruption?
>
>1 bit?
>1 byte?
>64 byte?
>4096 byte?
>???

The notes I have from the research side of things (which should be taken with a
grain of salt) are something along the lines of:

  - ~79% are single-bit corruptions
  - ~9% are row faults, so multi-bit corruption within ~8kb
  - ~4% are bank faults, so multi-bit corruption within ~512mb

Obviously the numbers would be very different depending on use case, hardware,
physical location (did you know bits are more likely to flip at higher
altitudes?)...
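
As a back-of-the-envelope model only, the fault classes above can be turned
into "what share of faults could hit both copies of a bit if the copies are N
bytes apart". Everything beyond the three percentages is an assumption made
for illustration: "~8kb" is read as 8 KiB and "~512mb" as 512 MiB, distance in
the address space is treated as distance inside the DRAM (which interleaving
breaks in practice), and the remaining ~8% of faults is left unclassified.
struct fault_class and shared_fault_pct() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* One fault class from the notes above, with an assumed footprint. */
struct fault_class {
	const char *name;
	unsigned int pct;              /* share of all faults, in percent */
	unsigned long long span_bytes; /* assumed maximum extent of one fault */
};

static const struct fault_class fault_classes[] = {
	{ "single-bit", 79, 0 },            /* one bit: never covers both copies */
	{ "row",         9, 8ULL << 10 },   /* assumption: ~8kb   == 8 KiB   */
	{ "bank",        4, 512ULL << 20 }, /* assumption: ~512mb == 512 MiB */
};

/*
 * Sum the shares of all fault classes whose assumed footprint is larger
 * than the distance between the two redundant bits, i.e. the faults that
 * could in principle corrupt both copies at once.
 */
static unsigned int shared_fault_pct(const struct fault_class *fc, size_t n,
				     unsigned long long distance_bytes)
{
	unsigned int pct = 0;
	size_t i;

	assert(fc != NULL);
	for (i = 0; i < n; i++)
		if (distance_bytes < fc[i].span_bytes)
			pct += fc[i].pct;
	return pct;
}
```

Under those assumptions, two copies 64 bytes apart share the row and bank
classes (13%), copies at least 8 KiB apart share only the bank class (4%), and
copies at least 512 MiB apart share none of the classified faults.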

>"Embedding both in page_ext means a single fault could
>corrupt both the tracking data and its redundant copy in the same
>allocation region."
>
>I might be wrong, but isn't that the case for any such fault, as you don't 100%
>know how the DIMM is organized internally?
>
>Do we really expect that an MCE event would, for example, very likely corrupt two
>neighboring bits, or two bits in the same byte etc? What are the odds that we care?

For something like a datacenter deployment I'd agree with you - the odds are
too low to care. For an unsupervised self-driving vehicle, where there's no
human (locally or remotely) available to take over, I'd like the odds to be as
low as possible :)

>It's hard to tell here which part of this work is "too research focused". For
>example, if I were to write a paper about that, I would make such claims to make
>it sound more complicated than it needs to be :)

-- 
Thanks,
Sasha


end of thread, other threads:[~2026-04-25 16:38 UTC | newest]

Thread overview: 22+ messages
2026-04-24 14:00 [RFC 0/7] mm: dual-bitmap page allocator consistency checker Sasha Levin
2026-04-24 14:00 ` [RFC 1/7] mm: add generic dual-bitmap consistency primitives Sasha Levin
2026-04-24 14:00 ` [RFC 2/7] mm: add page consistency checker header Sasha Levin
2026-04-24 14:00 ` [RFC 3/7] mm: add Kconfig options for page consistency checker Sasha Levin
2026-04-24 14:00 ` [RFC 4/7] mm: add page consistency checker implementation Sasha Levin
2026-04-24 14:25   ` David Hildenbrand (Arm)
2026-04-24 14:49     ` Sasha Levin
2026-04-24 15:06       ` Pasha Tatashin
2026-04-24 18:28         ` David Hildenbrand (Arm)
2026-04-24 23:34           ` Sasha Levin
2026-04-25  5:30             ` David Hildenbrand (Arm)
2026-04-25 16:38               ` Sasha Levin
2026-04-24 18:26       ` David Hildenbrand (Arm)
2026-04-24 14:00 ` [RFC 5/7] mm/page_alloc: integrate page consistency hooks Sasha Levin
2026-04-24 14:00 ` [RFC 6/7] Documentation/mm: add page consistency checker documentation Sasha Levin
2026-04-24 14:00 ` [RFC 7/7] mm/page_consistency: add KUnit tests for dual-bitmap primitives Sasha Levin
2026-04-24 15:34 ` [RFC 0/7] mm: dual-bitmap page allocator consistency checker Matthew Wilcox
2026-04-24 15:53   ` Sasha Levin
2026-04-24 15:42 ` Vlastimil Babka (SUSE)
2026-04-24 16:25   ` Sasha Levin
2026-04-25  5:51     ` David Hildenbrand (Arm)
2026-04-25 16:09       ` Sasha Levin
