From: Sasha Levin <sashal@kernel.org>
To: akpm@linux-foundation.org, david@kernel.org, corbet@lwn.net
Cc: ljs@kernel.org, Liam.Howlett@oracle.com, vbabka@kernel.org,
	rppt@kernel.org, surenb@google.com, mhocko@suse.com,
	skhan@linuxfoundation.org, jackmanb@google.com,
	hannes@cmpxchg.org, ziy@nvidia.com, linux-mm@kvack.org,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	Sasha Levin <sashal@nvidia.com>,
	Sanif Veeras <sveeras@nvidia.com>,
	"Claude:claude-opus-4-7" <noreply@anthropic.com>
Subject: [RFC 6/7] Documentation/mm: add page consistency checker documentation
Date: Fri, 24 Apr 2026 10:00:55 -0400
Message-ID: <20260424140056.2094777-7-sashal@kernel.org>
In-Reply-To: <20260424140056.2094777-1-sashal@kernel.org>

From: Sasha Levin <sashal@nvidia.com>

Add documentation for the page consistency checker feature. The document
explains the dual-bitmap algorithm, describes the configuration options,
and covers the debugfs interface for monitoring and validation.

The algorithm section explains how the complementary bitmaps work: the
primary bitmap uses 1 for allocated and 0 for free, while the secondary
bitmap uses the opposite convention. This redundancy means any single-bit
corruption in either bitmap will cause a detectable violation of the
invariant that primary[bit] must equal ~secondary[bit].

The document also explains the intentional limitation around double-free
detection. During boot, free_reserved_area() releases pages that were
never allocated through the buddy allocator. Flagging these as errors
would generate many false positives, so double-free detection is
deferred until after boot completes.

Based-on-patch-by: Sanif Veeras <sveeras@nvidia.com>
Assisted-by: Claude:claude-opus-4-7 <noreply@anthropic.com>
Signed-off-by: Sasha Levin <sashal@nvidia.com>
---
 Documentation/mm/index.rst            |   1 +
 Documentation/mm/page_consistency.rst | 211 ++++++++++++++++++++++++++
 2 files changed, 212 insertions(+)
 create mode 100644 Documentation/mm/page_consistency.rst

diff --git a/Documentation/mm/index.rst b/Documentation/mm/index.rst
index 7aa2a8886908..bef6c9bbc976 100644
--- a/Documentation/mm/index.rst
+++ b/Documentation/mm/index.rst
@@ -57,6 +57,7 @@ documentation, or deleted if it has served its purpose.
    page_frags
    page_owner
    page_table_check
+   page_consistency
    remap_file_pages
    split_page_table_lock
    transhuge
diff --git a/Documentation/mm/page_consistency.rst b/Documentation/mm/page_consistency.rst
new file mode 100644
index 000000000000..dd1bde68f1a5
--- /dev/null
+++ b/Documentation/mm/page_consistency.rst
@@ -0,0 +1,211 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+========================
+Page Consistency Checker
+========================
+
+The page consistency checker is a debugging feature that uses dual
+complementary bitmaps to detect corruption in page allocation tracking.
+It maintains the invariant that for every bit position, the primary
+bitmap value equals the bitwise complement of the secondary bitmap value.
+
+Overview
+========
+
+Memory corruption can silently flip bits in kernel data structures,
+leading to difficult-to-diagnose failures. The page consistency checker
+addresses this by maintaining redundant tracking of page allocation
+state. Any single-bit corruption in either bitmap will cause a detectable
+inconsistency, allowing the corruption to be caught rather than causing
+silent data corruption or mysterious crashes later.
+
+The bitmaps are flat, covering the entire PFN range from
+``memblock_start_of_DRAM()`` to ``memblock_end_of_DRAM()`` including any
+holes in physical memory. This is a deliberate design choice: simple
+``pfn - min_pfn`` indexing is trivially auditable, which matters for a
+safety mechanism. Sparse or section-aware indexing would add auxiliary
+data structures that could themselves be subject to corruption. See
+`Limitations`_ for a detailed analysis of memory overhead including
+holes.
+
+The approach is based on NVIDIA safety research and is
+particularly useful for safety-critical systems requiring Freedom From
+Interference (FFI) guarantees per ISO 26262 (ASIL-D) and IEC 61508
+(SIL-3).
+
+Algorithm
+=========
+
+The checker maintains two bitmaps tracking page allocation state:
+
+Primary bitmap
+  Bit set to 1 when page is allocated, 0 when free.
+
+Secondary bitmap
+  Bit set to 0 when page is allocated, 1 when free.
+
+The invariant that must always hold for every bit position is::
+
+    primary[bit] == !secondary[bit]
+
+On allocation, the checker sets the page's bit in the primary bitmap and
+clears it in the secondary. On free, it clears the bit in the primary
+bitmap and sets it in the secondary. If an operation finds the bit already
+in its target state, a double-allocation or double-free has occurred.
+
+Full validation can be performed by checking that every word in the
+primary bitmap equals the bitwise complement of the corresponding word
+in the secondary bitmap.
+
+Concurrency Handling
+====================
+
+The dual-bitmap update operations (set/clear) modify both bitmaps with
+separate atomic operations. This creates a brief window where a concurrent
+validation could observe a transient inconsistency.
+
+The implementation handles this by retrying validation when an inconsistency
+is detected. Real memory corruption is persistent and will fail all retries.
+Transient inconsistencies from concurrent updates resolve quickly and pass
+on retry.
+
+Double-Free Detection
+=====================
+
+Double-free detection is deferred until the system is fully running. During
+boot, free_reserved_area() and free_initmem() release memory pages that were
+never allocated through the buddy allocator. These would appear as double-frees
+but are expected behavior.
+
+The checker uses ``system_state >= SYSTEM_RUNNING`` to determine when boot
+is complete. This state is reached only after all init memory has been freed,
+ensuring no false positives from legitimate boot-time freeing. Any attempt to
+free a page that is not marked as allocated after this point will be flagged
+as a violation.
+
+Configuration
+=============
+
+The feature is controlled by two Kconfig options:
+
+``CONFIG_DEBUG_PAGE_CONSISTENCY``
+  Enable the page consistency checker. Memory overhead is two bits per
+  PFN in the spanned range (start to end of DRAM, including holes),
+  roughly 4 MB total for a 64 GB system with 4 KB pages. When this
+  option is disabled, the allocator hooks compile away. When enabled, a
+  static key gates tracking until initialization succeeds.
+
+``CONFIG_DEBUG_PAGE_CONSISTENCY_PANIC``
+  When enabled, the kernel will panic immediately upon detecting a
+  consistency violation. When disabled, a warning with a stack trace
+  is emitted and execution continues. Safety-critical systems should
+  enable this option.
+
+Debugfs Interface
+=================
+
+When ``CONFIG_DEBUG_FS`` is enabled, the checker exposes files under
+``/sys/kernel/debug/page_consistency/``:
+
+``stats``
+  Read-only file showing tracking statistics::
+
+    pages_tracked:       12345
+    alloc_count:         67890
+    free_count:          55545
+    violations_detected: 0
+    bitmap_size_bits:    1048576
+    pfn_range:           [256-1048831]
+
+``validate``
+  Write-only file. Writing any value triggers a full validation of
+  all bitmap words. The write succeeds if all words are consistent
+  and fails with ``-EIO`` if any violation is found.
+
+Usage
+=====
+
+To use the page consistency checker:
+
+1. Enable ``CONFIG_DEBUG_PAGE_CONSISTENCY`` in your kernel configuration.
+
+2. Optionally enable ``CONFIG_DEBUG_PAGE_CONSISTENCY_PANIC`` if you want
+   the kernel to halt immediately upon detecting corruption.
+
+3. Boot the kernel. The checker will automatically initialize and begin
+   tracking page allocations.
+
+4. Monitor statistics via debugfs::
+
+     cat /sys/kernel/debug/page_consistency/stats
+
+5. Trigger manual validation::
+
+     echo 1 > /sys/kernel/debug/page_consistency/validate
+
+Limitations
+===========
+
+As described in `Overview`_, the bitmaps use a flat layout covering the
+entire spanned PFN range, including any holes. Bits corresponding to
+holes are initialized to the free state and remain inert; they maintain
+the complement invariant and never trigger false positives. The kernel's
+own ``pageblock_flags`` bitmaps use the same flat approach, sizing to
+``zone->spanned_pages`` which includes holes.
+
+Memory overhead
+---------------
+
+The cost is 2 bits per PFN in the range (1 bit per bitmap x 2 bitmaps),
+allocated via ``memblock_alloc()`` before the buddy allocator is
+available. A hole wastes ``hole_size / PAGE_SIZE / 8`` bytes per bitmap.
+In practice the waste from holes is small (figures assume 4 KB pages)::
+
+  System         Holes    Per-bitmap size   Per-bitmap waste   Waste (%)
+  -----------    ------   ---------------   ----------------   ---------
+  64 GB, flat    none     2 MB              0                  0%
+  256 GB, flat   none     8 MB              0                  0%
+  256 GB         4 GB     8.1 MB            128 KB             1.5%
+  1 TB           16 GB    32.5 MB           512 KB             1.5%
+
+On x86_64 the typical hole between low memory (below 4 GB) and high
+memory is the largest source of waste. On arm64 with
+``memblock_start_of_DRAM()`` typically at 0x80000000 (2 GB), holes
+within the DRAM range are generally small or absent.
+
+Other limitations
+-----------------
+
+The feature is incompatible with ``CONFIG_MEMORY_HOTPLUG`` because the
+bitmaps are sized at boot based on the initial physical memory range.
+Hot-added memory would fall outside the tracked PFN range and be silently
+ignored.
+
+Boot-time reserved pages are not tracked as allocations. Freeing such a
+page before ``SYSTEM_RUNNING`` is expected and is ignored by the
+double-free detector. Freeing an untracked reserved page after boot is
+reported as a double-free.
+
+The feature detects corruption in the tracking bitmaps themselves, not
+corruption in the actual page contents. For page content verification,
+see ``CONFIG_PAGE_POISONING``.
+
+Implementation Details
+======================
+
+The checker hooks into the page allocator at two points:
+
+- ``post_alloc_hook()`` calls ``page_consistency_alloc()`` after a
+  successful allocation.
+
+- ``free_pages_prepare()`` calls ``page_consistency_free()`` when pages
+  are being returned to the allocator.
+
+Both hooks are gated by a static key (``static_branch_unlikely``), so the
+cost is a single no-op instruction while the key is off.
+
+The bitmaps are allocated during ``mm_core_init()`` using
+``memblock_alloc()`` before ``memblock_free_all()`` releases memblock
+memory to the buddy allocator. The secondary bitmap is initialized with
+all bits set to 1, establishing the initial complementary relationship
+with the zeroed primary bitmap.
-- 
2.53.0



