From: Sasha Levin <sashal@kernel.org>
To: akpm@linux-foundation.org, david@kernel.org, corbet@lwn.net
Cc: ljs@kernel.org, Liam.Howlett@oracle.com, vbabka@kernel.org,
	rppt@kernel.org, surenb@google.com, mhocko@suse.com,
	skhan@linuxfoundation.org, jackmanb@google.com,
	hannes@cmpxchg.org, ziy@nvidia.com, linux-mm@kvack.org,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	Sasha Levin <sashal@nvidia.com>,
	Sanif Veeras <sveeras@nvidia.com>,
	"Claude:claude-opus-4-7" <noreply@anthropic.com>
Subject: [RFC 6/7] Documentation/mm: add page consistency checker documentation
Date: Fri, 24 Apr 2026 10:00:55 -0400
Message-ID: <20260424140056.2094777-7-sashal@kernel.org>
In-Reply-To: <20260424140056.2094777-1-sashal@kernel.org>

From: Sasha Levin <sashal@nvidia.com>

Add documentation for the page consistency checker feature. The document
explains the dual-bitmap algorithm, describes the configuration options,
and covers the debugfs interface for monitoring and validation.

The algorithm section explains how the complementary bitmaps work: the
primary bitmap uses 1 for allocated and 0 for free, while the secondary
bitmap uses the opposite convention. This redundancy means any single-bit
corruption in either bitmap will cause a detectable violation of the
invariant that primary[bit] must equal ~secondary[bit].
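
As a minimal userspace sketch of that invariant (not the code from this
series): the demo_* names and the 128-bit size are hypothetical, and plain
stores stand in for the atomic operations the documentation describes.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes and names, chosen for illustration only. */
#define DEMO_NBITS		128
#define DEMO_BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define DEMO_NWORDS		(DEMO_NBITS / DEMO_BITS_PER_WORD)

struct demo_dual_bitmap {
	unsigned long primary[DEMO_NWORDS];	/* 1 = allocated, 0 = free */
	unsigned long secondary[DEMO_NWORDS];	/* 0 = allocated, 1 = free */
};

/* Start with every page free: primary all zeros, secondary all ones. */
static void demo_init(struct demo_dual_bitmap *db)
{
	memset(db->primary, 0, sizeof(db->primary));
	memset(db->secondary, 0xff, sizeof(db->secondary));
}

/* Allocation: set the bit in primary, clear it in secondary. */
static void demo_mark_allocated(struct demo_dual_bitmap *db, size_t bit)
{
	assert(bit < DEMO_NBITS);
	unsigned long mask = 1UL << (bit % DEMO_BITS_PER_WORD);

	db->primary[bit / DEMO_BITS_PER_WORD] |= mask;
	db->secondary[bit / DEMO_BITS_PER_WORD] &= ~mask;
}

/* Free: clear the bit in primary, set it in secondary. */
static void demo_mark_free(struct demo_dual_bitmap *db, size_t bit)
{
	assert(bit < DEMO_NBITS);
	unsigned long mask = 1UL << (bit % DEMO_BITS_PER_WORD);

	db->primary[bit / DEMO_BITS_PER_WORD] &= ~mask;
	db->secondary[bit / DEMO_BITS_PER_WORD] |= mask;
}

/* Full validation: each primary word must be the complement of secondary. */
static bool demo_validate(const struct demo_dual_bitmap *db)
{
	for (size_t i = 0; i < DEMO_NWORDS; i++)
		if (db->primary[i] != ~db->secondary[i])
			return false;
	return true;
}
```

In this sketch, flipping any single bit in either array after demo_init()
makes demo_validate() return false, because the flipped word no longer
matches the complement of its counterpart.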

The document also explains the intentional limitation around double-free
detection. During boot, free_reserved_area() releases pages that were
never allocated through the buddy allocator. Flagging these as errors
would generate many false positives, so double-free detection is
deferred until after boot completes.
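
The deferral can be pictured with a small userspace sketch. It is an
illustration only: the demo_* names are hypothetical, a bool array stands
in for the bitmaps, and the boot_complete flag stands in for the
system_state >= SYSTEM_RUNNING test described in the documentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tracker size, for illustration only. */
#define DEMO_NPAGES 64

enum demo_free_result {
	DEMO_FREE_OK,		/* page was allocated and is now free */
	DEMO_FREE_IGNORED,	/* free of an untracked page during boot */
	DEMO_FREE_VIOLATION,	/* free of a non-allocated page after boot */
};

struct demo_tracker {
	bool allocated[DEMO_NPAGES];
	bool boot_complete;	/* stand-in for system_state >= SYSTEM_RUNNING */
};

static void demo_track_alloc(struct demo_tracker *t, size_t pfn)
{
	assert(pfn < DEMO_NPAGES);
	t->allocated[pfn] = true;
}

static enum demo_free_result demo_track_free(struct demo_tracker *t, size_t pfn)
{
	assert(pfn < DEMO_NPAGES);

	if (t->allocated[pfn]) {
		t->allocated[pfn] = false;
		return DEMO_FREE_OK;
	}

	/* The page was never marked allocated: only complain after boot. */
	return t->boot_complete ? DEMO_FREE_VIOLATION : DEMO_FREE_IGNORED;
}
```

Here a free of a never-allocated page returns DEMO_FREE_IGNORED while
boot_complete is false, and the same free returns DEMO_FREE_VIOLATION once
the flag is set.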

Based-on-patch-by: Sanif Veeras <sveeras@nvidia.com>
Assisted-by: Claude:claude-opus-4-7 <noreply@anthropic.com>
Signed-off-by: Sasha Levin <sashal@nvidia.com>
---
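The overhead figures quoted in the new document (2 MB per bitmap and 4 MB
total for 64 GB, 128 KB per bitmap for a 4 GB hole) can be reproduced with
the sketch below. It assumes 4 KB pages, which the quoted figures imply but
the document does not state, and the demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size; the document's figures are consistent with 4 KB. */
#define DEMO_PAGE_SIZE 4096ULL

/* Bytes needed by ONE bitmap to cover span_bytes of PFN space (1 bit/PFN). */
static uint64_t demo_bitmap_bytes(uint64_t span_bytes)
{
	assert(span_bytes % DEMO_PAGE_SIZE == 0);
	uint64_t pfns = span_bytes / DEMO_PAGE_SIZE;

	return (pfns + 7) / 8;	/* round up to whole bytes */
}

/* Total cost: primary plus secondary bitmap, i.e. 2 bits per PFN. */
static uint64_t demo_total_overhead_bytes(uint64_t span_bytes)
{
	return 2 * demo_bitmap_bytes(span_bytes);
}
```

Because the bitmaps are flat, span_bytes is the full range from the start
to the end of DRAM, so a hole is simply counted as part of the span.
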
 Documentation/mm/index.rst            |   1 +
 Documentation/mm/page_consistency.rst | 211 ++++++++++++++++++++++++++
 2 files changed, 212 insertions(+)
 create mode 100644 Documentation/mm/page_consistency.rst

diff --git a/Documentation/mm/index.rst b/Documentation/mm/index.rst
index 7aa2a8886908..bef6c9bbc976 100644
--- a/Documentation/mm/index.rst
+++ b/Documentation/mm/index.rst
@@ -57,6 +57,7 @@ documentation, or deleted if it has served its purpose.
    page_frags
    page_owner
    page_table_check
+   page_consistency
    remap_file_pages
    split_page_table_lock
    transhuge
diff --git a/Documentation/mm/page_consistency.rst b/Documentation/mm/page_consistency.rst
new file mode 100644
index 000000000000..dd1bde68f1a5
--- /dev/null
+++ b/Documentation/mm/page_consistency.rst
@@ -0,0 +1,211 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+========================
+Page Consistency Checker
+========================
+
+The page consistency checker is a debugging feature that uses dual
+complementary bitmaps to detect corruption in page allocation tracking.
+It maintains the invariant that for every bit position, the primary
+bitmap value equals the bitwise complement of the secondary bitmap value.
+
+Overview
+========
+
+Memory corruption can silently flip bits in kernel data structures,
+leading to difficult-to-diagnose failures. The page consistency checker
+addresses this by maintaining redundant tracking of page allocation
+state. Any single-bit corruption in either bitmap will cause a detectable
+inconsistency, allowing the corruption to be caught rather than causing
+silent data corruption or mysterious crashes later.
+
+The bitmaps are flat, covering the entire PFN range from
+``memblock_start_of_DRAM()`` to ``memblock_end_of_DRAM()`` including any
+holes in physical memory. This is a deliberate design choice: simple
+``pfn - min_pfn`` indexing is trivially auditable, which matters for a
+safety mechanism. Sparse or section-aware indexing would add auxiliary
+data structures that could themselves be subject to corruption. See
+`Limitations`_ for a detailed analysis of memory overhead including
+holes.
+
+The approach is based on NVIDIA safety research and is
+particularly useful for safety-critical systems requiring Freedom From
+Interference (FFI) guarantees per ISO 26262 (ASIL-D) and IEC 61508
+(SIL-3).
+
+Algorithm
+=========
+
+The checker maintains two bitmaps tracking page allocation state:
+
+Primary bitmap
+  Bit set to 1 when page is allocated, 0 when free.
+
+Secondary bitmap
+  Bit set to 0 when page is allocated, 1 when free.
+
+The invariant that must always hold is::
+
+    primary[bit] == ~secondary[bit]
+
+When a page is allocated, the checker sets the bit in the primary bitmap
+and clears it in the secondary bitmap. When a page is freed, it clears the
+bit in primary and sets it in secondary. If an operation finds the bit
+already in its target state, a double-allocation or double-free has occurred.
+
+Full validation can be performed by checking that every word in the
+primary bitmap equals the bitwise complement of the corresponding word
+in the secondary bitmap.
+
+Concurrency Handling
+====================
+
+The dual-bitmap update operations (set/clear) modify both bitmaps with
+separate atomic operations. This creates a brief window where a concurrent
+validation could observe a transient inconsistency.
+
+The implementation handles this by retrying validation when an inconsistency
+is detected. Real memory corruption is persistent and will fail all retries.
+Transient inconsistencies from concurrent updates resolve quickly and pass
+on retry.
+
+Double-Free Detection
+=====================
+
+Double-free detection is deferred until the system is fully running. During
+boot, ``free_reserved_area()`` and ``free_initmem()`` release pages that were
+never allocated through the buddy allocator. These would appear as double-frees
+but are expected behavior.
+
+The checker uses ``system_state >= SYSTEM_RUNNING`` to determine when boot
+is complete. This state is reached only after all init memory has been freed,
+ensuring no false positives from legitimate boot-time freeing. Any attempt to
+free a page that is not marked as allocated after this point will be flagged
+as a violation.
+
+Configuration
+=============
+
+The feature is controlled by two Kconfig options:
+
+``CONFIG_DEBUG_PAGE_CONSISTENCY``
+  Enable the page consistency checker. Memory overhead is two bits per
+  PFN in the spanned range (start to end of DRAM, including holes),
+  roughly 4 MB total for a 64 GB system with 4 KB pages. When this option
+  is disabled, the allocator hooks compile away. When enabled, a static
+  key gates tracking until initialization succeeds.
+
+``CONFIG_DEBUG_PAGE_CONSISTENCY_PANIC``
+  When enabled, the kernel will panic immediately upon detecting a
+  consistency violation. When disabled, a warning with a stack trace
+  is emitted and execution continues. Safety-critical systems should
+  enable this option.
+
+Debugfs Interface
+=================
+
+When ``CONFIG_DEBUG_FS`` is enabled, the checker exposes files under
+``/sys/kernel/debug/page_consistency/``:
+
+``stats``
+  Read-only file showing tracking statistics::
+
+    pages_tracked:       12345
+    alloc_count:         67890
+    free_count:          55545
+    violations_detected: 0
+    bitmap_size_bits:    1048576
+    pfn_range:           [256-1048831]
+
+``validate``
+  Write-only file. Writing any value triggers a full validation of
+  all bitmap words. The write succeeds if all words are consistent
+  and fails with ``-EIO`` if any violation is found.
+
+Usage
+=====
+
+To use the page consistency checker:
+
+1. Enable ``CONFIG_DEBUG_PAGE_CONSISTENCY`` in your kernel configuration.
+
+2. Optionally enable ``CONFIG_DEBUG_PAGE_CONSISTENCY_PANIC`` if you want
+   the kernel to halt immediately upon detecting corruption.
+
+3. Boot the kernel. The checker will automatically initialize and begin
+   tracking page allocations.
+
+4. Monitor statistics via debugfs::
+
+     cat /sys/kernel/debug/page_consistency/stats
+
+5. Trigger manual validation::
+
+     echo 1 > /sys/kernel/debug/page_consistency/validate
+
+Limitations
+===========
+
+As described in `Overview`_, the bitmaps use a flat layout covering the
+entire spanned PFN range, including any holes. Bits corresponding to
+holes are initialized to the free state and remain inert; they maintain
+the complement invariant and never trigger false positives. The kernel's
+own ``pageblock_flags`` bitmaps use the same flat approach, sizing to
+``zone->spanned_pages`` which includes holes.
+
+Memory overhead
+---------------
+
+The cost is 2 bits per PFN in the range (1 bit per bitmap x 2 bitmaps),
+allocated via ``memblock_alloc()`` before the buddy allocator is
+available. A hole wastes ``hole_size / PAGE_SIZE / 8`` bytes per bitmap.
+In practice the waste from holes is negligible::
+
+  System         Holes    Per-bitmap size   Hole waste   Waste/bitmap
+  -----------    ------   ---------------   ----------   ------------
+  64 GB, flat    none     2 MB              0            0%
+  256 GB, flat   none     8 MB              0            0%
+  256 GB         4 GB     8.1 MB            128 KB       1.5%
+  1 TB           16 GB    32.5 MB           512 KB       1.5%
+
+On x86_64 the typical hole between low memory (below 4 GB) and high
+memory is the largest source of waste. On arm64 with
+``memblock_start_of_DRAM()`` typically at 0x80000000 (2 GB), holes
+within the DRAM range are generally small or absent.
+
+Other limitations
+-----------------
+
+The feature is incompatible with ``CONFIG_MEMORY_HOTPLUG`` because the
+bitmaps are sized at boot based on the initial physical memory range.
+Hot-added memory would fall outside the tracked PFN range and be silently
+ignored.
+
+Boot-time reserved pages are not tracked as allocations. Freeing such a
+page before ``SYSTEM_RUNNING`` is expected and is ignored by the
+double-free detector. Freeing an untracked reserved page after boot is
+reported as a double-free.
+
+The feature detects corruption in the tracking bitmaps themselves, not
+corruption in the actual page contents. For page content verification,
+see ``CONFIG_PAGE_POISONING``.
+
+Implementation Details
+======================
+
+The checker hooks into the page allocator at two points:
+
+- ``post_alloc_hook()`` calls ``page_consistency_alloc()`` after a
+  successful allocation.
+
+- ``free_pages_prepare()`` calls ``page_consistency_free()`` when pages
+  are being returned to the allocator.
+
+Both hooks are guarded by static keys (``static_branch_unlikely()``), so
+each hook costs a single no-op while the checker's static key is disabled.
+
+The bitmaps are allocated during ``mm_core_init()`` using
+``memblock_alloc()`` before ``memblock_free_all()`` releases memblock
+memory to the buddy allocator. The secondary bitmap is initialized with
+all bits set to 1, establishing the initial complementary relationship
+with the zeroed primary bitmap.
-- 
2.53.0