From: Sasha Levin <sashal@kernel.org>
To: Matthew Wilcox <willy@infradead.org>
Cc: akpm@linux-foundation.org, david@kernel.org, corbet@lwn.net,
ljs@kernel.org, Liam.Howlett@oracle.com, vbabka@kernel.org,
rppt@kernel.org, surenb@google.com, mhocko@suse.com,
skhan@linuxfoundation.org, jackmanb@google.com,
hannes@cmpxchg.org, ziy@nvidia.com, linux-mm@kvack.org,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC 0/7] mm: dual-bitmap page allocator consistency checker
Date: Fri, 24 Apr 2026 11:53:57 -0400
Message-ID: <aeuSFV6IcfQ0lifO@laps>
In-Reply-To: <aeuNdOmgxA-k9BqY@casper.infradead.org>

On Fri, Apr 24, 2026 at 04:34:12PM +0100, Matthew Wilcox wrote:
>On Fri, Apr 24, 2026 at 10:00:49AM -0400, Sasha Levin wrote:
>> corruption must be detected before it propagates. The dual-bitmap
>> implements a way to protect from corruption coming from hardware or
>> software - two complementary representations of page allocation state,
>> allocated independently via memblock, where any single-bit fault in
>> either bitmap is immediately detectable. Performance is secondary to
>> correctness in this context. A safety mechanism must be simple enough
>> to audit and certify, must fail deterministically (panic, not
>> log-and-hope), and its correctness matters more than its throughput.
>> The dual-bitmap adds two atomic bitops per alloc/free, but for
>> safety-critical deployments this cost is acceptable because the
>> alternative - undetected corruption propagating silently - violates
>> the system's safety case. The static key ensures zero cost for kernels
>> that don't need it.
>
>But doubling the storage requirement in order to achieve merely detection
>is significantly worse than state-of-the-art in 1950 (when Richard
>Hamming invented Hamming codes). If we used a (7,3) code, we'd have
>SECDED at a lower cost. Of course, there are far better codes available
>than that today.

I agree with the density concern. I still went with two plain bitmaps
rather than a code, for two reasons:
1. Update cost. On the alloc/free hot path, the dual-bitmap update is two
independent atomic bitops (test_and_set_bit() and friends), with no lock.
A Hamming/SECDED codeword needs a read-modify-write of the whole word,
under a lock, on every state change.
2. Correlated faults. The two copies need to sit in different physical memory,
so that a multi-bit fault (row, column, bank, row-hammer) can hit only one of
them. This paper has some numbers:
https://dl.acm.org/doi/epdf/10.1145/2786763.2694348
About 21% of DRAM faults span more than one bit, and plain SECDED can leave up
to 20 FIT per device of undetected errors from those. SECDED also only helps at
all if the data and parity bits are spread across physically separate cells.
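
A quick way to see why placement matters: with a complement invariant, the
check fires exactly when the fault patterns in the two copies differ. The
sketch below models one word from each bitmap; complement_ok() and
fault_detected() are hypothetical helpers for illustration, not code from
the RFC, and it assumes the second bitmap stores the bitwise complement of
the first.

```c
#include <stdbool.h>

/*
 * Hypothetical model of one word from each bitmap. The assumed invariant
 * is free_word == ~alloc_word, so a healthy pair XORs to all ones.
 */
static bool complement_ok(unsigned long alloc_word, unsigned long free_word)
{
	return (alloc_word ^ free_word) == ~0UL;
}

/*
 * Flip alloc_fault in the alloc copy and free_fault in the free copy,
 * then run the check. The XOR works out to ~0UL ^ alloc_fault ^
 * free_fault, so the fault is caught exactly when the two masks differ:
 * any fault confined to one copy (one bit or a whole burst) is caught,
 * while the same pattern hitting both copies cancels out.
 */
static bool fault_detected(unsigned long alloc_word,
			   unsigned long alloc_fault,
			   unsigned long free_fault)
{
	unsigned long free_word = ~alloc_word;

	return !complement_ok(alloc_word ^ alloc_fault, free_word ^ free_fault);
}
```

For example, fault_detected(0x0FUL, 1UL << 5, 0) and
fault_detected(0x0FUL, 0, 0xF0UL) are both true, while
fault_detected(0x0FUL, 1UL << 5, 1UL << 5) is false: the check cannot see
a fault that lands on the same bits of both copies.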
Two memblock_alloc'd bitmaps give that separation for free. You could
interleave a code across two independent regions instead, but then the
invariant check is no longer a one-line complement check, and keeping that
check simple is what I was after for the audit side.
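
For illustration, a minimal userspace sketch of the lock-free update path
and the one-line complement check. It assumes the second bitmap holds the
complement of the first (alloc sets a bit in one and clears it in the
other), and uses C11 atomic_fetch_or()/atomic_fetch_and() in place of the
kernel's test_and_set_bit()/test_and_clear_bit(). The names alloc_map,
free_map, db_alloc(), db_free() and db_word_ok() are hypothetical, not the
RFC's.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define NPAGES 1024
#define NWORDS ((NPAGES + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical: two separate arrays standing in for the two bitmaps. */
static _Atomic unsigned long alloc_map[NWORDS]; /* 1 = page allocated */
static _Atomic unsigned long free_map[NWORDS];  /* 1 = page free */

static void db_init(void)
{
	for (size_t w = 0; w < NWORDS; w++) {
		atomic_store(&alloc_map[w], 0UL);
		atomic_store(&free_map[w], ~0UL);
	}
}

/* Userspace stand-ins for test_and_set_bit()/test_and_clear_bit():
 * one atomic read-modify-write each, returning the previous bit. */
static bool tas(_Atomic unsigned long *map, size_t nr)
{
	unsigned long mask = 1UL << (nr % BITS_PER_WORD);

	assert(nr < NPAGES);
	return atomic_fetch_or(&map[nr / BITS_PER_WORD], mask) & mask;
}

static bool tac(_Atomic unsigned long *map, size_t nr)
{
	unsigned long mask = 1UL << (nr % BITS_PER_WORD);

	assert(nr < NPAGES);
	return atomic_fetch_and(&map[nr / BITS_PER_WORD], ~mask) & mask;
}

/* Alloc: set in alloc_map, clear in free_map. No lock, no whole-word
 * recomputation; the old bit values double as a double-alloc check. */
static bool db_alloc(size_t pfn)
{
	bool was_alloc = tas(alloc_map, pfn);
	bool was_free = tac(free_map, pfn);

	return !was_alloc && was_free; /* false => inconsistent state */
}

/* Free: the mirror image; false on double free or mismatch. */
static bool db_free(size_t pfn)
{
	bool was_alloc = tac(alloc_map, pfn);
	bool was_free = tas(free_map, pfn);

	return was_alloc && !was_free;
}

/* The one-line complement check for word w. */
static bool db_word_ok(size_t w)
{
	return (atomic_load(&alloc_map[w]) ^ atomic_load(&free_map[w])) == ~0UL;
}
```

On a fresh map, db_alloc(5) succeeds, a second db_alloc(5) returns false,
and db_word_ok(0) holds after each completed update. Between the two
bitops of a single update the pair is briefly inconsistent, so a
concurrent scanner would have to tolerate or serialize against in-flight
updates; the sketch does not handle that.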
--
Thanks,
Sasha