From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Sasha Levin <sashal@kernel.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>,
akpm@linux-foundation.org, corbet@lwn.net, ljs@kernel.org,
Liam.Howlett@oracle.com, vbabka@kernel.org, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, skhan@linuxfoundation.org,
jackmanb@google.com, hannes@cmpxchg.org, ziy@nvidia.com,
linux-mm@kvack.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, Sasha Levin <sashal@nvidia.com>,
Sanif Veeras <sveeras@nvidia.com>,
"Claude:claude-opus-4-7" <noreply@anthropic.com>
Subject: Re: [RFC 4/7] mm: add page consistency checker implementation
Date: Mon, 27 Apr 2026 14:32:43 +0200
Message-ID: <f910969d-1071-4174-bd42-da45cb6f9749@kernel.org>
In-Reply-To: <aezt96xgz_qyf4d-@laps>
>> But the real question is: how far away do these bits have to be in memory to be
>> considered "independent" and not prone to the same corruption?
>>
>> 1 bit?
>> 1 byte?
>> 64 byte?
>> 4096 byte?
>> ???
>
> The notes I have from the research side of things (which should be taken with a
> grain of salt) are something along the lines of:
>
> - ~79% are a single bit corruption
> - ~9% are row faults, so multiple bit corruption within ~8kb
> - ~4% are bank faults, so multiple bit corruption within ~512mb
Interesting numbers, thanks! What makes up the remaining ~8%?
>
> Obviously the numbers would be very different depending on usecase, hardware,
> physical location (did you know bits are more likely to flip in higher
> altitudes?)...
Yeah, heavy cosmic radiation apparently makes the problem worse.
The 512mb case is obviously tricky to handle (and is very hardware dependent).
Placing bits at least two pages apart could be done more easily.
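To make the separation question a bit more concrete, here is a rough back-of-the-envelope sketch in Python. It takes the fault classes quoted above at face value and assumes, purely for illustration, that a fault of a given class can only damage two locations that are closer together than that class's span. That ignores how physical addresses really map onto DRAM rows and banks, which is hardware dependent. The names FAULT_CLASSES and could_hit_both are made up for this example, and reading "~8kb" as 8 KiB and "~512mb" as 512 MiB is also an assumption.

```python
# Rough model: which of the quoted fault classes could corrupt BOTH copies
# of a bit when the two copies are `separation` bytes apart?
#
# Assumption (not from the thread): a fault class with span S can only touch
# two locations that are less than S bytes apart. Real DIMM address mapping
# is hardware dependent and may well break this assumption.

KIB = 1024
MIB = 1024 * KIB

# (name, approximate share of faults in percent, assumed span in bytes)
FAULT_CLASSES = [
    ("single-bit", 79, 0),      # span 0: one flipped bit never reaches a second bit
    ("row", 9, 8 * KIB),        # "~8kb" read as 8 KiB (assumption)
    ("bank", 4, 512 * MIB),     # "~512mb" read as 512 MiB (assumption)
]


def could_hit_both(separation):
    """Return (class names, summed percent) for classes whose span exceeds separation."""
    hit = [(name, pct) for name, pct, span in FAULT_CLASSES if span > separation]
    return [name for name, _ in hit], sum(pct for _, pct in hit)


if __name__ == "__main__":
    page = 4 * KIB  # assumed page size
    cases = [
        ("same byte", 0),
        ("1 page", page),
        ("2 pages", 2 * page),
        ("1 GiB", 1024 * MIB),
    ]
    for label, sep in cases:
        names, pct = could_hit_both(sep)
        print(f"{label:>9}: {pct:2d}% of faults ({', '.join(names) or 'none'})")
```

Under these assumptions, two pages (8 KiB) of separation takes the row class out of the picture and leaves only the bank class, which is the one that is hard to handle anyway.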
>
>> "Embedding both in page_ext means a single fault could
>> corrupt both the tracking data and its redundant copy in the same
>> allocation region."
>>
>> I might be wrong, but isn't that the case for any such fault, as you don't 100%
>> know how the DIMM is organized internally?
>>
>> Do we really expect that a MCE event would, for example, very likely corrupt two
>> neighboring bits, or two bits in the same byte etc? What are the odds that we
>> care?
>
> For something like a datacenter deployment I'd agree with you - the odds are
> too low to care. For an unsupervised self driving vehicle, where there's no
> human (locally or remotely) available to take over, I'd like the odds to be as
> low as possible :)
I thought that people usually use special RT OSes (with proven logic etc.) for
any safety-related systems. Using Linux on the core safety system sounds ... scary.
But I'd expect corruption of other data (user pages? page tables?) to be a much
bigger problem than corruption of page allocator metadata. What am I missing
that makes this, in the context of the bigger problems there, a thing we
particularly care about?
--
Cheers,
David