From: "Russell King (Oracle)" <linux@armlinux.org.uk>
To: Brian Ruley <brian.ruley@gehealthcare.com>
Cc: Will Deacon <will@kernel.org>,
Steve Capper <steve.capper@arm.com>,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm/arm: pgtable: remove young bit check for pte_valid_user
Date: Fri, 10 Apr 2026 12:18:11 +0100
Message-ID: <adjcc7QvquXI3G0Q@shell.armlinux.org.uk>
In-Reply-To: <adjYlUk8_JjPivNi@zoo11.fihel.lab.ge-healthcare.net>
On Fri, Apr 10, 2026 at 02:01:41PM +0300, Brian Ruley wrote:
> Thank you for the clarification; this is very educational for me.
> I understand your scepticism, and I can't explain what's going on based
> on your reply. However, I do honestly believe there is a problem
> here. I'll share the exact testing details and the instrumentation
> we added that convinced us to reach out in the end. One idea we also
> had was that cache aliasing could be happening here.
>
> To clarify any potential misunderstanding, we've observed the
> following:
>
> - Sporadic SIGILL and SIGSEGV under memory pressure
> - Scales with core count, i.e., quad-core is more likely to reproduce it
> than dual-core. We haven't observed the issue on single-core.
> - Coredumps show valid instructions at the faulting PC.
> The CPU executed something different from what's in memory.
> This pointed us to stale I-cache.
> - Instrumentation indicates a correlation.
> A per-CPU ring buffer tracking exec page migrations was dumped on
> SIGILL. The faulting PC matched a recently migrated page.
> - We started seeing this after upgrading 6.1 -> 6.12 -> 6.18. Bisection
> identified two commits which had an impact, but we weren't convinced that
> either was the root cause: 5dfab109d5193e6c224d96cabf90e9cc2c039884
> and 6faea3422e3b4e8de44a55aa3e6e843320da66d2.
> - Failed processes include systemd, tar, bash, ...
> - Debug options, e.g., page poisoning, seem to hide the bug.
>
>
> > So you're saying that stress-ng doesn't reproduce this bug but
> > triggers the OOM-killer... confused.
>
> Apologies for the confusion. I meant that with `stress-ng' we created
> the memory pressure and OOM might have played a role in exposing the
> "bug" as we (at the time) believed that anything that would trigger
> memory free/reclaims and page migration was the key. One note I'll add
> is that in our test we invoked stress-ng for 2 minutes (--timeout 2m)
> and after each we would reboot the device. We had observed that reboots
> seemed to have a discernible effect on the occurrence rate in earlier
> testing, so we kept that in. I'm now beginning to doubt that it had any
> effect, and unfortunately it's all anecdotal.
>
> One more thing: even if you don't accept the patch, is it harmful in
> any way, or is it just sub-optimal?
>
> I'll send the instrumentation patch as a follow-up; it might be that
> there's a flaw in it.
I'll try it; I have Cortex-A9 systems (some of which I rely on...).
Please can you also try to track the history of what happens for
the PTEs corresponding to the old and new PFN?
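The kind of history being asked for here, combined with the per-CPU migration ring buffer described in the quoted report, could be modelled roughly as follows. This is a hypothetical, single-threaded user-space sketch, not the instrumentation patch mentioned in the thread: the record layout, the names `mig_event`, `ring_record()`, `ring_find_pc()` and `ring_dump_pfn()`, the 4 KiB page size and the 32-bit PTE values are all assumptions. A kernel version would presumably keep one such ring per CPU and fill it from the migration path.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumptions for illustration only: 4 KiB pages, 32-bit PTE values. */
#define PAGE_SHIFT 12
#define PAGE_MASK  (~(((uintptr_t)1 << PAGE_SHIFT) - 1))
#define RING_SIZE  8 /* must be a power of two; kept small for illustration */

/* Hypothetical record of one migration of an executable user page. */
struct mig_event {
	uint64_t seq;          /* 1-based sequence number; 0 marks an empty slot */
	unsigned int cpu;      /* CPU that performed the migration */
	uintptr_t vaddr;       /* page-aligned user virtual address */
	unsigned long old_pfn; /* PFN the page was migrated away from */
	unsigned long new_pfn; /* PFN the page was migrated to */
	uint32_t old_pte;      /* PTE value before the migration */
	uint32_t new_pte;      /* PTE value installed afterwards */
};

/* Fixed-size ring that overwrites the oldest record when full. */
struct mig_ring {
	struct mig_event ev[RING_SIZE];
	uint64_t last_seq;     /* sequence number of the newest record */
};

static void ring_record(struct mig_ring *r, unsigned int cpu, uintptr_t vaddr,
			unsigned long old_pfn, unsigned long new_pfn,
			uint32_t old_pte, uint32_t new_pte)
{
	uint64_t seq = ++r->last_seq;
	struct mig_event *e = &r->ev[seq & (RING_SIZE - 1)];

	e->seq = seq;
	e->cpu = cpu;
	e->vaddr = vaddr & PAGE_MASK;
	e->old_pfn = old_pfn;
	e->new_pfn = new_pfn;
	e->old_pte = old_pte;
	e->new_pte = new_pte;
}

/* Most recent surviving migration of the page containing pc, or NULL. */
static const struct mig_event *ring_find_pc(const struct mig_ring *r,
					    uintptr_t pc)
{
	const struct mig_event *best = NULL;

	for (int i = 0; i < RING_SIZE; i++) {
		const struct mig_event *e = &r->ev[i];

		if (e->seq && e->vaddr == (pc & PAGE_MASK) &&
		    (!best || e->seq > best->seq))
			best = e;
	}
	return best;
}

/*
 * Print, oldest first, every surviving record in which pfn appears as
 * either the old or the new PFN. Returns the number of records printed.
 */
static int ring_dump_pfn(const struct mig_ring *r, unsigned long pfn)
{
	uint64_t first = r->last_seq > RING_SIZE ? r->last_seq - RING_SIZE + 1 : 1;
	int n = 0;

	for (uint64_t seq = first; seq <= r->last_seq; seq++) {
		const struct mig_event *e = &r->ev[seq & (RING_SIZE - 1)];

		if (e->old_pfn != pfn && e->new_pfn != pfn)
			continue;
		printf("seq=%llu cpu=%u va=%#lx pfn %#lx->%#lx pte %#x->%#x\n",
		       (unsigned long long)e->seq, e->cpu,
		       (unsigned long)e->vaddr, e->old_pfn, e->new_pfn,
		       (unsigned)e->old_pte, (unsigned)e->new_pte);
		n++;
	}
	return n;
}
```

Keying each record on both the old and the new PFN means that, on SIGILL, one could look up the faulting PC with `ring_find_pc()` and then dump the history of both PFNs involved with `ring_dump_pfn()`, rather than seeing only the last migration of that virtual page.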
Thanks.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!