public inbox for linux-arm-kernel@lists.infradead.org
From: "Russell King (Oracle)" <linux@armlinux.org.uk>
To: Brian Ruley <brian.ruley@gehealthcare.com>
Cc: Will Deacon <will@kernel.org>,
	Steve Capper <steve.capper@arm.com>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm/arm: pgtable: remove young bit check for pte_valid_user
Date: Fri, 10 Apr 2026 12:18:11 +0100
Message-ID: <adjcc7QvquXI3G0Q@shell.armlinux.org.uk>
In-Reply-To: <adjYlUk8_JjPivNi@zoo11.fihel.lab.ge-healthcare.net>

On Fri, Apr 10, 2026 at 02:01:41PM +0300, Brian Ruley wrote:
> Thank you for the clarification, this is very educational for me.
> I understand your scepticism, and I can't explain what's going on based
> on what you replied. However, I do honestly believe there is a problem
> here. I'll share the exact testing details and the instrumentation
> we added that convinced us to reach out at the end. One idea we also
> had was that cache aliasing could be happening here.
> 
> To clarify any potential misunderstanding, we've observed the
> following:
> 
> - Sporadic SIGILL and SIGSEGV under memory pressure
> - Scales with core count, i.e., quad core more likely to reproduce
>   than dual core. We haven't observed an issue on single core.
> - Coredumps show valid instructions at the faulting PC.
>   The CPU executed something different from what's in memory.
>   This pointed us to stale I-cache.
> - Instrumentation indicates a correlation.
>   A per-CPU ring buffer tracking exec page migrations was dumped on
>   SIGILL. The faulting PC matched a recently migrated page.
> - We started seeing this after upgrading 6.1->6.12->6.18. We bisected
>   two commits which had an impact, but we weren't convinced that
>   either was the root cause: 5dfab109d5193e6c224d96cabf90e9cc2c039884
>   and 6faea3422e3b4e8de44a55aa3e6e843320da66d2.
> - Failed processes include systemd, tar, bash, ...
> - Debug options, e.g., page poisoning, seem to hide the bug
> 
> 
> > So you're saying that stress-ng doesn't reproduce this bug but
> > triggers the OOM-killer... confused.
> 
> Apologies for the confusion. I meant that with `stress-ng' we created
> the memory pressure and OOM might have played a role in exposing the
> "bug" as we (at the time) believed that anything that would trigger
> memory free/reclaims and page migration was the key. One note I'll add
> is that in our test we invoked stress-ng for 2 minutes (--timeout 2m)
> and after each we would reboot the device. We had observed that reboots
> seemed to have a discernible effect on the occurrence in earlier testing
> so we kept that in. I'm beginning to doubt if it had an effect now,
> and unfortunately it's all anecdotal.
> 
> One more thing, even if you don't accept the patch, is this patch
> harmful in any way or is it just sub-optimal?
> 
> I'll send the instrumentation patch as a follow-up; there might be a
> flaw in it.

I'll try it - I have Cortex A9 systems (some of which I rely on...)

Please can you also try to track the history of what happens for
the PTEs corresponding to the old and new PFN?

Thanks.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

Thread overview: 10+ messages
2026-04-09 12:54 [PATCH] mm/arm: pgtable: remove young bit check for pte_valid_user Brian Ruley
2026-04-09 13:56 ` Will Deacon
2026-04-09 14:21   ` Russell King (Oracle)
2026-04-09 14:43   ` Russell King (Oracle)
2026-04-09 15:17   ` Brian Ruley
2026-04-09 16:00     ` Russell King (Oracle)
2026-04-10 11:01       ` Brian Ruley
2026-04-10 11:18         ` Russell King (Oracle) [this message]
2026-04-10 11:43           ` [RFC PATCH] test: " Brian Ruley
2026-04-09 14:15 ` [PATCH] " Russell King (Oracle)