Linux-Rockchip Archive on lore.kernel.org
From: David Arendt <admin@prnet.org>
To: Qu Wenruo <wqu@suse.com>, LKML <linux-kernel@vger.kernel.org>,
	linux-rockchip@lists.infradead.org,
	Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: Orange PI 5 MAX: very unstable using kernel 6.19.0 and 6.18.10, 6.18.9 perfectly stable
Date: Thu, 12 Feb 2026 23:23:44 +0100
Message-ID: <de166323-bb9d-4240-bc42-08ae32067284@prnet.org>
In-Reply-To: <f95f0d27-5bee-4363-b0f0-75e95b2a470d@suse.com>

On 2/12/26 10:05 PM, Qu Wenruo wrote:
>
>
> On 2026/2/13 06:41, David Arendt wrote:
>> Hello,
>>
>> I am using a Kubernetes Cluster with 3 Orange PI5 MAX nodes. The data 
>> is stored using a btrfs filesystem as backend. If using kernel 6.19.0 
>> or kernel 6.18.10 I have experienced many crashes during high IO load 
>> on all 3 nodes. Reverting back to 6.18.9 solves the problems 
>> completely. Unfortunately the crashes are spontaneous reboots without 
>> leaving a trace in any logfile, so I have no stacktrace of them. 
>> After the crashes I have sometimes incorrect btrfs csums for a file 
>> but these may also be a result of a partial write due to the crash. 
>> On one node I had a btrfs error logged without crashing, but I am not 
>> sure if this is the root cause or a result of a prior crash. A scrub 
>> after reboot returned no error with 6.19.0.
>
> The offending tree dump items are:
>
> Feb 10 13:31:07 opi02 kernel:  item 92 key (13218356101120
> Feb 10 13:31:07 opi02 kernel:  item 93 key (13216208642048
> Feb 10 13:31:07 opi02 kernel:  item 94 key (13218356162560
>
> Obviously the key of item 93 is smaller than the keys of both the
> previous and the next item.
>
> hex(13218356101120) = 0xc05a36b8000
> hex(13216208642048) = 0xc05236be000
> hex(13218356162560) = 0xc05a36c7000
>
> It looks like something flipped: "0xc05a3" -> "0xc0523".
>
> 0xa -> 0x2 is exactly one bit flipped.
>
> So either the memory hardware is faulty, resulting in a stuck bit
> (always 0), or something inside the kernel is touching memory it
> shouldn't.
>
> And this exactly matches the symptom: if random bits in your kernel's
> memory are changed, a crash is always to be expected.
>
>
> Can you run a memtest first to make sure it is not a hardware problem?
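
The single-bit observation quoted above can be checked with a short
sketch. It takes the three keys from the tree dump, confirms they are out
of order, and then tries every single-bit flip of item 93's key to see
which ones would put it back between its neighbours. The helper names and
the assumption that the original key lay strictly between items 92 and 94
are mine, not part of the btrfs tree-checker; bits are numbered from 0 at
the least significant end.

```python
# Keys copied from the quoted tree dump (items 92, 93, 94).
keys = {92: 13218356101120, 93: 13216208642048, 94: 13218356162560}


def is_sorted(values):
    """True if the values are strictly increasing."""
    return all(a < b for a, b in zip(values, values[1:]))


def single_bit_candidates(prev_key, bad_key, next_key, width=64):
    """Return (bit, candidate) pairs where flipping exactly one bit of
    bad_key yields a value strictly between prev_key and next_key."""
    found = []
    for bit in range(width):
        candidate = bad_key ^ (1 << bit)
        if prev_key < candidate < next_key:
            found.append((bit, candidate))
    return found


ordered = is_sorted([keys[92], keys[93], keys[94]])
candidates = single_bit_candidates(keys[92], keys[93], keys[94])

print("keys in order:", ordered)
for bit, candidate in candidates:
    print(f"bit {bit}: {hex(keys[93])} -> {hex(candidate)}")
# prints:
# keys in order: False
# bit 31: 0xc05236be000 -> 0xc05a36be000
```

Only one flip restores the ordering, which is consistent with a single bit
(bit 31) having gone from 1 to 0. It does not say whether hardware or
software cleared it.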

Hello,

I don't know of anything like memtest86 for the arm64 platform that
tests the whole memory, so I used the user-space memtester to check the
14G of unused RAM on all 3 machines while running kernel 6.18.10.

Here is the result of the first iteration (same on every machine):

memtester version 4.7.1 (64-bit)
Copyright (C) 2001-2024 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 14000MB (14680064000 bytes)
got  14000MB (14680064000 bytes), trying mlock ...locked.
Loop 1:
   Stuck Address       : ok
   Random Value        : ok
   Compare XOR         : ok
   Compare SUB         : ok
   Compare MUL         : ok
   Compare DIV         : ok
   Compare OR          : ok
   Compare AND         : ok
   Sequential Increment: ok
   Solid Bits          : ok
   Block Sequential    : ok
   Checkerboard        : ok
   Bit Spread          : ok
   Bit Flip            : ok
   Walking Ones        : ok
   Walking Zeroes      : ok
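
When collecting such reports from several machines, a small sketch like
the following can flag any test that did not report "ok". It assumes the
report has the same "name : status" layout as the listing above; how
memtester words a failing line is not shown in this thread, so the parser
simply treats anything other than "ok" as a failure. The function name and
the abridged sample are illustrative only.

```python
import re

# Abridged sample in the same layout as the memtester listing above.
SAMPLE = """\
Loop 1:
  Stuck Address       : ok
  Random Value        : ok
  Bit Flip            : ok
  Walking Zeroes      : ok
"""

# An indented "name : status" line; unindented lines such as "Loop 1:"
# are deliberately not matched.
TEST_LINE = re.compile(r"^\s+(\S.*?)\s*:\s*(.*)$")


def failed_tests(report):
    """Return the names of tests whose status is anything but 'ok'."""
    failures = []
    for line in report.splitlines():
        match = TEST_LINE.match(line)
        if match and match.group(2).strip() != "ok":
            failures.append(match.group(1))
    return failures


print(failed_tests(SAMPLE))
```

An empty list means every parsed test line reported "ok". That only covers
the memory memtester was able to lock, not memory in use by the kernel.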

I don't think it is a hardware failure, as it is happening on 3 different
machines. Crashes occur after anywhere between 30 minutes and 12 hours on
all 3 machines. These machines have been running without a single crash
for more than a year with older kernel versions, including all versions
from 6.18.0 to 6.18.9 (4 days on 6.18.9), so it seems to be caused by
something that changed between 6.18.9 and 6.18.10.

Thanks,

David Arendt

>
> Thanks,
> Qu
>
>
>>
>> Unfortunately I don't have more information at the moment.
>>
>> Thanks in advance,
>>
>> David Arendt
>>
>>
>


_______________________________________________
Linux-rockchip mailing list
Linux-rockchip@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-rockchip

