From: David Arendt <admin@prnet.org>
To: Qu Wenruo <wqu@suse.com>, LKML <linux-kernel@vger.kernel.org>,
	linux-rockchip@lists.infradead.org,
	Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: Orange PI 5 MAX: very unstable using kernel 6.19.0 and 6.18.10, 6.18.9 perfectly stable
Date: Thu, 12 Feb 2026 23:23:44 +0100
Message-ID: <de166323-bb9d-4240-bc42-08ae32067284@prnet.org>
In-Reply-To: <f95f0d27-5bee-4363-b0f0-75e95b2a470d@suse.com>

On 2/12/26 10:05 PM, Qu Wenruo wrote:
>
>
> On 2026/2/13 06:41, David Arendt wrote:
>> Hello,
>>
>> I am using a Kubernetes Cluster with 3 Orange PI5 MAX nodes. The data 
>> is stored using a btrfs filesystem as backend. If using kernel 6.19.0 
>> or kernel 6.18.10 I have experienced many crashes during high IO load 
>> on all 3 nodes. Reverting back to 6.18.9 solves the problems 
>> completely. Unfortunately the crashes are spontaneous reboots without 
>> leaving a trace in any logfile, so I have no stacktrace of them. 
>> After the crashes I have sometimes incorrect btrfs csums for a file 
>> but these may also be a result of a partial write due to the crash. 
>> On one node I had a btrfs error logged without crashing, but I am not 
>> sure if this is the root cause or a result of a prior crash. A scrub 
>> after reboot returned no error with 6.19.0.
>
> The offending tree dump items are:
>
> Feb 10 13:31:07 opi02 kernel:  item 92 key (13218356101120
> Feb 10 13:31:07 opi02 kernel:  item 93 key (13216208642048
> Feb 10 13:31:07 opi02 kernel:  item 94 key (13218356162560
>
> Obviously the key of item 93 is smaller than the keys of both the
> previous and the next item.
>
> hex(13218356101120) = 0xc05a36b8000
> hex(13216208642048) = 0xc05236be000
> hex(13218356162560) = 0xc05a36c7000
>
> It looks like something flipped, "0xc05a3" -> "0xc0523"
>
> 0xa -> 0x2 is exactly one bit flipped.
>
> So either the memory hardware has a fault that results in a stuck
> bit (always 0), or something inside the kernel is touching memory
> it shouldn't.
>
> This exactly matches the symptom: if random bits of kernel memory
> are being changed, crashes are to be expected.
>
>
> Can you run a memtest first to make sure it is not a hardware problem?
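
For reference, the single-bit observation quoted above can be checked
with a short Python sketch. It assumes only what the quoted analysis
states: keys in the leaf should be increasing, so the intended key of
item 93 lies strictly between the keys of items 92 and 94. The intended
value itself is not known from the log, and the helper name
single_bit_repairs is made up for this illustration.

```python
# Keys taken from the quoted tree dump (items 92, 93, 94).
prev_key = 13218356101120  # 0xc05a36b8000
bad_key = 13216208642048   # 0xc05236be000
next_key = 13218356162560  # 0xc05a36c7000


def single_bit_repairs(bad, low, high, width=64):
    """List (bit, candidate) pairs where flipping exactly one bit of
    `bad` yields a value strictly between `low` and `high`."""
    found = []
    for bit in range(width):
        candidate = bad ^ (1 << bit)
        if low < candidate < high:
            found.append((bit, candidate))
    return found


# The differing hex digit: 0xa vs 0x2 differ in exactly one bit.
assert bin(0xA ^ 0x2).count("1") == 1

repairs = single_bit_repairs(bad_key, prev_key, next_key)
for bit, candidate in repairs:
    print(f"bit {bit}: {hex(bad_key)} -> {hex(candidate)}")
```

With these three keys the loop prints a single line,
"bit 31: 0xc05236be000 -> 0xc05a36be000", so exactly one single-bit
change (a cleared bit 31) is enough to explain the out-of-order key.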

Hello,

I don't know of anything like memtest86 for the arm64 platform for 
testing the whole memory, so I used the user space memtester to check 
the 14G of unused ram on all 3 machines while using kernel 6.18.10.

Here is the result of the first iteration (same on every machine):

memtester version 4.7.1 (64-bit)
Copyright (C) 2001-2024 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 14000MB (14680064000 bytes)
got  14000MB (14680064000 bytes), trying mlock ...locked.
Loop 1:
   Stuck Address       : ok
   Random Value        : ok
   Compare XOR         : ok
   Compare SUB         : ok
   Compare MUL         : ok
   Compare DIV         : ok
   Compare OR          : ok
   Compare AND         : ok
   Sequential Increment: ok
   Solid Bits          : ok
   Block Sequential    : ok
   Checkerboard        : ok
   Bit Spread          : ok
   Bit Flip            : ok
   Walking Ones        : ok
   Walking Zeroes      : ok

I don't think it is a hardware failure, as it is happening on 3 different 
machines. Crashes occur after somewhere between 30 minutes and 12 hours 
on all 3 machines. These machines had been running without a single crash 
for more than a year with older kernel versions, including all versions 
from 6.18.0 to 6.18.9 (4 days on 6.18.9), so it seems to be caused by 
something that changed between 6.18.9 and 6.18.10.

Thanks,

David Arendt

>
> Thanks,
> Qu
>
>
>>
>> Unfortunately I don't have more information at the moment.
>>
>> Thanks in advance,
>>
>> David Arendt
>>
>>
>

