From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 12 Feb 2026 23:23:44 +0100
X-Mailing-List: linux-btrfs@vger.kernel.org
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: Orange PI 5 MAX: very unstable using kernel 6.19.0 and 6.18.10, 6.18.9 perfectly
stable
To: Qu Wenruo, LKML, linux-rockchip@lists.infradead.org, Btrfs BTRFS
References: <8ebe4d76-eb07-499b-b140-1f300c1b8d7e@prnet.org>
Content-Language: en-US
From: David Arendt
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

On 2/12/26 10:05 PM, Qu Wenruo wrote:
>
>
> On 2026/2/13 06:41, David Arendt wrote:
>> Hello,
>>
>> I am using a Kubernetes Cluster with 3 Orange PI5 MAX nodes. The data
>> is stored using a btrfs filesystem as backend. If using kernel 6.19.0
>> or kernel 6.18.10 I have experienced many crashes during high IO load
>> on all 3 nodes. Reverting back to 6.18.9 solves the problems
>> completely. Unfortunately the crashes are spontaneous reboots without
>> leaving a trace in any logfile, so I have no stacktrace of them.
>> After the crashes I have sometimes incorrect btrfs csums for a file
>> but these may also be a result of a partial write due to the crash.
>> On one node I had a btrfs error logged without crashing, but I am not
>> sure if this is the root cause or a result of a prior crash. A scrub
>> after reboot returned no error with 6.19.0.
>
> The offending tree dump items are:
>
> Feb 10 13:31:07 opi02 kernel:  item 92 key (13218356101120
> Feb 10 13:31:07 opi02 kernel:  item 93 key (13216208642048
> Feb 10 13:31:07 opi02 kernel:  item 94 key (13218356162560
>
> Obviously item 93 is smaller than all its previous and next item keys.
>
> hex(13218356101120) = 0xc05a36b8000
> hex(13216208642048) = 0xc05236be000
> hex(13218356162560) = 0xc05a36c7000
>
> It looks like something flipped, "0xc05a3" -> "0xc0523"
>
> 0xa -> 0x2 is exactly one bit flipped.
>
> So either the memory hardware has something wrong and resulting a
> sticking bit (always 0), or there is something inside the kernel
> touching memory it shouldn't.
>
> And this exactly matches the symptom, changing random bit of your
> kernel, crash always expected.
>
>
> Can you run a memtest to make sure it is not hardware problems first?
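The single-bit-flip reasoning quoted above can be checked with a short sketch. It assumes that the first field of each key (the only part visible in the truncated tree dump) is what determines the ordering here, and it simply tries every single-bit flip of item 93's value to see which ones would put it back between items 92 and 94. The function name single_bit_repairs is made up for this illustration and is not a btrfs or kernel helper.

```python
# First key field of items 92, 93 and 94, copied from the quoted tree dump.
keys = [13218356101120, 13216208642048, 13218356162560]


def single_bit_repairs(prev_key, bad_key, next_key, width=64):
    """Return (bit, repaired_key) pairs where flipping exactly one bit of
    bad_key places it strictly between prev_key and next_key."""
    repairs = []
    for bit in range(width):
        candidate = bad_key ^ (1 << bit)
        if prev_key < candidate < next_key:
            repairs.append((bit, candidate))
    return repairs


repairs = single_bit_repairs(*keys)
for bit, candidate in repairs:
    print(f"bit {bit}: {keys[1]:#x} -> {candidate:#x}")
```

If only one candidate comes back, that supports the reading that a single cleared bit (1 -> 0) explains the out-of-order key. It does not say whether hardware or a stray kernel write cleared it.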

Hello,

I don't know of anything like memtest86 for the arm64 platform that tests the whole memory, so I used the user-space memtester to check the 14G of unused RAM on all 3 machines while running kernel 6.18.10. Here is the result of the first iteration (same on every machine):

memtester version 4.7.1 (64-bit)
Copyright (C) 2001-2024 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 14000MB (14680064000 bytes)
got  14000MB (14680064000 bytes), trying mlock ...locked.
Loop 1:
  Stuck Address       : ok
  Random Value        : ok
  Compare XOR         : ok
  Compare SUB         : ok
  Compare MUL         : ok
  Compare DIV         : ok
  Compare OR          : ok
  Compare AND         : ok
  Sequential Increment: ok
  Solid Bits          : ok
  Block Sequential    : ok
  Checkerboard        : ok
  Bit Spread          : ok
  Bit Flip            : ok
  Walking Ones        : ok
  Walking Zeroes      : ok

I don't think it is a hardware failure, as it is happening on 3 different machines. Crashes occur after somewhere between 30 minutes and 12 hours on all 3 machines. These machines have been running without a single crash for more than a year with older kernel versions, including 4 days with 6.18.9 and all versions from 6.18.0 to 6.18.9, so it seems to be caused by something that changed between 6.18.9 and 6.18.10.

Thanks,
David Arendt

>
> Thanks,
> Qu
>
>
>>
>> Unfortunately I don't have more information at the moment.
>>
>> Thanks in advance,
>>
>> David Arendt
>>
>>
>