Re: Lots of fastmap writes

From: Rickard x Andersson <rickaran@axis.com>
To: Zhihao Cheng <chengzhihao1@huawei.com>,
	richard@nod.at, linux-mtd@lists.infradead.org
Cc: rickard314.andersson@gmail.com
Subject: Re: Lots of fastmap writes
Date: Mon, 17 Jun 2024 13:20:49 +0200
Message-ID: <0a53be8a-6d21-0ae3-a013-e1155140c889@axis.com>
In-Reply-To: <a8bf10e3-009f-3d4b-fa4b-43bbc6e1bebc@huawei.com>

On 6/14/24 14:28, Zhihao Cheng wrote:
> On 2024/6/14 19:42, Rickard X Andersson wrote:
>> On 6/4/24 03:52, Zhihao Cheng wrote:
> 
> [...]
>>>
>>> BTW, after applying the patches, the kernel should be run on new 
>>> flash; the improved wear-leveling algorithm cannot rescue an already 
>>> worn-out image.
>>>
>>
>> Thanks for the patches!
>>
>> I have backported the patches to Linux kernel 6.1. Do you think the 
>> patches are safe to apply to Linux kernel 6.1?
> 
> Yes, it's okay. I have backported the patches to our product (kernel 
> v5.10) and it works fine.

Thanks! I backported the patches to Linux 6.1 and ran my own stress 
test for a few days (on another device with fresh flash memory). The 
wear on the fastmap physical erase blocks (0-63) now seems much lower 
with the patches applied, which is good.

However, after almost 3 days of stress testing I hit this problem (the 
file system was switched to read-only mode):

[ 7885.036577][  T182] ubi2: scrubbed PEB 2904 (LEB 0:229), data moved to PEB 627
[83721.724621][  T182] ubi2: scrubbed PEB 983 (LEB 0:3240), data moved to PEB 7
[83721.832521][  T182] ubi2: scrubbed PEB 997 (LEB 0:2819), data moved to PEB 5
[83784.750714][  T182] ubi2: scrubbed PEB 1927 (LEB 0:10), data moved to PEB 2
[165812.657934][  T182] ubi2: scrubbed PEB 3691 (LEB 0:11), data moved to PEB 18
[166748.055242][  T182] ubi2: scrubbed PEB 3045 (LEB 0:2), data moved to PEB 837
[166834.742451][  T182] ubi2: scrubbed PEB 918 (LEB 0:2), data moved to PEB 43
[239986.496840][T31387] UBIFS error (ubi2:0 pid 31387): ubifs_scan: corrupt empty space at LEB 3519:101376
[239986.506809][T31387] UBIFS error (ubi2:0 pid 31387): ubifs_scanned_corruption: corruption at LEB 3519:101376
[239986.519742][T31387] UBIFS error (ubi2:0 pid 31387): ubifs_scanned_corruption: first 8192 bytes from LEB 3519:101376
[239986.532052][T31387] 00000000: fffffffe ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff  ................................
[239986.532230][T31387] 00000020: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff  ................................
[239986.532450][T31387] 00000040: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff  ................................
[239986.532607][T31387] 00000060: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff  ................................
[239986.532732][T31387] 00000080: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff  ................................

...

[239986.603283][T31387] 00001000: fffffffe ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff  ................................
[239986.603667][T31387] 00001020: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff  ................................

...

[239986.707743][T31387] 00001fe0: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff  ................................

[239986.707894][T31387] UBIFS error (ubi2:0 pid 31387): ubifs_scan: LEB 3519 scanning failed
[239986.724625][T31387] UBIFS error (ubi2:0 pid 31387): do_commit: commit failed, error -117
[239986.734335][T31387] UBIFS warning (ubi2:0 pid 31387): ubifs_ro_mode.part.0: switched to read-only mode, error -117
[239986.748276][T31387] CPU: 0 PID: 31387 Comm: sync Kdump: loaded Not tainted 6.1.55-axis9-devel #1
[239986.757327][T31387] Hardware name: Freescale i.MX6 SoloX (Device Tree)
[239986.764095][T31387]  unwind_backtrace from show_stack+0x18/0x1c
[239986.770208][T31387]  show_stack from dump_stack_lvl+0x24/0x2c
[239986.776215][T31387]  dump_stack_lvl from do_commit+0xc0/0x528
[239986.782167][T31387]  do_commit from ubifs_sync_fs+0x84/0x98
[239986.787991][T31387]  ubifs_sync_fs from iterate_supers+0x9c/0x118
[239986.794268][T31387]  iterate_supers from ksys_sync+0x54/0x8c
[239986.800175][T31387]  ksys_sync from sys_sync+0x10/0x18
[239986.805492][T31387]  sys_sync from ret_fast_syscall+0x0/0x64
[239986.811394][T31387] Exception stack(0xc81b5fa8 to 0xc81b5ff0)
[239986.817314][T31387] 5fa0:                   00000072 be8b5d44 00000001 be8b5d44 00000000 004e5299
[239986.826423][T31387] 5fc0: 00000072 be8b5d44 00000000 00000024 004a12cd b6f74ce8 00000000 004f806c
[239986.835530][T31387] 5fe0: 004f8f14 be8b5bac 004e529f b6ef4e58

Is the above error something you have seen before?

>>
>> Another thing, would it not be possible to rescue that particular worn 
>> out device by simply turning fastmap off on that device?
>>
> 
> Can I regard the rescuing as making the erase counters become normal 
> again (max - min <= UBI_WL_THRESHOLD)? If so, I'm afraid that not all 
> PEBs can be rescued, according to get_peb_for_wl().
> For example, PB and PC cannot be rescued unless PA is taken for writing 
> and wear-leveling then happens to be scheduled at just the right time.
> 
> ubi->free tree:
>       29600(PB)
> 1(PA)        29600(PC)

What I mean is that I think the badly worn device could be made usable 
again by turning off fastmap. Would it not work properly? I do 
understand that the first 64 physical erase blocks would in practice 
not be used, since their erase counts are very high. But wouldn't the 
filesystem still work OK? Or am I missing something?
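
For reference, a simplified model of the free-PEB selection in the quoted 
example. This is a sketch under assumptions, not the kernel code: it 
assumes the default UBI_WL_THRESHOLD of 4096 and that the pick, modeled 
loosely on find_wl_entry() in drivers/mtd/ubi/wl.c, takes the free PEB 
with the highest erase counter strictly below (lowest erase counter + 
2 * UBI_WL_THRESHOLD). The helper names pick_free_peb_for_wl() and 
counters_balanced() are hypothetical.

```python
# Simplified model of choosing a free PEB for wear-leveling.
# ASSUMPTIONS (not confirmed in this thread): threshold = 4096 (assumed
# default of UBI_WL_THRESHOLD), and the pick is the free PEB with the highest
# erase counter strictly below lowest_ec + 2 * threshold.

WL_THRESHOLD = 4096                  # assumed default UBI_WL_THRESHOLD
WL_FREE_MAX_DIFF = 2 * WL_THRESHOLD  # assumed selection window


def pick_free_peb_for_wl(free_pebs):
    """Hypothetical helper: return the (name, ec) tuple that would be picked.

    free_pebs is a list of (name, erase_counter) tuples.
    """
    lowest_ec = min(ec for _, ec in free_pebs)
    limit = lowest_ec + WL_FREE_MAX_DIFF
    # Only PEBs inside the window are eligible; take the most worn of them.
    candidates = [p for p in free_pebs if p[1] < limit]
    return max(candidates, key=lambda p: p[1])


def counters_balanced(ecs):
    """Hypothetical helper: the 'normal' state, max - min <= threshold."""
    return max(ecs) - min(ecs) <= WL_THRESHOLD


free_tree = [("PA", 1), ("PB", 29600), ("PC", 29600)]
print(pick_free_peb_for_wl(free_tree))      # PA: PB/PC are outside the window
print(counters_balanced([ec for _, ec in free_tree]))

# Once PA has been taken for writing, the window moves up to PB/PC.
print(pick_free_peb_for_wl([("PB", 29600), ("PC", 29600)]))
```

In this model, as long as PA (erase counter 1) sits in the free tree, PB 
and PC (29600) fall outside the window and are never chosen; only after 
PA has been taken for writing does the window include them.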

Thanks for all the help!
Rickard Andersson

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/