* Lots of fastmap writes
@ 2024-06-03 8:55 Rickard x Andersson
2024-06-04 1:41 ` Zhihao Cheng
0 siblings, 1 reply; 11+ messages in thread
From: Rickard x Andersson @ 2024-06-03 8:55 UTC (permalink / raw)
To: richard, linux-mtd; +Cc: rickard314.andersson
Hi,
I have a system running Linux 5.10 which logs quite a lot to a database.
The system has been running OK since before Christmas but now it usually
fails after a few hours with errors like these:
May 6 22:29:58 172.26.203.90 warning ubi2 warning: ubi_io_read: error
-74 (ECC error) while reading 58 bytes from PEB 1:230872, read only 58
bytes, retry
May 7 00:11:08 172.26.203.90 warning ubi2 warning: ubi_io_read: error
-74 (ECC error) while reading 58 bytes from PEB 40:239752, read only 58
bytes, retry
May 7 00:11:08 172.26.203.90 err ubi2 error: ubi_io_read: error
-74 (ECC error) while reading 58 bytes from PEB 40:239752, read 58 bytes
Fastmap is used on this system. The ECC errors are usually in the
fastmap area, erase blocks 0-63.
When looking more closely at the erase counters they look something like
this:
0 - 63: 29600
64 - 2043: 2200
It seems like 30 % of the writes are writes to the fastmap area. Any
ideas of what can cause this many writes to the fastmap area? Heavy load?
Any ideas are welcome.
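As a rough cross-check of the 30 % figure, the share can be recomputed
from the two erase-counter ranges above. This is only a sketch: it
treats erase counts as a proxy for writes, assumes every PEB in a range
has exactly the quoted counter, and the function name erase_share is
made up for illustration.

```python
# Erase counters per PEB range, as quoted above (approximate values).
FASTMAP_PEBS = 64             # PEBs 0-63
FASTMAP_EC = 29600
OTHER_PEBS = 2043 - 64 + 1    # PEBs 64-2043
OTHER_EC = 2200


def erase_share(fm_pebs, fm_ec, other_pebs, other_ec):
    """Return (share of all erases that hit the fastmap area, mean EC)."""
    fm_erases = fm_pebs * fm_ec
    other_erases = other_pebs * other_ec
    total = fm_erases + other_erases
    return fm_erases / total, total / (fm_pebs + other_pebs)


share, mean_ec = erase_share(FASTMAP_PEBS, FASTMAP_EC, OTHER_PEBS, OTHER_EC)
print(f"fastmap share of erases: {share:.1%}")
print(f"mean erase counter: {mean_ec:.0f}")
```

The resulting mean is close to the "max/mean erase counter" value that
UBI prints at attach time, which suggests the two ranges are a fair
summary of the real distribution.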
nand: device found, Manufacturer ID: 0x98, Chip ID: 0xdc
nand: Toshiba TC58NVG2S0H 4G 3.3V 8-bit
nand: 512 MiB, SLC, erase size: 256 KiB, page size: 4096, OOB size: 256
nand: 2 chips detected
UBIFS (ubi2:0): Mounting in authenticated mode
UBIFS (ubi2:0): background thread "ubifs_bgt2_0" started, PID 187
UBIFS (ubi2:0): UBIFS: mounted UBI device 2, volume 0, name "data_volume"
UBIFS (ubi2:0): LEB size: 253952 bytes (248 KiB), min./max. I/O unit
sizes: 4096 bytes/4096 bytes
UBIFS (ubi2:0): FS size: 505110528 bytes (481 MiB, 1989 LEBs), journal
size 25395200 bytes (24 MiB, 100 LEBs)
UBIFS (ubi2:0): reserved for root: 4676575 bytes (4566 KiB)
UBIFS (ubi2:0): media format: w5/r0 (latest is w5/r0), UUID
990A460D-A55E-4B58-ACAD-01FEBC7AF839, small LPT model
ubi2: default fastmap pool size: 100
ubi2: default fastmap WL pool size: 50
ubi2: attaching mtd5
ubi2: attached by fastmap
ubi2: fastmap pool size: 100
ubi2: fastmap WL pool size: 50
ubi2: attached mtd5 (name "data", size 512 MiB)
ubi2: PEB size: 262144 bytes (256 KiB), LEB size: 253952 bytes
ubi2: min./max. I/O unit sizes: 4096/4096, sub-page size 4096
ubi2: VID header offset: 4096 (aligned 4096), data offset: 8192
ubi2: good PEBs: 2040, bad PEBs: 8, corrupted PEBs: 0
ubi2: user volume: 1, internal volumes: 1, max. volumes count: 128
ubi2: max/mean erase counter: 29685/3107, WL threshold: 4096, image
sequence number: 4060280209
ubi2: available PEBs: 0, total reserved PEBs: 2040, PEBs reserved for
bad PEB handling: 32
ubi2: background thread "ubi_bgt2d" started, PID 186
Best regards,
Rickard Andersson
______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Lots of fastmap writes
2024-06-03 8:55 Lots of fastmap writes Rickard x Andersson
@ 2024-06-04 1:41 ` Zhihao Cheng
2024-06-04 1:52 ` Zhihao Cheng
2024-06-04 6:47 ` Richard Weinberger
0 siblings, 2 replies; 11+ messages in thread
From: Zhihao Cheng @ 2024-06-04 1:41 UTC (permalink / raw)
To: Rickard x Andersson, richard, linux-mtd; +Cc: rickard314.andersson
On 2024/6/3 16:55, Rickard x Andersson wrote:
> Hi,
>
> I have a system running Linux 5.10 which logs quite a lot to a database.
> The system has been running OK since before Christmas but now it usually
> fails after a few hours with errors like these:
>
> May 6 22:29:58 172.26.203.90 warning ubi2 warning: ubi_io_read: error
> -74 (ECC error) while reading 58 bytes from PEB 1:230872, read only 58
> bytes, retry
> May 7 00:11:08 172.26.203.90 warning ubi2 warning: ubi_io_read: error
> -74 (ECC error) while reading 58 bytes from PEB 40:239752, read only 58
> bytes, retry
> May 7 00:11:08 172.26.203.90 err ubi2 error: ubi_io_read: error
> -74 (ECC error) while reading 58 bytes from PEB 40:239752, read 58 bytes
>
> Fastmap is used on this system. The ECC errors are usually in the
> fastmap area, erase blocks 0- 63.
>
> When looking more closely at the erase counters they look something like
> this:
>
> 0 - 63: 29600
> 64 - 2043: 2200
Try this series of patches
https://lore.kernel.org/linux-mtd/20230812080005.3162125-2-chengzhihao1@huawei.com/T/
>
> It seems like 30 % of the writes are writes to the fastmap area. Any
> ideas of what can cause this many writes to the fastmap area? Heavy load?
>
> Any ideas are welcome.
>
> nand: device found, Manufacturer ID: 0x98, Chip ID: 0xdc
> nand: Toshiba TC58NVG2S0H 4G 3.3V 8-bit
> nand: 512 MiB, SLC, erase size: 256 KiB, page size: 4096, OOB size: 256
> nand: 2 chips detected
>
> UBIFS (ubi2:0): Mounting in authenticated mode
> UBIFS (ubi2:0): background thread "ubifs_bgt2_0" started, PID 187
> UBIFS (ubi2:0): UBIFS: mounted UBI device 2, volume 0, name "data_volume"
> UBIFS (ubi2:0): LEB size: 253952 bytes (248 KiB), min./max. I/O unit
> sizes: 4096 bytes/4096 bytes
> UBIFS (ubi2:0): FS size: 505110528 bytes (481 MiB, 1989 LEBs), journal
> size 25395200 bytes (24 MiB, 100 LEBs)
> UBIFS (ubi2:0): reserved for root: 4676575 bytes (4566 KiB)
> UBIFS (ubi2:0): media format: w5/r0 (latest is w5/r0), UUID
> 990A460D-A55E-4B58-ACAD-01FEBC7AF839, small LPT model
>
> ubi2: default fastmap pool size: 100
> ubi2: default fastmap WL pool size: 50
> ubi2: attaching mtd5
> ubi2: attached by fastmap
> ubi2: fastmap pool size: 100
> ubi2: fastmap WL pool size: 50
> ubi2: attached mtd5 (name "data", size 512 MiB)
> ubi2: PEB size: 262144 bytes (256 KiB), LEB size: 253952 bytes
> ubi2: min./max. I/O unit sizes: 4096/4096, sub-page size 4096
> ubi2: VID header offset: 4096 (aligned 4096), data offset: 8192
> ubi2: good PEBs: 2040, bad PEBs: 8, corrupted PEBs: 0
> ubi2: user volume: 1, internal volumes: 1, max. volumes count: 128
> ubi2: max/mean erase counter: 29685/3107, WL threshold: 4096, image
> sequence number: 4060280209
> ubi2: available PEBs: 0, total reserved PEBs: 2040, PEBs reserved for
> bad PEB handling: 32
> ubi2: background thread "ubi_bgt2d" started, PID 186
>
> Best regards,
> Rickard Andersson
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Lots of fastmap writes
2024-06-04 1:41 ` Zhihao Cheng
@ 2024-06-04 1:52 ` Zhihao Cheng
2024-06-14 11:42 ` Rickard X Andersson
2024-06-04 6:47 ` Richard Weinberger
1 sibling, 1 reply; 11+ messages in thread
From: Zhihao Cheng @ 2024-06-04 1:52 UTC (permalink / raw)
To: Rickard x Andersson, richard, linux-mtd; +Cc: rickard314.andersson
On 2024/6/4 9:41, Zhihao Cheng wrote:
> On 2024/6/3 16:55, Rickard x Andersson wrote:
>> Hi,
>>
>> I have a system running Linux 5.10 which logs quite a lot to a
>> database. The system has been running OK since before Christmas but
>> now it usually fails after a few hours with errors like these:
>>
>> May 6 22:29:58 172.26.203.90 warning ubi2 warning: ubi_io_read:
>> error -74 (ECC error) while reading 58 bytes from PEB 1:230872, read
>> only 58 bytes, retry
>> May 7 00:11:08 172.26.203.90 warning ubi2 warning: ubi_io_read:
>> error -74 (ECC error) while reading 58 bytes from PEB 40:239752, read
>> only 58 bytes, retry
>> May 7 00:11:08 172.26.203.90 err ubi2 error: ubi_io_read: error
>> -74 (ECC error) while reading 58 bytes from PEB 40:239752, read 58 bytes
>>
>> Fastmap is used on this system. The ECC errors are usually in the
>> fastmap area, erase blocks 0- 63.
>>
>> When looking more closely at the erase counters they look something
>> like this:
>>
>> 0 - 63: 29600
>> 64 - 2043: 2200
>
> Try this series of patches
> https://lore.kernel.org/linux-mtd/20230812080005.3162125-2-chengzhihao1@huawei.com/T/
>
BTW, after applying the patches, the kernel should be run on fresh
flash; the improved wear-leveling algorithm cannot rescue an image that
is already worn out.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Lots of fastmap writes
2024-06-04 1:52 ` Zhihao Cheng
@ 2024-06-14 11:42 ` Rickard X Andersson
2024-06-14 12:28 ` Zhihao Cheng
0 siblings, 1 reply; 11+ messages in thread
From: Rickard X Andersson @ 2024-06-14 11:42 UTC (permalink / raw)
To: Zhihao Cheng, richard, linux-mtd; +Cc: rickard314.andersson
On 6/4/24 03:52, Zhihao Cheng wrote:
> On 2024/6/4 9:41, Zhihao Cheng wrote:
>> On 2024/6/3 16:55, Rickard x Andersson wrote:
>>> Hi,
>>>
>>> I have a system running Linux 5.10 which logs quite a lot to a
>>> database. The system has been running OK since before Christmas but
>>> now it usually fails after a few hours with errors like these:
>>>
>>> May 6 22:29:58 172.26.203.90 warning ubi2 warning: ubi_io_read:
>>> error -74 (ECC error) while reading 58 bytes from PEB 1:230872, read
>>> only 58 bytes, retry
>>> May 7 00:11:08 172.26.203.90 warning ubi2 warning: ubi_io_read:
>>> error -74 (ECC error) while reading 58 bytes from PEB 40:239752, read
>>> only 58 bytes, retry
>>> May 7 00:11:08 172.26.203.90 err ubi2 error: ubi_io_read:
>>> error -74 (ECC error) while reading 58 bytes from PEB 40:239752, read
>>> 58 bytes
>>>
>>> Fastmap is used on this system. The ECC errors are usually in the
>>> fastmap area, erase blocks 0- 63.
>>>
>>> When looking more closely at the erase counters they look something
>>> like this:
>>>
>>> 0 - 63: 29600
>>> 64 - 2043: 2200
>>
>> Try this series of patches
>> https://lore.kernel.org/linux-mtd/20230812080005.3162125-2-chengzhihao1@huawei.com/T/
>
> BTW, after applying the patches, the kernel should run on a new flash,
> the improved wear-leveling algorithm cannot rescue the worn out image.
>
Thanks for the patches!
I have backported the patches to Linux kernel 6.1. Do you think the
patches are safe to apply to Linux kernel 6.1?
Another thing, would it not be possible to rescue that particular worn
out device by simply turning fastmap off on that device?
Best regards,
Rickard
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Lots of fastmap writes
2024-06-14 11:42 ` Rickard X Andersson
@ 2024-06-14 12:28 ` Zhihao Cheng
2024-06-17 11:20 ` Rickard x Andersson
0 siblings, 1 reply; 11+ messages in thread
From: Zhihao Cheng @ 2024-06-14 12:28 UTC (permalink / raw)
To: Rickard X Andersson, richard, linux-mtd; +Cc: rickard314.andersson
On 2024/6/14 19:42, Rickard X Andersson wrote:
> On 6/4/24 03:52, Zhihao Cheng wrote:
[...]
>>
>> BTW, after applying the patches, the kernel should run on a new flash,
>> the improved wear-leveling algorithm cannot rescue the worn out image.
>>
>
> Thanks for the patches!
>
> I have backported the patches to Linux kernel 6.1. Do you think the
> patches are safe to apply to Linux kernel 6.1?
Yes, it's okay. I have backported the patches to our product (kernel
v5.10), and it works fine.
>
> Another thing, would it not be possible to rescue that particular worn
> out device by simply turning fastmap off on that device?
>
By "rescuing", do you mean bringing the erase counters back into the
normal range (max - min <= UBI_WL_THRESHOLD)? If so, I'm afraid that not
all PEBs can be rescued, judging by get_peb_for_wl().
For example, PB and PC cannot be rescued unless PA is taken for writing
and wear-leveling happens to be scheduled right at that moment.
ubi->free tree:
          29600(PB)
   1(PA)            29600(PC)
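The example above can be played through with a toy model. This is a
sketch only, not the kernel's get_peb_for_wl(): it assumes the target
for wear-leveling is the free PEB with the highest erase counter that
still lies below (lowest free counter + a maximum difference), and it
assumes that maximum difference is twice the WL threshold. The name
pick_free_peb_for_wl is hypothetical.

```python
def pick_free_peb_for_wl(free_ecs, max_diff):
    """Toy model: pick the free PEB with the highest erase counter that
    is still below (lowest free erase counter + max_diff)."""
    limit = min(free_ecs.values()) + max_diff
    candidates = {name: ec for name, ec in free_ecs.items() if ec < limit}
    return max(candidates, key=candidates.get)


WL_THRESHOLD = 4096                  # "WL threshold: 4096" in the attach log
ASSUMED_MAX_DIFF = 2 * WL_THRESHOLD  # assumption, not stated in this thread

free_tree = {"PA": 1, "PB": 29600, "PC": 29600}
# While PA is free, the limit stays far below 29600, so PB and PC are
# never candidates.
print(pick_free_peb_for_wl(free_tree, ASSUMED_MAX_DIFF))

# Once PA has been taken for writing, only the worn PEBs remain.
del free_tree["PA"]
print(pick_free_peb_for_wl(free_tree, ASSUMED_MAX_DIFF))
```

In this model the worn PEBs only become eligible after the low-counter
PEB has left the free tree, which is the timing dependency described
above.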
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Lots of fastmap writes
2024-06-14 12:28 ` Zhihao Cheng
@ 2024-06-17 11:20 ` Rickard x Andersson
2024-06-17 13:21 ` Zhihao Cheng
0 siblings, 1 reply; 11+ messages in thread
From: Rickard x Andersson @ 2024-06-17 11:20 UTC (permalink / raw)
To: Zhihao Cheng, richard, linux-mtd; +Cc: rickard314.andersson
On 6/14/24 14:28, Zhihao Cheng wrote:
> On 2024/6/14 19:42, Rickard X Andersson wrote:
>> On 6/4/24 03:52, Zhihao Cheng wrote:
>
> [...]
>>>
>>> BTW, after applying the patches, the kernel should run on a new
>>> flash, the improved wear-leveling algorithm cannot rescue the worn
>>> out image.
>>>
>>
>> Thanks for the patches!
>>
>> I have backported the patches to Linux kernel 6.1. Do you think the
>> patches are safe to apply to Linux kernel 6.1?
>
> Yes, it's okay. I have backported the patches to our product(kernel
> v5.10) and it works fine.
Thanks! I backported the patches to Linux 6.1 and ran my own stress
test for a few days (on another device with fresh flash memory). The
wear on the fastmap physical blocks (0-63) seems to be a lot lower now
with the patches applied, which is good.
However, after almost 3 days of stress testing I hit this problem (the
file system switched to read-only mode):
[ 7885.036577][ T182] ubi2: scrubbed PEB 2904 (LEB 0:229), data moved
to PEB 627
[83721.724621][ T182] ubi2: scrubbed PEB 983 (LEB 0:3240), data moved
to PEB 7
[83721.832521][ T182] ubi2: scrubbed PEB 997 (LEB 0:2819), data moved
to PEB 5
[83784.750714][ T182] ubi2: scrubbed PEB 1927 (LEB 0:10), data moved to
PEB 2
[165812.657934][ T182] ubi2: scrubbed PEB 3691 (LEB 0:11), data moved
to PEB 18
[166748.055242][ T182] ubi2: scrubbed PEB 3045 (LEB 0:2), data moved to
PEB 837
[166834.742451][ T182] ubi2: scrubbed PEB 918 (LEB 0:2), data moved to
PEB 43
[239986.496840][T31387] UBIFS error (ubi2:0 pid 31387): ubifs_scan:
corrupt empty space at LEB 3519:101376
[239986.506809][T31387] UBIFS error (ubi2:0 pid 31387):
ubifs_scanned_corruption: corruption at LEB 3519:101376
[239986.519742][T31387] UBIFS error (ubi2:0 pid 31387):
ubifs_scanned_corruption: first 8192 bytes from LEB 3519:101376
[239986.532052][T31387] 00000000: fffffffe ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff ffffffff ................................
[239986.532230][T31387] 00000020: ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff ffffffff ................................
[239986.532450][T31387] 00000040: ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff ffffffff ................................
[239986.532607][T31387] 00000060: ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff ffffffff ................................
[239986.532732][T31387] 00000080: ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff ffffffff ................................
...
[239986.603283][T31387] 00001000: fffffffe ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff ffffffff ................................
[239986.603667][T31387] 00001020: ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff ffffffff ................................
...
[239986.707743][T31387] 00001fe0: ffffffff ffffffff ffffffff ffffffff
ffffffff ffffffff ffffffff ffffffff ................................
[239986.707894][T31387] UBIFS error (ubi2:0 pid 31387): ubifs_scan: LEB
3519 scanning failed
[239986.724625][T31387] UBIFS error (ubi2:0 pid 31387): do_commit:
commit failed, error -117
[239986.734335][T31387] UBIFS warning (ubi2:0 pid 31387):
ubifs_ro_mode.part.0: switched to read-only mode, error -117
[239986.748276][T31387] CPU: 0 PID: 31387 Comm: sync Kdump: loaded Not
tainted 6.1.55-axis9-devel #1
[239986.757327][T31387] Hardware name: Freescale i.MX6 SoloX (Device Tree)
[239986.764095][T31387] unwind_backtrace from show_stack+0x18/0x1c
[239986.770208][T31387] show_stack from dump_stack_lvl+0x24/0x2c
[239986.776215][T31387] dump_stack_lvl from do_commit+0xc0/0x528
[239986.782167][T31387] do_commit from ubifs_sync_fs+0x84/0x98
[239986.787991][T31387] ubifs_sync_fs from iterate_supers+0x9c/0x118
[239986.794268][T31387] iterate_supers from ksys_sync+0x54/0x8c
[239986.800175][T31387] ksys_sync from sys_sync+0x10/0x18
[239986.805492][T31387] sys_sync from ret_fast_syscall+0x0/0x64
[239986.811394][T31387] Exception stack(0xc81b5fa8 to 0xc81b5ff0)
[239986.817314][T31387] 5fa0: 00000072 be8b5d44
00000001 be8b5d44 00000000 004e5299
[239986.826423][T31387] 5fc0: 00000072 be8b5d44 00000000 00000024
004a12cd b6f74ce8 00000000 004f806c
[239986.835530][T31387] 5fe0: 004f8f14 be8b5bac 004e529f b6ef4e58
Is the above error something you have seen before?
>>
>> Another thing, would it not be possible to rescue that particular worn
>> out device by simply turning fastmap off on that device?
>>
>
> Can I regard the rescuing as making erase counters become normal
> again(max - min <= UBI_WL_THRESHOLD)? If so, I'm afraid that not all
> PEBs can be rescued, according to get_peb_for_wl().
> For example: PB, PC cannot be rescued, unless PA is taken for writing
> and then wl is just right scheduled.
>
> ubi->free tree:
> 29600(PB)
> 1(PA) 29600(PC)
What I mean is that the badly worn device could perhaps be made usable
again by turning off fastmap. Would it not work properly then? I do
understand that the first 64 physical erase blocks would not be used in
practice, since their erase counts are very high. But would the
filesystem not work OK? Or am I missing something?
Thanks for all help!
Rickard Andersson
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Lots of fastmap writes
2024-06-17 11:20 ` Rickard x Andersson
@ 2024-06-17 13:21 ` Zhihao Cheng
2024-06-17 13:48 ` Rickard x Andersson
0 siblings, 1 reply; 11+ messages in thread
From: Zhihao Cheng @ 2024-06-17 13:21 UTC (permalink / raw)
To: Rickard x Andersson, richard, linux-mtd; +Cc: rickard314.andersson
On 2024/6/17 19:20, Rickard x Andersson wrote:
> On 6/14/24 14:28, Zhihao Cheng wrote:
>> On 2024/6/14 19:42, Rickard X Andersson wrote:
>>> On 6/4/24 03:52, Zhihao Cheng wrote:
>>
>> [...]
>>>>
>>>> BTW, after applying the patches, the kernel should run on a new
>>>> flash, the improved wear-leveling algorithm cannot rescue the worn
>>>> out image.
>>>>
>>>
>>> Thanks for the patches!
>>>
>>> I have backported the patches to Linux kernel 6.1. Do you think the
>>> patches are safe to apply to Linux kernel 6.1?
>>
>> Yes, it's okay. I have backported the patches to our product(kernel
>> v5.10) and it works fine.
>
> Thanks! I backported the patches to Linux 6.1 and did run my own stress
> test for a few days. (On another device with fresh flash memory.) It
> seems like the wear of the fastmap physical blocks (0-63) is a lot less
> now with the patches applied, which is good.
>
> However I got this problem after almost 3 days of stress testing (file
> system is set to read only mode):
>
>
> [ 7885.036577][ T182] ubi2: scrubbed PEB 2904 (LEB 0:229), data moved
> to PEB 627
> [83721.724621][ T182] ubi2: scrubbed PEB 983 (LEB 0:3240), data moved
> to PEB 7
> [83721.832521][ T182] ubi2: scrubbed PEB 997 (LEB 0:2819), data moved
> to PEB 5
> [83784.750714][ T182] ubi2: scrubbed PEB 1927 (LEB 0:10), data moved to
> PEB 2
> [165812.657934][ T182] ubi2: scrubbed PEB 3691 (LEB 0:11), data moved
> to PEB 18
> [166748.055242][ T182] ubi2: scrubbed PEB 3045 (LEB 0:2), data moved to
> PEB 837
> [166834.742451][ T182] ubi2: scrubbed PEB 918 (LEB 0:2), data moved to
> PEB 43
It looks like some of the PEBs have hit bit-flip errors.
> [239986.496840][T31387] UBIFS error (ubi2:0 pid 31387): ubifs_scan:
> corrupt empty space at LEB 3519:101376
> [239986.506809][T31387] UBIFS error (ubi2:0 pid 31387):
> ubifs_scanned_corruption: corruption at LEB 3519:101376
> [239986.519742][T31387] UBIFS error (ubi2:0 pid 31387):
> ubifs_scanned_corruption: first 8192 bytes from LEB 3519:101376
> [239986.532052][T31387] 00000000: fffffffe ffffffff ffffffff ffffffff
> ffffffff ffffffff ffffffff ffffffff ................................
The data content (0xfffffffe) is weird; shouldn't it be 0xffffffff? One
bit has flipped, and there are no ECC error messages!
> [239986.532230][T31387] 00000020: ffffffff ffffffff ffffffff ffffffff
> ffffffff ffffffff ffffffff ffffffff ................................
> [239986.532450][T31387] 00000040: ffffffff ffffffff ffffffff ffffffff
> ffffffff ffffffff ffffffff ffffffff ................................
> [239986.532607][T31387] 00000060: ffffffff ffffffff ffffffff ffffffff
> ffffffff ffffffff ffffffff ffffffff ................................
> [239986.532732][T31387] 00000080: ffffffff ffffffff ffffffff ffffffff
> ffffffff ffffffff ffffffff ffffffff ................................
>
> ...
>
> [239986.603283][T31387] 00001000: fffffffe ffffffff ffffffff ffffffff
> ffffffff ffffffff ffffffff ffffffff ................................
Same here.
> [239986.603667][T31387] 00001020: ffffffff ffffffff ffffffff ffffffff
> ffffffff ffffffff ffffffff ffffffff ................................
>
> ...
>
> [239986.707743][T31387] 00001fe0: ffffffff ffffffff ffffffff ffffffff
> ffffffff ffffffff ffffffff ffffffff ................................
>
>
> [239986.707894][T31387] UBIFS error (ubi2:0 pid 31387): ubifs_scan: LEB
> 3519 scanning failed
> [239986.724625][T31387] UBIFS error (ubi2:0 pid 31387): do_commit:
> commit failed, error -117
> [239986.734335][T31387] UBIFS warning (ubi2:0 pid 31387):
> ubifs_ro_mode.part.0: switched to read-only mode, error -117
> [239986.748276][T31387] CPU: 0 PID: 31387 Comm: sync Kdump: loaded Not
> tainted 6.1.55-axis9-devel #1
> [239986.757327][T31387] Hardware name: Freescale i.MX6 SoloX (Device Tree)
> [239986.764095][T31387] unwind_backtrace from show_stack+0x18/0x1c
> [239986.770208][T31387] show_stack from dump_stack_lvl+0x24/0x2c
> [239986.776215][T31387] dump_stack_lvl from do_commit+0xc0/0x528
> [239986.782167][T31387] do_commit from ubifs_sync_fs+0x84/0x98
> [239986.787991][T31387] ubifs_sync_fs from iterate_supers+0x9c/0x118
> [239986.794268][T31387] iterate_supers from ksys_sync+0x54/0x8c
> [239986.800175][T31387] ksys_sync from sys_sync+0x10/0x18
> [239986.805492][T31387] sys_sync from ret_fast_syscall+0x0/0x64
> [239986.811394][T31387] Exception stack(0xc81b5fa8 to 0xc81b5ff0)
> [239986.817314][T31387] 5fa0: 00000072 be8b5d44
> 00000001 be8b5d44 00000000 004e5299
> [239986.826423][T31387] 5fc0: 00000072 be8b5d44 00000000 00000024
> 004a12cd b6f74ce8 00000000 004f806c
> [239986.835530][T31387] 5fe0: 004f8f14 be8b5bac 004e529f b6ef4e58
>
> Is the above error something you have seen before?
I have hit this kind of error (corrupt empty space) several times (on
both v4.4 and v5.10). To be honest, I have no idea how it happens. It
looks like something went wrong on the flash (e.g. uncorrected bit flips).
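To make the "corrupt empty space" pattern concrete: erased flash should
read back as all 0xFF, so any other byte is a flipped bit. The sketch
below scans a buffer for such bytes. The buffer is synthetic, built to
resemble the dump above (a single cleared bit at offsets 0x0 and
0x1000); the byte order inside each 32-bit word is an assumption, and
find_bit_flips is a hypothetical helper, not a UBIFS function.

```python
def find_bit_flips(data: bytes):
    """Return (byte offset, flipped-bit count) for each byte != 0xFF."""
    flips = []
    for offset, byte in enumerate(data):
        diff = byte ^ 0xFF          # bits that are 0 but should be 1
        if diff:
            flips.append((offset, bin(diff).count("1")))
    return flips


# Two 4096-byte pages of "empty" space, each starting with a byte that
# has one cleared bit, mimicking the fffffffe words in the dump.
page = bytes([0xFE]) + bytes([0xFF]) * 4095
leb_region = page + page
print(find_bit_flips(leb_region))
```

One flipped bit at the start of each 4096-byte page would be consistent
with a per-page effect, but the thread does not establish the cause.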
>
>>>
>>> Another thing, would it not be possible to rescue that particular
>>> worn out device by simply turning fastmap off on that device?
>>>
>>
>> Can I regard the rescuing as making erase counters become normal
>> again(max - min <= UBI_WL_THRESHOLD)? If so, I'm afraid that not all
>> PEBs can be rescued, according to get_peb_for_wl().
>> For example: PB, PC cannot be rescued, unless PA is taken for writing
>> and then wl is just right scheduled.
>>
>> ubi->free tree:
>> 29600(PB)
>> 1(PA) 29600(PC)
>
> I mean that I think that the badly worn device could be made usable
> again by turning off fastmap. I mean would it not work properly? I do
> however understand that the first 64 physical erase blocks would not be
> used in practice since the erase counts of those blocks are very high.
> But would not the filesystem work OK? Or am I missing something?
>
I think the first 64 PEBs could still be used when the number of free
PEBs drops below 64, for example when UBI runs out of space or when many
pending erase works have not yet been executed by the time a free PEB is
requested.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Lots of fastmap writes
2024-06-17 13:21 ` Zhihao Cheng
@ 2024-06-17 13:48 ` Rickard x Andersson
2024-06-17 13:55 ` Zhihao Cheng
0 siblings, 1 reply; 11+ messages in thread
From: Rickard x Andersson @ 2024-06-17 13:48 UTC (permalink / raw)
To: Zhihao Cheng, richard, linux-mtd; +Cc: rickard314.andersson
On 6/17/24 15:21, Zhihao Cheng wrote:
> On 2024/6/17 19:20, Rickard x Andersson wrote:
>> On 6/14/24 14:28, Zhihao Cheng wrote:
>>> On 2024/6/14 19:42, Rickard X Andersson wrote:
>>>> On 6/4/24 03:52, Zhihao Cheng wrote:
>>>
>>> [...]
>>>>>
>>>>> BTW, after applying the patches, the kernel should run on a new
>>>>> flash, the improved wear-leveling algorithm cannot rescue the worn
>>>>> out image.
>>>>>
>>>>
>>>> Thanks for the patches!
>>>>
>>>> I have backported the patches to Linux kernel 6.1. Do you think the
>>>> patches are safe to apply to Linux kernel 6.1?
>>>
>>> Yes, it's okay. I have backported the patches to our product(kernel
>>> v5.10) and it works fine.
>>
>> Thanks! I backported the patches to Linux 6.1 and did run my own
>> stress test for a few days. (On another device with fresh flash
>> memory.) It seems like the wear of the fastmap physical blocks (0-63)
>> is a lot less now with the patches applied, which is good.
>>
>> However I got this problem after almost 3 days of stress testing (file
>> system is set to read only mode):
>>
>>
>> [ 7885.036577][ T182] ubi2: scrubbed PEB 2904 (LEB 0:229), data moved
>> to PEB 627
>> [83721.724621][ T182] ubi2: scrubbed PEB 983 (LEB 0:3240), data moved
>> to PEB 7
>> [83721.832521][ T182] ubi2: scrubbed PEB 997 (LEB 0:2819), data moved
>> to PEB 5
>> [83784.750714][ T182] ubi2: scrubbed PEB 1927 (LEB 0:10), data moved
>> to PEB 2
>> [165812.657934][ T182] ubi2: scrubbed PEB 3691 (LEB 0:11), data moved
>> to PEB 18
>> [166748.055242][ T182] ubi2: scrubbed PEB 3045 (LEB 0:2), data moved
>> to PEB 837
>> [166834.742451][ T182] ubi2: scrubbed PEB 918 (LEB 0:2), data moved
>> to PEB 43
>
> Looks like that some of PEBs have met the bitflip errors.
One thing that struck me. When looking at the scrubbing being done
above, is it not strange that data is moved from physical PEBs outside
fastmap area into the fastmap area? For example from PEB 997 to PEB 5?
>> [239986.496840][T31387] UBIFS error (ubi2:0 pid 31387): ubifs_scan:
>> corrupt empty space at LEB 3519:101376
>> [239986.506809][T31387] UBIFS error (ubi2:0 pid 31387):
>> ubifs_scanned_corruption: corruption at LEB 3519:101376
>> [239986.519742][T31387] UBIFS error (ubi2:0 pid 31387):
>> ubifs_scanned_corruption: first 8192 bytes from LEB 3519:101376
>> [239986.532052][T31387] 00000000: fffffffe ffffffff ffffffff ffffffff
>> ffffffff ffffffff ffffffff ffffffff ................................
>
> The data content(0xfffffffe) is weird, shouldn't it be '0xffffffff'? One
> bit flips. and there is no ECC error messages!
Yes strange!
>>>> Another thing, would it not be possible to rescue that particular
>>>> worn out device by simply turning fastmap off on that device?
>>>>
>>>
>>> Can I regard the rescuing as making erase counters become normal
>>> again(max - min <= UBI_WL_THRESHOLD)? If so, I'm afraid that not all
>>> PEBs can be rescued, according to get_peb_for_wl().
>>> For example: PB, PC cannot be rescued, unless PA is taken for writing
>>> and then wl is just right scheduled.
>>>
>>> ubi->free tree:
>>> 29600(PB)
>>> 1(PA) 29600(PC)
>>
>> I mean that I think that the badly worn device could be made usable
>> again by turning off fastmap. I mean would it not work properly? I do
>> however understand that the first 64 physical erase blocks would not
>> be used in practice since the erase counts of those blocks are very
>> high. But would not the filesystem work OK? Or am I missing something?
>>
>
> I think the first 64 PEBs could be used when the number of free PEBs
> falls below 64, for example UBI runs out of space or there are many erasing
> works not being executed before getting a free PEB.
OK, I think I understand. The device is probably usable, but if the
flash becomes almost full or the system is under pressure, I could run
into problems.
Thanks!
/Rickard A.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Lots of fastmap writes
2024-06-17 13:48 ` Rickard x Andersson
@ 2024-06-17 13:55 ` Zhihao Cheng
0 siblings, 0 replies; 11+ messages in thread
From: Zhihao Cheng @ 2024-06-17 13:55 UTC (permalink / raw)
To: Rickard x Andersson, richard, linux-mtd; +Cc: rickard314.andersson
On 2024/6/17 21:48, Rickard x Andersson wrote:
> On 6/17/24 15:21, Zhihao Cheng wrote:
>> On 2024/6/17 19:20, Rickard x Andersson wrote:
>>> On 6/14/24 14:28, Zhihao Cheng wrote:
>>>> On 2024/6/14 19:42, Rickard X Andersson wrote:
>>>>> On 6/4/24 03:52, Zhihao Cheng wrote:
>>>>
>>>> [...]
>>>>>>
>>>>>> BTW, after applying the patches, the kernel should run on a new
>>>>>> flash, the improved wear-leveling algorithm cannot rescue the worn
>>>>>> out image.
>>>>>>
>>>>>
>>>>> Thanks for the patches!
>>>>>
>>>>> I have backported the patches to Linux kernel 6.1. Do you think the
>>>>> patches are safe to apply to Linux kernel 6.1?
>>>>
>>>> Yes, it's okay. I have backported the patches to our product(kernel
>>>> v5.10) and it works fine.
>>>
>>> Thanks! I backported the patches to Linux 6.1 and did run my own
>>> stress test for a few days. (On another device with fresh flash
>>> memory.) It seems like the wear of the fastmap physical blocks (0-63)
>>> is a lot less now with the patches applied, which is good.
>>>
>>> However I got this problem after almost 3 days of stress testing
>>> (file system is set to read only mode):
>>>
>>>
>>> [ 7885.036577][ T182] ubi2: scrubbed PEB 2904 (LEB 0:229), data
>>> moved to PEB 627
>>> [83721.724621][ T182] ubi2: scrubbed PEB 983 (LEB 0:3240), data
>>> moved to PEB 7
>>> [83721.832521][ T182] ubi2: scrubbed PEB 997 (LEB 0:2819), data
>>> moved to PEB 5
>>> [83784.750714][ T182] ubi2: scrubbed PEB 1927 (LEB 0:10), data moved
>>> to PEB 2
>>> [165812.657934][ T182] ubi2: scrubbed PEB 3691 (LEB 0:11), data
>>> moved to PEB 18
>>> [166748.055242][ T182] ubi2: scrubbed PEB 3045 (LEB 0:2), data moved
>>> to PEB 837
>>> [166834.742451][ T182] ubi2: scrubbed PEB 918 (LEB 0:2), data moved
>>> to PEB 43
>>
>> Looks like that some of PEBs have met the bitflip errors.
>
> One thing that struck me. When looking at the scrubbing being done
> above, is it not strange that data is moved from physical PEBs outside
> fastmap area into the fastmap area? For example from PEB 997 to PEB 5?
Yes, it is expected. The first 64 PEBs usually have higher erase
counters, so UBI moves cold data into PEBs with a higher EC; this is
part of the wear-leveling algorithm.
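To see how often scrubbing lands in the first 64 PEBs, the "scrubbed
PEB" messages can be filtered with something like the sketch below. The
regular expression follows the message format quoted in this thread
(with the wrapped lines joined); FASTMAP_AREA and the function name
moves_into_fastmap_area are illustrative assumptions.

```python
import re

# Matches e.g. "scrubbed PEB 997 (LEB 0:2819), data moved to PEB 5"
SCRUB_RE = re.compile(
    r"scrubbed PEB (\d+) \(LEB (\d+):(\d+)\), data moved to PEB (\d+)"
)
FASTMAP_AREA = range(64)  # PEBs 0-63, as described in this thread


def moves_into_fastmap_area(log_lines):
    """Return (source PEB, target PEB) pairs whose target is in PEBs 0-63."""
    moves = []
    for line in log_lines:
        m = SCRUB_RE.search(line)
        if m and int(m.group(4)) in FASTMAP_AREA:
            moves.append((int(m.group(1)), int(m.group(4))))
    return moves


log = [
    "ubi2: scrubbed PEB 2904 (LEB 0:229), data moved to PEB 627",
    "ubi2: scrubbed PEB 983 (LEB 0:3240), data moved to PEB 7",
    "ubi2: scrubbed PEB 997 (LEB 0:2819), data moved to PEB 5",
    "ubi2: scrubbed PEB 1927 (LEB 0:10), data moved to PEB 2",
]
print(moves_into_fastmap_area(log))
```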
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Lots of fastmap writes
2024-06-04 1:41 ` Zhihao Cheng
2024-06-04 1:52 ` Zhihao Cheng
@ 2024-06-04 6:47 ` Richard Weinberger
2024-06-14 11:45 ` Rickard X Andersson
1 sibling, 1 reply; 11+ messages in thread
From: Richard Weinberger @ 2024-06-04 6:47 UTC (permalink / raw)
To: chengzhihao1; +Cc: Rickard x Andersson, linux-mtd, rickard314 andersson
----- Original Message -----
> From: "chengzhihao1" <chengzhihao1@huawei.com>
> To: "Rickard x Andersson" <rickaran@axis.com>, "richard" <richard@nod.at>, "linux-mtd" <linux-mtd@lists.infradead.org>
> CC: "rickard314 andersson" <rickard314.andersson@gmail.com>
> Sent: Tuesday, June 4, 2024 03:41:21
> Subject: Re: Lots of fastmap writes
> On 2024/6/3 16:55, Rickard x Andersson wrote:
>> Hi,
>>
>> I have a system running Linux 5.10 which logs quite a lot to a database.
>> The system has been running OK since before Christmas but now it usually
>> fails after a few hours with errors like these:
>>
>> May 6 22:29:58 172.26.203.90 warning ubi2 warning: ubi_io_read: error
>> -74 (ECC error) while reading 58 bytes from PEB 1:230872, read only 58
>> bytes, retry
>> May 7 00:11:08 172.26.203.90 warning ubi2 warning: ubi_io_read: error
>> -74 (ECC error) while reading 58 bytes from PEB 40:239752, read only 58
>> bytes, retry
>> May 7 00:11:08 172.26.203.90 err ubi2 error: ubi_io_read: error
>> -74 (ECC error) while reading 58 bytes from PEB 40:239752, read 58 bytes
>>
>> Fastmap is used on this system. The ECC errors are usually in the
>> fastmap area, erase blocks 0- 63.
>>
>> When looking more closely at the erase counters they look something like
>> this:
>>
>> 0 - 63: 29600
>> 64 - 2043: 2200
Are all of the first 64 PEBs worn out that badly or just one?
> Try this series of patches
> https://lore.kernel.org/linux-mtd/20230812080005.3162125-2-chengzhihao1@huawei.com/T/
>>
>> It seems like 30 % of the writes are writes to the fastmap area. Any
>> ideas of what can cause this many writes to the fastmap area? Heavy load?
>>
>> Any ideas are welcome.
>>
>> nand: device found, Manufacturer ID: 0x98, Chip ID: 0xdc
>> nand: Toshiba TC58NVG2S0H 4G 3.3V 8-bit
>> nand: 512 MiB, SLC, erase size: 256 KiB, page size: 4096, OOB size: 256
>> nand: 2 chips detected
>>
>> UBIFS (ubi2:0): Mounting in authenticated mode
>> UBIFS (ubi2:0): background thread "ubifs_bgt2_0" started, PID 187
>> UBIFS (ubi2:0): UBIFS: mounted UBI device 2, volume 0, name "data_volume"
>> UBIFS (ubi2:0): LEB size: 253952 bytes (248 KiB), min./max. I/O unit
>> sizes: 4096 bytes/4096 bytes
>> UBIFS (ubi2:0): FS size: 505110528 bytes (481 MiB, 1989 LEBs), journal
>> size 25395200 bytes (24 MiB, 100 LEBs)
>> UBIFS (ubi2:0): reserved for root: 4676575 bytes (4566 KiB)
>> UBIFS (ubi2:0): media format: w5/r0 (latest is w5/r0), UUID
>> 990A460D-A55E-4B58-ACAD-01FEBC7AF839, small LPT model
>>
>> ubi2: default fastmap pool size: 100
>> ubi2: default fastmap WL pool size: 50
>> ubi2: attaching mtd5
>> ubi2: attached by fastmap
>> ubi2: fastmap pool size: 100
>> ubi2: fastmap WL pool size: 50
As Zhihao Cheng said, his patch series might help.
Making the fastmap pools larger is also an option.
With larger pools, the fastmap needs to be written less often, but
more PEBs have to be scanned at attach time.
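
A rough way to picture this trade-off is the sketch below. It is a toy
model under stated assumptions, not how UBI accounts for things: it assumes
one fastmap write each time the pool is used up and that attach has to scan
every PEB in both pools. The function name and the workload figure are
hypothetical; the pool sizes 100 and 50 are the ones from the log above.

```python
def fastmap_tradeoff(peb_allocations, pool_size, wl_pool_size):
    """Return (fastmap_writes, pebs_scanned_at_attach) for a toy workload."""
    # Model assumption: the fastmap is rewritten once per exhausted pool.
    fastmap_writes = peb_allocations // pool_size
    # Model assumption: attach scans all PEBs of the pool and the WL pool.
    pebs_scanned_at_attach = pool_size + wl_pool_size
    return fastmap_writes, pebs_scanned_at_attach


# Pool sizes from the log (100/50) versus doubled pools (200/100), for an
# arbitrary workload of 100000 PEB allocations.
current = fastmap_tradeoff(100000, 100, 50)
doubled = fastmap_tradeoff(100000, 200, 100)
```

In this model, doubling the pools halves the number of fastmap writes and
doubles the number of PEBs scanned at attach.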
Thanks,
//richard
* Re: Lots of fastmap writes
2024-06-04 6:47 ` Richard Weinberger
@ 2024-06-14 11:45 ` Rickard X Andersson
0 siblings, 0 replies; 11+ messages in thread
From: Rickard X Andersson @ 2024-06-14 11:45 UTC (permalink / raw)
To: Richard Weinberger, chengzhihao1; +Cc: linux-mtd, rickard314 andersson
On 6/4/24 08:47, Richard Weinberger wrote:
> ----- Original Message -----
>> From: "chengzhihao1" <chengzhihao1@huawei.com>
>> To: "Rickard x Andersson" <rickaran@axis.com>, "richard" <richard@nod.at>, "linux-mtd" <linux-mtd@lists.infradead.org>
>> CC: "rickard314 andersson" <rickard314.andersson@gmail.com>
>> Sent: Tuesday, 4 June 2024 03:41:21
>> Subject: Re: Lots of fastmap writes
>
>> On 2024/6/3 16:55, Rickard x Andersson wrote:
>>> Hi,
>>>
>>> I have a system running Linux 5.10 which logs quite a lot to a database.
>>> The system has been running OK since before Christmas but now it usually
>>> fails after a few hours with errors like these:
>>>
>>> May 6 22:29:58 172.26.203.90 warning ubi2 warning: ubi_io_read: error
>>> -74 (ECC error) while reading 58 bytes from PEB 1:230872, read only 58
>>> bytes, retry
>>> May 7 00:11:08 172.26.203.90 warning ubi2 warning: ubi_io_read: error
>>> -74 (ECC error) while reading 58 bytes from PEB 40:239752, read only 58
>>> bytes, retry
>>> May 7 00:11:08 172.26.203.90 err ubi2 error: ubi_io_read: error
>>> -74 (ECC error) while reading 58 bytes from PEB 40:239752, read 58 bytes
>>>
>>> Fastmap is used on this system. The ECC errors are usually in the
>>> fastmap area, erase blocks 0- 63.
>>>
>>> When looking more closely at the erase counters they look something like
>>> this:
>>>
>>> 0 - 63: 29600
>>> 64 - 2043: 2200
>
> Are all of the first 64 PEBs worn out that badly or just one?
All of the first 64 PEBs have approximately the same erase counter,
i.e. about 29600.
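
For reference, the "30 % of the writes" estimate can be checked from the
counters quoted above. This is only arithmetic on the reported figures; it
assumes every PEB in a range sits exactly at the quoted counter, and the
variable names are made up for this example.

```python
# Erase counters as reported in this thread (approximate per-PEB values).
fastmap_pebs = 64                # PEBs 0 - 63
fastmap_ec = 29600
other_pebs = 2043 - 64 + 1       # PEBs 64 - 2043
other_ec = 2200

fastmap_erases = fastmap_pebs * fastmap_ec
other_erases = other_pebs * other_ec

# Share of all erases that hit the fastmap area.
fastmap_share = fastmap_erases / (fastmap_erases + other_erases)
# How much more worn a fastmap PEB is than an average PEB elsewhere.
wear_ratio = fastmap_ec / other_ec

print(f"fastmap share of erases: {fastmap_share:.1%}")
print(f"per-PEB wear ratio: {wear_ratio:.1f}x")
```

So about 3 % of the PEBs take roughly 30 % of the erases, and each of them
is about 13 times more worn than a PEB outside the fastmap area.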
Best regards,
Rickard
End of thread (latest message: 2024-06-17 13:55 UTC)
Thread overview: 11+ messages
2024-06-03 8:55 Lots of fastmap writes Rickard x Andersson
2024-06-04 1:41 ` Zhihao Cheng
2024-06-04 1:52 ` Zhihao Cheng
2024-06-14 11:42 ` Rickard X Andersson
2024-06-14 12:28 ` Zhihao Cheng
2024-06-17 11:20 ` Rickard x Andersson
2024-06-17 13:21 ` Zhihao Cheng
2024-06-17 13:48 ` Rickard x Andersson
2024-06-17 13:55 ` Zhihao Cheng
2024-06-04 6:47 ` Richard Weinberger
2024-06-14 11:45 ` Rickard X Andersson