From: Richard Weinberger <richard@nod.at>
To: u-boot@lists.denx.de
Subject: [U-Boot] UBI fixable bit-flip issue
Date: Thu, 12 Jul 2018 10:46:11 +0200
Message-ID: <14079607.ILCUqeWBoJ@blindfold>
In-Reply-To: <6079a07f-b819-ed81-6d2c-58ae38629595@denx.de>
Mark,
On Thursday, 12 July 2018, 07:22:13 CEST, Heiko Schocher wrote:
> Hello Mark,
>
> added Richard Weinberger to cc...
>
> On 12.07.2018 at 02:28, Mark Spieth wrote:
> > Hi
> >
> > In the process of investigating a boot failure on one of our devices, the
> >
> > UBI: fixable bit-flip detected at PEB
> >
> > message was seen with the following behaviour during kernel load in u-boot.
> >
> > Read [2285568] bytes
> > UBI: fixable bit-flip detected at PEB 415
> > UBI: schedule PEB 415 for scrubbing
> > UBI: fixable bit-flip detected at PEB 415
> > UBI: fixable bit-flip detected at PEB 419
> > UBI: schedule PEB 419 for scrubbing
> > UBI: fixable bit-flip detected at PEB 419
> > UBI: fixable bit-flip detected at PEB 420
> > UBI: schedule PEB 420 for scrubbing
> > UBI: fixable bit-flip detected at PEB 420
> > UBI: fixable bit-flip detected at PEB 419
> > UBI: fixable bit-flip detected at PEB 420
> > UBI: fixable bit-flip detected at PEB 419
> > UBI: fixable bit-flip detected at PEB 420
> > UBI: fixable bit-flip detected at PEB 419
> > UBI: fixable bit-flip detected at PEB 420
> > UBI: fixable bit-flip detected at PEB 419
> > UBI: fixable bit-flip detected at PEB 420
> > UBI: fixable bit-flip detected at PEB 419
> > UBI: fixable bit-flip detected at PEB 420
> > UBI: fixable bit-flip detected at PEB 419
> >
> > This repeats until reset.
Do you see the same symptom also on Linux?
We need to be very sure that it is actually a UBI problem.
> > This fix is not a root cause fix though. Investigating further led to the following root cause
> > solution. The following is AFAICT.
> >
> > When the scrubber chooses a PEB to move the data to, it picks it from the free balanced tree. This tree is sorted by EC
> > (erase count) and then by PEB number.
> >
> > The find_wl_entry call uses a max parameter of WL_FREE_MAX_DIFF which is 8192 in this config. So the
> > find_wl_entry function will find a PEB that is better in error count than the current PEB EC. This
error count? You mean erase count?
> > can easily cause it to find the PEB that was just moved from if it is the lowest numbered PEB in the
> > free tree. Waiting for EC to go above 8192 would take a long time and cause premature aging of the
> > flash PEBs in question.
> >
> > The easy solution is to change the max parameter to this call to 0 so it finds a PEB with a smaller
> > EC than the one being replaced. This means it won't use the previously discarded PEB as its first
> > choice.
For scrubbing this might be a good idea, but not for regular wear-leveling.
See comment in UBI:
/*
* When a physical eraseblock is moved, the WL sub-system has to pick the target
* physical eraseblock to move to. The simplest way would be just to pick the
* one with the highest erase counter. But in certain workloads this could lead
* to an unlimited wear of one or few physical eraseblock. Indeed, imagine a
* situation when the picked physical eraseblock is constantly erased after the
* data is written to it. So, we have a constant which limits the highest erase
* counter of the free physical eraseblock to pick. Namely, the WL sub-system
* does not pick eraseblocks with erase counter greater than the lowest erase
* counter plus %WL_FREE_MAX_DIFF.
*/
#define WL_FREE_MAX_DIFF (2*UBI_WL_THRESHOLD)
So we could change the logic such that for regular wear-leveling we keep using WL_FREE_MAX_DIFF,
but for scrubbing (which is 1:1 wear-leveling but the source PEB is showing bit-flips) we use
a lower value. IMHO WL_FREE_MAX_DIFF/2 would be a good choice.
I'm not sure whether 0 is too extreme and might cause other distortions.
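To make the idea concrete, here is a minimal, untested sketch in plain C. It is a simplified model and not the actual wl.c code: the free tree is replaced by an array sorted by (EC, PEB number), and struct peb, pick_free_peb(), pick_target() and WL_SCRUB_MAX_DIFF are hypothetical names used only for illustration. UBI_WL_THRESHOLD is set to 4096 on the assumption that WL_FREE_MAX_DIFF is 8192, as reported for this config.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a wear-leveling entry (hypothetical, not struct ubi_wl_entry). */
struct peb {
	int pnum; /* physical eraseblock number */
	int ec;   /* erase counter */
};

#define UBI_WL_THRESHOLD  4096                    /* assumed from the 8192 quoted above */
#define WL_FREE_MAX_DIFF  (2 * UBI_WL_THRESHOLD)
#define WL_SCRUB_MAX_DIFF (WL_FREE_MAX_DIFF / 2)  /* hypothetical constant for scrubbing */

/*
 * free_pebs must be sorted ascending by (ec, pnum), like the free tree.
 * Returns the last entry whose erase counter is below lowest_ec + diff,
 * or the first (lowest EC) entry if no other entry qualifies.
 */
static const struct peb *pick_free_peb(const struct peb *free_pebs, size_t n, int diff)
{
	assert(n > 0);

	const struct peb *best = &free_pebs[0];
	int max = free_pebs[0].ec + diff;

	for (size_t i = 1; i < n; i++)
		if (free_pebs[i].ec < max)
			best = &free_pebs[i];

	return best;
}

/* Regular wear-leveling keeps WL_FREE_MAX_DIFF, scrubbing uses the lower limit. */
static const struct peb *pick_target(const struct peb *free_pebs, size_t n, int scrubbing)
{
	return pick_free_peb(free_pebs, n,
			     scrubbing ? WL_SCRUB_MAX_DIFF : WL_FREE_MAX_DIFF);
}
```

In this model, with free PEBs having erase counters 10, 5000 and 8000, regular wear-leveling would pick the one with EC 8000 while scrubbing would stay with the one with EC 10, because 5000 is already above 10 + 4096.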
Mark, can you please file a patch and send it to the linux-mtd mailing list?
Such a change needs to go through Linux and then to u-boot.
But first we need to think about and discuss it in detail.
> I am not sure if it is so easy ...
>
> > This fix was implemented and fixable bit-flip errors no longer hang/freeze the boot process! UBI
> > erase and reformat was used between re-tests to get consistent results.
> >
> > Adding the above 75% correctable bitflip threshold is also a good thing as less movement will ensue
> > when the FLASH is new, but as the flash ages, the root cause will once again be invoked causing
> > un-recoverable boot failures.
> >
> > Note this fault is also in the latest kernel drivers for UBI and may also exist in other wear
> > leveling implementations. The kernel driver issue may be at fault for android devices locking
> > up/freezing sporadically during FLASH read when scrubbing due to a relatively full flash and
> > correctable errors causing ping pong PEB moves.
> >
> > The question is, is my root cause solution sound or have I missed something?
>
> I have to think about it before I write nonsense, but maybe Richard has
> deeper insight here.
Please see my comments. :)
Thanks,
//richard