linux-mtd.lists.infradead.org archive mirror
From: Scott Branden <sbranden@broadcom.com>
To: Richard Weinberger <richard@nod.at>,
	Richard Weinberger <richard.weinberger@gmail.com>
Cc: "linux-mtd@lists.infradead.org" <linux-mtd@lists.infradead.org>
Subject: Re: suspect UBIFS async operations causing issues during reboot
Date: Wed, 5 Nov 2014 14:52:27 -0800
Message-ID: <545AAA2B.8090007@broadcom.com>
In-Reply-To: <545A6AA0.8050901@nod.at>

On 14-11-05 10:21 AM, Richard Weinberger wrote:
> Hi!
>
> On 05.11.2014 at 18:56, Scott Branden wrote:
>> Hi Richard,
>>
>> Thanks for the feedback.  Comments inline.
>>
>> On 14-11-05 01:22 AM, Richard Weinberger wrote:
>>> On Wed, Nov 5, 2014 at 9:32 AM, Scott Branden <sbranden@broadcom.com> wrote:
>>>> We are doing reboot testing with UBIFS on the 3.10 kernel with a new chipset
>>>> we are working on.
>>>>
>>>> Over thousands of reboots we eventually find that the NAND reports
>>>> uncorrectable ECC errors on a random page when it is mounted.
>>>>
>>>> We have found that the problem is a NAND erase operation still in progress
>>>> when the reboot occurs. Because the reboot interrupts the erase, the page
>>>> reads back as mostly FF with some random bits left unerased.
>>>>
>>>> We suspect the problem is the asynchronous nature of the UBIFS operations.
>>>> Perhaps it is the small write buffer, which can take 3-5 seconds to be
>>>> written, or some other operation occurring in UBI/UBIFS? I don't think the
>>>> shutdown of the filesystem is dealing with all the threads properly.
>>>
>>> And what about powercuts?
>> Powercuts would exhibit exactly the behaviour we are observing: the erase is interrupted by the loss of power, so the NAND block being erased is left in a partially erased
>> state. But powercuts have little to do with the reboot sequence I am describing.
>>
>>> UBI/UBIFS was designed to survive powercuts.
>> Agreed, this does not stop UBIFS from surviving the powercut. It does, however, leave blocks that were not erased properly.
>
> Makes sense.
>
>> The block whose erase did not finish is reported as uncorrectable on the next boot:
>>
>> [    1.330000] UBI: attaching mtd7 to ubi0
>> [    2.000000] iproc_nand 18046000.nand: uncorrectable error at 0x18700000
>>
>> The issue is that these blocks shouldn't be corrupted in the first place if UBI/UBIFS shuts down properly.
>>
>>> If your NAND shows strange issues even after a clean reboot, something nasty is
>>> going on. Does your driver pass all UBI/MTD tests?
>>>
>> We are in the process of running the MTD tests. But this appears to have nothing to do with whether the driver is buggy. The NAND driver does what it is told: if it is told
>> to erase a block, it erases the block. It cannot control whether the system reboots in the middle of that operation.
>>
>> This appears to be a UBI/UBIFS issue. UBI/UBIFS operations are still going on after the filesystem is unmounted. The shutdown process completes and a reboot happens. My guess is that
>> these operations come from the UBI/UBIFS asynchronous threads not being handled properly during the shutdown process.
>>
>> I have found that other people have reported unexplained flash corruption. We backported the following commit to the 3.10 kernel, which solved most of our flash corruption issues:
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/super.c?id=807612db2f9940b9fa6deaef054eb16d51bd3e00
>>
>> The only remaining flash corruption issue is the one described above: a reboot happening in the middle of an erase cycle.
>
> You can verify your hypothesis easily. Add a printk() to ubi_detach_mtd_dev(). This function shuts down UBI and also the background thread which does
> all erase work.
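To make the point about the background thread concrete: a detach-style shutdown that asks the erase worker to stop and then waits for it to exit guarantees that no erase is still in flight when shutdown returns. The sketch below is a userspace analogue under stated assumptions, not UBI code. The `struct erase_worker` type and the `worker_*`/`sleep_ms` functions are hypothetical, `pthread_join()` stands in for the kernel's `kthread_stop()` (which likewise waits for the thread to exit), and the 1 ms sleep stands in for a block erase.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical userspace model of a background erase thread (not UBI code). */
struct erase_worker {
	pthread_t thread;
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool stop;     /* set by worker_shutdown(); protected by lock */
	int pending;   /* queued erase jobs; protected by lock */
	int erasing;   /* 1 while an erase is in flight; protected by lock */
	int completed; /* finished erases; protected by lock */
};

static void sleep_ms(long ms)
{
	struct timespec ts = { .tv_sec = ms / 1000, .tv_nsec = (ms % 1000) * 1000000L };
	nanosleep(&ts, NULL);
}

static void *worker_fn(void *arg)
{
	struct erase_worker *w = arg;

	pthread_mutex_lock(&w->lock);
	for (;;) {
		while (!w->stop && w->pending == 0)
			pthread_cond_wait(&w->cond, &w->lock);
		if (w->stop)
			break; /* a stop request is only honoured between erases */
		w->pending--;
		w->erasing = 1;
		pthread_mutex_unlock(&w->lock);

		sleep_ms(1); /* stand-in for a multi-millisecond block erase */

		pthread_mutex_lock(&w->lock);
		w->erasing = 0;
		w->completed++;
	}
	pthread_mutex_unlock(&w->lock);
	return NULL;
}

static int worker_start(struct erase_worker *w)
{
	pthread_mutex_init(&w->lock, NULL);
	pthread_cond_init(&w->cond, NULL);
	w->stop = false;
	w->pending = 0;
	w->erasing = 0;
	w->completed = 0;
	return pthread_create(&w->thread, NULL, worker_fn, w);
}

static void worker_queue(struct erase_worker *w, int jobs)
{
	pthread_mutex_lock(&w->lock);
	w->pending += jobs;
	pthread_cond_signal(&w->cond);
	pthread_mutex_unlock(&w->lock);
}

/* Detach-style shutdown: request a stop, then wait for the thread to exit. */
static void worker_shutdown(struct erase_worker *w)
{
	pthread_mutex_lock(&w->lock);
	w->stop = true;
	pthread_cond_signal(&w->cond);
	pthread_mutex_unlock(&w->lock);

	/* Returns only after any in-flight erase has finished. */
	pthread_join(w->thread, NULL);
	assert(w->erasing == 0);
	pthread_cond_destroy(&w->cond);
	pthread_mutex_destroy(&w->lock);
}
```

Calling `worker_start()`, `worker_queue(&w, 5)` and then `worker_shutdown(&w)` leaves `erasing` at 0, with every queued job either completed or never started. A job that never started leaves its block untouched. The failure described in this thread corresponds to skipping the shutdown step entirely and rebooting while `erasing` is still 1.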
Hi Richard,

The printk never appears.

As far as I can find, ubi_detach_mtd_dev is only called from ubi_exit. But
ubi_exit only runs when UBI is built as a module and the module is unloaded:

static void __exit ubi_exit(void)
{
	int i;

	for (i = 0; i < UBI_MAX_DEVICES; i++)
		if (ubi_devices[i]) {
			mutex_lock(&ubi_devices_mutex);
			ubi_detach_mtd_dev(ubi_devices[i]->ubi_num, 1);
			mutex_unlock(&ubi_devices_mutex);
		}
	ubi_debugfs_exit();
	kmem_cache_destroy(ubi_wl_entry_slab);
	misc_deregister(&ubi_ctrl_cdev);
	class_remove_file(ubi_class, &ubi_version);
	class_destroy(ubi_class);
}
module_exit(ubi_exit);

>
> Thanks,
> //richard
>
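The partially erased page described at the start of the thread (mostly FF with some random bits left unerased) can be recognised by counting the bits that are still 0 in data that should be all 0xFF. The following is a minimal userspace sketch that assumes the caller already has the raw page bytes. `count_unerased_bits()`, `classify_page()`, the `page_state` values and the `max_flips` threshold are hypothetical names for illustration, not an MTD or UBI API. A page with only a few 0 bits could equally be a fully erased page with a few bitflips, so the threshold only separates "nearly erased" from "holds data".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum page_state { PAGE_CLEAN, PAGE_NEARLY_ERASED, PAGE_HAS_DATA };

/* Count bits that are still 0 in a buffer that should read as all 0xFF. */
static unsigned int count_unerased_bits(const uint8_t *buf, size_t len)
{
	unsigned int zeros = 0;

	assert(buf != NULL || len == 0);
	for (size_t i = 0; i < len; i++) {
		uint8_t b = (uint8_t)~buf[i]; /* set bits = bits not yet erased */
		while (b) {
			b &= (uint8_t)(b - 1); /* clear the lowest set bit */
			zeros++;
		}
	}
	return zeros;
}

/* Classify a page using a hypothetical tolerance of max_flips zero bits. */
static enum page_state classify_page(const uint8_t *buf, size_t len,
				     unsigned int max_flips)
{
	unsigned int zeros = count_unerased_bits(buf, len);

	if (zeros == 0)
		return PAGE_CLEAN;
	if (zeros <= max_flips)
		return PAGE_NEARLY_ERASED;
	return PAGE_HAS_DATA;
}
```

For example, a page of 0xFF bytes in which two bytes read back as 0xFE and 0x7F has two unerased bits, so with `max_flips` set to 4 it is classified as `PAGE_NEARLY_ERASED`.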

Thread overview: 19+ messages
2014-11-05  8:32 suspect UBIFS async operations causing issues during reboot Scott Branden
2014-11-05  9:22 ` Richard Weinberger
2014-11-05 17:56   ` Scott Branden
2014-11-05 18:21     ` Richard Weinberger
2014-11-05 22:52       ` Scott Branden [this message]
2014-11-06 21:56         ` Scott Branden
2014-11-07  8:45           ` Richard Weinberger
2014-11-07 17:31             ` Scott Branden
2014-11-09 10:20               ` Richard Weinberger
2014-11-10  5:10                 ` Scott Branden
2014-11-26  8:17                   ` Brian Norris
2014-11-26  8:30                     ` Richard Weinberger
2014-11-26  9:25                       ` Brian Norris
2014-11-27 19:07                     ` Scott Branden
2014-11-10  8:44                 ` Ricard Wanderlof
2014-11-10  9:08                   ` Richard Weinberger
2014-11-10  7:44         ` Tanya Brokhman
2014-11-12 11:20 ` Artem Bityutskiy
2014-11-15  3:30   ` Scott Branden