suspect UBIFS async operations causing issues during reboot

* suspect UBIFS async operations causing issues during reboot
@ 2014-11-05  8:32 Scott Branden
  2014-11-05  9:22 ` Richard Weinberger
  2014-11-12 11:20 ` Artem Bityutskiy
  0 siblings, 2 replies; 19+ messages in thread
From: Scott Branden @ 2014-11-05  8:32 UTC (permalink / raw)
  To: linux-mtd

We are running reboot tests with UBIFS on the 3.10 kernel on a new
chipset we are developing.

Over thousands of reboots, we eventually see uncorrectable ECC errors
reported on a random NAND page when the filesystem is mounted.

We have found that the problem occurs when a NAND erase operation is
still in progress at the moment of reboot. Because the erase is
interrupted, the page is left mostly 0xFF, with some random bits not
erased.
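To make the symptom concrete, the following C sketch classifies a page buffer the way the paragraph above describes it: an erased page is all 0xFF (every bit 1), and an interrupted erase leaves a page that is mostly 0xFF with some bits still 0. The function names, the `max_flips` tolerance and the 1% cut-off are assumptions chosen for illustration, not values taken from MTD, UBI or the iproc_nand driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification, for illustration only. */
enum page_state { PAGE_ERASED, PAGE_PARTIALLY_ERASED, PAGE_PROGRAMMED };

/* Count bits that are 0 (i.e. not in the erased state) in the buffer. */
static size_t count_zero_bits(const uint8_t *buf, size_t len)
{
    size_t zeros = 0;

    for (size_t i = 0; i < len; i++) {
        uint8_t b = (uint8_t)~buf[i];   /* erased bits are 1, so invert */
        while (b) {                     /* clear lowest set bit each pass */
            b &= (uint8_t)(b - 1);
            zeros++;
        }
    }
    return zeros;
}

/*
 * Assumed heuristic: up to max_flips zero bits is treated as erased
 * (tolerating bit flips); a page that is still "mostly 0xFF" (fewer
 * than 1% zero bits, an arbitrary cut-off) looks like an interrupted
 * erase; anything else is treated as programmed data.
 */
static enum page_state classify_page(const uint8_t *buf, size_t len,
                                     size_t max_flips)
{
    assert(buf != NULL);
    size_t zeros = count_zero_bits(buf, len);

    if (zeros <= max_flips)
        return PAGE_ERASED;
    if (zeros * 100 < len * 8)
        return PAGE_PARTIALLY_ERASED;
    return PAGE_PROGRAMMED;
}
```

For example, a 2048-byte buffer of all 0xFF with `max_flips` = 8 is reported as PAGE_ERASED, while the same buffer with 12 bits cleared is reported as PAGE_PARTIALLY_ERASED. A real driver would also have to account for ordinary bit flips that ECC is expected to correct.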

We suspect the problem is the asynchronous nature of UBIFS operations.
Perhaps it is the small write-buffer, which can take 3-5 seconds to be
written out, or some other operation occurring in UBI/UBIFS? I don't
think the filesystem shutdown is handling all of its threads properly.
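The suspicion above amounts to a missing ordering guarantee: the restart must not happen until background flash work has finished. The sketch below models that guarantee in userspace C with POSIX threads. It is a hypothetical model, not UBI or UBIFS code: `struct erase_queue`, `erase_worker()` and `drain_before_restart()` are invented names, and a 2 ms sleep stands in for a block erase.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical model of a background eraser; not UBI's real code. */
struct erase_queue {
    pthread_mutex_t lock;
    pthread_cond_t  changed;
    int  pending;    /* erases queued but not yet started */
    bool in_flight;  /* an erase is running on the "chip" */
    bool stopping;   /* shutdown requested: finish the queue, then exit */
};

static void erase_queue_init(struct erase_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->changed, NULL);
    q->pending = 0;
    q->in_flight = false;
    q->stopping = false;
}

static void queue_erases(struct erase_queue *q, int n)
{
    assert(n >= 0);
    pthread_mutex_lock(&q->lock);
    q->pending += n;
    pthread_cond_broadcast(&q->changed);
    pthread_mutex_unlock(&q->lock);
}

static void fake_erase(void)
{
    struct timespec ts = { 0, 2 * 1000 * 1000 }; /* 2 ms stand-in for an erase */
    nanosleep(&ts, NULL);
}

static void *erase_worker(void *arg)
{
    struct erase_queue *q = arg;

    pthread_mutex_lock(&q->lock);
    for (;;) {
        while (q->pending == 0 && !q->stopping)
            pthread_cond_wait(&q->changed, &q->lock);
        if (q->pending == 0)        /* stopping and nothing left to do */
            break;
        q->pending--;
        q->in_flight = true;
        pthread_mutex_unlock(&q->lock);
        fake_erase();               /* a reset here would corrupt the block */
        pthread_mutex_lock(&q->lock);
        q->in_flight = false;
    }
    pthread_mutex_unlock(&q->lock);
    return NULL;
}

/* Shutdown path: ask the worker to stop, then wait until it has drained. */
static bool drain_before_restart(struct erase_queue *q, pthread_t worker)
{
    pthread_mutex_lock(&q->lock);
    q->stopping = true;
    pthread_cond_broadcast(&q->changed);
    pthread_mutex_unlock(&q->lock);
    pthread_join(worker, NULL);     /* returns only after the last erase ends */
    return q->pending == 0 && !q->in_flight;
}
```

Build with `-pthread`. The design point is that `drain_before_restart()` returns only after `pthread_join()`, so in this model the caller cannot proceed to restart while an erase is in flight; queued erases are completed rather than dropped.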

Below is a log, with printks added to the iproc_nand driver, showing
erase operations in progress when "Restarting system." happens.

Stopped Setup Virtual Console.
Stopping Apply Kernel Variables...
Stopped Apply Kernel Variables.
Starting Notify Audit System and Update UTMP about System Shutdown...
Stopping Runtime Directory...
Stopping Remount API VFS...
Stopped Remount API VFS.
Stopping Remount Root FS...
Stopped Remount Root FS.
Stopping Collect Read-Ahead Data...
Stopped Collect Read-Ahead Data.
Stopping Media Directory...[   18.370000] systemd[1]: Unit systemd-readahead-collect.service entered failed state.

Started Console System Reboot Logging.
Stopped Runtime Directory.
Stopped Media Directory.
[   18.490000] systemd[1]: Shutting down.
Sending SIGTERM to remaining processes...
Sending SIGKILL to remaining processes...
Unmounting file systems.
[   18.530000] iproc_nand_cmdfunc: cmd 0x60 addr 0x14a40000
[   18.540000] iproc_nand_waitfunc: native cmd 8 intfc status 0xc00000e0
[   18.550000] UBIFS: background thread "ubifs_bgt0_0" stops
Disabling swaps.
Detaching loop devices.
Detaching DM devices.
[   18.560000] iproc_nand_cmdfunc: cmd 0x60 addr 0x18680000
[   18.570000] iproc_nand_waitfunc: native cmd 8 intfc status 0xc00000e0
[   18.580000] Restarting system.
[   18.580000] iproc_nand_cmdfunc: cmd 0x60 addr 0x18700000

<REBOOT happens here with a NAND ERASE command in progress, corrupting
NAND address 0x18700000!> The NAND corruption only occurs when an erase
operation is in progress at the moment the system restarts.
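When hunting for this race over thousands of reboots, it can help to scan the captured console output automatically. The following C sketch counts erase commands logged after "Restarting system." It assumes the exact printk format shown in the log above (these printks are local debug additions, not upstream output); 0x60 is the first cycle of the NAND block-erase command (NAND_CMD_ERASE1 in the Linux raw-NAND header). The function name `erases_after_restart()` is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAND_CMD_ERASE1 0x60  /* same value as the Linux raw-NAND header */

/*
 * Scan console lines in order. Return how many erase commands
 * (cmd 0x60) were logged after "Restarting system.", storing the
 * address of the last one in *last_addr. Assumes the debug printk
 * format "iproc_nand_cmdfunc: cmd 0x.. addr 0x........".
 */
static int erases_after_restart(const char *const *lines, int nlines,
                                unsigned long *last_addr)
{
    int restarting = 0, count = 0;

    assert(lines != NULL || nlines == 0);
    for (int i = 0; i < nlines; i++) {
        const char *p;
        unsigned int cmd;
        unsigned long addr;

        if (strstr(lines[i], "Restarting system."))
            restarting = 1;
        p = strstr(lines[i], "iproc_nand_cmdfunc:");
        if (!restarting || !p)
            continue;
        /* %x and %lx accept the optional "0x" prefix */
        if (sscanf(p, "iproc_nand_cmdfunc: cmd %x addr %lx", &cmd, &addr) == 2 &&
            cmd == NAND_CMD_ERASE1) {
            count++;
            *last_addr = addr;
        }
    }
    return count;
}
```

Fed lines 18.560000 through 18.580000 of the log above, it returns 1 and reports address 0x18700000; the erase at 0x18680000 is not counted because it precedes the restart message.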
