linux-mtd.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* UBIFS corruption after software upgrade
@ 2012-07-24 21:02 Russell Zuck
  2012-08-01 12:40 ` Russell Zuck
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Russell Zuck @ 2012-07-24 21:02 UTC (permalink / raw)
  To: linux-mtd

[-- Attachment #1: Type: text/plain, Size: 3991 bytes --]

I am attempting to troubleshoot an issue where the root partition 
(ubi0_0) of my embedded platform becomes corrupted after a few upgrades 
of our system software.

Upgrade Process:
1. upgrade daemon checks for upgrades, from a file server, periodically.
2. If available, the upgrade daemon retrieves the upgrade package (just 
a tarball) and unpacks it.  The tarball contains several install 
scripts, a version file, and another tarball that contains all of our 
system software, required config files, and lib dependencies.
3. The upgrade daemon is triggered to deploy the upgrade package.
4. Deploy
	a. run pre-install script
	b. run install script. This just unpacks the system software tarball 
onto the existing root filesystem.
	c. run the post-install script
	d. cleanup the upgrade staging directory
	e. reboot

Upgrade Process Variations:
1. Terminate all system software before deploying the upgrade package
2. sync after unpacking the system software package
3. sync again before rebooting

I have attached a verbose log of the console output during the upgrades. 
  After the 2nd system reboot, UBIFS is throwing invalid inode errors.

Please see the section that follows for system details.  I have not yet 
attempted to reproduce this issue using nandsim nor do I have a simple 
reproducer for the problem yet.  Please provide some ideas about how to 
simulate the reboot using nandsim as I think the reboot portion of this 
scenario will be critical to reproducing the issue.

System Information
++++++++++++++++++
Kernel Base: 2.6.35

UBIFS Patches: most recent patches applied from 
git://git.infradead.org/~dedekind/ubifs-v2.6.35.git 3 days ago.

MTD Tests: all test run without error with the exception of 
mtd_torturetest which I didn't run

UBIFS extra self-checks: I mounted the debugfs but don't know how to 
make use of of the self-checks.

UBIFS debug prints: I enabled UBI and UBIFS verbose debugging

root filesystem: ubi0_0, the mtd3 partition, is mounted with -o sync

mtdinfo:
root@ctc-imx51 ~$ mtdinfo -a
Count of MTD devices:           4
Present MTD devices:            mtd0, mtd1, mtd2, mtd3
Sysfs interface supported:      yes

mtd0
Name:                           nor_flash
Type:                           dataflash
Eraseblock size:                512 bytes
Amount of eraseblocks:          8192 (4194304 bytes, 4.0 MiB)
Minimum input/output unit size: 512 bytes
Sub-page size:                  512 bytes
Character device major/minor:   90:0
Bad blocks are allowed:         false
Device is writable:             true

mtd1
Name:                           nand.bootloader
Type:                           nand
Eraseblock size:                131072 bytes, 128.0 KiB
Amount of eraseblocks:          8 (1048576 bytes, 1024.0 KiB)
Minimum input/output unit size: 2048 bytes
Sub-page size:                  512 bytes
OOB size:                       64 bytes
Character device major/minor:   90:2
Bad blocks are allowed:         true
Device is writable:             true

mtd2
Name:                           nand.kernel
Type:                           nand
Eraseblock size:                131072 bytes, 128.0 KiB
Amount of eraseblocks:          24 (3145728 bytes, 3.0 MiB)
Minimum input/output unit size: 2048 bytes
Sub-page size:                  512 bytes
OOB size:                       64 bytes
Character device major/minor:   90:4
Bad blocks are allowed:         true
Device is writable:             true

mtd3
Name:                           nand.rootfs
Type:                           nand
Eraseblock size:                131072 bytes, 128.0 KiB
Amount of eraseblocks:          4064 (532676608 bytes, 508.0 MiB)
Minimum input/output unit size: 2048 bytes
Sub-page size:                  512 bytes
OOB size:                       64 bytes
Character device major/minor:   90:6
Bad blocks are allowed:         true
Device is writable:             true

Any assistance you can provide with this issue will be greatly appreciated.

Best regards,

Russell Zuck

[-- Attachment #2: toggle_upgrade_post_patch_verbose.log.tar.gz --]
[-- Type: application/gzip, Size: 147730 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: UBIFS corruption after software upgrade
  2012-07-24 21:02 UBIFS corruption after software upgrade Russell Zuck
@ 2012-08-01 12:40 ` Russell Zuck
  2012-08-02  2:10   ` Subodh Nijsure
                     ` (2 more replies)
  2012-08-22 13:02 ` Artem Bityutskiy
  2012-08-31 11:30 ` Artem Bityutskiy
  2 siblings, 3 replies; 10+ messages in thread
From: Russell Zuck @ 2012-08-01 12:40 UTC (permalink / raw)
  To: linux-mtd

On 7/24/2012 5:02 PM, Russell Zuck wrote:
> I have not yet attempted to reproduce this issue using nandsim nor do I
> have a simple reproducer for the problem yet.  Please provide some ideas
> about how to simulate the reboot using nandsim as I think the reboot
> portion of this scenario will be critical to reproducing the issue.

Can anyone direct me to some information that might help me simulate a 
reboot using nandsim to attempt to reproduce my UBIFS corruption issue?

Additionally, I would really appreciate any suggestions, tips, tricks, 
criticisms of the upgrade process I outlined previously, or "Have you 
considered this" type questions.  I've read everything I could find that 
even remotely pertains to my scenario but nothing is clicking for me 
yet.  I'm desperately trying to resolve my issue.  Any guidance would be 
welcome.

Best regards,

Russ

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: UBIFS corruption after software upgrade
  2012-08-01 12:40 ` Russell Zuck
@ 2012-08-02  2:10   ` Subodh Nijsure
  2012-08-22 13:18   ` Artem Bityutskiy
  2012-08-22 13:21   ` Artem Bityutskiy
  2 siblings, 0 replies; 10+ messages in thread
From: Subodh Nijsure @ 2012-08-02  2:10 UTC (permalink / raw)
  To: linux-mtd

On 08/01/2012 05:40 AM, Russell Zuck wrote:
> On 7/24/2012 5:02 PM, Russell Zuck wrote:
>> I have not yet attempted to reproduce this issue using nandsim nor do I
>> have a simple reproducer for the problem yet.  Please provide some ideas
>> about how to simulate the reboot using nandsim as I think the reboot
>> portion of this scenario will be critical to reproducing the issue.
>
> Can anyone direct me to some information that might help me simulate a 
> reboot using nandsim to attempt to reproduce my UBIFS corruption issue?
>
> Additionally, I would really appreciate any suggestions, tips, tricks, 
> criticisms of the upgrade process I outlined previously, or "Have you 
> considered this" type questions.  I've read everything I could find 
> that even remotely pertains to my scenario but nothing is clicking for 
> me yet.  I'm desperately trying to resolve my issue.  Any guidance 
> would be welcome.

Some of the things I have done/considered to minimize issues  with UBIFS 
getting corrupted.

. Partition your UBIFS such that your run-time software ('os') is on one 
UBIFS and that is always mounted read-only.
. Create independent UBIFS (data) partition where your applications 
read/write data. If you can't mount this data partition at startup, have 
tools around in first partition that can recreate this 'data' partition 
and may be recreate contents of 'data' partition with default values. 
Might consider mounting this data partition with -o sync option.
. Use stuff like inotify to verify that your applications are not using 
flash as scratch storage to store intermediate data, that could be more 
easily assembled in tmpfs and then copied to flash.


-Subodh

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: UBIFS corruption after software upgrade
  2012-07-24 21:02 UBIFS corruption after software upgrade Russell Zuck
  2012-08-01 12:40 ` Russell Zuck
@ 2012-08-22 13:02 ` Artem Bityutskiy
  2012-08-31 11:30 ` Artem Bityutskiy
  2 siblings, 0 replies; 10+ messages in thread
From: Artem Bityutskiy @ 2012-08-22 13:02 UTC (permalink / raw)
  To: Russell Zuck; +Cc: linux-mtd

[-- Attachment #1: Type: text/plain, Size: 1508 bytes --]

On Tue, 2012-07-24 at 17:02 -0400, Russell Zuck wrote:
> I am attempting to troubleshoot an issue where the root partition 
> (ubi0_0) of my embedded platform becomes corrupted after a few upgrades 
> of our system software.
> 
> Upgrade Process:
> 1. upgrade daemon checks for upgrades, from a file server, periodically.
> 2. If available, the upgrade daemon retrieves the upgrade package (just 
> a tarball) and unpacks it.  The tarball contains several install 
> scripts, a version file, and another tarball that contains all of our 
> system software, required config files, and lib dependencies.
> 3. The upgrade daemon is triggered to deploy the upgrade package.
> 4. Deploy
> 	a. run pre-install script
> 	b. run install script. This just unpacks the system software tarball 
> onto the existing root filesystem.
> 	c. run the post-install script
> 	d. cleanup the upgrade staging directory
> 	e. reboot
> 
> Upgrade Process Variations:
> 1. Terminate all system software before deploying the upgrade package
> 2. sync after unpacking the system software package
> 3. sync again before rebooting

Well, I do not see anything which could corrupt the file-system from the
description.

> UBIFS extra self-checks: I mounted the debugfs but don't know how to 
> make use of of the self-checks.

You just need to enable them by writing 1 to the corresponding files.
See http://www.linux-mtd.infradead.org/faq/ubifs.html#L_how_debug

-- 
Best Regards,
Artem Bityutskiy

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: UBIFS corruption after software upgrade
  2012-08-01 12:40 ` Russell Zuck
  2012-08-02  2:10   ` Subodh Nijsure
@ 2012-08-22 13:18   ` Artem Bityutskiy
  2012-08-24 17:55     ` Russell Zuck
  2012-08-22 13:21   ` Artem Bityutskiy
  2 siblings, 1 reply; 10+ messages in thread
From: Artem Bityutskiy @ 2012-08-22 13:18 UTC (permalink / raw)
  To: Russell Zuck; +Cc: linux-mtd

[-- Attachment #1: Type: text/plain, Size: 1806 bytes --]

On Wed, 2012-08-01 at 08:40 -0400, Russell Zuck wrote:
> On 7/24/2012 5:02 PM, Russell Zuck wrote:
> > I have not yet attempted to reproduce this issue using nandsim nor do I
> > have a simple reproducer for the problem yet.  Please provide some ideas
> > about how to simulate the reboot using nandsim as I think the reboot
> > portion of this scenario will be critical to reproducing the issue.
> 
> Can anyone direct me to some information that might help me simulate a 
> reboot using nandsim to attempt to reproduce my UBIFS corruption issue?
> 
> Additionally, I would really appreciate any suggestions, tips, tricks, 
> criticisms of the upgrade process I outlined previously, or "Have you 
> considered this" type questions.  I've read everything I could find that 
> even remotely pertains to my scenario but nothing is clicking for me 
> yet.  I'm desperately trying to resolve my issue.  Any guidance would be 
> welcome.

Do you have a kernel which worked for you? One strategy would be to
bisect or look at the differences.

WRT power-cut testing. Could you try your kernel + nandsim. Just do the
following:

$ modprobe nandsim first_id_byte=0x20 second_id_byte=0xac third_id_byte=0x00 fourth_id_byte=0x15
$ modprobe ubi mtd=0
$ ubimkvol /dev/ubi0 --lebs 4064 -N test_fs
$ mount -t ubifs /dev/ubi0_0 /mnt/ubifs

You should end up with something close to your system.

Please, then try to do:

$ echo 1 > /sys/kernel/debug/ubifs/tst_recovery
$ cd $HOME/git/mtd-utils/tests/fs-tests/integrity
$ ./integck -p /mnt/ubifs

This should start recovery testing. Does it work?

The "integck" test is part of the mtd-utils repository.

But from your description it looks like something related to your
system, not UBIFS.

-- 
Best Regards,
Artem Bityutskiy

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: UBIFS corruption after software upgrade
  2012-08-01 12:40 ` Russell Zuck
  2012-08-02  2:10   ` Subodh Nijsure
  2012-08-22 13:18   ` Artem Bityutskiy
@ 2012-08-22 13:21   ` Artem Bityutskiy
  2 siblings, 0 replies; 10+ messages in thread
From: Artem Bityutskiy @ 2012-08-22 13:21 UTC (permalink / raw)
  To: Russell Zuck; +Cc: linux-mtd

[-- Attachment #1: Type: text/plain, Size: 1402 bytes --]

On Wed, 2012-08-01 at 08:40 -0400, Russell Zuck wrote:
> On 7/24/2012 5:02 PM, Russell Zuck wrote:
> > I have not yet attempted to reproduce this issue using nandsim nor do I
> > have a simple reproducer for the problem yet.  Please provide some ideas
> > about how to simulate the reboot using nandsim as I think the reboot
> > portion of this scenario will be critical to reproducing the issue.
> 
> Can anyone direct me to some information that might help me simulate a 
> reboot using nandsim to attempt to reproduce my UBIFS corruption issue?
> 
> Additionally, I would really appreciate any suggestions, tips, tricks, 
> criticisms of the upgrade process I outlined previously, or "Have you 
> considered this" type questions.  I've read everything I could find that 
> even remotely pertains to my scenario but nothing is clicking for me 
> yet.  I'm desperately trying to resolve my issue.  Any guidance would be 
> welcome.

I'd first forget about your update scheme and just stress-tested UBIFS.
Mount it, try testing it using the integck test - does it work? E.g.,
run 'integck -n0' and let it run for hours or overnight. If that is ok,
try to power-cut the device while integck is working. Does it mount
after reboot? Can you remove the integck working files and re-run it?
Does it work? Try to repeat power cut several times.

-- 
Best Regards,
Artem Bityutskiy

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: UBIFS corruption after software upgrade
  2012-08-22 13:18   ` Artem Bityutskiy
@ 2012-08-24 17:55     ` Russell Zuck
  2012-08-27 16:17       ` Russell Zuck
  2012-08-31 11:21       ` Artem Bityutskiy
  0 siblings, 2 replies; 10+ messages in thread
From: Russell Zuck @ 2012-08-24 17:55 UTC (permalink / raw)
  To: dedekind1; +Cc: linux-mtd

On 8/22/2012 9:18 AM, Artem Bityutskiy wrote:
> Do you have a kernel which worked for you? One strategy would be to
> bisect or look at the differences.
I don't have a kernel configuration for which my upgrade scenario has 
not eventually produced files system errors.

> WRT power-cut testing. Could you try your kernel + nandsim. Just do the
> following:
>
> $ modprobe nandsim first_id_byte=0x20 second_id_byte=0xac third_id_byte=0x00 fourth_id_byte=0x15
> $ modprobe ubi mtd=0
> $ ubimkvol /dev/ubi0 --lebs 4064 -N test_fs
> $ mount -t ubifs /dev/ubi0_0 /mnt/ubifs
>
> You should end up with something close to your system.
>
> Please, then try to do:
>
> $ echo 1 > /sys/kernel/debug/ubifs/tst_recovery
> $ cd $HOME/git/mtd-utils/tests/fs-tests/integrity
> $ ./integck -p /mnt/ubifs
>
> This should start recovery testing. Does it work?
I've made sure to apply the latest patches from the 2.6.35 backport repo 
and am using the latest from the mtd-utils repo.

$ ./integck -p /mnt/ubifs

..fails soon after I run the test.  There is a ton of logged data.  I've 
attempted to snip out the first error.

=============== snip =====================
[  375.323163] UBIFS warning (pid 2103): power_cut_emulated: failing 
after 2877 calls
[  375.331523] UBIFS: un-mount UBI device 1, volume 0
[  375.336416] UBIFS DBG (pid 2070): ubifs_bg_thread: background thread 
"ubifs_bgt1_0" stops
[  375.352677] UBIFS: mounted UBI device 1, volume 0, name "test_fs"
[  375.358919] UBIFS: mounted read-only
[  375.362523] UBIFS: file system size:   520740864 bytes (508536 KiB, 
496 MiB, 4036 LEBs)
[  375.370590] UBIFS: journal size:       26062848 bytes (25452 KiB, 24 
MiB, 202 LEBs)
[  375.378299] UBIFS: media format:       w4/r0 (latest is w4/r0)
[  375.384174] UBIFS: default compressor: lzo
[  375.388293] UBIFS: reserved for root:  4952683 bytes (4836 KiB)
[  375.394257] UBIFS DBG (pid 2103): mount_ubifs: compiled on: 
Aug 23 2012 at 11:30:24
[  375.402652] UBIFS DBG (pid 2103): mount_ubifs: min. I/O unit size: 
2048 bytes
[  375.409925] UBIFS DBG (pid 2103): mount_ubifs: max. write size: 
2048 bytes
[  375.417202] UBIFS DBG (pid 2103): mount_ubifs: LEB size: 
129024 bytes (126 KiB)
[  375.425521] UBIFS DBG (pid 2103): mount_ubifs: data journal heads:  1
[  375.431999] UBIFS DBG (pid 2103): mount_ubifs: UUID: 
83D4BFF8-5B39-40B1-9A86-430C0E45852A
[  375.441539] UBIFS DBG (pid 2103): mount_ubifs: big_lpt              0
[  375.448027] UBIFS DBG (pid 2103): mount_ubifs: log LEBs:            9 
(3 - 11)
[  375.455303] UBIFS DBG (pid 2103): mount_ubifs: LPT area LEBs:       2 
(12 - 13)
[  375.462652] UBIFS DBG (pid 2103): mount_ubifs: orphan area LEBs:    2 
(14 - 15)
[  375.470014] UBIFS DBG (pid 2103): mount_ubifs: main area LEBs: 
4036 (16 - 4051)
[  375.477811] UBIFS DBG (pid 2103): mount_ubifs: index LEBs:          1
[  375.484299] UBIFS DBG (pid 2103): mount_ubifs: total index bytes: 
48 (0 KiB, 0 MiB)
[  375.492169] UBIFS DBG (pid 2103): mount_ubifs: key hash type:       0
[  375.498655] UBIFS DBG (pid 2103): mount_ubifs: tree fanout:         8
[  375.505141] UBIFS DBG (pid 2103): mount_ubifs: reserved GC LEB:     18
[  375.511701] UBIFS DBG (pid 2103): mount_ubifs: first main LEB:      16
[  375.518273] UBIFS DBG (pid 2103): mount_ubifs: max. znode size      240
[  375.524938] UBIFS DBG (pid 2103): mount_ubifs: max. index node size 192
[  375.531587] UBIFS DBG (pid 2103): mount_ubifs: node sizes: 
data 48, inode 160, dentry 56
[  375.540518] UBIFS DBG (pid 2103): mount_ubifs: node sizes: 
trun 56, sb 4096, master 512
[  375.549363] UBIFS DBG (pid 2103): mount_ubifs: node sizes: 
ref 64, cmt. start 32, orph 32
[  375.558382] UBIFS DBG (pid 2103): mount_ubifs: max. node sizes: 
data 4144, inode 4256 dentry 312, idx 188
[  375.568358] UBIFS DBG (pid 2103): mount_ubifs: dead watermark:      2048
[  375.575106] UBIFS DBG (pid 2103): mount_ubifs: dark watermark:      6144
[  375.581840] UBIFS DBG (pid 2103): mount_ubifs: LEB overhead:        560
[  375.588502] UBIFS DBG (pid 2103): mount_ubifs: max. dark space: 
24797184 (24216 KiB, 23 MiB)
[  375.597347] UBIFS DBG (pid 2103): mount_ubifs: maximum bud bytes: 
24901632 (24318 KiB, 23 MiB)
[  375.606193] UBIFS DBG (pid 2103): mount_ubifs: BG commit bud bytes: 
20232576 (19758 KiB, 19 MiB)
[  375.615036] UBIFS DBG (pid 2103): mount_ubifs: current bud bytes    0 
(0 KiB, 0 MiB)
[  375.622820] UBIFS DBG (pid 2103): mount_ubifs: max. seq. number:    11
[  375.629394] UBIFS DBG (pid 2103): mount_ubifs: commit number:       1
[  375.644554] UBIFS DBG (pid 2107): ubifs_bg_thread: background thread 
"ubifs_bgt1_0" started, PID 2107
[  375.663774] UBIFS DBG (pid 2107): ubifs_bg_thread: background thread 
"ubifs_bgt1_0" stops
[  375.672118] UBIFS warning (pid 2103): power_cut_emulated: failing in 
master LEB 2
[  375.679752] UBIFS warning (pid 2103): power_cut_emulated: ========== 
Power cut emulated ==========
[  375.688780] Backtrace:
[  375.691294] [<8002b4ec>] (dump_backtrace+0x0/0x10c) from [<8034a738>] 
(dump_stack+0x18/0x1c)
[  375.699811]  r7:00000001 r6:8972a000 r5:896d6000 r4:00000001
[  375.705600] [<8034a720>] (dump_stack+0x0/0x1c) from [<8017dae0>] 
(power_cut_emulated+0x488/0x508)
[  375.714548] [<8017d658>] (power_cut_emulated+0x0/0x508) from 
[<8017dce8>] (dbg_leb_write+0x3c/0x158)
[  375.723822] [<8017dcac>] (dbg_leb_write+0x0/0x158) from [<8015970c>] 
(ubifs_leb_write+0xa0/0x120)
[  375.732769] [<8015966c>] (ubifs_leb_write+0x0/0x120) from 
[<80159928>] (ubifs_write_node+0x19c/0x1f8)
[  375.742071]  r8:00000200 r7:00003000 r6:896c0800 r5:00000002 r4:8972a000
[  375.748929] [<8015978c>] (ubifs_write_node+0x0/0x1f8) from 
[<80160580>] (ubifs_write_master+0x11c/0x12c)
[  375.758477] [<80160464>] (ubifs_write_master+0x0/0x12c) from 
[<80153b5c>] (ubifs_remount_fs+0x68c/0x81c)
[  375.768015]  r7:00000003 r6:000001b0 r5:8972a7ac r4:8972a000
[  375.773782] [<801534d0>] (ubifs_remount_fs+0x0/0x81c) from 
[<800b7d44>] (do_remount_sb+0xc4/0x120)
[  375.782799] [<800b7c80>] (do_remount_sb+0x0/0x120) from [<800cff18>] 
(do_mount+0x224/0x7dc)
[  375.791217]  r6:00200021 r5:00000000 r4:00000060
[  375.795925] [<800cfcf4>] (do_mount+0x0/0x7dc) from [<800d055c>] 
(sys_mount+0x8c/0xcc)
[  375.803816] [<800d04d0>] (sys_mount+0x0/0xcc) from [<80027fc0>] 
(ret_fast_syscall+0x0/0x30)
[  375.812208]  r7:00000015 r6:00000001 r5:00022140 r4:00023048
[  375.817971] UBIFS warning (pid 2103): cut_data: filled bytes 52-1290 
with 0xFFs
[  375.825422] UBIFS error (pid 2103): ubifs_leb_write: writing 2048 
bytes to LEB 2:12288 failed, error -30
[  375.834969] UBIFS warning (pid 2103): ubifs_ro_mode: switched to 
read-only mode, error -30
[  375.843285] Backtrace:
[  375.845773] [<8002b4ec>] (dump_backtrace+0x0/0x10c) from [<8034a738>] 
(dump_stack+0x18/0x1c)
[  375.854267]  r7:00003000 r6:00000002 r5:8972a000 r4:00000004
[  375.860020] [<8034a720>] (dump_stack+0x0/0x1c) from [<801587e4>] 
(ubifs_ro_mode+0x6c/0x78)
[  375.868349] [<80158778>] (ubifs_ro_mode+0x0/0x78) from [<80159750>] 
(ubifs_leb_write+0xe4/0x120)
[  375.877190]  r5:8972a000 r4:ffffffe2
[  375.880822] [<8015966c>] (ubifs_leb_write+0x0/0x120) from 
[<80159928>] (ubifs_write_node+0x19c/0x1f8)
[  375.890098]  r8:00000200 r7:00003000 r6:896c0800 r5:00000002 r4:8972a000
[  375.896925] [<8015978c>] (ubifs_write_node+0x0/0x1f8) from 
[<80160580>] (ubifs_write_master+0x11c/0x12c)
[  375.906470] [<80160464>] (ubifs_write_master+0x0/0x12c) from 
[<80153b5c>] (ubifs_remount_fs+0x68c/0x81c)
[  375.916007]  r7:00000003 r6:000001b0 r5:8972a7ac r4:8972a000
[  375.921759] [<801534d0>] (ubifs_remount_fs+0x0/0x81c) from 
[<800b7d44>] (do_remount_sb+0xc4/0x120)
[  375.930782] [<800b7c80>] (do_remount_sb+0x0/0x120) from [<800cff18>] 
(do_mount+0x224/0x7dc)
[  375.939186]  r6:00200021 r5:00000000 r4:00000060
[  375.943891] [<800cfcf4>] (do_mount+0x0/0x7dc) from [<800d055c>] 
(sys_mount+0x8c/0xcc)
[  375.951767] [<800d04d0>] (sys_mount+0x0/0xcc) from [<80027fc0>] 
(ret_fast_syscall+0x0/0x30)
[  375.960170]  r7:00000015 r6:00000001 r5:00022140 r4:00023048
[  375.965924] Backtrace:
[  375.968409] [<8002b4ec>] (dump_backtrace+0x0/0x10c) from [<8034a738>] 
(dump_stack+0x18/0x1c)
[  375.976900]  r7:00003000 r6:00000002 r5:8972a000 r4:ffffffe2
[  375.982654] [<8034a720>] (dump_stack+0x0/0x1c) from [<80159754>] 
(ubifs_leb_write+0xe8/0x120)
[  375.991242] [<8015966c>] (ubifs_leb_write+0x0/0x120) from 
[<80159928>] (ubifs_write_node+0x19c/0x1f8)
[  376.000518]  r8:00000200 r7:00003000 r6:896c0800 r5:00000002 r4:8972a000
[  376.007343] [<8015978c>] (ubifs_write_node+0x0/0x1f8) from 
[<80160580>] (ubifs_write_master+0x11c/0x12c)
[  376.016888] [<80160464>] (ubifs_write_master+0x0/0x12c) from 
[<80153b5c>] (ubifs_remount_fs+0x68c/0x81c)
[  376.026425]  r7:00000003 r6:000001b0 r5:8972a7ac r4:8972a000
[  376.032175] [<801534d0>] (ubifs_remount_fs+0x0/0x81c) from 
[<800b7d44>] (do_remount_sb+0xc4/0x120)
[  376.041196] [<800b7c80>] (do_remount_sb+0x0/0x120) from [<800cff18>] 
(do_mount+0x224/0x7dc)
[  376.049600]  r6:00200021 r5:00000000 r4:00000060
[  376.054305] [<800cfcf4>] (do_mount+0x0/0x7dc) from [<800d055c>] 
(sys_mount+0x8c/0xcc)
[  376.062180] [<800d04d0>] (sys_mount+0x0/0xcc) from [<80027fc0>] 
(ret_fast_syscall+0x0/0x30)
[  376.070584]  r7:00000015 r6:00000001 r5:00022140 r4:00023048
[  376.076585] UBIFS: cannot re-mount R/W due to prior errors
[  376.082748] UBIFS: un-mount UBI device 1, volume 0
[  376.095799] UBIFS error (pid 2103): ubifs_scan: bad node
[  376.105554] UBIFS: recovered master node from LEB 1
============= end snip ========================

Does anything in this output suggest a direction to start digging?  I 
would really appreciate any direction you could provide.

> But from your description it looks like something related to your
> system, not UBIFS.
In spite of this, I appreciate your efforts to assist me.

Best regards,

Russell Zuck

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: UBIFS corruption after software upgrade
  2012-08-24 17:55     ` Russell Zuck
@ 2012-08-27 16:17       ` Russell Zuck
  2012-08-31 11:21       ` Artem Bityutskiy
  1 sibling, 0 replies; 10+ messages in thread
From: Russell Zuck @ 2012-08-27 16:17 UTC (permalink / raw)
  To: dedekind1; +Cc: linux-mtd

I'm resending this response because I haven't seen it in the mailing 
list archives.  I just changed my mailing list email address so there 
might have been a glitch due to that change.  My apologies if this is a 
duplicate reply.

On 8/22/2012 9:18 AM, Artem Bityutskiy wrote:
> Do you have a kernel which worked for you? One strategy would be to
> bisect or look at the differences.
I don't have a kernel configuration for which my upgrade scenario has
not eventually produced files system errors.

> WRT power-cut testing. Could you try your kernel + nandsim. Just do the
> following:
>
> $ modprobe nandsim first_id_byte=0x20 second_id_byte=0xac third_id_byte=0x00 fourth_id_byte=0x15
> $ modprobe ubi mtd=0
> $ ubimkvol /dev/ubi0 --lebs 4064 -N test_fs
> $ mount -t ubifs /dev/ubi0_0 /mnt/ubifs
>
> You should end up with something close to your system.
>
> Please, then try to do:
>
> $ echo 1 > /sys/kernel/debug/ubifs/tst_recovery
> $ cd $HOME/git/mtd-utils/tests/fs-tests/integrity
> $ ./integck -p /mnt/ubifs
>
> This should start recovery testing. Does it work?
I've made sure to apply the latest patches from the 2.6.35 backport repo
and am using the latest from the mtd-utils repo.

$ ./integck -p /mnt/ubifs

..fails soon after I run the test.  There is a ton of logged data.  I've
attempted to snip out the first error.

=============== snip =====================
[  375.323163] UBIFS warning (pid 2103): power_cut_emulated: failing
after 2877 calls
[  375.331523] UBIFS: un-mount UBI device 1, volume 0
[  375.336416] UBIFS DBG (pid 2070): ubifs_bg_thread: background thread
"ubifs_bgt1_0" stops
[  375.352677] UBIFS: mounted UBI device 1, volume 0, name "test_fs"
[  375.358919] UBIFS: mounted read-only
[  375.362523] UBIFS: file system size:   520740864 bytes (508536 KiB,
496 MiB, 4036 LEBs)
[  375.370590] UBIFS: journal size:       26062848 bytes (25452 KiB, 24
MiB, 202 LEBs)
[  375.378299] UBIFS: media format:       w4/r0 (latest is w4/r0)
[  375.384174] UBIFS: default compressor: lzo
[  375.388293] UBIFS: reserved for root:  4952683 bytes (4836 KiB)
[  375.394257] UBIFS DBG (pid 2103): mount_ubifs: compiled on:
Aug 23 2012 at 11:30:24
[  375.402652] UBIFS DBG (pid 2103): mount_ubifs: min. I/O unit size:
2048 bytes
[  375.409925] UBIFS DBG (pid 2103): mount_ubifs: max. write size:
2048 bytes
[  375.417202] UBIFS DBG (pid 2103): mount_ubifs: LEB size:
129024 bytes (126 KiB)
[  375.425521] UBIFS DBG (pid 2103): mount_ubifs: data journal heads:  1
[  375.431999] UBIFS DBG (pid 2103): mount_ubifs: UUID:
83D4BFF8-5B39-40B1-9A86-430C0E45852A
[  375.441539] UBIFS DBG (pid 2103): mount_ubifs: big_lpt              0
[  375.448027] UBIFS DBG (pid 2103): mount_ubifs: log LEBs:            9
(3 - 11)
[  375.455303] UBIFS DBG (pid 2103): mount_ubifs: LPT area LEBs:       2
(12 - 13)
[  375.462652] UBIFS DBG (pid 2103): mount_ubifs: orphan area LEBs:    2
(14 - 15)
[  375.470014] UBIFS DBG (pid 2103): mount_ubifs: main area LEBs:
4036 (16 - 4051)
[  375.477811] UBIFS DBG (pid 2103): mount_ubifs: index LEBs:          1
[  375.484299] UBIFS DBG (pid 2103): mount_ubifs: total index bytes:
48 (0 KiB, 0 MiB)
[  375.492169] UBIFS DBG (pid 2103): mount_ubifs: key hash type:       0
[  375.498655] UBIFS DBG (pid 2103): mount_ubifs: tree fanout:         8
[  375.505141] UBIFS DBG (pid 2103): mount_ubifs: reserved GC LEB:     18
[  375.511701] UBIFS DBG (pid 2103): mount_ubifs: first main LEB:      16
[  375.518273] UBIFS DBG (pid 2103): mount_ubifs: max. znode size      240
[  375.524938] UBIFS DBG (pid 2103): mount_ubifs: max. index node size 192
[  375.531587] UBIFS DBG (pid 2103): mount_ubifs: node sizes:
data 48, inode 160, dentry 56
[  375.540518] UBIFS DBG (pid 2103): mount_ubifs: node sizes:
trun 56, sb 4096, master 512
[  375.549363] UBIFS DBG (pid 2103): mount_ubifs: node sizes:
ref 64, cmt. start 32, orph 32
[  375.558382] UBIFS DBG (pid 2103): mount_ubifs: max. node sizes:
data 4144, inode 4256 dentry 312, idx 188
[  375.568358] UBIFS DBG (pid 2103): mount_ubifs: dead watermark:      2048
[  375.575106] UBIFS DBG (pid 2103): mount_ubifs: dark watermark:      6144
[  375.581840] UBIFS DBG (pid 2103): mount_ubifs: LEB overhead:        560
[  375.588502] UBIFS DBG (pid 2103): mount_ubifs: max. dark space:
24797184 (24216 KiB, 23 MiB)
[  375.597347] UBIFS DBG (pid 2103): mount_ubifs: maximum bud bytes:
24901632 (24318 KiB, 23 MiB)
[  375.606193] UBIFS DBG (pid 2103): mount_ubifs: BG commit bud bytes:
20232576 (19758 KiB, 19 MiB)
[  375.615036] UBIFS DBG (pid 2103): mount_ubifs: current bud bytes    0
(0 KiB, 0 MiB)
[  375.622820] UBIFS DBG (pid 2103): mount_ubifs: max. seq. number:    11
[  375.629394] UBIFS DBG (pid 2103): mount_ubifs: commit number:       1
[  375.644554] UBIFS DBG (pid 2107): ubifs_bg_thread: background thread
"ubifs_bgt1_0" started, PID 2107
[  375.663774] UBIFS DBG (pid 2107): ubifs_bg_thread: background thread
"ubifs_bgt1_0" stops
[  375.672118] UBIFS warning (pid 2103): power_cut_emulated: failing in
master LEB 2
[  375.679752] UBIFS warning (pid 2103): power_cut_emulated: ==========
Power cut emulated ==========
[  375.688780] Backtrace:
[  375.691294] [<8002b4ec>] (dump_backtrace+0x0/0x10c) from [<8034a738>]
(dump_stack+0x18/0x1c)
[  375.699811]  r7:00000001 r6:8972a000 r5:896d6000 r4:00000001
[  375.705600] [<8034a720>] (dump_stack+0x0/0x1c) from [<8017dae0>]
(power_cut_emulated+0x488/0x508)
[  375.714548] [<8017d658>] (power_cut_emulated+0x0/0x508) from
[<8017dce8>] (dbg_leb_write+0x3c/0x158)
[  375.723822] [<8017dcac>] (dbg_leb_write+0x0/0x158) from [<8015970c>]
(ubifs_leb_write+0xa0/0x120)
[  375.732769] [<8015966c>] (ubifs_leb_write+0x0/0x120) from
[<80159928>] (ubifs_write_node+0x19c/0x1f8)
[  375.742071]  r8:00000200 r7:00003000 r6:896c0800 r5:00000002 r4:8972a000
[  375.748929] [<8015978c>] (ubifs_write_node+0x0/0x1f8) from
[<80160580>] (ubifs_write_master+0x11c/0x12c)
[  375.758477] [<80160464>] (ubifs_write_master+0x0/0x12c) from
[<80153b5c>] (ubifs_remount_fs+0x68c/0x81c)
[  375.768015]  r7:00000003 r6:000001b0 r5:8972a7ac r4:8972a000
[  375.773782] [<801534d0>] (ubifs_remount_fs+0x0/0x81c) from
[<800b7d44>] (do_remount_sb+0xc4/0x120)
[  375.782799] [<800b7c80>] (do_remount_sb+0x0/0x120) from [<800cff18>]
(do_mount+0x224/0x7dc)
[  375.791217]  r6:00200021 r5:00000000 r4:00000060
[  375.795925] [<800cfcf4>] (do_mount+0x0/0x7dc) from [<800d055c>]
(sys_mount+0x8c/0xcc)
[  375.803816] [<800d04d0>] (sys_mount+0x0/0xcc) from [<80027fc0>]
(ret_fast_syscall+0x0/0x30)
[  375.812208]  r7:00000015 r6:00000001 r5:00022140 r4:00023048
[  375.817971] UBIFS warning (pid 2103): cut_data: filled bytes 52-1290
with 0xFFs
[  375.825422] UBIFS error (pid 2103): ubifs_leb_write: writing 2048
bytes to LEB 2:12288 failed, error -30
[  375.834969] UBIFS warning (pid 2103): ubifs_ro_mode: switched to
read-only mode, error -30
[  375.843285] Backtrace:
[  375.845773] [<8002b4ec>] (dump_backtrace+0x0/0x10c) from [<8034a738>]
(dump_stack+0x18/0x1c)
[  375.854267]  r7:00003000 r6:00000002 r5:8972a000 r4:00000004
[  375.860020] [<8034a720>] (dump_stack+0x0/0x1c) from [<801587e4>]
(ubifs_ro_mode+0x6c/0x78)
[  375.868349] [<80158778>] (ubifs_ro_mode+0x0/0x78) from [<80159750>]
(ubifs_leb_write+0xe4/0x120)
[  375.877190]  r5:8972a000 r4:ffffffe2
[  375.880822] [<8015966c>] (ubifs_leb_write+0x0/0x120) from
[<80159928>] (ubifs_write_node+0x19c/0x1f8)
[  375.890098]  r8:00000200 r7:00003000 r6:896c0800 r5:00000002 r4:8972a000
[  375.896925] [<8015978c>] (ubifs_write_node+0x0/0x1f8) from
[<80160580>] (ubifs_write_master+0x11c/0x12c)
[  375.906470] [<80160464>] (ubifs_write_master+0x0/0x12c) from
[<80153b5c>] (ubifs_remount_fs+0x68c/0x81c)
[  375.916007]  r7:00000003 r6:000001b0 r5:8972a7ac r4:8972a000
[  375.921759] [<801534d0>] (ubifs_remount_fs+0x0/0x81c) from
[<800b7d44>] (do_remount_sb+0xc4/0x120)
[  375.930782] [<800b7c80>] (do_remount_sb+0x0/0x120) from [<800cff18>]
(do_mount+0x224/0x7dc)
[  375.939186]  r6:00200021 r5:00000000 r4:00000060
[  375.943891] [<800cfcf4>] (do_mount+0x0/0x7dc) from [<800d055c>]
(sys_mount+0x8c/0xcc)
[  375.951767] [<800d04d0>] (sys_mount+0x0/0xcc) from [<80027fc0>]
(ret_fast_syscall+0x0/0x30)
[  375.960170]  r7:00000015 r6:00000001 r5:00022140 r4:00023048
[  375.965924] Backtrace:
[  375.968409] [<8002b4ec>] (dump_backtrace+0x0/0x10c) from [<8034a738>]
(dump_stack+0x18/0x1c)
[  375.976900]  r7:00003000 r6:00000002 r5:8972a000 r4:ffffffe2
[  375.982654] [<8034a720>] (dump_stack+0x0/0x1c) from [<80159754>]
(ubifs_leb_write+0xe8/0x120)
[  375.991242] [<8015966c>] (ubifs_leb_write+0x0/0x120) from
[<80159928>] (ubifs_write_node+0x19c/0x1f8)
[  376.000518]  r8:00000200 r7:00003000 r6:896c0800 r5:00000002 r4:8972a000
[  376.007343] [<8015978c>] (ubifs_write_node+0x0/0x1f8) from
[<80160580>] (ubifs_write_master+0x11c/0x12c)
[  376.016888] [<80160464>] (ubifs_write_master+0x0/0x12c) from
[<80153b5c>] (ubifs_remount_fs+0x68c/0x81c)
[  376.026425]  r7:00000003 r6:000001b0 r5:8972a7ac r4:8972a000
[  376.032175] [<801534d0>] (ubifs_remount_fs+0x0/0x81c) from
[<800b7d44>] (do_remount_sb+0xc4/0x120)
[  376.041196] [<800b7c80>] (do_remount_sb+0x0/0x120) from [<800cff18>]
(do_mount+0x224/0x7dc)
[  376.049600]  r6:00200021 r5:00000000 r4:00000060
[  376.054305] [<800cfcf4>] (do_mount+0x0/0x7dc) from [<800d055c>]
(sys_mount+0x8c/0xcc)
[  376.062180] [<800d04d0>] (sys_mount+0x0/0xcc) from [<80027fc0>]
(ret_fast_syscall+0x0/0x30)
[  376.070584]  r7:00000015 r6:00000001 r5:00022140 r4:00023048
[  376.076585] UBIFS: cannot re-mount R/W due to prior errors
[  376.082748] UBIFS: un-mount UBI device 1, volume 0
[  376.095799] UBIFS error (pid 2103): ubifs_scan: bad node
[  376.105554] UBIFS: recovered master node from LEB 1
============= end snip ========================

Does anything in this output suggest a direction to start digging?  I
would really appreciate any direction you could provide.

> But from your description it looks like something related to your
> system, not UBIFS.
In spite of this, I appreciate your efforts to assist me.

Best regards,

Russell Zuck

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: UBIFS corruption after software upgrade
  2012-08-24 17:55     ` Russell Zuck
  2012-08-27 16:17       ` Russell Zuck
@ 2012-08-31 11:21       ` Artem Bityutskiy
  1 sibling, 0 replies; 10+ messages in thread
From: Artem Bityutskiy @ 2012-08-31 11:21 UTC (permalink / raw)
  To: Russell Zuck; +Cc: linux-mtd

[-- Attachment #1: Type: text/plain, Size: 2276 bytes --]

> $ ./integck -p /mnt/ubifs
> 
> ..fails soon after I run the test.  There is a ton of logged data.  I've 
> attempted to snip out the first error.

How exactly?

Sure there will be a lot of nise in dmesg because we emulate power cuts
and switch to R/O mode at random moments, so a lot of UBIFS operastions
fail and print a lot of error messages and dumps. But that's fine as
long as the system works.

The idea is that UBIFS switches to R/O mode randomly and integck will
fail with EROFS at some point. Then it will unmount the FS and mount it
back, and verify some files, then remove everything and continue, and so
on forever. This way we emulate huge amount of power cuts very quickly.

If integck fails, then the last messages of dmesg are interesting.

Last time I tested UBIFS this way, the only issues which I did not
resolve were that it was unable to mount because there was no space.
That was very rare and I had to run it for the night. I just did not
have time to dig this - must be some accounting issues.

> [  376.076585] UBIFS: cannot re-mount R/W due to prior errors
> [  376.082748] UBIFS: un-mount UBI device 1, volume 0
> [  376.095799] UBIFS error (pid 2103): ubifs_scan: bad node
> [  376.105554] UBIFS: recovered master node from LEB 1
> ============= end snip ========================
> 
> Does anything in this output suggest a direction to start digging?  I 
> would really appreciate any direction you could provide.

Hmm. I've just set up ubifs-v2.6.35 on a kvm virtual machine and started
testing like this:

$ cat ~/tmp/nandsim.sh 
#!/bin/sh

umount /mnt/ubifs
rmmod ubifs ubi mtdram nandsim

dmesg -c > /dev/null
modprobe nandsim first_id_byte=0xec second_id_byte=0xd3 third_id_byte=0x51 fourth_id_byte=0x95
modprobe ubi mtd=0
modprobe mtdchar
ubimkvol /dev/ubi0 --lebs 4064 -N xyZ
mount -t ubifs /dev/ubi0_0 /mnt/ubifs
echo 1 > /sys/kernel/debug/ubifs/tst_recovery
#echo 'format "UBIFS DBG" +pf' > /sys/kernel/debug/dynamic_debug/control

$ sudo ~/tmp/nandsim.sh
# sudo ./integck /mnt/ubifs/

Let's see. I will try to look at failures, but so far it works fine. But
I do not promise anything - I have limited amount of time for this hobby
stuff.

-- 
Best Regards,
Artem Bityutskiy

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: UBIFS corruption after software upgrade
  2012-07-24 21:02 UBIFS corruption after software upgrade Russell Zuck
  2012-08-01 12:40 ` Russell Zuck
  2012-08-22 13:02 ` Artem Bityutskiy
@ 2012-08-31 11:30 ` Artem Bityutskiy
  2 siblings, 0 replies; 10+ messages in thread
From: Artem Bityutskiy @ 2012-08-31 11:30 UTC (permalink / raw)
  To: Russell Zuck; +Cc: linux-mtd

[-- Attachment #1: Type: text/plain, Size: 786 bytes --]

On Tue, 2012-07-24 at 17:02 -0400, Russell Zuck wrote:
> I am attempting to troubleshoot an issue where the root partition 
> (ubi0_0) of my embedded platform becomes corrupted after a few upgrades 
> of our system software.

Just took a look at your dmesg log again, it really does not look like
UBIFS issue. Something has changed so that UBIFS reads all 0xFFs instead
of nodes - may be driver returns bogus data.

If you have a working kernel then you can bisect and find the commit
which broke your system.

E.g., if you know 2.6.32 worked, but 2.6.35 does not, you can check if
2.6.33 works or not. If yes, check 2.6.34. When you find the first
non-working release, you may start using 'git bisect' to find the quilty
commit.

-- 
Best Regards,
Artem Bityutskiy

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2012-08-31 11:25 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-07-24 21:02 UBIFS corruption after software upgrade Russell Zuck
2012-08-01 12:40 ` Russell Zuck
2012-08-02  2:10   ` Subodh Nijsure
2012-08-22 13:18   ` Artem Bityutskiy
2012-08-24 17:55     ` Russell Zuck
2012-08-27 16:17       ` Russell Zuck
2012-08-31 11:21       ` Artem Bityutskiy
2012-08-22 13:21   ` Artem Bityutskiy
2012-08-22 13:02 ` Artem Bityutskiy
2012-08-31 11:30 ` Artem Bityutskiy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).