From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mdfmta010.mxout.tch.inty.net ([91.221.169.51] helo=smtp.demon.co.uk)
	by bombadil.infradead.org with esmtps (Exim 4.80.1 #2 (Red Hat Linux))
	id 1XgEuU-0003Je-8B
	for linux-mtd@lists.infradead.org; Mon, 20 Oct 2014 15:30:15 +0000
Date: Mon, 20 Oct 2014 16:25:36 +0100 (BST)
From: Steve B
To: Artem Bityutskiy
Subject: Re: UBI/UBFS: ubi_eba_read_leb() reporting unmapped LEB
In-Reply-To: <1413818380.7906.352.camel@sauron.fi.intel.com>
References: <1413812127.7906.282.camel@sauron.fi.intel.com>
 <1413813912.7906.305.camel@sauron.fi.intel.com>
 <1413818380.7906.352.camel@sauron.fi.intel.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: linux-mtd@lists.infradead.org
List-Id: Linux MTD discussion mailing list

Thank you very much for your suggestions Artem, gives me plenty to look into :)

On Mon, 20 Oct 2014, Artem Bityutskiy wrote:

> On Mon, 2014-10-20 at 15:51 +0100, Steve B wrote:
>> The log on the target is very minimal as I was only capturing error
>> logs, so don't have much info to share on that, but here is the log from
>> the target:
>>
>> [ 13.178297] UBIFS error (pid 1273): ubifs_read_node: bad node type (255 but expected 1)
>> [ 13.178481] UBIFS error (pid 1273): ubifs_read_node: bad node at LEB 39:0, LEB mapping status 0
>> [ 13.178668] Not a node, first 24 bytes:
>> [ 13.178755] 00000000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ........................
>> [ 13.179093] [<80012b7c>] (unwind_backtrace+0x0/0x11c) from [<80131138>] (ubifs_read_node+0x25c/0x2a0)
>> [ 13.179311] [<80131138>] (ubifs_read_node+0x25c/0x2a0) from [<8014d2ac>] (ubifs_tnc_read_node+0x60/0x198)
>> [ 13.185906] [<8014d2ac>] (ubifs_tnc_read_node+0x60/0x198) from [<801343e0>] (ubifs_tnc_locate+0xa8/0x180)
>> [ 13.195449] [<801343e0>] (ubifs_tnc_locate+0xa8/0x180) from [<80126e1c>] (do_readpage+0x1bc/0x448)
>> [ 13.204388] [<80126e1c>] (do_readpage+0x1bc/0x448) from [<801284f8>] (ubifs_readpage+0x3e8/0x428)
>> [ 13.213246] [<801284f8>] (ubifs_readpage+0x3e8/0x428) from [<8008e6e4>] (generic_file_aio_read+0x328/0x6f4)
>> [ 13.222973] [<8008e6e4>] (generic_file_aio_read+0x328/0x6f4) from [<800b37c4>] (do_sync_read+0x98/0xd4)
>> [ 13.232342] [<800b37c4>] (do_sync_read+0x98/0xd4) from [<800b3e0c>] (vfs_read+0xa4/0x134)
>> [ 13.240501] [<800b3e0c>] (vfs_read+0xa4/0x134) from [<800b4190>] (sys_read+0x34/0x68)
>> [ 13.248316] [<800b4190>] (sys_read+0x34/0x68) from [<8000e280>] (ret_fast_syscall+0x0/0x30)
>> [ 13.256639] UBIFS error (pid 1273): do_readpage: cannot read page 991 of inode 91, error -22
>> [ 13.265085] UBIFS error (pid 1273): ubifs_read_node: bad node type (255 but expected 1)
>> [ 13.273046] UBIFS error (pid 1273): ubifs_read_node: bad node at LEB 39:0, LEB mapping status 0
>>
>> I'm currently trying to create a test to more reliably reproduce the issue;
>> when I do, I can share those logs.
>>
>> Fastmap doesn't exist in the kernel version we are running (3.4.0).
>>
>> The first test to show up this issue simply copies a file in a tight loop and
>> power is pulled randomly on the board. I am now trying to refine the test to speed
>> up the failure by randomly pulling the power in the NAND driver after a random
>> number of erase/write calls, but I haven't been able to reproduce the issue
>> with this.
>
> Being able to reproduce this would make bug hunting a relatively easy
> task.
>
> You could also try to establish separate UBI power cut testing.
> Something like writing various patterns directly to the UBI volume,
> having a power cut, and making sure that what you wrote before the power
> cut is there and is exactly what you wrote. UBI writes synchronously, so
> once the write returns, the data must be on the flash. And you should
> write in min. I/O unit sizes (one NAND page at a time).
>
> I guess you can have a mirror file (if you have a second medium), or a
> fixed algorithm for how you select the LEB and offset to write. And you
> generate random power cuts, and then after boot-up you verify that the
> media contains valid data - everything for which the write operation
> was finished should be there.
>
> With this, you'd exclude the whole layer of UBIFS complexity. And if you
> are lucky, and the bug is in UBI, it will be much easier to catch and to
> fix.
>
> With a mirror file it is probably easy. You just modify the mirror
> file, record the last operation, sync the mirror file and the file
> where you recorded the operation, then you enable the random power cut,
> and start the operation on the UBI volume. The point is to not have a
> power cut before the mirror file has been changed.
>
> Artem.
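
For reference, the "fixed algorithm" variant Artem describes could be sketched roughly as below. This is only an illustration under stated assumptions, not something posted in the thread: the function names (location, pattern, write_op, verify) are hypothetical, the page and LEB sizes are assumed values for a 2 KiB-page NAND, and the volume is simulated with an ordinary file-like object. How the writes actually reach a real UBI volume, how the power cut is injected, and how the number of completed operations is recorded durably on a second medium are all left out.

```python
import hashlib
import io

MIN_IO = 2048                      # assumed min. I/O unit (one NAND page)
LEB_SIZE = 126976                  # assumed LEB size (62 pages of 2048 bytes)
PAGES_PER_LEB = LEB_SIZE // MIN_IO


def location(seq):
    """Map operation number seq to (leb, offset); each page is written once."""
    leb, page = divmod(seq, PAGES_PER_LEB)
    return leb, page * MIN_IO


def pattern(seq):
    """Deterministic MIN_IO-sized payload derived only from seq."""
    out = bytearray()
    counter = 0
    while len(out) < MIN_IO:
        out += hashlib.sha256(b"%d:%d" % (seq, counter)).digest()
        counter += 1
    return bytes(out[:MIN_IO])


def write_op(vol, seq):
    """Write one min. I/O unit for operation seq to the (simulated) volume."""
    leb, off = location(seq)
    vol.seek(leb * LEB_SIZE + off)
    vol.write(pattern(seq))


def verify(vol, completed):
    """Return the operation numbers below `completed` whose data is wrong.

    `completed` is the count of writes known to have returned before the
    power cut; every one of them is expected to be intact.
    """
    bad = []
    for seq in range(completed):
        leb, off = location(seq)
        vol.seek(leb * LEB_SIZE + off)
        if vol.read(MIN_IO) != pattern(seq):
            bad.append(seq)
    return bad


if __name__ == "__main__":
    # Stand-in for the volume: two erased (all 0xff) LEBs held in memory.
    vol = io.BytesIO(b"\xff" * (2 * LEB_SIZE))
    for seq in range(70):
        write_op(vol, seq)
    print("bad operations:", verify(vol, 70))
```

Because both the location and the payload depend only on the operation number, the check after boot-up needs nothing but the count of writes that had returned. A write that was still in flight when power was lost is not covered by verify() and may legitimately be missing or partial.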