From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mga18.intel.com ([134.134.136.126]) by bombadil.infradead.org with esmtps (Exim 4.90_1 #2 (Red Hat Linux)) id 1fk3Ez-0003qq-Qn for linux-mtd@lists.infradead.org; Mon, 30 Jul 2018 08:09:19 +0000 Subject: Re: UBIFS question: Power-cuts after ubifs_leb_unmap() To: Richard Weinberger Cc: artem.bityutskiy@linux.intel.com, linux-mtd@lists.infradead.org References: <1756392.9dNXh6ho7o@blindfold> <2255423.z5H1mqHeVL@blindfold> <0c744441-3ebc-9d7b-7c74-70d65a85c6fd@intel.com> <3508570.fEnza5EDYF@blindfold> From: Adrian Hunter Message-ID: <086b1694-a182-5873-a4c2-c754d8bf3cda@intel.com> Date: Mon, 30 Jul 2018 11:07:25 +0300 MIME-Version: 1.0 In-Reply-To: <3508570.fEnza5EDYF@blindfold> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit List-Id: Linux MTD discussion mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , On 30/07/18 10:28, Richard Weinberger wrote: > Am Montag, 30. Juli 2018, 08:55:24 CEST schrieb Adrian Hunter: >> On 26/07/18 01:29, Richard Weinberger wrote: >>> Artem, Adrian, >>> >>> Am Dienstag, 24. Juli 2018, 10:01:30 CEST schrieb Artem Bityutskiy: >>>> Hi, I looked at this quickly and talked to Adrian. I cannot solve this >>>> for you but here are some thoughts. >>> >>> Both of your thoughts are much appreciated! >>> >>>> On Mon, 2018-07-09 at 12:11 +0200, Richard Weinberger wrote: >>>>> Artem, Adrian, >>>>> >>>>> While playing with a new UBI/UBIFS test framework I managed to hit >>>>> this error, >>>>> with lprops self-checks enabled: >>>>> >>>>> [ 2412.268964] UBIFS error (ubi0:0 pid 708): scan_check_cb: bad >>>>> accounting of >>>>> LEB 11: free 0, dirty 118072 flags 0x1, should be free 126976, dirty >>>>> 0 >>>> >>>> So it has 126976 - 118072 = 8904 of used space. >>>> >>> >>> [...] >>> >>>> 1. If you believe some specific place like the one you pointed contains >>>> a bug, then emulate a power cut in that place and try to reproduce the >>>> bug. >>>> >>>> 2. Try to focus on recovery replay, this is the main suspect. Try to >>>> figure out what is that used space in that LEB? >>> >>> After spending the third night in a row staring at hexdumps, I think I know >>> what is going on. >>> >>> I see that LEB 11 recently got unmapped, but the LPT still accounts 8904 bytes >>> as used. >>> To double check that value I've analyzed the index tree. To my surprise it was >>> referencing nodes at LEB 11! >>> Summing up the length of these nodes is exactly 8904. So, the LPT is correct. >>> >>> All nodes on LEB 11 that are referenced by the index tree are of type >>> UBIFS_INO_NODE and have UBIFS_XATTR_FL set. >>> So, our problem is xattr related. We forget xattr inodes and the free space >>> accounting goes nuts sooner or later. >>> >>> Inspecting the journal gave more details. Only buds of head type BASEHD and >>> DATAHD are present. No GCHD. Therefore GC was not interrupted. >>> With the IO transaction log from before the power-cut I was able to figure >>> where the orphaned xattr inodes belong to. >>> And indeed, the host inode of said xattr is in the journal for deletion. >>> >>> In theory during replay, removal of the host inode should also remove all >>> xattr inodes and we are good. >>> So the question is, why fails UBIFS to remove the xattr inodes even if we have >>> the host inode in the journal? >>> >>> Another round of digging though the index uncovered the problem. The >>> UBIFS_XENT_NODE nodes for all orphaned xattr inodes sat on LEB 11. >>> This means upon replay, when UBIFS tries to locate all xattr entries >>> of the host inode, it faces the dangling branch case and the loop in >>> ubifs_tnc_remove_ino() terminates with -ENOENT. The host inode will be >>> removed, but the xattr inodes stay. >>> >>> Therefore the root of the problem is that UBIFS places only the host inode >>> into the journal upon removal of a file with xattrs and assumes that upon >>> replay we can discover all xattr inodes. That will fail if due to GC the LEB >>> got already unmapped. >>> >>> As intermediate solution I suggest adding also xattr inodes to the journal >>> upon unlink. This will make unlink more expensive but for now it is IMHO ok >>> until a more sophisticated solution is found. >>> Files usually have only a few xattr, so the cost is not that bad. >>> >>> What do you guys think? Does my analysis make sense to you? >> >> It doesn't seem like there is a way to avoid adding more information to the >> journal upon deletion, but you probably need to cater for the possibility >> that there is a power cut while writing that information e.g. it wouldn't be >> good to remove half the xattrs but leave the rest > > With grouped nodes a power-cut is okay, the bigger problem is the number of xattrs. > We support up to 2^16. Do delete an inode via journal we need UBIFS_INO_NODE_SZ bytes > in the journal. > So, in worst case we need to write 2^16 times UBIFS_INO_NODE_SZ bytes to the journal. > I'm still looking into this option right now, just to be sure that this is really the > solution to the problem I see. Finding a stable reproducer is also not easy... > > Another possibility is playing with ubifs_tnc_remove_ino(), upon unlink() it removes > all entries from the TNC and marks them as dirty. > The problem is that also all xattr inodes are marked as dirty. > If we manage to find a way that these get marked only as dirty when we are sure that > they are gone the GC will no longer unmap a "used" LEB. > But that all is very tricky. What about: add the inode to be deleted to orphans, then delete the xattrs one at a time, then when they are all gone, delete the inode and remove it from orphans.