Message-ID: <54D33C36.9060805@huawei.com>
Date: Thu, 5 Feb 2015 17:47:34 +0800
From: hujianyang
To: Artem Bityutskiy
Cc: Richard Weinberger, linux-mtd, Sheng Yong
Subject: [RFC] UBIFS recovery

UBIFS currently lacks a recovery method: once a UBIFS partition refuses to mount, all data on that partition may be lost. The current recovery mechanism in UBIFS can handle master node corruption and clean up after a power cut, but that is not enough. UBIFS on flash may suffer many kinds of data corruption, the most common being ECC errors.

I've searched the mailing list archive and found that a recovery method was requested once before (sorry, I can't find the link). Artem suggested introducing a new repair mount option instead of writing a new userspace repair tool, but it seems no further effort has been made since.

There are two ways to do UBIFS recovery. One is to repair the UBIFS image in userspace via the UBI interfaces. The other is to repair the corrupted data during mount, either by default or via a special mount option.

A userspace tool is the most effective way to repair a partition. With the file system offline, it has enough time and resources to scan the whole target and clean up the corrupted data. But it is hard to write. Many kernel structures and functions would need to be copied into the utility, the current ubi-utils focus mostly on UBI devices rather than UBIFS, and every subsequent change to the file system would have to take the userspace tool into account. It's too complicated.
The other way is to extend the existing recovery methods in recovery.c. Adding a new recovery method this way is easy, and a change of a few lines can improve reliability in some areas. But it's hard to keep a global view of, and control over, these recovery features, since they are scattered along the mount path. Importing many recovery methods would also make it hard to add new features later.

I can't say which way is better; it depends on what we expect from UBIFS. I'm actually working on a userspace tool, ubidump, which can now print the on-flash format of a specified LEB, and features like file-system repair could be considered for it. At the same time, I'm working on extending the UBIFS recovery methods in the kernel. One example is to clean up all the logs if an error occurs while replaying buds, reverting the file system to the last commit state instead of failing the mount.

Regardless of how we end up fixing a corrupt partition, the first thing to do is to add a method that tries to mount the file system read-only instead of failing outright. That gives users a chance to copy their valid data out of the corrupt image.

Thanks!
Hu

Buds replay patch for Linux 3.10 stable:

diff --git a/fs/ubifs/replay.c b/fs/ubifs/replay.c
index 3187925..e2208a2 100644
--- a/fs/ubifs/replay.c
+++ b/fs/ubifs/replay.c
@@ -706,14 +706,35 @@ static int replay_buds(struct ubifs_info *c)
 
 	list_for_each_entry(b, &c->replay_buds, list) {
 		err = replay_bud(c, b);
-		if (err)
-			return err;
+		if (err) {
+			ubifs_err("error %d during buds replay, try to revert\n",
+				  err);
+			goto revert;
+		}
 
 		ubifs_assert(b->sqnum > prev_sqnum);
 		prev_sqnum = b->sqnum;
 	}
 
 	return 0;
+
+revert:
+	prev_sqnum = 0;
+
+	list_for_each_entry(b, &c->replay_buds, list) {
+		/*
+		 * Revert to last commit state, update lprops by setting
+		 * the state of space used by buds to dirty.
+		 */
+		b->free = c->leb_size % c->min_io_size;
+		b->dirty = c->leb_size - b->bud->start - b->free;
+
+		ubifs_assert(b->sqnum > prev_sqnum);
+		prev_sqnum = b->sqnum;
+	}
+	ubifs_warn("revert to last commit state with data lost\n");
+
+	return 1;
 }
 
 /**
@@ -1036,13 +1057,15 @@ int ubifs_replay_journal(struct ubifs_info *c)
 		lnum = ubifs_next_log_lnum(c, lnum);
 	} while (lnum != c->ltail_lnum);
 
-	err = replay_buds(c);
-	if (err)
-		goto out;
-
-	err = apply_replay_list(c);
-	if (err)
-		goto out;
+	/*
+	 * If an error occurs during buds replay, try to revert filesystem
+	 * to last commit state. Should not apply corrupt replay list.
+	 */
+	if (!replay_buds(c)) {
+		err = apply_replay_list(c);
+		if (err)
+			goto out;
+	}
 
 	err = set_buds_lprops(c);
 	if (err)
-- 
1.6.0.2