Re: [PATCH 1/1] fs:ubifs:recovery:fixup UBIFS cannot recover master node issue

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Richard Weinberger <richard@nod.at>
To: "Bean Huo 霍斌斌 (beanhuo)" <beanhuo@micron.com>,
	"Artem Bityutskiy" <dedekind1@gmail.com>,
	"Adrian Hunter" <adrian.hunter@intel.com>,
	"Brian Norris" <computersforpeace@gmail.com>
Cc: "linux-mtd@lists.infradead.org" <linux-mtd@lists.infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Boris Brezillon <boris.brezillon@free-electrons.com>
Subject: Re: [PATCH 1/1] fs:ubifs:recovery:fixup UBIFS cannot recover master node issue
Date: Fri, 11 Dec 2015 10:12:24 +0100	[thread overview]
Message-ID: <566A9378.4070900@nod.at> (raw)
In-Reply-To: <A765B125120D1346A63912DDE6D8B6310BF53F90@NTXXIAMBX02.xacn.micron.com>

Bean,

Am 11.12.2015 um 09:26 schrieb Bean Huo 霍斌斌 (beanhuo):
> For MLC NAND, paired page issue is now a common known issue.
> This patch is just for master node cannot be recovered while
> there will two pages be damaged in one single master node block.
> As for this patch, if there are more than one page data in
> master node block being damaged, and as long as exist one
> uncorrupted master node block, master node will be recovered.

So, this patch is part if a larger patch series to make UBIFS MLC aware?

> This patch has been tested on Micron MLC NAND MT29F32G08CBADAWP.
> Issue descripted:
> http://lists.infradead.org/pipermail/linux-mtd/2015-November/063016.html
> 
> Signed-off-by: Bean Huo <beanhuo@micron.com>
> ---
>  fs/ubifs/recovery.c | 75 ++++++++++++++++++++++++++++++++++-------------------
>  1 file changed, 49 insertions(+), 26 deletions(-)
> 
> diff --git a/fs/ubifs/recovery.c b/fs/ubifs/recovery.c
> index 695fc71..e3154e6 100644
> --- a/fs/ubifs/recovery.c
> +++ b/fs/ubifs/recovery.c
> @@ -128,7 +128,7 @@ static int get_master_node(const struct ubifs_info *c, int lnum, void **pbuf,
>  	while (offs + UBIFS_MST_NODE_SZ <= c->leb_size) {
>  		struct ubifs_ch *ch = buf;
>  
> -		if (le32_to_cpu(ch->magic) != UBIFS_NODE_MAGIC)
> +		if (le32_to_cpu(ch->magic) == 0xFFFFFFFF)

The purpose of this check was to trigger upon garbage data (including 0xFF).
Now you only check for 0xFF bytes.

>  			break;
>  		offs += sz;
>  		buf  += sz;
> @@ -137,37 +137,40 @@ static int get_master_node(const struct ubifs_info *c, int lnum, void **pbuf,
>  	/* See if there was a valid master node before that */
>  	if (offs) {
>  		int ret;
> -
> +retry:
>  		offs -= sz;
>  		buf  -= sz;
>  		len  += sz;
>  		ret = ubifs_scan_a_node(c, buf, len, lnum, offs, 1);
> -		if (ret != SCANNED_A_NODE && offs) {
> -			/* Could have been corruption so check one place back */
> -			offs -= sz;
> -			buf  -= sz;
> -			len  += sz;
> -			ret = ubifs_scan_a_node(c, buf, len, lnum, offs, 1);
> -			if (ret != SCANNED_A_NODE)
> -				/*
> -				 * We accept only one area of corruption because
> -				 * we are assuming that it was caused while
> -				 * trying to write a master node.
> -				 */
> -				goto out_err;
> -		}
> -		if (ret == SCANNED_A_NODE) {
> -			struct ubifs_ch *ch = buf;
> -
> -			if (ch->node_type != UBIFS_MST_NODE)
> +		if (ret != SCANNED_A_NODE) {
> +			/* Could have been corruption so check more
> +			 * place back. We accept two areas of corruption
> +			 * because we are assuming that for MLC NAND,it
> +			 * was caused while trying to write a lower
> +			 * page, upper page being damaged.
> +			 */
> +			if (offs > 0)
> +				goto retry;
> +			else
>  				goto out_err;
> +			}
> +			if (ret == SCANNED_A_NODE) {
> +				struct ubifs_ch *ch = buf;
> +
> +				if (ch->node_type != UBIFS_MST_NODE) {
> +					if (offs)
> +						goto retry;
> +					else
> +						goto out_err;
> +				}

Here you kill another sanity check.

>  			dbg_rcvry("found a master node at %d:%d", lnum, offs);
>  			*mst = buf;
>  			offs += sz;
>  			buf  += sz;
>  			len  -= sz;
> -		}
> +			}
>  	}
> +

Useless line break. :)

>  	/* Check for corruption */
>  	if (offs < c->leb_size) {
>  		if (!is_empty(buf, min_t(int, len, sz))) {
> @@ -178,10 +181,6 @@ static int get_master_node(const struct ubifs_info *c, int lnum, void **pbuf,
>  		buf  += sz;
>  		len  -= sz;
>  	}
> -	/* Check remaining empty space */
> -	if (offs < c->leb_size)
> -		if (!is_empty(buf, len))
> -			goto out_err;

Another check gone. :(

>  	*pbuf = sbuf;
>  	return 0;
>  
> @@ -236,7 +235,7 @@ out:
>  int ubifs_recover_master_node(struct ubifs_info *c)
>  {
>  	void *buf1 = NULL, *buf2 = NULL, *cor1 = NULL, *cor2 = NULL;
> -	struct ubifs_mst_node *mst1 = NULL, *mst2 = NULL, *mst;
> +	struct ubifs_mst_node *mst1 = NULL, *mst2 = NULL, *mst = NULL;
>  	const int sz = c->mst_node_alsz;
>  	int err, offs1, offs2;
>  
> @@ -280,6 +279,28 @@ int ubifs_recover_master_node(struct ubifs_info *c)
>  				if (cor1)
>  					goto out_err;
>  				mst = mst1;
> +			} else if (offs2 + sz != offs1) {
> +				if (le32_to_cpu(mst1->ch.sqnum) >
> +					le32_to_cpu(mst2->ch.sqnum)) {
> +					/*
> +					 * 1st LEB written, occurred power
> +					 * loss while writing 2nd LEB.
> +					 */
> +					if (cor1)
> +						goto out_err;
> +					mst = mst1;
> +				} else if (le32_to_cpu(mst1->ch.sqnum) <
> +					le32_to_cpu(mst2->ch.sqnum)) {
> +				/* While writing 1st LEB, occurred power loss */
> +					if (!cor2) {
> +						if (mst2->flags &
> +						   cpu_to_le32(UBIFS_MST_DIRTY))
> +							mst = mst2;
> +						else
> +							goto out_err;
> +					} else
> +					goto out_err;
> +				}
>  			} else
>  				goto out_err;
>  		} else {
> @@ -305,6 +326,8 @@ int ubifs_recover_master_node(struct ubifs_info *c)
>  		mst = mst2;
>  	}
>  
> +	if (mst == NULL)
> +		goto out_err;
>  	ubifs_msg(c, "recovered master node from LEB %d",
>  		  (mst == mst1 ? UBIFS_MST_LNUM : UBIFS_MST_LNUM + 1));

That said, please explain your patch in more detail. i.e. Why do you remove these checks?
Why is this correct to do so?
To me it looks like an ad-hoc solution to make UBIFS not
abort on your MLC by removing well-established checks.
I agree that UBIFS's master node checks are very picky but for SLC they are correct and make a lot of sense.
Adding MLC support must not hurt UBIFS's SLC robustness.

Thanks,
//richard

next prev parent reply	other threads:[~2015-12-11  9:12 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-12-11  8:26 [PATCH 1/1] fs:ubifs:recovery:fixup UBIFS cannot recover master node issue Bean Huo 霍斌斌 (beanhuo)
2015-12-11  8:26 ` Bean Huo 霍斌斌 (beanhuo)
2015-12-11  9:12 ` Richard Weinberger [this message]
2015-12-14  3:55   ` Bean Huo 霍斌斌 (beanhuo)
2015-12-14  3:55     ` Bean Huo 霍斌斌 (beanhuo)
2015-12-14 18:00     ` Richard Weinberger
2016-01-28  2:42       ` Bean Huo 霍斌斌 (beanhuo)
2016-01-28  9:31         ` Richard Weinberger
2016-02-01  7:17           ` Bean Huo 霍斌斌 (beanhuo)
2016-02-01  8:28             ` Richard Weinberger

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=566A9378.4070900@nod.at \
    --to=richard@nod.at \
    --cc=adrian.hunter@intel.com \
    --cc=beanhuo@micron.com \
    --cc=boris.brezillon@free-electrons.com \
    --cc=computersforpeace@gmail.com \
    --cc=dedekind1@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mtd@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.