[PATCH 1/3] ubi: Deal with interrupted erasures in WL

public inbox for linux-mtd@lists.infradead.org
 help / color / mirror / Atom feed

* [PATCH 1/3] ubi: Deal with interrupted erasures in WL
@ 2016-08-24 12:36 Richard Weinberger
  2016-08-24 12:36 ` [PATCH 2/3] ubi: Fix races around ubi_refill_pools() Richard Weinberger
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Richard Weinberger @ 2016-08-24 12:36 UTC (permalink / raw)
  To: linux-mtd; +Cc: Richard Weinberger, stable

When Fastmap is used we can face here an -EBADMSG
since Fastmap cannot know about unmaps.
If the erasure was interrupted the PEB may show ECC
errors and UBI would go to ro-mode as it assumes
that the PEB was check during attach time, which is
not the case with Fastmap.

Cc: <stable@vger.kernel.org>
Fixes: dbb7d2a88d ("UBI: Add fastmap core")
Signed-off-by: Richard Weinberger <richard@nod.at>
---
 drivers/mtd/ubi/wl.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/mtd/ubi/wl.c b/drivers/mtd/ubi/wl.c
index f453326..b419c7c 100644
--- a/drivers/mtd/ubi/wl.c
+++ b/drivers/mtd/ubi/wl.c
@@ -644,7 +644,7 @@ static int wear_leveling_worker(struct ubi_device *ubi, struct ubi_work *wrk,
 				int shutdown)
 {
 	int err, scrubbing = 0, torture = 0, protect = 0, erroneous = 0;
-	int vol_id = -1, lnum = -1;
+	int erase = 0, keep = 0, vol_id = -1, lnum = -1;
 #ifdef CONFIG_MTD_UBI_FASTMAP
 	int anchor = wrk->anchor;
 #endif
@@ -780,6 +780,16 @@ static int wear_leveling_worker(struct ubi_device *ubi, struct ubi_work *wrk,
 			       e1->pnum);
 			scrubbing = 1;
 			goto out_not_moved;
+		} else if (ubi->fast_attach && err == UBI_IO_BAD_HDR_EBADMSG) {
+			/*
+			 * While a full scan would detect interrupted erasures
+			 * at attach time we can face them here when attached from
+			 * Fastmap.
+			 */
+			dbg_wl("PEB %d has ECC errors, maybe from an interrupted erasure",
+			       e1->pnum);
+			erase = 1;
+			goto out_not_moved;
 		}
 
 		ubi_err(ubi, "error %d while reading VID header from PEB %d",
@@ -815,6 +825,7 @@ static int wear_leveling_worker(struct ubi_device *ubi, struct ubi_work *wrk,
 			 * Target PEB had bit-flips or write error - torture it.
 			 */
 			torture = 1;
+			keep = 1;
 			goto out_not_moved;
 		}
 
@@ -901,7 +912,7 @@ out_not_moved:
 		ubi->erroneous_peb_count += 1;
 	} else if (scrubbing)
 		wl_tree_add(e1, &ubi->scrub);
-	else
+	else if (keep)
 		wl_tree_add(e1, &ubi->used);
 	if (dst_leb_clean) {
 		wl_tree_add(e2, &ubi->free);
@@ -922,6 +933,12 @@ out_not_moved:
 			goto out_ro;
 	}
 
+	if (erase) {
+		err = do_sync_erase(ubi, e1, vol_id, lnum, 1);
+		if (err)
+			goto out_ro;
+	}
+
 	mutex_unlock(&ubi->move_mutex);
 	return 0;
 
-- 
2.7.3

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/3] ubi: Fix races around ubi_refill_pools()
  2016-08-24 12:36 [PATCH 1/3] ubi: Deal with interrupted erasures in WL Richard Weinberger
@ 2016-08-24 12:36 ` Richard Weinberger
  2016-08-24 12:36 ` [PATCH 3/3] ubi: Fix Fastmap's update_vol() Richard Weinberger
  2016-09-05 20:57 ` [PATCH 1/3] ubi: Deal with interrupted erasures in WL Boris Brezillon
  2 siblings, 0 replies; 5+ messages in thread
From: Richard Weinberger @ 2016-08-24 12:36 UTC (permalink / raw)
  To: linux-mtd; +Cc: Richard Weinberger, stable

When writing a new Fastmap the first thing that happens
is refilling the pools in memory.
At this stage it is possible that new PEBs from the new pools
get already claimed and written with data.
If this happens before the new Fastmap data structure hits the
flash and we face power cut the freshly written PEB will not
scanned and unnoticed.

Solve the issue by locking the pools until Fastmap is written.

Cc: <stable@vger.kernel.org>
Fixes: dbb7d2a88d ("UBI: Add fastmap core")
Signed-off-by: Richard Weinberger <richard@nod.at>
---
 drivers/mtd/ubi/eba.c        |  4 ++--
 drivers/mtd/ubi/fastmap-wl.c |  6 ++++--
 drivers/mtd/ubi/fastmap.c    | 14 ++++++++++----
 drivers/mtd/ubi/wl.c         | 20 ++++++++++++++------
 4 files changed, 30 insertions(+), 14 deletions(-)

diff --git a/drivers/mtd/ubi/eba.c b/drivers/mtd/ubi/eba.c
index ebf5172..6c8e16d 100644
--- a/drivers/mtd/ubi/eba.c
+++ b/drivers/mtd/ubi/eba.c
@@ -1088,6 +1088,8 @@ int ubi_eba_copy_leb(struct ubi_device *ubi, int from, int to,
 	struct ubi_volume *vol;
 	uint32_t crc;
 
+	ubi_assert(rwsem_is_locked(&ubi->fm_eba_sem));
+
 	vol_id = be32_to_cpu(vid_hdr->vol_id);
 	lnum = be32_to_cpu(vid_hdr->lnum);
 
@@ -1230,9 +1232,7 @@ int ubi_eba_copy_leb(struct ubi_device *ubi, int from, int to,
 	}
 
 	ubi_assert(vol->eba_tbl[lnum] == from);
-	down_read(&ubi->fm_eba_sem);
 	vol->eba_tbl[lnum] = to;
-	up_read(&ubi->fm_eba_sem);
 
 out_unlock_buf:
 	mutex_unlock(&ubi->buf_mutex);
diff --git a/drivers/mtd/ubi/fastmap-wl.c b/drivers/mtd/ubi/fastmap-wl.c
index 30d3999..4f0bd6b 100644
--- a/drivers/mtd/ubi/fastmap-wl.c
+++ b/drivers/mtd/ubi/fastmap-wl.c
@@ -262,6 +262,8 @@ static struct ubi_wl_entry *get_peb_for_wl(struct ubi_device *ubi)
 	struct ubi_fm_pool *pool = &ubi->fm_wl_pool;
 	int pnum;
 
+	ubi_assert(rwsem_is_locked(&ubi->fm_eba_sem));
+
 	if (pool->used == pool->size) {
 		/* We cannot update the fastmap here because this
 		 * function is called in atomic context.
@@ -303,7 +305,7 @@ int ubi_ensure_anchor_pebs(struct ubi_device *ubi)
 
 	wrk->anchor = 1;
 	wrk->func = &wear_leveling_worker;
-	schedule_ubi_work(ubi, wrk);
+	__schedule_ubi_work(ubi, wrk);
 	return 0;
 }
 
@@ -344,7 +346,7 @@ int ubi_wl_put_fm_peb(struct ubi_device *ubi, struct ubi_wl_entry *fm_e,
 	spin_unlock(&ubi->wl_lock);
 
 	vol_id = lnum ? UBI_FM_DATA_VOLUME_ID : UBI_FM_SB_VOLUME_ID;
-	return schedule_erase(ubi, e, vol_id, lnum, torture);
+	return schedule_erase(ubi, e, vol_id, lnum, torture, true);
 }
 
 /**
diff --git a/drivers/mtd/ubi/fastmap.c b/drivers/mtd/ubi/fastmap.c
index 48eb55f..da33100 100644
--- a/drivers/mtd/ubi/fastmap.c
+++ b/drivers/mtd/ubi/fastmap.c
@@ -1522,22 +1522,30 @@ int ubi_update_fastmap(struct ubi_device *ubi)
 	struct ubi_wl_entry *tmp_e;
 
 	down_write(&ubi->fm_protect);
+	down_write(&ubi->work_sem);
+	down_write(&ubi->fm_eba_sem);
 
 	ubi_refill_pools(ubi);
 
 	if (ubi->ro_mode || ubi->fm_disabled) {
+		up_write(&ubi->fm_eba_sem);
+		up_write(&ubi->work_sem);
 		up_write(&ubi->fm_protect);
 		return 0;
 	}
 
 	ret = ubi_ensure_anchor_pebs(ubi);
 	if (ret) {
+		up_write(&ubi->fm_eba_sem);
+		up_write(&ubi->work_sem);
 		up_write(&ubi->fm_protect);
 		return ret;
 	}
 
 	new_fm = kzalloc(sizeof(*new_fm), GFP_KERNEL);
 	if (!new_fm) {
+		up_write(&ubi->fm_eba_sem);
+		up_write(&ubi->work_sem);
 		up_write(&ubi->fm_protect);
 		return -ENOMEM;
 	}
@@ -1646,16 +1654,14 @@ int ubi_update_fastmap(struct ubi_device *ubi)
 		new_fm->e[0] = tmp_e;
 	}
 
-	down_write(&ubi->work_sem);
-	down_write(&ubi->fm_eba_sem);
 	ret = ubi_write_fastmap(ubi, new_fm);
-	up_write(&ubi->fm_eba_sem);
-	up_write(&ubi->work_sem);
 
 	if (ret)
 		goto err;
 
 out_unlock:
+	up_write(&ubi->fm_eba_sem);
+	up_write(&ubi->work_sem);
 	up_write(&ubi->fm_protect);
 	kfree(old_fm);
 	return ret;
diff --git a/drivers/mtd/ubi/wl.c b/drivers/mtd/ubi/wl.c
index b419c7c..d1f0d65 100644
--- a/drivers/mtd/ubi/wl.c
+++ b/drivers/mtd/ubi/wl.c
@@ -580,7 +580,7 @@ static int erase_worker(struct ubi_device *ubi, struct ubi_work *wl_wrk,
  * failure.
  */
 static int schedule_erase(struct ubi_device *ubi, struct ubi_wl_entry *e,
-			  int vol_id, int lnum, int torture)
+			  int vol_id, int lnum, int torture, bool nested)
 {
 	struct ubi_work *wl_wrk;
 
@@ -599,7 +599,10 @@ static int schedule_erase(struct ubi_device *ubi, struct ubi_wl_entry *e,
 	wl_wrk->lnum = lnum;
 	wl_wrk->torture = torture;
 
-	schedule_ubi_work(ubi, wl_wrk);
+	if (nested)
+		__schedule_ubi_work(ubi, wl_wrk);
+	else
+		schedule_ubi_work(ubi, wl_wrk);
 	return 0;
 }
 
@@ -660,6 +663,7 @@ static int wear_leveling_worker(struct ubi_device *ubi, struct ubi_work *wrk,
 	if (!vid_hdr)
 		return -ENOMEM;
 
+	down_read(&ubi->fm_eba_sem);
 	mutex_lock(&ubi->move_mutex);
 	spin_lock(&ubi->wl_lock);
 	ubi_assert(!ubi->move_from && !ubi->move_to);
@@ -890,6 +894,7 @@ static int wear_leveling_worker(struct ubi_device *ubi, struct ubi_work *wrk,
 
 	dbg_wl("done");
 	mutex_unlock(&ubi->move_mutex);
+	up_read(&ubi->fm_eba_sem);
 	return 0;
 
 	/*
@@ -940,6 +945,7 @@ out_not_moved:
 	}
 
 	mutex_unlock(&ubi->move_mutex);
+	up_read(&ubi->fm_eba_sem);
 	return 0;
 
 out_error:
@@ -961,6 +967,7 @@ out_error:
 out_ro:
 	ubi_ro_mode(ubi);
 	mutex_unlock(&ubi->move_mutex);
+	up_read(&ubi->fm_eba_sem);
 	ubi_assert(err != 0);
 	return err < 0 ? err : -EIO;
 
@@ -968,6 +975,7 @@ out_cancel:
 	ubi->wl_scheduled = 0;
 	spin_unlock(&ubi->wl_lock);
 	mutex_unlock(&ubi->move_mutex);
+	up_read(&ubi->fm_eba_sem);
 	ubi_free_vid_hdr(ubi, vid_hdr);
 	return 0;
 }
@@ -1090,7 +1098,7 @@ static int __erase_worker(struct ubi_device *ubi, struct ubi_work *wl_wrk)
 		int err1;
 
 		/* Re-schedule the LEB for erasure */
-		err1 = schedule_erase(ubi, e, vol_id, lnum, 0);
+		err1 = schedule_erase(ubi, e, vol_id, lnum, 0, false);
 		if (err1) {
 			wl_entry_destroy(ubi, e);
 			err = err1;
@@ -1271,7 +1279,7 @@ retry:
 	}
 	spin_unlock(&ubi->wl_lock);
 
-	err = schedule_erase(ubi, e, vol_id, lnum, torture);
+	err = schedule_erase(ubi, e, vol_id, lnum, torture, false);
 	if (err) {
 		spin_lock(&ubi->wl_lock);
 		wl_tree_add(e, &ubi->used);
@@ -1562,7 +1570,7 @@ int ubi_wl_init(struct ubi_device *ubi, struct ubi_attach_info *ai)
 		e->pnum = aeb->pnum;
 		e->ec = aeb->ec;
 		ubi->lookuptbl[e->pnum] = e;
-		if (schedule_erase(ubi, e, aeb->vol_id, aeb->lnum, 0)) {
+		if (schedule_erase(ubi, e, aeb->vol_id, aeb->lnum, 0, false)) {
 			wl_entry_destroy(ubi, e);
 			goto out_free;
 		}
@@ -1641,7 +1649,7 @@ int ubi_wl_init(struct ubi_device *ubi, struct ubi_attach_info *ai)
 			e->ec = aeb->ec;
 			ubi_assert(!ubi->lookuptbl[e->pnum]);
 			ubi->lookuptbl[e->pnum] = e;
-			if (schedule_erase(ubi, e, aeb->vol_id, aeb->lnum, 0)) {
+			if (schedule_erase(ubi, e, aeb->vol_id, aeb->lnum, 0, false)) {
 				wl_entry_destroy(ubi, e);
 				goto out_free;
 			}
-- 
2.7.3

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 3/3] ubi: Fix Fastmap's update_vol()
  2016-08-24 12:36 [PATCH 1/3] ubi: Deal with interrupted erasures in WL Richard Weinberger
  2016-08-24 12:36 ` [PATCH 2/3] ubi: Fix races around ubi_refill_pools() Richard Weinberger
@ 2016-08-24 12:36 ` Richard Weinberger
  2016-09-05 21:06   ` Boris Brezillon
  2016-09-05 20:57 ` [PATCH 1/3] ubi: Deal with interrupted erasures in WL Boris Brezillon
  2 siblings, 1 reply; 5+ messages in thread
From: Richard Weinberger @ 2016-08-24 12:36 UTC (permalink / raw)
  To: linux-mtd; +Cc: Richard Weinberger, stable

Usually Fastmap is free to consider every PEB in one of the pools
as newer than the existing PEB. Since PEBs in a pool are by definition
newer than everything else.
But update_vol() missed the case that a pool can contain more than
one candidate.

Cc: <stable@vger.kernel.org>
Fixes: dbb7d2a88d ("UBI: Add fastmap core")
Signed-off-by: Richard Weinberger <richard@nod.at>
---
 drivers/mtd/ubi/fastmap.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/mtd/ubi/fastmap.c b/drivers/mtd/ubi/fastmap.c
index da33100..ccb3ceb 100644
--- a/drivers/mtd/ubi/fastmap.c
+++ b/drivers/mtd/ubi/fastmap.c
@@ -328,6 +328,7 @@ static int update_vol(struct ubi_device *ubi, struct ubi_attach_info *ai,
 			aeb->pnum = new_aeb->pnum;
 			aeb->copy_flag = new_vh->copy_flag;
 			aeb->scrub = new_aeb->scrub;
+			aeb->sqnum = new_aeb->sqnum;
 			kmem_cache_free(ai->aeb_slab_cache, new_aeb);
 
 		/* new_aeb is older */
-- 
2.7.3

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/3] ubi: Deal with interrupted erasures in WL
  2016-08-24 12:36 [PATCH 1/3] ubi: Deal with interrupted erasures in WL Richard Weinberger
  2016-08-24 12:36 ` [PATCH 2/3] ubi: Fix races around ubi_refill_pools() Richard Weinberger
  2016-08-24 12:36 ` [PATCH 3/3] ubi: Fix Fastmap's update_vol() Richard Weinberger
@ 2016-09-05 20:57 ` Boris Brezillon
  2 siblings, 0 replies; 5+ messages in thread
From: Boris Brezillon @ 2016-09-05 20:57 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: linux-mtd, stable

Hi Richard,

On Wed, 24 Aug 2016 14:36:13 +0200
Richard Weinberger <richard@nod.at> wrote:

> When Fastmap is used we can face here an -EBADMSG
> since Fastmap cannot know about unmaps.
> If the erasure was interrupted the PEB may show ECC
> errors and UBI would go to ro-mode as it assumes
> that the PEB was check during attach time, which is
> not the case with Fastmap.
> 
> Cc: <stable@vger.kernel.org>
> Fixes: dbb7d2a88d ("UBI: Add fastmap core")
> Signed-off-by: Richard Weinberger <richard@nod.at>
> ---
>  drivers/mtd/ubi/wl.c | 21 +++++++++++++++++++--
>  1 file changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/mtd/ubi/wl.c b/drivers/mtd/ubi/wl.c
> index f453326..b419c7c 100644
> --- a/drivers/mtd/ubi/wl.c
> +++ b/drivers/mtd/ubi/wl.c
> @@ -644,7 +644,7 @@ static int wear_leveling_worker(struct ubi_device *ubi, struct ubi_work *wrk,
>  				int shutdown)
>  {
>  	int err, scrubbing = 0, torture = 0, protect = 0, erroneous = 0;
> -	int vol_id = -1, lnum = -1;
> +	int erase = 0, keep = 0, vol_id = -1, lnum = -1;
>  #ifdef CONFIG_MTD_UBI_FASTMAP
>  	int anchor = wrk->anchor;
>  #endif
> @@ -780,6 +780,16 @@ static int wear_leveling_worker(struct ubi_device *ubi, struct ubi_work *wrk,
>  			       e1->pnum);
>  			scrubbing = 1;
>  			goto out_not_moved;
> +		} else if (ubi->fast_attach && err == UBI_IO_BAD_HDR_EBADMSG) {
> +			/*
> +			 * While a full scan would detect interrupted erasures
> +			 * at attach time we can face them here when attached from
> +			 * Fastmap.
> +			 */
> +			dbg_wl("PEB %d has ECC errors, maybe from an interrupted erasure",
> +			       e1->pnum);
> +			erase = 1;
> +			goto out_not_moved;

Is this really safe to consider all blocks with a corrupted VID header
as a valid corruption caused by a power-cut?
What if the corruption happened afterward?

I'm asking this question, but I actually don't have any solution apart
flagging each scanned PEB, which brings a non-negligible overhead (a
bitmap marking already scanned PEBs).

>  		}
>  
>  		ubi_err(ubi, "error %d while reading VID header from PEB %d",
> @@ -815,6 +825,7 @@ static int wear_leveling_worker(struct ubi_device *ubi, struct ubi_work *wrk,
>  			 * Target PEB had bit-flips or write error - torture it.
>  			 */
>  			torture = 1;
> +			keep = 1;
>  			goto out_not_moved;
>  		}
>  
> @@ -901,7 +912,7 @@ out_not_moved:
>  		ubi->erroneous_peb_count += 1;
>  	} else if (scrubbing)
>  		wl_tree_add(e1, &ubi->scrub);
> -	else
> +	else if (keep)
>  		wl_tree_add(e1, &ubi->used);
>  	if (dst_leb_clean) {
>  		wl_tree_add(e2, &ubi->free);
> @@ -922,6 +933,12 @@ out_not_moved:
>  			goto out_ro;
>  	}
>  
> +	if (erase) {
> +		err = do_sync_erase(ubi, e1, vol_id, lnum, 1);
> +		if (err)
> +			goto out_ro;
> +	}
> +
>  	mutex_unlock(&ubi->move_mutex);
>  	return 0;
>  

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 3/3] ubi: Fix Fastmap's update_vol()
  2016-08-24 12:36 ` [PATCH 3/3] ubi: Fix Fastmap's update_vol() Richard Weinberger
@ 2016-09-05 21:06   ` Boris Brezillon
  0 siblings, 0 replies; 5+ messages in thread
From: Boris Brezillon @ 2016-09-05 21:06 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: linux-mtd, stable

On Wed, 24 Aug 2016 14:36:15 +0200
Richard Weinberger <richard@nod.at> wrote:

> Usually Fastmap is free to consider every PEB in one of the pools
> as newer than the existing PEB. Since PEBs in a pool are by definition
> newer than everything else.
> But update_vol() missed the case that a pool can contain more than
> one candidate.

Hm, it's not really that update_vol() was not taking this case into
account (actually it is comparing 2 LEBs and picking the latest
version), it's just that it's not updating ->sqnum as it should be. Or,
am I missing something?

> 
> Cc: <stable@vger.kernel.org>
> Fixes: dbb7d2a88d ("UBI: Add fastmap core")
> Signed-off-by: Richard Weinberger <richard@nod.at>

Anyway, apart from the commit message,

Reviewed-by: Boris Brezillon <boris.brezillon@free-electrons.com>

> ---
>  drivers/mtd/ubi/fastmap.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/mtd/ubi/fastmap.c b/drivers/mtd/ubi/fastmap.c
> index da33100..ccb3ceb 100644
> --- a/drivers/mtd/ubi/fastmap.c
> +++ b/drivers/mtd/ubi/fastmap.c
> @@ -328,6 +328,7 @@ static int update_vol(struct ubi_device *ubi, struct ubi_attach_info *ai,
>  			aeb->pnum = new_aeb->pnum;
>  			aeb->copy_flag = new_vh->copy_flag;
>  			aeb->scrub = new_aeb->scrub;
> +			aeb->sqnum = new_aeb->sqnum;
>  			kmem_cache_free(ai->aeb_slab_cache, new_aeb);
>  
>  		/* new_aeb is older */

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-09-05 21:07 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-08-24 12:36 [PATCH 1/3] ubi: Deal with interrupted erasures in WL Richard Weinberger
2016-08-24 12:36 ` [PATCH 2/3] ubi: Fix races around ubi_refill_pools() Richard Weinberger
2016-08-24 12:36 ` [PATCH 3/3] ubi: Fix Fastmap's update_vol() Richard Weinberger
2016-09-05 21:06   ` Boris Brezillon
2016-09-05 20:57 ` [PATCH 1/3] ubi: Deal with interrupted erasures in WL Boris Brezillon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox