Re: [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling

From: Richard Weinberger <richard@nod.at>
To: Tanya Brokhman <tlinder@codeaurora.org>, dedekind1@gmail.com
Cc: linux-mtd@lists.infradead.org, linux-arm-msm@vger.kernel.org,
	jlauruhn@micron.com
Subject: Re: [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling
Date: Tue, 11 Nov 2014 22:39:19 +0100
Message-ID: <54628207.5030205@nod.at>
In-Reply-To: <54627350.9080509@codeaurora.org>

Tanya,

On 11.11.2014 at 21:36, Tanya Brokhman wrote:
> Hi Artem,
> 
> Hope I didn't drop any ccs this time... Sorry about that. Not on purpose.
> 
> On 11/7/2014 10:58 AM, Artem Bityutskiy wrote:
>> On Thu, 2014-11-06 at 14:16 +0200, Tanya Brokhman wrote:
>>> What I'm trying to say - it
>>> may be too late and you may lose data here. "preferred to prevent rather
>>> than cure".
>>
>> First of all, just to clarify, I do not have a goal of turning down your
>> patches. I just want to understand why this is the best design, and if
>> it is helpful to all Linux MTD users.
>>
>> Modern flashes have strong ECC codes protecting against many bit-flips.
>> MTD even was modified to stop reporting about a single or few bit-flips,
>> because those happen too often and they are "harmless", and do not
>> require scrubbing. We have the threshold value in MTD for this, which is
>> configurable, of course.
>>
>> Bit-flips develop slowly over time. If you get one more bit-flip, it is
>> not too late yet. You can mitigate the "too late" part by reading more
>> often, of course.
>>
>> You also may lower the bit-flip threshold when reading for scrubbing.
>>
>> Could you try to "sell" your design in a way that it becomes clear why
>> it is better than just reading the entire flash periodically.
> 
> Please see my "selling" below :)
> 
>> Some hard
>> experimental data would be preferable.
> 
> Unfortunately none. This is done for a new device that we received just now. The development was done on a virtual machine with nandsim. Testing was mostly for stability and regressions.
> 
>>
>> The advantages of the "read all periodically" approach were:
>>
>> 1. Simple, no modifications needed
>> 2. No need to write if the media is read-only, except when scrubbing
>> happens.
>> 3. Should cover all the NAND effects, including the "radiation" one.
> 
> Disadvantages (as I see it):
> 1. performance hit: when do you trigger the "read-all"? It will affect performance

Only a stupid implementation will re-read/scrub all PEBs at once.
We can use a low priority thread. We can do this even in userspace.
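
As a rough illustration of spreading the work out, the sketch below splits one full pass over all PEBs across many small wake-ups, so no single wake-up touches more than a handful of blocks. Everything in it is hypothetical: `scrub_batches`, the PEB count, and the wake-up interval are illustration-only names and numbers, not UBI interfaces.

```python
def scrub_batches(peb_count, wakeups_per_pass):
    """Yield one list of PEB numbers per wake-up.

    A full pass over all PEBs is split as evenly as possible across
    `wakeups_per_pass` wake-ups, so no wake-up scrubs everything at once.
    """
    if peb_count < 0 or wakeups_per_pass <= 0:
        raise ValueError("peb_count must be >= 0 and wakeups_per_pass > 0")
    base, extra = divmod(peb_count, wakeups_per_pass)
    start = 0
    for i in range(wakeups_per_pass):
        # The first `extra` wake-ups take one additional PEB each.
        size = base + (1 if i < extra else 0)
        yield list(range(start, start + size))
        start += size


# Assumed example: 2048 PEBs, one wake-up per hour, one pass per week.
batches = list(scrub_batches(2048, 7 * 24))
largest_batch = max(len(batch) for batch in batches)
```

A low-priority worker, in the kernel or in userspace, would handle one batch per wake-up and sleep in between; how a PEB is actually read or scrubbed is deliberately left out here.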

> 2. finds bitflips only when they are present instead of preventing them from happening

We can scrub unconditionally.
Even if we scrub every PEB once a week the erase counters won't go up very much.
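
A back-of-the-envelope check of that cost, assuming each unconditional scrub pass adds on average one erase per PEB and using the upper end of the ~15-20 year lifetime mentioned in this thread. The function names are made up for this sketch, and the endurance rating is not given anywhere in the thread; it has to come from the datasheet of the NAND part in question.

```python
def extra_erases_per_peb(years, scrubs_per_week=1.0):
    """Extra erase cycles per PEB caused by periodic unconditional scrubbing.

    Assumes one additional erase per PEB for every scrub pass.
    """
    weeks = years * 365.25 / 7
    return weeks * scrubs_per_week


def budget_fraction(years, endurance_cycles, scrubs_per_week=1.0):
    """Share of the rated erase endurance consumed by scrubbing alone.

    `endurance_cycles` is the rated erase count per block from the
    datasheet (no default, since the thread does not state one).
    """
    return extra_erases_per_peb(years, scrubs_per_week) / endurance_cycles


# Weekly scrubbing over 20 years: roughly a thousand extra erases per PEB.
erases_20y = extra_erases_per_peb(20)
```

Whether roughly a thousand extra erases counts as "not very much" depends on the rated endurance of the part, which `budget_fraction` makes explicit.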

> Perhaps our design is overkill for this and does not cover 100% of the use cases. But it was requested by our customers to handle read-disturb and data retention specifically (as in
> "prevent" and not just "fix"). This is due to a new NAND device that should operate in high temperature and last for ~15-20 years.
> 
> But we did rethink this and we're dropping the "last erase timestamp" that was used to handle "data retention". We will force-scrub all PEBs once in a while (triggered by user) as
> Richard suggested.
> We're keeping the read counters though. I know that not all "read-disturb" scenarios are covered by this but it's more coverage than we have at the moment. So not a 100% perfect
> solution but better than none.
> 
> I will update the implementation and change the fastmap layout (as suggested by Richard earlier) or try using an internal UBI volume. Still have some study to do on that...

Please don't (ab)use fastmap. If you really need persistent read-counters use an internal UBI volume.
But I think that time-based unconditional scrubbing will also do it. As long as we don't have sane threshold values,
keeping counters is useless.

Thanks,
//richard
