* [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling
@ 2014-10-26 13:49 Tanya Brokhman
2014-10-26 20:39 ` Richard Weinberger
0 siblings, 1 reply; 38+ messages in thread
From: Tanya Brokhman @ 2014-10-26 13:49 UTC (permalink / raw)
To: dedekind1, richard; +Cc: linux-arm-msm, linux-mtd, Tanya Brokhman
One of the limitations of the NAND devices is the method used to read
NAND flash memory may cause bit-flips on the surrounding cells and result
in uncorrectable ECC errors. This is known as the read disturb or data
retention.
Today’s Linux NAND drivers implementation doesn’t address the read disturb
and the data retention limitations of the NAND devices. To date these
issues could be overlooked since the possibility of their occurrence in
today’s NAND devices is very low.
With the evolution of NAND devices and the requirement for a “long life”
NAND flash, read disturb and data retention can no longer be ignored
otherwise there will be data loss over time.
The following patch set implements handling of Read-disturb and Data
retention by the UBI layer.
Changes from V1:
- Documentation file was added in the first patch that describes the
design in general.
All other patches were unchanged and resent just for reference. Still
working on comments from Richard on fastmap layout.
All comments that were made for V1 will be addressed in the next patch
set. This version is just for the addition of the documentation file.
Tanya Brokhman (5):
mtd: ubi: Read disturb infrastructure
mtd: ubi: Fill read disturb statistics
mtd: ubi: Make in_wl_tree function public
mtd: ubi: Read threshold verification
mtd: ubi: Add sysfs entry to force all pebs' scan
Documentation/mtd/ubi/ubi-read-disturb.txt | 145 ++++++++++++++++
drivers/mtd/ubi/attach.c | 137 +++++++++++----
drivers/mtd/ubi/build.c | 81 +++++++++
drivers/mtd/ubi/debug.c | 11 ++
drivers/mtd/ubi/eba.c | 7 +-
drivers/mtd/ubi/fastmap.c | 132 +++++++++++---
drivers/mtd/ubi/io.c | 28 +++
drivers/mtd/ubi/ubi-media.h | 32 +++-
drivers/mtd/ubi/ubi.h | 62 ++++++-
drivers/mtd/ubi/vtbl.c | 6 +-
drivers/mtd/ubi/wl.c | 270 +++++++++++++++++++++++++++--
11 files changed, 835 insertions(+), 76 deletions(-)
create mode 100644 Documentation/mtd/ubi/ubi-read-disturb.txt
--
Qualcomm Israel, on behalf of Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling
2014-10-26 13:49 [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling Tanya Brokhman
@ 2014-10-26 20:39 ` Richard Weinberger
2014-10-27 8:41 ` Tanya Brokhman
0 siblings, 1 reply; 38+ messages in thread
From: Richard Weinberger @ 2014-10-26 20:39 UTC (permalink / raw)
To: Tanya Brokhman, dedekind1; +Cc: linux-arm-msm, linux-mtd
Am 26.10.2014 um 14:49 schrieb Tanya Brokhman:
> One of the limitations of the NAND devices is the method used to read
> NAND flash memory may cause bit-flips on the surrounding cells and result
> in uncorrectable ECC errors. This is known as the read disturb or data
> retention.
>
> Today’s Linux NAND drivers implementation doesn’t address the read disturb
> and the data retention limitations of the NAND devices. To date these
> issues could be overlooked since the possibility of their occurrence in
> today’s NAND devices is very low.
>
> With the evolution of NAND devices and the requirement for a “long life”
> NAND flash, read disturb and data retention can no longer be ignored
> otherwise there will be data loss over time.
>
> The following patch set implements handling of Read-disturb and Data
> retention by the UBI layer.
So, your patch addresses the following issue:
We need to re-read a PEB after a specific time (to detect bit rot) or after N reads (to detect read disturb issues).
Is this correct?
Currently users of UBI do this by having cron jobs which read the complete UBI volume
and then cause scrub work.
The draw back of this is that only UBI payload will be read and not all data like EC and VID headers.
I understand that you want to fix this issue.
According to my opinion it is not a good idea to store read counters and timestamps into the UBI/Fastmap on-disk layout.
Both the read counters and timestamps don't have to be exact values.
What about this idea?
Add a userspace interface which allows UBI to expose read counters and last access timestamps.
A userspace daemon (let's name it ubihealthd) then can decide whether it is time to trigger a re-read of a PEB.
This daemon can also store and load the timestamp values and counters from and to UBI. If it misses these meta data some times due to a
power cut it won't hurt.
We could also add another internal UBI volume which can carry these data.
All in all, I like the idea but changing/extending the on-disk layout is overkill IMHO.
Thanks,
//richard
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling
2014-10-26 20:39 ` Richard Weinberger
@ 2014-10-27 8:41 ` Tanya Brokhman
2014-10-27 8:56 ` Richard Weinberger
0 siblings, 1 reply; 38+ messages in thread
From: Tanya Brokhman @ 2014-10-27 8:41 UTC (permalink / raw)
To: Richard Weinberger, dedekind1; +Cc: linux-arm-msm, linux-mtd
On 10/26/2014 10:39 PM, Richard Weinberger wrote:
> Am 26.10.2014 um 14:49 schrieb Tanya Brokhman:
>> One of the limitations of the NAND devices is the method used to read
>> NAND flash memory may cause bit-flips on the surrounding cells and result
>> in uncorrectable ECC errors. This is known as the read disturb or data
>> retention.
>>
>> Today’s Linux NAND drivers implementation doesn’t address the read disturb
>> and the data retention limitations of the NAND devices. To date these
>> issues could be overlooked since the possibility of their occurrence in
>> today’s NAND devices is very low.
>>
>> With the evolution of NAND devices and the requirement for a “long life”
>> NAND flash, read disturb and data retention can no longer be ignored
>> otherwise there will be data loss over time.
>>
>> The following patch set implements handling of Read-disturb and Data
>> retention by the UBI layer.
>
> So, your patch addresses the following issue:
> We need to re-read a PEB after a specific time (to detect bit rot) or after N reads (to detect read disturb issues).
> Is this correct?
Not exactly... We need to scrub a PEB that is being frequently read from
in order to prevent bit-flip errors that might occur due to read-disturb
>
> Currently users of UBI do this by having cron jobs which read the complete UBI volume
> and then cause scrub work.
> The draw back of this is that only UBI payload will be read and not all data like EC and VID headers.
> I understand that you want to fix this issue.
Not sure I completely understand what this crons do but the last patch
in the series does something similar.
>
> According to my opinion it is not a good idea to store read counters and timestamps into the UBI/Fastmap on-disk layout.
> Both the read counters and timestamps don't have to be exact values.
Why not? Storing last_erase_timestamp doesn't increase the memory
consumption on NAND since I used reserved bytes in the ec_header. I
agree that the RAM is increased but I couldn't find any other way to
have these statistics saved.
read_counters can be saved ONLY as part of fastmap unfortunately because
of the erase-before-write limitation.
>
> What about this idea?
> Add a userspace interface which allows UBI to expose read counters and last access timestamps.
Where will you save those?
> A userspace daemon (let's name it ubihealthd) then can decide whether it is time to trigger a re-read of a PEB.
Not a re-read - scrub. read-disturb is fixed by erasing the PEB.
> This daemon can also store and load the timestamp values and counters from and to UBI. If it misses these meta data some times due to a
> power cut it won't hurt.
Not sure i follow. How is this better then doing this from the kernel?
you do have to store the timestamps and the read_counters somewhere and
they are both updated in the ubi layer. I must be missing something
here. Could you please elaborate on your idea?
> We could also add another internal UBI volume which can carry these data.
I'm afraid I have to disagree with this idea. First of all having a
dedicated volume for this data is an overkill. Its not a sufficient
amount of data to reserve a volume for. and what about the PEBs that
belong to this volume? Taking this feature out of the UBI layer is just
complicated, feels wrong from design perspective, and I don't see the
benefit of it. Basically, its very similar to the wear-leveling but for
"reads" instead of "writes".
>
> All in all, I like the idea but changing/extending the on-disk layout is overkill IMHO.
Why? Without addressing this issues we can't have devices with life span
of more then ~5 years (and we need to). And this is very similar to
wear-leveling and erase counters. So why is read-counters and
erase_timestamp is an overkill?
I'm working on your idea of changing the fastmap layout to save all the
read disturb data at the end of it and not integrated into fastmap
existing data structures (as is done in this version of the code). But
as I see it, fastmap has to be updates as well.
>
> Thanks,
> //richard
>
Thanks,
Tanya Brokhman
--
Qualcomm Israel, on behalf of Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
Forum, a Linux Foundation Collaborative Project
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling
2014-10-27 8:41 ` Tanya Brokhman
@ 2014-10-27 8:56 ` Richard Weinberger
2014-10-29 11:03 ` Tanya Brokhman
0 siblings, 1 reply; 38+ messages in thread
From: Richard Weinberger @ 2014-10-27 8:56 UTC (permalink / raw)
To: Tanya Brokhman, dedekind1; +Cc: linux-arm-msm, linux-mtd
Tanya,
Am 27.10.2014 um 09:41 schrieb Tanya Brokhman:
>> So, your patch addresses the following issue:
>> We need to re-read a PEB after a specific time (to detect bit rot) or after N reads (to detect read disturb issues).
>> Is this correct?
>
> Not exactly... We need to scrub a PEB that is being frequently read from in order to prevent bit-flip errors that might occur due to read-disturb
This is what I meant with "after N reads". :)
>>
>> Currently users of UBI do this by having cron jobs which read the complete UBI volume
>> and then cause scrub work.
>> The draw back of this is that only UBI payload will be read and not all data like EC and VID headers.
>> I understand that you want to fix this issue.
>
> Not sure I completely understand what this crons do but the last patch in the series does something similar.
The cron job reads the complete UBI volume. i.e. dd=/dev/ubi0_X of=/dev/null. It will trigger scrub work
for bit-flipping PEBs. Is the poor men variant of your feature.
>>
>> According to my opinion it is not a good idea to store read counters and timestamps into the UBI/Fastmap on-disk layout.
>> Both the read counters and timestamps don't have to be exact values.
>
> Why not? Storing last_erase_timestamp doesn't increase the memory consumption on NAND since I used reserved bytes in the ec_header. I agree that the RAM is increased but I couldn't
> find any other way to have these statistics saved.
> read_counters can be saved ONLY as part of fastmap unfortunately because of the erase-before-write limitation.
Please explain in detail why those counters have to be exact.
I was not complaining about RAM consumption. But I think we should change the on-disk layout only for very
serious reasons.
>>
>> What about this idea?
>> Add a userspace interface which allows UBI to expose read counters and last access timestamps.
>
> Where will you save those?
In a plain file? As I said, the counters don't have to be exact. If you lose one cycle, who cares....
The counters and timestamps are only a rough estimate.
i.e. the userspace daemon dumps all this informations from UBI and stores them to a file (or a static UBI volume).
Upon system boot it restores them.
>> A userspace daemon (let's name it ubihealthd) then can decide whether it is time to trigger a re-read of a PEB.
>
> Not a re-read - scrub. read-disturb is fixed by erasing the PEB.
It will trigger a scrub work if bit-flipps happen. But what I was trying to say, this all can be done perfectly fine
in userspace.
>> This daemon can also store and load the timestamp values and counters from and to UBI. If it misses these meta data some times due to a
>> power cut it won't hurt.
>
> Not sure i follow. How is this better then doing this from the kernel? you do have to store the timestamps and the read_counters somewhere and they are both updated in the ubi
> layer. I must be missing something here. Could you please elaborate on your idea?
If it can be done in userspace, do it in userspace. We have to make sure that the kernel stays maintainable.
We really don't want to add new complexity which is not really needed.
>> We could also add another internal UBI volume which can carry these data.
>
> I'm afraid I have to disagree with this idea. First of all having a dedicated volume for this data is an overkill. Its not a sufficient amount of data to reserve a volume for. and
> what about the PEBs that belong to this volume? Taking this feature out of the UBI layer is just complicated, feels wrong from design perspective, and I don't see the benefit of
> it. Basically, its very similar to the wear-leveling but for "reads" instead of "writes".
But adding this data to fastmap is a better idea? fastmap is also just another internal volume.
>>
>> All in all, I like the idea but changing/extending the on-disk layout is overkill IMHO.
>
> Why? Without addressing this issues we can't have devices with life span of more then ~5 years (and we need to). And this is very similar to wear-leveling and erase counters. So
> why is read-counters and erase_timestamp is an overkill?
> I'm working on your idea of changing the fastmap layout to save all the read disturb data at the end of it and not integrated into fastmap existing data structures (as is done in
> this version of the code). But as I see it, fastmap has to be updates as well.
I meant that adding these data to the on-disk layout is overkill. I like your feature but not the part
where you extend the on-disk layout. In my opinion most of it can be done without storing this data into fastmap
or other UBI internal on-disk data structures.
As I said, the counters don't have to be exact. Let a daemon handle and persist them.
Thanks,
//richard
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling
2014-10-27 8:56 ` Richard Weinberger
@ 2014-10-29 11:03 ` Tanya Brokhman
2014-10-29 12:00 ` Richard Weinberger
0 siblings, 1 reply; 38+ messages in thread
From: Tanya Brokhman @ 2014-10-29 11:03 UTC (permalink / raw)
To: Richard Weinberger, dedekind1; +Cc: linux-arm-msm, linux-mtd
Hi Richard
On 10/27/2014 10:56 AM, Richard Weinberger wrote:
> Tanya,
>
> Am 27.10.2014 um 09:41 schrieb Tanya Brokhman:
>>> So, your patch addresses the following issue:
>>> We need to re-read a PEB after a specific time (to detect bit rot) or after N reads (to detect read disturb issues).
>>> Is this correct?
>>
>> Not exactly... We need to scrub a PEB that is being frequently read from in order to prevent bit-flip errors that might occur due to read-disturb
>
> This is what I meant with "after N reads". :)
>
>>>
>>> Currently users of UBI do this by having cron jobs which read the complete UBI volume
>>> and then cause scrub work.
>>> The draw back of this is that only UBI payload will be read and not all data like EC and VID headers.
>>> I understand that you want to fix this issue.
>>
>> Not sure I completely understand what this crons do but the last patch in the series does something similar.
>
> The cron job reads the complete UBI volume. i.e. dd=/dev/ubi0_X of=/dev/null. It will trigger scrub work
> for bit-flipping PEBs. Is the poor men variant of your feature.
>
>>>
>>> According to my opinion it is not a good idea to store read counters and timestamps into the UBI/Fastmap on-disk layout.
>>> Both the read counters and timestamps don't have to be exact values.
>>
>> Why not? Storing last_erase_timestamp doesn't increase the memory consumption on NAND since I used reserved bytes in the ec_header. I agree that the RAM is increased but I couldn't
>> find any other way to have these statistics saved.
>> read_counters can be saved ONLY as part of fastmap unfortunately because of the erase-before-write limitation.
>
> Please explain in detail why those counters have to be exact.
> I was not complaining about RAM consumption. But I think we should change the on-disk layout only for very
> serious reasons.
>
>>>
>>> What about this idea?
>>> Add a userspace interface which allows UBI to expose read counters and last access timestamps.
>>
>> Where will you save those?
>
> In a plain file? As I said, the counters don't have to be exact. If you lose one cycle, who cares....
> The counters and timestamps are only a rough estimate.
> i.e. the userspace daemon dumps all this informations from UBI and stores them to a file (or a static UBI volume).
> Upon system boot it restores them.
>
>>> A userspace daemon (let's name it ubihealthd) then can decide whether it is time to trigger a re-read of a PEB.
>>
>> Not a re-read - scrub. read-disturb is fixed by erasing the PEB.
>
> It will trigger a scrub work if bit-flipps happen. But what I was trying to say, this all can be done perfectly fine
> in userspace.
>
>>> This daemon can also store and load the timestamp values and counters from and to UBI. If it misses these meta data some times due to a
>>> power cut it won't hurt.
>>
>> Not sure i follow. How is this better then doing this from the kernel? you do have to store the timestamps and the read_counters somewhere and they are both updated in the ubi
>> layer. I must be missing something here. Could you please elaborate on your idea?
>
> If it can be done in userspace, do it in userspace. We have to make sure that the kernel stays maintainable.
> We really don't want to add new complexity which is not really needed.
>
>>> We could also add another internal UBI volume which can carry these data.
>>
>> I'm afraid I have to disagree with this idea. First of all having a dedicated volume for this data is an overkill. Its not a sufficient amount of data to reserve a volume for. and
>> what about the PEBs that belong to this volume? Taking this feature out of the UBI layer is just complicated, feels wrong from design perspective, and I don't see the benefit of
>> it. Basically, its very similar to the wear-leveling but for "reads" instead of "writes".
>
> But adding this data to fastmap is a better idea? fastmap is also just another internal volume.
>
>>>
>>> All in all, I like the idea but changing/extending the on-disk layout is overkill IMHO.
>>
>> Why? Without addressing this issues we can't have devices with life span of more then ~5 years (and we need to). And this is very similar to wear-leveling and erase counters. So
>> why is read-counters and erase_timestamp is an overkill?
>> I'm working on your idea of changing the fastmap layout to save all the read disturb data at the end of it and not integrated into fastmap existing data structures (as is done in
>> this version of the code). But as I see it, fastmap has to be updates as well.
>
> I meant that adding these data to the on-disk layout is overkill. I like your feature but not the part
> where you extend the on-disk layout. In my opinion most of it can be done without storing this data into fastmap
> or other UBI internal on-disk data structures.
> As I said, the counters don't have to be exact. Let a daemon handle and persist them.
I'll try to address all you comments in one place.
You're right that the read counters don't have to be exact but they do
have to reflect the real state.
Regarding your idea of saving them to a file, or somehow with userspace
involved; This is doable, but such solution will depend on user space
implementation:
- one need to update kernel with correct read counters (saved somewhere
in userspace)
- it is required on every boot.
- saving the counters back to userspace should be periodically triggered
as well.
So the minimal workflow for each boot life cycle will be:
- on boot: update kernel with correct values from userspace
- kernel updates the counters on each read operation
- on powerdown: save the updated kernel counters back to userspace
The read-disturb handling is based on kernel updating and monitoring
read counters. Taking this out of the kernel space will result in an
incomplete and very fragile solution for the read-disturb problem since
the dependency in userspace is just too big.
Another issue to consider is that each SW upgrade will result in loosing
the counters saved in userspace and reset all. Otherwise, system upgrade
process will also have to be updated.
The read counters are very much like the ec counters used for
wear-leveling; One is updated on each erase, other on each read; One is
used to handle issues caused by frequent writes (erase operations), the
other handle issues caused by frequent reads.
So how are the two different? Why isn't wear-leveling (and erase
counters) handled by userspace? My guess that the decision to
encapsulate the wear-leveling into the kernel was due to the above
mentioned reasons.
Thanks,
Tanya Brokhman
--
Qualcomm Israel, on behalf of Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
Forum, a Linux Foundation Collaborative Project
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling
2014-10-29 11:03 ` Tanya Brokhman
@ 2014-10-29 12:00 ` Richard Weinberger
2014-10-31 13:12 ` Tanya Brokhman
0 siblings, 1 reply; 38+ messages in thread
From: Richard Weinberger @ 2014-10-29 12:00 UTC (permalink / raw)
To: Tanya Brokhman, dedekind1; +Cc: linux-arm-msm, linux-mtd
Tanya,
Am 29.10.2014 um 12:03 schrieb Tanya Brokhman:
> I'll try to address all you comments in one place.
> You're right that the read counters don't have to be exact but they do have to reflect the real state.
But it does not really matter if the counters are a way to high or too low?
It does also not matter if a re-read of adjacent PEBs is issued too often.
It won't hurt.
> Regarding your idea of saving them to a file, or somehow with userspace involved; This is doable, but such solution will depend on user space implementation:
> - one need to update kernel with correct read counters (saved somewhere in userspace)
> - it is required on every boot.
> - saving the counters back to userspace should be periodically triggered as well.
> So the minimal workflow for each boot life cycle will be:
> - on boot: update kernel with correct values from userspace
Correct.
> - kernel updates the counters on each read operation
Yeah, that's a plain simple in kernel counter..
> - on powerdown: save the updated kernel counters back to userspace
Correct. The counters can also be saved once a day by cron.
If one or two save operations are missed it won't hurt either.
> The read-disturb handling is based on kernel updating and monitoring read counters. Taking this out of the kernel space will result in an incomplete and very fragile solution for
> the read-disturb problem since the dependency in userspace is just too big.
Why?
We both agree on the fact that the counters don't have to be exact.
Maybe I'm wrong but to my understanding they are just a rough indicator that sometime later UBI has to check for bitrot/flips.
> Another issue to consider is that each SW upgrade will result in loosing the counters saved in userspace and reset all. Otherwise, system upgrade process will also have to be updated.
Does it hurt if these counters are lost upon an upgrade?
Why do we need them for ever?
If they start after an upgrade from 0 again heavily read PEBs will quickly gain a high counter and will be checked.
And of course these counters can be preserved. One can also place them into a UBI static volume.
Or use a sane upgrade process...
As I wrote in my last mail we could also create a new internal UBI volume to store these counters.
Then you can have the logic in kernel but don't have to change the UBI on-disk layout.
> The read counters are very much like the ec counters used for wear-leveling; One is updated on each erase, other on each read; One is used to handle issues caused by frequent
> writes (erase operations), the other handle issues caused by frequent reads.
> So how are the two different? Why isn't wear-leveling (and erase counters) handled by userspace? My guess that the decision to encapsulate the wear-leveling into the kernel was due
> to the above mentioned reasons.
The erase counters are crucial for UBI to operate. Even while booting up the kernel and mounting UBIFS the EC counters have to available
because UBI maybe needs to move LEBs around or has to find free PEBs which are not worn out. I UBI makes here a bad decision things will break.
Again, to my understanding read counters are just a rough indicator when to have a check.
If we don't do this check immediately, nothing will go bad. As I understand the feature it is something like "Oh, the following PEBs got read a lot in the last few hours, let's
trigger a check later." Same applies for the timestamps.
Thanks,
//richard
P.s: Is my assumption correct that read counters are needed because newer MLC-NANDs are so crappy? ;-)
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling
2014-10-29 12:00 ` Richard Weinberger
@ 2014-10-31 13:12 ` Tanya Brokhman
2014-10-31 15:34 ` Richard Weinberger
0 siblings, 1 reply; 38+ messages in thread
From: Tanya Brokhman @ 2014-10-31 13:12 UTC (permalink / raw)
To: Richard Weinberger, dedekind1; +Cc: linux-arm-msm, linux-mtd
Hi Richard
On 10/29/2014 2:00 PM, Richard Weinberger wrote:
> Tanya,
>
> Am 29.10.2014 um 12:03 schrieb Tanya Brokhman:
>> I'll try to address all you comments in one place.
>> You're right that the read counters don't have to be exact but they do have to reflect the real state.
>
> But it does not really matter if the counters are a way to high or too low?
> It does also not matter if a re-read of adjacent PEBs is issued too often.
> It won't hurt.
>
>> Regarding your idea of saving them to a file, or somehow with userspace involved; This is doable, but such solution will depend on user space implementation:
>> - one need to update kernel with correct read counters (saved somewhere in userspace)
>> - it is required on every boot.
>> - saving the counters back to userspace should be periodically triggered as well.
>> So the minimal workflow for each boot life cycle will be:
>> - on boot: update kernel with correct values from userspace
>
> Correct.
>
>> - kernel updates the counters on each read operation
>
> Yeah, that's a plain simple in kernel counter..
>
>> - on powerdown: save the updated kernel counters back to userspace
>
> Correct. The counters can also be saved once a day by cron.
> If one or two save operations are missed it won't hurt either.
>
>> The read-disturb handling is based on kernel updating and monitoring read counters. Taking this out of the kernel space will result in an incomplete and very fragile solution for
>> the read-disturb problem since the dependency in userspace is just too big.
>
> Why?
> We both agree on the fact that the counters don't have to be exact.
> Maybe I'm wrong but to my understanding they are just a rough indicator that sometime later UBI has to check for bitrot/flips.
The idea is to prevent data loss, to prevent errors while reading,
because we might hit errors we can't fix. So although the
read_disturb_threshold is a rough estimation based on statistics, we
can't ignore it and need to stay close to the calculated statistics.
Its really the same as wear-leveling. You have a limitation that each
peb can be erased limited number of times. This erase-limit is also an
estimation based on statistics collected by the card vendor. But you do
want to know the exact number of erase counter to prevent erasing the
block extensively.
>
>> Another issue to consider is that each SW upgrade will result in loosing the counters saved in userspace and reset all. Otherwise, system upgrade process will also have to be updated.
>
> Does it hurt if these counters are lost upon an upgrade?
> Why do we need them for ever?
> If they start after an upgrade from 0 again heavily read PEBs will quickly gain a high counter and will be checked.
yes, we do need the ACCURATE counters and cant loose them. For example:
we have a heavily read block. It was read from 100 times when the
read-threshold is 101. Meaning, the 101 read will most probably fail.
You do a SW upgrade, and set the read-counter for this block as 0 and
don't scrubb it. Next time you try reading from it (since it's heavily
read from block), you'll get errors. If you're lucky, ecc will fx them
for you, but its not guarantied.
>
> And of course these counters can be preserved. One can also place them into a UBI static volume.
> Or use a sane upgrade process...
"Sane upgrade" means that in order to support read-disturb we twist the
users hand into implementing not a trivial logic in userspace.
>
> As I wrote in my last mail we could also create a new internal UBI volume to store these counters.
> Then you can have the logic in kernel but don't have to change the UBI on-disk layout.
>
>> The read counters are very much like the ec counters used for wear-leveling; One is updated on each erase, other on each read; One is used to handle issues caused by frequent
>> writes (erase operations), the other handle issues caused by frequent reads.
>> So how are the two different? Why isn't wear-leveling (and erase counters) handled by userspace? My guess that the decision to encapsulate the wear-leveling into the kernel was due
>> to the above mentioned reasons.
>
> The erase counters are crucial for UBI to operate. Even while booting up the kernel and mounting UBIFS the EC counters have to available
> because UBI maybe needs to move LEBs around or has to find free PEBs which are not worn out. I UBI makes here a bad decision things will break.
Same with read-counters and last_erase_timestamps. If ec counters are
lost, we might get with bad blocks (since they are worn out) and have
data loss.
If we ignore read-disturb and don't' scrubb heavily read blocks we will
have data loss as well.
the only difference between the 2 scenarios is "how long before it
happens". Read-disturb wasn't an issue since average lifespan of a nand
device was ~5 years. Read-disturb occurs in a longer lifespan. that's
why it's required now: a need for a "long life nand".
>
> Again, to my understanding read counters are just a rough indicator when to have a check.
> If we don't do this check immediately, nothing will go bad. As I understand the feature it is something like "Oh, the following PEBs got read a lot in the last few hours, let's
> trigger a check later." Same applies for the timestamps.
I'm afraid your understanding is inaccurate :) Hope I explained in the
prev paragraph why.
>
> Thanks,
> //richard
>
> P.s: Is my assumption correct that read counters are needed because newer MLC-NANDs are so crappy? ;-)
>
Thanks,
Tanya Brokhman
--
Qualcomm Israel, on behalf of Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling
2014-10-31 13:12 ` Tanya Brokhman
@ 2014-10-31 15:34 ` Richard Weinberger
2014-10-31 15:39 ` Richard Weinberger
2014-11-02 13:23 ` Tanya Brokhman
0 siblings, 2 replies; 38+ messages in thread
From: Richard Weinberger @ 2014-10-31 15:34 UTC (permalink / raw)
To: Tanya Brokhman, dedekind1; +Cc: linux-arm-msm, linux-mtd
Hi Tanya,
Am 31.10.2014 um 14:12 schrieb Tanya Brokhman:
> Hi Richard
>
> On 10/29/2014 2:00 PM, Richard Weinberger wrote:
>> Tanya,
>>
>> Am 29.10.2014 um 12:03 schrieb Tanya Brokhman:
>>> I'll try to address all you comments in one place.
>>> You're right that the read counters don't have to be exact but they do have to reflect the real state.
>>
>> But it does not really matter if the counters are a way to high or too low?
>> It does also not matter if a re-read of adjacent PEBs is issued too often.
>> It won't hurt.
>>
>>> Regarding your idea of saving them to a file, or somehow with userspace involved; This is doable, but such solution will depend on user space implementation:
>>> - one need to update kernel with correct read counters (saved somewhere in userspace)
>>> - it is required on every boot.
>>> - saving the counters back to userspace should be periodically triggered as well.
>>> So the minimal workflow for each boot life cycle will be:
>>> - on boot: update kernel with correct values from userspace
>>
>> Correct.
>>
>>> - kernel updates the counters on each read operation
>>
>> Yeah, that's a plain simple in kernel counter..
>>
>>> - on powerdown: save the updated kernel counters back to userspace
>>
>> Correct. The counters can also be saved once a day by cron.
>> If one or two save operations are missed it won't hurt either.
>>
>>> The read-disturb handling is based on kernel updating and monitoring read counters. Taking this out of the kernel space will result in an incomplete and very fragile solution for
>>> the read-disturb problem since the dependency in userspace is just too big.
>>
>> Why?
>> We both agree on the fact that the counters don't have to be exact.
>> Maybe I'm wrong but to my understanding they are just a rough indicator that sometime later UBI has to check for bitrot/flips.
>
> The idea is to prevent data loss, to prevent errors while reading, because we might hit errors we can't fix. So although the read_disturb_threshold is a rough estimation based on
> statistics, we can't ignore it and need to stay close to the calculated statistics.
>
> Its really the same as wear-leveling. You have a limitation that each peb can be erased limited number of times. This erase-limit is also an estimation based on statistics
> collected by the card vendor. But you do want to know the exact number of erase counter to prevent erasing the block extensively.
So you have to update the EC-Header every time we read a PEB...?
>
>>
>>> Another issue to consider is that each SW upgrade will result in loosing the counters saved in userspace and reset all. Otherwise, system upgrade process will also have to be
>>> updated.
>>
>> Does it hurt if these counters are lost upon an upgrade?
>> Why do we need them for ever?
>> If they start after an upgrade from 0 again heavily read PEBs will quickly gain a high counter and will be checked.
>
> yes, we do need the ACCURATE counters and cant loose them. For example: we have a heavily read block. It was read from 100 times when the read-threshold is 101. Meaning, the 101
> read will most probably fail.
You are trying me to tell that the NAND is that crappy that it will die after 100 reads? I really hope this was just a bad example.
You *will* loose counters unless you update the EC-Header upon every read, which is also not sane at all.
> You do a SW upgrade, and set the read-counter for this block as 0 and don't scrubb it. Next time you try reading from it (since it's heavily read from block), you'll get errors. If
> you're lucky, ecc will fx them for you, but its not guarantied.
>
>>
>> And of course these counters can be preserved. One can also place them into a UBI static volume.
>> Or use a sane upgrade process...
>
> "Sane upgrade" means that in order to support read-disturb we twist the users hand into implementing not a trivial logic in userspace.
>
>>
>> As I wrote in my last mail we could also create a new internal UBI volume to store these counters.
>> Then you can have the logic in kernel but don't have to change the UBI on-disk layout.
>>
>>> The read counters are very much like the ec counters used for wear-leveling; One is updated on each erase, other on each read; One is used to handle issues caused by frequent
>>> writes (erase operations), the other handle issues caused by frequent reads.
>>> So how are the two different? Why isn't wear-leveling (and erase counters) handled by userspace? My guess that the decision to encapsulate the wear-leveling into the kernel was due
>>> to the above mentioned reasons.
>>
>> The erase counters are crucial for UBI to operate. Even while booting up the kernel and mounting UBIFS the EC counters have to available
>> because UBI maybe needs to move LEBs around or has to find free PEBs which are not worn out. I UBI makes here a bad decision things will break.
>
> Same with read-counters and last_erase_timestamps. If ec counters are lost, we might get with bad blocks (since they are worn out) and have data loss.
> If we ignore read-disturb and don't' scrubb heavily read blocks we will have data loss as well.
> the only difference between the 2 scenarios is "how long before it happens". Read-disturb wasn't an issue since average lifespan of a nand device was ~5 years. Read-disturb occurs
> in a longer lifespan. that's why it's required now: a need for a "long life nand".
Okay, read-disturb will only happen if you read blocks *very* often. Do you have numbers, datasheets, etc...?
Let's recap.
We need to address two issues:
a) If a PEB is ready very often we need to scrub it.
b) PEBs which are not read for a very long time need to be re-read/scrubbed to detect bit-rot
Solving b) is easy, just re-read every PEB from time to time. No persistent data at all is needed.
To solve a) you suggest adding the read-counter to the UBI on-disk layout like the erase-counter values.
I don't think that this is a good solution.
We can perfectly fine save the read-counters from time to time and upon detach either to a file on UBIFS
or into a new internal value. As read-disturb will only happen after a long time and hence very high read-counters
it does not matter if we lose some values upon a powercut. i.e. Such that a counter is 50000 instead of 50500.
Btw: We also have to be very careful that reading data will not wear out the flash.
So, we need a logic within UBI which counts every read access and persists this data in some way.
As suggested in an earlier mail this can also be done purely in userspace.
It can also be done within the UBI kernel module. I.e. by storing the counters into a internal volume.
My point is that no on-disk layout change at all is needed.
Thanks,
//richard
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling
2014-10-31 15:34 ` Richard Weinberger
@ 2014-10-31 15:39 ` Richard Weinberger
2014-10-31 22:55 ` Jeff Lauruhn (jlauruhn)
2014-11-02 13:25 ` Tanya Brokhman
2014-11-02 13:23 ` Tanya Brokhman
1 sibling, 2 replies; 38+ messages in thread
From: Richard Weinberger @ 2014-10-31 15:39 UTC (permalink / raw)
To: Tanya Brokhman, dedekind1; +Cc: linux-arm-msm, linux-mtd
Am 31.10.2014 um 16:34 schrieb Richard Weinberger:
> Hi Tanya,
>
> Am 31.10.2014 um 14:12 schrieb Tanya Brokhman:
>> Hi Richard
>>
>> On 10/29/2014 2:00 PM, Richard Weinberger wrote:
>>> Tanya,
>>>
>>> Am 29.10.2014 um 12:03 schrieb Tanya Brokhman:
>>>> I'll try to address all you comments in one place.
>>>> You're right that the read counters don't have to be exact but they do have to reflect the real state.
>>>
>>> But it does not really matter if the counters are a way to high or too low?
>>> It does also not matter if a re-read of adjacent PEBs is issued too often.
>>> It won't hurt.
>>>
>>>> Regarding your idea of saving them to a file, or somehow with userspace involved; This is doable, but such solution will depend on user space implementation:
>>>> - one need to update kernel with correct read counters (saved somewhere in userspace)
>>>> - it is required on every boot.
>>>> - saving the counters back to userspace should be periodically triggered as well.
>>>> So the minimal workflow for each boot life cycle will be:
>>>> - on boot: update kernel with correct values from userspace
>>>
>>> Correct.
>>>
>>>> - kernel updates the counters on each read operation
>>>
>>> Yeah, that's a plain simple in kernel counter..
>>>
>>>> - on powerdown: save the updated kernel counters back to userspace
>>>
>>> Correct. The counters can also be saved once a day by cron.
>>> If one or two save operations are missed it won't hurt either.
>>>
>>>> The read-disturb handling is based on kernel updating and monitoring read counters. Taking this out of the kernel space will result in an incomplete and very fragile solution for
>>>> the read-disturb problem since the dependency in userspace is just too big.
>>>
>>> Why?
>>> We both agree on the fact that the counters don't have to be exact.
>>> Maybe I'm wrong but to my understanding they are just a rough indicator that sometime later UBI has to check for bitrot/flips.
>>
>> The idea is to prevent data loss, to prevent errors while reading, because we might hit errors we can't fix. So although the read_disturb_threshold is a rough estimation based on
>> statistics, we can't ignore it and need to stay close to the calculated statistics.
>>
>> Its really the same as wear-leveling. You have a limitation that each peb can be erased limited number of times. This erase-limit is also an estimation based on statistics
>> collected by the card vendor. But you do want to know the exact number of erase counter to prevent erasing the block extensively.
>
> So you have to update the EC-Header every time we read a PEB...?
>
>>
>>>
>>>> Another issue to consider is that each SW upgrade will result in loosing the counters saved in userspace and reset all. Otherwise, system upgrade process will also have to be
>>>> updated.
>>>
>>> Does it hurt if these counters are lost upon an upgrade?
>>> Why do we need them for ever?
>>> If they start after an upgrade from 0 again heavily read PEBs will quickly gain a high counter and will be checked.
>>
>> yes, we do need the ACCURATE counters and cant loose them. For example: we have a heavily read block. It was read from 100 times when the read-threshold is 101. Meaning, the 101
>> read will most probably fail.
>
> You are trying me to tell that the NAND is that crappy that it will die after 100 reads? I really hope this was just a bad example.
> You *will* loose counters unless you update the EC-Header upon every read, which is also not sane at all.
>
>> You do a SW upgrade, and set the read-counter for this block as 0 and don't scrubb it. Next time you try reading from it (since it's heavily read from block), you'll get errors. If
>> you're lucky, ecc will fx them for you, but its not guarantied.
>>
>>>
>>> And of course these counters can be preserved. One can also place them into a UBI static volume.
>>> Or use a sane upgrade process...
>>
>> "Sane upgrade" means that in order to support read-disturb we twist the users hand into implementing not a trivial logic in userspace.
>>
>>>
>>> As I wrote in my last mail we could also create a new internal UBI volume to store these counters.
>>> Then you can have the logic in kernel but don't have to change the UBI on-disk layout.
>>>
>>>> The read counters are very much like the ec counters used for wear-leveling; One is updated on each erase, other on each read; One is used to handle issues caused by frequent
>>>> writes (erase operations), the other handle issues caused by frequent reads.
>>>> So how are the two different? Why isn't wear-leveling (and erase counters) handled by userspace? My guess that the decision to encapsulate the wear-leveling into the kernel was due
>>>> to the above mentioned reasons.
>>>
>>> The erase counters are crucial for UBI to operate. Even while booting up the kernel and mounting UBIFS the EC counters have to available
>>> because UBI maybe needs to move LEBs around or has to find free PEBs which are not worn out. I UBI makes here a bad decision things will break.
>>
>> Same with read-counters and last_erase_timestamps. If ec counters are lost, we might get with bad blocks (since they are worn out) and have data loss.
>> If we ignore read-disturb and don't' scrubb heavily read blocks we will have data loss as well.
>> the only difference between the 2 scenarios is "how long before it happens". Read-disturb wasn't an issue since average lifespan of a nand device was ~5 years. Read-disturb occurs
>> in a longer lifespan. that's why it's required now: a need for a "long life nand".
>
> Okay, read-disturb will only happen if you read blocks *very* often. Do you have numbers, datasheets, etc...?
>
> Let's recap.
>
> We need to address two issues:
> a) If a PEB is ready very often we need to scrub it.
> b) PEBs which are not read for a very long time need to be re-read/scrubbed to detect bit-rot
>
> Solving b) is easy, just re-read every PEB from time to time. No persistent data at all is needed.
> To solve a) you suggest adding the read-counter to the UBI on-disk layout like the erase-counter values.
> I don't think that this is a good solution.
> We can perfectly fine save the read-counters from time to time and upon detach either to a file on UBIFS
> or into a new internal value. As read-disturb will only happen after a long time and hence very high read-counters
> it does not matter if we lose some values upon a powercut. i.e. Such that a counter is 50000 instead of 50500.
> Btw: We also have to be very careful that reading data will not wear out the flash.
>
> So, we need a logic within UBI which counts every read access and persists this data in some way.
> As suggested in an earlier mail this can also be done purely in userspace.
> It can also be done within the UBI kernel module. I.e. by storing the counters into a internal volume.
Another point:
What if we scrub every PEB once a week?
Why would that not work?
Thanks,
//richard
^ permalink raw reply [flat|nested] 38+ messages in thread
* RE: [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling
2014-10-31 15:39 ` Richard Weinberger
@ 2014-10-31 22:55 ` Jeff Lauruhn (jlauruhn)
2014-11-02 13:30 ` Tanya Brokhman
2014-11-02 13:25 ` Tanya Brokhman
1 sibling, 1 reply; 38+ messages in thread
From: Jeff Lauruhn (jlauruhn) @ 2014-10-31 22:55 UTC (permalink / raw)
To: Richard Weinberger, Tanya Brokhman, dedekind1@gmail.com
Cc: linux-arm-msm@vger.kernel.org, linux-mtd@lists.infradead.org
Hope I'm not over stepping here, but I thought I could help. I'm a NAND AE.
Are you using NAND or eMMC? If NAND why not use ECC to monitor for disturb? NAND is a great storage unit, but you have to follow the rules. Please refer to Micron datasheet MT29F2G08ABAEAH4 page 100. NAND is made up of blocks(2048 in this case), each block has a number of pages. The block is the smallest erasable unit and the only way to change 0's to 1's. Pages are the smallest programmable unit and can only change 1's to 0's. P/E cycling (100,000 in this case) wears out the block. We provide 64bytes of spare area for BCH ECC and NAND management. BCH ECC will tell you if bits have changed and will correct up to 5.
Read disturb is a recoverable failure. It doesn't affect the cells in the page you are reading it affects the cells on either side of the page you are reading. P/E cycling for this device is 100,000. You can program once and read many many times.
Data retention is the loss of charge on the cells. Technically you can only change a 0 to 1 by erasing the whole block. However, data retention is the loss of charge in a cell over time. In this case data retention is 10 years. Data retention gets worse as temperature goes up.
-----Original Message-----
From: linux-mtd [mailto:linux-mtd-bounces@lists.infradead.org] On Behalf Of Richard Weinberger
Sent: Friday, October 31, 2014 8:40 AM
To: Tanya Brokhman; dedekind1@gmail.com
Cc: linux-arm-msm@vger.kernel.org; linux-mtd@lists.infradead.org
Subject: Re: [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling
Am 31.10.2014 um 16:34 schrieb Richard Weinberger:
> Hi Tanya,
>
> Am 31.10.2014 um 14:12 schrieb Tanya Brokhman:
>> Hi Richard
>>
>> On 10/29/2014 2:00 PM, Richard Weinberger wrote:
>>> Tanya,
>>>
>>> Am 29.10.2014 um 12:03 schrieb Tanya Brokhman:
>>>> I'll try to address all you comments in one place.
>>>> You're right that the read counters don't have to be exact but they do have to reflect the real state.
>>>
>>> But it does not really matter if the counters are a way to high or too low?
>>> It does also not matter if a re-read of adjacent PEBs is issued too often.
>>> It won't hurt.
>>>
>>>> Regarding your idea of saving them to a file, or somehow with userspace involved; This is doable, but such solution will depend on user space implementation:
>>>> - one need to update kernel with correct read counters (saved
>>>> somewhere in userspace)
>>>> - it is required on every boot.
>>>> - saving the counters back to userspace should be periodically triggered as well.
>>>> So the minimal workflow for each boot life cycle will be:
>>>> - on boot: update kernel with correct values from userspace
>>>
>>> Correct.
>>>
>>>> - kernel updates the counters on each read operation
>>>
>>> Yeah, that's a plain simple in kernel counter..
>>>
>>>> - on powerdown: save the updated kernel counters back to userspace
>>>
>>> Correct. The counters can also be saved once a day by cron.
>>> If one or two save operations are missed it won't hurt either.
>>>
>>>> The read-disturb handling is based on kernel updating and
>>>> monitoring read counters. Taking this out of the kernel space will result in an incomplete and very fragile solution for the read-disturb problem since the dependency in userspace is just too big.
>>>
>>> Why?
>>> We both agree on the fact that the counters don't have to be exact.
>>> Maybe I'm wrong but to my understanding they are just a rough indicator that sometime later UBI has to check for bitrot/flips.
>>
>> The idea is to prevent data loss, to prevent errors while reading,
>> because we might hit errors we can't fix. So although the read_disturb_threshold is a rough estimation based on statistics, we can't ignore it and need to stay close to the calculated statistics.
>>
>> Its really the same as wear-leveling. You have a limitation that each
>> peb can be erased limited number of times. This erase-limit is also an estimation based on statistics collected by the card vendor. But you do want to know the exact number of erase counter to prevent erasing the block extensively.
>
> So you have to update the EC-Header every time we read a PEB...?
>
>>
>>>
>>>> Another issue to consider is that each SW upgrade will result in
>>>> loosing the counters saved in userspace and reset all. Otherwise, system upgrade process will also have to be updated.
>>>
>>> Does it hurt if these counters are lost upon an upgrade?
>>> Why do we need them for ever?
>>> If they start after an upgrade from 0 again heavily read PEBs will quickly gain a high counter and will be checked.
>>
>> yes, we do need the ACCURATE counters and cant loose them. For
>> example: we have a heavily read block. It was read from 100 times when the read-threshold is 101. Meaning, the 101 read will most probably fail.
>
> You are trying me to tell that the NAND is that crappy that it will die after 100 reads? I really hope this was just a bad example.
> You *will* loose counters unless you update the EC-Header upon every read, which is also not sane at all.
>
>> You do a SW upgrade, and set the read-counter for this block as 0 and
>> don't scrubb it. Next time you try reading from it (since it's heavily read from block), you'll get errors. If you're lucky, ecc will fx them for you, but its not guarantied.
>>
>>>
>>> And of course these counters can be preserved. One can also place them into a UBI static volume.
>>> Or use a sane upgrade process...
>>
>> "Sane upgrade" means that in order to support read-disturb we twist the users hand into implementing not a trivial logic in userspace.
>>
>>>
>>> As I wrote in my last mail we could also create a new internal UBI volume to store these counters.
>>> Then you can have the logic in kernel but don't have to change the UBI on-disk layout.
>>>
>>>> The read counters are very much like the ec counters used for
>>>> wear-leveling; One is updated on each erase, other on each read; One is used to handle issues caused by frequent writes (erase operations), the other handle issues caused by frequent reads.
>>>> So how are the two different? Why isn't wear-leveling (and erase
>>>> counters) handled by userspace? My guess that the decision to encapsulate the wear-leveling into the kernel was due to the above mentioned reasons.
>>>
>>> The erase counters are crucial for UBI to operate. Even while
>>> booting up the kernel and mounting UBIFS the EC counters have to available because UBI maybe needs to move LEBs around or has to find free PEBs which are not worn out. I UBI makes here a bad decision things will break.
>>
>> Same with read-counters and last_erase_timestamps. If ec counters are lost, we might get with bad blocks (since they are worn out) and have data loss.
>> If we ignore read-disturb and don't' scrubb heavily read blocks we will have data loss as well.
>> the only difference between the 2 scenarios is "how long before it
>> happens". Read-disturb wasn't an issue since average lifespan of a nand device was ~5 years. Read-disturb occurs in a longer lifespan. that's why it's required now: a need for a "long life nand".
>
> Okay, read-disturb will only happen if you read blocks *very* often. Do you have numbers, datasheets, etc...?
>
> Let's recap.
>
> We need to address two issues:
> a) If a PEB is ready very often we need to scrub it.
> b) PEBs which are not read for a very long time need to be
> re-read/scrubbed to detect bit-rot
>
> Solving b) is easy, just re-read every PEB from time to time. No persistent data at all is needed.
> To solve a) you suggest adding the read-counter to the UBI on-disk layout like the erase-counter values.
> I don't think that this is a good solution.
> We can perfectly fine save the read-counters from time to time and
> upon detach either to a file on UBIFS or into a new internal value. As
> read-disturb will only happen after a long time and hence very high read-counters it does not matter if we lose some values upon a powercut. i.e. Such that a counter is 50000 instead of 50500.
> Btw: We also have to be very careful that reading data will not wear out the flash.
>
> So, we need a logic within UBI which counts every read access and persists this data in some way.
> As suggested in an earlier mail this can also be done purely in userspace.
> It can also be done within the UBI kernel module. I.e. by storing the counters into a internal volume.
Another point:
What if we scrub every PEB once a week?
Why would that not work?
Thanks,
//richard
______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling
2014-10-31 15:34 ` Richard Weinberger
2014-10-31 15:39 ` Richard Weinberger
@ 2014-11-02 13:23 ` Tanya Brokhman
2014-11-02 13:54 ` Richard Weinberger
1 sibling, 1 reply; 38+ messages in thread
From: Tanya Brokhman @ 2014-11-02 13:23 UTC (permalink / raw)
To: Richard Weinberger, dedekind1; +Cc: linux-arm-msm, linux-mtd
On 10/31/2014 5:34 PM, Richard Weinberger wrote:
> Hi Tanya,
>
> Am 31.10.2014 um 14:12 schrieb Tanya Brokhman:
>> Hi Richard
>>
>> On 10/29/2014 2:00 PM, Richard Weinberger wrote:
>>> Tanya,
>>>
>>> Am 29.10.2014 um 12:03 schrieb Tanya Brokhman:
>>>> I'll try to address all you comments in one place.
>>>> You're right that the read counters don't have to be exact but they do have to reflect the real state.
>>>
>>> But it does not really matter if the counters are a way to high or too low?
>>> It does also not matter if a re-read of adjacent PEBs is issued too often.
>>> It won't hurt.
>>>
>>>> Regarding your idea of saving them to a file, or somehow with userspace involved; This is doable, but such solution will depend on user space implementation:
>>>> - one need to update kernel with correct read counters (saved somewhere in userspace)
>>>> - it is required on every boot.
>>>> - saving the counters back to userspace should be periodically triggered as well.
>>>> So the minimal workflow for each boot life cycle will be:
>>>> - on boot: update kernel with correct values from userspace
>>>
>>> Correct.
>>>
>>>> - kernel updates the counters on each read operation
>>>
>>> Yeah, that's a plain simple in kernel counter..
>>>
>>>> - on powerdown: save the updated kernel counters back to userspace
>>>
>>> Correct. The counters can also be saved once a day by cron.
>>> If one or two save operations are missed it won't hurt either.
>>>
>>>> The read-disturb handling is based on kernel updating and monitoring read counters. Taking this out of the kernel space will result in an incomplete and very fragile solution for
>>>> the read-disturb problem since the dependency in userspace is just too big.
>>>
>>> Why?
>>> We both agree on the fact that the counters don't have to be exact.
>>> Maybe I'm wrong but to my understanding they are just a rough indicator that sometime later UBI has to check for bitrot/flips.
>>
>> The idea is to prevent data loss, to prevent errors while reading, because we might hit errors we can't fix. So although the read_disturb_threshold is a rough estimation based on
>> statistics, we can't ignore it and need to stay close to the calculated statistics.
>>
>> Its really the same as wear-leveling. You have a limitation that each peb can be erased limited number of times. This erase-limit is also an estimation based on statistics
>> collected by the card vendor. But you do want to know the exact number of erase counter to prevent erasing the block extensively.
>
> So you have to update the EC-Header every time we read a PEB...?
No, I can't save the read-counter as part of the ec_header because of
the erase-before-write limitation. Thats why the read-counters are saved
only as part of the fastmap data.
last_erase_timestamp is saved as part of the ec_header and it's updated
each erase operation together with the erase counter. For
last_erase_timestamp I used the reserved bytes of the ec_header so not
much impact here.
>
>>
>>>
>>>> Another issue to consider is that each SW upgrade will result in loosing the counters saved in userspace and reset all. Otherwise, system upgrade process will also have to be
>>>> updated.
>>>
>>> Does it hurt if these counters are lost upon an upgrade?
>>> Why do we need them for ever?
>>> If they start after an upgrade from 0 again heavily read PEBs will quickly gain a high counter and will be checked.
>>
>> yes, we do need the ACCURATE counters and cant loose them. For example: we have a heavily read block. It was read from 100 times when the read-threshold is 101. Meaning, the 101
>> read will most probably fail.
>
> You are trying me to tell that the NAND is that crappy that it will die after 100 reads? I really hope this was just a bad example.
Of course not :) it was just an example. The actual value for
read-disturb is huge and is defined by the NAND manufacturer.
> You *will* loose counters unless you update the EC-Header upon every read, which is also not sane at all.
>
>> You do a SW upgrade, and set the read-counter for this block as 0 and don't scrubb it. Next time you try reading from it (since it's heavily read from block), you'll get errors. If
>> you're lucky, ecc will fx them for you, but its not guarantied.
>>
>>>
>>> And of course these counters can be preserved. One can also place them into a UBI static volume.
>>> Or use a sane upgrade process...
>>
>> "Sane upgrade" means that in order to support read-disturb we twist the users hand into implementing not a trivial logic in userspace.
>>
>>>
>>> As I wrote in my last mail we could also create a new internal UBI volume to store these counters.
>>> Then you can have the logic in kernel but don't have to change the UBI on-disk layout.
>>>
>>>> The read counters are very much like the ec counters used for wear-leveling; One is updated on each erase, other on each read; One is used to handle issues caused by frequent
>>>> writes (erase operations), the other handle issues caused by frequent reads.
>>>> So how are the two different? Why isn't wear-leveling (and erase counters) handled by userspace? My guess that the decision to encapsulate the wear-leveling into the kernel was due
>>>> to the above mentioned reasons.
>>>
>>> The erase counters are crucial for UBI to operate. Even while booting up the kernel and mounting UBIFS the EC counters have to available
>>> because UBI maybe needs to move LEBs around or has to find free PEBs which are not worn out. I UBI makes here a bad decision things will break.
>>
>> Same with read-counters and last_erase_timestamps. If ec counters are lost, we might get with bad blocks (since they are worn out) and have data loss.
>> If we ignore read-disturb and don't' scrubb heavily read blocks we will have data loss as well.
>> the only difference between the 2 scenarios is "how long before it happens". Read-disturb wasn't an issue since average lifespan of a nand device was ~5 years. Read-disturb occurs
>> in a longer lifespan. that's why it's required now: a need for a "long life nand".
>
> Okay, read-disturb will only happen if you read blocks *very* often. Do you have numbers, datasheets, etc...?
Yes. In 0001-mtd-ubi-Read-disturb-infrastructure.patch you'll find:
#define UBI_RD_THRESHOLD 100000
Can't share more than that. This value is defined by card manufacturer
and configurable by this define.
>
> Let's recap.
>
> We need to address two issues:
> a) If a PEB is ready very often we need to scrub it.
right. this is what the read-counter is for.
> b) PEBs which are not read for a very long time need to be re-read/scrubbed to detect bit-rot
it need to be scrubbed. this is for data retention and these pebs are
found by last_erase_timestamp. I referred to them as "pebs that are
rarely accessed. "
>
> Solving b) is easy, just re-read every PEB from time to time. No persistent data at all is needed.
That isn't good enough. Because if we just re-read the peb we will find
the "problematic" once only when the read produces ecc errors. But if we
relay on that we may be too late because we might hit ecc errors that we
just wont be able to fix and data will be lost. So the goal is *to
prevent* ecc errors on read. That's why we need both the read-counter
(for heavily read pebs) and the last_erase_timestamp (for once that are
rarely accessed).
> To solve a) you suggest adding the read-counter to the UBI on-disk layout like the erase-counter values.
No, not on disk layout. You're mixing the read-counter with the
last_erase_timestamp.
read-counter: maintained only at RAM, saved *only* as part of fastmap
data. If fastmap data is lost: read counters are lost too
last-erase-timestamp: part of ec_header, maintained on disk
> I don't think that this is a good solution.
> We can perfectly fine save the read-counters from time to time and upon detach either to a file on UBIFS
> or into a new internal value. As read-disturb will only happen after a long time and hence very high read-counters
> it does not matter if we lose some values upon a powercut. i.e. Such that a counter is 50000 instead of 50500.
> Btw: We also have to be very careful that reading data will not wear out the flash.
>
> So, we need a logic within UBI which counts every read access and persists this data in some way.
> As suggested in an earlier mail this can also be done purely in userspace.
> It can also be done within the UBI kernel module. I.e. by storing the counters into a internal volume.
>
> My point is that no on-disk layout change at all is needed.
I hope my previous answer addressed the above as well, since you
misunderstood where the read-counters will be saved.
BTW, I described it all in the documentation file I added in patch #1 :)
>
> Thanks,
> //richard
>
Thanks,
Tanya Brokhman
--
Qualcomm Israel, on behalf of Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling
2014-10-31 15:39 ` Richard Weinberger
2014-10-31 22:55 ` Jeff Lauruhn (jlauruhn)
@ 2014-11-02 13:25 ` Tanya Brokhman
2014-11-06 8:07 ` Artem Bityutskiy
1 sibling, 1 reply; 38+ messages in thread
From: Tanya Brokhman @ 2014-11-02 13:25 UTC (permalink / raw)
To: Richard Weinberger, dedekind1; +Cc: linux-arm-msm, linux-mtd
On 10/31/2014 5:39 PM, Richard Weinberger wrote:
> Am 31.10.2014 um 16:34 schrieb Richard Weinberger:
>> Hi Tanya,
>>
>> Am 31.10.2014 um 14:12 schrieb Tanya Brokhman:
>>> Hi Richard
>>>
>>> On 10/29/2014 2:00 PM, Richard Weinberger wrote:
>>>> Tanya,
>>>>
>>>> Am 29.10.2014 um 12:03 schrieb Tanya Brokhman:
>>>>> I'll try to address all you comments in one place.
>>>>> You're right that the read counters don't have to be exact but they do have to reflect the real state.
>>>>
>>>> But it does not really matter if the counters are a way to high or too low?
>>>> It does also not matter if a re-read of adjacent PEBs is issued too often.
>>>> It won't hurt.
>>>>
>>>>> Regarding your idea of saving them to a file, or somehow with userspace involved; This is doable, but such solution will depend on user space implementation:
>>>>> - one need to update kernel with correct read counters (saved somewhere in userspace)
>>>>> - it is required on every boot.
>>>>> - saving the counters back to userspace should be periodically triggered as well.
>>>>> So the minimal workflow for each boot life cycle will be:
>>>>> - on boot: update kernel with correct values from userspace
>>>>
>>>> Correct.
>>>>
>>>>> - kernel updates the counters on each read operation
>>>>
>>>> Yeah, that's a plain simple in kernel counter..
>>>>
>>>>> - on powerdown: save the updated kernel counters back to userspace
>>>>
>>>> Correct. The counters can also be saved once a day by cron.
>>>> If one or two save operations are missed it won't hurt either.
>>>>
>>>>> The read-disturb handling is based on kernel updating and monitoring read counters. Taking this out of the kernel space will result in an incomplete and very fragile solution for
>>>>> the read-disturb problem since the dependency in userspace is just too big.
>>>>
>>>> Why?
>>>> We both agree on the fact that the counters don't have to be exact.
>>>> Maybe I'm wrong but to my understanding they are just a rough indicator that sometime later UBI has to check for bitrot/flips.
>>>
>>> The idea is to prevent data loss, to prevent errors while reading, because we might hit errors we can't fix. So although the read_disturb_threshold is a rough estimation based on
>>> statistics, we can't ignore it and need to stay close to the calculated statistics.
>>>
>>> Its really the same as wear-leveling. You have a limitation that each peb can be erased limited number of times. This erase-limit is also an estimation based on statistics
>>> collected by the card vendor. But you do want to know the exact number of erase counter to prevent erasing the block extensively.
>>
>> So you have to update the EC-Header every time we read a PEB...?
>>
>>>
>>>>
>>>>> Another issue to consider is that each SW upgrade will result in loosing the counters saved in userspace and reset all. Otherwise, system upgrade process will also have to be
>>>>> updated.
>>>>
>>>> Does it hurt if these counters are lost upon an upgrade?
>>>> Why do we need them for ever?
>>>> If they start after an upgrade from 0 again heavily read PEBs will quickly gain a high counter and will be checked.
>>>
>>> yes, we do need the ACCURATE counters and cant loose them. For example: we have a heavily read block. It was read from 100 times when the read-threshold is 101. Meaning, the 101
>>> read will most probably fail.
>>
>> You are trying me to tell that the NAND is that crappy that it will die after 100 reads? I really hope this was just a bad example.
>> You *will* loose counters unless you update the EC-Header upon every read, which is also not sane at all.
>>
>>> You do a SW upgrade, and set the read-counter for this block as 0 and don't scrubb it. Next time you try reading from it (since it's heavily read from block), you'll get errors. If
>>> you're lucky, ecc will fx them for you, but its not guarantied.
>>>
>>>>
>>>> And of course these counters can be preserved. One can also place them into a UBI static volume.
>>>> Or use a sane upgrade process...
>>>
>>> "Sane upgrade" means that in order to support read-disturb we twist the users hand into implementing not a trivial logic in userspace.
>>>
>>>>
>>>> As I wrote in my last mail we could also create a new internal UBI volume to store these counters.
>>>> Then you can have the logic in kernel but don't have to change the UBI on-disk layout.
>>>>
>>>>> The read counters are very much like the ec counters used for wear-leveling; One is updated on each erase, other on each read; One is used to handle issues caused by frequent
>>>>> writes (erase operations), the other handle issues caused by frequent reads.
>>>>> So how are the two different? Why isn't wear-leveling (and erase counters) handled by userspace? My guess that the decision to encapsulate the wear-leveling into the kernel was due
>>>>> to the above mentioned reasons.
>>>>
>>>> The erase counters are crucial for UBI to operate. Even while booting up the kernel and mounting UBIFS the EC counters have to available
>>>> because UBI maybe needs to move LEBs around or has to find free PEBs which are not worn out. I UBI makes here a bad decision things will break.
>>>
>>> Same with read-counters and last_erase_timestamps. If ec counters are lost, we might get with bad blocks (since they are worn out) and have data loss.
>>> If we ignore read-disturb and don't' scrubb heavily read blocks we will have data loss as well.
>>> the only difference between the 2 scenarios is "how long before it happens". Read-disturb wasn't an issue since average lifespan of a nand device was ~5 years. Read-disturb occurs
>>> in a longer lifespan. that's why it's required now: a need for a "long life nand".
>>
>> Okay, read-disturb will only happen if you read blocks *very* often. Do you have numbers, datasheets, etc...?
>>
>> Let's recap.
>>
>> We need to address two issues:
>> a) If a PEB is ready very often we need to scrub it.
>> b) PEBs which are not read for a very long time need to be re-read/scrubbed to detect bit-rot
>>
>> Solving b) is easy, just re-read every PEB from time to time. No persistent data at all is needed.
>> To solve a) you suggest adding the read-counter to the UBI on-disk layout like the erase-counter values.
>> I don't think that this is a good solution.
>> We can perfectly fine save the read-counters from time to time and upon detach either to a file on UBIFS
>> or into a new internal value. As read-disturb will only happen after a long time and hence very high read-counters
>> it does not matter if we lose some values upon a powercut. i.e. Such that a counter is 50000 instead of 50500.
>> Btw: We also have to be very careful that reading data will not wear out the flash.
>>
>> So, we need a logic within UBI which counts every read access and persists this data in some way.
>> As suggested in an earlier mail this can also be done purely in userspace.
>> It can also be done within the UBI kernel module. I.e. by storing the counters into a internal volume.
>
> Another point:
> What if we scrub every PEB once a week?
> Why would that not work?
It will work but it's an overkill because we don't want to scrub (and
erase) pebs that don't need this because this way we will ware out the
device in terms on wear-leveling.
Besides, scrubbing all pebs will also be a performance hit.
>
> Thanks,
> //richard
>
Thanks,
Tanya Brokhman
--
Qualcomm Israel, on behalf of Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling
2014-10-31 22:55 ` Jeff Lauruhn (jlauruhn)
@ 2014-11-02 13:30 ` Tanya Brokhman
2014-11-07 9:21 ` Artem Bityutskiy
0 siblings, 1 reply; 38+ messages in thread
From: Tanya Brokhman @ 2014-11-02 13:30 UTC (permalink / raw)
To: linux-mtd
On 11/1/2014 12:55 AM, Jeff Lauruhn (jlauruhn) wrote:
> Hope I'm not over stepping here, but I thought I could help. I'm a NAND AE.
Not at all. Thank you very much :)
>
> Are you using NAND or eMMC?
RAW NAND. eMMC is managed NAND so it handles this all for us :)
> If NAND why not use ECC to monitor for disturb?
We don't want just to monitor, we want to prevent cases where ecc cant
be fixed. You said it yourself later on "BCH ECC will tell you if bits
have changed and will correct up to 5". The goal is to prevent more then
5 errors that can't be fixed.
NAND is a great storage unit, but you have to follow the rules. Please
refer to Micron datasheet MT29F2G08ABAEAH4 page 100. NAND is made up of
blocks(2048 in this case), each block has a number of pages. The block
is the smallest erasable unit and the only way to change 0's to 1's.
Pages are the smallest programmable unit and can only change 1's to 0's.
P/E cycling (100,000 in this case) wears out the block. We provide
64bytes of spare area for BCH ECC and NAND management. BCH ECC will
tell you if bits have changed and will correct up to 5.
>
> Read disturb is a recoverable failure. It doesn't affect the cells in the page you are reading it affects the cells on either side of the page you are reading. P/E cycling for this device is 100,000. You can program once and read many many times.
>
> Data retention is the loss of charge on the cells. Technically you can only change a 0 to 1 by erasing the whole block. However, data retention is the loss of charge in a cell over time. In this case data retention is 10 years.
> Data retention gets worse as temperature goes up.
Exactly! We're aware of all you described above. This is exactly why we
need to handle both read disturb and data retention.
Thanks,
Tanya Brokhman
--
Qualcomm Israel, on behalf of Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling
2014-11-02 13:23 ` Tanya Brokhman
@ 2014-11-02 13:54 ` Richard Weinberger
2014-11-02 14:12 ` Tanya Brokhman
0 siblings, 1 reply; 38+ messages in thread
From: Richard Weinberger @ 2014-11-02 13:54 UTC (permalink / raw)
To: Tanya Brokhman, dedekind1; +Cc: linux-arm-msm, linux-mtd
Am 02.11.2014 um 14:23 schrieb Tanya Brokhman:
>> Okay, read-disturb will only happen if you read blocks *very* often. Do you have numbers, datasheets, etc...?
>
> Yes. In 0001-mtd-ubi-Read-disturb-infrastructure.patch you'll find:
> #define UBI_RD_THRESHOLD 100000
> Can't share more than that. This value is defined by card manufacturer and configurable by this define.
Somehow I managed to oversee that value. It is as large as I expected.
But is is *very* sad that you can't share more details.
We'd have make this value configurable at runtime.
Other manufacturers may have other magical values...
>>
>> Let's recap.
>>
>> We need to address two issues:
>> a) If a PEB is ready very often we need to scrub it.
>
> right. this is what the read-counter is for.
>
>> b) PEBs which are not read for a very long time need to be re-read/scrubbed to detect bit-rot
>
> it need to be scrubbed. this is for data retention and these pebs are found by last_erase_timestamp. I referred to them as "pebs that are rarely accessed. "
>
>>
>> Solving b) is easy, just re-read every PEB from time to time. No persistent data at all is needed.
>
> That isn't good enough. Because if we just re-read the peb we will find the "problematic" once only when the read produces ecc errors. But if we relay on that we may be too late
> because we might hit ecc errors that we just wont be able to fix and data will be lost. So the goal is *to prevent* ecc errors on read. That's why we need both the read-counter
> (for heavily read pebs) and the last_erase_timestamp (for once that are rarely accessed).
>
>> To solve a) you suggest adding the read-counter to the UBI on-disk layout like the erase-counter values.
>
> No, not on disk layout. You're mixing the read-counter with the last_erase_timestamp.
> read-counter: maintained only at RAM, saved *only* as part of fastmap data. If fastmap data is lost: read counters are lost too
> last-erase-timestamp: part of ec_header, maintained on disk
You're right I mixed that up. Sorry.
Copy&Pasting from your other mail:
>> Another point:
>> What if we scrub every PEB once a week?
>> Why would that not work?
>
> It will work but it's an overkill because we don't want to scrub (and erase) pebs that don't need this because this way we will ware out the device in terms on wear-leveling.
> Besides, scrubbing all pebs will also be a performance hit.
A year has 52 weeks. So, in 10 (!) years we would scrub each PEB only 520 times.
Even if we scrub every day we'd only scrub each PEB 3650 times in 10 years.
I don't see any overhead at all. Of course only a stupid implementation would scrub them at once, this would
be a performance issue.
Back to topic.
Storing the read-counters into fastmap also not a good idea because the fastmap can get lost completely (by design).
Better store the read-counter lazily into a new internal UBI volume (use UBI_COMPAT_PRESERVE).
This way you can make sure that they are not lost.
I suggest the following:
a) Maintain the erase-counters in RAM
b) From time to time write them to an internal UBI volume. (e.g. at detach time and once a day).
c) Implement a logic in UBI which scrubs a PEB if it got a lot of reads.
You could do c) even in userspace.
And for bit-rot detection you can do the same, but with timestamps instead of read-counters...
Artem, what do you think?
Thanks,
//richard
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling
2014-11-02 13:54 ` Richard Weinberger
@ 2014-11-02 14:12 ` Tanya Brokhman
2014-11-02 17:02 ` Richard Weinberger
0 siblings, 1 reply; 38+ messages in thread
From: Tanya Brokhman @ 2014-11-02 14:12 UTC (permalink / raw)
To: Richard Weinberger, dedekind1; +Cc: linux-arm-msm, linux-mtd
On 11/2/2014 3:54 PM, Richard Weinberger wrote:
> Am 02.11.2014 um 14:23 schrieb Tanya Brokhman:
>>> Okay, read-disturb will only happen if you read blocks *very* often. Do you have numbers, datasheets, etc...?
>>
>> Yes. In 0001-mtd-ubi-Read-disturb-infrastructure.patch you'll find:
>> #define UBI_RD_THRESHOLD 100000
>> Can't share more than that. This value is defined by card manufacturer and configurable by this define.
>
> Somehow I managed to oversee that value. It is as large as I expected.
> But is is *very* sad that you can't share more details.
> We'd have make this value configurable at runtime.
> Other manufacturers may have other magical values...
Yea, thought of that as well while answering your email. will do. thanks!
>
>>>
>>> Let's recap.
>>>
>>> We need to address two issues:
>>> a) If a PEB is ready very often we need to scrub it.
>>
>> right. this is what the read-counter is for.
>>
>>> b) PEBs which are not read for a very long time need to be re-read/scrubbed to detect bit-rot
>>
>> it need to be scrubbed. this is for data retention and these pebs are found by last_erase_timestamp. I referred to them as "pebs that are rarely accessed. "
>>
>>>
>>> Solving b) is easy, just re-read every PEB from time to time. No persistent data at all is needed.
>>
>> That isn't good enough. Because if we just re-read the peb we will find the "problematic" once only when the read produces ecc errors. But if we relay on that we may be too late
>> because we might hit ecc errors that we just wont be able to fix and data will be lost. So the goal is *to prevent* ecc errors on read. That's why we need both the read-counter
>> (for heavily read pebs) and the last_erase_timestamp (for once that are rarely accessed).
>>
>>> To solve a) you suggest adding the read-counter to the UBI on-disk layout like the erase-counter values.
>>
>> No, not on disk layout. You're mixing the read-counter with the last_erase_timestamp.
>> read-counter: maintained only at RAM, saved *only* as part of fastmap data. If fastmap data is lost: read counters are lost too
>> last-erase-timestamp: part of ec_header, maintained on disk
>
> You're right I mixed that up. Sorry.
>
> Copy&Pasting from your other mail:
>
>>> Another point:
>>> What if we scrub every PEB once a week?
>>> Why would that not work?
>>
>> It will work but it's an overkill because we don't want to scrub (and erase) pebs that don't need this because this way we will ware out the device in terms on wear-leveling.
>> Besides, scrubbing all pebs will also be a performance hit.
>
> A year has 52 weeks. So, in 10 (!) years we would scrub each PEB only 520 times.
> Even if we scrub every day we'd only scrub each PEB 3650 times in 10 years.
> I don't see any overhead at all. Of course only a stupid implementation would scrub them at once, this would
> be a performance issue.
>
> Back to topic.
> Storing the read-counters into fastmap also not a good idea because the fastmap can get lost completely (by design).
yes,I'm aware of that. We have a default value for that case, and we're
trying to avoid fastmap being invalid...
> Better store the read-counter lazily into a new internal UBI volume (use UBI_COMPAT_PRESERVE).
not familiar with UBI_COMPAT_PRESERVE. will look into this and consider
your suggestion.
> This way you can make sure that they are not lost.
>
> I suggest the following:
> a) Maintain the erase-counters in RAM
> b) From time to time write them to an internal UBI volume. (e.g. at detach time and once a day).
> c) Implement a logic in UBI which scrubs a PEB if it got a lot of reads.
> You could do c) even in userspace.
>
> And for bit-rot detection you can do the same, but with timestamps instead of read-counters...
>
> Artem, what do you think?
>
> Thanks,
> //richard
>
Thanks,
Tanya Brokhman
--
Qualcomm Israel, on behalf of Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling
2014-11-02 14:12 ` Tanya Brokhman
@ 2014-11-02 17:02 ` Richard Weinberger
2014-11-02 17:18 ` Tanya Brokhman
0 siblings, 1 reply; 38+ messages in thread
From: Richard Weinberger @ 2014-11-02 17:02 UTC (permalink / raw)
To: Tanya Brokhman, dedekind1; +Cc: linux-arm-msm, linux-mtd
Am 02.11.2014 um 15:12 schrieb Tanya Brokhman:
>> Back to topic.
>> Storing the read-counters into fastmap also not a good idea because the fastmap can get lost completely (by design).
>
> yes,I'm aware of that. We have a default value for that case, and we're trying to avoid fastmap being invalid...
Here be dragons. Can you please share these modifications?
Thanks,
//richard
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling
2014-11-02 17:02 ` Richard Weinberger
@ 2014-11-02 17:18 ` Tanya Brokhman
0 siblings, 0 replies; 38+ messages in thread
From: Tanya Brokhman @ 2014-11-02 17:18 UTC (permalink / raw)
To: Richard Weinberger, dedekind1; +Cc: linux-arm-msm, linux-mtd
On 11/2/2014 7:02 PM, Richard Weinberger wrote:
> Am 02.11.2014 um 15:12 schrieb Tanya Brokhman:
>>> Back to topic.
>>> Storing the read-counters into fastmap also not a good idea because the fastmap can get lost completely (by design).
>>
>> yes,I'm aware of that. We have a default value for that case, and we're trying to avoid fastmap being invalid...
>
> Here be dragons. Can you please share these modifications?
nothing to share yet :) work in progress. more thought then code at this
point. will share as soon as have something valid...
>
> Thanks,
> //richard
>
Thanks,
Tanya Brokhman
--
Qualcomm Israel, on behalf of Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling
2014-11-02 13:25 ` Tanya Brokhman
@ 2014-11-06 8:07 ` Artem Bityutskiy
2014-11-06 12:16 ` Tanya Brokhman
0 siblings, 1 reply; 38+ messages in thread
From: Artem Bityutskiy @ 2014-11-06 8:07 UTC (permalink / raw)
To: Tanya Brokhman; +Cc: Richard Weinberger, linux-mtd, linux-arm-msm
On Sun, 2014-11-02 at 15:25 +0200, Tanya Brokhman wrote:
> On 10/31/2014 5:39 PM, Richard Weinberger wrote:
> > Am 31.10.2014 um 16:34 schrieb Richard Weinberger:
> >> Hi Tanya,
> >>
> >> Am 31.10.2014 um 14:12 schrieb Tanya Brokhman:
> >>> Hi Richard
> >>>
> >>> On 10/29/2014 2:00 PM, Richard Weinberger wrote:
> >>>> Tanya,
> >>>>
> >>>> Am 29.10.2014 um 12:03 schrieb Tanya Brokhman:
> >>>>> I'll try to address all you comments in one place.
> >>>>> You're right that the read counters don't have to be exact but they do have to reflect the real state.
> >>>>
> >>>> But it does not really matter if the counters are a way to high or too low?
> >>>> It does also not matter if a re-read of adjacent PEBs is issued too often.
> >>>> It won't hurt.
> >>>>
> >>>>> Regarding your idea of saving them to a file, or somehow with userspace involved; This is doable, but such solution will depend on user space implementation:
> >>>>> - one need to update kernel with correct read counters (saved somewhere in userspace)
> >>>>> - it is required on every boot.
> >>>>> - saving the counters back to userspace should be periodically triggered as well.
> >>>>> So the minimal workflow for each boot life cycle will be:
> >>>>> - on boot: update kernel with correct values from userspace
> >>>>
> >>>> Correct.
> >>>>
> >>>>> - kernel updates the counters on each read operation
> >>>>
> >>>> Yeah, that's a plain simple in kernel counter..
> >>>>
> >>>>> - on powerdown: save the updated kernel counters back to userspace
> >>>>
> >>>> Correct. The counters can also be saved once a day by cron.
> >>>> If one or two save operations are missed it won't hurt either.
> >>>>
> >>>>> The read-disturb handling is based on kernel updating and monitoring read counters. Taking this out of the kernel space will result in an incomplete and very fragile solution for
> >>>>> the read-disturb problem since the dependency in userspace is just too big.
> >>>>
> >>>> Why?
> >>>> We both agree on the fact that the counters don't have to be exact.
> >>>> Maybe I'm wrong but to my understanding they are just a rough indicator that sometime later UBI has to check for bitrot/flips.
> >>>
> >>> The idea is to prevent data loss, to prevent errors while reading, because we might hit errors we can't fix. So although the read_disturb_threshold is a rough estimation based on
> >>> statistics, we can't ignore it and need to stay close to the calculated statistics.
> >>>
> >>> Its really the same as wear-leveling. You have a limitation that each peb can be erased limited number of times. This erase-limit is also an estimation based on statistics
> >>> collected by the card vendor. But you do want to know the exact number of erase counter to prevent erasing the block extensively.
> >>
> >> So you have to update the EC-Header every time we read a PEB...?
> >>
> >>>
> >>>>
> >>>>> Another issue to consider is that each SW upgrade will result in loosing the counters saved in userspace and reset all. Otherwise, system upgrade process will also have to be
> >>>>> updated.
> >>>>
> >>>> Does it hurt if these counters are lost upon an upgrade?
> >>>> Why do we need them for ever?
> >>>> If they start after an upgrade from 0 again heavily read PEBs will quickly gain a high counter and will be checked.
> >>>
> >>> yes, we do need the ACCURATE counters and cant loose them. For example: we have a heavily read block. It was read from 100 times when the read-threshold is 101. Meaning, the 101
> >>> read will most probably fail.
> >>
> >> You are trying me to tell that the NAND is that crappy that it will die after 100 reads? I really hope this was just a bad example.
> >> You *will* loose counters unless you update the EC-Header upon every read, which is also not sane at all.
> >>
> >>> You do a SW upgrade, and set the read-counter for this block as 0 and don't scrubb it. Next time you try reading from it (since it's heavily read from block), you'll get errors. If
> >>> you're lucky, ecc will fx them for you, but its not guarantied.
> >>>
> >>>>
> >>>> And of course these counters can be preserved. One can also place them into a UBI static volume.
> >>>> Or use a sane upgrade process...
> >>>
> >>> "Sane upgrade" means that in order to support read-disturb we twist the users hand into implementing not a trivial logic in userspace.
> >>>
> >>>>
> >>>> As I wrote in my last mail we could also create a new internal UBI volume to store these counters.
> >>>> Then you can have the logic in kernel but don't have to change the UBI on-disk layout.
> >>>>
> >>>>> The read counters are very much like the ec counters used for wear-leveling; One is updated on each erase, other on each read; One is used to handle issues caused by frequent
> >>>>> writes (erase operations), the other handle issues caused by frequent reads.
> >>>>> So how are the two different? Why isn't wear-leveling (and erase counters) handled by userspace? My guess that the decision to encapsulate the wear-leveling into the kernel was due
> >>>>> to the above mentioned reasons.
> >>>>
> >>>> The erase counters are crucial for UBI to operate. Even while booting up the kernel and mounting UBIFS the EC counters have to available
> >>>> because UBI maybe needs to move LEBs around or has to find free PEBs which are not worn out. I UBI makes here a bad decision things will break.
> >>>
> >>> Same with read-counters and last_erase_timestamps. If ec counters are lost, we might get with bad blocks (since they are worn out) and have data loss.
> >>> If we ignore read-disturb and don't' scrubb heavily read blocks we will have data loss as well.
> >>> the only difference between the 2 scenarios is "how long before it happens". Read-disturb wasn't an issue since average lifespan of a nand device was ~5 years. Read-disturb occurs
> >>> in a longer lifespan. that's why it's required now: a need for a "long life nand".
> >>
> >> Okay, read-disturb will only happen if you read blocks *very* often. Do you have numbers, datasheets, etc...?
> >>
> >> Let's recap.
> >>
> >> We need to address two issues:
> >> a) If a PEB is ready very often we need to scrub it.
> >> b) PEBs which are not read for a very long time need to be re-read/scrubbed to detect bit-rot
> >>
> >> Solving b) is easy, just re-read every PEB from time to time. No persistent data at all is needed.
> >> To solve a) you suggest adding the read-counter to the UBI on-disk layout like the erase-counter values.
> >> I don't think that this is a good solution.
> >> We can perfectly fine save the read-counters from time to time and upon detach either to a file on UBIFS
> >> or into a new internal value. As read-disturb will only happen after a long time and hence very high read-counters
> >> it does not matter if we lose some values upon a powercut. i.e. Such that a counter is 50000 instead of 50500.
> >> Btw: We also have to be very careful that reading data will not wear out the flash.
> >>
> >> So, we need a logic within UBI which counts every read access and persists this data in some way.
> >> As suggested in an earlier mail this can also be done purely in userspace.
> >> It can also be done within the UBI kernel module. I.e. by storing the counters into a internal volume.
> >
> > Another point:
> > What if we scrub every PEB once a week?
> > Why would that not work?
>
> It will work but it's an overkill because we don't want to scrub (and
> erase) pebs that don't need this because this way we will ware out the
> device in terms on wear-leveling.
But the point is - how do you know if they need it or not? How do you
prove that the thresholds are correct, and not too low? What makes you
believe that it is the same for all eraseblocks?
>From what I read in technical papers some blocs are better and some are
worse. There are different nano-defects in them. Some start misbehaving
earlier than others, depending on the temperature.
There is also the "radiation" effect. Say, you have 3 continuous PEBs A,
B, C. And some PEB D which is far away from them. You never change A, C,
and D, you only read them. And you change B many times. IIUC, the
radiation effect is that A and C will accumulate bit-flips earlier than
D, because D is being erased and re-programmed.
Now the counters approach does not take this into account.
On the contrary, reading data and scrubbing on the "need-to-do" basis
takes into account whatever weird effect there is.
Maintaining counters is hard job, and easy to take wrong. Besides, you
lose them on power cuts, so they are not mathematically correct anyway.
And there is guess-work anyway. And you do not take into account all the
NAND effects.
So why bothering with the complexity instead of just dealing with
problems on the "by fact" basis: some weird NAND effect happened and we
see bit-flips? Fine, we just scrub and "refresh" our data. We do not
know that exactly was the effect, but we know how to detect it and how
to deal with it.
Isn't it simple and robust approach?
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling
2014-11-06 8:07 ` Artem Bityutskiy
@ 2014-11-06 12:16 ` Tanya Brokhman
2014-11-07 8:55 ` Artem Bityutskiy
2014-11-07 8:58 ` Artem Bityutskiy
0 siblings, 2 replies; 38+ messages in thread
From: Tanya Brokhman @ 2014-11-06 12:16 UTC (permalink / raw)
To: linux-mtd
On 11/6/2014 10:07 AM, Artem Bityutskiy wrote:
> On Sun, 2014-11-02 at 15:25 +0200, Tanya Brokhman wrote:
>> On 10/31/2014 5:39 PM, Richard Weinberger wrote:
>>> Am 31.10.2014 um 16:34 schrieb Richard Weinberger:
>>>> Hi Tanya,
>>>>
>>>> Am 31.10.2014 um 14:12 schrieb Tanya Brokhman:
>>>>> Hi Richard
>>>>>
>>>>> On 10/29/2014 2:00 PM, Richard Weinberger wrote:
>>>>>> Tanya,
>>>>>>
>>>>>> Am 29.10.2014 um 12:03 schrieb Tanya Brokhman:
>>>>>>> I'll try to address all you comments in one place.
>>>>>>> You're right that the read counters don't have to be exact but they do have to reflect the real state.
>>>>>>
>>>>>> But it does not really matter if the counters are a way to high or too low?
>>>>>> It does also not matter if a re-read of adjacent PEBs is issued too often.
>>>>>> It won't hurt.
>>>>>>
>>>>>>> Regarding your idea of saving them to a file, or somehow with userspace involved; This is doable, but such solution will depend on user space implementation:
>>>>>>> - one need to update kernel with correct read counters (saved somewhere in userspace)
>>>>>>> - it is required on every boot.
>>>>>>> - saving the counters back to userspace should be periodically triggered as well.
>>>>>>> So the minimal workflow for each boot life cycle will be:
>>>>>>> - on boot: update kernel with correct values from userspace
>>>>>>
>>>>>> Correct.
>>>>>>
>>>>>>> - kernel updates the counters on each read operation
>>>>>>
>>>>>> Yeah, that's a plain simple in kernel counter..
>>>>>>
>>>>>>> - on powerdown: save the updated kernel counters back to userspace
>>>>>>
>>>>>> Correct. The counters can also be saved once a day by cron.
>>>>>> If one or two save operations are missed it won't hurt either.
>>>>>>
>>>>>>> The read-disturb handling is based on kernel updating and monitoring read counters. Taking this out of the kernel space will result in an incomplete and very fragile solution for
>>>>>>> the read-disturb problem since the dependency in userspace is just too big.
>>>>>>
>>>>>> Why?
>>>>>> We both agree on the fact that the counters don't have to be exact.
>>>>>> Maybe I'm wrong but to my understanding they are just a rough indicator that sometime later UBI has to check for bitrot/flips.
>>>>>
>>>>> The idea is to prevent data loss, to prevent errors while reading, because we might hit errors we can't fix. So although the read_disturb_threshold is a rough estimation based on
>>>>> statistics, we can't ignore it and need to stay close to the calculated statistics.
>>>>>
>>>>> Its really the same as wear-leveling. You have a limitation that each peb can be erased limited number of times. This erase-limit is also an estimation based on statistics
>>>>> collected by the card vendor. But you do want to know the exact number of erase counter to prevent erasing the block extensively.
>>>>
>>>> So you have to update the EC-Header every time we read a PEB...?
>>>>
>>>>>
>>>>>>
>>>>>>> Another issue to consider is that each SW upgrade will result in loosing the counters saved in userspace and reset all. Otherwise, system upgrade process will also have to be
>>>>>>> updated.
>>>>>>
>>>>>> Does it hurt if these counters are lost upon an upgrade?
>>>>>> Why do we need them for ever?
>>>>>> If they start after an upgrade from 0 again heavily read PEBs will quickly gain a high counter and will be checked.
>>>>>
>>>>> yes, we do need the ACCURATE counters and cant loose them. For example: we have a heavily read block. It was read from 100 times when the read-threshold is 101. Meaning, the 101
>>>>> read will most probably fail.
>>>>
>>>> You are trying me to tell that the NAND is that crappy that it will die after 100 reads? I really hope this was just a bad example.
>>>> You *will* loose counters unless you update the EC-Header upon every read, which is also not sane at all.
>>>>
>>>>> You do a SW upgrade, and set the read-counter for this block as 0 and don't scrubb it. Next time you try reading from it (since it's heavily read from block), you'll get errors. If
>>>>> you're lucky, ecc will fx them for you, but its not guarantied.
>>>>>
>>>>>>
>>>>>> And of course these counters can be preserved. One can also place them into a UBI static volume.
>>>>>> Or use a sane upgrade process...
>>>>>
>>>>> "Sane upgrade" means that in order to support read-disturb we twist the users hand into implementing not a trivial logic in userspace.
>>>>>
>>>>>>
>>>>>> As I wrote in my last mail we could also create a new internal UBI volume to store these counters.
>>>>>> Then you can have the logic in kernel but don't have to change the UBI on-disk layout.
>>>>>>
>>>>>>> The read counters are very much like the ec counters used for wear-leveling; One is updated on each erase, other on each read; One is used to handle issues caused by frequent
>>>>>>> writes (erase operations), the other handle issues caused by frequent reads.
>>>>>>> So how are the two different? Why isn't wear-leveling (and erase counters) handled by userspace? My guess that the decision to encapsulate the wear-leveling into the kernel was due
>>>>>>> to the above mentioned reasons.
>>>>>>
>>>>>> The erase counters are crucial for UBI to operate. Even while booting up the kernel and mounting UBIFS the EC counters have to available
>>>>>> because UBI maybe needs to move LEBs around or has to find free PEBs which are not worn out. I UBI makes here a bad decision things will break.
>>>>>
>>>>> Same with read-counters and last_erase_timestamps. If ec counters are lost, we might get with bad blocks (since they are worn out) and have data loss.
>>>>> If we ignore read-disturb and don't' scrubb heavily read blocks we will have data loss as well.
>>>>> the only difference between the 2 scenarios is "how long before it happens". Read-disturb wasn't an issue since average lifespan of a nand device was ~5 years. Read-disturb occurs
>>>>> in a longer lifespan. that's why it's required now: a need for a "long life nand".
>>>>
>>>> Okay, read-disturb will only happen if you read blocks *very* often. Do you have numbers, datasheets, etc...?
>>>>
>>>> Let's recap.
>>>>
>>>> We need to address two issues:
>>>> a) If a PEB is ready very often we need to scrub it.
>>>> b) PEBs which are not read for a very long time need to be re-read/scrubbed to detect bit-rot
>>>>
>>>> Solving b) is easy, just re-read every PEB from time to time. No persistent data at all is needed.
>>>> To solve a) you suggest adding the read-counter to the UBI on-disk layout like the erase-counter values.
>>>> I don't think that this is a good solution.
>>>> We can perfectly fine save the read-counters from time to time and upon detach either to a file on UBIFS
>>>> or into a new internal value. As read-disturb will only happen after a long time and hence very high read-counters
>>>> it does not matter if we lose some values upon a powercut. i.e. Such that a counter is 50000 instead of 50500.
>>>> Btw: We also have to be very careful that reading data will not wear out the flash.
>>>>
>>>> So, we need a logic within UBI which counts every read access and persists this data in some way.
>>>> As suggested in an earlier mail this can also be done purely in userspace.
>>>> It can also be done within the UBI kernel module. I.e. by storing the counters into a internal volume.
>>>
>>> Another point:
>>> What if we scrub every PEB once a week?
>>> Why would that not work?
>>
>> It will work but it's an overkill because we don't want to scrub (and
>> erase) pebs that don't need this because this way we will ware out the
>> device in terms on wear-leveling.
>
> But the point is - how do you know if they need it or not? How do you
> prove that the thresholds are correct, and not too low? What makes you
> believe that it is the same for all eraseblocks?
the thresholds are configurable. The exact numbers are just estimation
based on statistics i suppose. anyway, we got the values from the NAND
manufacturer, as defined by the spec.
I guess its not the same for all blocks but close enough, so the defined
threshold should be good enough for all blocks.
>
> From what I read in technical papers some blocs are better and some are
> worse. There are different nano-defects in them. Some start misbehaving
> earlier than others, depending on the temperature.
>
> There is also the "radiation" effect. Say, you have 3 continuous PEBs A,
> B, C. And some PEB D which is far away from them. You never change A, C,
> and D, you only read them. And you change B many times. IIUC, the
> radiation effect is that A and C will accumulate bit-flips earlier than
> D, because D is being erased and re-programmed.
I think you got confused with the block-names (A B C D) because it
doesn't make sense :) You said D never changes, just read from.
>
> Now the counters approach does not take this into account.
Yes, but for this we will scrub all PEBs from time to time
>
> On the contrary, reading data and scrubbing on the "need-to-do" basis
> takes into account whatever weird effect there is.
>
> Maintaining counters is hard job, and easy to take wrong.
Why? You increment on each READ, reset on each ERASE.
The only thing that is considered "hard" is how to handle power-cuts but
for this we have a default value and we're working on some other ideas
how to preserved them better.
Besides, you
> lose them on power cuts, so they are not mathematically correct anyway.
> And there is guess-work anyway. And you do not take into account all the
> NAND effects.
>
> So why bothering with the complexity instead of just dealing with
> problems on the "by fact" basis: some weird NAND effect happened and we
> see bit-flips? Fine, we just scrub and "refresh" our data.
Because when you see bit-flips it may be too late and you might get too
many errors then you're capable of fixing. What I'm trying to say - it
may be too late and you may lose data here. "preferred to prevent rather
than cure".
We do not
> know that exactly was the effect, but we know how to detect it and how
> to deal with it.
>
> Isn't it simple and robust approach?
>
>
> ______________________________________________________
> Linux MTD discussion mailing list
> http://lists.infradead.org/mailman/listinfo/linux-mtd/
>
Thanks,
Tanya Brokhman
--
Qualcomm Israel, on behalf of Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
Forum, a Linux Foundation Collaborative Project
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling
2014-11-06 12:16 ` Tanya Brokhman
@ 2014-11-07 8:55 ` Artem Bityutskiy
2014-11-07 8:58 ` Artem Bityutskiy
1 sibling, 0 replies; 38+ messages in thread
From: Artem Bityutskiy @ 2014-11-07 8:55 UTC (permalink / raw)
To: Tanya Brokhman; +Cc: linux-mtd
On Thu, 2014-11-06 at 14:16 +0200, Tanya Brokhman wrote:
> > There is also the "radiation" effect. Say, you have 3 continuous
> PEBs A,
> > B, C. And some PEB D which is far away from them. You never change
> A, C,
> > and D, you only read them. And you change B many times. IIUC, the
> > radiation effect is that A and C will accumulate bit-flips earlier
> than
> > D, because D is being erased and re-programmed.
>
> I think you got confused with the block-names (A B C D) because it
> doesn't make sense :) You said D never changes, just read from.
A, C - read, written, erased relatively often.
B, D - only read with the same rate.
B will gain bit flips quicker than D due to the radiation effect. On
some NANDs like MLC a lot quicker.
Disclaimer: I am not a HW expert, this is my understanding.
The patches in question do not cover this "radiation" effect.
The "read all periodically and scrub if enough bit-flips" approach
covers the radiation effect, just like the other NAND effects
(read-disturb, aging).
Artem.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling
2014-11-06 12:16 ` Tanya Brokhman
2014-11-07 8:55 ` Artem Bityutskiy
@ 2014-11-07 8:58 ` Artem Bityutskiy
2014-11-11 20:36 ` Tanya Brokhman
1 sibling, 1 reply; 38+ messages in thread
From: Artem Bityutskiy @ 2014-11-07 8:58 UTC (permalink / raw)
To: Tanya Brokhman; +Cc: linux-mtd
On Thu, 2014-11-06 at 14:16 +0200, Tanya Brokhman wrote:
> What I'm trying to say - it
> may be too late and you may lose data here. "preferred to prevent rather
> than cure".
First of all, just to clarify, I do not have a goal of turning down your
patches. I just want to understand why this is the best design, and if
it is helpful to all Linux MTD users.
Modern flashes have strong ECC codes protecting against many bit-flips.
MTD even was modified to stop reporting about a single or few bit-flips,
because those happen too often and they are "harmless", and do not
require scrubbing. We have the threshold value in MTD for this, which is
configurable, of course.
Bit-flips develop slowly over time. If you get one more bit-flips, it is
not too late yet. You can mitigate the "too late" part by reading more
often of course.
You also may lower the bit-flip threshold when reading for scrubbing.
Could you try to "sell" your design in a way that it becomes clear why
it is better than just reading the entire flash periodically. Some hard
experimental data would be preferable.
The advantages of the "read all periodically" approach were:
1. Simple, no modifications needed
2. No need to write if the media is read-only, except when scrubbing
happens.
3. Should cover all the NAND effects, including the "radiation" one.
And disadvantages of your design were:
1. Need modifications, rather large, changes binary format, needs more
ram.
2. Does not cover all the NAND effects
3. Is not transparent to the user
4. If system time is incorrectly set, may cause a storm of I/O
(scrubbing) and may put the system to it's knees before user-space has a
chance to fix-up the system time.
5. Needs more writes on the R/O system (to maintain read counters)
Also, it is not clear if with your design we save energy. Reads a lot
less need less energy than writes and erases (to maintain read
counters). May you save energy comparing to the read-all periodically
approach. May be not.
Artem.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling
2014-11-02 13:30 ` Tanya Brokhman
@ 2014-11-07 9:21 ` Artem Bityutskiy
0 siblings, 0 replies; 38+ messages in thread
From: Artem Bityutskiy @ 2014-11-07 9:21 UTC (permalink / raw)
To: Tanya Brokhman, Jeff Lauruhn (jlauruhn)
Cc: linux-arm-msm@vger.kernel.org, linux-mtd@lists.infradead.org,
Richard Weinberger
On Sun, 2014-11-02 at 15:30 +0200, Tanya Brokhman wrote:
> > If NAND why not use ECC to monitor for disturb?
>
> We don't want just to monitor, we want to prevent cases where ecc cant
> be fixed. You said it yourself later on "BCH ECC will tell you if bits
> have changed and will correct up to 5". The goal is to prevent more then
> 5 errors that can't be fixed.
>
> NAND is a great storage unit, but you have to follow the rules. Please
> refer to Micron datasheet MT29F2G08ABAEAH4 page 100. NAND is made up of
> blocks(2048 in this case), each block has a number of pages. The block
> is the smallest erasable unit and the only way to change 0's to 1's.
> Pages are the smallest programmable unit and can only change 1's to 0's.
> P/E cycling (100,000 in this case) wears out the block. We provide
> 64bytes of spare area for BCH ECC and NAND management. BCH ECC will
> tell you if bits have changed and will correct up to 5.
> >
> > Read disturb is a recoverable failure. It doesn't affect the cells in the page you are reading it affects the cells on either side of the page you are reading. P/E cycling for this device is 100,000. You can program once and read many many times.
> >
> > Data retention is the loss of charge on the cells. Technically you can only change a 0 to 1 by erasing the whole block. However, data retention is the loss of charge in a cell over time. In this case data retention is 10 years.
> > Data retention gets worse as temperature goes up.
>
> Exactly! We're aware of all you described above. This is exactly why we
> need to handle both read disturb and data retention.
Hi Tanya,
just a friendly notice: did you notice that you drop all the CCs in the
reply? Even the person you replied to was not in "To". I guess it is
worth checking your e-mail client's settings.
Jeff, my main concern about the patches is whether they really address
NAND problems, and whether the complexity they introduce are worth it.
The counter-approach is to just read the entire flash periodically, and
just scrub the PEBs (physical eraseblocks) which have have enough
bit-flips (more than a configured threshold per ECC unit, say 1 or 2).
I tried to explain my concerns in here:
http://lists.infradead.org/pipermail/linux-mtd/2014-November/056385.html
http://lists.infradead.org/pipermail/linux-mtd/2014-November/056386.html
Thanks!
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling
[not found] <201411101307.03225.jbe@pengutronix.de>
@ 2014-11-10 12:35 ` Richard Weinberger
2014-11-10 13:12 ` Juergen Borleis
2014-11-10 13:13 ` Ricard Wanderlof
0 siblings, 2 replies; 38+ messages in thread
From: Richard Weinberger @ 2014-11-10 12:35 UTC (permalink / raw)
To: Juergen Borleis, linux-mtd; +Cc: jlauruhn, tlinder
Am 10.11.2014 um 13:07 schrieb Juergen Borleis:
> Hi Richard,
>
> sorry to jump in so lately:
>
> Richard Weinberger wrote:
>>> If we ignore read-disturb and don't' scrubb heavily read blocks we will
>>> have data loss as well. the only difference between the 2 scenarios is
>>> "how long before it happens". Read-disturb wasn't an issue since average
>>> lifespan of a nand device was ~5 years. Read-disturb occurs in a longer
>>> lifespan. that's why it's required now: a need for a "long life nand".
>>
>> Okay, read-disturb will only happen if you read blocks *very* often. Do you
>> have numbers, datasheets, etc...?
>
> I have made some simple test by reading the first 2048 pages of my NAND in an
> endless loop. Only reading, nothing else (made while the bootloader was run,
> nothing else touches the NAND memory).
>
> Below a result of my test with a 512 MiB SLC NAND with 2kiB page size and
> 128kiB block size:
>
> The used NAND controller is able to correct up to 8 flipped bits. After the
> 9th bit is flipped the read returns -74.
>
> This log is a snapshot after the whole area of the first 4 MiB of the NAND
> were read 201688 times.
>
> Page no
> / 1st bitflip after iteration #
> | / 2nd bitflip after iteration #
> | | / 3rd 4th 5th 6th 7th 8th bitflip after iteration #
> | | | / / / / / / error <errcode> @ iteration #
> | | | | | | | | | /
> | | | | | | | | | |
> [...]
> 529: 91760 - - - - - - - err: 0 @ 0
> 530: 67168 - - - - - - - err: 0 @ 0
> 531: 141039 - - - - - - - err: 0 @ 0
> 532: 100288 - - - - - - - err: 0 @ 0
> 533: 133754 - - - - - - - err: 0 @ 0
> 534: 130095 - - - - - - - err: 0 @ 0
> 535: - - - - - - - - err: 0 @ 0
> 536: - - - - - - - - err: 0 @ 0
> 537: - - - - - - - - err: 0 @ 0
> 538: 116134 - - - - - - - err: 0 @ 0
> 539: 198269 - - - - - - - err: 0 @ 0
> 540: 61589 - - - - - - - err: 0 @ 0
> 541: 69437 126618 - - - - - - err: 0 @ 0
> 542: 127839 146936 - - - - - - err: 0 @ 0
> 543: 90092 112675 - - - - - - err: 0 @ 0
> 544: 110714 - - - - - - - err: 0 @ 0
> 545: 102323 179716 - - - - - - err: 0 @ 0
> 546: 63838 107524 - - - - - - err: 0 @ 0
> 547: 140739 - - - - - - - err: 0 @ 0
> 548: 129423 - - - - - - - err: 0 @ 0
> 549: 79855 172562 189242 - - - - - err: 0 @ 0
> 550: 59809 95758 - - - - - - err: 0 @ 0
> 551: 61590 102645 182467 199394 - - - - err: 0 @ 0
> 552: 34892 47024 169765 - - - - - err: 0 @ 0
> 553: 26725 99616 168528 - - - - - err: 0 @ 0
> 554: 23348 117529 160522 194367 - - - - err: 0 @ 0
> 555: 108062 175917 - - - - - - err: 0 @ 0
> 556: 49259 120590 188435 - - - - - err: 0 @ 0
> 557: 54306 96666 120881 - - - - - err: 0 @ 0
> 558: 29085 31802 42191 43422 108748 167569 - - err: 0 @ 0
> 559: 56507 93286 - - - - - - err: 0 @ 0
> 560: 81849 101134 143402 152513 - - - - err: 0 @ 0
> 561: 13890 135991 199507 - - - - - err: 0 @ 0
> 562: 34135 69826 90917 107625 147321 161796 194928 199981 err: 0 @ 0
> 563: 36564 83188 89780 110756 113977 132219 171701 181298 err: -74 @ 196719
> 564: 24710 84965 131464 136672 143401 166123 196109 - err: 0 @ 0
> 565: 63052 190669 200874 - - - - - err: 0 @ 0
> 566: 23602 62334 107324 108235 111701 141831 143176 170709 err: 0 @ 0
> 567: 7827 81759 105200 146536 175196 181900 192630 200021 err: 0 @ 0
> 568: 19248 38095 42491 85788 108021 150404 178145 - err: 0 @ 0
> 569: 77853 93441 116798 149955 175747 - - - err: 0 @ 0
> 570: 23229 34546 60418 84112 169202 191880 198953 - err: 0 @ 0
> 571: 53596 66769 106074 133504 134134 163610 169159 178226 err: -74 @ 180360
> 572: 74009 83572 89710 103833 116947 147067 167137 - err: 0 @ 0
> 573: 23161 43896 89573 95705 102324 102887 115829 122581 err: -74 @ 138582
> [...]
>
> You can see some pages start to suffer from read disturbance after about
> 7,000 reads and fail after 200,000 reads, other pages start at 23,000 reads
> but fails at 120,000 reads. There is no rule when a page starts to suffer
> from read disturbance and how fast. So a simple read counter with a threshhold
> to detect when to recover a page/block seems not helpful to me.
>
> I'm still trying to interpret the test results. At least there are areas in
> the 4 MiB areas which show massive bit flips, while other areas have still no
> flipped bits.
>
> For example the log shown above continues with this pattern:
>
> 574: - - - - - - - - err: 0 @ 0
> 575: - - - - - - - - err: 0 @ 0
> 576: - - - - - - - - err: 0 @ 0
> 577: - - - - - - - - err: 0 @ 0
> 578: - - - - - - - - err: 0 @ 0
> 579: - - - - - - - - err: 0 @ 0
> 580: - - - - - - - - err: 0 @ 0
> 581: - - - - - - - - err: 0 @ 0
> 582: - - - - - - - - err: 0 @ 0
> 583: - - - - - - - - err: 0 @ 0
> 584: - - - - - - - - err: 0 @ 0
> 585: - - - - - - - - err: 0 @ 0
> 586: - - - - - - - - err: 0 @ 0
> 587: - - - - - - - - err: 0 @ 0
> 588: - - - - - - - - err: 0 @ 0
> 589: - - - - - - - - err: 0 @ 0
> 590: - - - - - - - - err: 0 @ 0
> 591: - - - - - - - - err: 0 @ 0
> 592: 194921 - - - - - - - err: 0 @ 0
> 593: - - - - - - - - err: 0 @ 0
> 594: - - - - - - - - err: 0 @ 0
> 595: 99328 186011 - - - - - - err: 0 @ 0
> 596: 178049 188598 - - - - - - err: 0 @ 0
> 597: - - - - - - - - err: 0 @ 0
> 598: 88247 - - - - - - - err: 0 @ 0
> 599: 66701 - - - - - - - err: 0 @ 0
> 600: 68454 - - - - - - - err: 0 @ 0
> 601: 152351 - - - - - - - err: 0 @ 0
> 602: 33574 56123 - - - - - - err: 0 @ 0
> 603: 130160 - - - - - - - err: 0 @ 0
> 604: 87415 - - - - - - - err: 0 @ 0
> 605: 121079 140456 - - - - - - err: 0 @ 0
> 606: 78960 201089 - - - - - - err: 0 @ 0
> 607: 67561 - - - - - - - err: 0 @ 0
> 608: 136825 - - - - - - - err: 0 @ 0
> 609: 46315 - - - - - - - err: 0 @ 0
> 610: 38588 86638 100277 149299 193350 - - - err: 0 @ 0
> 611: 77835 106222 184955 - - - - - err: 0 @ 0
> 612: 82427 196739 - - - - - - err: 0 @ 0
> 613: 45261 69448 - - - - - - err: 0 @ 0
> 614: 49466 177882 - - - - - - err: 0 @ 0
> 615: 68595 130868 - - - - - - err: 0 @ 0
> 616: 40169 134280 151830 - - - - - err: 0 @ 0
> 617: 47167 130047 - - - - - - err: 0 @ 0
> 618: 62839 114948 125289 - - - - - err: 0 @ 0
> 619: 45988 - - - - - - - err: 0 @ 0
> 620: 22611 70944 125715 183733 185630 193842 - - err: 0 @ 0
> 621: 71908 171400 - - - - - - err: 0 @ 0
> 622: 21252 44002 114774 154423 190673 - - - err: 0 @ 0
> 623: 33323 35582 101091 117813 - - - - err: 0 @ 0
> 624: 68726 108034 113045 - - - - - err: 0 @ 0
> 625: 45920 63497 122692 159199 165520 169147 200725 - err: 0 @ 0
> 626: 39039 60375 92903 101632 102331 118883 - - err: 0 @ 0
> 627: 44046 102881 163181 - - - - - err: 0 @ 0
> 628: 53511 89063 158921 194571 - - - - err: 0 @ 0
> 629: 45185 78174 118801 160227 192668 - - - err: 0 @ 0
> 630: 106109 117537 165575 170772 183222 - - - err: 0 @ 0
> 631: 8848 15614 120298 - - - - - err: 0 @ 0
> 632: 58004 - - - - - - - err: 0 @ 0
> 633: 102767 155246 200323 - - - - - err: 0 @ 0
> 634: 44970 45381 78299 103220 108726 174601 - - err: 0 @ 0
> 635: 24964 46413 58086 71776 195353 - - - err: 0 @ 0
> 636: 16024 64719 77322 83557 120118 134934 137786 157911 err: -74 @ 173650
> 637: 54520 76187 89813 97778 125270 150291 178132 185518 err: -74 @ 199306
> 638: - - - - - - - - err: 0 @ 0
> 639: - - - - - - - - err: 0 @ 0
> 640: - - - - - - - - err: 0 @ 0
> 641: - - - - - - - - err: 0 @ 0
> 642: - - - - - - - - err: 0 @ 0
> 643: - - - - - - - - err: 0 @ 0
> 644: - - - - - - - - err: 0 @ 0
> 645: - - - - - - - - err: 0 @ 0
> 646: - - - - - - - - err: 0 @ 0
> 647: - - - - - - - - err: 0 @ 0
> 648: - - - - - - - - err: 0 @ 0
> [...]
>
> More confusing: the same test running on a 256 MiB NAND shows a different
> result with much less failures. After about 200,000 loops *all* pages are
> still okay (or correctable). The max bit flips in one page were four.
>
> [...]
> 546: - - - - - - - - err: 0 @ 0
> 547: - - - - - - - - err: 0 @ 0
> 548: - - - - - - - - err: 0 @ 0
> 549: - - - - - - - - err: 0 @ 0
> 550: - - - - - - - - err: 0 @ 0
> 551: - - - - - - - - err: 0 @ 0
> 552: - - - - - - - - err: 0 @ 0
> 553: - - - - - - - - err: 0 @ 0
> 554: 198362 - - - - - - - err: 0 @ 0
> 555: 138881 - - - - - - - err: 0 @ 0
> 556: - - - - - - - - err: 0 @ 0
> 557: - - - - - - - - err: 0 @ 0
> 558: - - - - - - - - err: 0 @ 0
> 559: 77431 - - - - - - - err: 0 @ 0
> 560: 100023 - - - - - - - err: 0 @ 0
> 561: - - - - - - - - err: 0 @ 0
> 562: 83265 - - - - - - - err: 0 @ 0
> 563: 154552 - - - - - - - err: 0 @ 0
> 564: 154541 - - - - - - - err: 0 @ 0
> 565: - - - - - - - - err: 0 @ 0
> 566: - - - - - - - - err: 0 @ 0
> 567: - - - - - - - - err: 0 @ 0
> 568: 105275 - - - - - - - err: 0 @ 0
> 569: 91386 186096 - - - - - - err: 0 @ 0
> 570: - - - - - - - - err: 0 @ 0
> 571: 43163 - - - - - - - err: 0 @ 0
> 572: 79839 190846 - - - - - - err: 0 @ 0
> 573: 184267 - - - - - - - err: 0 @ 0
> 574: - - - - - - - - err: 0 @ 0
> 575: - - - - - - - - err: 0 @ 0
> 576: - - - - - - - - err: 0 @ 0
> 577: - - - - - - - - err: 0 @ 0
> 578: - - - - - - - - err: 0 @ 0
> 579: - - - - - - - - err: 0 @ 0
> [...]
> 1848: - - - - - - - - err: 0 @ 0
> 1849: 115731 168972 178123 196740 - - - - err: 0 @ 0
> 1850: - - - - - - - - err: 0 @ 0
> [...]
Thanks a lot for this report, your number are a very valuable input.
They prove what Artem and I feared, it is almost impossible to define a sane threshold.
So, having exact read-counters will be almost useless.
All we can do is scrubbing PEBs unconditionally.
Can you share your test program? I'd like to run it also on one of my boards.
Thanks,
//richard
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling
2014-11-10 12:35 ` Richard Weinberger
@ 2014-11-10 13:12 ` Juergen Borleis
2014-11-11 9:23 ` Richard Weinberger
2014-11-10 13:13 ` Ricard Wanderlof
1 sibling, 1 reply; 38+ messages in thread
From: Juergen Borleis @ 2014-11-10 13:12 UTC (permalink / raw)
To: linux-mtd; +Cc: Richard Weinberger, jlauruhn, tlinder
[-- Attachment #1: Type: text/plain, Size: 698 bytes --]
Hi Richard,
On Monday 10 November 2014 13:35:26 Richard Weinberger wrote:
[...]
> Can you share your test program? I'd like to run it also on one of my
> boards.
Please find the patch attached. Its intended for the Barebox bootloader and can
be applied to its current git. The "-c" option seems currently somehow broken.
I have to talk to Sascha first.
jbe
--
Pengutronix e.K. | Juergen Borleis |
Industrial Linux Solutions | Phone: +49-5121-206917-5128 |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Fax: +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de/ |
[-- Attachment #2: nand_test_command.patch --]
[-- Type: text/x-diff, Size: 8761 bytes --]
commit f3deffa6abf6ab563eaa1842fa3836351d867f30
Author: Sascha Hauer <s.hauer@pengutronix.de>
Date: Thu Jul 24 10:32:52 2014 +0200
Commands: Add a nand-read-test command
NAND flashes suffer from read disturbance. This means that if a page
is read often enough there will be bitflips. This test tool continuously
reads a bunch of pages and shows a statistic over the number of bitfips
occured after which read iteration. This can be used to test a NAND flash
but also to test a NAND driver. The page reads are optionally compared
to the initial read of the page. If there is a difference, but the driver
has not reported an error the driver is buggy.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
diff --git a/commands/Kconfig b/commands/Kconfig
index d73a393885e9..e1d4fba6bc08 100644
--- a/commands/Kconfig
+++ b/commands/Kconfig
@@ -1800,6 +1800,11 @@ config CMD_NANDTEST
-o OFFS start offset on flash
-l LEN length of flash to test
+config CMD_NAND_READ_TEST
+ tristate
+ depends on NAND
+ prompt "nand read test"
+
config CMD_POWEROFF
tristate
depends on HAS_POWEROFF
diff --git a/commands/Makefile b/commands/Makefile
index b1cdf331c441..66d63c8c869a 100644
--- a/commands/Makefile
+++ b/commands/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_CMD_SAVEENV) += saveenv.o
obj-$(CONFIG_CMD_LOADENV) += loadenv.o
obj-$(CONFIG_CMD_NAND) += nand.o
obj-$(CONFIG_CMD_NANDTEST) += nandtest.o
+obj-$(CONFIG_CMD_NAND_READ_TEST) += nand-read-test.o
obj-$(CONFIG_CMD_MEMTEST) += memtest.o
obj-$(CONFIG_CMD_TRUE) += true.o
obj-$(CONFIG_CMD_FALSE) += false.o
diff --git a/commands/nand-read-test.c b/commands/nand-read-test.c
new file mode 100644
index 000000000000..cac219178047
--- /dev/null
+++ b/commands/nand-read-test.c
@@ -0,0 +1,275 @@
+/*
+ * Copyright (c) 2014 Sascha Hauer <s.hauer@pengutronix.de>, Pengutronix
+ *
+ * See file CREDITS for list of people who contributed to this
+ * project.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+#include <common.h>
+#include <command.h>
+#include <fs.h>
+#include <linux/stat.h>
+#include <errno.h>
+#include <malloc.h>
+#include <getopt.h>
+#include <xfuncs.h>
+#include <init.h>
+#include <ioctl.h>
+#include <nand.h>
+#include <linux/mtd/mtd-abi.h>
+#include <fcntl.h>
+#include <libgen.h>
+#include <string.h>
+#include <clock.h>
+#include <linux/log2.h>
+#include <linux/mtd/mtd.h>
+
+#define MAXFLIPS 8
+
+/*
+ * NAND flashes suffer from read disturbance. This means that if a page
+ * is read often enough there will be bitflips. This test tool continuously
+ * reads a bunch of pages and shows a statistic over the number of bitfips
+ * occured after which read iteration. This can be used to test a NAND flash
+ * but also to test a NAND driver. The page reads are optionally compared
+ * to the initial read of the page. If there is a difference, but the driver
+ * has not reported an error the driver is buggy.
+ */
+struct status {
+ int flips[MAXFLIPS];
+ int err_it;
+ int err;
+ void *buf;
+ void *failbuf;
+};
+
+static void print_status_one(struct status *s, int page)
+{
+ int i;
+
+ printf("%-3d: ", page);
+
+ for (i = 0; i < MAXFLIPS; i++) {
+ if (s->flips[i] == -1)
+ printf("- ");
+ else
+ printf("%-7d ", s->flips[i]);
+ }
+
+ printf("err: %d @ %d\n", s->err, s->err_it);
+}
+
+int print_status(struct status *status, int num_pages, struct mtd_info *mtd, int iteration)
+{
+ int i, ret;
+
+ printf("\nstatistic after iteration %d:\n\n", iteration);
+
+ printf(" Page no\n");
+ printf(" / 1st bitflip after iteration #\n");
+ printf("| / 2nd bitflip after iteration #\n");
+ printf("| | / 3rd 4th 5th 6th 7th 8th bitflip after iteration #\n");
+ printf("| | | / / / / / / error <errcode> @ iteration #\n");
+ printf("| | | | | | | | | /\n");
+ printf("| | | | | | | | | |\n");
+
+ for (i = 0; i < num_pages; i++) {
+ if (ctrlc())
+ return -EINTR;
+ print_status_one(&status[i], i);
+ }
+
+ for (i = 0; i < num_pages; i++) {
+ struct status *s = &status[i];
+
+ if (ctrlc())
+ return -EINTR;
+
+ if (s->failbuf) {
+ printf("Undetected read failure on page %d:\n", i);
+ printf("Should be:\n");
+ ret = memory_display(s->buf, i * mtd->writesize, mtd->writesize, 4, 0);
+ if (ret)
+ return ret;
+ printf("read instead:\n");
+ ret = memory_display(s->failbuf, i * mtd->writesize, mtd->writesize, 4, 0);
+ if (ret)
+ return ret;
+ }
+ }
+
+ mdelay(200);
+ if (ctrlc())
+ return -EINTR;
+
+ return 0;
+}
+
+static void nand_read_test(struct mtd_info *mtd, int num_pages, int compare)
+{
+ void *_buf;
+ uint64_t start;
+ int i = 0, j, n;
+ loff_t addr;
+ int ret;
+ struct status *status;
+ size_t read;
+ void *pagebuf;
+
+ status = xzalloc(sizeof(struct status) * num_pages);
+ pagebuf = xzalloc(mtd->writesize);
+
+ if (compare) {
+ for (n = 0; n < num_pages; n++) {
+ struct status *s = &status[n];
+
+ addr = n * mtd->writesize;
+ s->buf = malloc(mtd->writesize);
+ if (!s->buf)
+ goto out;
+
+ ret = mtd->read(mtd, addr, mtd->writesize, &read, s->buf);
+ if (ret < 0) {
+ printf("Error while reading compare buffer: %s\n",
+ strerror(-ret));
+ goto out;
+ }
+ }
+ }
+
+ for (i = 0; i < num_pages; i++) {
+ struct status *s = &status[i];
+ for (j = 0; j < MAXFLIPS; j++)
+ s->flips[j] = -1;
+ }
+
+ start = get_time_ns();
+
+ i = 0;
+ while (1) {
+ for (n = 0; n < num_pages; n++) {
+ struct status *s = &status[n];
+ addr = n * mtd->writesize;
+
+ ret = mtd->read(mtd, addr, mtd->writesize, &read, pagebuf);
+ if (ret < 0) {
+ if (!s->err) {
+ s->err_it = i;
+ s->err = ret;
+ }
+ }
+
+ if (ret > 0 && ret <= MAXFLIPS) {
+ if (s->flips[ret - 1] == -1)
+ s->flips[ret - 1] = i;
+ }
+
+ if (compare && ret >= 0 && !s->failbuf && memcmp(s->buf, pagebuf, mtd->writesize))
+ s->failbuf = memdup(pagebuf, mtd->writesize);
+
+ _buf += mtd->writesize;
+ }
+
+ if (ctrlc()) {
+ ret = print_status(status, num_pages, mtd, i);
+ if (ret)
+ goto out;
+ }
+
+ if (is_timeout(start, SECOND * 60)) {
+ printf("iteration: %d\n", i);
+ start = get_time_ns();
+ }
+
+ i++;
+ }
+out:
+ free(pagebuf);
+ for (n = 0; n < num_pages; n++) {
+ struct status *s = &status[n];
+ free(s->buf);
+ free(s->failbuf);
+ }
+
+ free(status);
+
+ return;
+}
+
+static int do_nand_read_test(int argc, char *argv[])
+{
+ int opt, ret, fd;
+ static struct mtd_info_user meminfo;
+ int verbose;
+ int num_pages = 64;
+ int compare = 0;
+
+ while ((opt = getopt(argc, argv, "vn:c")) > 0) {
+ switch (opt) {
+ case 'v':
+ verbose = 1;
+ break;
+ case 'n':
+ num_pages = simple_strtoul(optarg, NULL, 0);
+ break;
+ case 'c':
+ compare = 1;
+ break;
+ default:
+ return COMMAND_ERROR_USAGE;
+ }
+ }
+
+ if (optind >= argc)
+ return COMMAND_ERROR_USAGE;
+
+ fd = open(argv[optind], O_RDWR);
+ if (fd < 0)
+ return fd;
+
+ ret = ioctl(fd, MEMGETINFO, &meminfo);
+
+ close(fd);
+
+ if (ret)
+ return ret;
+
+ if (num_pages * meminfo.writesize > meminfo.size) {
+ num_pages = meminfo.size >> ilog2(meminfo.writesize);
+ printf("WARNING: Device too small. Limiting to %d pages\n", num_pages);
+ }
+
+ printf("Starting NAND read disturbance test on %s with %d pages\n",
+ argv[optind], num_pages);
+ printf("Hit <ctrl-c> once to show current statistics, twice to stop the test\n");
+
+ nand_read_test(meminfo.mtd, num_pages, compare);
+
+ return 0;
+}
+
+BAREBOX_CMD_HELP_START(nand_read_test)
+BAREBOX_CMD_HELP_TEXT("This test tool continuously reads a bunch of NAND pages and")
+BAREBOX_CMD_HELP_TEXT("Prints a statistic about the number of bitflips and crc errors")
+BAREBOX_CMD_HELP_TEXT("occuring on each page")
+BAREBOX_CMD_HELP_TEXT("Options:")
+BAREBOX_CMD_HELP_OPT ("-n <npages>", "Specify number of pages to test (64)")
+BAREBOX_CMD_HELP_OPT ("-c", "Compare each page read with the first read")
+BAREBOX_CMD_HELP_END
+
+BAREBOX_CMD_START(nand_read_test)
+ .cmd = do_nand_read_test,
+ BAREBOX_CMD_DESC("NAND read disturbance test")
+ BAREBOX_CMD_OPTS("NANDDEV")
+ BAREBOX_CMD_GROUP(CMD_GRP_HWMANIP)
+ BAREBOX_CMD_HELP(cmd_nand_read_test_help)
+BAREBOX_CMD_END
^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling
2014-11-10 12:35 ` Richard Weinberger
2014-11-10 13:12 ` Juergen Borleis
@ 2014-11-10 13:13 ` Ricard Wanderlof
2014-11-10 13:42 ` Juergen Borleis
1 sibling, 1 reply; 38+ messages in thread
From: Ricard Wanderlof @ 2014-11-10 13:13 UTC (permalink / raw)
To: Juergen Borleis
Cc: Richard Weinberger, jlauruhn@micron.com,
linux-mtd@lists.infradead.org, tlinder@codeaurora.org
On Mon, 10 Nov 2014, Richard Weinberger wrote:
> Am 10.11.2014 um 13:07 schrieb Juergen Borleis:
>
> >
> > I have made some simple test by reading the first 2048 pages of my NAND in an
> > endless loop. Only reading, nothing else (made while the bootloader was run,
> > nothing else touches the NAND memory).
> >
> > Below a result of my test with a 512 MiB SLC NAND with 2kiB page size and
> > 128kiB block size:
> > ...
> >
> > You can see some pages start to suffer from read disturbance after about
> > 7,000 reads and fail after 200,000 reads, other pages start at 23,000 reads
> > but fails at 120,000 reads. There is no rule when a page starts to suffer
> > from read disturbance and how fast. So a simple read counter with a threshhold
> > to detect when to recover a page/block seems not helpful to me.
> >
> > More confusing: the same test running on a 256 MiB NAND shows a different
> > result with much less failures. After about 200,000 loops *all* pages are
> > still okay (or correctable). The max bit flips in one page were four.
> >
> > [...]
These are interesting figures. I must admit I've never seen anything quite
so bad before.
We use 128 MiB and 256 MiB SLC NAND chips in our products, and as part of
the device qualification we run a test on a couple of samples where we
repeatedly read the first four blocks of the flash in an endless loop, and
measure the number of correctable and uncorrectable errors that occur.
Normally, we can read millions of times before even getting a single bit
flip. I've currently had a Macronix 128 MiB flash under test, which
according to the data sheet requires 4-bit ECC, but after 68 million reads
of the type just mentioned is still performing well with only single-bit
errors in various places in the test area. (The fact that errors do start
to occur after a while puts any suspicions of unexpected read caching to
rest). Admittedly, this is all at room temperature, etc, and only a single
sample, but it still is quite far from 200 000 reads. Other chips of the
same sizes that we use which specify 1-bit ECC have a similar performance.
The fact that 512 MiB is worse than 256 MiB is not too surprising, there
could well be a technology jump between those sizes, with smaller bit
cells for the larger flash.
> Can you share your test program? I'd like to run it also on one of my boards.
Agreed, it would be interesting to see what the results would be here too.
/Ricard
--
Ricard Wolf Wanderlöf ricardw(at)axis.com
Axis Communications AB, Lund, Sweden www.axis.com
Phone +46 46 272 2016 Fax +46 46 13 61 30
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling
2014-11-10 13:13 ` Ricard Wanderlof
@ 2014-11-10 13:42 ` Juergen Borleis
2014-11-10 13:52 ` Ricard Wanderlof
0 siblings, 1 reply; 38+ messages in thread
From: Juergen Borleis @ 2014-11-10 13:42 UTC (permalink / raw)
To: linux-mtd
Cc: Richard Weinberger, jlauruhn@micron.com, Ricard Wanderlof,
tlinder@codeaurora.org
Hi Ricard,
On Monday 10 November 2014 14:13:27 Ricard Wanderlof wrote:
> [...]
> These are interesting figures. I must admit I've never seen anything quite
> so bad before.
That leads me to the question if I have done the test in a correct way.
> We use 128 MiB and 256 MiB SLC NAND chips in our products, and as part of
> the device qualification we run a test on a couple of samples where we
> repeatedly read the first four blocks of the flash in an endless loop, and
> measure the number of correctable and uncorrectable errors that occur.
>
> Normally, we can read millions of times before even getting a single bit
> flip. I've currently had a Macronix 128 MiB flash under test, which
> according to the data sheet requires 4-bit ECC, but after 68 million reads
> of the type just mentioned is still performing well with only single-bit
> errors in various places in the test area. (The fact that errors do start
> to occur after a while puts any suspicions of unexpected read caching to
> rest). Admittedly, this is all at room temperature, etc, and only a single
> sample, but it still is quite far from 200 000 reads. Other chips of the
> same sizes that we use which specify 1-bit ECC have a similar performance.
>
> The fact that 512 MiB is worse than 256 MiB is not too surprising, there
> could well be a technology jump between those sizes, with smaller bit
> cells for the larger flash.
I have no idea how the two boards were programmend I run these two tests on.
Maybe the NAND programming was done in a wrong way which leads to this bad
result.
On these boards the first 4 MiB area is the 'bootloader area', which must have
a special layout to enable the ROM code to load the bootloader from it.
I have a bunch of boards here with 128/256/512 MiB NANDs where I can repeat the
tests. Any recommendations how to setup the NAND before to do the tests again?
jbe
--
Pengutronix e.K. | Juergen Borleis |
Industrial Linux Solutions | Phone: +49-5121-206917-5128 |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Fax: +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de/ |
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling
2014-11-10 13:42 ` Juergen Borleis
@ 2014-11-10 13:52 ` Ricard Wanderlof
0 siblings, 0 replies; 38+ messages in thread
From: Ricard Wanderlof @ 2014-11-10 13:52 UTC (permalink / raw)
To: Juergen Borleis
Cc: Richard Weinberger, Ricard Wanderlöf, jlauruhn@micron.com,
linux-mtd@lists.infradead.org, tlinder@codeaurora.org
On Mon, 10 Nov 2014, Juergen Borleis wrote:
> Hi Ricard,
>
> On Monday 10 November 2014 14:13:27 Ricard Wanderlof wrote:
> > [...]
> > These are interesting figures. I must admit I've never seen anything quite
> > so bad before.
>
> That leads me to the question if I have done the test in a correct way.
Could be. Or there's a lot of difference between flash manufacturers. We
had very bad experience with a certain manufacturer several years ago.
> I have no idea how the two boards were programmend I run these two tests on.
> Maybe the NAND programming was done in a wrong way which leads to this bad
> result.
Unlike for instance old EPROMs, where the software programming the chip
had to have a programming algorithm, with a NAND flash you basically send
the data to the flash and wait for the embedded programming algorithm on
the flash to finish, then check the result.
One problem could be if the page had been programmer more than the
specified number of times without being erased. E.g. most SLC flashes
specify a maximum of 3 writes per page before erase becomes necessary. If
this is violated it could result in unstable bits.
Another problem could be if the power supply was out of spec during
programming.
> I have a bunch of boards here with 128/256/512 MiB NANDs where I can
> repeat the tests. Any recommendations how to setup the NAND before to do
> the tests again?
Not really, more than checking so that the data in the flash is the result
of a single write operation (i.e. not written in 16 byte bursts for
instance), and that the specs for the flash chip haven't been violated
during programming; in essence, that the write operation is allowed to
finish on its own (but I'm not sure it can be interrupted except by a
power cycle anyway, so I don't think there's much that can be abused here.
Perhaps someone else knows of a more concrete case of abusing a flash in
software so it fails to retain its data?.)
/Ricard
--
Ricard Wolf Wanderlöf ricardw(at)axis.com
Axis Communications AB, Lund, Sweden www.axis.com
Phone +46 46 272 2016 Fax +46 46 13 61 30
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling
2014-11-10 13:12 ` Juergen Borleis
@ 2014-11-11 9:23 ` Richard Weinberger
0 siblings, 0 replies; 38+ messages in thread
From: Richard Weinberger @ 2014-11-11 9:23 UTC (permalink / raw)
To: Juergen Borleis, linux-mtd; +Cc: jlauruhn, tlinder
Am 10.11.2014 um 14:12 schrieb Juergen Borleis:
> Hi Richard,
>
> On Monday 10 November 2014 13:35:26 Richard Weinberger wrote:
> [...]
>> Can you share your test program? I'd like to run it also on one of my
>> boards.
>
> Please find the patch attached. Its intended for the Barebox bootloader and can
> be applied to its current git. The "-c" option seems currently somehow broken.
> I have to talk to Sascha first.
Tanya, can you please run this test program on your board too?
Thanks,
//richard
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling
2014-11-07 8:58 ` Artem Bityutskiy
@ 2014-11-11 20:36 ` Tanya Brokhman
2014-11-11 21:39 ` Richard Weinberger
2014-11-12 11:55 ` Artem Bityutskiy
0 siblings, 2 replies; 38+ messages in thread
From: Tanya Brokhman @ 2014-11-11 20:36 UTC (permalink / raw)
To: dedekind1; +Cc: richard, jlauruhn, linux-mtd, linux-arm-msm
Hi Artem,
Hope I didn't drop any ccs this time... Sorry about that. Not on purpose.
On 11/7/2014 10:58 AM, Artem Bityutskiy wrote:
> On Thu, 2014-11-06 at 14:16 +0200, Tanya Brokhman wrote:
>> What I'm trying to say - it
>> may be too late and you may lose data here. "preferred to prevent rather
>> than cure".
>
> First of all, just to clarify, I do not have a goal of turning down your
> patches. I just want to understand why this is the best design, and if
> it is helpful to all Linux MTD users.
>
> Modern flashes have strong ECC codes protecting against many bit-flips.
> MTD even was modified to stop reporting about a single or few bit-flips,
> because those happen too often and they are "harmless", and do not
> require scrubbing. We have the threshold value in MTD for this, which is
> configurable, of course.
>
> Bit-flips develop slowly over time. If you get one more bit-flips, it is
> not too late yet. You can mitigate the "too late" part by reading more
> often of course.
>
> You also may lower the bit-flip threshold when reading for scrubbing.
>
> Could you try to "sell" your design in a way that it becomes clear why
> it is better than just reading the entire flash periodically.
Please see my "selling" bellow :)
Some hard
> experimental data would be preferable.
Unfortunately none. This is done for a new device that we received just
now. The development was done on a virtual machine with nandsim. Testing
was more of stability and regression
>
> The advantages of the "read all periodically" approach were:
>
> 1. Simple, no modifications needed
> 2. No need to write if the media is read-only, except when scrubbing
> happens.
> 3. Should cover all the NAND effects, including the "radiation" one.
Disadvantages (as I see it):
1. performance hit: when do you trigger the "read-all"? will effect
performance
2. finds bitflips only when they are present instead of preventing them
from happening
Perhaps our design is an overkill for this and not covering 100% of te
usecases. But it was requested by our customers to handle read-disturb
and data retention specifically (as in "prevent" and not just "fix").
This is due to a new NAND device that should operate in high temperature
and last for ~15-20 years.
But we did rethink this and we're dropping the "last erase timestamp"
that was used to handle "data retention". We will force-scrub all PEBs
once in a while (triggered by user) as Richard suggested.
We're keeping the read counters though. I know that not all
"read-disturb" scenarios are covered by this but it's more coverage then
we have at the moment. So not 100% perfect solution but better then none.
I will update the implementation and change the fastmap layout (as
suggested by Richard earlier) or try using internal UBI volume. Still
have some study to do on that...
Also, if not everyone will find this useful, I can add a feature flag
for disabling this functionality.
>
> And disadvantages of your design were:
>
> 1. Need modifications, rather large, changes binary format, needs more
> ram.
> 2. Does not cover all the NAND effects
> 3. Is not transparent to the user
Why not? (btw, agree with all the rest)
> 4. If system time is incorrectly set, may cause a storm of I/O
> (scrubbing) and may put the system to it's knees before user-space has a
> chance to fix-up the system time.
The triggering of the scrub will be handled by a userspace application.
It will be its responsibility to decide when and if to trigger the
scrubbing. We're taking into consideration the fact that system time
might not be available. But since it's a userspace app, can't discuss
implementation details (legal....)
> 5. Needs more writes on the R/O system (to maintain read counters)
Will rethink how to address this. Thanks for bringing my attention to this!
>
> Also, it is not clear if with your design we save energy. Reads a lot
> less need less energy than writes and erases (to maintain read
> counters). May you save energy comparing to the read-all periodically
> approach. May be not.
This is not a test I can perform unfortunately.
>
> Artem.
>
Thanks,
Tanya Brokhman
--
Qualcomm Israel, on behalf of Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
Forum, a Linux Foundation Collaborative Project
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling
2014-11-11 20:36 ` Tanya Brokhman
@ 2014-11-11 21:39 ` Richard Weinberger
2014-11-12 12:07 ` Artem Bityutskiy
2014-11-12 11:55 ` Artem Bityutskiy
1 sibling, 1 reply; 38+ messages in thread
From: Richard Weinberger @ 2014-11-11 21:39 UTC (permalink / raw)
To: Tanya Brokhman, dedekind1; +Cc: linux-arm-msm, jlauruhn, linux-mtd
Tanya,
Am 11.11.2014 um 21:36 schrieb Tanya Brokhman:
> Hi Artem,
>
> Hope I didn't drop any ccs this time... Sorry about that. Not on purpose.
>
> On 11/7/2014 10:58 AM, Artem Bityutskiy wrote:
>> On Thu, 2014-11-06 at 14:16 +0200, Tanya Brokhman wrote:
>>> What I'm trying to say - it
>>> may be too late and you may lose data here. "preferred to prevent rather
>>> than cure".
>>
>> First of all, just to clarify, I do not have a goal of turning down your
>> patches. I just want to understand why this is the best design, and if
>> it is helpful to all Linux MTD users.
>>
>> Modern flashes have strong ECC codes protecting against many bit-flips.
>> MTD even was modified to stop reporting about a single or few bit-flips,
>> because those happen too often and they are "harmless", and do not
>> require scrubbing. We have the threshold value in MTD for this, which is
>> configurable, of course.
>>
>> Bit-flips develop slowly over time. If you get one more bit-flips, it is
>> not too late yet. You can mitigate the "too late" part by reading more
>> often of course.
>>
>> You also may lower the bit-flip threshold when reading for scrubbing.
>>
>> Could you try to "sell" your design in a way that it becomes clear why
>> it is better than just reading the entire flash periodically.
>
> Please see my "selling" bellow :)
>
> Some hard
>> experimental data would be preferable.
>
> Unfortunately none. This is done for a new device that we received just now. The development was done on a virtual machine with nandsim. Testing was more of stability and regression
>
>>
>> The advantages of the "read all periodically" approach were:
>>
>> 1. Simple, no modifications needed
>> 2. No need to write if the media is read-only, except when scrubbing
>> happens.
>> 3. Should cover all the NAND effects, including the "radiation" one.
>
> Disadvantages (as I see it):
> 1. performance hit: when do you trigger the "read-all"? will effect performance
Only a stupid implementation will re-read/scrub all PEBs at once.
We can use a low priority thread. We can do this even in userspace.
> 2. finds bitflips only when they are present instead of preventing them from happening
We can scrub unconditionally.
Even if we scrub every PEB once a week the erase counters won't go up very much.
> Perhaps our design is an overkill for this and not covering 100% of te usecases. But it was requested by our customers to handle read-disturb and data retention specifically (as in
> "prevent" and not just "fix"). This is due to a new NAND device that should operate in high temperature and last for ~15-20 years.
>
> But we did rethink this and we're dropping the "last erase timestamp" that was used to handle "data retention". We will force-scrub all PEBs once in a while (triggered by user) as
> Richard suggested.
> We're keeping the read counters though. I know that not all "read-disturb" scenarios are covered by this but it's more coverage then we have at the moment. So not 100% perfect
> solution but better then none.
>
> I will update the implementation and change the fastmap layout (as suggested by Richard earlier) or try using internal UBI volume. Still have some study to do on that...
Please don't (ab)use fastmap. If you really need persistent read-counters use an internal UBI volume.
But I think that time-based unconditional scrubbing will also do it. As long we don't have sane threshold values
keeping counters is useless.
Thanks,
//richard
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling
2014-11-11 20:36 ` Tanya Brokhman
2014-11-11 21:39 ` Richard Weinberger
@ 2014-11-12 11:55 ` Artem Bityutskiy
2014-11-13 12:13 ` Tanya Brokhman
1 sibling, 1 reply; 38+ messages in thread
From: Artem Bityutskiy @ 2014-11-12 11:55 UTC (permalink / raw)
To: Tanya Brokhman; +Cc: richard, jlauruhn, linux-mtd, linux-arm-msm
On Tue, 2014-11-11 at 22:36 +0200, Tanya Brokhman wrote:
> Unfortunately none. This is done for a new device that we received just
> now. The development was done on a virtual machine with nandsim. Testing
> was more of stability and regression
OK. So the implementation is theory-driven and misses the experimental
prove. This means that building a product based on this implementation
has certain amount of risk involved.
And from where I am, the theoretical base for the solution also does not
look very strong.
> > The advantages of the "read all periodically" approach were:
> >
> > 1. Simple, no modifications needed
> > 2. No need to write if the media is read-only, except when scrubbing
> > happens.
> > 3. Should cover all the NAND effects, including the "radiation" one.
>
> Disadvantages (as I see it):
> 1. performance hit: when do you trigger the "read-all"? will effect
> performance
Right. We do not know how often, just like we do not know how often and
how much (read counter threshold) in your proposal.
Performance - sure, matter of experiment, just like the performance of
your solution. And as I notice, energy too (read - battery life).
In your solution you have to do more work maintaining the counters and
writing them. With read solution you do more work reading data.
The promise that reading may be done in background, when there is no
other I/O.
> 2. finds bitflips only when they are present instead of preventing them
> from happening
But is this true? I do not see how is this true in your case. Yo want to
scrub by threshold, which is a theoretical value with very large
deviation from the real one. And there may be no real one even - the
real one depends on the erase block, it depends on the I/O patterns, and
it depends on the temperature.
You will end up scrubbing a lot earlier than needed. Here comes the
performance loss too (and energy). And you will eventually end up
scrubbing too late.
I do not see how your solution provides any hard guarantee. Please,
explain how do you guarantee that my PEB does not bit-rot earlier than
read counter reaches the threshold? It may bit-rot earlier because it is
close to be worn out, or because of just higher temperature, or because
it has a nano-defect.
> Perhaps our design is an overkill for this and not covering 100% of te
> usecases. But it was requested by our customers to handle read-disturb
> and data retention specifically (as in "prevent" and not just "fix").
> This is due to a new NAND device that should operate in high temperature
> and last for ~15-20 years.
I understand the whole customer orientation concept. But for me so far
the solution does not feel like something suitable to a customer I could
imagine. I mean, if I think about me as a potential customer, I would
just want my data to be safe and covered from all the NAND effects. I
would not want counters, I'd want the result. And in the proposed
solution I would not see how I'd get the guaranteed result. But of
course I do not know the customer requirements that you've got.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling
2014-11-11 21:39 ` Richard Weinberger
@ 2014-11-12 12:07 ` Artem Bityutskiy
2014-11-12 13:01 ` Richard Weinberger
0 siblings, 1 reply; 38+ messages in thread
From: Artem Bityutskiy @ 2014-11-12 12:07 UTC (permalink / raw)
To: Richard Weinberger; +Cc: linux-arm-msm, jlauruhn, linux-mtd, Tanya Brokhman
On Tue, 2014-11-11 at 22:39 +0100, Richard Weinberger wrote:
> Please don't (ab)use fastmap. If you really need persistent read-counters use an internal UBI volume.
Just like you, I do not think the proposed solution is the right answer
to the problem, at least so far. But if we imagine that Tanya proves
that the counters is the right thing, storing them in fastmap would be
the first thing which comes to mind. Just calling this an abuse without
explaining (even if this is right) is not very collaborative.
Let me see why would that be an "abuse"... Probably because of the
nature of the data. Fastmap contains data which only changes in case of
writes (well, more precisely, erases, but those usually go are related
to writes). Read counters are completely opposite - they stay constant
when we write and change when we read.
Putting them all to the same on-flash area is possible, but is it
optimal? I wouldn't be so sure, I see cons. and pros.
Any other reasons?
Artem.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling
2014-11-12 12:07 ` Artem Bityutskiy
@ 2014-11-12 13:01 ` Richard Weinberger
2014-11-12 13:32 ` Artem Bityutskiy
0 siblings, 1 reply; 38+ messages in thread
From: Richard Weinberger @ 2014-11-12 13:01 UTC (permalink / raw)
To: dedekind1; +Cc: linux-arm-msm, jlauruhn, linux-mtd, Tanya Brokhman
Am 12.11.2014 um 13:07 schrieb Artem Bityutskiy:
> On Tue, 2014-11-11 at 22:39 +0100, Richard Weinberger wrote:
>> Please don't (ab)use fastmap. If you really need persistent read-counters use an internal UBI volume.
>
> Just like you, I do not think the proposed solution is the right answer
> to the problem, at least so far. But if we imagine that Tanya proves
> that the counters is the right thing, storing them in fastmap would be
> the first thing which comes to mind. Just calling this an abuse without
> explaining (even if this is right) is not very collaborative.
I explaind that already.
There have been so many mails on this topic that some facts may got lost.
Tanya stated that the read counters must not get lost.
But it can happen that you lose the fastmap. Fastmap is optional.
I.e. if you boot an older kernel it will delete the fastmap. If you run
out of PEBs which can be used by fastmap, fastmap has to delete the current fastmap.
Same for too many write errors, etc...
If we add the read-counters to fastmap we'd have to change the fastmap on-flash layout too.
(Unless we do very hacky tricks)
Also writing a fastmap is not cheap, we have to stop all IO. So, saving the read-counter will
be expensive and an performance problem.
Thanks,
//richard
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling
2014-11-12 13:01 ` Richard Weinberger
@ 2014-11-12 13:32 ` Artem Bityutskiy
2014-11-12 15:37 ` Richard Weinberger
0 siblings, 1 reply; 38+ messages in thread
From: Artem Bityutskiy @ 2014-11-12 13:32 UTC (permalink / raw)
To: Richard Weinberger; +Cc: linux-arm-msm, jlauruhn, linux-mtd, Tanya Brokhman
[Sort of off-topic]
On Wed, 2014-11-12 at 14:01 +0100, Richard Weinberger wrote:
> Tanya stated that the read counters must not get lost.
I understood that this is more of "we try not to lose them, but if we
lose, we can deal with this".
> But it can happen that you lose the fastmap. Fastmap is optional.
And new data structure would be kind of optional too.
> I.e. if you boot an older kernel it will delete the fastmap. If you run
> out of PEBs which can be used by fastmap, fastmap has to delete the current fastmap.
> Same for too many write errors, etc...
It would be cool to document this in more details, say in the web site.
If someone uses fastmap, they probably need to know exactly when it
could "disappear", in order to try avoiding these conditions.
> If we add the read-counters to fastmap we'd have to change the fastmap on-flash layout too.
But this is not the end of the world. Fastmap is still an experimental
feature, and I personally consider it as "not yet proved to be ready for
production", because I did not hear success stories yet. It does not
mean there are no success stories. And this is just my perception, I may
be wrong. So while not touching on-flash format is always a good goal,
we may be less resistant about fastmap.
> (Unless we do very hacky tricks)
> Also writing a fastmap is not cheap, we have to stop all IO. So, saving the read-counter will
> be expensive and an performance problem.
For me this one sounds like a strong point. We do not really want to
make fastmap change more often.
Thanks,
Artem.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling
2014-11-12 13:32 ` Artem Bityutskiy
@ 2014-11-12 15:37 ` Richard Weinberger
0 siblings, 0 replies; 38+ messages in thread
From: Richard Weinberger @ 2014-11-12 15:37 UTC (permalink / raw)
To: dedekind1; +Cc: linux-arm-msm, jlauruhn, linux-mtd, Tanya Brokhman
Am 12.11.2014 um 14:32 schrieb Artem Bityutskiy:
> [Sort of off-topic]
>
> On Wed, 2014-11-12 at 14:01 +0100, Richard Weinberger wrote:
>> Tanya stated that the read counters must not get lost.
>
> I understood that this is more of "we try not to lose them, but if we
> lose, we can deal with this".
>
>> But it can happen that you lose the fastmap. Fastmap is optional.
>
> And new data structure would be kind of optional too.
Yeah, but it should be COMPAT_PRESERVE instead of COMPAT_DELETE.
>> I.e. if you boot an older kernel it will delete the fastmap. If you run
>> out of PEBs which can be used by fastmap, fastmap has to delete the current fastmap.
>> Same for too many write errors, etc...
>
> It would be cool to document this in more details, say in the web site.
> If someone uses fastmap, they probably need to know exactly when it
> could "disappear", in order to try avoiding these conditions.
Will file a patch against mtd-www.git!
>> If we add the read-counters to fastmap we'd have to change the fastmap on-flash layout too.
>
> But this is not the end of the world. Fastmap is still an experimental
> feature, and I personally consider it as "not yet proved to be ready for
> production", because I did not hear success stories yet. It does not
> mean there are no success stories. And this is just my perception, I may
> be wrong. So while not touching on-flash format is always a good goal,
> we may be less resistant about fastmap.
Yeah, if needed I will not block it.
>> (Unless we do very hacky tricks)
>> Also writing a fastmap is not cheap, we have to stop all IO. So, saving the read-counter will
>> be expensive and an performance problem.
>
> For me this one sounds like a strong point. We do not really want to
> make fastmap change more often.
Exactly.
Thanks,
//richard
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling
2014-11-12 11:55 ` Artem Bityutskiy
@ 2014-11-13 12:13 ` Tanya Brokhman
2014-11-13 13:36 ` Artem Bityutskiy
0 siblings, 1 reply; 38+ messages in thread
From: Tanya Brokhman @ 2014-11-13 12:13 UTC (permalink / raw)
To: dedekind1; +Cc: richard, jlauruhn, linux-mtd, linux-arm-msm
On 11/12/2014 1:55 PM, Artem Bityutskiy wrote:
> On Tue, 2014-11-11 at 22:36 +0200, Tanya Brokhman wrote:
>> Unfortunately none. This is done for a new device that we received just
>> now. The development was done on a virtual machine with nandsim. Testing
>> was more of stability and regression
>
> OK. So the implementation is theory-driven and misses the experimental
> prove. This means that building a product based on this implementation
> has certain amount of risk involved.
>
> And from where I am, the theoretical base for the solution also does not
> look very strong.
>
>>> The advantages of the "read all periodically" approach were:
>>>
>>> 1. Simple, no modifications needed
>>> 2. No need to write if the media is read-only, except when scrubbing
>>> happens.
>>> 3. Should cover all the NAND effects, including the "radiation" one.
>>
>> Disadvantages (as I see it):
>> 1. performance hit: when do you trigger the "read-all"? will effect
>> performance
>
> Right. We do not know how often, just like we do not know how often and
> how much (read counter threshold) in your proposal.
>
> Performance - sure, matter of experiment, just like the performance of
> your solution. And as I notice, energy too (read - battery life).
>
> In your solution you have to do more work maintaining the counters and
> writing them. With read solution you do more work reading data.
But the maintaining work is minimal here. ++the counter on every read is
all that is required and verify it's value. O(1)...
Saving them on fastmap also doesn't add any more maintenance work. They
are saved as part of fastmap. I didn't increase the number of events
that trigger saving fastmat to flash. So all is changes is that the
number of scubbing events increased
>
> The promise that reading may be done in background, when there is no
> other I/O.
>
>> 2. finds bitflips only when they are present instead of preventing them
>> from happening
>
> But is this true? I do not see how is this true in your case. Yo want to
> scrub by threshold, which is a theoretical value with very large
> deviation from the real one. And there may be no real one even - the
> real one depends on the erase block, it depends on the I/O patterns, and
> it depends on the temperature.
I know... We got the threshold value (that is exposed in my patches as a
define you just missed it) from NAND manufacturer asking to take into
consideration the temperature the device will operate at. I know its
still an estimation but so is the program/erase threshold. Since it was
set by manufacturer - I think its the best one we can hope for.
>
> You will end up scrubbing a lot earlier than needed. Here comes the
> performance loss too (and energy). And you will eventually end up
> scrubbing too late.
I don't see why I would end up scrubbing too late?
>
> I do not see how your solution provides any hard guarantee. Please,
> explain how do you guarantee that my PEB does not bit-rot earlier than
> read counter reaches the threshold? It may bit-rot earlier because it is
> close to be worn out, or because of just higher temperature, or because
> it has a nano-defect.
I can't guarantee it wont bit-flip, I don't think any one could but I
can say that with my implementation the chance of bit-flip is reduced.
Even if not all the scenarios are covered. For example in the bellow
case I reduce the chance of data loss:
In an endless loop - read page 3 of PEB-A.
This will effect near by pages (say 4 and 2 for simplicity). But if I
scrub the whole PEB according to read-counter I will save data of pages
2 and 4.
If I do nothing: when reading eventually page 4 it will produce
bit-flips that may not be fixable.
>
>> Perhaps our design is an overkill for this and not covering 100% of te
>> usecases. But it was requested by our customers to handle read-disturb
>> and data retention specifically (as in "prevent" and not just "fix").
>> This is due to a new NAND device that should operate in high temperature
>> and last for ~15-20 years.
>
> I understand the whole customer orientation concept. But for me so far
> the solution does not feel like something suitable to a customer I could
> imagine. I mean, if I think about me as a potential customer, I would
> just want my data to be safe and covered from all the NAND effects.
I'm not sure that at the moment "all NAND effects" can be covered. In
our case the result is that we reduce the chance of loosing data. not to
0% unfortunately but still reduce.
And from the tests we ran we didn't observe performance hit with this
implementation. And the customer doesn't really care how this was done.
I do not know about power. Its possible that our implementation will
have negative effect on power consumption. I don't have the equipment to
verify that unfortunately.
There are plans to test this implementation in extreme temperature
conditions and get some real numbers and statistics on endurance. It
wasn't done yet and wont be done by us. When I get the results I'll try
to share (if allowed to by legal)
I
> would not want counters, I'd want the result. And in the proposed
> solution I would not see how I'd get the guaranteed result. But of
> course I do not know the customer requirements that you've got.
>
>
Thanks,
Tanya Brokhman
--
Qualcomm Israel, on behalf of Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
Forum, a Linux Foundation Collaborative Project
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling
2014-11-13 12:13 ` Tanya Brokhman
@ 2014-11-13 13:36 ` Artem Bityutskiy
2014-11-23 8:13 ` Tanya Brokhman
0 siblings, 1 reply; 38+ messages in thread
From: Artem Bityutskiy @ 2014-11-13 13:36 UTC (permalink / raw)
To: Tanya Brokhman; +Cc: richard, jlauruhn, linux-mtd, linux-arm-msm
On Thu, 2014-11-13 at 14:13 +0200, Tanya Brokhman wrote:
> > In your solution you have to do more work maintaining the counters and
> > writing them. With read solution you do more work reading data.
>
> But the maintaining work is minimal here. ++the counter on every read is
> all that is required and verify it's value. O(1)...
Let's consider the R/O FS on top of UBI case. Fastmap will only be
updated when there are erase operations, which may only be cause by
scrubbing in this case. IOW, fastmap will be updated extremely rarely.
And suppose there is no clean unmount ever happening.
Will we always lose erase counters and set them to half the threshold
all the time? Even if it was Threshold-1 before, it becomes Threshold/2
after power cut?
Don't we actually want to write the read counters when they change
significantly enough?
> I know... We got the threshold value (that is exposed in my patches as a
> define you just missed it) from NAND manufacturer asking to take into
> consideration the temperature the device will operate at. I know its
> still an estimation but so is the program/erase threshold. Since it was
> set by manufacturer - I think its the best one we can hope for.
I wonder how constant is the threshold.
* Does it change with time, as eraseblock becomes more worn out. Say,
the PEB resource is 10000 erase cycles. Will the threshold be the same
for PEB at 0 erase cycles and at 5000 erase cycles?
* Does it depend on eraseblock?
* Does it depend on the I/O in other eraseblocks?
Just wonder how pessimistic is the threshold number manufacturers give.
Just curious to learn more about this number, and have an idea about how
reliable is it.
> > You will end up scrubbing a lot earlier than needed. Here comes the
> > performance loss too (and energy). And you will eventually end up
> > scrubbing too late.
>
> I don't see why I would end up scrubbing too late?
Well, one example - see above, you lose the read counters often, always
reset to threshold/2, end up reading more than the threshold.
The other doubt is that the threshold you use is actually the right one
for a worst case usage scenario of the end product. But probably it is
about just learning more about this threshold value.
> I can't guarantee it wont bit-flip, I don't think any one could but I
> can say that with my implementation the chance of bit-flip is reduced.
That was my point. There is already a solution for the problem you are
trying to solve. It is implemented. And it covers not just the problem
you are solving, but the other problems of NAND.
So probably what is missing is some kind of better analysis or
experimental prove that the solution which is already implemented (let's
call it "periodic read") is defective.
May be I should expand a bit more on why the periodic read solution does
not look bad to me.
If the ECC is strong enough for the flash chip in question, then
bit-flips will accumulate slowly enough. First one bit-flip, then 2,
then 3, etc. All you need to do is to make your read period good enough
to make sure no PEB accumulates too many bit-flips.
E.g., modern ECCs cover 8 or more bit-flips.
And the other compelling point here that this will cover all other NAND
effects. All of them lead to more bit-flips at the end, right? And you
just fix bit-flips when they come. You do not care why they came. You
just deal with them.
And what is very nice is that you do not need to implement anything, or
you implement very little.
> In an endless loop - read page 3 of PEB-A.
> This will effect near by pages (say 4 and 2 for simplicity). But if I
> scrub the whole PEB according to read-counter I will save data of pages
> 2 and 4.
> If I do nothing: when reading eventually page 4 it will produce
> bit-flips that may not be fixable.
This is quite artificial example, but yes, if you read the same page in
a tight loop, you may cause bit flips fast enough, faster than your
periodic read task starts reading your media.
But first of all, how realistic is this scenario? I am sure not,
especially if there is an FS on top of UBI and the data are cached, so
the second read actually reads from RAM.
Secondly, can this scenario be covered by simpler means? Say, UBI could
watch the read ratio, and if it grows, trigger the scrubber task
earlier?
> > I understand the whole customer orientation concept. But for me so far
> > the solution does not feel like something suitable to a customer I could
> > imagine. I mean, if I think about me as a potential customer, I would
> > just want my data to be safe and covered from all the NAND effects.
>
> I'm not sure that at the moment "all NAND effects" can be covered.
I explained how I see it above in this e-mail. In short: read all data
often enough ("enough" is defined by your product), and you are done.
All "NAND effects" lead to bit-flips, you fix bit-flips faster than they
become hard errors, and you are done.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling
2014-11-13 13:36 ` Artem Bityutskiy
@ 2014-11-23 8:13 ` Tanya Brokhman
0 siblings, 0 replies; 38+ messages in thread
From: Tanya Brokhman @ 2014-11-23 8:13 UTC (permalink / raw)
To: dedekind1; +Cc: richard, jlauruhn, linux-mtd, linux-arm-msm
Hi Artem/Richard
On 11/13/2014 3:36 PM, Artem Bityutskiy wrote:
>
> I explained how I see it above in this e-mail. In short: read all data
> often enough ("enough" is defined by your product), and you are done.
> All "NAND effects" lead to bit-flips, you fix bit-flips faster than they
> become hard errors, and you are done.
>
We decided to drop this solution and stay with "force scrub" all PEBs
from time to time, triggered from userspace.
Thank you all for your inputs and comments! It was very helpful in
coming to this decision.
Thanks,
Tanya Brokhman
--
Qualcomm Israel, on behalf of Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
^ permalink raw reply [flat|nested] 38+ messages in thread
end of thread, other threads:[~2014-11-23 8:13 UTC | newest]
Thread overview: 38+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-10-26 13:49 [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling Tanya Brokhman
2014-10-26 20:39 ` Richard Weinberger
2014-10-27 8:41 ` Tanya Brokhman
2014-10-27 8:56 ` Richard Weinberger
2014-10-29 11:03 ` Tanya Brokhman
2014-10-29 12:00 ` Richard Weinberger
2014-10-31 13:12 ` Tanya Brokhman
2014-10-31 15:34 ` Richard Weinberger
2014-10-31 15:39 ` Richard Weinberger
2014-10-31 22:55 ` Jeff Lauruhn (jlauruhn)
2014-11-02 13:30 ` Tanya Brokhman
2014-11-07 9:21 ` Artem Bityutskiy
2014-11-02 13:25 ` Tanya Brokhman
2014-11-06 8:07 ` Artem Bityutskiy
2014-11-06 12:16 ` Tanya Brokhman
2014-11-07 8:55 ` Artem Bityutskiy
2014-11-07 8:58 ` Artem Bityutskiy
2014-11-11 20:36 ` Tanya Brokhman
2014-11-11 21:39 ` Richard Weinberger
2014-11-12 12:07 ` Artem Bityutskiy
2014-11-12 13:01 ` Richard Weinberger
2014-11-12 13:32 ` Artem Bityutskiy
2014-11-12 15:37 ` Richard Weinberger
2014-11-12 11:55 ` Artem Bityutskiy
2014-11-13 12:13 ` Tanya Brokhman
2014-11-13 13:36 ` Artem Bityutskiy
2014-11-23 8:13 ` Tanya Brokhman
2014-11-02 13:23 ` Tanya Brokhman
2014-11-02 13:54 ` Richard Weinberger
2014-11-02 14:12 ` Tanya Brokhman
2014-11-02 17:02 ` Richard Weinberger
2014-11-02 17:18 ` Tanya Brokhman
[not found] <201411101307.03225.jbe@pengutronix.de>
2014-11-10 12:35 ` Richard Weinberger
2014-11-10 13:12 ` Juergen Borleis
2014-11-11 9:23 ` Richard Weinberger
2014-11-10 13:13 ` Ricard Wanderlof
2014-11-10 13:42 ` Juergen Borleis
2014-11-10 13:52 ` Ricard Wanderlof
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).