From mboxrd@z Thu Jan 1 00:00:00 1970
From: Tanya Brokhman
Subject: Re: [RFC/PATCH 0/5 v2] mtd:ubi: Read disturb and Data retention handling
Date: Mon, 27 Oct 2014 10:41:15 +0200
Message-ID: <544E052B.1040505@codeaurora.org>
References: <1414331342-27839-1-git-send-email-tlinder@codeaurora.org> <544D5BEC.50802@nod.at>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Received: from smtp.codeaurora.org ([198.145.11.231]:34876 "EHLO smtp.codeaurora.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751258AbaJ0IlT (ORCPT ); Mon, 27 Oct 2014 04:41:19 -0400
In-Reply-To: <544D5BEC.50802@nod.at>
Sender: linux-arm-msm-owner@vger.kernel.org
List-Id: linux-arm-msm@vger.kernel.org
To: Richard Weinberger , dedekind1@gmail.com
Cc: linux-mtd@lists.infradead.org, linux-arm-msm@vger.kernel.org

On 10/26/2014 10:39 PM, Richard Weinberger wrote:
> On 26.10.2014 at 14:49, Tanya Brokhman wrote:
>> One of the limitations of the NAND devices is the method used to read
>> NAND flash memory may cause bit-flips on the surrounding cells and result
>> in uncorrectable ECC errors. This is known as the read disturb or data
>> retention.
>>
>> Today’s Linux NAND drivers implementation doesn’t address the read disturb
>> and the data retention limitations of the NAND devices. To date these
>> issues could be overlooked since the possibility of their occurrence in
>> today’s NAND devices is very low.
>>
>> With the evolution of NAND devices and the requirement for a “long life”
>> NAND flash, read disturb and data retention can no longer be ignored
>> otherwise there will be data loss over time.
>>
>> The following patch set implements handling of Read-disturb and Data
>> retention by the UBI layer.
>
> So, your patch addresses the following issue:
> We need to re-read a PEB after a specific time (to detect bit rot) or after N reads (to detect read disturb issues).
> Is this correct?

Not exactly... We need to scrub a PEB that is frequently read from in
order to prevent bit-flip errors that might occur due to read-disturb.

>
> Currently users of UBI do this by having cron jobs which read the complete UBI volume
> and then cause scrub work.
> The draw back of this is that only UBI payload will be read and not all data like EC and VID headers.
> I understand that you want to fix this issue.

Not sure I completely understand what these cron jobs do, but the last patch
in the series does something similar.

>
> According to my opinion it is not a good idea to store read counters and timestamps into the UBI/Fastmap on-disk layout.
> Both the read counters and timestamps don't have to be exact values.

Why not? Storing last_erase_timestamp doesn't increase the memory
consumption on NAND since I used reserved bytes in the ec_header. I
agree that RAM usage increases, but I couldn't find any other way to
have these statistics saved.
Unfortunately, read_counters can be saved ONLY as part of fastmap because
of the erase-before-write limitation.

>
> What about this idea?
> Add a userspace interface which allows UBI to expose read counters and last access timestamps.

Where will you save those?

> A userspace daemon (let's name it ubihealthd) then can decide whether it is time to trigger a re-read of a PEB.

Not a re-read - scrub. Read-disturb is fixed by erasing the PEB.

> This daemon can also store and load the timestamp values and counters from and to UBI. If it misses these meta data some times due to a
> power cut it won't hurt.

Not sure I follow. How is this better than doing this from the kernel?
You do have to store the timestamps and the read_counters somewhere and
they are both updated in the UBI layer.
I must be missing something
here. Could you please elaborate on your idea?

> We could also add another internal UBI volume which can carry these data.

I'm afraid I have to disagree with this idea. First of all, having a
dedicated volume for this data is overkill. It's not a sufficient
amount of data to reserve a volume for. And what about the PEBs that
belong to this volume? Taking this feature out of the UBI layer is just
complicated, feels wrong from a design perspective, and I don't see the
benefit of it. Basically, it's very similar to wear-leveling, but for
"reads" instead of "writes".

>
> All in all, I like the idea but changing/extending the on-disk layout is overkill IMHO.

Why? Without addressing these issues we can't have devices with a life span
of more than ~5 years (and we need to). And this is very similar to
wear-leveling and erase counters. So why are read counters and
erase_timestamp overkill?

I'm working on your idea of changing the fastmap layout to save all the
read disturb data at the end of it rather than integrating it into the
existing fastmap data structures (as is done in this version of the code).
But as I see it, fastmap has to be updated as well.

>
> Thanks,
> //richard
>

Thanks,
Tanya Brokhman

-- 
Qualcomm Israel, on behalf of Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc.
is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project
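
As a rough illustration of the policy discussed in this thread (scrub a PEB once it has been read too often or has gone too long without an erase), the decision could be sketched in C as below. This is not the code from the patch set. The struct names, function names, threshold fields and the use of seconds are assumptions made for illustration; only read_counter and last_erase_timestamp echo names mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-PEB statistics; the layout is illustrative only. */
struct peb_stats {
	uint32_t read_counter;          /* reads since the last erase */
	uint64_t last_erase_timestamp;  /* seconds, same clock as "now" */
};

/* Hypothetical limits; real values would depend on the NAND part. */
struct scrub_limits {
	uint32_t max_reads;        /* read-disturb threshold */
	uint64_t max_age_seconds;  /* data-retention threshold */
};

/* Return true if the PEB should be queued for scrubbing. */
bool peb_needs_scrub(const struct peb_stats *peb,
		     const struct scrub_limits *lim, uint64_t now)
{
	assert(peb != NULL && lim != NULL);

	/* Read disturb: too many reads since the last erase. */
	if (peb->read_counter >= lim->max_reads)
		return true;

	/* Data retention: a clock that went backwards is treated as age 0. */
	if (now > peb->last_erase_timestamp &&
	    now - peb->last_erase_timestamp >= lim->max_age_seconds)
		return true;

	return false;
}

/* Scrubbing erases the PEB, so both statistics start over. */
void peb_note_erase(struct peb_stats *peb, uint64_t now)
{
	assert(peb != NULL);
	peb->read_counter = 0;
	peb->last_erase_timestamp = now;
}
```

Neither value has to be exact for this check to be useful: a counter or timestamp that is slightly stale only delays or advances the scrub a little, which is the point Richard raises about losing updates on a power cut.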