* Testing a device using mtd_stresstest
@ 2011-01-31 12:12 David Peverley
2011-02-06 14:24 ` Artem Bityutskiy
0 siblings, 1 reply; 12+ messages in thread
From: David Peverley @ 2011-01-31 12:12 UTC (permalink / raw)
To: linux-mtd
Hi all,
I've got a new board that has been experiencing occasional issues that
look to be MTD related, so I've been running the tests under
drivers/mtd/tests from Linux 2.6.31.12 to see if I can trigger
failures. It's worth mentioning also that I've built our kernel with
CONFIG_MTD_NAND_VERIFY_WRITE enabled. The flash we're using is a
Micron MT29F2G08AADWP, see datasheet :
http://download.micron.com/pdf/datasheets/flash/nand/2_4_8gb_nand_m49a.pdf
(Same part but different operating voltage)
The only caveat is that this new board differs from our
previous board in that the R/B line is now tied high instead of
being connected to a GPIO, and a 25 µs chip_delay has been specified instead.
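Without R/B, the driver has no way to ask the chip whether a page read has finished; it can only wait a fixed time. A minimal userspace model of that trade-off might look like the following. The names are hypothetical, the busy times are made up, and real tR figures should come from the datasheet:

```c
#include <stdbool.h>

/* Hypothetical userspace model, not kernel code. All times are in
 * microseconds, measured from the moment the read command is issued. */
struct sim_nand {
    unsigned t_busy;     /* assumed worst-case busy time for a page read (tR) */
    bool has_rb;         /* true if R/B is wired to a GPIO the driver can poll */
    unsigned chip_delay; /* fixed wait used when R/B is not available */
};

/* When the driver starts clocking data out of the chip. */
static unsigned data_start(const struct sim_nand *n)
{
    if (n->has_rb)
        return n->t_busy;   /* polling R/B waits exactly as long as the chip is busy */
    return n->chip_delay;   /* blind wait, analogous to udelay(chip->chip_delay) */
}

/* The data is valid only if the chip has finished the read by then. */
static bool read_ok(const struct sim_nand *n)
{
    return data_start(n) >= n->t_busy;
}
```

When R/B is polled, the wait adapts to the chip. With a fixed chip_delay, any case where the real busy time exceeds the delay (part variation, temperature, or extra bus latency) would return data before the chip is ready. That could plausibly show up as verify or ECC failures.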
I was wondering if anyone might have experienced anything similar and
might be able to nudge me in the right direction on this one :
Question 1 : The mtd_subpagetest fails (which I suspect it should, as the
device doesn't support sub-pages). I googled around and found a
reference suggesting that I should add NAND_NO_SUBPAGE_WRITE to the
options. I tried this and it made no difference. Out of curiosity I
grepped through drivers/mtd and found that *no* drivers actually use
this bit anyway...! Is it reasonable to ignore this or ought I address
it? Should I set the flag and expect it to have an effect?
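For context, in 2.6.3x-era nand_base.c the sub-page size appears to be derived in nand_scan_tail() from the number of ECC steps per page, unless NAND_NO_SUBPAGE_WRITE is set or the chip is MLC. Check this against your own tree. The following is a simplified userspace model of that logic, not kernel code:

```c
#include <stdbool.h>

/* Simplified model of the sub-page calculation; names mirror the kernel
 * (ecc.steps, subpage_sft) only for readability. */
static unsigned subpage_size(unsigned writesize, unsigned ecc_size,
                             bool no_subpage_write, bool is_mlc)
{
    unsigned steps = writesize / ecc_size; /* chip->ecc.steps */
    unsigned sft = 0;                      /* mtd->subpage_sft */

    /* Sub-page writes are only allowed on SLC parts without the flag. */
    if (!no_subpage_write && !is_mlc) {
        switch (steps) {
        case 2:
            sft = 1;
            break;
        case 4:
        case 8:
        case 16:
            sft = 2;
            break;
        }
    }
    return writesize >> sft;
}
```

With 2 KiB pages and 256-byte software ECC steps, this gives 512-byte sub-pages. In that case mtd_subpagetest will exercise sub-page writes unless the flag actually ends up in chip->options. If setting it in the driver appears to have no effect, one thing worth checking is whether the flash identification code rewrites the chip option bits after the driver has set them. That is an assumption, not verified here.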
Question 2 : The mtd_stresstest test fails after anywhere between 1000
and 200,000 operations. I'm certain this is a Bad Sign. It fails in
nand_base.c:nand_write_page() in the verification step enabled by
CONFIG_MTD_NAND_VERIFY_WRITE. When I tested this on our previous board (that
ostensibly works fine) it failed the stress test after 2.6M operations
instead. Should I be expecting to never see a failure of the stress
test or is an occasional verify failure reasonably expected?
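When the verify step fails, the shape of the mismatch can help narrow things down. A single flipped bit looks more like a flash cell problem, while many differing bytes hint at timing or bus issues. The kernel's verify path only reports pass or fail, so this sketch assumes you modify your driver or a test to keep both the written and the read-back buffers. The helper below is hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Summary of how a read-back page differs from what was written. */
struct mismatch {
    size_t first_offset; /* first differing byte, or len if none */
    size_t bad_bytes;    /* number of differing bytes */
    size_t bad_bits;     /* number of differing bits */
};

static unsigned popcount8(uint8_t v)
{
    unsigned n = 0;
    while (v) {
        n += v & 1u;
        v >>= 1;
    }
    return n;
}

static struct mismatch compare_page(const uint8_t *written,
                                    const uint8_t *readback, size_t len)
{
    struct mismatch m = { len, 0, 0 };

    for (size_t i = 0; i < len; i++) {
        uint8_t diff = written[i] ^ readback[i];
        if (!diff)
            continue;
        if (m.bad_bytes == 0)
            m.first_offset = i; /* remember where corruption starts */
        m.bad_bytes++;
        m.bad_bits += popcount8(diff);
    }
    return m;
}
```

Treat this as a rule of thumb rather than a diagnosis. For example, a page with one differing bit, compared with a page full of shifted or stale data, points towards different parts of the system.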
Thanks for any suggestions!
~Pev
^ permalink raw reply [flat|nested] 12+ messages in thread

* Re: Testing a device using mtd_stresstest
From: Artem Bityutskiy @ 2011-02-06 14:24 UTC
To: David Peverley; +Cc: linux-mtd

Hi,

On Mon, 2011-01-31 at 12:12 +0000, David Peverley wrote:
> Question 1 : The mtd_subpagetest (which I suspect should fail as the
> device doesn't support sub-pages). I googled around and found a
> reference that maybe I should add NAND_NO_SUBPAGE_WRITE to the
> options. I tried this and it made no difference. Out of curiosity I
> grepped through drivers/mtd and found that *no* drivers actully use
> this bit anyway...! Is it reasonable to ignore this or ought I address
> it? Should I set the flag and expect it to have an effect?

MTD code is currently broken and CONFIG_MTD_NAND_VERIFY_WRITE causes
errors when sub-pages are used. You should either disable this
configuration option or fix MTD. We have this in our FAQ:

http://www.linux-mtd.infradead.org/faq/ubi.html#L_subpage_verify_fail

> Question 2 : The mtd_stresstest test fails after anywhere between 1000
> and 200,000 operations. I'm certain this is a Bad Sign. It fails in
> nand_base.c:nand_write_page() in the verification step enabled by
> MTD_NAND_VERIFY_WRITE. When I tested this on our previous board (that
> ostensibly works fine) it failed the stress test after 2.6M operations
> instead. Should I be expecting to never see a failure of the stress
> test or is an occasional verify failure reasonably expected?

Yes, the test is expected never to fail. You should dig in and
understand why it is failing.

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

* Re: Testing a device using mtd_stresstest
From: David Peverley @ 2011-02-07 13:44 UTC
To: dedekind1; +Cc: linux-mtd

Hi Artem,

Many thanks for the response ;

> MTD code is currently broken and CONFIG_MTD_NAND_VERIFY_WRITE causes
> errors when sub-pages are used. You should either disable this
> configuration option or fix MTD. We have this in our FAQ:
>
> http://www.linux-mtd.infradead.org/faq/ubi.html#L_subpage_verify_fail

I'm not sure what the implication of this is ; I understand that this
will cause the subpage test to fail with CONFIG_MTD_NAND_VERIFY_WRITE
enabled. However, I had discounted the FAQ as we use YAFFS2 and not
UBIFS. Given that, should I still disable the write verify? At the
moment I'm inclined to leave it enabled as it seems to be regularly
catching failures that should not occur, such as the stress-test
failures noted.

We've also noticed that every so often we see "uncorrectable error:"
messages from nand_ecc.c - do you have any suggestions as to where to
start investigating here? So far I can't find a pattern to occurrences
or a regular way to reproduce.

Thanks again!

~Pev
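To interpret the "uncorrectable error" message, it helps to know how a 256-byte Hamming code classifies a read. The software ECC in nand_ecc.c uses 3 ECC bytes per 256-byte step. The sketch below models only that classification, based on the weight of the syndrome (stored XOR recomputed ECC). It is a simplified model, and the kernel's own check may differ in detail:

```c
#include <stdint.h>

enum ecc_result {
    ECC_CLEAN,          /* stored and recomputed ECC agree */
    ECC_CORRECTED_DATA, /* looks like one flipped data bit */
    ECC_CORRECTED_ECC,  /* looks like one flipped bit in the stored ECC */
    ECC_UNCORRECTABLE   /* fits neither single-error pattern */
};

static unsigned bits_set(uint32_t v)
{
    unsigned n = 0;
    for (; v; v &= v - 1) /* clear the lowest set bit each iteration */
        n++;
    return n;
}

static enum ecc_result classify(const uint8_t stored[3], const uint8_t calc[3])
{
    uint32_t syndrome = (uint32_t)(stored[0] ^ calc[0])
                      | (uint32_t)(stored[1] ^ calc[1]) << 8
                      | (uint32_t)(stored[2] ^ calc[2]) << 16;
    unsigned n = bits_set(syndrome);

    if (n == 0)
        return ECC_CLEAN;
    if (n == 11) /* one bit from each of the 11 parity pairs */
        return ECC_CORRECTED_DATA;
    if (n == 1)
        return ECC_CORRECTED_ECC;
    return ECC_UNCORRECTABLE;
}
```

A single flipped data bit changes exactly one bit in each of the 11 parity pairs, and a flipped ECC bit changes exactly one syndrome bit. An "uncorrectable" result therefore means the syndrome fits neither case. Typically that indicates two or more bit errors within one 256-byte step, or ECC bytes that do not belong to the data (for example, because something else wrote into them).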

* Re: Testing a device using mtd_stresstest
From: Andrew Murray @ 2011-02-07 14:01 UTC
To: David Peverley; +Cc: linux-mtd, dedekind1

On 7 February 2011 13:44, David Peverley <pev@sketchymonkey.com> wrote:
> We've also noticed that every so often we see "uncorrectable error:"
> messages from nand_ecc.c - do you have any suggestions as to where to
> start investigating here? So far I can't find a pattern to occurrences
> or a regular way to reproduce.

Are you using hardware ECC?

We switched from SW ECC to HW ECC and discovered that YAFFS2 was writing
over the ECC in the OOB. It may be worth checking for any such overlap of
OOB use between the kernel MTD/YAFFS2 and whatever boot loaders you may
have.

We worked around this issue by using in-band tags in our YAFFS2 image.

Thanks,
Andrew Murray
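Checking for the kind of overlap Andrew describes comes down to comparing byte ranges in the OOB area. Below is a minimal sketch with hypothetical names. The actual ranges would have to come from your driver's ECC layout, the filesystem's tag placement and your boot loader:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a byte range inside the OOB area. */
struct oob_range {
    unsigned offset;
    unsigned length;
};

/* True if the two ranges share at least one byte. */
static bool ranges_intersect(struct oob_range x, struct oob_range y)
{
    if (x.length == 0 || y.length == 0)
        return false;
    return x.offset < y.offset + y.length && y.offset < x.offset + x.length;
}

/* True if any range used by one OOB consumer overlaps any range used by
 * another (e.g. ECC bytes versus filesystem tags or boot loader data). */
static bool layouts_clash(const struct oob_range *a, size_t na,
                          const struct oob_range *b, size_t nb)
{
    for (size_t i = 0; i < na; i++)
        for (size_t j = 0; j < nb; j++)
            if (ranges_intersect(a[i], b[j]))
                return true;
    return false;
}
```

For instance, ECC occupying bytes 40 to 63 and tags occupying bytes 2 to 39 would not clash. Tags extending to byte 41 would.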

* Re: Testing a device using mtd_stresstest
From: Arno Steffen @ 2011-02-07 14:39 UTC
To: David Peverley; +Cc: linux-mtd, dedekind1

I made some of the same observations. In particular, I dug into the subpage issue.
As setting the bit via the options failed, I patched nand_base.c to set it.
At least the test doesn't fail anymore. But this is far from a perfect solution.

As for the uncorrectable errors: I struggled with them for months until I
found that there was a patch (for OMAP).
I assume this is for TI OMAP only, but I don't know what processor you use.

There are still some other issues with jffs2 that I reported. It seems
nobody here cares about them.
Artem has fixed one of the reported bugs in ubifs, but this doesn't
help me much.
JFFS2 is without support, as far as I could see.

Best regards
Arno

* Re: Testing a device using mtd_stresstest
From: David Peverley @ 2011-02-10 15:24 UTC
To: Arno Steffen; +Cc: linux-mtd, dedekind1

Hi Arno,

Is the patch you refer to the addition of dmb() to nand_command_lp()
that I found discussed on TI's E2E board? :
http://e2e.ti.com/support/embedded/f/354/p/56710/234039.aspx

Digging around, I managed to find someone's Git commit with some description at:
http://arago-project.org/git/people/?p=sriram/ti-psp-omap.git;a=commitdiff;h=76319aa1a321c4b5981e412bf489cfb617186c2f

  "When using delay loop for wait states, need to ascertain that the
  write to OMAP HW register is reflected befor the delay loop starts.
  This patch adds a dmb() instruction to this effect. Without this fix,
  NAND read failures reported with mtd_oobtests."

That's really interesting, as it's not dissimilar to the idea behind the
call to gpio_nand_dosync() in the GPIO NAND driver (mtd/nand/gpio.c): in
both cases, accesses that should take effect in hardware are not
happening synchronously, which is what we'd want here:

  "Make sure the GPIO state changes occur in-order with writes to NAND
  memory region. Needed on PXA due to bus-reordering within the SoC
  itself (see section on I/O ordering in PXA manual (section 2.3, p35)"

This was discussed in more detail here:
http://patchwork.ozlabs.org/patch/3260/
and here:
http://patchwork.ozlabs.org/patch/3738/

Interestingly, in the former link the approach of using a generic memory
barrier was mooted, but the verdict was that it wasn't the right
mechanism to enforce this. Additionally, the author of the driver I'm
debugging has added a udelay(2) at the position equivalent to the first
call to gpio_nand_dosync() in the GPIO driver, with a comment noting
that GPIOs "seemed a bit slow and was causing the signal to not be
set"...

Also, we're using a delay loop (chip_delay) as R/B isn't plumbed in, so
all in all I'm wondering if we're observing something similar. I suspect
I need to spend a while reading through the datasheet (PC302)..!

As far as I can tell, the GPIO NAND driver is only used by the Compulab
ARMCORE with a PXA255 CPU, so the manual in question can be found at:
http://www.xscale-freak.com/XSDoc/PXA255/27869302.pdf
where, indeed, section 2.3 covers I/O ordering.

Cheers,

~Pev
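The concern with a delay loop can be expressed in numbers. If the command write is posted (buffered) on its way to the chip, the busy period starts later than the CPU assumes, and the delay loop eats into its own margin. The hypothetical timing model below uses microseconds and made-up latencies. It idealises the barrier as "the delay starts once the write has reached the chip". Real barrier semantics are architecture-specific, and as the patchwork discussion notes, a generic memory barrier may not be the right mechanism:

```c
#include <stdbool.h>

/* Hypothetical model, times in microseconds from when the CPU issues the
 * command write. The write reaches the chip post_latency later, so the
 * chip is busy until post_latency + t_busy. */
static bool delay_covers_busy(unsigned chip_delay, unsigned t_busy,
                              unsigned post_latency, bool barrier)
{
    unsigned busy_end = post_latency + t_busy;

    /* Without a barrier the delay loop starts immediately; with the
     * idealised barrier it starts only once the write has landed. */
    unsigned delay_end = (barrier ? post_latency : 0) + chip_delay;

    return delay_end >= busy_end;
}
```

In this model, a delay that exactly matches the busy time is fine when the write is not posted. Any posting latency, however, leaves the data read starting too early unless something orders the delay after the write.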

* Re: Testing a device using mtd_stresstest
From: Arno Steffen @ 2011-02-18 10:59 UTC
To: David Peverley; +Cc: linux-mtd, dedekind1

2011/2/10 David Peverley <pev@sketchymonkey.com>:
> Hi Arno,
>
> Is the patch you refer to the addition of dmb() to nand_command_lp()
> that I found discussed on TI's E2E board? :
> http://e2e.ti.com/support/embedded/f/354/p/56710/234039.aspx
>
> Digging around I managed to find someones GIT commit with some description at :
> http://arago-project.org/git/people/?p=sriram/ti-psp-omap.git;a=commitdiff;h=76319aa1a321c4b5981e412bf489cfb617186c2f
>

Yep, that's exactly what helped me. Before that patch it was a nightmare.

I still have one issue with jffs2, and I don't know which of the
behaviours is right or wrong.
A freshly generated jffs2 image ends with data at, let's say, 0x140 (it's
an almost empty filesystem). The rest of the partition is 0xff.
If I make changes in the filesystem, it writes blocks of 0x800 bytes, and
the unused data are set to 0x00.
And then I get a message from the mtd module that there is empty flash
between 0x140 and 0x800.
As the kernel always writes blocks of 0x800 bytes, this message does not
occur there.
So should my mkfs.jffs2 pad from 0x140 to 0x800 with 0x00 to work properly?
Or does my kernel / mtd driver misbehave?
I am curious whether you know what is right or wrong.
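For reference, the gap Arno describes is the rest of the first 2 KiB page. The image data stops at 0x140, leaving 0x800 - 0x140 = 0x6C0 (1728) bytes of that page untouched by the image. A trivial helper for that arithmetic follows. It is hypothetical and does not settle which behaviour is correct:

```c
#include <stdint.h>

/* Bytes from 'end' to the next page boundary (0 if already aligned). */
static uint32_t page_pad(uint32_t end, uint32_t page_size)
{
    uint32_t rem = end % page_size;
    return rem ? page_size - rem : 0;
}
```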

* Re: Testing a device using mtd_stresstest
From: David Peverley @ 2011-02-10 12:30 UTC
To: Andrew Murray; +Cc: linux-mtd, dedekind1

Hi Andy!

Actually, we're using SW ECC, so I'd assumed that we should be free of
this particular issue. In our configuration MTD is controlling the OOB
area, which reserves the first two bytes for BB marking and the last 24
bytes for ECC. The remaining 38 bytes are in oobfree for YAFFS to use.
As I understand it, the yaffs packed tags struct stored there is 12
bytes, so I *think* it's OK?

Cheers,

~Pev

On 7 February 2011 13:56, Andrew Murray <amurray@mpcdata.com> wrote:
> Are you using hardware ECC?
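David's numbers are consistent. With 3 ECC bytes per 256-byte step, a 2048-byte page needs 2048 / 256 x 3 = 24 ECC bytes, which leaves 64 - 2 - 24 = 38 bytes after the 2-byte bad-block marker. Below is a small sketch of that arithmetic with hypothetical names. The 12-byte tag size is the figure from the message and has not been checked against YAFFS2 here:

```c
/* Hypothetical arithmetic helper for an OOB budget. */
struct oob_budget {
    unsigned ecc_bytes;  /* bytes consumed by ECC */
    unsigned free_bytes; /* bytes left for filesystem tags etc. */
};

static struct oob_budget compute_oob_budget(unsigned page_size, unsigned oob_size,
                                            unsigned ecc_step, unsigned ecc_per_step,
                                            unsigned bbm_bytes)
{
    struct oob_budget b;

    b.ecc_bytes = (page_size / ecc_step) * ecc_per_step;
    /* Guard against unsigned wrap-around if the layout doesn't fit. */
    if (b.ecc_bytes + bbm_bytes > oob_size)
        b.free_bytes = 0;
    else
        b.free_bytes = oob_size - bbm_bytes - b.ecc_bytes;
    return b;
}
```

Calling compute_oob_budget(2048, 64, 256, 3, 2) reproduces the 24/38 split, and 12 bytes of tags fit in the free area.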

* Re: Testing a device using mtd_stresstest
From: David Peverley @ 2011-02-11 14:25 UTC
To: Karl Beldan; +Cc: linux-mtd, dedekind1

Hi Karl,

Sometimes... When they are, I've manually marked the blocks as bad and
restarted the test run :-D I had wondered whether it was blocks
naturally failing with use, but some of these blocks haven't been (OK,
SHOULDN'T have been!) erased anywhere near 100,000 times, so I'm
suspicious about this. When I looked into it, I noted that
mtd_stresstest.ko never marks blocks as bad, so this is potentially
something that would occur. However, I notice that nandtest.c in
mtd-utils *does* support marking bad blocks during testing. Should I
consider using this instead? I'm not sure what the relationship between
these test tools is...?

Cheers,

~Pev

On 11 February 2011 13:58, Karl Beldan <karl.beldan@gmail.com> wrote:
> By any chance, wouldn't those "uncorrectable errors" happen to be within the
> same pages/blocks each time ?
> --
> Karl

* Re: Testing a device using mtd_stresstest
From: Artem Bityutskiy @ 2011-02-11 15:16 UTC
To: David Peverley; +Cc: Karl Beldan, linux-mtd

On Fri, 2011-02-11 at 14:25 +0000, David Peverley wrote:
> Sometimes... When they are I've manually marked blocks as bad and
> re-started the test run :-D I had wondered about whether it was blocks
> naturally failing with use, but some of these blocks haven't been (Ok,
> SHOULDNT have been!) erased anywhere near 100,000 times so I'm
> suspicious about this. When I looked into it I noted that the
> mtd_stresstest.ko doesn't mark blocks as bad ever so this is
> potentially something that would occur.

Yeah, I think the tests should not do this; they should just test and
report issues to you.

> However, nandtest.c in
> mtd-utils I notice *does* support marking of bad blocks during
> testing. Should I consider using this instead? I'm not sure what the
> relationship between these test tools is...?

These are tools written by different people at different times. The
kernel MTD tests were written by Nokia guys, and I think the tests are
more or less consistent in how they behave.

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

* Re: Testing a device using mtd_stresstest
From: David Peverley @ 2011-02-11 16:42 UTC
To: dedekind1; +Cc: Karl Beldan, linux-mtd

Hi Artem,

Thanks for the useful feedback!

> I do not know YAFFS, ask Charles, but I _think_ YAFFS does not use
> sub-pages, so you han have that option enabled.

Yep, I posted to the YAFFS list; interestingly, the write verify failure
found a case in yaffs where a write failure wasn't tested and (I believe)
a checkpoint write erroneously completes as a result...! Having said that from

> For sure, if you do not use sub-pages and it catches problems - have it
> enabled and nail the problems down.

Sure, that makes sense. Although I'm having fun trying to differentiate
issues; I get both the write verify failures and the "uncorrectable
errors", and there's not necessarily a direct correlation between
occurrences, so my gut tells me they're two separate issues...!

> Yeah, I think the tests should not do this, they should just test and
> report you issues.

OK, so to summarise my understanding so far: a failure during the
stress test is likely to be one of two things. Either it is due to a
bad block developing, which is not unexpected and not really a failure
of the test case per se, or it is due to Something Else Bad, which is
not expected and IS a test failure. The only way I can see to
differentiate between these two situations is via statistics, i.e. if
a block is repeatedly failing, it's likely bad; if random blocks are
failing separately, it's probably something else warranting
investigation. Is that correct?

If so, would it be more useful to adapt the (kernel) stress test so that
it doesn't abort the test run on a failure, but instead keeps a tally of
the blocks within which failures have occurred and runs to completion?
Does that sound like a beneficial change? I'm not sure what strategy is
used for discerning whether a block is bad or not, but nandtest.c from
mtd-utils simply marks a block bad if an erase or write fails at all, so
this would hopefully give more useful feedback from the stress test.
Aborting the test on what could be a normal bad block seems a little
misleading, although I'm admittedly an unusually ardent fan of tests
being unambiguous... :-)

Thanks!

~Pev
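David's proposal could look roughly like the following. Instead of aborting on the first error, the test records which eraseblock each failure occurred in and summarises at the end. This is a hypothetical sketch, not code from mtd_stresstest. The block count assumes 2048 eraseblocks of 128 KiB (a 2 Gbit part), and the repeat threshold is arbitrary:

```c
#define MAX_EBS 2048 /* hypothetical: 2 Gbit part with 128 KiB eraseblocks */

struct fail_tally {
    unsigned count[MAX_EBS]; /* failures seen per eraseblock */
    unsigned total;          /* failures recorded overall */
};

/* Record a failure instead of aborting; out-of-range blocks are ignored. */
static void record_failure(struct fail_tally *t, unsigned eb)
{
    if (eb < MAX_EBS) {
        t->count[eb]++;
        t->total++;
    }
}

/* Split failing blocks into repeat offenders (likely bad blocks) and
 * blocks that failed only occasionally (scattered, possibly systemic). */
static void summarise(const struct fail_tally *t, unsigned repeat,
                      unsigned *repeat_blocks, unsigned *scattered_blocks)
{
    *repeat_blocks = 0;
    *scattered_blocks = 0;
    for (unsigned eb = 0; eb < MAX_EBS; eb++) {
        if (t->count[eb] >= repeat)
            (*repeat_blocks)++;
        else if (t->count[eb] > 0)
            (*scattered_blocks)++;
    }
}
```

A run that ends with a few repeat blocks and few scattered ones looks like ordinary wear. Many blocks that each fail once would point more towards the "Something Else Bad" case.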

* Re: Testing a device using mtd_stresstest
From: Artem Bityutskiy @ 2011-02-11 15:12 UTC
To: David Peverley; +Cc: linux-mtd

On Mon, 2011-02-07 at 13:44 +0000, David Peverley wrote:
> I'm not sure what the implication of this is ; I understand that this
> will cause the subpage test to fail with CONFIG_MTD_NAND_VERIFY_WRITE
> enabled. However, the FAQ I had discounted as we use YAFFS2 and not
> UBIFS. Given that should I still disable the write verify? At the
> moment I'm inclined to leave it enabled as it seems to be regularly
> catching failures that should not occur, such as the stress-test
> failures noted.

I do not know YAFFS, ask Charles, but I _think_ YAFFS does not use
sub-pages, so you can have that option enabled. For sure, if you do not
use sub-pages and it catches problems, have it enabled and nail the
problems down.

> We've also noticed that every so often we see "uncorrectable error:"
> messages from nand_ecc.c - do you have any suggestions as to where to
> start investigating here? So far I can't find a pattern to occurrences
> or a regular way to reproduce.

Not really; this can be incorrect timings or bad HW. I cannot give you
good suggestions.

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)