* Timeout in denali.c on Micron nandflash (Altera SoC)
From: Thorsten Christiansson @ 2017-03-07 13:32 UTC
To: linux-mtd; +Cc: laurent.monat@idquantique.com

Hi all,

I'm using Linux on an Altera SoC (Arria V), on which I'm using UBIFS on a
nandflash from Micron (MT29F8G08ADADAH4). I have a 400Mb r/w partition on
which I have a sqlite3-based database. We're running an application that
reads/writes fairly small blocks. After running for about a week at moderate
load, I get an error message, and the filesystem becomes read-only.

The message I get is a timeout, originating in the denali.c driver.
[11744.733748] timeout occurred, status = 0x0, mask = 0x4
[11745.733685] timeout occurred, status = 0x0, mask = 0x120

I can also reproduce the error much faster (in ~1h) using the GNU 'stress'
command, writing/reading small files continuously.

I'm using Linux 4.4, with some patches from Altera. I have compared the
denali.c that I'm using with the current HEAD on github, and the differences
appear to be only cosmetic.

I have asked Altera for help, but their only response so far has been that
they can reproduce the issue on their latest SoCs (it apparently appears on
both Arria10 and CycloneV) with the same flash. (We have also tested with a
Macronix MX66U51235FMI-10G, with the same results.)

At first we used the FASTMAP feature of the UBIFS, but then we ran into this
issue after only a couple of hours running at moderate load. When we disabled
that, we thought the problem was gone, but it appears that it was only hiding,
and now comes out to bite us after about a week.

My questions are the following:
- Are there any known issues with the denali driver that could cause this?
- Could it be an issue in the MTD/UBI/UBIFS layers?
- Are there any other parameters that can be tuned in order to alleviate the
  problem?

and of course
- Have I missed something obvious? I'm pulling my hair here...

best regards,
--
Thorsten Christiansson
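[Editorial sketch] The two timeout lines in the message above can be read
mechanically. The Python below is only an illustration: it assumes, without
confirmation from this thread, that "mask" is the set of interrupt bits the
driver was waiting for and "status" is the set of bits it had actually seen
when the wait expired. The names parse_timeouts and awaited_bits_seen are
made up for this example and are not part of any kernel or MTD tool.

```python
import re

# Matches the driver message exactly as it is quoted in this thread.
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+timeout occurred, "
    r"status = 0x(?P<status>[0-9a-fA-F]+), mask = 0x(?P<mask>[0-9a-fA-F]+)"
)


def parse_timeouts(log_text):
    """Return a list of (timestamp, status, mask) tuples found in log_text."""
    return [
        (float(m.group("ts")), int(m.group("status"), 16), int(m.group("mask"), 16))
        for m in TIMEOUT_RE.finditer(log_text)
    ]


def awaited_bits_seen(status, mask):
    """Bits that were both awaited and observed.

    Assumption (not stated in the thread): mask = awaited bits,
    status = observed bits. A result of 0 would then mean that none of
    the awaited events had been reported when the wait gave up.
    """
    return status & mask


# The two lines from the original report.
LOG = """\
[11744.733748] timeout occurred, status = 0x0, mask = 0x4
[11745.733685] timeout occurred, status = 0x0, mask = 0x120
"""

for ts, status, mask in parse_timeouts(LOG):
    overlap = awaited_bits_seen(status, mask)
    print(f"{ts:.6f}: status={status:#x} mask={mask:#x} overlap={overlap:#x}")
```

Under that assumption, both reported lines have no overlap between status and
mask, which is what a wait that never saw its event would look like. What the
individual bits mean depends on the controller's register layout, which the
thread does not give.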
* Re: Timeout in denali.c on Micron nandflash (Altera SoC)
From: Richard Weinberger @ 2017-03-09 22:09 UTC
To: Thorsten Christiansson
Cc: linux-mtd@lists.infradead.org, laurent.monat@idquantique.com,
    David Oberhollenzer

Thorsten,

On Tue, Mar 7, 2017 at 2:32 PM, Thorsten Christiansson
<thorsten.christiansson@idquantique.com> wrote:
> Hi all,
>
> I'm using Linux on an Altera SoC (Arria V), on which I'm using UBIFS on a
> nandflash from Micron (MT29F8G08ADADAH4). I have a 400Mb r/w partition on
> which I have a sqlite3-based database. We're running an application that
> reads/writes fairly small blocks. After running for about a week at moderate
> load, I get an error message, and the filesystem becomes read-only.
>
> The message I get is a timeout, originating in the denali.c driver.
> [11744.733748] timeout occurred, status = 0x0, mask = 0x4
> [11745.733685] timeout occurred, status = 0x0, mask = 0x120
>
> I can also reproduce the error much faster (in ~1h) using the GNU 'stress'
> command, writing/reading small files continuously.
>
> I'm using Linux 4.4, with some patches from Altera. I have compared the
> denali.c that I'm using with the current HEAD on github, and the differences
> appear to be only cosmetic.
>
> I have asked Altera for help, but their only response so far has been that
> they can reproduce the issue on their latest SoCs (it apparently appears on
> both Arria10 and CycloneV) with the same flash. (We have also tested with a
> Macronix MX66U51235FMI-10G, with the same results.)
>
> At first we used the FASTMAP feature of the UBIFS, but then we ran into this
> issue after only a couple of hours running at moderate load. When we disabled
> that, we thought the problem was gone, but it appears that it was only hiding,
> and now comes out to bite us after about a week.
>
>
> My questions are the following:
> - Are there any known issues with the denali driver that could cause this?

Well, 4.4 is not very fresh. Maybe it saw fixes in recent versions.

> - Could it be an issue in the MTD/UBI/UBIFS layers?

Since the denali driver prints the timeouts, I'd say the root of the
problem is there.

> - Are there any other parameters that can be tuned in order to alleviate the
> problem?
>
> and of course
> - Have I missed something obvious? I'm pulling my hair here...

Can you please give MTD tests a try?

--
Thanks,
//richard
* Re: Timeout in denali.c on Micron nandflash (Altera SoC)
From: Thorsten Christiansson @ 2017-03-14 11:12 UTC
To: Richard Weinberger
Cc: linux-mtd@lists.infradead.org, laurent.monat@idquantique.com,
    David Oberhollenzer

Hello Richard,

On advice from Altera, I have been testing using JFFS2 directly on top of
MTD, thereby bypassing UBI completely. At first it looked promising, but I
have seen the same 'timeout' here as well.

I will get back to running some tests with an up-to-date kernel, and will
keep you posted.

Thanks,
Thorsten

On 03/09/2017 11:09 PM, Richard Weinberger wrote:
> Thorsten,
>
> On Tue, Mar 7, 2017 at 2:32 PM, Thorsten Christiansson
> <thorsten.christiansson@idquantique.com> wrote:
>> Hi all,
>>
>> I'm using Linux on an Altera SoC (Arria V), on which I'm using UBIFS on a
>> nandflash from Micron (MT29F8G08ADADAH4). I have a 400Mb r/w partition on
>> which I have a sqlite3-based database. We're running an application that
>> reads/writes fairly small blocks. After running for about a week at moderate
>> load, I get an error message, and the filesystem becomes read-only.
>>
>> The message I get is a timeout, originating in the denali.c driver.
>> [11744.733748] timeout occurred, status = 0x0, mask = 0x4
>> [11745.733685] timeout occurred, status = 0x0, mask = 0x120
>>
>> I can also reproduce the error much faster (in ~1h) using the GNU 'stress'
>> command, writing/reading small files continuously.
>>
>> I'm using Linux 4.4, with some patches from Altera. I have compared the
>> denali.c that I'm using with the current HEAD on github, and the differences
>> appear to be only cosmetic.
>>
>> I have asked Altera for help, but their only response so far has been that
>> they can reproduce the issue on their latest SoCs (it apparently appears on
>> both Arria10 and CycloneV) with the same flash. (We have also tested with a
>> Macronix MX66U51235FMI-10G, with the same results.)
>>
>> At first we used the FASTMAP feature of the UBIFS, but then we ran into this
>> issue after only a couple of hours running at moderate load. When we disabled
>> that, we thought the problem was gone, but it appears that it was only hiding,
>> and now comes out to bite us after about a week.
>>
>>
>> My questions are the following:
>> - Are there any known issues with the denali driver that could cause this?
> Well, 4.4 is not very fresh. Maybe it saw fixes in recent versions.
>
>> - Could it be an issue in the MTD/UBI/UBIFS layers?
> Since the denali driver prints the timeouts, I'd say the root of the
> problem is there.
>
>> - Are there any other parameters that can be tuned in order to alleviate the
>> problem?
>>
>> and of course
>> - Have I missed something obvious? I'm pulling my hair here...
> Can you please give MTD tests a try?
>

--
Thorsten Christiansson
Security Engineer
ID Quantique
thorsten.christiansson@idquantique.com
Tel: +41 22 301 8373
Fax: +41 22 301 8379
https://www.idquantique.com
https://twitter.com/IDQuantique
https://www.linkedin.com/company/id-quantique-sa
* Re: Timeout in denali.c on Micron nandflash (Altera SoC)
From: Thorsten Christiansson @ 2017-03-15 12:06 UTC
To: Richard Weinberger
Cc: linux-mtd@lists.infradead.org, laurent.monat@idquantique.com

Hi all,

Follow-up:

> > I'm using Linux on an Altera SoC (Arria V), on which I'm using UBIFS on a
> > nandflash from Micron (MT29F8G08ADADAH4). I have a 400Mb r/w partition on
> > which I have a sqlite3-based database. We're running an application that
> > reads/writes fairly small blocks. After running for about a week at moderate
> > load, I get an error message, and the filesystem becomes read-only.
> >
> > The message I get is a timeout, originating in the denali.c driver.
> > [11744.733748] timeout occurred, status = 0x0, mask = 0x4
> > [11745.733685] timeout occurred, status = 0x0, mask = 0x120

> Well, 4.4 is not very fresh. Maybe it saw fixes in recent versions.
> Can you please give MTD tests a try?

I have now got a clean 4.10 up and running, and the MTD tests show the
same error even quicker:

# insmod /lib/modules/4.10.0/kernel/drivers/mtd/tests/mtd_stresstest.ko dev=1
[ 488.721072]
[ 488.722575] =================================================
[ 488.728337] mtd_stresstest: MTD device: 1
[ 488.732342] mtd_stresstest: MTD device size 536870912, eraseblock size 131072, page size 2048, count of eraseblocks 4096, pages per eraseblock 64, OOB size 64
[ 488.748845] mtd_test: scanning for bad eraseblocks
[ 488.756947] mtd_test: scanned 4096 eraseblocks, 0 are bad
[ 488.762322] mtd_stresstest: doing operations
[ 488.766601] mtd_stresstest: 0 operations done
[ 490.243583] timeout occurred, status = 0x4, mask = 0x3
[ 492.003933] timeout occurred, status = 0x4, mask = 0x3
[ 493.363590] timeout occurred, status = 0x4, mask = 0x3
[ 494.483584] timeout occurred, status = 0x4, mask = 0x3
[ 495.603585] timeout occurred, status = 0x4, mask = 0x3
[ 496.723582] timeout occurred, status = 0x4, mask = 0x3
[ 498.083600] timeout occurred, status = 0x4, mask = 0x3
[ 499.203582] timeout occurred, status = 0x4, mask = 0x3
[ 500.323921] timeout occurred, status = 0x4, mask = 0x3
[ 501.523590] timeout occurred, status = 0x4, mask = 0x3
[ 502.723584] timeout occurred, status = 0x4, mask = 0x3
[ 503.843583] timeout occurred, status = 0x4, mask = 0x3
[ 505.203584] timeout occurred, status = 0x4, mask = 0x3
^C[ 506.883588] timeout occurred, status = 0x4, mask = 0x3
[ 506.913867] mtd_stresstest: aborting test due to pending signal!
[ 506.919940] mtd_stresstest: error -4 occurred
[ 506.924320] =================================================

Any ideas on how to go forwards from here are very welcome.

regards,

Thorsten Christiansson
Security Engineer
ID Quantique
thorsten.christiansson@idquantique.com
Tel: +41 22 301 8373
Fax: +41 22 301 8379
https://www.linkedin.com/company/id-quantique-sa
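[Editorial sketch] The numbers in the mtd_stresstest output above can be
cross-checked with a little arithmetic. The Python below is only an
illustration: the geometry values and timestamps are copied from that log,
while the variable names are made up for this example. It checks that the
reported eraseblock and page counts follow from the reported sizes, and
computes the gap between consecutive timeout messages.

```python
# Geometry as printed in the mtd_stresstest banner.
device_size = 536870912      # bytes
eraseblock_size = 131072     # bytes
page_size = 2048             # bytes

# Derived values, to compare with what the banner reports.
eraseblocks = device_size // eraseblock_size
pages_per_eraseblock = eraseblock_size // page_size
size_mib = device_size // (1024 * 1024)

print(f"eraseblocks          = {eraseblocks}")
print(f"pages per eraseblock = {pages_per_eraseblock}")
print(f"device size          = {size_mib} MiB")

# Kernel timestamps (seconds) of the 14 "timeout occurred" lines.
timestamps = [
    490.243583, 492.003933, 493.363590, 494.483584, 495.603585,
    496.723582, 498.083600, 499.203582, 500.323921, 501.523590,
    502.723584, 503.843583, 505.203584, 506.883588,
]

# Gap between each timeout message and the next one.
intervals = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]

print(f"timeouts logged      = {len(timestamps)}")
print(f"shortest gap         = {min(intervals):.3f} s")
print(f"longest gap          = {max(intervals):.3f} s")
```

The derived counts agree with the banner (4096 eraseblocks, 64 pages each).
Every gap between messages is somewhat longer than one second, which would be
consistent with each failing operation waiting about a second before giving
up. The thread itself does not state the driver's timeout value, so treat
that reading as a guess.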
* Re: Timeout in denali.c on Micron nandflash (Altera SoC)
From: Boris Brezillon @ 2017-03-15 12:53 UTC
To: Thorsten Christiansson, Masahiro Yamada
Cc: Richard Weinberger, laurent.monat@idquantique.com,
    linux-mtd@lists.infradead.org

+Masahiro

On Wed, 15 Mar 2017 13:06:46 +0100
Thorsten Christiansson <thorsten.christiansson@idquantique.com> wrote:

> Hi all,
>
> Follow-up:
>
> > > I'm using Linux on an Altera SoC (Arria V), on which I'm using UBIFS on a
> > > nandflash from Micron (MT29F8G08ADADAH4). I have a 400Mb r/w partition on
> > > which I have a sqlite3-based database. We're running an application that
> > > reads/writes fairly small blocks. After running for about a week at moderate
> > > load, I get an error message, and the filesystem becomes read-only.
> > >
> > > The message I get is a timeout, originating in the denali.c driver.
> > > [11744.733748] timeout occurred, status = 0x0, mask = 0x4
> > > [11745.733685] timeout occurred, status = 0x0, mask = 0x120
>
> > Well, 4.4 is not very fresh. Maybe it saw fixes in recent versions.
> > Can you please give MTD tests a try?
>
> I have now got a clean 4.10 up and running, and the MTD tests show the
> same error even quicker:
> # insmod /lib/modules/4.10.0/kernel/drivers/mtd/tests/mtd_stresstest.ko dev=1
> [ 488.721072]
> [ 488.722575] =================================================
> [ 488.728337] mtd_stresstest: MTD device: 1
> [ 488.732342] mtd_stresstest: MTD device size 536870912, eraseblock size 131072, page size 2048, count of eraseblocks 4096, pages per eraseblock 64, OOB size 64
> [ 488.748845] mtd_test: scanning for bad eraseblocks
> [ 488.756947] mtd_test: scanned 4096 eraseblocks, 0 are bad
> [ 488.762322] mtd_stresstest: doing operations
> [ 488.766601] mtd_stresstest: 0 operations done
> [ 490.243583] timeout occurred, status = 0x4, mask = 0x3
> [ 492.003933] timeout occurred, status = 0x4, mask = 0x3
> [ 493.363590] timeout occurred, status = 0x4, mask = 0x3
> [ 494.483584] timeout occurred, status = 0x4, mask = 0x3
> [ 495.603585] timeout occurred, status = 0x4, mask = 0x3
> [ 496.723582] timeout occurred, status = 0x4, mask = 0x3
> [ 498.083600] timeout occurred, status = 0x4, mask = 0x3
> [ 499.203582] timeout occurred, status = 0x4, mask = 0x3
> [ 500.323921] timeout occurred, status = 0x4, mask = 0x3
> [ 501.523590] timeout occurred, status = 0x4, mask = 0x3
> [ 502.723584] timeout occurred, status = 0x4, mask = 0x3
> [ 503.843583] timeout occurred, status = 0x4, mask = 0x3
> [ 505.203584] timeout occurred, status = 0x4, mask = 0x3
> ^C[ 506.883588] timeout occurred, status = 0x4, mask = 0x3
> [ 506.913867] mtd_stresstest: aborting test due to pending signal!
> [ 506.919940] mtd_stresstest: error -4 occurred
> [ 506.924320] =================================================
>
> Any ideas on how to go forwards from here are very welcome.

Masahiro is currently reworking the driver; maybe he'll have some ideas.

Regards,

Boris