linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: boris.brezillon@free-electrons.com (Boris Brezillon)
To: linux-arm-kernel@lists.infradead.org
Subject: pxa3xx_nand times out in 4.14 with JFFS2
Date: Sun, 17 Dec 2017 13:33:55 +0100	[thread overview]
Message-ID: <20171217133355.618e5f92@bbrezillon> (raw)
In-Reply-To: <20171217120503.GA3323@1wt.eu>

Hi Willy,

On Sun, 17 Dec 2017 13:05:03 +0100
Willy Tarreau <w@1wt.eu> wrote:

> Hello,
> 
> I recently bought a Linksys WRT1900ACS which hosts an Armada 385 and a
> NAND flash. While I could get OpenWRT to work flawlessly on it using
> kernel 4.4, mainline 4.14.6 fails with a lot of such messages :
> 
>     pxa3xx-nand f10d0000.flash: Wait time out!!!
> 
> Looking a bit closer, I found that it was triggered by my boot scripts
> detecting the JFFS2 signature (0x1985) and trying to mount mtdblock5. But
> under openwrt's kernel this partition mounts pretty fine.
> 
> I tried to read both /dev/mtd5 and /dev/mtdblock5 it using cat then dd
> and got no issue either, so it seems that JFFS2 triggers a specific
> operation causing the flash (or driver) to fail. By the way, here's the
> device identification :
> 
> [    0.638155] pxa3xx-nand f10d0000.flash: This platform can't do DMA on this device
> [    0.644661] nand: device found, Manufacturer ID: 0x01, Chip ID: 0xf1
> [    0.649732] nand: AMD/Spansion S34ML01G2
> [    0.652369] nand: 128 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
> [    0.658662] pxa3xx-nand f10d0000.flash: ECC strength 16, ECC step size 2048
> 
> I found a few threads on the net where people were changing the timeout
> in the driver, and OpenWRT has this larger timeout, so naturally I thought
> it was the way to fix it and I tried, but surprizingly nothing changed at
> all.
> 
> I found a long troubleshooting session in the list's archives where
> CONFIG_PREEMPT had to be enabled to work around the problem here :
> 
>    https://patchwork.ozlabs.org/patch/566837/
> 
> So I tried to enable it as well but it didn't change anything for me.
> 
> I enabled Robert's readl/writel debug traces and have added some messages
> at the entrance and exit of every single function in the driver to try to
> spot a bit more of it. I've placed the traces there :
> 
>    http://1wt.eu/wrt1900acs/
> 
> (Please note that the config above doesn't have PREEMPT enabled but it
> matches the kernel used to produce the dumps there). Also you'll notice
> in the config that I added the out-of-tree mwlwifi driver; I did it late
> in the evening after I spent my whole day trying to fix the NAND issue,
> so it's irrelevant. However if some prefer that I re-run traces without
> it I will.
> 
> I purposely split the trace in two steps : one before trying to mount the
> JFFS2 fs, and the second one during and after the mount attempt. I also
> took another trace with JFFS2 debuging enabled in case that helps.
> 
> I noticed that timeouts always occur after such a sequence :
> 
>   > pxa3xx_nand_irq:845  
>   pxa3xx-nand f10d0000.flash: pxa3xx_nand_irq():854 nand_readl(0x0014) = 0x2
>   < pxa3xx_nand_irq:932
>   < pxa3xx_nand_irq:933
>   > pxa3xx_nand_irq_thread:829
>   > handle_data_pio:728
>   > drain_fifo:693  
>   < drain_fifo:723
>   > drain_fifo:693  
>   < drain_fifo:723
>   < handle_data_pio:761
>   pxa3xx-nand f10d0000.flash: pxa3xx_nand_irq_thread():833 nand_writel(0x6, 0x0014)
>   < pxa3xx_nand_irq_thread:835
> 
> So in short we enter pxa3xx_nand_irq, read register 0x0014 and find 2,
> then later we enter pxa3xx_nand_irq_thread(), handle_data_pio(), leave
> it, and issue a nand_writel() to register 0x0014 and leave
> pxa3xx_nand_irq_thread(). And at this point we wait till the timeout
> strikes (apparently the CHIP_DELAY_TIMEOUT since I'm observing 2 seconds
> and that's what it's set to).
> 
> Interestingly, this morning I left the machine hung like this and came
> back 20 minutes later to find that mtdblock5 had finally been successfully
> mounted (1322 seconds later precisely). I don't know if something suddenly
> decided to work, or if an operation is performed 650 times and waits for
> this 2 second timeout for each operation. I've left it running again to
> see if it happens again.
> 
> I have not tested 4.15-rc3 on it yet, though the only change to the
> driver is irrelevant to this issue. I also didn't go back to older
> kernels as I'm not sure when this machine was supposed to start to be
> supported (the DTS came in 4.12 but given that people started to complain
> about the timeout in openwrt's 4.4, it's not clear).
> 
> At this point I'm out of ideas so if anyone wants me to test specific
> config options to report more info, or to test patches, they're welcome!
> There's no emergency for me to get this machine to work properly, it's
> expected to replace my current one, but only once it works, so in the
> mean time it's more of a development platform lying on my desk :-)

You should have a look at this thread [1], and in case you don't want
to read everything, you can just test the solution proposed here [2].

[1]http://linux-mtd.infradead.narkive.com/Rd5UaRPO/bug-pxa3xx-wait-time-out-when-scanning-for-bb
[2]http://patchwork.ozlabs.org/patch/847411/

  reply	other threads:[~2017-12-17 12:33 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-17 12:05 pxa3xx_nand times out in 4.14 with JFFS2 Willy Tarreau
2017-12-17 12:33 ` Boris Brezillon [this message]
2017-12-17 13:17   ` Willy Tarreau
2017-12-17 14:25     ` Ezequiel Garcia
2017-12-17 14:27 ` Ezequiel Garcia
2017-12-17 14:53   ` Boris Brezillon
2017-12-17 15:00     ` Willy Tarreau
2017-12-17 15:09       ` Willy Tarreau
2017-12-17 15:53       ` Ezequiel Garcia
2017-12-17 16:23         ` Willy Tarreau
2017-12-17 18:07           ` Boris Brezillon
2017-12-17 19:00             ` Willy Tarreau
2017-12-17 21:01               ` Ezequiel Garcia
2017-12-17 21:16                 ` Willy Tarreau
2017-12-17 21:26                   ` Boris Brezillon
2017-12-17 21:46                     ` Miquel RAYNAL
2017-12-18  6:37                       ` Willy Tarreau
2017-12-18  7:06                         ` Willy Tarreau
2017-12-18 10:22                           ` Miquel RAYNAL
2017-12-18 21:52                             ` Willy Tarreau
2017-12-19  0:13                               ` Miquel RAYNAL
2017-12-19  5:34                                 ` Willy Tarreau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171217133355.618e5f92@bbrezillon \
    --to=boris.brezillon@free-electrons.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).