linux-mtd.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: Alex Smith <alex.smith@imgtec.com>
To: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>
Cc: <linux-mtd@lists.infradead.org>,
	Brian Norris <computersforpeace@gmail.com>,
	David Woodhouse <dwmw2@infradead.org>,
	"Zubair Lutfullah Kakakhel" <Zubair.Kakakhel@imgtec.com>,
	<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v4 1/3] mtd: nand: increase ready wait timeout and report timeouts
Date: Mon, 7 Sep 2015 15:38:01 +0100	[thread overview]
Message-ID: <55EDA149.6040002@imgtec.com> (raw)
In-Reply-To: <20150906203726.GA7745@laptop.cereza>

Hi Ezequiel,

Thanks for reviewing the series.

On 06/09/2015 21:37, Ezequiel Garcia wrote:
> On 27 Jul 02:50 PM, Alex Smith wrote:
>> If nand_wait_ready() times out, this is silently ignored, and its
>> caller will then proceed to read from/write to the chip before it is
>> ready. This can potentially result in corruption with no indication as
>> to why.
>>
>> While a 20ms timeout seems like it should be plenty enough, certain
>> behaviour can cause it to timeout much earlier than expected. The
>> situation which prompted this change was that CPU 0, which is
>> responsible for updating jiffies, was holding interrupts disabled
>> for a fairly long time while writing to the console during a printk,
>> causing several jiffies updates to be delayed. If CPU 1 happens to
>> enter the timeout loop in nand_wait_ready() just before CPU 0 re-
>> enables interrupts and updates jiffies, CPU 1 will immediately time
>> out when the delayed jiffies updates are made. The result of this is
>> that nand_wait_ready() actually waits less time than the NAND chip
>> would normally take to be ready, and then read_page() proceeds to
>> read out bad data from the chip.
>>
>> The situation described above may seem unlikely, but in fact it can be
>> reproduced almost every boot on the MIPS Creator Ci20.
>>
> 
> Not only unlikely but scary :) BTW, can't find SMP patches for Ci20,
> are you sure this behavior will apply once SMP is upstreamed?

Certainly made for fun debugging ;)

SMP support only exists in our 3.18 branch [1] at the moment, which was where this problem was encountered. Support should be upstreamed at some point, and I would guess that this behaviour could still happen then (even though it's a really obscure edge case that we were somehow managing to almost always hit on boot).

[1] https://github.com/MIPS/CI20_linux

> 
>> Debugging this was made more difficult by the misleading comment above
>> nand_wait_ready() stating "The timeout is caught later" - no timeout
>> was ever reported, leading me away from the real source of the problem.
>>
>> Therefore, this patch increases the timeout to 200ms. This should be
>> enough to cover cases where jiffies updates get delayed. Additionally,
>> add a pr_warn() when a timeout does occur so that it is easier to
>> pinpoint any problems in future caused by the chip not becoming ready.
>>
>> Signed-off-by: Alex Smith <alex.smith@imgtec.com>
>> Cc: Zubair Lutfullah Kakakhel <Zubair.Kakakhel@imgtec.com>
>> Cc: David Woodhouse <dwmw2@infradead.org>
>> Cc: Brian Norris <computersforpeace@gmail.com>
>> Cc: linux-mtd@lists.infradead.org
>> Cc: linux-kernel@vger.kernel.org
>> ---
>> v3 -> v4:
>>  - New patch to fix issue encountered in external Ci20 3.18 kernel
>>    branch which also applies upstream.
>> ---
>>  drivers/mtd/nand/nand_base.c | 15 ++++++++++++---
>>  1 file changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
>> index ceb68ca8277a..a0dab3414f16 100644
>> --- a/drivers/mtd/nand/nand_base.c
>> +++ b/drivers/mtd/nand/nand_base.c
>> @@ -543,23 +543,32 @@ static void panic_nand_wait_ready(struct mtd_info *mtd, unsigned long timeo)
>>  	}
>>  }
>>  
>> -/* Wait for the ready pin, after a command. The timeout is caught later. */
>> +/**
>> + * nand_wait_ready - [GENERIC] Wait for the ready pin after commands.
>> + * @mtd: MTD device structure
>> + *
>> + * Wait for the ready pin after a command, and warn if a timeout occurs.
>> + */
>>  void nand_wait_ready(struct mtd_info *mtd)
>>  {
>>  	struct nand_chip *chip = mtd->priv;
>> -	unsigned long timeo = jiffies + msecs_to_jiffies(20);
>> +	unsigned long timeo = jiffies + msecs_to_jiffies(200);
>>  
>>  	/* 400ms timeout */
>>  	if (in_interrupt() || oops_in_progress)
>>  		return panic_nand_wait_ready(mtd, 400);
>>  
>>  	led_trigger_event(nand_led_trigger, LED_FULL);
>> +
> 
> Spurious change here.

Removed.

> 
>>  	/* Wait until command is processed or timeout occurs */
>>  	do {
>>  		if (chip->dev_ready(mtd))
>> -			break;
>> +			goto out;
>>  		touch_softlockup_watchdog();
>>  	} while (time_before(jiffies, timeo));
>> +
>> +	pr_warn("timeout while waiting for chip to become ready\n");
>> +out:
>>  	led_trigger_event(nand_led_trigger, LED_OFF);
>>  }
> 
> This change looks reasonable, a timeout value should be large enough
> to be confident the operation has _really_ timed out. On non-error
> path, this change shouldn't make any difference.
> 
> And the warning is probably helpful too, so:
> 
> Reviewed-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>

Great, thanks.

Alex

  reply	other threads:[~2015-09-07 14:38 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-07-27 13:50 [PATCH v4 0/3] mtd: nand: jz4780: Add NAND and BCH drivers Alex Smith
2015-07-27 13:50 ` [PATCH v4 1/3] mtd: nand: increase ready wait timeout and report timeouts Alex Smith
2015-09-06 20:37   ` Ezequiel Garcia
2015-09-07 14:38     ` Alex Smith [this message]
2015-07-27 13:50 ` [PATCH v4 2/3] dt-bindings: binding for jz4780-{nand,bch} Alex Smith
2015-09-07 17:16   ` Ezequiel Garcia
2015-07-27 14:21 ` [PATCH v4 3/3] mtd: nand: jz4780: driver for NAND devices on JZ4780 SoCs Alex Smith
2015-09-06 21:21   ` Ezequiel Garcia
2015-09-07 14:52     ` Alex Smith
2015-08-18 12:39 ` [PATCH v4 0/3] mtd: nand: jz4780: Add NAND and BCH drivers Alex Smith
2015-08-26 14:21   ` Alex Smith
2015-09-06 20:38 ` Ezequiel Garcia
2015-09-07 14:54   ` Alex Smith
2015-09-07 16:00     ` Ezequiel Garcia

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55EDA149.6040002@imgtec.com \
    --to=alex.smith@imgtec.com \
    --cc=Zubair.Kakakhel@imgtec.com \
    --cc=computersforpeace@gmail.com \
    --cc=dwmw2@infradead.org \
    --cc=ezequiel@vanguardiasur.com.ar \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mtd@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).