All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alex Smith <alex.smith@imgtec.com>
To: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>
Cc: <linux-mtd@lists.infradead.org>,
	Brian Norris <computersforpeace@gmail.com>,
	David Woodhouse <dwmw2@infradead.org>,
	"Zubair Lutfullah Kakakhel" <Zubair.Kakakhel@imgtec.com>,
	<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v4 1/3] mtd: nand: increase ready wait timeout and report timeouts
Date: Mon, 7 Sep 2015 15:38:01 +0100	[thread overview]
Message-ID: <55EDA149.6040002@imgtec.com> (raw)
In-Reply-To: <20150906203726.GA7745@laptop.cereza>

Hi Ezequiel,

Thanks for reviewing the series.

On 06/09/2015 21:37, Ezequiel Garcia wrote:
> On 27 Jul 02:50 PM, Alex Smith wrote:
>> If nand_wait_ready() times out, this is silently ignored, and its
>> caller will then proceed to read from/write to the chip before it is
>> ready. This can potentially result in corruption with no indication as
>> to why.
>>
>> While a 20ms timeout seems like it should be plenty enough, certain
>> behaviour can cause it to timeout much earlier than expected. The
>> situation which prompted this change was that CPU 0, which is
>> responsible for updating jiffies, was holding interrupts disabled
>> for a fairly long time while writing to the console during a printk,
>> causing several jiffies updates to be delayed. If CPU 1 happens to
>> enter the timeout loop in nand_wait_ready() just before CPU 0 re-
>> enables interrupts and updates jiffies, CPU 1 will immediately time
>> out when the delayed jiffies updates are made. The result of this is
>> that nand_wait_ready() actually waits less time than the NAND chip
>> would normally take to be ready, and then read_page() proceeds to
>> read out bad data from the chip.
>>
>> The situation described above may seem unlikely, but in fact it can be
>> reproduced almost every boot on the MIPS Creator Ci20.
>>
> 
> Not only unlikely but scary :) BTW, can't find SMP patches for Ci20,
> are you sure this behavior will apply once SMP is upstreamed?

Certainly made for fun debugging ;)

SMP support only exists in our 3.18 branch [1] at the moment, which was where this problem was encountered. Support should be upstreamed at some point, and I would guess that this behaviour could still happen then (even though it's a really obscure edge case that we were somehow managing to almost always hit on boot).

[1] https://github.com/MIPS/CI20_linux

> 
>> Debugging this was made more difficult by the misleading comment above
>> nand_wait_ready() stating "The timeout is caught later" - no timeout
>> was ever reported, leading me away from the real source of the problem.
>>
>> Therefore, this patch increases the timeout to 200ms. This should be
>> enough to cover cases where jiffies updates get delayed. Additionally,
>> add a pr_warn() when a timeout does occur so that it is easier to
>> pinpoint any problems in future caused by the chip not becoming ready.
>>
>> Signed-off-by: Alex Smith <alex.smith@imgtec.com>
>> Cc: Zubair Lutfullah Kakakhel <Zubair.Kakakhel@imgtec.com>
>> Cc: David Woodhouse <dwmw2@infradead.org>
>> Cc: Brian Norris <computersforpeace@gmail.com>
>> Cc: linux-mtd@lists.infradead.org
>> Cc: linux-kernel@vger.kernel.org
>> ---
>> v3 -> v4:
>>  - New patch to fix issue encountered in external Ci20 3.18 kernel
>>    branch which also applies upstream.
>> ---
>>  drivers/mtd/nand/nand_base.c | 15 ++++++++++++---
>>  1 file changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
>> index ceb68ca8277a..a0dab3414f16 100644
>> --- a/drivers/mtd/nand/nand_base.c
>> +++ b/drivers/mtd/nand/nand_base.c
>> @@ -543,23 +543,32 @@ static void panic_nand_wait_ready(struct mtd_info *mtd, unsigned long timeo)
>>  	}
>>  }
>>  
>> -/* Wait for the ready pin, after a command. The timeout is caught later. */
>> +/**
>> + * nand_wait_ready - [GENERIC] Wait for the ready pin after commands.
>> + * @mtd: MTD device structure
>> + *
>> + * Wait for the ready pin after a command, and warn if a timeout occurs.
>> + */
>>  void nand_wait_ready(struct mtd_info *mtd)
>>  {
>>  	struct nand_chip *chip = mtd->priv;
>> -	unsigned long timeo = jiffies + msecs_to_jiffies(20);
>> +	unsigned long timeo = jiffies + msecs_to_jiffies(200);
>>  
>>  	/* 400ms timeout */
>>  	if (in_interrupt() || oops_in_progress)
>>  		return panic_nand_wait_ready(mtd, 400);
>>  
>>  	led_trigger_event(nand_led_trigger, LED_FULL);
>> +
> 
> Spurious change here.

Removed.

> 
>>  	/* Wait until command is processed or timeout occurs */
>>  	do {
>>  		if (chip->dev_ready(mtd))
>> -			break;
>> +			goto out;
>>  		touch_softlockup_watchdog();
>>  	} while (time_before(jiffies, timeo));
>> +
>> +	pr_warn("timeout while waiting for chip to become ready\n");
>> +out:
>>  	led_trigger_event(nand_led_trigger, LED_OFF);
>>  }
> 
> This change looks reasonable, a timeout value should be large enough
> to be confident the operation has _really_ timed out. On non-error
> path, this change shouldn't make any difference.
> 
> And the warning is probably helpful too, so:
> 
> Reviewed-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>

Great, thanks.

Alex

  reply	other threads:[~2015-09-07 14:38 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-07-27 13:50 [PATCH v4 0/3] mtd: nand: jz4780: Add NAND and BCH drivers Alex Smith
2015-07-27 13:50 ` [PATCH v4 1/3] mtd: nand: increase ready wait timeout and report timeouts Alex Smith
2015-09-06 20:37   ` Ezequiel Garcia
2015-09-07 14:38     ` Alex Smith [this message]
2015-07-27 13:50 ` [PATCH v4 2/3] dt-bindings: binding for jz4780-{nand,bch} Alex Smith
2015-07-27 13:50   ` Alex Smith
2015-09-07 17:16   ` Ezequiel Garcia
2015-07-27 14:21 ` [PATCH v4 3/3] mtd: nand: jz4780: driver for NAND devices on JZ4780 SoCs Alex Smith
2015-09-06 21:21   ` Ezequiel Garcia
2015-09-07 14:52     ` Alex Smith
2015-08-18 12:39 ` [PATCH v4 0/3] mtd: nand: jz4780: Add NAND and BCH drivers Alex Smith
2015-08-26 14:21   ` Alex Smith
2015-09-06 20:38 ` Ezequiel Garcia
2015-09-07 14:54   ` Alex Smith
2015-09-07 16:00     ` Ezequiel Garcia

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55EDA149.6040002@imgtec.com \
    --to=alex.smith@imgtec.com \
    --cc=Zubair.Kakakhel@imgtec.com \
    --cc=computersforpeace@gmail.com \
    --cc=dwmw2@infradead.org \
    --cc=ezequiel@vanguardiasur.com.ar \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mtd@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.