From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752532AbcBYXOh (ORCPT ); Thu, 25 Feb 2016 18:14:37 -0500 Received: from mail-pa0-f51.google.com ([209.85.220.51]:32925 "EHLO mail-pa0-f51.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752141AbcBYXOf (ORCPT ); Thu, 25 Feb 2016 18:14:35 -0500 Date: Thu, 25 Feb 2016 15:14:32 -0800 From: Brian Norris To: Richard Weinberger Cc: Harvey Hunt , IMG-MIPSLinuxKerneldevelopers@imgtec.com, Alex Smith , Alex Smith , Zubair Lutfullah Kakakhel , David Woodhouse , Niklas Cassel , "linux-mtd@lists.infradead.org" , LKML , Boris Brezillon Subject: Re: [PATCH v7] mtd: nand: increase ready wait timeout and report timeouts Message-ID: <20160225231432.GN21465@google.com> References: <1444139527-14087-1-git-send-email-harvey.hunt@imgtec.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Feb 25, 2016 at 11:54:25PM +0100, Richard Weinberger wrote: > On Tue, Oct 6, 2015 at 3:52 PM, Harvey Hunt wrote: > > From: Alex Smith [...] > > --- a/drivers/mtd/nand/nand_base.c > > +++ b/drivers/mtd/nand/nand_base.c > > @@ -543,23 +543,32 @@ static void panic_nand_wait_ready(struct mtd_info *mtd, unsigned long timeo) > > } > > } > > > > -/* Wait for the ready pin, after a command. The timeout is caught later. */ > > +/** > > + * nand_wait_ready - [GENERIC] Wait for the ready pin after commands. > > + * @mtd: MTD device structure > > + * > > + * Wait for the ready pin after a command, and warn if a timeout occurs. > > + */ > > void nand_wait_ready(struct mtd_info *mtd) > > { > > struct nand_chip *chip = mtd->priv; > > - unsigned long timeo = jiffies + msecs_to_jiffies(20); > > + unsigned long timeo = 400; > > > > - /* 400ms timeout */ > > if (in_interrupt() || oops_in_progress) > > - return panic_nand_wait_ready(mtd, 400); > > + return panic_nand_wait_ready(mtd, timeo); > > > > led_trigger_event(nand_led_trigger, LED_FULL); > > /* Wait until command is processed or timeout occurs */ > > + timeo = jiffies + msecs_to_jiffies(timeo); > > do { > > if (chip->dev_ready(mtd)) > > - break; > > - touch_softlockup_watchdog(); > > + goto out; > > + cond_resched(); > > } while (time_before(jiffies, timeo)); > > + > > + pr_warn_ratelimited( > > + "timeout while waiting for chip to become ready\n"); > > +out: > > Sorry for exhuming an already merged patch but Boris and I ran into > spurious chip timeouts > and hunted the issue down to this change. > If the system is under heavy load the cond_resched() will swap in > other threads and the > time_before() calculation will trigger and a wrong chip timeout is reported. > > It is also not clear to us why the cond_resched() is needed at all. > Can you please elaborate? I can't speak for the "why" precisely. It seemed reasonable to avoid a (potentially) 400 ms busy loop though, in the presence of other potential work. Regardless, this timeout loop is wrong. Shouldn't it have something like the following? diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c index f2c8ff398d6c..596a9b0503da 100644 --- a/drivers/mtd/nand/nand_base.c +++ b/drivers/mtd/nand/nand_base.c @@ -566,8 +566,8 @@ void nand_wait_ready(struct mtd_info *mtd) cond_resched(); } while (time_before(jiffies, timeo)); - pr_warn_ratelimited( - "timeout while waiting for chip to become ready\n"); + if (!chip->dev_ready(mtd)) + pr_warn_ratelimited("timeout while waiting for chip to become ready\n"); out: led_trigger_event(nand_led_trigger, LED_OFF); }