From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752262AbbIPMBC (ORCPT ); Wed, 16 Sep 2015 08:01:02 -0400 Received: from bes.se.axis.com ([195.60.68.10]:56001 "EHLO bes.se.axis.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750989AbbIPMBA (ORCPT ); Wed, 16 Sep 2015 08:01:00 -0400 Message-ID: <55F959F9.7040102@axis.com> Date: Wed, 16 Sep 2015 14:00:57 +0200 From: Niklas Cassel User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Icedove/31.7.0 MIME-Version: 1.0 To: Alex Smith , "linux-mtd@lists.infradead.org" CC: Alex Smith , Zubair Lutfullah Kakakhel , David Woodhouse , Brian Norris , "linux-kernel@vger.kernel.org" Subject: Re: [PATCH v6 1/4] mtd: nand: increase ready wait timeout and report timeouts References: <1442403417-5288-1-git-send-email-alex@alex-smith.me.uk> <1442403417-5288-2-git-send-email-alex@alex-smith.me.uk> In-Reply-To: <1442403417-5288-2-git-send-email-alex@alex-smith.me.uk> Content-Type: text/plain; charset="windows-1252" Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 09/16/2015 01:36 PM, Alex Smith wrote: > From: Alex Smith > > If nand_wait_ready() times out, this is silently ignored, and its > caller will then proceed to read from/write to the chip before it is > ready. This can potentially result in corruption with no indication as > to why. > > While a 20ms timeout seems like it should be plenty enough, certain > behaviour can cause it to timeout much earlier than expected. The > situation which prompted this change was that CPU 0, which is > responsible for updating jiffies, was holding interrupts disabled > for a fairly long time while writing to the console during a printk, > causing several jiffies updates to be delayed. If CPU 1 happens to > enter the timeout loop in nand_wait_ready() just before CPU 0 re- > enables interrupts and updates jiffies, CPU 1 will immediately time > out when the delayed jiffies updates are made. The result of this is > that nand_wait_ready() actually waits less time than the NAND chip > would normally take to be ready, and then read_page() proceeds to > read out bad data from the chip. > > The situation described above may seem unlikely, but in fact it can be > reproduced almost every boot on the MIPS Creator Ci20. > > Therefore, this patch increases the timeout to 400ms. This should be > enough to cover cases where jiffies updates get delayed. In nand_wait() > the timeout was previously chosen based on whether erasing or > programming. This is changed to be 400ms unconditionally as well to > avoid similar problems there. nand_wait() is also slightly refactored > to be consistent with nand_wait{,_status}_ready(). These changes should > have no effect during normal operation. > > Debugging this was made more difficult by the misleading comment above > nand_wait_ready() stating "The timeout is caught later" - no timeout was > ever reported, leading me away from the real source of the problem. > Therefore, a pr_warn() is added when a timeout does occur so that it is > easier to pinpoint similar problems in future. > > Signed-off-by: Alex Smith > Cc: Zubair Lutfullah Kakakhel > Cc: David Woodhouse > Cc: Brian Norris > Cc: Niklas Cassel > Cc: linux-mtd@lists.infradead.org > Cc: linux-kernel@vger.kernel.org Reviewed-by: Niklas Cassel