From: Johan Gunnarsson
Subject: [PATCH 0/2] use hrtimer in nand_wait
Date: Mon, 21 May 2012 10:42:36 +0200
Message-ID: <1337589758-8775-1-git-send-email-johan.gunnarsson@axis.com>
Cc: jespern@axis.com

Hello all,

I've been researching a bug where blocks go bad when NAND writes are combined with long periods of disabled interrupts, such as lots of serial port writes (think printk) in interrupt context.

I've narrowed it down to the nand_wait routine and its dependency on a reliable jiffies counter. Sadly, jiffies is not reliable when the handling of timer interrupts is delayed or even completely discarded. If interrupts are disabled for, say, 3 timer periods, jiffies stops counting during that time and then jumps by 3 in rapid succession once interrupts are enabled again.
This, combined with unfortunate timing (for example, a wait that starts just before the missed ticks are caught up, so its deadline is computed from a stale jiffies value), can make the timeout loop believe that a 20ms timeout has expired when less than 0.1ms of wall-clock time has passed.

To illustrate the jiffies/interrupt relationship:

  Interrupts: | | | | | | |
  Jiffies:    | | | ||| | | |

This obviously only happens on multi-core CPUs, where the NAND write and the interrupt handling execute simultaneously on different cores.

Switching to an hrtimer-based timeout solves the problem for me.

I also found a second, less serious issue, which is addressed in the first patch.

Johan

Johan Gunnarsson (2):
  mtd: nand: panic_nand_wait expects timeout in ms.
  mtd: nand: use hrtimer to measure timeout in nand_wait{_ready,}

 drivers/mtd/nand/nand_base.c |   42 ++++++++++++++++++++++++++++++++++--------
 1 files changed, 34 insertions(+), 8 deletions(-)

-- 
1.7.2.5