From: Niklas Cassel <niklas.cassel@axis.com>
To: Alex Smith <alex@alex-smith.me.uk>,
	Brian Norris <computersforpeace@gmail.com>
Cc: Alex Smith <alex.smith@imgtec.com>,
	"linux-mtd@lists.infradead.org" <linux-mtd@lists.infradead.org>,
	Zubair Lutfullah Kakakhel <Zubair.Kakakhel@imgtec.com>,
	David Woodhouse <dwmw2@infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v5 1/4] mtd: nand: increase ready wait timeout and report timeouts
Date: Tue, 15 Sep 2015 11:53:02 +0200
Message-ID: <55F7EA7E.7020407@axis.com>
In-Reply-To: <CAOFt0_DK+KDe4g2QP-+v0nZiSWU0oaLW7=B+WfzNHq-GHwg=BA@mail.gmail.com>

On 09/15/2015 11:38 AM, Alex Smith wrote:
> On 10 September 2015 at 00:49, Brian Norris <computersforpeace@gmail.com> wrote:
>> + Niklas
>>
>> On Tue, Sep 08, 2015 at 10:10:50AM +0100, Alex Smith wrote:
>>> If nand_wait_ready() times out, this is silently ignored, and its
>>> caller will then proceed to read from/write to the chip before it is
>>> ready. This can potentially result in corruption with no indication as
>>> to why.
>>>
>>> While a 20ms timeout seems like it should be plenty enough, certain
>>> behaviour can cause it to timeout much earlier than expected. The
>>> situation which prompted this change was that CPU 0, which is
>>> responsible for updating jiffies, was holding interrupts disabled
>>> for a fairly long time while writing to the console during a printk,
>>> causing several jiffies updates to be delayed. If CPU 1 happens to
>>> enter the timeout loop in nand_wait_ready() just before CPU 0 re-
>>> enables interrupts and updates jiffies, CPU 1 will immediately time
>>> out when the delayed jiffies updates are made. The result of this is
>>> that nand_wait_ready() actually waits less time than the NAND chip
>>> would normally take to be ready, and then read_page() proceeds to
>>> read out bad data from the chip.
>>>
>>> The situation described above may seem unlikely, but in fact it can be
>>> reproduced almost every boot on the MIPS Creator Ci20.
>>>
>>> Debugging this was made more difficult by the misleading comment above
>>> nand_wait_ready() stating "The timeout is caught later" - no timeout
>>> was ever reported, leading me away from the real source of the problem.
>>>
>>> Therefore, this patch increases the timeout to 200ms. This should be
>>> enough to cover cases where jiffies updates get delayed. Additionally,
>>> add a pr_warn() when a timeout does occur so that it is easier to
>>> pinpoint any problems in future caused by the chip not becoming ready.
>>
>> Did you examine other solutions? I've seen patches for hrtimer support
>> previously:
>>
>> http://patchwork.ozlabs.org/patch/160333/
>> http://patchwork.ozlabs.org/patch/431066/
>>
>> A few things have been cleaned up since then, so some of the initial
>> objections to the hrtimer patch don't make sense anymore, I believe.
>>
>> Anyway, I think just increasing the timeout looks OK to me (as long as
>> we never have a 200ms jiffies jump... can this happen??), so hrtimer may
>> be over-engineering. I just want to make sure both options have been
>> considered before officially choosing one over the other.
>>
>> Brian
> 
> Hi Brian, Niklas,
> 
> I'm no expert in the matter but I feel like using a hrtimer here would
> indeed be over-engineering and could potentially add overhead to the
> "normal" case where the chip becomes ready well before the timeout
> expires? Just increasing the timeout seems like a simpler solution
> that solves the problem. I think that a jiffies jump of a few hundred
> milliseconds is extremely unlikely and would indicate something else
> that needs to be fixed (i.e. in the SMP case I had it would mean that
> the CPU which is supposed to update jiffies has interrupts disabled
> for hundreds of milliseconds).
> 
> Niklas: If I update the patch based on your suggestions would you be
> happy to go with that rather than your hrtimer patch?

Yes.

I've tested the patch inlined at the end of
http://marc.info/?l=linux-kernel&m=144197105326420
and it works just as well as the hrtimer patch that I sent out a couple of months ago.

(That is for our use case, where IRQs were sometimes disabled for more than 20 ms.)
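
For illustration only (this is not code from any patch in this thread):
the race described in the quoted commit message can be modelled in plain
userspace C. The sketch below assumes one tick per millisecond (HZ=1000)
and that delayed ticks are applied all at once when the tick CPU
re-enables interrupts. All names (struct sim, sim_advance_1ms,
sim_wait_ready, sim_delayed_tick_scenario) are hypothetical, and the loop
only mirrors the general "poll until jiffies passes a deadline" shape, not
the exact kernel source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; assumes 1 tick == 1 ms (HZ=1000). */
struct sim {
	unsigned long jiffies;      /* tick counter as seen by the waiter */
	unsigned long real_ms;      /* real time since the waiter started */
	unsigned long pending;      /* ticks owed while the tick CPU has IRQs off */
	unsigned long irq_on_at_ms; /* real time at which IRQs are re-enabled */
	unsigned long ready_at_ms;  /* real time at which the chip becomes ready */
};

/* Advance real time by 1 ms; delayed ticks are queued, then applied in one go. */
static void sim_advance_1ms(struct sim *s)
{
	s->real_ms++;
	if (s->real_ms < s->irq_on_at_ms) {
		s->pending++;                 /* tick is delayed, not lost */
	} else {
		s->jiffies += s->pending + 1; /* catch up all at once */
		s->pending = 0;
	}
}

/* Poll for "ready" until the jiffies deadline passes. False means timeout. */
static bool sim_wait_ready(struct sim *s, unsigned long timeout_ticks)
{
	unsigned long deadline;

	assert(timeout_ticks > 0);
	deadline = s->jiffies + timeout_ticks;

	do {
		if (s->real_ms >= s->ready_at_ms)
			return true;
		sim_advance_1ms(s);
	} while ((long)(deadline - s->jiffies) > 0); /* wrap-safe "before deadline" */

	/* Final check, so that a timeout can be reported to the caller. */
	return s->real_ms >= s->ready_at_ms;
}

/*
 * Scenario: the tick CPU has already held IRQs off for 25 ms when the
 * waiter starts, re-enables them 1 ms later, and the chip needs 3 ms.
 */
static struct sim sim_delayed_tick_scenario(void)
{
	struct sim s = {
		.jiffies = 0,
		.real_ms = 0,
		.pending = 25,
		.irq_on_at_ms = 1,
		.ready_at_ms = 3,
	};
	return s;
}
```

In this model, calling sim_wait_ready() on the scenario with a timeout of
20 ticks gives up after 1 ms of real time, before the chip is ready, while
a timeout of 200 ticks absorbs the 26-tick jump and returns once the chip
is ready at 3 ms. The boolean return value stands in for the point where a
warning could be printed on timeout.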
