From: Richard Weinberger <richard@nod.at>
To: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>,
Hamish Martin <Hamish.Martin@alliedtelesis.co.nz>
Cc: Robert Jarzmik <robert.jarzmik@free.fr>,
Brian Norris <computersforpeace@gmail.com>,
"linux-mtd@lists.infradead.org" <linux-mtd@lists.infradead.org>,
Michael Wang <Michael.Wang@alliedtelesis.co.nz>,
"linux-arm-kernel@lists.infradead.org"
<linux-arm-kernel@lists.infradead.org>,
Boris Brezillon <boris.brezillon@free-electrons.com>,
Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Subject: Re: mtd: pxa3xx_nand: issue with command time out
Date: Fri, 4 Mar 2016 00:18:36 +0100 [thread overview]
Message-ID: <56D8C64C.1000004@nod.at> (raw)
In-Reply-To: <CAAEAJfAd2qkcJby3AaKOaxL2LZC=PMNfu7dy5yMcZ1eHhOF7oA@mail.gmail.com>
Am 03.03.2016 um 20:24 schrieb Ezequiel Garcia:
> +Richard, Boris and Thomas
>
> On 1 March 2016 at 19:17, Hamish Martin
> <Hamish.Martin@alliedtelesis.co.nz> wrote:
>> Hi Robert, Ezequiel, and Brian,
>>
>> I have been working with Michael Wang on this issue and wanted to update
>> you on our results as we think there is a wider issue to resolve here.
>>
>> I believe Ezequiel's analysis of the IRQ sequence is correct. The
>> proposed patch from Brian is not needed, not does it resolve our issue.
>>
>> Further, we have debugged in more detail with kernel tracing and see
>> that the problem occurs because nothing calls 'schedule()' in the kernel
>> within the timeout period (200ms) to actually allow the IRQ bottom-half
>> thread ((pxa3xx_nand_irq_thread()) to run.
>>
>> With the trace enabled we were able to see long delays between the IRQ
>> thread being queued for run as a result of the return with
>> IRQ_WAKE_THREAD in pxa3xx_nand_irq(), and a call to schedule() actually
>> selecting the thread for execution. This delay was tied to whatever else
>> was happening in our system during the boot up when this issue occurs.
>> For instance, if a mutex got unlocked this would ultimately lead to a
>> call to schedule() and our thread (pxa3xx_nand_irq_thread()) would run.
>> If some file system ops happened schedule() would get called. These
>> unrelated activities were required so that schedule() got called and the
>> scheduler would let the bottom-half execute.
>> In our tracing we saw the issue occur whenever the call to schedule()
>> didn't occur before the timeout expired. In short, if we got lucky and
>> something completely unrelated happened that led to a call to
>> 'schedule()', then the issue didn't occur. If we got unlucky and nothing
>> called schedule(), the issue was observed.
>>
>> We have enabled CONFIG_PREEMPT in our system to resolve this. This has
>> the effect of leading to a triggering of 'schedule()' at the end of
>> processing interrupts. This means that we now see a nice tight execution
>> of pxa3xx_nand threaded IRQ such that we see all ops completing with
>> less that 20ms delay.
>>
>> I wanted to highlight this to you because I think it shows that the
>> change to a threaded IRQ is probably only safe if we can guarantee
>> scheduling of the bottom-half thread within the time out period (200ms).
>> I think this is not safe for systems with CONFIG_PREEMPT not configured.
>> Or perhaps there is a better way to resolve this. Is this a deficiency
>> in the Armada XP code?
>>
>> What do you think?
>>
>
> Thanks a lot for updating us. Richard, Boris and I have
> discussed about this on IRC.
>
> We came to the following conclusions:
>
> 1. There's no way to guarantee that the thread will be scheduled
> soon enough to always do the data transfer in time.
>
> Using CONFIG_PREEMPT could prevent a timeout in many (most?)
> cases, but it's not guaranteed to work.
>
> 2. The pxa3xx-nand driver is poorly designed, and it shouldn't
> do any data transfer in the IRQ (threaded or not) handler.
>
> Instead, this should be done in {read,write}_buf, i.e. in the
> context of the calling process that asks data to be read/write.
> This is potentially a big rework.
>
> 3. Richard proposed to get rid of the timeout
> (using wait_for_completion_interruptible instead?).
Getting rid of the timeout can work but definitely needs more
clarification.
Before commit 24542257a3b987025d4b998ec2d15e556c98ad3f it made sense
but now the thread can be delayed for unknown time.
Thanks,
//richard
WARNING: multiple messages have this Message-ID (diff)
From: richard@nod.at (Richard Weinberger)
To: linux-arm-kernel@lists.infradead.org
Subject: mtd: pxa3xx_nand: issue with command time out
Date: Fri, 4 Mar 2016 00:18:36 +0100 [thread overview]
Message-ID: <56D8C64C.1000004@nod.at> (raw)
In-Reply-To: <CAAEAJfAd2qkcJby3AaKOaxL2LZC=PMNfu7dy5yMcZ1eHhOF7oA@mail.gmail.com>
Am 03.03.2016 um 20:24 schrieb Ezequiel Garcia:
> +Richard, Boris and Thomas
>
> On 1 March 2016 at 19:17, Hamish Martin
> <Hamish.Martin@alliedtelesis.co.nz> wrote:
>> Hi Robert, Ezequiel, and Brian,
>>
>> I have been working with Michael Wang on this issue and wanted to update
>> you on our results as we think there is a wider issue to resolve here.
>>
>> I believe Ezequiel's analysis of the IRQ sequence is correct. The
>> proposed patch from Brian is not needed, not does it resolve our issue.
>>
>> Further, we have debugged in more detail with kernel tracing and see
>> that the problem occurs because nothing calls 'schedule()' in the kernel
>> within the timeout period (200ms) to actually allow the IRQ bottom-half
>> thread ((pxa3xx_nand_irq_thread()) to run.
>>
>> With the trace enabled we were able to see long delays between the IRQ
>> thread being queued for run as a result of the return with
>> IRQ_WAKE_THREAD in pxa3xx_nand_irq(), and a call to schedule() actually
>> selecting the thread for execution. This delay was tied to whatever else
>> was happening in our system during the boot up when this issue occurs.
>> For instance, if a mutex got unlocked this would ultimately lead to a
>> call to schedule() and our thread (pxa3xx_nand_irq_thread()) would run.
>> If some file system ops happened schedule() would get called. These
>> unrelated activities were required so that schedule() got called and the
>> scheduler would let the bottom-half execute.
>> In our tracing we saw the issue occur whenever the call to schedule()
>> didn't occur before the timeout expired. In short, if we got lucky and
>> something completely unrelated happened that led to a call to
>> 'schedule()', then the issue didn't occur. If we got unlucky and nothing
>> called schedule(), the issue was observed.
>>
>> We have enabled CONFIG_PREEMPT in our system to resolve this. This has
>> the effect of leading to a triggering of 'schedule()' at the end of
>> processing interrupts. This means that we now see a nice tight execution
>> of pxa3xx_nand threaded IRQ such that we see all ops completing with
>> less that 20ms delay.
>>
>> I wanted to highlight this to you because I think it shows that the
>> change to a threaded IRQ is probably only safe if we can guarantee
>> scheduling of the bottom-half thread within the time out period (200ms).
>> I think this is not safe for systems with CONFIG_PREEMPT not configured.
>> Or perhaps there is a better way to resolve this. Is this a deficiency
>> in the Armada XP code?
>>
>> What do you think?
>>
>
> Thanks a lot for updating us. Richard, Boris and I have
> discussed about this on IRC.
>
> We came to the following conclusions:
>
> 1. There's no way to guarantee that the thread will be scheduled
> soon enough to always do the data transfer in time.
>
> Using CONFIG_PREEMPT could prevent a timeout in many (most?)
> cases, but it's not guaranteed to work.
>
> 2. The pxa3xx-nand driver is poorly designed, and it shouldn't
> do any data transfer in the IRQ (threaded or not) handler.
>
> Instead, this should be done in {read,write}_buf, i.e. in the
> context of the calling process that asks data to be read/write.
> This is potentially a big rework.
>
> 3. Richard proposed to get rid of the timeout
> (using wait_for_completion_interruptible instead?).
Getting rid of the timeout can work but definitely needs more
clarification.
Before commit 24542257a3b987025d4b998ec2d15e556c98ad3f it made sense
but now the thread can be delayed for unknown time.
Thanks,
//richard
next prev parent reply other threads:[~2016-03-03 23:19 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-01-13 4:13 mtd: pxa3xx_nand: issue with command time out Michael Wang
2016-01-13 4:13 ` Michael Wang
2016-01-13 7:47 ` Robert Jarzmik
2016-01-13 7:47 ` Robert Jarzmik
2016-01-14 22:57 ` Michael Wang
2016-01-16 23:53 ` Robert Jarzmik
2016-01-16 23:53 ` Robert Jarzmik
2016-01-18 2:15 ` Michael Wang
2016-01-18 21:13 ` Robert Jarzmik
2016-01-18 21:13 ` Robert Jarzmik
2016-01-19 22:33 ` Michael Wang
2016-01-19 22:33 ` Michael Wang
2016-01-25 18:30 ` Brian Norris
2016-01-25 18:30 ` Brian Norris
2016-01-25 20:48 ` Robert Jarzmik
2016-01-25 20:48 ` Robert Jarzmik
2016-01-26 13:03 ` Ezequiel Garcia
2016-01-26 13:03 ` Ezequiel Garcia
2016-03-01 22:17 ` Hamish Martin
2016-03-01 22:17 ` Hamish Martin
2016-03-03 19:24 ` Ezequiel Garcia
2016-03-03 19:24 ` Ezequiel Garcia
2016-03-03 20:16 ` Hamish Martin
2016-03-03 20:16 ` Hamish Martin
2016-03-04 0:03 ` Ezequiel Garcia
2016-03-04 0:03 ` Ezequiel Garcia
2016-03-07 20:04 ` Hamish Martin
2016-03-07 20:04 ` Hamish Martin
2016-03-03 23:18 ` Richard Weinberger [this message]
2016-03-03 23:18 ` Richard Weinberger
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=56D8C64C.1000004@nod.at \
--to=richard@nod.at \
--cc=Hamish.Martin@alliedtelesis.co.nz \
--cc=Michael.Wang@alliedtelesis.co.nz \
--cc=boris.brezillon@free-electrons.com \
--cc=computersforpeace@gmail.com \
--cc=ezequiel@vanguardiasur.com.ar \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-mtd@lists.infradead.org \
--cc=robert.jarzmik@free.fr \
--cc=thomas.petazzoni@free-electrons.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.