[RFC] ARM: edma: unconditionally ack the error interrupt

linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed

* [RFC] ARM: edma: unconditionally ack the error interrupt
@ 2014-09-10 19:39 Sebastian Andrzej Siewior
  2014-09-18  9:42 ` Peter Ujfalusi
  0 siblings, 1 reply; 5+ messages in thread
From: Sebastian Andrzej Siewior @ 2014-09-10 19:39 UTC (permalink / raw)
  To: linux-arm-kernel

With 8250-dma, 8250-omap and am335x I observe the following:

- start a RX transfer which will finish once the FIFO has enough data
- The TX side starts a large TX transfer, say 1244 bytes. It takes approx
  102ms for the transfer to complete
- cancel the RX transfer & start the RX transfer a few times
- the TX transfer completes. dma_irq_handler() notices this and
  schedules the completion callback
- dma_ccerr_handler() is invoked. It returns IRQ_NONE because all four
  checked registers return 0.
- the last irq handler is repeated a few times until the irq core shuts it
  down.

I see the above describes pattern also without dma_ccerr_handler()
beeing invoked. But if it is invoked, it always _after_ the dma interrupt
handler for the TX handler was handled. If I disable TX-DMA for the UART
then I don't see this dma_ccerr_handler() at all.

Testing longer I see two addition scenarios of dma_ccerr_handler():
- EDMA_EMR0 reports 0x04000000 (the channel used by TX-UART).
  edma_callback() reports "looks like slot is null". Looks harmless.
- EDMA_EMR0 reports the same values but later in the loop where a match
  search again EDMA_EMR0 reports 0 and so thing is done.

Since it looks like the EDMA_EMR0 is loosing its content before the
dma_ccerr_handler() is invoked, I suggest to unconditionally ack the
interrupt so the irq core does not shut it down.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/arm/common/edma.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/arm/common/edma.c b/arch/arm/common/edma.c
index 88099175fc56..b31f3b7b3851 100644
--- a/arch/arm/common/edma.c
+++ b/arch/arm/common/edma.c
@@ -432,8 +432,10 @@ static irqreturn_t dma_ccerr_handler(int irq, void *data)
 	if ((edma_read_array(ctlr, EDMA_EMR, 0) == 0) &&
 	    (edma_read_array(ctlr, EDMA_EMR, 1) == 0) &&
 	    (edma_read(ctlr, EDMA_QEMR) == 0) &&
-	    (edma_read(ctlr, EDMA_CCERR) == 0))
+	    (edma_read(ctlr, EDMA_CCERR) == 0)) {
+		edma_write(ctlr, EDMA_EEVAL, 1);
 		return IRQ_NONE;
+	}
 
 	while (1) {
 		int j = -1;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [RFC] ARM: edma: unconditionally ack the error interrupt
  2014-09-10 19:39 [RFC] ARM: edma: unconditionally ack the error interrupt Sebastian Andrzej Siewior
@ 2014-09-18  9:42 ` Peter Ujfalusi
  2014-09-18 16:12   ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Ujfalusi @ 2014-09-18  9:42 UTC (permalink / raw)
  To: linux-arm-kernel

On 09/10/2014 10:39 PM, Sebastian Andrzej Siewior wrote:
> With 8250-dma, 8250-omap and am335x I observe the following:
> 
> - start a RX transfer which will finish once the FIFO has enough data
> - The TX side starts a large TX transfer, say 1244 bytes. It takes approx
>   102ms for the transfer to complete
> - cancel the RX transfer & start the RX transfer a few times
> - the TX transfer completes. dma_irq_handler() notices this and
>   schedules the completion callback
> - dma_ccerr_handler() is invoked. It returns IRQ_NONE because all four
>   checked registers return 0.
> - the last irq handler is repeated a few times until the irq core shuts it
>   down.
> 
> I see the above describes pattern also without dma_ccerr_handler()
> beeing invoked. But if it is invoked, it always _after_ the dma interrupt
> handler for the TX handler was handled. If I disable TX-DMA for the UART
> then I don't see this dma_ccerr_handler() at all.
> 
> Testing longer I see two addition scenarios of dma_ccerr_handler():
> - EDMA_EMR0 reports 0x04000000 (the channel used by TX-UART).
>   edma_callback() reports "looks like slot is null". Looks harmless.
> - EDMA_EMR0 reports the same values but later in the loop where a match
>   search again EDMA_EMR0 reports 0 and so thing is done.
> 
> Since it looks like the EDMA_EMR0 is loosing its content before the
> dma_ccerr_handler() is invoked, I suggest to unconditionally ack the
> interrupt so the irq core does not shut it down.

It is reasonable to ack the error interrupt unconditionally.
My hunch on what could be causing this that we might have unhandled dma event
and another comes. This will flag the EDMA_EMR register. Any change in this
register will assert error interrupt which can only be cleared by writing to
EDMA_EMRC register.
The EDMA_EMRC register bits also cleared on edma_start(), edma_stop() and
edma_clean_channel() apart from the error interrupt handler.
So it is possible that we have missed event on one of the channels but a stop
might clear the event so in the interrupt hander we do not see this.
I think it would be good to understand what is going on the backround...
Can you print out the EDMA_EMCR just before we clear it in the places I have
mentioned? We might get better understanding on which stage we clear it and
probably we can understand how to fix this properly so we are not going to
have missed events on channels.

Acked-by: Peter Ujfalusi <peter.ujfalusi@ti.com>

> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  arch/arm/common/edma.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/common/edma.c b/arch/arm/common/edma.c
> index 88099175fc56..b31f3b7b3851 100644
> --- a/arch/arm/common/edma.c
> +++ b/arch/arm/common/edma.c
> @@ -432,8 +432,10 @@ static irqreturn_t dma_ccerr_handler(int irq, void *data)
>  	if ((edma_read_array(ctlr, EDMA_EMR, 0) == 0) &&
>  	    (edma_read_array(ctlr, EDMA_EMR, 1) == 0) &&
>  	    (edma_read(ctlr, EDMA_QEMR) == 0) &&
> -	    (edma_read(ctlr, EDMA_CCERR) == 0))
> +	    (edma_read(ctlr, EDMA_CCERR) == 0)) {
> +		edma_write(ctlr, EDMA_EEVAL, 1);
>  		return IRQ_NONE;
> +	}
>  
>  	while (1) {
>  		int j = -1;
> 


-- 
P?ter

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [RFC] ARM: edma: unconditionally ack the error interrupt
  2014-09-18  9:42 ` Peter Ujfalusi
@ 2014-09-18 16:12   ` Sebastian Andrzej Siewior
  2014-09-19 21:29     ` Peter Ujfalusi
  0 siblings, 1 reply; 5+ messages in thread
From: Sebastian Andrzej Siewior @ 2014-09-18 16:12 UTC (permalink / raw)
  To: linux-arm-kernel

* Peter Ujfalusi | 2014-09-18 12:42:24 [+0300]:

>My hunch on what could be causing this that we might have unhandled dma event
>and another comes. This will flag the EDMA_EMR register. Any change in this
>register will assert error interrupt which can only be cleared by writing to
>EDMA_EMRC register.
>The EDMA_EMRC register bits also cleared on edma_start(), edma_stop() and
>edma_clean_channel() apart from the error interrupt handler.
>So it is possible that we have missed event on one of the channels but a stop
>might clear the event so in the interrupt hander we do not see this.
>I think it would be good to understand what is going on the backround...
>Can you print out the EDMA_EMCR just before we clear it in the places I have
>mentioned? We might get better understanding on which stage we clear it and
>probably we can understand how to fix this properly so we are not going to
>have missed events on channels.

Okay. For the protocol I applied this patch:

diff --git a/arch/arm/common/edma.c b/arch/arm/common/edma.c
index 160460ae3a49..16598625a4d1 100644
--- a/arch/arm/common/edma.c
+++ b/arch/arm/common/edma.c
@@ -422,20 +422,24 @@ static irqreturn_t dma_ccerr_handler(int irq, void *data)
 	int i;
 	int ctlr;
 	unsigned int cnt = 0;
+	u32 emr0;
 
 	ctlr = irq2ctlr(irq);
 	if (ctlr < 0)
 		return IRQ_NONE;
 
 	dev_dbg(data, "dma_ccerr_handler\n");
+	emr0 = edma_read_array(ctlr, EDMA_EMR, 0);
 
-	if ((edma_read_array(ctlr, EDMA_EMR, 0) == 0) &&
+	if ((emr0 == 0) &&
 	    (edma_read_array(ctlr, EDMA_EMR, 1) == 0) &&
 	    (edma_read(ctlr, EDMA_QEMR) == 0) &&
 	    (edma_read(ctlr, EDMA_CCERR) == 0)) {
 		edma_write(ctlr, EDMA_EEVAL, 1);
+		trace_printk("Unhandled\n");
 		return IRQ_NONE;
 	}
+	trace_printk("emr0: %x\n", emr0);
 
 	while (1) {
 		int j = -1;
@@ -1310,6 +1314,9 @@ int edma_start(unsigned channel)
 		pr_debug("EDMA: ER%d %08x\n", j,
 			edma_shadow0_read_array(ctlr, SH_ER, j));
 		/* Clear any pending event or error */
+		trace_printk("j%d mask%x EDMA_EMCR: %x\n",
+			     j, mask,
+			     edma_read_array(ctlr, EDMA_EMCR, j));
 		edma_write_array(ctlr, EDMA_ECR, j, mask);
 		edma_write_array(ctlr, EDMA_EMCR, j, mask);
 		/* Clear any SER */
@@ -1347,6 +1354,9 @@ void edma_stop(unsigned channel)
 		edma_shadow0_write_array(ctlr, SH_EECR, j, mask);
 		edma_shadow0_write_array(ctlr, SH_ECR, j, mask);
 		edma_shadow0_write_array(ctlr, SH_SECR, j, mask);
+		trace_printk("j%d mask%x EDMA_EMCR: %x\n",
+			     j, mask,
+			     edma_read_array(ctlr, EDMA_EMCR, j));
 		edma_write_array(ctlr, EDMA_EMCR, j, mask);
 
 		pr_debug("EDMA: EER%d %08x\n", j,
@@ -1387,6 +1397,9 @@ void edma_clean_channel(unsigned channel)
 				edma_read_array(ctlr, EDMA_EMR, j));
 		edma_shadow0_write_array(ctlr, SH_ECR, j, mask);
 		/* Clear the corresponding EMR bits */
+		trace_printk("j%d mask%x EDMA_EMCR: %x\n",
+			     j, mask,
+			     edma_read_array(ctlr, EDMA_EMCR, j));
 		edma_write_array(ctlr, EDMA_EMCR, j, mask);
 		/* Clear any SER */
 		edma_shadow0_write_array(ctlr, SH_SECR, j, mask);

--

and the result is something like this:

           <idle>-0     [000] dnh.   303.356403: edma_start: j0 mask8000000 EDMA_EMCR: 0
           <idle>-0     [000] d.h.   303.396721: edma_stop: j0 mask8000000 EDMA_EMCR: 0
           <idle>-0     [000] dnh.   303.557103: edma_start: j0 mask8000000 EDMA_EMCR: 0
           <idle>-0     [000] dnh.   303.557129: edma_stop: j0 mask4000000 EDMA_EMCR: 0
           <idle>-0     [000] dnH.   303.557142: dma_ccerr_handler: Unhandled
             less-2612  [000] d...   303.557237: edma_start: j0 mask4000000 EDMA_EMCR: 0
             less-2612  [000] d.h.   303.562491: edma_stop: j0 mask4000000 EDMA_EMCR: 0
             less-2612  [000] d...   303.564475: edma_start: j0 mask4000000 EDMA_EMCR: 0

The full trace is at [0]. I haven't found a single entry where EDMA_EMCR
was != 0 at those spots. *If* there is a edma_stop() before the
interrupt then the interrupt remains unhandled. If there is a
edma_start() with mask 8000000 then we have dma_ccerr_handler() with a
mask of 4000000.

Fun fact: If I remove the write access to EDMA_EMCR register (the write
access after the read out) then I haven't seen [1] a single error interrupt
beeing "unhandled" out of 9. The former has three out of eight.

[0] https://breakpoint.cc/edma_trace.txt.xz
[1] https://breakpoint.cc/edma_trace_nowrite.txt.xz

Sebastian

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [RFC] ARM: edma: unconditionally ack the error interrupt
  2014-09-18 16:12   ` Sebastian Andrzej Siewior
@ 2014-09-19 21:29     ` Peter Ujfalusi
  2014-09-25 16:56       ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Ujfalusi @ 2014-09-19 21:29 UTC (permalink / raw)
  To: linux-arm-kernel

On 09/18/2014 07:12 PM, Sebastian Andrzej Siewior wrote:
> * Peter Ujfalusi | 2014-09-18 12:42:24 [+0300]:
> 
>> My hunch on what could be causing this that we might have unhandled dma event
>> and another comes. This will flag the EDMA_EMR register. Any change in this
>> register will assert error interrupt which can only be cleared by writing to
>> EDMA_EMRC register.
>> The EDMA_EMRC register bits also cleared on edma_start(), edma_stop() and
>> edma_clean_channel() apart from the error interrupt handler.
>> So it is possible that we have missed event on one of the channels but a stop
>> might clear the event so in the interrupt hander we do not see this.
>> I think it would be good to understand what is going on the backround...
>> Can you print out the EDMA_EMCR just before we clear it in the places I have
>> mentioned? We might get better understanding on which stage we clear it and
>> probably we can understand how to fix this properly so we are not going to
>> have missed events on channels.
> 
> Okay. For the protocol I applied this patch:
> 
> diff --git a/arch/arm/common/edma.c b/arch/arm/common/edma.c
> index 160460ae3a49..16598625a4d1 100644
> --- a/arch/arm/common/edma.c
> +++ b/arch/arm/common/edma.c
> @@ -422,20 +422,24 @@ static irqreturn_t dma_ccerr_handler(int irq, void *data)
>  	int i;
>  	int ctlr;
>  	unsigned int cnt = 0;
> +	u32 emr0;
>  
>  	ctlr = irq2ctlr(irq);
>  	if (ctlr < 0)
>  		return IRQ_NONE;
>  
>  	dev_dbg(data, "dma_ccerr_handler\n");
> +	emr0 = edma_read_array(ctlr, EDMA_EMR, 0);
>  
> -	if ((edma_read_array(ctlr, EDMA_EMR, 0) == 0) &&
> +	if ((emr0 == 0) &&
>  	    (edma_read_array(ctlr, EDMA_EMR, 1) == 0) &&
>  	    (edma_read(ctlr, EDMA_QEMR) == 0) &&
>  	    (edma_read(ctlr, EDMA_CCERR) == 0)) {
>  		edma_write(ctlr, EDMA_EEVAL, 1);
> +		trace_printk("Unhandled\n");
>  		return IRQ_NONE;
>  	}
> +	trace_printk("emr0: %x\n", emr0);
>  
>  	while (1) {
>  		int j = -1;
> @@ -1310,6 +1314,9 @@ int edma_start(unsigned channel)
>  		pr_debug("EDMA: ER%d %08x\n", j,
>  			edma_shadow0_read_array(ctlr, SH_ER, j));
>  		/* Clear any pending event or error */
> +		trace_printk("j%d mask%x EDMA_EMCR: %x\n",
> +			     j, mask,
> +			     edma_read_array(ctlr, EDMA_EMCR, j));
>  		edma_write_array(ctlr, EDMA_ECR, j, mask);
>  		edma_write_array(ctlr, EDMA_EMCR, j, mask);
>  		/* Clear any SER */
> @@ -1347,6 +1354,9 @@ void edma_stop(unsigned channel)
>  		edma_shadow0_write_array(ctlr, SH_EECR, j, mask);
>  		edma_shadow0_write_array(ctlr, SH_ECR, j, mask);
>  		edma_shadow0_write_array(ctlr, SH_SECR, j, mask);
> +		trace_printk("j%d mask%x EDMA_EMCR: %x\n",
> +			     j, mask,
> +			     edma_read_array(ctlr, EDMA_EMCR, j));
>  		edma_write_array(ctlr, EDMA_EMCR, j, mask);
>  
>  		pr_debug("EDMA: EER%d %08x\n", j,
> @@ -1387,6 +1397,9 @@ void edma_clean_channel(unsigned channel)
>  				edma_read_array(ctlr, EDMA_EMR, j));
>  		edma_shadow0_write_array(ctlr, SH_ECR, j, mask);
>  		/* Clear the corresponding EMR bits */
> +		trace_printk("j%d mask%x EDMA_EMCR: %x\n",
> +			     j, mask,
> +			     edma_read_array(ctlr, EDMA_EMCR, j));
>  		edma_write_array(ctlr, EDMA_EMCR, j, mask);
>  		/* Clear any SER */
>  		edma_shadow0_write_array(ctlr, SH_SECR, j, mask);
> 
> --
> 
> and the result is something like this:
> 
>            <idle>-0     [000] dnh.   303.356403: edma_start: j0 mask8000000 EDMA_EMCR: 0
>            <idle>-0     [000] d.h.   303.396721: edma_stop: j0 mask8000000 EDMA_EMCR: 0
>            <idle>-0     [000] dnh.   303.557103: edma_start: j0 mask8000000 EDMA_EMCR: 0
>            <idle>-0     [000] dnh.   303.557129: edma_stop: j0 mask4000000 EDMA_EMCR: 0
>            <idle>-0     [000] dnH.   303.557142: dma_ccerr_handler: Unhandled
>              less-2612  [000] d...   303.557237: edma_start: j0 mask4000000 EDMA_EMCR: 0
>              less-2612  [000] d.h.   303.562491: edma_stop: j0 mask4000000 EDMA_EMCR: 0
>              less-2612  [000] d...   303.564475: edma_start: j0 mask4000000 EDMA_EMCR: 0
> 
> The full trace is at [0]. I haven't found a single entry where EDMA_EMCR
> was != 0 at those spots. *If* there is a edma_stop() before the
> interrupt then the interrupt remains unhandled. If there is a
> edma_start() with mask 8000000 then we have dma_ccerr_handler() with a
> mask of 4000000.

mask 8000000 means URXEVT0 (UART0 rx), 4000000 is UTXEVT0 (UART0 tx) events.
But it is clear that my theory was not even close to what's going on.
Do you have some tool which can be used to reproduce this issue?

> 
> Fun fact: If I remove the write access to EDMA_EMCR register (the write
> access after the read out) then I haven't seen [1] a single error interrupt
> beeing "unhandled" out of 9. The former has three out of eight.

Hrm, this is interesting. Am I interpret it right:
if you clear the event missed register's bit for the event, you will receive
the interrupt, but there if you read the EMR bits, they are all 0.
if you do not clear the bit in the event missed register (even if it is not
set) you will not receive the error interrupt. Right?

I think this can be due to the fact how edma (if I read the TRM right) works:
the error irq will be asserted if there is a transition from 0 to 1 in one of
the error registers. If you had error and you did not cleared the error bit,
the next error will not going to cause another transition so no interrupt will
be triggered. Having said that, there should be at least one error interrupt
coming since at some point we should have had 0 -> 1 transition...

Can you print out also the EDMA_EMR at places you were printing EDMA_EMCR?

> 
> [0] https://breakpoint.cc/edma_trace.txt.xz
> [1] https://breakpoint.cc/edma_trace_nowrite.txt.xz
> 
> Sebastian
> 


thanks,
P?ter

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [RFC] ARM: edma: unconditionally ack the error interrupt
  2014-09-19 21:29     ` Peter Ujfalusi
@ 2014-09-25 16:56       ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 5+ messages in thread
From: Sebastian Andrzej Siewior @ 2014-09-25 16:56 UTC (permalink / raw)
  To: linux-arm-kernel

* Peter Ujfalusi | 2014-09-20 00:29:01 [+0300]:

>mask 8000000 means URXEVT0 (UART0 rx), 4000000 is UTXEVT0 (UART0 tx) events.
>But it is clear that my theory was not even close to what's going on.
>Do you have some tool which can be used to reproduce this issue?

The latest uart patch set is at
    git://git.breakpoint.cc/bigeasy/linux.git uart_v10_pre3

and the latest one I tested and saw the issue was uart_v9.
The test case is, boot, login via serial, do "less file" and press space
for a while to get less to scroll the file.
The error counter increment after a while.

>> Fun fact: If I remove the write access to EDMA_EMCR register (the write
>> access after the read out) then I haven't seen [1] a single error interrupt
>> beeing "unhandled" out of 9. The former has three out of eight.
>
>Hrm, this is interesting. Am I interpret it right:
>if you clear the event missed register's bit for the event, you will receive
>the interrupt, but there if you read the EMR bits, they are all 0.
usually yes. Sometimes the EMR bits are set.

>if you do not clear the bit in the event missed register (even if it is not
>set) you will not receive the error interrupt. Right?
no, the error interrupt comes but the EMR is always non-zero.

>I think this can be due to the fact how edma (if I read the TRM right) works:
>the error irq will be asserted if there is a transition from 0 to 1 in one of
>the error registers. If you had error and you did not cleared the error bit,
>the next error will not going to cause another transition so no interrupt will
>be triggered. Having said that, there should be at least one error interrupt
>coming since at some point we should have had 0 -> 1 transition...
>
>Can you print out also the EDMA_EMR at places you were printing EDMA_EMCR?

So I had a "bug" where I changed the baud + DMA bits in the UART while
there was a DMA-TX in progress. Now I dropped this in v10_pre3 and it
seems not to trigger anymore.

Sebastian

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2014-09-25 16:56 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-09-10 19:39 [RFC] ARM: edma: unconditionally ack the error interrupt Sebastian Andrzej Siewior
2014-09-18  9:42 ` Peter Ujfalusi
2014-09-18 16:12   ` Sebastian Andrzej Siewior
2014-09-19 21:29     ` Peter Ujfalusi
2014-09-25 16:56       ` Sebastian Andrzej Siewior

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).