Linux-Aspeed Archive on lore.kernel.org
 help / color / mirror / Atom feed
* AST2600 i2c irq issue
@ 2019-10-17  6:14 Joel Stanley
  2019-10-17 20:31 ` Eddie James
  0 siblings, 1 reply; 10+ messages in thread
From: Joel Stanley @ 2019-10-17  6:14 UTC (permalink / raw)
  To: linux-aspeed

I have been doing bringup of the ast2600. It contains i2c buses that
are register compatible with the ast2500, and I am running them
without buffer or DMA mode. This is with v5.3.6, with no patches other
than adding the compatible string:

--- a/drivers/i2c/busses/i2c-aspeed.c
+++ b/drivers/i2c/busses/i2c-aspeed.c
@@ -938,6 +938,10 @@ static const struct of_device_id
aspeed_i2c_bus_of_table[] = {
                .compatible = "aspeed,ast2500-i2c-bus",
                .data = aspeed_i2c_25xx_get_clk_reg_val,
        },
+       {
+               .compatible = "aspeed,ast2600-i2c-bus",
+               .data = aspeed_i2c_25xx_get_clk_reg_val,
+       },
        { },
 };


I see this behavior:

[   20.981417] aspeed-i2c-bus 1e78a280.i2c-bus: master failed to RX
[   20.988259] aspeed-i2c-bus 1e78a280.i2c-bus: irq handled != irq.
expected 0x00000014, but was 0x00000010
[   22.451265] aspeed-i2c-bus 1e78a200.i2c-bus: master failed to STOP.
irq_status:0x0
[   22.459909] aspeed-i2c-bus 1e78a200.i2c-bus: irq handled != irq.
expected 0x00000010, but was 0x00000000
[   22.470604] aspeed-i2c-bus 1e78a200.i2c-bus: irq handled != irq.
expected 0x00000011, but was 0x00000000
[   29.156951] aspeed-i2c-bus 1e78a280.i2c-bus: master failed to STOP.
irq_status:0x0
[   29.165601] aspeed-i2c-bus 1e78a280.i2c-bus: irq handled != irq.
expected 0x00000010, but was 0x00000000

It happens on boot, and can also be triggered by placing load on the
system. In particular, if I copy a large amount of data to the flash.

The IRQs are being served on one of the CPUs:

 29:          0          0     GIC-0 142 Level     1e78a080.i2c-bus
 30:          0          0     GIC-0 143 Level     1e78a100.i2c-bus
 31:          0          0     GIC-0 144 Level     1e78a180.i2c-bus
 32:     302596          0     GIC-0 145 Level     1e78a200.i2c-bus
 33:     197340          0     GIC-0 146 Level     1e78a280.i2c-bus
 34:     196900          0     GIC-0 147 Level     1e78a300.i2c-bus
 35:          0          0     GIC-0 149 Level     1e78a400.i2c-bus
 36:       2199          0     GIC-0 151 Level     1e78a500.i2c-bus
 37:          0          0     GIC-0 152 Level     1e78a580.i2c-bus
 38:       3407          0     GIC-0 153 Level     1e78a600.i2c-bus
 39:          0          0     GIC-0 154 Level     1e78a680.i2c-bus
 40:          0          0     GIC-0 155 Level     1e78a700.i2c-bus

Following a hunch, I booted the system with SMP disabled (it's a dual
core part). The issue did not reproduce.

This suggests the driver is lacking locking. I am yet to do any
detailed debugging.

Have you seen any behavior like this?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* AST2600 i2c irq issue
  2019-10-17  6:14 AST2600 i2c irq issue Joel Stanley
@ 2019-10-17 20:31 ` Eddie James
  2019-10-17 22:29   ` Jae Hyun Yoo
  0 siblings, 1 reply; 10+ messages in thread
From: Eddie James @ 2019-10-17 20:31 UTC (permalink / raw)
  To: linux-aspeed


On 10/17/19 1:14 AM, Joel Stanley wrote:
> I have been doing bringup of the ast2600. It contains i2c buses that
> are register compatible with the ast2500, and I am running them
> without buffer or DMA mode. This is with v5.3.6, with no patches other
> than adding the compatible string:
>
> --- a/drivers/i2c/busses/i2c-aspeed.c
> +++ b/drivers/i2c/busses/i2c-aspeed.c
> @@ -938,6 +938,10 @@ static const struct of_device_id
> aspeed_i2c_bus_of_table[] = {
>                  .compatible = "aspeed,ast2500-i2c-bus",
>                  .data = aspeed_i2c_25xx_get_clk_reg_val,
>          },
> +       {
> +               .compatible = "aspeed,ast2600-i2c-bus",
> +               .data = aspeed_i2c_25xx_get_clk_reg_val,
> +       },
>          { },
>   };
>
>
> I see this behavior:
>
> [   20.981417] aspeed-i2c-bus 1e78a280.i2c-bus: master failed to RX
> [   20.988259] aspeed-i2c-bus 1e78a280.i2c-bus: irq handled != irq.
> expected 0x00000014, but was 0x00000010
> [   22.451265] aspeed-i2c-bus 1e78a200.i2c-bus: master failed to STOP.
> irq_status:0x0
> [   22.459909] aspeed-i2c-bus 1e78a200.i2c-bus: irq handled != irq.
> expected 0x00000010, but was 0x00000000
> [   22.470604] aspeed-i2c-bus 1e78a200.i2c-bus: irq handled != irq.
> expected 0x00000011, but was 0x00000000
> [   29.156951] aspeed-i2c-bus 1e78a280.i2c-bus: master failed to STOP.
> irq_status:0x0
> [   29.165601] aspeed-i2c-bus 1e78a280.i2c-bus: irq handled != irq.
> expected 0x00000010, but was 0x00000000
>
> It happens on boot, and can also be triggered by placing load on the
> system. In particular, if I copy a large amount of data to the flash.
>
> The IRQs are being served on one of the CPUs:
>
>   29:          0          0     GIC-0 142 Level     1e78a080.i2c-bus
>   30:          0          0     GIC-0 143 Level     1e78a100.i2c-bus
>   31:          0          0     GIC-0 144 Level     1e78a180.i2c-bus
>   32:     302596          0     GIC-0 145 Level     1e78a200.i2c-bus
>   33:     197340          0     GIC-0 146 Level     1e78a280.i2c-bus
>   34:     196900          0     GIC-0 147 Level     1e78a300.i2c-bus
>   35:          0          0     GIC-0 149 Level     1e78a400.i2c-bus
>   36:       2199          0     GIC-0 151 Level     1e78a500.i2c-bus
>   37:          0          0     GIC-0 152 Level     1e78a580.i2c-bus
>   38:       3407          0     GIC-0 153 Level     1e78a600.i2c-bus
>   39:          0          0     GIC-0 154 Level     1e78a680.i2c-bus
>   40:          0          0     GIC-0 155 Level     1e78a700.i2c-bus
>
> Following a hunch, I booted the system with SMP disabled (it's a dual
> core part). The issue did not reproduce.
>
> This suggests the driver is lacking locking. I am yet to do any
> detailed debugging.


Been doing some testing.

I'm not sure it's locking, but I think it could have something to do 
with the fact that the driver only acknowledges (clears the irq status 
register) the RX done bit way later than the rest of the bits. Is there 
a reason for this?

It seems to me that we get a second interrupt pending (on the second 
processor? don't see how as we never see any irqs handled there) for RX 
done sometimes, immediately after we've cleared it. I've ONLY seen it on 
RX done. Here's some data to show this:

This is just some custom tracing to track the driver state and the irqs. 
There was too much traffic for dev_dbg to handle. before is the driver 
state before the irq, and after is the driver state after the irq.

0: error entries[10]
 ??? 0: irq[00000000] before[inactive] after[start]
 ??? 1: irq[00000001] before[start] after[tx_first]
 ??? 2: irq[00000001] before[tx_first] after[tx]
 ??? 3: irq[00000001] before[tx] after[start]
 ??? 4: irq[00000005] before[start] after[rx_first]
 ??? 5: irq[00000005] before[rx_first] after[rx]
 ??? 6: irq[00000004] before[rx] after[rx]
 ??? 7: irq[00000004] before[rx] after[stop]??? << all good, transfer is 
complete so we send stop.
 ??? 8: irq[00000000] before[stop] after[inactive]??? << this is the one 
that triggers "failed to STOP" below
 ??? 9: irq[00000010] before[inactive] after[inactive]??? << now we get 
the actual stop, but we're in the wrong state and ignore it
1: error entries[9]
 ??? 0: irq[00000000] before[inactive] after[start]
 ??? 1: irq[00000001] before[start] after[tx_first]
 ??? 2: irq[00000001] before[tx_first] after[tx]
 ??? 3: irq[00000001] before[tx] after[start]
 ??? 4: irq[00000005] before[start] after[rx_first]
 ??? 5: irq[00000005] before[rx_first] after[rx]
 ??? 6: irq[00000004] before[rx] after[rx]???? << all good, transfer is 
continuing
 ??? 7: irq[00000000] before[rx] after[stop]? << no RX during an RX 
operation causes driver to abort and stop
 ??? 8: irq[00000014] before[stop] after[inactive] << now we get the RX 
and the stop.

This corresponds to:

[?? 18.405472] aspeed-i2c-bus 1e78a300.i2c-bus: master failed to STOP. 
irq_status:0x0
[?? 18.414164] aspeed-i2c-bus 1e78a300.i2c-bus: irq handled != irq. 
expected 0x00000010, but was 0x00000000
[?? 21.355422] aspeed-i2c-bus 1e78a300.i2c-bus: master failed to RX
[?? 21.363323] aspeed-i2c-bus 1e78a300.i2c-bus: irq handled != irq. 
expected 0x00000014, but was 0x00000010


I don't understand how disabling SMP fixes this, since the second core 
doesn't seem to ever handle any interrupts. Maybe it's just reporting it 
wrong? From what I understand of interrupt handling, the second core is 
allowed to get interrupts while the first is handling an interrupt and 
has therefore disabled it's own interrupts... correct me if I 
misunderstand. In this case then, RX done gets triggered on the second 
core since it's sitting around uncleared in the status register?


I was also able to "fix" this by simple returning IRQ_NONE if irq_status 
== 0 in the interrupt handler. But probably not a good solution.

Eddie


>
> Have you seen any behavior like this?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* AST2600 i2c irq issue
  2019-10-17 20:31 ` Eddie James
@ 2019-10-17 22:29   ` Jae Hyun Yoo
  2019-10-18  4:28     ` Peter Rosin
  2019-10-19  1:36     ` Ryan Chen
  0 siblings, 2 replies; 10+ messages in thread
From: Jae Hyun Yoo @ 2019-10-17 22:29 UTC (permalink / raw)
  To: linux-aspeed

On 10/17/2019 1:31 PM, Eddie James wrote:
> 
> On 10/17/19 1:14 AM, Joel Stanley wrote:
>> I have been doing bringup of the ast2600. It contains i2c buses that
>> are register compatible with the ast2500, and I am running them
>> without buffer or DMA mode. This is with v5.3.6, with no patches other
>> than adding the compatible string:
>>
>> --- a/drivers/i2c/busses/i2c-aspeed.c
>> +++ b/drivers/i2c/busses/i2c-aspeed.c
>> @@ -938,6 +938,10 @@ static const struct of_device_id
>> aspeed_i2c_bus_of_table[] = {
>> ???????????????? .compatible = "aspeed,ast2500-i2c-bus",
>> ???????????????? .data = aspeed_i2c_25xx_get_clk_reg_val,
>> ???????? },
>> +?????? {
>> +?????????????? .compatible = "aspeed,ast2600-i2c-bus",
>> +?????????????? .data = aspeed_i2c_25xx_get_clk_reg_val,
>> +?????? },
>> ???????? { },
>> ? };
>>
>>
>> I see this behavior:
>>
>> [?? 20.981417] aspeed-i2c-bus 1e78a280.i2c-bus: master failed to RX
>> [?? 20.988259] aspeed-i2c-bus 1e78a280.i2c-bus: irq handled != irq.
>> expected 0x00000014, but was 0x00000010
>> [?? 22.451265] aspeed-i2c-bus 1e78a200.i2c-bus: master failed to STOP.
>> irq_status:0x0
>> [?? 22.459909] aspeed-i2c-bus 1e78a200.i2c-bus: irq handled != irq.
>> expected 0x00000010, but was 0x00000000
>> [?? 22.470604] aspeed-i2c-bus 1e78a200.i2c-bus: irq handled != irq.
>> expected 0x00000011, but was 0x00000000
>> [?? 29.156951] aspeed-i2c-bus 1e78a280.i2c-bus: master failed to STOP.
>> irq_status:0x0
>> [?? 29.165601] aspeed-i2c-bus 1e78a280.i2c-bus: irq handled != irq.
>> expected 0x00000010, but was 0x00000000
>>
>> It happens on boot, and can also be triggered by placing load on the
>> system. In particular, if I copy a large amount of data to the flash.
>>
>> The IRQs are being served on one of the CPUs:
>>
>> ? 29:????????? 0????????? 0???? GIC-0 142 Level???? 1e78a080.i2c-bus
>> ? 30:????????? 0????????? 0???? GIC-0 143 Level???? 1e78a100.i2c-bus
>> ? 31:????????? 0????????? 0???? GIC-0 144 Level???? 1e78a180.i2c-bus
>> ? 32:???? 302596????????? 0???? GIC-0 145 Level???? 1e78a200.i2c-bus
>> ? 33:???? 197340????????? 0???? GIC-0 146 Level???? 1e78a280.i2c-bus
>> ? 34:???? 196900????????? 0???? GIC-0 147 Level???? 1e78a300.i2c-bus
>> ? 35:????????? 0????????? 0???? GIC-0 149 Level???? 1e78a400.i2c-bus
>> ? 36:?????? 2199????????? 0???? GIC-0 151 Level???? 1e78a500.i2c-bus
>> ? 37:????????? 0????????? 0???? GIC-0 152 Level???? 1e78a580.i2c-bus
>> ? 38:?????? 3407????????? 0???? GIC-0 153 Level???? 1e78a600.i2c-bus
>> ? 39:????????? 0????????? 0???? GIC-0 154 Level???? 1e78a680.i2c-bus
>> ? 40:????????? 0????????? 0???? GIC-0 155 Level???? 1e78a700.i2c-bus
>>
>> Following a hunch, I booted the system with SMP disabled (it's a dual
>> core part). The issue did not reproduce.
>>
>> This suggests the driver is lacking locking. I am yet to do any
>> detailed debugging.
> 
> 
> Been doing some testing.
> 
> I'm not sure it's locking, but I think it could have something to do 
> with the fact that the driver only acknowledges (clears the irq status 
> register) the RX done bit way later than the rest of the bits. Is there 
> a reason for this?
> 
> It seems to me that we get a second interrupt pending (on the second 
> processor? don't see how as we never see any irqs handled there) for RX 
> done sometimes, immediately after we've cleared it. I've ONLY seen it on 
> RX done. Here's some data to show this:
> 
> This is just some custom tracing to track the driver state and the irqs. 
> There was too much traffic for dev_dbg to handle. before is the driver 
> state before the irq, and after is the driver state after the irq.
> 
> 0: error entries[10]
>  ??? 0: irq[00000000] before[inactive] after[start]
>  ??? 1: irq[00000001] before[start] after[tx_first]
>  ??? 2: irq[00000001] before[tx_first] after[tx]
>  ??? 3: irq[00000001] before[tx] after[start]
>  ??? 4: irq[00000005] before[start] after[rx_first]
>  ??? 5: irq[00000005] before[rx_first] after[rx]
>  ??? 6: irq[00000004] before[rx] after[rx]
>  ??? 7: irq[00000004] before[rx] after[stop]??? << all good, transfer is 
> complete so we send stop.
>  ??? 8: irq[00000000] before[stop] after[inactive]??? << this is the one 
> that triggers "failed to STOP" below

It's really odd. Why does H/W trigger an interrupt without setting any
flag? Or is it an irq affinity setting issue?

>  ??? 9: irq[00000010] before[inactive] after[inactive]??? << now we get 
> the actual stop, but we're in the wrong state and ignore it
> 1: error entries[9]
>  ??? 0: irq[00000000] before[inactive] after[start]
>  ??? 1: irq[00000001] before[start] after[tx_first]
>  ??? 2: irq[00000001] before[tx_first] after[tx]
>  ??? 3: irq[00000001] before[tx] after[start]
>  ??? 4: irq[00000005] before[start] after[rx_first]
>  ??? 5: irq[00000005] before[rx_first] after[rx]
>  ??? 6: irq[00000004] before[rx] after[rx]???? << all good, transfer is 
> continuing
>  ??? 7: irq[00000000] before[rx] after[stop]? << no RX during an RX 
> operation causes driver to abort and stop

Here again. If it happens, state machine in driver will be broken and it
would affect the next event handling.

I didn't see this issue in AST2500 which has a single core and runs in
UP kernel. Seems that it can be observed only in AST2600 SMP setting.

Ryan,
Can you please check it whether it's an expected H/W behavior of AST2600
or not?

>  ??? 8: irq[00000014] before[stop] after[inactive] << now we get the RX 
> and the stop.
> 
> This corresponds to:
> 
> [?? 18.405472] aspeed-i2c-bus 1e78a300.i2c-bus: master failed to STOP. 
> irq_status:0x0
> [?? 18.414164] aspeed-i2c-bus 1e78a300.i2c-bus: irq handled != irq. 
> expected 0x00000010, but was 0x00000000
> [?? 21.355422] aspeed-i2c-bus 1e78a300.i2c-bus: master failed to RX
> [?? 21.363323] aspeed-i2c-bus 1e78a300.i2c-bus: irq handled != irq. 
> expected 0x00000014, but was 0x00000010
> 
> 
> I don't understand how disabling SMP fixes this, since the second core 
> doesn't seem to ever handle any interrupts. Maybe it's just reporting it 
> wrong? From what I understand of interrupt handling, the second core is 
> allowed to get interrupts while the first is handling an interrupt and 
> has therefore disabled it's own interrupts... correct me if I 
> misunderstand. In this case then, RX done gets triggered on the second 
> core since it's sitting around uncleared in the status register?
> 
> 
> I was also able to "fix" this by simple returning IRQ_NONE if irq_status 
> == 0 in the interrupt handler. But probably not a good solution.

A fix like below, right?

@@ -603,6 +603,9 @@ static irqreturn_t aspeed_i2c_bus_irq(int irq, void 
*dev_id)

         spin_lock(&bus->lock);
         irq_received = readl(bus->base + ASPEED_I2C_INTR_STS_REG);
+       if (!irq_received)
+               return IRQ_NONE;
+
         /* Ack all interrupts except for Rx done */
         writel(irq_received & ~ASPEED_I2CD_INTR_RX_DONE,
                bus->base + ASPEED_I2C_INTR_STS_REG);


I think it's a right fix for the issue. At least, we need to prevent
any driver state corruption. The state machine would run correctly if we
filtering the garbage interrupt out.

Cheers,

Jae

> Eddie
> 
> 
>>
>> Have you seen any behavior like this?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* AST2600 i2c irq issue
  2019-10-17 22:29   ` Jae Hyun Yoo
@ 2019-10-18  4:28     ` Peter Rosin
  2019-10-18 16:07       ` Jae Hyun Yoo
  2019-10-19  1:36     ` Ryan Chen
  1 sibling, 1 reply; 10+ messages in thread
From: Peter Rosin @ 2019-10-18  4:28 UTC (permalink / raw)
  To: linux-aspeed

On 2019-10-18 00:29, Jae Hyun Yoo wrote:
> On 10/17/2019 1:31 PM, Eddie James wrote:
>> I was also able to "fix" this by simple returning IRQ_NONE if irq_status 
>> == 0 in the interrupt handler. But probably not a good solution.
> 
> A fix like below, right?
> 
> @@ -603,6 +603,9 @@ static irqreturn_t aspeed_i2c_bus_irq(int irq, void 
> *dev_id)
> 
>          spin_lock(&bus->lock);
>          irq_received = readl(bus->base + ASPEED_I2C_INTR_STS_REG);
> +       if (!irq_received)
> +               return IRQ_NONE;
> +

Don't forget spin_unlock(&bus->lock);

Cheers,
Peter

^ permalink raw reply	[flat|nested] 10+ messages in thread

* AST2600 i2c irq issue
  2019-10-18  4:28     ` Peter Rosin
@ 2019-10-18 16:07       ` Jae Hyun Yoo
  0 siblings, 0 replies; 10+ messages in thread
From: Jae Hyun Yoo @ 2019-10-18 16:07 UTC (permalink / raw)
  To: linux-aspeed

On 10/17/2019 9:28 PM, Peter Rosin wrote:
> On 2019-10-18 00:29, Jae Hyun Yoo wrote:
>> On 10/17/2019 1:31 PM, Eddie James wrote:
>>> I was also able to "fix" this by simple returning IRQ_NONE if irq_status
>>> == 0 in the interrupt handler. But probably not a good solution.
>>
>> A fix like below, right?
>>
>> @@ -603,6 +603,9 @@ static irqreturn_t aspeed_i2c_bus_irq(int irq, void
>> *dev_id)
>>
>>           spin_lock(&bus->lock);
>>           irq_received = readl(bus->base + ASPEED_I2C_INTR_STS_REG);
>> +       if (!irq_received)
>> +               return IRQ_NONE;
>> +
> 
> Don't forget spin_unlock(&bus->lock);

Ah, I missed that. Thanks for your pointing out.

Cheers,

Jae

^ permalink raw reply	[flat|nested] 10+ messages in thread

* AST2600 i2c irq issue
  2019-10-17 22:29   ` Jae Hyun Yoo
  2019-10-18  4:28     ` Peter Rosin
@ 2019-10-19  1:36     ` Ryan Chen
  2019-10-21 12:16       ` Wolfram Sang
  2019-10-22 20:24       ` Eddie James
  1 sibling, 2 replies; 10+ messages in thread
From: Ryan Chen @ 2019-10-19  1:36 UTC (permalink / raw)
  To: linux-aspeed

Hello Joel,
	Could you enable kernel config CONFIG_I2C_DEBUG_BUS? And send me the log. 

	And also add following in aspeed_i2c_bus_irq function

	........................
	irq_received = readl(bus->base + ASPEED_I2C_INTR_STS_REG);
	+ irq_received &= 0xf000ffff;	
	------------------------

	
	

-----Original Message-----
From: Jae Hyun Yoo [mailto:jae.hyun.yoo at linux.intel.com] 
Sent: Friday, October 18, 2019 6:29 AM
To: Eddie James <eajames@linux.vnet.ibm.com>; Joel Stanley <joel@jms.id.au>; Brendan Higgins <brendanhiggins@google.com>
Cc: linux-aspeed <linux-aspeed@lists.ozlabs.org>; linux-i2c at vger.kernel.org; Andrew Jeffery <andrew@aj.id.au>; Eddie James <eajames@linux.ibm.com>; Ryan Chen <ryan_chen@aspeedtech.com>
Subject: Re: AST2600 i2c irq issue

On 10/17/2019 1:31 PM, Eddie James wrote:
> 
> On 10/17/19 1:14 AM, Joel Stanley wrote:
>> I have been doing bringup of the ast2600. It contains i2c buses that 
>> are register compatible with the ast2500, and I am running them 
>> without buffer or DMA mode. This is with v5.3.6, with no patches 
>> other than adding the compatible string:
>>
>> --- a/drivers/i2c/busses/i2c-aspeed.c
>> +++ b/drivers/i2c/busses/i2c-aspeed.c
>> @@ -938,6 +938,10 @@ static const struct of_device_id 
>> aspeed_i2c_bus_of_table[] = {
>> ???????????????? .compatible = "aspeed,ast2500-i2c-bus",
>> ???????????????? .data = aspeed_i2c_25xx_get_clk_reg_val,
>> ???????? },
>> +?????? {
>> +?????????????? .compatible = "aspeed,ast2600-i2c-bus",
>> +?????????????? .data = aspeed_i2c_25xx_get_clk_reg_val,
>> +?????? },
>> ???????? { },
>> ? };
>>
>>
>> I see this behavior:
>>
>> [?? 20.981417] aspeed-i2c-bus 1e78a280.i2c-bus: master failed to RX [?? 
>> 20.988259] aspeed-i2c-bus 1e78a280.i2c-bus: irq handled != irq.
>> expected 0x00000014, but was 0x00000010 [?? 22.451265] aspeed-i2c-bus 
>> 1e78a200.i2c-bus: master failed to STOP.
>> irq_status:0x0
>> [?? 22.459909] aspeed-i2c-bus 1e78a200.i2c-bus: irq handled != irq.
>> expected 0x00000010, but was 0x00000000 [?? 22.470604] aspeed-i2c-bus 
>> 1e78a200.i2c-bus: irq handled != irq.
>> expected 0x00000011, but was 0x00000000 [?? 29.156951] aspeed-i2c-bus 
>> 1e78a280.i2c-bus: master failed to STOP.
>> irq_status:0x0
>> [?? 29.165601] aspeed-i2c-bus 1e78a280.i2c-bus: irq handled != irq.
>> expected 0x00000010, but was 0x00000000
>>
>> It happens on boot, and can also be triggered by placing load on the 
>> system. In particular, if I copy a large amount of data to the flash.
>>
>> The IRQs are being served on one of the CPUs:
>>
>> ? 29:????????? 0????????? 0???? GIC-0 142 Level???? 1e78a080.i2c-bus
>> ? 30:????????? 0????????? 0???? GIC-0 143 Level???? 1e78a100.i2c-bus
>> ? 31:????????? 0????????? 0???? GIC-0 144 Level???? 1e78a180.i2c-bus
>> ? 32:???? 302596????????? 0???? GIC-0 145 Level???? 1e78a200.i2c-bus
>> ? 33:???? 197340????????? 0???? GIC-0 146 Level???? 1e78a280.i2c-bus
>> ? 34:???? 196900????????? 0???? GIC-0 147 Level???? 1e78a300.i2c-bus
>> ? 35:????????? 0????????? 0???? GIC-0 149 Level???? 1e78a400.i2c-bus
>> ? 36:?????? 2199????????? 0???? GIC-0 151 Level???? 1e78a500.i2c-bus
>> ? 37:????????? 0????????? 0???? GIC-0 152 Level???? 1e78a580.i2c-bus
>> ? 38:?????? 3407????????? 0???? GIC-0 153 Level???? 1e78a600.i2c-bus
>> ? 39:????????? 0????????? 0???? GIC-0 154 Level???? 1e78a680.i2c-bus
>> ? 40:????????? 0????????? 0???? GIC-0 155 Level???? 1e78a700.i2c-bus
>>
>> Following a hunch, I booted the system with SMP disabled (it's a dual 
>> core part). The issue did not reproduce.
>>
>> This suggests the driver is lacking locking. I am yet to do any 
>> detailed debugging.
> 
> 
> Been doing some testing.
> 
> I'm not sure it's locking, but I think it could have something to do 
> with the fact that the driver only acknowledges (clears the irq status
> register) the RX done bit way later than the rest of the bits. Is 
> there a reason for this?
> 
> It seems to me that we get a second interrupt pending (on the second 
> processor? don't see how as we never see any irqs handled there) for 
> RX done sometimes, immediately after we've cleared it. I've ONLY seen 
> it on RX done. Here's some data to show this:
> 
> This is just some custom tracing to track the driver state and the irqs. 
> There was too much traffic for dev_dbg to handle. before is the driver 
> state before the irq, and after is the driver state after the irq.
> 
> 0: error entries[10]
>  ??? 0: irq[00000000] before[inactive] after[start]
>  ??? 1: irq[00000001] before[start] after[tx_first]
>  ??? 2: irq[00000001] before[tx_first] after[tx]
>  ??? 3: irq[00000001] before[tx] after[start]
>  ??? 4: irq[00000005] before[start] after[rx_first]
>  ??? 5: irq[00000005] before[rx_first] after[rx]
>  ??? 6: irq[00000004] before[rx] after[rx]
>  ??? 7: irq[00000004] before[rx] after[stop]??? << all good, transfer 
> is complete so we send stop.
>  ??? 8: irq[00000000] before[stop] after[inactive]??? << this is the 
> one that triggers "failed to STOP" below

It's really odd. Why does H/W trigger an interrupt without setting any flag? Or is it an irq affinity setting issue?

>  ??? 9: irq[00000010] before[inactive] after[inactive]??? << now we 
> get the actual stop, but we're in the wrong state and ignore it
> 1: error entries[9]
>  ??? 0: irq[00000000] before[inactive] after[start]
>  ??? 1: irq[00000001] before[start] after[tx_first]
>  ??? 2: irq[00000001] before[tx_first] after[tx]
>  ??? 3: irq[00000001] before[tx] after[start]
>  ??? 4: irq[00000005] before[start] after[rx_first]
>  ??? 5: irq[00000005] before[rx_first] after[rx]
>  ??? 6: irq[00000004] before[rx] after[rx]???? << all good, transfer 
> is continuing
>  ??? 7: irq[00000000] before[rx] after[stop]? << no RX during an RX 
> operation causes driver to abort and stop

Here again. If it happens, state machine in driver will be broken and it would affect the next event handling.

I didn't see this issue in AST2500 which has a single core and runs in UP kernel. Seems that it can be observed only in AST2600 SMP setting.

Ryan,
Can you please check it whether it's an expected H/W behavior of AST2600 or not?

>  ??? 8: irq[00000014] before[stop] after[inactive] << now we get the 
> RX and the stop.
> 
> This corresponds to:
> 
> [?? 18.405472] aspeed-i2c-bus 1e78a300.i2c-bus: master failed to STOP. 
> irq_status:0x0
> [?? 18.414164] aspeed-i2c-bus 1e78a300.i2c-bus: irq handled != irq. 
> expected 0x00000010, but was 0x00000000 [?? 21.355422] aspeed-i2c-bus 
> 1e78a300.i2c-bus: master failed to RX [?? 21.363323] aspeed-i2c-bus 
> 1e78a300.i2c-bus: irq handled != irq.
> expected 0x00000014, but was 0x00000010
> 
> 
> I don't understand how disabling SMP fixes this, since the second core 
> doesn't seem to ever handle any interrupts. Maybe it's just reporting 
> it wrong? From what I understand of interrupt handling, the second 
> core is allowed to get interrupts while the first is handling an 
> interrupt and has therefore disabled it's own interrupts... correct me 
> if I misunderstand. In this case then, RX done gets triggered on the 
> second core since it's sitting around uncleared in the status register?
> 
> 
> I was also able to "fix" this by simple returning IRQ_NONE if 
> irq_status == 0 in the interrupt handler. But probably not a good solution.

A fix like below, right?

@@ -603,6 +603,9 @@ static irqreturn_t aspeed_i2c_bus_irq(int irq, void
*dev_id)

         spin_lock(&bus->lock);
         irq_received = readl(bus->base + ASPEED_I2C_INTR_STS_REG);
+       if (!irq_received)
+               return IRQ_NONE;
+
         /* Ack all interrupts except for Rx done */
         writel(irq_received & ~ASPEED_I2CD_INTR_RX_DONE,
                bus->base + ASPEED_I2C_INTR_STS_REG);


I think it's a right fix for the issue. At least, we need to prevent any driver state corruption. The state machine would run correctly if we filtering the garbage interrupt out.

Cheers,

Jae

> Eddie
> 
> 
>>
>> Have you seen any behavior like this?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* AST2600 i2c irq issue
  2019-10-19  1:36     ` Ryan Chen
@ 2019-10-21 12:16       ` Wolfram Sang
  2019-10-21 18:27         ` Jae Hyun Yoo
  2019-10-22 20:24       ` Eddie James
  1 sibling, 1 reply; 10+ messages in thread
From: Wolfram Sang @ 2019-10-21 12:16 UTC (permalink / raw)
  To: linux-aspeed


>          spin_lock(&bus->lock);
>          irq_received = readl(bus->base + ASPEED_I2C_INTR_STS_REG);
> +       if (!irq_received)
> +               return IRQ_NONE;
> +
>          /* Ack all interrupts except for Rx done */
>          writel(irq_received & ~ASPEED_I2CD_INTR_RX_DONE,
>                 bus->base + ASPEED_I2C_INTR_STS_REG);
> 
> 
> I think it's a right fix for the issue. At least, we need to prevent
> any driver state corruption. The state machine would run correctly if
> we filtering the garbage interrupt out.

Just unlock the spinlock before you return...

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://lists.ozlabs.org/pipermail/linux-aspeed/attachments/20191021/eeb5c44a/attachment.sig>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* AST2600 i2c irq issue
  2019-10-21 12:16       ` Wolfram Sang
@ 2019-10-21 18:27         ` Jae Hyun Yoo
  0 siblings, 0 replies; 10+ messages in thread
From: Jae Hyun Yoo @ 2019-10-21 18:27 UTC (permalink / raw)
  To: linux-aspeed

On 10/21/2019 5:16 AM, Wolfram Sang wrote:
> 
>>           spin_lock(&bus->lock);
>>           irq_received = readl(bus->base + ASPEED_I2C_INTR_STS_REG);
>> +       if (!irq_received)
>> +               return IRQ_NONE;
>> +
>>           /* Ack all interrupts except for Rx done */
>>           writel(irq_received & ~ASPEED_I2CD_INTR_RX_DONE,
>>                  bus->base + ASPEED_I2C_INTR_STS_REG);
>>
>>
>> I think it's a right fix for the issue. At least, we need to prevent
>> any driver state corruption. The state machine would run correctly if
>> we filtering the garbage interrupt out.
> 
> Just unlock the spinlock before you return...

Thanks Wolfram!

Peter also pointed it out, and Eddie made a patch which includes the
spin_unlock.

Hi Eddie,
Can you please submit your patch into 'i2c-next' upstream?

Thanks,

Jae

^ permalink raw reply	[flat|nested] 10+ messages in thread

* AST2600 i2c irq issue
  2019-10-19  1:36     ` Ryan Chen
  2019-10-21 12:16       ` Wolfram Sang
@ 2019-10-22 20:24       ` Eddie James
  2019-10-24 18:03         ` Jae Hyun Yoo
  1 sibling, 1 reply; 10+ messages in thread
From: Eddie James @ 2019-10-22 20:24 UTC (permalink / raw)
  To: linux-aspeed


On 10/18/19 8:36 PM, Ryan Chen wrote:
> Hello Joel,
> 	Could you enable kernel config CONFIG_I2C_DEBUG_BUS? And send me the log.
>
> 	And also add following in aspeed_i2c_bus_irq function
>
> 	........................
> 	irq_received = readl(bus->base + ASPEED_I2C_INTR_STS_REG);
> 	+ irq_received &= 0xf000ffff;	


Hi Ryan,

I dumped the I2C register state at each IRQ like you mentioned. This was 
the kernel log in this case:

[?? 39.266308] aspeed-i2c-bus 1e78a280.i2c-bus: master failed to RX
[?? 39.273303] aspeed-i2c-bus 1e78a280.i2c-bus: irq handled != irq. 
expected 0x00000014, but was 0x00000010

The gXX registers are the global i2c registers (not specific to the bus).

0: irq[00000000] before[inactive] after[start]
 ??? 00[00008001]
 ??? 04[007ee005]
 ??? 08[00000000]
 ??? 0c[0000607f]
 ??? 10[00000000]
 ??? 14[0a060000]
 ??? 18[00000080]
 ??? 1c[00000000]
 ??? 20[000004e1]
 ??? 24[fffffff7]
 ??? 28[00000000]
 ??? 2c[fffffff7]
 ??? 30[00000000]
 ??? g00[00000000]
 ??? g04[00000000]
 ??? g08[00000000]
 ??? g0c[00000000]
 ??? g10[00000000]
1: irq[00000001] before[start] after[tx_first]
 ??? 00[00008001]
 ??? 04[007ee005]
 ??? 08[00000000]
 ??? 0c[0000607f]
 ??? 10[00000000]
 ??? 14[0c430000]
 ??? 18[00000080]
 ??? 1c[00000000]
 ??? 20[0000e2e2]
 ??? 24[fffffff7]
 ??? 28[00000000]
 ??? 2c[fffffff7]
 ??? 30[00000000]
 ??? g00[00000000]
 ??? g04[00000000]
 ??? g08[00000000]
 ??? g0c[00000000]
 ??? g10[00000000]
2: irq[00000001] before[tx_first] after[tx]
 ??? 00[00008001]
 ??? 04[007ee005]
 ??? 08[00000000]
 ??? 0c[0000607f]
 ??? 10[00000000]
 ??? 14[0c430000]
 ??? 18[00000080]
 ??? 1c[00000000]
 ??? 20[0000e2e2]
 ??? 24[fffffff7]
 ??? 28[00000000]
 ??? 2c[fffffff7]
 ??? 30[00000000]
 ??? g00[00000000]
 ??? g04[00000000]
 ??? g08[00000000]
 ??? g0c[00000000]
 ??? g10[00000000]
3: irq[00000001] before[start] after[start]
 ??? 00[00008001]
 ??? 04[007ee005]
 ??? 08[00000000]
 ??? 0c[0000607f]
 ??? 10[00000000]
 ??? 14[0c430000]
 ??? 18[00000080]
 ??? 1c[00000000]
 ??? 20[0000e289]
 ??? 24[fffffff7]
 ??? 28[00000000]
 ??? 2c[fffffff7]
 ??? 30[00000000]
 ??? g00[00000000]
 ??? g04[00000000]
 ??? g08[00000000]
 ??? g0c[00000000]
 ??? g10[00000000]
4: irq[00000005] before[start] after[rx_first]
 ??? 00[00008001]
 ??? 04[007ee005]
 ??? 08[00000000]
 ??? 0c[0000607f]
 ??? 10[00000004]
 ??? 14[0c430000]
 ??? 18[00000080]
 ??? 1c[00000000]
 ??? 20[000000e3]
 ??? 24[fffffff7]
 ??? 28[00000000]
 ??? 2c[fffffff7]
 ??? 30[00000000]
 ??? g00[00000010]
 ??? g04[00000000]
 ??? g08[00000000]
 ??? g0c[00000000]
 ??? g10[00000000]
5: irq[00000005] before[rx_first] after[rx]
 ??? 00[00008001]
 ??? 04[007ee005]
 ??? 08[00000000]
 ??? 0c[0000607f]
 ??? 10[00000004]
 ??? 14[0c430000]
 ??? 18[00000080]
 ??? 1c[00000000]
 ??? 20[000000e3]
 ??? 24[fffffff7]
 ??? 28[00000000]
 ??? 2c[fffffff7]
 ??? 30[00000000]
 ??? g00[00000010]
 ??? g04[00000000]
 ??? g08[00000000]
 ??? g0c[00000000]
 ??? g10[00000000]
6: irq[00000004] before[rx] after[rx]
 ??? 00[00008001]
 ??? 04[007ee005]
 ??? 08[00000000]
 ??? 0c[0000607f]
 ??? 10[00000004]
 ??? 14[0c410000]
 ??? 18[00000080]
 ??? 1c[00000000]
 ??? 20[0000d8e3]
 ??? 24[fffffff7]
 ??? 28[00000000]
 ??? 2c[fffffff7]
 ??? 30[00000000]
 ??? g00[00000010]
 ??? g04[00000000]
 ??? g08[00000000]
 ??? g0c[00000000]
 ??? g10[00000000]
7: irq[00000000] before[rx] after[stop]
 ??? 00[00008001]
 ??? 04[007ee005]
 ??? 08[00000000]
 ??? 0c[0000607f]
 ??? 10[00000004]??? <<< this is interesting. this has changed since the 
start of the interrupt handler.
 ??? 14[0c430000]
 ??? 18[00000080]
 ??? 1c[00000000]
 ??? 20[00001de3]
 ??? 24[fffffff7]
 ??? 28[00000000]
 ??? 2c[fffffff7]
 ??? 30[00000000]
 ??? g00[00000010]
 ??? g04[00000000]
 ??? g08[00000000]
 ??? g0c[00000000]
 ??? g10[00000000]
8: irq[00000014] before[stop] after[inactive]
 ??? 00[00008001]
 ??? 04[007ee005]
 ??? 08[00000000]
 ??? 0c[0000607f]
 ??? 10[00000004]
 ??? 14[0a060000]
 ??? 18[00000080]
 ??? 1c[00000000]
 ??? 20[00001de3]
 ??? 24[fffffff7]
 ??? 28[00000000]
 ??? 2c[fffffff7]
 ??? 30[00000000]
 ??? g00[00000010]
 ??? g04[00000000]
 ??? g08[00000000]
 ??? g0c[00000000]
 ??? g10[00000000]


I grabbed the registers when the driver state transitions in the irq 
handler. So, after the irq status has been initially cleared (except for 
rx). In the case of 7, when the irq status changes, I grab it in 
aspeed_i2c_do_stop before writing the STOP. So immediately after the 
"master failed to RX" log. The entire irq handler is spin locked, so all 
of the state transitions and register grabbing is locked.

Any thoughts?

Thanks,

Eddie


> 	------------------------
>
> 	
> 	
>
> -----Original Message-----
> From: Jae Hyun Yoo [mailto:jae.hyun.yoo at linux.intel.com]
> Sent: Friday, October 18, 2019 6:29 AM
> To: Eddie James <eajames@linux.vnet.ibm.com>; Joel Stanley <joel@jms.id.au>; Brendan Higgins <brendanhiggins@google.com>
> Cc: linux-aspeed <linux-aspeed@lists.ozlabs.org>; linux-i2c at vger.kernel.org; Andrew Jeffery <andrew@aj.id.au>; Eddie James <eajames@linux.ibm.com>; Ryan Chen <ryan_chen@aspeedtech.com>
> Subject: Re: AST2600 i2c irq issue
>
> On 10/17/2019 1:31 PM, Eddie James wrote:
>> On 10/17/19 1:14 AM, Joel Stanley wrote:
>>> I have been doing bringup of the ast2600. It contains i2c buses that
>>> are register compatible with the ast2500, and I am running them
>>> without buffer or DMA mode. This is with v5.3.6, with no patches
>>> other than adding the compatible string:
>>>
>>> --- a/drivers/i2c/busses/i2c-aspeed.c
>>> +++ b/drivers/i2c/busses/i2c-aspeed.c
>>> @@ -938,6 +938,10 @@ static const struct of_device_id
>>> aspeed_i2c_bus_of_table[] = {
>>>  ???????????????? .compatible = "aspeed,ast2500-i2c-bus",
>>>  ???????????????? .data = aspeed_i2c_25xx_get_clk_reg_val,
>>>  ???????? },
>>> +?????? {
>>> +?????????????? .compatible = "aspeed,ast2600-i2c-bus",
>>> +?????????????? .data = aspeed_i2c_25xx_get_clk_reg_val,
>>> +?????? },
>>>  ???????? { },
>>>  ? };
>>>
>>>
>>> I see this behavior:
>>>
>>> [?? 20.981417] aspeed-i2c-bus 1e78a280.i2c-bus: master failed to RX [
>>> 20.988259] aspeed-i2c-bus 1e78a280.i2c-bus: irq handled != irq.
>>> expected 0x00000014, but was 0x00000010 [?? 22.451265] aspeed-i2c-bus
>>> 1e78a200.i2c-bus: master failed to STOP.
>>> irq_status:0x0
>>> [?? 22.459909] aspeed-i2c-bus 1e78a200.i2c-bus: irq handled != irq.
>>> expected 0x00000010, but was 0x00000000 [?? 22.470604] aspeed-i2c-bus
>>> 1e78a200.i2c-bus: irq handled != irq.
>>> expected 0x00000011, but was 0x00000000 [?? 29.156951] aspeed-i2c-bus
>>> 1e78a280.i2c-bus: master failed to STOP.
>>> irq_status:0x0
>>> [?? 29.165601] aspeed-i2c-bus 1e78a280.i2c-bus: irq handled != irq.
>>> expected 0x00000010, but was 0x00000000
>>>
>>> It happens on boot, and can also be triggered by placing load on the
>>> system. In particular, if I copy a large amount of data to the flash.
>>>
>>> The IRQs are being served on one of the CPUs:
>>>
>>>  ? 29:????????? 0????????? 0???? GIC-0 142 Level???? 1e78a080.i2c-bus
>>>  ? 30:????????? 0????????? 0???? GIC-0 143 Level???? 1e78a100.i2c-bus
>>>  ? 31:????????? 0????????? 0???? GIC-0 144 Level???? 1e78a180.i2c-bus
>>>  ? 32:???? 302596????????? 0???? GIC-0 145 Level???? 1e78a200.i2c-bus
>>>  ? 33:???? 197340????????? 0???? GIC-0 146 Level???? 1e78a280.i2c-bus
>>>  ? 34:???? 196900????????? 0???? GIC-0 147 Level???? 1e78a300.i2c-bus
>>>  ? 35:????????? 0????????? 0???? GIC-0 149 Level???? 1e78a400.i2c-bus
>>>  ? 36:?????? 2199????????? 0???? GIC-0 151 Level???? 1e78a500.i2c-bus
>>>  ? 37:????????? 0????????? 0???? GIC-0 152 Level???? 1e78a580.i2c-bus
>>>  ? 38:?????? 3407????????? 0???? GIC-0 153 Level???? 1e78a600.i2c-bus
>>>  ? 39:????????? 0????????? 0???? GIC-0 154 Level???? 1e78a680.i2c-bus
>>>  ? 40:????????? 0????????? 0???? GIC-0 155 Level???? 1e78a700.i2c-bus
>>>
>>> Following a hunch, I booted the system with SMP disabled (it's a dual
>>> core part). The issue did not reproduce.
>>>
>>> This suggests the driver is lacking locking. I am yet to do any
>>> detailed debugging.
>>
>> Been doing some testing.
>>
>> I'm not sure it's locking, but I think it could have something to do
>> with the fact that the driver only acknowledges (clears the irq status
>> register) the RX done bit way later than the rest of the bits. Is
>> there a reason for this?
>>
>> It seems to me that we get a second interrupt pending (on the second
>> processor? don't see how as we never see any irqs handled there) for
>> RX done sometimes, immediately after we've cleared it. I've ONLY seen
>> it on RX done. Here's some data to show this:
>>
>> This is just some custom tracing to track the driver state and the irqs.
>> There was too much traffic for dev_dbg to handle. before is the driver
>> state before the irq, and after is the driver state after the irq.
>>
>> 0: error entries[10]
>>   ??? 0: irq[00000000] before[inactive] after[start]
>>   ??? 1: irq[00000001] before[start] after[tx_first]
>>   ??? 2: irq[00000001] before[tx_first] after[tx]
>>   ??? 3: irq[00000001] before[tx] after[start]
>>   ??? 4: irq[00000005] before[start] after[rx_first]
>>   ??? 5: irq[00000005] before[rx_first] after[rx]
>>   ??? 6: irq[00000004] before[rx] after[rx]
>>   ??? 7: irq[00000004] before[rx] after[stop]??? << all good, transfer
>> is complete so we send stop.
>>   ??? 8: irq[00000000] before[stop] after[inactive]??? << this is the
>> one that triggers "failed to STOP" below
> It's really odd. Why does H/W trigger an interrupt without setting any flag? Or is it an irq affinity setting issue?
>
>>   ??? 9: irq[00000010] before[inactive] after[inactive]??? << now we
>> get the actual stop, but we're in the wrong state and ignore it
>> 1: error entries[9]
>>   ??? 0: irq[00000000] before[inactive] after[start]
>>   ??? 1: irq[00000001] before[start] after[tx_first]
>>   ??? 2: irq[00000001] before[tx_first] after[tx]
>>   ??? 3: irq[00000001] before[tx] after[start]
>>   ??? 4: irq[00000005] before[start] after[rx_first]
>>   ??? 5: irq[00000005] before[rx_first] after[rx]
>>   ??? 6: irq[00000004] before[rx] after[rx]???? << all good, transfer
>> is continuing
>>   ??? 7: irq[00000000] before[rx] after[stop]? << no RX during an RX
>> operation causes driver to abort and stop
> Here again. If it happens, state machine in driver will be broken and it would affect the next event handling.
>
> I didn't see this issue in AST2500 which has a single core and runs in UP kernel. Seems that it can be observed only in AST2600 SMP setting.
>
> Ryan,
> Can you please check it whether it's an expected H/W behavior of AST2600 or not?
>
>>   ??? 8: irq[00000014] before[stop] after[inactive] << now we get the
>> RX and the stop.
>>
>> This corresponds to:
>>
>> [?? 18.405472] aspeed-i2c-bus 1e78a300.i2c-bus: master failed to STOP.
>> irq_status:0x0
>> [?? 18.414164] aspeed-i2c-bus 1e78a300.i2c-bus: irq handled != irq.
>> expected 0x00000010, but was 0x00000000 [?? 21.355422] aspeed-i2c-bus
>> 1e78a300.i2c-bus: master failed to RX [?? 21.363323] aspeed-i2c-bus
>> 1e78a300.i2c-bus: irq handled != irq.
>> expected 0x00000014, but was 0x00000010
>>
>>
>> I don't understand how disabling SMP fixes this, since the second core
>> doesn't seem to ever handle any interrupts. Maybe it's just reporting
>> it wrong? From what I understand of interrupt handling, the second
>> core is allowed to get interrupts while the first is handling an
>> interrupt and has therefore disabled it's own interrupts... correct me
>> if I misunderstand. In this case then, RX done gets triggered on the
>> second core since it's sitting around uncleared in the status register?
>>
>>
>> I was also able to "fix" this by simple returning IRQ_NONE if
>> irq_status == 0 in the interrupt handler. But probably not a good solution.
> A fix like below, right?
>
> @@ -603,6 +603,9 @@ static irqreturn_t aspeed_i2c_bus_irq(int irq, void
> *dev_id)
>
>           spin_lock(&bus->lock);
>           irq_received = readl(bus->base + ASPEED_I2C_INTR_STS_REG);
> +       if (!irq_received)
> +               return IRQ_NONE;
> +
>           /* Ack all interrupts except for Rx done */
>           writel(irq_received & ~ASPEED_I2CD_INTR_RX_DONE,
>                  bus->base + ASPEED_I2C_INTR_STS_REG);
>
>
> I think it's a right fix for the issue. At least, we need to prevent any driver state corruption. The state machine would run correctly if we filtering the garbage interrupt out.
>
> Cheers,
>
> Jae
>
>> Eddie
>>
>>
>>> Have you seen any behavior like this?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* AST2600 i2c irq issue
  2019-10-22 20:24       ` Eddie James
@ 2019-10-24 18:03         ` Jae Hyun Yoo
  0 siblings, 0 replies; 10+ messages in thread
From: Jae Hyun Yoo @ 2019-10-24 18:03 UTC (permalink / raw)
  To: linux-aspeed

Hi Eddie,

On 10/22/2019 1:24 PM, Eddie James wrote:

[...]

> 7: irq[00000000] before[rx] after[stop]

'irq[]' here and '10[]' below are the same register. This log means an
interrupt came with empty status.

>  ??? 00[00008001]
>  ??? 04[007ee005]
>  ??? 08[00000000]
>  ??? 0c[0000607f]
>  ??? 10[00000004]??? <<< this is interesting. this has changed since the 
> start of the interrupt handler.

The interrupt status was updated lately at here. Looks like there is a
timing gap in H/W between interrupt triggering and status updating. I
haven't seen this issue in previous H/W versions.

This issue can be resolved by your patch but ultimately this issue
should be fixed in AST2600 A1 revision H/W if possible so that we can
remove the unnecessary interrupt handling.

Thanks,

Jae

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2019-10-24 18:03 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-10-17  6:14 AST2600 i2c irq issue Joel Stanley
2019-10-17 20:31 ` Eddie James
2019-10-17 22:29   ` Jae Hyun Yoo
2019-10-18  4:28     ` Peter Rosin
2019-10-18 16:07       ` Jae Hyun Yoo
2019-10-19  1:36     ` Ryan Chen
2019-10-21 12:16       ` Wolfram Sang
2019-10-21 18:27         ` Jae Hyun Yoo
2019-10-22 20:24       ` Eddie James
2019-10-24 18:03         ` Jae Hyun Yoo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox