* Re: FlexCAN on i.MX28 interrupt flooding retrying send
2014-03-07 8:08 FlexCAN on i.MX28 interrupt flooding retrying send Stanislav Meduna
@ 2014-03-07 8:16 ` Marc Kleine-Budde
2014-03-07 8:40 ` Wolfgang Grandegger
2014-03-07 8:32 ` Matthias Klein
2014-03-07 8:46 ` Marc Kleine-Budde
2 siblings, 1 reply; 10+ messages in thread
From: Marc Kleine-Budde @ 2014-03-07 8:16 UTC (permalink / raw)
To: Stanislav Meduna, wg, linux-can, linux-kernel@vger.kernel.org,
linux-rt-users@vger.kernel.org
Adding the linux-can mailing list to Cc.
Marc
On 03/07/2014 09:08 AM, Stanislav Meduna wrote:
> Hi,
>
> I am using a FlexCAN CAN controller on a Freescale i.MX28 platform [1].
> If a packet is being sent when the bus is disconnected, I am getting
> an interrupt flood that basically kills the machine.
>
> This is _not_ the same problem as [2] - my kernel already has
> the fix.
>
> The first interrupt comes with ESR 0x00028652, i.e.
>
> TXWRN_INT
> BIT1_ERR
> STF_ERR
> TX_WRN
> TXRX
> FLT_CONF error passive
> ERR_INT
>
> The next ones come the same without the acked TXWRN_INT.
> Reading the ESR again immediately after acking gives
> 0x00000250, i.e.
>
> TX_WRN
> TXRX
> FLT_CONF error passive
>
> so everything ackable has actually been acked.
>
> I think that the problem is that the FlexCAN tries to retransmit
> the frame indefinitely. Each retry senses the bus in the invalid
> state (BIT1_ERR) and immediately fires a new ERR_INT. To verify
> this I aborted the transmitted frame in the error state in the
> interrupt handler
>
> #define FLEXCAN_ESR_ERR_TRANSMIT \
>         (FLEXCAN_ESR_BIT1_ERR | FLEXCAN_ESR_BIT0_ERR | FLEXCAN_ESR_ACK_ERR)
>
> if (reg_esr & FLEXCAN_ESR_ERR_TRANSMIT) {
>
>         /* In case of a transmission error the packet is retried and
>          * if the error persists, we will get another interrupt right
>          * away. Abort the transmission - a lost packet is better than
>          * an irq storm.
>          */
>         if (printk_ratelimit())
>                 netdev_err(dev, "Aborted transmission, ESR %08x\n", reg_esr);
>
>         can_get_echo_skb(dev, 0);
>         flexcan_write(FLEXCAN_MB_CNT_CODE(0x4),
>                 &regs->cantxfg[FLEXCAN_TX_BUF_ID].can_ctrl);
>         netif_wake_queue(dev);
> }
>
> and the problem disappeared as expected. However, the correct
> way is probably to retry during some reasonable (configurable?)
> time interval.
>
> What puzzles me is that I did not find any other instance
> of this problem in the relevant mailing lists, only the original [2].
>
> I am using the 3.4.77 kernel with the realtime patches, but the
> code in the latest mainline looks the same in this respect.
> Maybe the realtime patches change some behaviour, but I don't
> think they affect the core problem. I am not really an expert
> in network devices, NAPI etc. - maybe in that case the error
> interrupt should be disabled and re-enabled only if the
> error condition goes away? - I don't know...
>
> Please Cc: me when answering to the list.
>
> [1] http://www.tq-group.com/en/products/product-details/prod/embedded-modul-tqma28/extb/Main/productdetail/
> [2] https://gitorious.org/linux-can/wg-linux-can-next/commit/8ad94fa
>
> Thanks
>
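For reference, the two ESR values quoted above (0x00028652 and 0x00000250) can be decoded mechanically. The following is a minimal standalone sketch, not driver code. It assumes the bit positions match the FLEXCAN_ESR_* definitions in flexcan.c (they do reproduce the flag lists given above), and the helper names esr_decode() and esr_has_tx_error() are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed ESR bit positions; names follow the listing in the report. */
#define ESR_BIT1_ERR        (1u << 15)
#define ESR_BIT0_ERR        (1u << 14)
#define ESR_ACK_ERR         (1u << 13)
#define ESR_FLT_CONF_SHIFT  4
#define ESR_FLT_CONF_MASK   (3u << ESR_FLT_CONF_SHIFT)

/* Same combination as FLEXCAN_ESR_ERR_TRANSMIT in the workaround above. */
#define ESR_ERR_TRANSMIT (ESR_BIT1_ERR | ESR_BIT0_ERR | ESR_ACK_ERR)

struct esr_flag {
    uint32_t mask;
    const char *name;
};

/* Single-bit flags, highest bit first; FLT_CONF is a 2-bit field and
 * is handled separately. */
static const struct esr_flag esr_flags[] = {
    { 1u << 17, "TXWRN_INT" }, { 1u << 16, "RXWRN_INT" },
    { ESR_BIT1_ERR, "BIT1_ERR" }, { ESR_BIT0_ERR, "BIT0_ERR" },
    { ESR_ACK_ERR, "ACK_ERR" }, { 1u << 12, "CRC_ERR" },
    { 1u << 11, "FRM_ERR" }, { 1u << 10, "STF_ERR" },
    { 1u << 9, "TX_WRN" }, { 1u << 8, "RX_WRN" },
    { 1u << 7, "IDLE" }, { 1u << 6, "TXRX" },
    { 1u << 2, "BOFF_INT" }, { 1u << 1, "ERR_INT" },
    { 1u << 0, "WAK_INT" },
};

/* Write the names of all set flags, then the fault confinement state,
 * into buf. Output is truncated if buf is too small. Returns buf. */
char *esr_decode(uint32_t esr, char *buf, size_t len)
{
    static const char *const flt_conf[] = {
        "active", "passive", "bus-off", "bus-off"
    };
    size_t used = 0;
    size_t i;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    for (i = 0; i < sizeof(esr_flags) / sizeof(esr_flags[0]); i++) {
        int n;

        if (!(esr & esr_flags[i].mask))
            continue;
        n = snprintf(buf + used, len - used, "%s ", esr_flags[i].name);
        if (n < 0 || (size_t)n >= len - used)
            return buf;             /* out of space, stop here */
        used += (size_t)n;
    }
    snprintf(buf + used, len - used, "FLT_CONF=%s",
             flt_conf[(esr & ESR_FLT_CONF_MASK) >> ESR_FLT_CONF_SHIFT]);
    return buf;
}

/* Non-zero if the workaround's transmit-error mask would trigger. */
int esr_has_tx_error(uint32_t esr)
{
    return (esr & ESR_ERR_TRANSMIT) != 0;
}
```

With these definitions, 0x00028652 trips the transmit-error mask through BIT1_ERR, while the value read back after acking, 0x00000250, does not; only the state bits (TX_WRN, TXRX, FLT_CONF) remain set.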
--
Pengutronix e.K. | Marc Kleine-Budde |
Industrial Linux Solutions | Phone: +49-231-2826-924 |
Vertretung West/Dortmund | Fax: +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de |
* Re: FlexCAN on i.MX28 interrupt flooding retrying send
2014-03-07 8:16 ` Marc Kleine-Budde
@ 2014-03-07 8:40 ` Wolfgang Grandegger
[not found] ` <OF203965D8.128520F3-ON86257C94.004FA84A-86257C94.0050AF36@notes.cat.com>
0 siblings, 1 reply; 10+ messages in thread
From: Wolfgang Grandegger @ 2014-03-07 8:40 UTC (permalink / raw)
To: Stanislav Meduna, linux-can, linux-kernel@vger.kernel.org,
linux-rt-users@vger.kernel.org
Cc: Marc Kleine-Budde
If bus-error reporting is enabled, you will get an interrupt for each
TX retry. That's normal behavior. But for the i.MX28 it should not be
enabled:
$ cat flexcan.c
...
        /*
         * enable the "error interrupt" (FLEXCAN_CTRL_ERR_MSK),
         * on most Flexcan cores, too. Otherwise we don't get
         * any error warning or passive interrupts.
         */
        if (priv->devtype_data->features & FLEXCAN_HAS_BROKEN_ERR_STATE ||
            priv->can.ctrlmode & CAN_CTRLMODE_BERR_REPORTING)
                reg_ctrl |= FLEXCAN_CTRL_ERR_MSK;
Maybe there is something wrong with your platform code or DTS file. What
kernel are you using, and how is the CAN node defined in your DTS file?
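The condition in the excerpt above can be modelled outside the kernel to see when the per-retry error interrupt ends up enabled. This is only a sketch: the three constants carry illustrative values (the real ones live in flexcan.c and the CAN netlink header), and ctrl_apply_err_mask() and err_mask_examples() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit values only; not copied from the kernel headers. */
#define FLEXCAN_CTRL_ERR_MSK         (1u << 14)
#define FLEXCAN_HAS_BROKEN_ERR_STATE (1u << 2)
#define CAN_CTRLMODE_BERR_REPORTING  0x10u

/* Mirror of the driver condition: set the error interrupt mask if the
 * core is flagged as having broken error state handling, or if the
 * user asked for bus-error reporting. */
uint32_t ctrl_apply_err_mask(uint32_t reg_ctrl, uint32_t features,
                             uint32_t ctrlmode)
{
    if ((features & FLEXCAN_HAS_BROKEN_ERR_STATE) ||
        (ctrlmode & CAN_CTRLMODE_BERR_REPORTING))
        reg_ctrl |= FLEXCAN_CTRL_ERR_MSK;
    return reg_ctrl;
}

/* The three cases relevant to this report. */
void err_mask_examples(void)
{
    /* Core without the quirk, bus-error reporting off: mask stays clear,
     * so TX retries do not raise an interrupt each time. */
    assert((ctrl_apply_err_mask(0, 0, 0) & FLEXCAN_CTRL_ERR_MSK) == 0);

    /* Core carrying the quirk flag: mask is always set. */
    assert(ctrl_apply_err_mask(0, FLEXCAN_HAS_BROKEN_ERR_STATE, 0) &
           FLEXCAN_CTRL_ERR_MSK);

    /* Bus-error reporting requested by the user: mask is set as well. */
    assert(ctrl_apply_err_mask(0, 0, CAN_CTRLMODE_BERR_REPORTING) &
           FLEXCAN_CTRL_ERR_MSK);
}
```

If an i.MX28 shows one interrupt per retry with bus-error reporting off, the model suggests the driver is treating the core as one that carries the quirk flag, or is setting the mask unconditionally.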
Wolfgang.
* Re: FlexCAN on i.MX28 interrupt flooding retrying send
2014-03-07 8:08 FlexCAN on i.MX28 interrupt flooding retrying send Stanislav Meduna
2014-03-07 8:16 ` Marc Kleine-Budde
@ 2014-03-07 8:32 ` Matthias Klein
2014-03-07 8:46 ` Marc Kleine-Budde
2 siblings, 0 replies; 10+ messages in thread
From: Matthias Klein @ 2014-03-07 8:32 UTC (permalink / raw)
To: Stanislav Meduna, wg, mkl, linux-can,
linux-kernel@vger.kernel.org, linux-rt-users@vger.kernel.org
Hello Stanislav,
I made a similar observation on an i.MX537 with the 3.12.12-rt19 kernel:
I see the same interrupt flood when the bus is disconnected.
What do you mean by "kills the machine"? I have a high interrupt load,
but the machine is still responsive.
Best regards,
Matthias
* Re: FlexCAN on i.MX28 interrupt flooding retrying send
2014-03-07 8:08 FlexCAN on i.MX28 interrupt flooding retrying send Stanislav Meduna
2014-03-07 8:16 ` Marc Kleine-Budde
2014-03-07 8:32 ` Matthias Klein
@ 2014-03-07 8:46 ` Marc Kleine-Budde
2014-03-07 13:36 ` Stanislav Meduna
2 siblings, 1 reply; 10+ messages in thread
From: Marc Kleine-Budde @ 2014-03-07 8:46 UTC (permalink / raw)
To: Stanislav Meduna, wg, linux-can, linux-kernel@vger.kernel.org,
linux-rt-users@vger.kernel.org
Your kernel is missing the patch:
e358784 can: flexcan: fix mx28 detection by rearanging OF match table
With this patch the CAN core is properly detected as an mx28, so that bus
errors stay disabled (unless you enable them). If you need bus errors to
detect unconnected CAN buses, you need another patchset, berr_limit,
which is not yet mainline. I can repost it here if you need it.
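Why the order of the OF match table matters can be shown with a simplified model. The sketch below is not the kernel's of_match_device(); it assumes a lookup that walks the driver table in order and returns the first entry found anywhere in the node's compatible list. The compatible strings and the idea that the mx28 node also lists a p1010 fallback are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct match_entry {
    const char *compatible;
    const char *devtype;
};

/* Simplified first-match lookup: the table order decides the result,
 * not the order of the node's compatible strings. */
const struct match_entry *first_match(const struct match_entry *table,
                                      size_t ntable,
                                      const char *const *node_compat,
                                      size_t ncompat)
{
    size_t i, j;

    for (i = 0; i < ntable; i++)
        for (j = 0; j < ncompat; j++)
            if (strcmp(table[i].compatible, node_compat[j]) == 0)
                return &table[i];
    return NULL;
}

/* Hypothetical DT node: specific string first, generic fallback second. */
static const char *const imx28_node[] = {
    "fsl,imx28-flexcan", "fsl,p1010-flexcan",
};

/* Table sorted generic-first: the fallback entry wins. */
static const struct match_entry generic_first[] = {
    { "fsl,p1010-flexcan", "p1010" },
    { "fsl,imx28-flexcan", "imx28" },
};

/* Table sorted specific-first: the mx28 entry wins. */
static const struct match_entry specific_first[] = {
    { "fsl,imx28-flexcan", "imx28" },
    { "fsl,p1010-flexcan", "p1010" },
};

const char *detect(const struct match_entry *table, size_t ntable)
{
    const struct match_entry *m = first_match(table, ntable, imx28_node, 2);

    return m ? m->devtype : "none";
}

void match_order_examples(void)
{
    assert(strcmp(detect(generic_first, 2), "p1010") == 0);
    assert(strcmp(detect(specific_first, 2), "imx28") == 0);
}
```

Under this model, a core detected through the generic entry would inherit that entry's quirks, which would be consistent with the error interrupt being enabled on an mx28.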
Marc
--
Pengutronix e.K. | Marc Kleine-Budde |
Industrial Linux Solutions | Phone: +49-231-2826-924 |
Vertretung West/Dortmund | Fax: +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de |
* Re: FlexCAN on i.MX28 interrupt flooding retrying send
2014-03-07 8:46 ` Marc Kleine-Budde
@ 2014-03-07 13:36 ` Stanislav Meduna
2014-03-07 13:54 ` Marc Kleine-Budde
2014-03-07 15:55 ` Wolfgang Grandegger
0 siblings, 2 replies; 10+ messages in thread
From: Stanislav Meduna @ 2014-03-07 13:36 UTC (permalink / raw)
To: Marc Kleine-Budde, wg, linux-can, linux-kernel@vger.kernel.org,
linux-rt-users@vger.kernel.org
On 07.03.2014 09:46, Marc Kleine-Budde wrote:
> Adding the linux-can mailinglist to Cc.
I am not subscribed so maybe that's why the original mail
did not get through - I did Cc: linux-can@vger.kernel.org
> Your kernel is missing the patch:
>
> e358784 can: flexcan: fix mx28 detection by rearanging OF match table
>
> With this patch the CAN core is properly detected as an mx28, so that bus
> errors stay disabled (unless you enable them). If you need bus errors to
> detect unconnected CAN buses, you need another patchset, berr_limit,
> which is not yet mainline. I can repost it here if you need it.
Ah ok.
Thank you, this probably points me in the right direction - I'll try
to implement this behaviour in my kernel (unfortunately
I cannot move to a more recent one at the moment).
On 07.03.2014 09:40, Wolfgang Grandegger wrote:
> Maybe there is something wrong with your platform code or DTS file. What
> kernel are you using, and how is the CAN node defined in your DTS file?
This is the 3.4.77 kernel with the realtime patches and without
the device tree, so these settings are missing and the patch does
not apply.
On 07.03.2014 09:32, Matthias Klein wrote:
> I made a similar observation on an i.MX537 with the 3.12.12-rt19
> kernel: I see the same interrupt flood when the bus is
> disconnected.
>
> What do you mean by "kills the machine"? I have a high interrupt
> load, but the machine is still responsive.
In my case the ssh connection hung or updated only once every several
seconds. In one case it was even necessary to ifconfig down/up
the ethernet interface (NAPI overload? - no idea). The exact behaviour
might be related to the realtime patches - we need guaranteed response
times, and runaway interrupt processing hogging the CPU at realtime
priority is a problem.
Many thanks
--
Stano
* Re: FlexCAN on i.MX28 interrupt flooding retrying send
2014-03-07 13:36 ` Stanislav Meduna
@ 2014-03-07 13:54 ` Marc Kleine-Budde
[not found] ` <OFA84C0F6D.092C777B-ON86257C94.004E3F0B-86257C94.004F1F2D@notes.cat.com>
2014-03-07 15:55 ` Wolfgang Grandegger
1 sibling, 1 reply; 10+ messages in thread
From: Marc Kleine-Budde @ 2014-03-07 13:54 UTC (permalink / raw)
To: Stanislav Meduna, wg, linux-can, linux-kernel@vger.kernel.org,
linux-rt-users@vger.kernel.org
On 03/07/2014 02:36 PM, Stanislav Meduna wrote:
>> Adding the linux-can mailinglist to Cc.
>
> I am not subscribed so maybe that's why the original mail
> did not get through - I did Cc: linux-can@vger.kernel.org
My bad, linux-can was on Cc, but it arrived on linux-can late.
>> Your kernel is missing the patch:
>>
>> e358784 can: flexcan: fix mx28 detection by rearanging OF match table
>>
>> With this patch the CAN core is properly detected as an mx28, so that bus
>> errors stay disabled (unless you enable them). If you need bus errors to
>> detect unconnected CAN buses, you need another patchset, berr_limit,
>> which is not yet mainline. I can repost it here if you need it.
>
> Ah ok.
>
> Thank you, this probably points me in the right direction - I'll try
> to implement this behaviour in my kernel (unfortunately
> I cannot move to a more recent one at the moment).
The flexcan driver in your kernel doesn't have the improvements for mx28,
i.e. that the bus error interrupt is not needed on that CAN core. I think
we never added that support for non-DT platforms.
Marc
--
Pengutronix e.K. | Marc Kleine-Budde |
Industrial Linux Solutions | Phone: +49-231-2826-924 |
Vertretung West/Dortmund | Fax: +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de |
* Re: FlexCAN on i.MX28 interrupt flooding retrying send
2014-03-07 13:36 ` Stanislav Meduna
2014-03-07 13:54 ` Marc Kleine-Budde
@ 2014-03-07 15:55 ` Wolfgang Grandegger
1 sibling, 0 replies; 10+ messages in thread
From: Wolfgang Grandegger @ 2014-03-07 15:55 UTC (permalink / raw)
To: Stanislav Meduna, Marc Kleine-Budde, linux-can,
linux-kernel@vger.kernel.org, linux-rt-users@vger.kernel.org
The following hack should fix the problem:
diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c
index eb4014a..a6be018 100644
--- a/drivers/net/can/flexcan.c
+++ b/drivers/net/can/flexcan.c
@@ -727,7 +727,16 @@ static int flexcan_chip_start(struct net_device *dev)
reg_ctrl = flexcan_read(&regs->ctrl);
reg_ctrl &= ~FLEXCAN_CTRL_TSYN;
reg_ctrl |= FLEXCAN_CTRL_BOFF_REC | FLEXCAN_CTRL_LBUF |
- FLEXCAN_CTRL_ERR_STATE | FLEXCAN_CTRL_ERR_MSK;
+ FLEXCAN_CTRL_ERR_STATE;
+
+ /*
+ * Quick and dirty hack to enable the "error interrupt"
+ * (FLEXCAN_CTRL_ERR_MSK) for the i.MX28. Warning: this
+ * does not work on most other Flexcan cores. There, we
+ * then don't get any error warning or passive interrupts.
+ */
+ if (priv->can.ctrlmode & CAN_CTRLMODE_BERR_REPORTING)
+ reg_ctrl |= FLEXCAN_CTRL_ERR_MSK;
/* save for later use */
priv->reg_ctrl_default = reg_ctrl;
Anyway, you should check whether there are other important improvements
and fixes pending.
Wolfgang.