Problem using Linux CAN

linux-can.vger.kernel.org archive mirror

* Problem using Linux CAN
@ 2015-07-22 10:49 Tim Hotfilter
  2015-07-22 17:41 ` Oliver Hartkopp
  0 siblings, 1 reply; 8+ messages in thread
From: Tim Hotfilter @ 2015-07-22 10:49 UTC (permalink / raw)
  To: linux-can

Hi,

I am developing a control unit with four CAN controllers based on the Xilinx Zynq 7000. Two CAN controllers (sja1000-based) are implemented in the Zynq’s FPGA, and two are the controllers integrated in the Zynq itself. The control unit runs Linux 3.19 with SocketCAN.
Under higher CAN bus load (30% or more), SocketCAN returns error 105: No buffer space available (ENOBUFS). The problem is independent of the hardware; it occurs with both the sja1000 and the Xilinx CAN controller. Bringing the network device down and up again makes sending possible again, but since other ECUs on the bus use watchdogs, this workaround is not suitable.
Some kernel debugging showed that a transmit interrupt somehow gets lost. This leaves the TX queue stopped, and the transmit buffer overflows.
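For context, ENOBUFS from write() on a CAN_RAW socket typically means the frame could not be queued because the interface's transmit queue is full. Below is a minimal user-space sketch, assuming a classic CAN_RAW socket; the helper name send_frame_retry() and its back-off parameters are hypothetical, not part of SocketCAN or can-utils. It retries a bounded number of times instead of failing on the first ENOBUFS. It cannot repair a stopped TX queue, but it reports the stall rather than spinning forever.

```c
#include <errno.h>
#include <time.h>
#include <unistd.h>
#include <linux/can.h>

/*
 * Hypothetical helper: write one classic CAN frame and, if the kernel
 * reports ENOBUFS (transmit queue full), sleep for backoff_ms and retry
 * up to max_retries times. Returns 0 on success or a negative errno.
 */
int send_frame_retry(int fd, const struct can_frame *frame,
                     int backoff_ms, int max_retries)
{
    struct timespec delay = {
        .tv_sec = backoff_ms / 1000,
        .tv_nsec = (long)(backoff_ms % 1000) * 1000000L,
    };

    for (int attempt = 0; attempt <= max_retries; attempt++) {
        ssize_t n = write(fd, frame, sizeof(*frame));
        if (n == (ssize_t)sizeof(*frame))
            return 0;               /* whole frame accepted */
        if (n >= 0)
            return -EIO;            /* short write: not expected for CAN_RAW */
        if (errno != ENOBUFS)
            return -errno;          /* other errors are not retried */
        nanosleep(&delay, NULL);    /* queue full: wait, then try again */
    }
    return -ENOBUFS;                /* still full: the TX path may be stuck */
}
```

If this keeps returning -ENOBUFS while the controller stays ERROR-ACTIVE, that points at the stuck-queue situation described here rather than at a temporary load peak.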

Is there a way to handle the lost interrupt, e.g. by waking up the queue after a short timeout?

Thank you in advance!

Best regards 
Tim Hotfilter

* Re: Problem using Linux CAN
  2015-07-22 10:49 Problem using Linux CAN Tim Hotfilter
@ 2015-07-22 17:41 ` Oliver Hartkopp
  2015-07-22 17:52   ` Tim Hotfilter
  0 siblings, 1 reply; 8+ messages in thread
From: Oliver Hartkopp @ 2015-07-22 17:41 UTC (permalink / raw)
  To: Tim Hotfilter, linux-can

On 22.07.2015 12:49, Tim Hotfilter wrote:

> I am developing a control unit with four CAN controllers based on the Xilinx Zynq 7000. Two CAN controllers (sja1000-based) are implemented in the Zynq’s FPGA, and two are the controllers integrated in the Zynq itself. The control unit runs Linux 3.19 with SocketCAN.
> Under higher CAN bus load (30% or more), SocketCAN returns error 105: No buffer space available (ENOBUFS).

How did you get this error?
What tools/kernel/distro are you using?

Did you try 'cangen' from https://github.com/linux-can/can-utils ?

Regards,
Oliver

* Re: Problem using Linux CAN
  2015-07-22 17:41 ` Oliver Hartkopp
@ 2015-07-22 17:52   ` Tim Hotfilter
  2015-07-22 18:39     ` Oliver Hartkopp
  0 siblings, 1 reply; 8+ messages in thread
From: Tim Hotfilter @ 2015-07-22 17:52 UTC (permalink / raw)
  To: Oliver Hartkopp; +Cc: linux-can@vger.kernel.org

Hi Oliver,

Thank you for your quick answer.
I am using a kernel image from the Xilinx GitHub repository and cangen to generate a static load. After a while, cangen reports this error.

Thank you

Regards,
Tim Hotfilter 

* Re: Problem using Linux CAN
  2015-07-22 17:52   ` Tim Hotfilter
@ 2015-07-22 18:39     ` Oliver Hartkopp
       [not found]       ` <0948A186-060F-4A31-8359-755DE78647A0@osdr.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Oliver Hartkopp @ 2015-07-22 18:39 UTC (permalink / raw)
  To: Tim Hotfilter; +Cc: linux-can@vger.kernel.org

On 22.07.2015 19:52, Tim Hotfilter wrote:

> I am using a kernel image from the Xilinx GitHub repository

This is 4.0 then, right?

> and cangen to generate a static load. After a while, cangen reports this error.

Can you send the output from

ip -details link show can0 (or whatever canX got stuck)
ip -stat link show can0 (or whatever canX got stuck)

just to see why the CAN interface goes offline.

Do you have a proper CAN setup (with CAN transceivers, real wires, 2x 120 ohm 
termination, etc.)?

Sending CAN frames only works when there's a second/different node with the 
same bitrate acknowledging the sent frame. Sending without cabling/termination 
always leads to CAN controllers that feel sad :-)

Regards,
Oliver

* Re: Problem using Linux CAN
       [not found]       ` <0948A186-060F-4A31-8359-755DE78647A0@osdr.org>
@ 2015-07-23 13:13         ` Tim Hotfilter
  2015-07-23 17:26           ` Oliver Hartkopp
  0 siblings, 1 reply; 8+ messages in thread
From: Tim Hotfilter @ 2015-07-23 13:13 UTC (permalink / raw)
  To: Oliver Hartkopp; +Cc: linux-can@vger.kernel.org

Hi Oliver,

Kernel version is 3.19.

Here is the output of ip link show:
[root@mcu15 ~]# ip -s -d link show can1
3: can1: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 128
    link/can  promiscuity 0
    can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
          bitrate 1000000 sample-point 0.700
          tq 100 prop-seg 3 phase-seg1 3 phase-seg2 3 sjw 1
          sja1000: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
          clock 50000000
          re-started bus-errors arbit-lost error-warn error-pass bus-off
          0          0          0          0          0          0
    RX: bytes  packets  errors  dropped overrun mcast
    54816      12451    0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    5279       108392   0       0       0       0
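The timing values in this output are self-consistent. A short worked check, assuming the usual CAN bit layout (a fixed 1 tq sync segment) and the Linux convention that tq = brp / clock:

```latex
\begin{aligned}
N_\text{bit} &= 1 + t_\text{prop} + t_\text{ps1} + t_\text{ps2} = 1 + 3 + 3 + 3 = 10\ \text{tq} \\
t_\text{bit} &= N_\text{bit} \, t_q = 10 \times 100\ \text{ns} = 1\ \mu\text{s} \;\Rightarrow\; 1\ \text{Mbit/s} \\
\text{sample point} &= \frac{1 + t_\text{prop} + t_\text{ps1}}{N_\text{bit}} = \frac{7}{10} = 0.700 \\
\text{brp} &= t_q \cdot f_\text{clk} = 100\ \text{ns} \times 50\ \text{MHz} = 5
\end{aligned}
```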

The interface does not really go offline, and it is not in an error state either. Receiving still works: I can see incoming CAN frames via candump. Only the transmit path stops working.
My hardware configuration is quite simple: the control unit is connected directly to a Vector CANcase with termination resistors. I can see incoming frames in CANoe and can also transmit frames.

Regards,
Tim

* Re: Problem using Linux CAN
  2015-07-23 13:13         ` Tim Hotfilter
@ 2015-07-23 17:26           ` Oliver Hartkopp
  2015-07-24  7:53             ` Tim Hotfilter
  0 siblings, 1 reply; 8+ messages in thread
From: Oliver Hartkopp @ 2015-07-23 17:26 UTC (permalink / raw)
  To: Tim Hotfilter; +Cc: linux-can@vger.kernel.org

Hi Tim,

On 23.07.2015 15:13, Tim Hotfilter wrote:
> Kernel version is 3.19.

ok

> Here is the output of ip link show:
> [root@mcu15 ~]# ip -s -d link show can1
> 3: can1: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 128
>      link/can  promiscuity 0
>      can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
>            bitrate 1000000 sample-point 0.700
>            tq 100 prop-seg 3 phase-seg1 3 phase-seg2 3 sjw 1
>            sja1000: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
>            clock 50000000
>            re-started bus-errors arbit-lost error-warn error-pass bus-off
>            0          0          0          0          0          0
>      RX: bytes  packets  errors  dropped overrun mcast
>      54816      12451    0       0       0       0
>      TX: bytes  packets  errors  dropped carrier collsns
>      5279       108392   0       0       0       0
>
> The interface does not really go offline, and it is not in an error state either. Receiving still works: I can see incoming CAN frames via candump.

ok.

> Only the transmit path stops working.

Hm :-(

> My hardware configuration is quite simple: the control unit is connected directly to a Vector CANcase with termination resistors.
> I can see incoming frames in CANoe and can also transmit frames.

Ah - you are using CANoe with CANcase ...

So here are some ideas.

1. Try 500 kbit/s with a sample point of 80% on both sides.

2. I had a similar CANoe problem in the past. Please try to add *another* CAN 
node that can ACK your sent frames, e.g. connect one of the other CAN 
interfaces of your Xilinx board to the existing bus.

3. The error state handling has changed in 3.19. For a test you might revert 
this commit: 
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit?id=215db1856e8313ef8a1d9b64346dc261570012a6
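To see what suggestion 1 means in register terms, here is a simplified bit-timing search. It is a sketch under stated assumptions, NOT the kernel's can_calc_bittiming(): the limits are copied from the "sja1000:" line of the ip -details output in this thread, the 50 MHz clock is the value that output reports, and the function and struct names are hypothetical.

```c
#include <stdbool.h>

/* Limits taken from the "sja1000:" line of the ip -details output. */
#define TSEG1_MIN 1
#define TSEG1_MAX 16
#define TSEG2_MIN 1
#define TSEG2_MAX 8
#define BRP_MIN   1
#define BRP_MAX   64

struct bit_timing {
    unsigned brp;    /* bitrate prescaler: tq = brp / clock */
    unsigned tseg1;  /* prop-seg + phase-seg1, in tq */
    unsigned tseg2;  /* phase-seg2, in tq */
};

/*
 * Try prescalers from small to large and accept the first one that gives a
 * whole number of tq per bit and a sample point (in permille, 800 = 80.0%),
 * rounded to the nearest tq, that fits the controller limits.
 */
bool find_bit_timing(unsigned long clock_hz, unsigned long bitrate,
                     unsigned sp_permille, struct bit_timing *out)
{
    for (unsigned brp = BRP_MIN; brp <= BRP_MAX; brp++) {
        unsigned long div = (unsigned long)brp * bitrate;
        if (clock_hz % div != 0)
            continue;                       /* bit time is not a whole number of tq */
        unsigned long ntq = clock_hz / div; /* 1 (sync) + tseg1 + tseg2 */
        if (ntq < 1 + TSEG1_MIN + TSEG2_MIN || ntq > 1 + TSEG1_MAX + TSEG2_MAX)
            continue;
        /* sample point = (1 + tseg1) / ntq, rounded to the nearest tq */
        unsigned long sp_tq = (sp_permille * ntq + 500) / 1000;
        if (sp_tq < 1 + TSEG1_MIN)
            continue;
        unsigned long tseg1 = sp_tq - 1;
        unsigned long tseg2 = ntq - sp_tq;
        if (tseg1 > TSEG1_MAX || tseg2 < TSEG2_MIN || tseg2 > TSEG2_MAX)
            continue;
        out->brp = brp;
        out->tseg1 = (unsigned)tseg1;
        out->tseg2 = (unsigned)tseg2;
        return true;
    }
    return false;
}
```

With a 50 MHz clock this yields brp = 5 (tq = 100 ns), tseg1 = 15 and tseg2 = 4 for 500 kbit/s at 80%, and it reproduces the 1 Mbit/s, 70% setting from the ip output (tseg1 = 6 = prop-seg 3 + phase-seg1 3, tseg2 = 3). On a real system the kernel does this calculation itself when the interface is configured while down, e.g. with `ip link set can1 type can bitrate 500000 sample-point 0.8`, and its algorithm may choose a different split.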

Regards,
Oliver


* Re: Problem using Linux CAN
  2015-07-23 17:26           ` Oliver Hartkopp
@ 2015-07-24  7:53             ` Tim Hotfilter
  2015-07-25 13:45               ` Oliver Hartkopp
  0 siblings, 1 reply; 8+ messages in thread
From: Tim Hotfilter @ 2015-07-24  7:53 UTC (permalink / raw)
  To: Oliver Hartkopp; +Cc: linux-can@vger.kernel.org

Hi Oliver,

Thanks for your ideas.
1. 500 kbit/s results in the same problem.
2. The problem also occurs in the final configuration (about 5 CAN nodes). As I mentioned in my first mail, one node has a CAN frame watchdog and shuts down.
3. I was running kernel 3.17 before. For testing, I upgraded to 3.19.

I think the problem is the stopped TX queue. Is there any chance to implement something like a timeout function that wakes the queue when the buffer fill level exceeds a threshold?

Regards 
Tim

* Re: Problem using Linux CAN
  2015-07-24  7:53             ` Tim Hotfilter
@ 2015-07-25 13:45               ` Oliver Hartkopp
  0 siblings, 0 replies; 8+ messages in thread
From: Oliver Hartkopp @ 2015-07-25 13:45 UTC (permalink / raw)
  To: Tim Hotfilter; +Cc: linux-can@vger.kernel.org

Hi Tim,

On 24.07.2015 09:53, Tim Hotfilter wrote:

> 1. 500 kbit/s results in the same problem.
> 2. The problem also occurs in the final configuration (about 5 CAN nodes). As I mentioned in my first mail, one node has a CAN frame watchdog and shuts down.
> 3. I was running kernel 3.17 before. For testing, I upgraded to 3.19.

Hm.

> I think the problem is the stopped tx queue.

How did you get this information that the queue is stopped?

The sja1000 queue is stopped in sja1000_start_xmit() and re-enabled in 
sja1000_interrupt() when the TX-OK interrupt (IRQ_TI) occurs.

So if the sja1000 queue is stopped after a period of successful 
operation, the TX-OK interrupt evidently got lost.
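The stop/wake handshake described above can be illustrated with a tiny user-space model. This is not driver code; the struct and function names are hypothetical stand-ins for netif_stop_queue() in the xmit path and netif_wake_queue() in the interrupt handler, kept just detailed enough to show why a single lost TX interrupt blocks transmission indefinitely.

```c
#include <stdbool.h>

/*
 * User-space model (not kernel code) of single-buffer TX flow control:
 * the xmit path stops the queue when it hands a frame to the controller,
 * and only the TX-complete interrupt wakes it again.
 */
struct tx_model {
    bool queue_stopped;   /* stands in for the netdev queue state */
    unsigned tx_packets;  /* frames confirmed by a TX interrupt */
};

/* Models the xmit path: returns false ("busy") while the queue is stopped. */
bool model_start_xmit(struct tx_model *m)
{
    if (m->queue_stopped)
        return false;
    m->queue_stopped = true;   /* one frame now in flight in the TX buffer */
    return true;
}

/* Models the ISR: a TX-complete event counts the frame and wakes the queue. */
void model_tx_interrupt(struct tx_model *m)
{
    m->tx_packets++;
    m->queue_stopped = false;
}
```

In the model, once one TX interrupt is missed every later transmit attempt reports busy and tx_packets stops advancing. In the real driver the frames then presumably back up in the qdisc (qlen 128 in the ip output) until write() fails with ENOBUFS, which matches the symptom in this thread.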

As you have two sja1000 cores (from OpenCores?) in your FPGA: do you have them 
connected to separate IRQ lines?

> Is there any chance to implement something like a timeout function that wakes the queue when the buffer fill level exceeds a threshold?

We should rather fix the real issue than implement workarounds like 
this. There is a timeout-based recovery for the bus-off state (restart-ms option), but 
your device does not go bus-off.
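Instead of waking the queue blindly, user space can at least detect and log the stall, which also helps answer how one can tell that the TX path (not the bus) has stopped. A minimal sketch, assuming the standard per-interface statistics file /sys/class/net/<iface>/statistics/tx_packets; the helper names and the stall criterion are hypothetical and not part of any CAN tool.

```c
#include <stdio.h>
#include <stdbool.h>

/*
 * Read one unsigned counter from a sysfs-style file such as
 * /sys/class/net/can1/statistics/tx_packets. Returns false on error.
 */
bool read_counter(const char *path, unsigned long long *value)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return false;
    bool ok = (fscanf(f, "%llu", value) == 1);
    fclose(f);
    return ok;
}

/*
 * Hypothetical stall criterion: the TX path looks stuck if the application
 * still has frames to send (e.g. it keeps getting ENOBUFS) but tx_packets
 * has not advanced since the previous sample.
 */
bool tx_looks_stalled(unsigned long long prev_tx, unsigned long long now_tx,
                      bool sends_pending)
{
    return sends_pending && now_tx == prev_tx;
}
```

Sampling the counter periodically (say once per second) while sends fail would confirm a frozen TX path; the restart-ms mechanism would not trigger in that case because the controller stays ERROR-ACTIVE.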

Regards,
Oliver
