* Problem using Linux CAN
@ 2015-07-22 10:49 Tim Hotfilter
2015-07-22 17:41 ` Oliver Hartkopp
0 siblings, 1 reply; 8+ messages in thread
From: Tim Hotfilter @ 2015-07-22 10:49 UTC (permalink / raw)
To: linux-can
Hi,
I am developing a control unit with four CAN controllers based on the Xilinx Zynq 7000. Two CAN controllers (based on the sja1000) are implemented in the Zynq’s FPGA, and the other two are the controllers already integrated in the Zynq. The control unit runs Linux 3.19 with SocketCAN.
Under higher CAN bus load (30% or more), SocketCAN returns error 105: No buffer space available. The problem is independent of the hardware: it occurs on both the sja1000 and the Xilinx CAN controllers. Restarting the network device by bringing it down and up makes it possible to send again, but since other ECUs on the bus use watchdogs, this workaround is not suitable.
Kernel debugging shows that a transmit interrupt somehow gets lost. This leaves the tx queue stopped, and the transmit buffer overflows.
Is there a way to handle the lost interrupt, for example by waking up the queue after a short timeout?
Thank you in advance!
Best regards
Tim Hotfilter
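For reference, error 105 with the text "No buffer space available" is ENOBUFS on Linux. The sketch below shows one way a user-space sender could tell a briefly full transmit queue from a permanently stopped one: retry a bounded number of times and give up if ENOBUFS persists. The helper name send_with_retry and its retry parameters are assumptions made for illustration; they are not part of can-utils or SocketCAN, and the helper does not repair a lost interrupt.

```python
import errno
import time


def send_with_retry(send, frame, retries=5, delay_s=0.01):
    """Call send(frame), retrying while it fails with ENOBUFS.

    `send` is any callable that raises OSError on failure, for example
    the bound `send` method of a CAN_RAW socket. Returns whatever
    `send` returns. Re-raises the last ENOBUFS once the retries are
    used up, and re-raises any other error immediately.
    """
    for attempt in range(retries + 1):
        try:
            return send(frame)
        except OSError as exc:
            if exc.errno != errno.ENOBUFS or attempt == retries:
                raise
            time.sleep(delay_s)  # give the tx queue time to drain
```

If the call still raises ENOBUFS after all retries, the queue is more likely stuck than merely busy, which would match the behaviour described in the message above.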
* Re: Problem using Linux CAN
2015-07-22 10:49 Problem using Linux CAN Tim Hotfilter
@ 2015-07-22 17:41 ` Oliver Hartkopp
2015-07-22 17:52 ` Tim Hotfilter
0 siblings, 1 reply; 8+ messages in thread
From: Oliver Hartkopp @ 2015-07-22 17:41 UTC (permalink / raw)
To: Tim Hotfilter, linux-can
On 22.07.2015 12:49, Tim Hotfilter wrote:
> I am developing a control unit using four CAN Controllers based on the Xilinx Zynq 7000. Two CAN Controllers (based on the sja1000) are implemented in the Zynq’s FPGA and two are already integrated. The control unit runs Linux 3.19 with socket can.
> Under higher CAN busload (30% or more) socket CAN returns Error 105: No buffer space available.
How did you get this error?
What tools/kernel/distro are you using?
Did you try 'cangen' from https://github.com/linux-can/can-utils ?
Regards,
Oliver
> --
> To unsubscribe from this list: send the line "unsubscribe linux-can" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
* Re: Problem using Linux CAN
2015-07-22 17:41 ` Oliver Hartkopp
@ 2015-07-22 17:52 ` Tim Hotfilter
2015-07-22 18:39 ` Oliver Hartkopp
0 siblings, 1 reply; 8+ messages in thread
From: Tim Hotfilter @ 2015-07-22 17:52 UTC (permalink / raw)
To: Oliver Hartkopp; +Cc: linux-can@vger.kernel.org
Hi Oliver,
Thank you for your quick answer.
I am using a kernel image from the Xilinx GitHub and cangen to generate static load. After a while, cangen reports this error.
Thank you
Regards,
Tim Hotfilter
* Re: Problem using Linux CAN
2015-07-22 17:52 ` Tim Hotfilter
@ 2015-07-22 18:39 ` Oliver Hartkopp
[not found] ` <0948A186-060F-4A31-8359-755DE78647A0@osdr.org>
0 siblings, 1 reply; 8+ messages in thread
From: Oliver Hartkopp @ 2015-07-22 18:39 UTC (permalink / raw)
To: Tim Hotfilter; +Cc: linux-can@vger.kernel.org
On 22.07.2015 19:52, Tim Hotfilter wrote:
> I am using a kernel image from Xilinx git hub
This is 4.0 then, right?
> and cangen to generate static load. I get this error as result in cangen after a while.
Can you send the output from
ip -details link show can0 (or whatever canX got stuck)
ip -stat link show can0 (or whatever canX got stuck)
just to see why the CAN interface goes offline.
Do you have a proper CAN setup (with CAN transceivers, real wires, 2x 120 ohms
termination, etc) ?
Sending CAN frames only works when there's a second/different node with the
same bitrate acknowledging the sent frame. Sending without cabling/termination
always leads to CAN controllers that feel sad :-)
Regards,
Oliver
* Re: Problem using Linux CAN
[not found] ` <0948A186-060F-4A31-8359-755DE78647A0@osdr.org>
@ 2015-07-23 13:13 ` Tim Hotfilter
2015-07-23 17:26 ` Oliver Hartkopp
0 siblings, 1 reply; 8+ messages in thread
From: Tim Hotfilter @ 2015-07-23 13:13 UTC (permalink / raw)
To: Oliver Hartkopp; +Cc: linux-can@vger.kernel.org
Hi Oliver,
Kernel version is 3.19.
Here is the output of ip link show:
[root@mcu15 ~]# ip -s -d link show can1
3: can1: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 128
link/can promiscuity 0
can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
bitrate 1000000 sample-point 0.700
tq 100 prop-seg 3 phase-seg1 3 phase-seg2 3 sjw 1
sja1000: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
clock 50000000
re-started bus-errors arbit-lost error-warn error-pass bus-off
0 0 0 0 0 0
RX: bytes packets errors dropped overrun mcast
54816 12451 0 0 0 0
TX: bytes packets errors dropped carrier collsns
5279 108392 0 0 0 0
The interface does not really go offline, and it is not in an error state either. Receive still works: I can see incoming CAN frames via candump. Only the transmit path stops working.
My hardware configuration is quite simple: the control unit is directly connected to a Vector CANcase with termination resistors. I can see incoming frames in CANoe and can also transmit frames.
Regards,
Tim
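The bit-timing values in the ip output above can be cross-checked by hand. A CAN bit consists of one sync time quantum plus prop-seg, phase-seg1 and phase-seg2, and the sample point lies at the end of phase-seg1. The sketch below redoes that arithmetic with the reported values; the function name is made up for illustration, and the prescaler is derived on the assumption that the reported clock is the one the time quantum is divided from.

```python
def can_bit_timing(tq_ns, prop_seg, phase_seg1, phase_seg2, clock_hz):
    """Derive bitrate, sample point and prescaler from ip's bit-timing fields."""
    sync_seg = 1  # the sync segment is always one time quantum
    tq_per_bit = sync_seg + prop_seg + phase_seg1 + phase_seg2
    bitrate = round(1e9 / (tq_ns * tq_per_bit))
    sample_point = (sync_seg + prop_seg + phase_seg1) / tq_per_bit
    brp = round(tq_ns * clock_hz / 1e9)  # clock ticks per time quantum
    return bitrate, sample_point, brp


# Values from the can1 output: tq 100, prop-seg 3, phase-seg1 3,
# phase-seg2 3, clock 50000000.
bitrate, sample_point, brp = can_bit_timing(100, 3, 3, 3, 50_000_000)
print(bitrate, sample_point, brp)
```

This gives a bitrate of 1000000 and a sample point of 0.7, matching the reported "bitrate 1000000 sample-point 0.700", and a prescaler of 5, which lies inside the reported "brp 1..64" range.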
* Re: Problem using Linux CAN
2015-07-23 13:13 ` Tim Hotfilter
@ 2015-07-23 17:26 ` Oliver Hartkopp
2015-07-24 7:53 ` Tim Hotfilter
0 siblings, 1 reply; 8+ messages in thread
From: Oliver Hartkopp @ 2015-07-23 17:26 UTC (permalink / raw)
To: Tim Hotfilter; +Cc: linux-can@vger.kernel.org
Hi Tim,
On 23.07.2015 15:13, Tim Hotfilter wrote:
> Kernel version is 3.19.
ok
> Here is the output of ip link show:
> [root@mcu15 ~]# ip -s -d link show can1
> 3: can1: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 128
> link/can promiscuity 0
> can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
> bitrate 1000000 sample-point 0.700
> tq 100 prop-seg 3 phase-seg1 3 phase-seg2 3 sjw 1
> sja1000: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
> clock 50000000
> re-started bus-errors arbit-lost error-warn error-pass bus-off
> 0 0 0 0 0 0
> RX: bytes packets errors dropped overrun mcast
> 54816 12451 0 0 0 0
> TX: bytes packets errors dropped carrier collsns
> 5279 108392 0 0 0 0
>
> The interface does not really go offline. This interface is also not in an error state. Receive still works: I can see incoming CAN frames via candump.
ok.
> Only the transmit path stops working.
Hm :-(
> My hardware configuration is quite simple: The control unit is directly connected to a vector can case with termination resistors.
> I can see incoming frames in CanOe and can also transmit frames.
Ah - you are using CANoe with CANcase ...
So here are some ideas.
1. Try 500 kbit/s with a sample point of 80% on both sides.
2. I had a similar CANoe problem in the past. Please try to add *another* CAN
node which can ACK your sent frames. E.g. connect one of the other CAN
interfaces of your Xilinx board to the existing bus.
3. The error state handling has changed in 3.19. For a test you might revert
this commit:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit?id=215db1856e8313ef8a1d9b64346dc261570012a6
Regards,
Oliver
* Re: Problem using Linux CAN
2015-07-23 17:26 ` Oliver Hartkopp
@ 2015-07-24 7:53 ` Tim Hotfilter
2015-07-25 13:45 ` Oliver Hartkopp
0 siblings, 1 reply; 8+ messages in thread
From: Tim Hotfilter @ 2015-07-24 7:53 UTC (permalink / raw)
To: Oliver Hartkopp; +Cc: linux-can@vger.kernel.org
Hi Oliver,
Thanks for your ideas.
1. 500 kbit/s results in the same problem.
2. The problem also occurs with the final configuration (about 5 CAN nodes). As I mentioned in the first mail, one node has a CAN-frame watchdog and turns off.
3. I was on kernel 3.17 before and upgraded to 3.19 for testing.
I think the problem is the stopped tx queue. Is there any chance of implementing something like a timeout function that wakes the queue if the buffer fill level is over a threshold?
Regards
Tim
* Re: Problem using Linux CAN
2015-07-24 7:53 ` Tim Hotfilter
@ 2015-07-25 13:45 ` Oliver Hartkopp
0 siblings, 0 replies; 8+ messages in thread
From: Oliver Hartkopp @ 2015-07-25 13:45 UTC (permalink / raw)
To: Tim Hotfilter; +Cc: linux-can@vger.kernel.org
Hi Tim,
On 24.07.2015 09:53, Tim Hotfilter wrote:
> 1. 500kBaud results in the same problem.
> 2. The problem also occurs with the final configuration (about 5 can nodes). As i mentioned in the first mail, one node has a can-frame watchdog and turns off.
> 3. I had kernel 3.17. before. For testing I upgraded to 3.19
Hm.
> I think the problem is the stopped tx queue.
How did you get this information that the queue is stopped?
The sja1000 queue is stopped in sja1000_start_xmit() and is enabled again in
sja1000_interrupt() when the tx-ok interrupt (IRQ_TI) occurs.
So when the sja1000 has a stopped queue after operating successfully for some
time, the tx-ok interrupt obviously got lost.
As you have two sja1000 cores (from opencores?) in your FPGA: Do you have them
connected to separate irq lines?
> Is there any chance to implement something like a timeout function, which wakes the queue if the buffer fill level is over a threshold.
We should rather fix the real issue than implement workarounds like
this. There is a timeout recovery for bus-off states (restart-ms option) - but
your device does not get into bus-off.
Regards,
Oliver
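The stop/wake handshake described above, and why a single lost tx-ok interrupt ends in "No buffer space available", can be illustrated with a toy model. This is not the sja1000 driver code: the class and function names are invented, the queue length of 128 is taken from the qlen in the ip output earlier in the thread, and the model assumes a single hardware tx buffer and that a full software queue is reported to the sender as ENOBUFS.

```python
import errno
from collections import deque


class ToyCanTx:
    """Toy model of a one-buffer CAN controller behind a software tx queue."""

    def __init__(self, qlen=128):
        self.qlen = qlen
        self.queue = deque()   # frames waiting in the software queue
        self.stopped = False   # True while the hardware buffer is busy
        self.sent = 0          # frames confirmed by a tx-ok interrupt

    def send(self, frame):
        """User-space send: queue the frame or fail with ENOBUFS."""
        if len(self.queue) >= self.qlen:
            raise OSError(errno.ENOBUFS, "No buffer space available")
        self.queue.append(frame)
        self._kick()

    def _kick(self):
        # Like start_xmit: hand one frame to the hardware, stop the queue.
        if not self.stopped and self.queue:
            self.queue.popleft()
            self.stopped = True

    def tx_ok_interrupt(self):
        # Like the tx-ok handler: the frame is out, wake the queue again.
        self.sent += 1
        self.stopped = False
        self._kick()


def frames_until_enobufs(dev, lose_irq_after, max_frames=10_000):
    """Count accepted frames when tx-ok stops arriving after N frames."""
    for accepted in range(max_frames):
        try:
            dev.send(b"\x00" * 8)
        except OSError:
            return accepted
        if dev.sent < lose_irq_after:
            dev.tx_ok_interrupt()  # interrupt delivered normally
    return max_frames  # no failure seen within max_frames


print(frames_until_enobufs(ToyCanTx(), lose_irq_after=10))
```

In this model, once the interrupt is lost the queue stays stopped forever: one more frame goes into the hardware buffer, 128 more fill the software queue, and every later send fails. Only resetting the model recovers it, which mirrors the down/up cycle of the interface reported in the first mail.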