* Problem using Linux CAN
From: Tim Hotfilter @ 2015-07-22 10:49 UTC
To: linux-can

Hi,

I am developing a control unit using four CAN controllers based on the Xilinx Zynq 7000. Two CAN controllers (based on the sja1000) are implemented in the Zynq’s FPGA and two are already integrated. The control unit runs Linux 3.19 with SocketCAN.
Under higher CAN bus load (30% or more), SocketCAN returns error 105: No buffer space available.

The problem is independent of the hardware; it occurs on both sja1000 and Xilinx CAN. Restarting the network device by bringing it down and up makes sending possible again. But since other ECUs on the bus use watchdogs, this workaround is not suitable.
Kernel debugging showed that a transmit interrupt somehow gets lost. This leaves the tx queue stopped, and the transmit buffer overflows.

Is there a way to handle the lost interrupt, such as waking up the queue after a short timeout?

Thank you in advance!

Best regards
Tim Hotfilter
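For context: error 105 on Linux is ENOBUFS, which an application sees as a failed write on the CAN socket. The sketch below shows one way an application might retry on ENOBUFS with a bounded backoff. `send_with_retry`, its parameters, the linear backoff, and the `FlakySocket` stand-in are hypothetical names and choices for illustration; none of them come from can-utils or the kernel. It only helps with transient congestion: if the tx queue stays stopped because an interrupt was lost, the retries run out as well.

```python
import errno
import time


def send_with_retry(sock, frame, retries=5, delay=0.001):
    """Send one raw CAN frame, retrying while the kernel reports ENOBUFS.

    Returns the number of attempts used. Re-raises the OSError if the buffer
    is still full on the last attempt, or immediately for any other error.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(1, retries + 1):
        try:
            sock.send(frame)
            return attempt
        except OSError as exc:
            if exc.errno != errno.ENOBUFS or attempt == retries:
                raise
            time.sleep(delay * attempt)  # linear backoff (assumed policy)


class FlakySocket:
    """Hypothetical stand-in: fails with ENOBUFS a fixed number of times."""

    def __init__(self, failures):
        self.failures = failures
        self.sent = []

    def send(self, frame):
        if self.failures > 0:
            self.failures -= 1
            raise OSError(errno.ENOBUFS, "No buffer space available")
        self.sent.append(frame)
        return len(frame)
```

With a real interface, `sock` would be a raw CAN socket, for example `socket.socket(socket.AF_CAN, socket.SOCK_RAW, socket.CAN_RAW)` bound to `("can1",)`, and `frame` a 16-byte packed CAN frame.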
* Re: Problem using Linux CAN
From: Oliver Hartkopp @ 2015-07-22 17:41 UTC
To: Tim Hotfilter, linux-can

On 22.07.2015 12:49, Tim Hotfilter wrote:

> I am developing a control unit using four CAN controllers based on the Xilinx Zynq 7000. Two CAN controllers (based on the sja1000) are implemented in the Zynq’s FPGA and two are already integrated. The control unit runs Linux 3.19 with SocketCAN.
> Under higher CAN bus load (30% or more), SocketCAN returns error 105: No buffer space available.

How did you get this error?
What tools/kernel/distro are you using?

Did you try 'cangen' from https://github.com/linux-can/can-utils ?

Regards,
Oliver
* Re: Problem using Linux CAN
From: Tim Hotfilter @ 2015-07-22 17:52 UTC
To: Oliver Hartkopp; +Cc: linux-can@vger.kernel.org

Hi Oliver,

Thank you for your quick answer.
I am using a kernel image from Xilinx GitHub and cangen to generate static load. I get this error as a result in cangen after a while.

Thank you

Regards,
Tim Hotfilter
* Re: Problem using Linux CAN
From: Oliver Hartkopp @ 2015-07-22 18:39 UTC
To: Tim Hotfilter; +Cc: linux-can@vger.kernel.org

On 22.07.2015 19:52, Tim Hotfilter wrote:

> I am using a kernel image from Xilinx GitHub

This is 4.0 then, right?

> and cangen to generate static load. I get this error as a result in cangen after a while.

Can you send the output from

    ip -details link show can0   (or whatever canX got stuck)
    ip -stat link show can0      (or whatever canX got stuck)

just to see why the CAN interface goes offline.

Do you have a proper CAN setup (with CAN transceivers, real wires, 2x 120 ohms termination, etc.)?

Sending CAN frames only works when there's a second/different node with the same bitrate acknowledging the sent frame. Sending without cabling/termination always leads to CAN controllers that feel sad :-)

Regards,
Oliver
[parent not found: <0948A186-060F-4A31-8359-755DE78647A0@osdr.org>]
* Re: Problem using Linux CAN
From: Tim Hotfilter @ 2015-07-23 13:13 UTC
To: Oliver Hartkopp; +Cc: linux-can@vger.kernel.org

Hi Oliver,

Kernel version is 3.19.

Here is the output of ip link show:

    [root@mcu15 ~]# ip -s -d link show can1
    3: can1: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 128
        link/can  promiscuity 0
        can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
              bitrate 1000000 sample-point 0.700
              tq 100 prop-seg 3 phase-seg1 3 phase-seg2 3 sjw 1
              sja1000: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
              clock 50000000
              re-started bus-errors arbit-lost error-warn error-pass bus-off
              0          0          0          0          0          0
        RX: bytes  packets  errors  dropped overrun mcast
        54816      12451    0       0       0       0
        TX: bytes  packets  errors  dropped carrier collsns
        5279       108392   0       0       0       0

The interface does not really go offline. This interface is also not in an error state. Receive still works: I can see incoming CAN frames via candump. Only the transmit path stops working.
My hardware configuration is quite simple: The control unit is directly connected to a Vector CANcase with termination resistors. I can see incoming frames in CANoe and can also transmit frames.

Regards,
Tim
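As a cross-check of the bit-timing values reported in the `ip -s -d link show can1` output, the short sketch below recomputes the bitrate, sample point, and prescaler from `clock`, `tq`, and the segment lengths. The function name `can_bit_timing` and its return layout are made up for this illustration; the only assumption beyond the reported values is the standard single-quantum sync segment.

```python
def can_bit_timing(clock_hz, tq_ns, prop_seg, phase_seg1, phase_seg2):
    """Derive bitrate, sample point and prescaler from CAN bit-timing values."""
    sync_seg = 1  # the sync segment is always one time quantum
    tq_per_bit = sync_seg + prop_seg + phase_seg1 + phase_seg2
    bit_time_ns = tq_ns * tq_per_bit
    return {
        # prescaler: controller clock cycles per time quantum
        "brp": clock_hz * tq_ns // 1_000_000_000,
        "bitrate": 1_000_000_000 // bit_time_ns,
        # the sample point sits at the end of phase-seg1
        "sample_point": (sync_seg + prop_seg + phase_seg1) / tq_per_bit,
    }


# Values taken from the can1 output: clock 50000000, tq 100, segments 3/3/3.
timing = can_bit_timing(50_000_000, 100, 3, 3, 3)
print(timing)  # {'brp': 5, 'bitrate': 1000000, 'sample_point': 0.7}
```

The result matches the reported `bitrate 1000000 sample-point 0.700`, and the prescaler of 5 lies within the sja1000 range `brp 1..64`, so the configured timing is at least self-consistent.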
* Re: Problem using Linux CAN
From: Oliver Hartkopp @ 2015-07-23 17:26 UTC
To: Tim Hotfilter; +Cc: linux-can@vger.kernel.org

Hi Tim,

On 23.07.2015 15:13, Tim Hotfilter wrote:

> Kernel version is 3.19.

ok

> Here is the output of ip link show:
> [root@mcu15 ~]# ip -s -d link show can1
> 3: can1: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 128
>     link/can  promiscuity 0
>     can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
>           bitrate 1000000 sample-point 0.700
>           tq 100 prop-seg 3 phase-seg1 3 phase-seg2 3 sjw 1
>           sja1000: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
>           clock 50000000
>           re-started bus-errors arbit-lost error-warn error-pass bus-off
>           0          0          0          0          0          0
>     RX: bytes  packets  errors  dropped overrun mcast
>     54816      12451    0       0       0       0
>     TX: bytes  packets  errors  dropped carrier collsns
>     5279       108392   0       0       0       0
>
> The interface does not really go offline. This interface is also not in an error state. Receive still works: I can see incoming CAN frames via candump.

ok.

> Only the transmit path stops working.

Hm :-(

> My hardware configuration is quite simple: The control unit is directly connected to a Vector CANcase with termination resistors.
> I can see incoming frames in CANoe and can also transmit frames.

Ah - you are using CANoe with CANcase ...

So here are some ideas.

1. Try it with 500kbit/s with a sampling-point of 80% on both sides.

2. I had a similar CANoe problem in the past. Please try to add *another* CAN node which can ACK your sent frames. E.g. connect one of the other CAN interfaces of your Xilinx board to the existing bus.

3. The error state handling has changed in 3.19. For a test you might revert this commit:
   http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit?id=215db1856e8313ef8a1d9b64346dc261570012a6

Regards,
Oliver
* Re: Problem using Linux CAN
From: Tim Hotfilter @ 2015-07-24 7:53 UTC
To: Oliver Hartkopp; +Cc: linux-can@vger.kernel.org

Hi Oliver,

Thanks for your ideas.

1. 500kBaud results in the same problem.
2. The problem also occurs with the final configuration (about 5 CAN nodes). As I mentioned in the first mail, one node has a CAN-frame watchdog and turns off.
3. I had kernel 3.17 before. For testing I upgraded to 3.19.

I think the problem is the stopped tx queue.
Is there any chance to implement something like a timeout function, which wakes the queue if the buffer fill level is over a threshold?

Regards
Tim
* Re: Problem using Linux CAN
From: Oliver Hartkopp @ 2015-07-25 13:45 UTC
To: Tim Hotfilter; +Cc: linux-can@vger.kernel.org

Hi Tim,

On 24.07.2015 09:53, Tim Hotfilter wrote:

> 1. 500kBaud results in the same problem.
> 2. The problem also occurs with the final configuration (about 5 CAN nodes). As I mentioned in the first mail, one node has a CAN-frame watchdog and turns off.
> 3. I had kernel 3.17 before. For testing I upgraded to 3.19.

Hm.

> I think the problem is the stopped tx queue.

How did you find out that the queue is stopped?

The sja1000 queue is stopped in sja1000_start_xmit() and is enabled again in sja1000_interrupt() when the tx-ok interrupt (IRQ_TI) occurs.

So when the sja1000 has a stopped queue after operating successfully for some time, the tx-ok interrupt obviously got lost.

As you have two sja1000 cores (from opencores?) in your FPGA: Do you have them connected to separate irq lines?

> Is there any chance to implement something like a timeout function, which wakes the queue if the buffer fill level is over a threshold?

It is better to fix the real issue than to implement workarounds like this.

There is a timeout recovery for bus-off states (restart-ms option) - but your device does not get into bus-off.

Regards,
Oliver
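The stop/wake handshake described above can be illustrated with a toy model. This is a simplified sketch, not kernel code: `ToyTxPath`, `frames_until_enobufs`, and `backlog_limit` are hypothetical names, a single hardware TX buffer is assumed, and the way the real network stack turns a full queue into ENOBUFS is reduced to a plain length check. It shows only why a single lost TX-complete interrupt is enough to stall transmission for good while reception is unaffected.

```python
import errno
from collections import deque


class ToyTxPath:
    """Toy model of the tx queue stop/wake handshake (not kernel code)."""

    def __init__(self, backlog_limit):
        self.backlog = deque()
        self.backlog_limit = backlog_limit  # assumed stand-in for the tx queue length
        self.queue_stopped = False
        self.in_hardware = None  # frame sitting in the single TX buffer
        self.on_wire = []

    def send(self, frame):
        """Application write: fails with ENOBUFS once the backlog is full."""
        if len(self.backlog) >= self.backlog_limit:
            raise OSError(errno.ENOBUFS, "No buffer space available")
        self.backlog.append(frame)
        self._start_xmit()

    def _start_xmit(self):
        """Hand one frame to the hardware and stop the queue."""
        if not self.queue_stopped and self.backlog:
            self.in_hardware = self.backlog.popleft()
            self.queue_stopped = True

    def tx_done_irq(self):
        """TX-complete interrupt: record the frame and wake the queue."""
        self.on_wire.append(self.in_hardware)
        self.in_hardware = None
        self.queue_stopped = False
        self._start_xmit()


def frames_until_enobufs(lose_irq_after, backlog_limit=10, max_frames=1000):
    """Send frames; every interrupt from frame `lose_irq_after` onward is lost.

    Returns the index of the first frame rejected with ENOBUFS, or None if
    all `max_frames` frames were accepted.
    """
    path = ToyTxPath(backlog_limit)
    for n in range(max_frames):
        try:
            path.send(n)
        except OSError:
            return n
        if n < lose_irq_after:
            path.tx_done_irq()
    return None
```

In this model `frames_until_enobufs(3)` returns 14: frames 0 to 2 complete normally, frame 3 stays in the hardware buffer because its interrupt never arrives, ten more frames fill the backlog, and the next write fails.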
Thread overview: 8+ messages
2015-07-22 10:49 Problem using Linux CAN Tim Hotfilter
2015-07-22 17:41 ` Oliver Hartkopp
2015-07-22 17:52 ` Tim Hotfilter
2015-07-22 18:39 ` Oliver Hartkopp
[not found] ` <0948A186-060F-4A31-8359-755DE78647A0@osdr.org>
2015-07-23 13:13 ` Tim Hotfilter
2015-07-23 17:26 ` Oliver Hartkopp
2015-07-24 7:53 ` Tim Hotfilter
2015-07-25 13:45 ` Oliver Hartkopp