From mboxrd@z Thu Jan 1 00:00:00 1970 From: Wolfgang Grandegger Subject: Re: CAN messages being lost on i.MX25 with flexcan - continued (was CAN messages being lost on i.MX25 with flexcan - 2012-04-19) Date: Sat, 26 Oct 2013 21:40:00 +0200 Message-ID: <526C1A90.4050005@grandegger.com> References: <526A6B28.4040800@kkmicro.cz> <526AB12C.7090900@grandegger.com> <526C0768.8040903@kkmicro.cz> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Return-path: Received: from ngcobalt02.manitu.net ([217.11.48.102]:35342 "EHLO ngcobalt02.manitu.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753120Ab3JZTkF (ORCPT ); Sat, 26 Oct 2013 15:40:05 -0400 In-Reply-To: <526C0768.8040903@kkmicro.cz> Sender: linux-can-owner@vger.kernel.org List-ID: To: Martin Kozusky , linux-can@vger.kernel.org On 10/26/2013 08:18 PM, Martin Kozusky wrote: > Dne 25.10.2013 19:58, Wolfgang Grandegger napsal(a): >> Hi Martin, >> >> On 10/25/2013 02:59 PM, Martin Kozusky wrote: >>> Dne 24.10.2013 15:48, Martin Kozusky napsal(a): >>>> Hello, >>>> after more than year I'm back with CAN fifo overrun problems on i.MX25 >>>> board. >>>> (it was good enough earlier, if some frames were lost, but not this >>>> time) >>>> >>>> I have 2 flexcan interfaces, each receiving around 1100 msgs/s >>>> (situation is a little better if I use just one iface, but I need both) >>>> I just configure them and then run: >>>> >>>> I=0; while [ $I -le 20 ]; do dd if=/dev/zero of=/mnt/mmcblk0p1/test >>>> bs=512 count=200; sync; sleep 1; I=$(($I+1)); done >>>> (simulate writing to SDcard with 100KB blocks in 1 sec intervals) >>>> >>>> and start sending data from another device. >>>> >>>> I am not running any other program (like candump etc) to read from CAN. >>>> >>>> this is what is shown after I finish sending 35777 packets (both >>>> interfaces now connected to same bus so they should receive same data) >>>> with ip -d -s link show can0/1 >>>> >>>> 2: can0: mtu 16 qdisc pfifo_fast state DOWN qlen 10 >>>> link/can >>>> can state STOPPED (berr-counter tx 0 rx 0) >>>> restart-ms 0 >>>> bitrate 250000 sample-point 0.857 >>>> tq 285 prop-seg 5 phase-seg1 6 phase-seg2 2 sjw 1 >>>> flexcan: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 1..256 brp-inc 1 >>>> clock 66500000 >>>> re-started bus-errors arbit-lost error-warn error-pass bus-off >>>> 0 0 0 1 1 0 >> >> Do you have electrical problems on the bus? Or is reaching error-passive >> not related to this problem? > It is not related to this problem - there is only RX pin connected on > can0 (RX is connected in parallel with Coldfire V1 MCU CAN, which is > doing TX) > > >>>> RX: bytes packets errors dropped overrun mcast >>>> 151769 19000 1699 0 1699 0 >>>> TX: bytes packets errors dropped carrier collsns >>>> 0 0 0 0 0 0 >>>> root@vmx25 /opt/waytracer$ /root/utils/ip -d -s link show can1 >>>> 3: can1: mtu 16 qdisc pfifo_fast state DOWN qlen 10 >>>> link/can >>>> can state STOPPED (berr-counter tx 0 rx 0) restart-ms 0 >>>> bitrate 250000 sample-point 0.857 >>>> tq 285 prop-seg 5 phase-seg1 6 phase-seg2 2 sjw 1 >>>> flexcan: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 1..256 brp-inc 1 >>>> clock 66500000 >>>> re-started bus-errors arbit-lost error-warn error-pass bus-off >>>> 0 0 0 0 0 0 >>>> RX: bytes packets errors dropped overrun mcast >>>> 157377 19696 2664 0 2664 0 >>>> TX: bytes packets errors dropped carrier collsns >>>> 0 0 0 0 0 0 >>>> >>>> >>>> With just one iface used: >>>> >>>> 2: can0: mtu 16 qdisc pfifo_fast state DOWN qlen 10 >>>> link/can >>>> can state STOPPED (berr-counter tx 0 rx 0) >>>> restart-ms 0 >>>> bitrate 250000 sample-point 0.857 >>>> tq 285 prop-seg 5 phase-seg1 6 phase-seg2 2 sjw 1 >>>> flexcan: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 1..256 brp-inc 1 >>>> clock 66500000 >>>> re-started bus-errors arbit-lost error-warn error-pass bus-off >>>> 0 0 0 1 1 0 >>>> RX: bytes packets errors dropped overrun mcast >>>> 233277 29201 1483 0 1483 0 >>>> TX: bytes packets errors dropped carrier collsns >>>> 0 0 0 0 0 0 >>>> >>>> >>>> Too many packets are lost. >>>> >>>> I tried to play with FLEXCAN_NAPI_WEIGHT (quota for napi) and that >>>> didn't hepl too much, if I put it too high then the system response >>>> was slow and packets still lost, also tried to change priority of CAN >>>> interrupts with (don't know if correctly) >>>> // imx_irq_set_priority(43,14); >>>> // imx_irq_set_priority(44,14); >>>> >>>> But it didn't help either. >>>> >>>> >>>> Does anybody have any idea how not to lose any packets? :) >>> >>> >>> Hello, >>> I tried to disable >>> //netif_receive_skb(skb); in flexcan_read_frame() and other functions >>> so that data is not processed further in system >> >> Well ... >> >>> It didn't help. >>> So I tried to put time_start=ktime_get_real() at the begining of >>> flexcan_read_frame(), then time_stop=ktime_get_real(); at the end and >>> add their difference to the global variable >>> time_total+=time_stop-time_start; >>> I divided this time_total by rx_packets count at flexcan_chip_stop and >>> wrote with dev_info into log (variables were initialized in >>> flexcan_chi_start, so I could just do ifconfig can0 up/down and reset >>> those counters and write them to log), so now I had average time spent >>> int flexcan_read_frame. >>> This time it was around 100usec! just with one CAN used, if both were >>> connected, it was more than twice. And many CAN frames were lost. >>> >>> So I tried to disable >>> /* >>> skb = alloc_can_skb(dev, &cf); >>> if (unlikely(!skb)) { >>> stats->rx_dropped++; >>> return 0; >>> } >>> */ >>> and made "struct can_frame cf" (not pointer, so that I can use it in >>> flexcan_read_fifo call) >>> And tried to send data again. >>> Now - average time in flexcan_read_frame was not 100usec, but just 2 >>> usec! 50x less ... no CAN frame was lost, even if I was using both CAN >>> interfaces, each getting over 1100 msgs/sec and writing 100KB data to SD >>> card. >> >> ... but the messages need to be allocated, queued, delivered to and even >> processed by a user space task. What you messure it part of the network >> stack overhead but 100us just for alloc_can_skb() seems quite a lot to >> me. At what frequency is your CPU running? Is the system low of memory? >> Maybe your system is simply not fast enough. To see what code is >> involved just follow: > CPU is i.MX25, should be running at 400MHz. There is 64MB RAM totally > and free enough :( > >> >> http://lxr.free-electrons.com/ident?i=alloc_can_skb >>> So I am asking - how to make this alloc_can_skb faster (or is there any >>> alternative)? Or if there is another way how to get data to user? >> >> Well, not with Linux-CAN. Anyway, messages arrive at a rate of approx. 1 >> KHz. So there is 1ms per message. I think it's a latency problem in the >> first place. The Flexcan on the i.MX25 can queue up to 5 messages. If >> the queue is full you loose messages. This obviously happens when the >> SDcard is accessed. >> >> Could you take function traces on your system? > Is there any special tool for this or should I use my start/stop timers? Your start/stop timers will not show what other activity is disturbing the CAN messages reception. There is the Linux function tracer: http://lxr.free-electrons.com/source/Documentation/trace/ftrace.txt It need to be enabled in the kernel. Especially event and function tracing could help to better understand your problems. Wolfgang.