From: Martin Kozusky <mkozusky@kkmicro.cz>
To: linux-can@vger.kernel.org
Cc: Wolfgang Grandegger <wg@grandegger.com>
Subject: Re: CAN messages being lost on i.MX25 with flexcan - continued (was CAN messages being lost on i.MX25 with flexcan - 2012-04-19)
Date: Tue, 29 Oct 2013 13:49:02 +0100
Message-ID: <526FAEBE.20808@kkmicro.cz>
In-Reply-To: <526FA899.2070708@grandegger.com>
On 29.10.2013 13:22, Wolfgang Grandegger wrote:
> On 10/29/2013 01:03 PM, Wolfgang Grandegger wrote:
>> On 10/29/2013 11:46 AM, Martin Kozusky wrote:
>>> On 26.10.2013 21:40, Wolfgang Grandegger wrote:
>>>> On 10/26/2013 08:18 PM, Martin Kozusky wrote:
>>>>> On 25.10.2013 19:58, Wolfgang Grandegger wrote:
>>>>>> Hi Martin,
>>>>>>
>>>>>> On 10/25/2013 02:59 PM, Martin Kozusky wrote:
>>>>>>> On 24.10.2013 15:48, Martin Kozusky wrote:
>>>>>>>> Hello,
>>>>>>>> after more than a year I'm back with CAN FIFO overrun problems on
>>>>>>>> an i.MX25
>>>>>>>> board.
>>>>>>>> (Earlier it was good enough even if some frames were lost, but not this
>>>>>>>> time.)
>>>>>>>>
>>>>>>>> I have 2 flexcan interfaces, each receiving around 1100 msgs/s
>>>>>>>> (situation is a little better if I use just one iface, but I need
>>>>>>>> both)
>>>>>>>> I just configure them and then run:
>>>>>>>>
>>>>>>>> I=0; while [ $I -le 20 ]; do dd if=/dev/zero of=/mnt/mmcblk0p1/test
>>>>>>>> bs=512 count=200; sync; sleep 1; I=$(($I+1)); done
>>>>>>>> (simulate writing to SDcard with 100KB blocks in 1 sec intervals)
>>>>>>>>
>>>>>>>> and start sending data from another device.
>>>>>>>>
>>>>>>>> I am not running any other program (like candump etc) to read from
>>>>>>>> CAN.
>>>>>>>>
>>>>>>>> this is what is shown after I finish sending 35777 packets (both
>>>>>>>> interfaces now connected to same bus so they should receive same
>>>>>>>> data)
>>>>>>>> with ip -d -s link show can0/1
>>>>>>>>
>>>>>>>> 2: can0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN qlen 10
>>>>>>>> link/can
>>>>>>>> can <LISTEN-ONLY> state STOPPED (berr-counter tx 0 rx 0)
>>>>>>>> restart-ms 0
>>>>>>>> bitrate 250000 sample-point 0.857
>>>>>>>> tq 285 prop-seg 5 phase-seg1 6 phase-seg2 2 sjw 1
>>>>>>>> flexcan: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 1..256 brp-inc 1
>>>>>>>> clock 66500000
>>>>>>>> re-started bus-errors arbit-lost error-warn error-pass bus-off
>>>>>>>> 0 0 0 1 1 0
>>>>>>
>>>>>> Do you have electrical problems on the bus? Or is reaching
>>>>>> error-passive
>>>>>> not related to this problem?
>>>>> It is not related to this problem - there is only RX pin connected on
>>>>> can0 (RX is connected in parallel with Coldfire V1 MCU CAN, which is
>>>>> doing TX)
>>>>>
>>>>>
>>>>>>>> RX: bytes packets errors dropped overrun mcast
>>>>>>>> 151769 19000 1699 0 1699 0
>>>>>>>> TX: bytes packets errors dropped carrier collsns
>>>>>>>> 0 0 0 0 0 0
>>>>>>>> root@vmx25 /opt/waytracer$ /root/utils/ip -d -s link show can1
>>>>>>>> 3: can1: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN qlen 10
>>>>>>>> link/can
>>>>>>>> can state STOPPED (berr-counter tx 0 rx 0) restart-ms 0
>>>>>>>> bitrate 250000 sample-point 0.857
>>>>>>>> tq 285 prop-seg 5 phase-seg1 6 phase-seg2 2 sjw 1
>>>>>>>> flexcan: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 1..256 brp-inc 1
>>>>>>>> clock 66500000
>>>>>>>> re-started bus-errors arbit-lost error-warn error-pass bus-off
>>>>>>>> 0 0 0 0 0 0
>>>>>>>> RX: bytes packets errors dropped overrun mcast
>>>>>>>> 157377 19696 2664 0 2664 0
>>>>>>>> TX: bytes packets errors dropped carrier collsns
>>>>>>>> 0 0 0 0 0 0
>>>>>>>>
>>>>>>>>
>>>>>>>> With just one iface used:
>>>>>>>>
>>>>>>>> 2: can0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN qlen 10
>>>>>>>> link/can
>>>>>>>> can <LISTEN-ONLY> state STOPPED (berr-counter tx 0 rx 0)
>>>>>>>> restart-ms 0
>>>>>>>> bitrate 250000 sample-point 0.857
>>>>>>>> tq 285 prop-seg 5 phase-seg1 6 phase-seg2 2 sjw 1
>>>>>>>> flexcan: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 1..256 brp-inc 1
>>>>>>>> clock 66500000
>>>>>>>> re-started bus-errors arbit-lost error-warn error-pass bus-off
>>>>>>>> 0 0 0 1 1 0
>>>>>>>> RX: bytes packets errors dropped overrun mcast
>>>>>>>> 233277 29201 1483 0 1483 0
>>>>>>>> TX: bytes packets errors dropped carrier collsns
>>>>>>>> 0 0 0 0 0 0
>>>>>>>>
>>>>>>>>
>>>>>>>> Too many packets are lost.
>>>>>>>>
>>>>>>>> I tried to play with FLEXCAN_NAPI_WEIGHT (the NAPI quota), but that
>>>>>>>> didn't help much: if I set it too high, the system response was
>>>>>>>> slow and packets were still lost. I also tried to change the priority
>>>>>>>> of the CAN interrupts (I don't know if I did it correctly) with
>>>>>>>> // imx_irq_set_priority(43,14);
>>>>>>>> // imx_irq_set_priority(44,14);
>>>>>>>>
>>>>>>>> But it didn't help either.
>>>>>>>>
>>>>>>>>
>>>>>>>> Does anybody have any idea how not to lose any packets? :)
>>>>>>>
>>>>>>>
>>>>>>> Hello,
>>>>>>> I tried to disable
>>>>>>> //netif_receive_skb(skb); in flexcan_read_frame() and other functions
>>>>>>> so that data is not processed further in system
>>>>>>
>>>>>> Well ...
>>>>>>
>>>>>>> It didn't help.
>>>>>>> So I tried to put time_start=ktime_get_real() at the beginning of
>>>>>>> flexcan_read_frame(), then time_stop=ktime_get_real(); at the end and
>>>>>>> add their difference to the global variable
>>>>>>> time_total+=time_stop-time_start;
>>>>>>> I divided this time_total by the rx_packets count in flexcan_chip_stop
>>>>>>> and wrote it to the log with dev_info (the variables were initialized
>>>>>>> in flexcan_chip_start, so I could just do ifconfig can0 up/down to
>>>>>>> reset those counters and write them to the log), so now I had the
>>>>>>> average time spent in flexcan_read_frame.
>>>>>>> This time was around 100usec with just one CAN interface in use! With
>>>>>>> both connected, it was more than twice that. And many CAN frames were lost.
>>>>>>>
>>>>>>> So I tried to disable
>>>>>>> /*
>>>>>>> skb = alloc_can_skb(dev, &cf);
>>>>>>> if (unlikely(!skb)) {
>>>>>>> stats->rx_dropped++;
>>>>>>> return 0;
>>>>>>> }
>>>>>>> */
>>>>>>> and made "struct can_frame cf" (not pointer, so that I can use it in
>>>>>>> flexcan_read_fifo call)
>>>>>>> And tried to send data again.
>>>>>>> Now - average time in flexcan_read_frame was not 100usec, but just 2
>>>>>>> usec! 50x less ... no CAN frame was lost, even if I was using both
>>>>>>> CAN
>>>>>>> interfaces, each getting over 1100 msgs/sec and writing 100KB data
>>>>>>> to SD
>>>>>>> card.
>>>>>>
>>>>>> ... but the messages need to be allocated, queued, delivered to and
>>>>>> even
>>>>>> processed by a user space task. What you measure is part of the network
>>>>>> stack overhead, but 100us just for alloc_can_skb() seems quite a lot to
>>>>>> me. At what frequency is your CPU running? Is the system low on memory?
>>>>>> Maybe your system is simply not fast enough. To see what code is
>>>>>> involved just follow:
>>>>> The CPU is an i.MX25 and should be running at 400MHz. There is 64MB of
>>>>> RAM in total, and enough of it is free :(
>>>>>
>>>>>>
>>>>>> http://lxr.free-electrons.com/ident?i=alloc_can_skb
>>>>>>> So I am asking - how to make this alloc_can_skb faster (or is there
>>>>>>> any
>>>>>>> alternative)? Or if there is another way how to get data to user?
>>>>>>
>>>>>> Well, not with Linux-CAN. Anyway, messages arrive at a rate of
>>>>>> approx. 1
>>>>>> kHz. So there is 1ms per message. I think it's a latency problem in the
>>>>>> first place. The Flexcan on the i.MX25 can queue up to 5 messages. If
>>>>>> the queue is full you lose messages. This obviously happens when the
>>>>>> SDcard is accessed.
>>>>>>
>>>>>> Could you take function traces on your system?
>>>>> Is there any special tool for this or should I use my start/stop timers?
>>>>
>>>> Your start/stop timers will not show what other activity is disturbing
>>>> the CAN messages reception. There is the Linux function tracer:
>>>>
>>>> http://lxr.free-electrons.com/source/Documentation/trace/ftrace.txt
>>>>
>>>> It needs to be enabled in the kernel. Especially event and function
>>>> tracing could help to better understand your problems.
>>> Hello Wolfgang,
>>> it seems that my architecture (arm/mx25 on a 2.6.35 kernel) is missing
>>> the HAVE_FUNCTION_GRAPH_TRACER and HAVE_DYNAMIC_FTRACE options, so it
>>> won't be that easy, will it?
>>> The timestamps that ftrace shows me have 10-millisecond resolution,
>>> which won't help me much :(
>>
>> Probably that version is too old for proper ftrace support. The 100us you
>> measured for alloc_can_skb() is worst case, right? What is the mean value?
>
> Flexcan support was added to the mainline kernel 2.6.36. Where did you
> get your flexcan driver from? Could you post it please here? Any chance
> to switch to a (more) recent version of the Linux kernel?
It is a 2.6.35.9 kernel. I think flexcan was backported from 2.6.36 by the board developer who made the patch for this kernel, but I am keeping it "updated" with the latest changes from the 3.x kernel, so I think there should be no errors in this driver.
I tried to switch to 2.6.39, but I think there were some errors in the kernel patch that adds support for this board, so I gave up. Maybe I should try again.
Martin
> Wolfgang.