From: Tom Evans <tom_usenet@optusnet.com.au>
To: Holger Schurig <holgerschurig@gmail.com>,
Marc Kleine-Budde <mkl@pengutronix.de>
Cc: linux-can@vger.kernel.org
Subject: Re: CAN question: how to trace frame errors?
Date: Fri, 12 Jun 2015 22:33:17 +1000
Message-ID: <557AD18D.8010807@optusnet.com.au>
In-Reply-To: <CAOpc7mH-xD+eesJNjhpFwUSeMwGCdsRK8Q9AZSjmE2uw=cBgsw@mail.gmail.com>
On 12/06/2015 8:01 PM, Holger Schurig wrote:
>> These are probably RX-FIFO overrun errors:
>
> You're right.
>
> 13:03:16 kernel: ##HS flexcan_irq FIFO overrun
>
> That really puzzles me. The errors occurred originally when someone
> used a Kvaser to generate 80% load at 500 kbit/s. In my tests, I used 1
> Mbit/s and cangen (no Kvaser here). The i.MX6 was otherwise idle:
Idle has nothing to do with it. The problem is the latency.
Let's do the maths.
The worst case CAN message arrival rate is with standard ID CAN messages
with zero bytes of data. These are about 50 bits long. So at 500 kbit/s they
arrive at 10 kHz, or one every 100 us. FlexCAN has a six-message FIFO. So it
can receive six without overflowing, but the seventh will blow it. So it
has to (start to) be unloaded within 700us.
Since the CPU is running at 800 MHz, that's only 560,000 clock cycles
(roughly as many instructions) to respond to an interrupt. And it isn't
managing it. That's what "Not Real Time" means - half a million
instructions isn't enough time.
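To make the arithmetic above easy to repeat for other bit rates, here is a minimal Python sketch. The function name fifo_budget is made up for illustration; the numbers are the ones quoted above (about 50 bits per frame, 500 kbit/s, six-message FIFO, 800 MHz CPU).

```python
# Rough budget for servicing the FlexCAN RX FIFO, using the figures from the text.
FRAME_BITS = 50        # approx. length of a standard-ID frame with no data
BITRATE = 500_000      # bits per second
FIFO_DEPTH = 6         # messages the hardware FIFO can hold
CPU_HZ = 800_000_000   # CPU clock in Hz


def fifo_budget(frame_bits, bitrate, fifo_depth, cpu_hz):
    """Return (frame_time_s, deadline_s, cpu_cycles) for back-to-back frames."""
    frame_time = frame_bits / bitrate
    # The FIFO overflows when frame number fifo_depth + 1 arrives.
    deadline = (fifo_depth + 1) * frame_time
    cycles = round(deadline * cpu_hz)
    return frame_time, deadline, cycles


frame_time, deadline, cycles = fifo_budget(FRAME_BITS, BITRATE, FIFO_DEPTH, CPU_HZ)
print(f"frame time {frame_time * 1e6:.0f} us, "
      f"deadline {deadline * 1e6:.0f} us, {cycles} cycles")
```

With the values above this reports 100 us per frame, a 700 us deadline and 560000 cycles. Passing 1_000_000 as the bit rate halves the deadline to 350 us.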
Except it isn't "respond to an interrupt" as the FlexCAN driver doesn't
receive the 8-byte messages during the interrupt. It schedules a NAPI
service routine to read the data, and that routine can easily be
delayed that long waiting for a slot.
Do you have a Frame Buffer? Is it write-through or cache-flushed? I've
read that a flush of a frame-buffer-sized chunk of memory can take the
L2 cache a very long time to complete. Think MILLIseconds. That locks up
ALL CORES unless the other ones are lucky enough to stay inside both
their L1 caches.
We couldn't handle CAN losing data (and we run at 1 Mbit/s) so I rewrote the
3.4 FlexCAN driver to unload the FIFO into a ring buffer during
interrupts and to have the NAPI routine unload that. No problems since then.
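The idea, as opposed to the actual driver code (which is C inside the kernel and is not shown here), can be modelled in a few lines of Python. Everything below is a hypothetical sketch: RxRing, irq_handler and napi_poll are names invented for illustration, and the model ignores the locking and memory-ordering concerns a real interrupt handler has to deal with.

```python
from collections import deque


class RxRing:
    """Fixed-size single-producer/single-consumer ring (illustrative model)."""

    def __init__(self, size):
        self.slots = [None] * size
        self.size = size
        self.head = 0      # next position the "interrupt" writes
        self.tail = 0      # next position the "NAPI poll" reads
        self.dropped = 0   # frames lost because the ring was full

    def push(self, frame):
        """Store one frame; count a drop and return False if the ring is full."""
        if self.head - self.tail == self.size:
            self.dropped += 1
            return False
        self.slots[self.head % self.size] = frame
        self.head += 1
        return True

    def pop(self):
        """Return the oldest frame, or None if the ring is empty."""
        if self.head == self.tail:
            return None
        frame = self.slots[self.tail % self.size]
        self.tail += 1
        return frame


def irq_handler(hw_fifo, ring):
    """Empty the (modelled) hardware FIFO into the ring and do nothing else."""
    while hw_fifo:
        ring.push(hw_fifo.popleft())


def napi_poll(ring, budget):
    """Deliver up to `budget` frames from the ring and return them as a list."""
    out = []
    while len(out) < budget:
        frame = ring.pop()
        if frame is None:
            break
        out.append(frame)
    return out


# Demo: five bursts of six frames, each drained by an "interrupt" at once,
# then one late "NAPI poll" that still finds all 30 frames in order.
ring = RxRing(64)
hw_fifo = deque()
for burst in range(5):
    hw_fifo.extend(range(burst * 6, burst * 6 + 6))
    irq_handler(hw_fifo, ring)
delivered = napi_poll(ring, budget=64)
print(len(delivered), ring.dropped)  # prints: 30 0
```

The point of the split is that the interrupt side only copies frames, so the six-message hardware FIFO is emptied quickly, while the slower delivery work can run late without losing data as long as the ring is deep enough.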
> kernel is 3.18.14
Then you SHOULD be better off than we were, running 3.4. In that version
the FlexCAN driver used NAPI (and always has) while the Ethernet driver
didn't; it would happily try to unload 100 Ethernet packets all the way
to the network layer in the interrupt routine, blocking the FlexCAN
interrupts and the NAPI run.
So check your Ethernet driver and see if it uses NAPI and if there's any
work-limiting in the interrupt routine.
Flood ping (ping -f -l 20) one of the Ethernet interfaces and see if that
makes the overruns worse during your CAN test.
Run cangen with an 8-byte CAN packet and see if the lower arrival rate
fixes it. Change the message length and see if you can use that to
"measure the latency".
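One way to read "measure the latency" is as a bracket: if overruns still happen at one payload length but stop at a longer one, the worst service delay lies between the two FIFO-fill times. The sketch below assumes that reading. It counts a standard-ID data frame as 44 bits plus 3 bits of interframe space plus 8 bits per data byte and ignores stuff bits, so real frames are slightly longer. The function names and the payload lengths in the example call are made up, not measurements.

```python
def frame_bits(dlc):
    """Bits on the wire for a standard-ID data frame, ignoring stuff bits."""
    if not 0 <= dlc <= 8:
        raise ValueError("classic CAN carries 0 to 8 data bytes")
    return 44 + 3 + 8 * dlc


def overrun_deadline_us(dlc, bitrate, fifo_depth=6):
    """Microseconds until frame number fifo_depth + 1 arrives back-to-back."""
    return (fifo_depth + 1) * frame_bits(dlc) * 1_000_000 / bitrate


def latency_bracket_us(dlc_overrun, dlc_clean, bitrate, fifo_depth=6):
    """Bracket the worst service delay between two test runs.

    dlc_overrun: longest payload that still produced overruns (assumed input)
    dlc_clean:   shortest payload that ran without overruns (assumed input)
    """
    return (overrun_deadline_us(dlc_overrun, bitrate, fifo_depth),
            overrun_deadline_us(dlc_clean, bitrate, fifo_depth))


# Deadlines at 1 Mbit/s for the shortest and longest classic frames.
for dlc in (0, 8):
    print(dlc, overrun_deadline_us(dlc, 1_000_000))

# Hypothetical outcome: overruns with 2 data bytes, none with 4.
print(latency_bracket_us(2, 4, 1_000_000))
```

At 1 Mbit/s this gives 329 us for zero-byte frames and 777 us for eight-byte frames, so the payload length only lets you probe delays in roughly that range.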
Then see what other interrupts you're getting. Try:
    while true; do cat /proc/interrupts; sleep 1; done
See which interrupt counts (apart from FlexCAN) are increasing quickly.
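Spotting the fast-moving counters by eye gets tedious, so here is a small Python sketch that takes two snapshots of /proc/interrupts and lists the biggest increases. The function names are invented for illustration, and the parser assumes the usual layout (a header row of CPU names, then one row per interrupt with per-CPU counts followed by a description); rows with fewer counts, such as error counters, are tolerated.

```python
import time


def parse_interrupts(text):
    """Map each interrupt label to (total count over all CPUs, description)."""
    lines = text.strip("\n").splitlines()
    ncpu = len(lines[0].split())          # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        for field in fields[:ncpu]:
            if not field.isdigit():
                break
            counts.append(int(field))
        desc = " ".join(fields[len(counts):])
        totals[label.strip()] = (sum(counts), desc)
    return totals


def busiest(before, after, top=5):
    """Return up to `top` (delta, label, description) tuples, largest first."""
    deltas = []
    for label, (count, desc) in after.items():
        delta = count - before.get(label, (0, ""))[0]
        if delta > 0:
            deltas.append((delta, label, desc))
    deltas.sort(reverse=True)
    return deltas[:top]


def sample_busiest(interval=1.0, path="/proc/interrupts", top=5):
    """Read the file twice, `interval` seconds apart, and compare."""
    with open(path) as f:
        first = parse_interrupts(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_interrupts(f.read())
    return busiest(first, second, top)
```

Calling sample_busiest() on the target while the CAN test runs returns the interrupts whose counts grew most during the interval, which is the same information as the shell loop but already sorted.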
Then you'll want to ftrace the kernel (CONFIG_FTRACE and
CONFIG_FUNCTION_TRACER, plus CONFIG_PERF_EVENTS if you want perf). The
kernel tracing is very good and very powerful. Learning how to run it is
a skill well worth having. You want to find where it is spending its time
between the interrupt and the NAPI run.
In our case we found our kernel was spending the majority of its time in
mutex, spinlock and slub debugging. It got six times faster when I got
rid of those.
Tom