linux-can.vger.kernel.org archive mirror
From: Vlastimil Setka <setka@vstk.cz>
To: Marc Kleine-Budde <mkl@pengutronix.de>,
	Robert Schwebel <r.schwebel@pengutronix.de>
Cc: rfi@lists.rocketboards.org, linux-can <linux-can@vger.kernel.org>
Subject: Re: [Rfi] Cyclone V CAN errors when application pinned to CPU1
Date: Sat, 6 Feb 2016 18:59:45 +0100
Message-ID: <56B63491.9020500@vstk.cz>
In-Reply-To: <5625EF45.2000807@pengutronix.de>

Hi, I have only now been able to get back to this issue.

On 20.10.2015 9:37, Marc Kleine-Budde wrote:
>> On Fri, Oct 16, 2015 at 09:53:27PM +0200, Vlastimil Setka wrote:
>>> We discovered very weird behaviour of the CAN controller in the Cyclone V SoC
>>> with the Linux SocketCAN stack. The problem was first seen on 3.10-ltsi a
>>> few months ago, and now again on 3.18 from the Altera GitHub tree (with the
>>> RT preempt patch applied).
>> Could you check whether the issue also happens with a recent mainline
>> kernel? RT is available for 4.1, so that would be a good choice.
The tests have now been done on a 4.3 kernel with a few Altera patches and without the RT patch, built from: https://github.com/altera-opensource/linux-socfpga/tree/rel_socfpga-4.3_16.02.01_pr
> Which CAN driver are you using?
The CAN controller of the affected system is embedded in the Altera Cyclone V SoC; it is an implementation of the standard D_CAN core. It uses the mainline c_can driver: https://github.com/altera-opensource/linux-socfpga/tree/rel_socfpga-4.3_16.02.01_pr/drivers/net/can/c_can
>>> We have a Linux application which sends data periodically (1 to 20 ms
>>> period) over the can0 SocketCAN interface. Sometimes the first
>>> data byte in the CAN frame is zero on the wire, but non-zero in the
>>> data sent! With this period it happens at random times, but it can
>>> always be reproduced within a few minutes.
> How do you measure the CAN frames on the wire?
Using another Linux system running:
    candump can0,0:0,#FFFFFFFF | grep -v "11 22 33 44 55 66 77 88"
We also verified the results concurrently with a USB P-CAN adapter on another system running a proprietary monitoring application, and with an oscilloscope with a logic analyzer.
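For reference, the candump filter spec can0,0:0,#FFFFFFFF means can_id 0 with can_mask 0 (accept every frame) plus an error mask that subscribes to all error-frame classes, and the grep -v then drops every frame carrying the good pattern. A minimal C sketch of the same monitor, assuming a Linux host with the SocketCAN headers; open_monitor_socket() and frame_is_good() are hypothetical names, not part of candump:

```c
#include <sys/socket.h>
#include <net/if.h>
#include <linux/can.h>
#include <linux/can/raw.h>
#include <string.h>
#include <unistd.h>

/* Test pattern sent by the test program (the string grep -v filters out). */
static const unsigned char expected_payload[8] = {
	0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88
};

/* Raw CAN socket with the same filters as "candump can0,0:0,#FFFFFFFF".
 * The kernel masks the error filter with CAN_ERR_MASK anyway, so passing
 * CAN_ERR_MASK selects the same error classes as 0xFFFFFFFF. */
int open_monitor_socket(const char *ifname)
{
	struct can_filter accept_all = { .can_id = 0, .can_mask = 0 };
	can_err_mask_t err_mask = CAN_ERR_MASK;
	struct sockaddr_can addr = { .can_family = AF_CAN };
	int s = socket(PF_CAN, SOCK_RAW, CAN_RAW);

	if (s < 0)
		return -1;
	addr.can_ifindex = if_nametoindex(ifname);
	if (addr.can_ifindex == 0 ||
	    setsockopt(s, SOL_CAN_RAW, CAN_RAW_FILTER,
		       &accept_all, sizeof(accept_all)) < 0 ||
	    setsockopt(s, SOL_CAN_RAW, CAN_RAW_ERR_FILTER,
		       &err_mask, sizeof(err_mask)) < 0 ||
	    bind(s, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(s);
		return -1;
	}
	return s;
}

/* C equivalent of the grep -v: true only for an 8-byte data frame that
 * carries exactly the expected pattern. Error frames never count as good. */
int frame_is_good(const struct can_frame *f)
{
	if (f->can_id & CAN_ERR_FLAG)
		return 0;
	return f->can_dlc == 8 &&
	       memcmp(f->data, expected_payload, sizeof(expected_payload)) == 0;
}
```

A receive loop would then read() struct can_frame objects from this socket and print every frame for which frame_is_good() returns 0.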
>>> The problem only appears when the application is pinned to CPU1 by the
>>> Linux process affinity mechanism. When it is pinned to the default CPU0,
>>> there is no problem.
>>>
>>> Has anyone seen this issue? Any idea how to debug it and what the
>>> cause could be? What version (git repo / tag) of Linux should I use?
>>>
>>> We plan to do some in-depth evaluation and testing, but I wanted to
>>> share the experience now.
> The TX function is usually pretty straightforward. Copy all data bytes
> into the hardware, write ID and DLC, then hit the send bit (or whatever
> triggers the hardware to send the frame). Maybe there's some barrier
> missing in this sequence? Where can I find your driver?
The driver is c_can from mainline (see above).
>> Is your test program available somewhere?
The test program is very simple C code that cyclically writes to a CAN socket. It is available here: https://gist.github.com/vstk/f9e4b7c5646bedfedd42
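The gist itself is not reproduced here. As a rough illustration only, a cyclic SocketCAN sender along these lines could look like the following. The CAN ID 0x266 and the pattern are taken from the candump output in this mail, while fill_test_frame(), open_tx_socket() and send_cyclic() are hypothetical names, not taken from canbench:

```c
#define _GNU_SOURCE
#include <sys/socket.h>
#include <net/if.h>
#include <linux/can.h>
#include <linux/can/raw.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define TEST_CAN_ID 0x266   /* as seen in the candump output */

static const unsigned char test_pattern[8] = {
	0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88
};

void fill_test_frame(struct can_frame *f)
{
	memset(f, 0, sizeof(*f));
	f->can_id = TEST_CAN_ID;
	f->can_dlc = sizeof(test_pattern);
	memcpy(f->data, test_pattern, sizeof(test_pattern));
}

int open_tx_socket(const char *ifname)
{
	struct sockaddr_can addr = { .can_family = AF_CAN };
	int s = socket(PF_CAN, SOCK_RAW, CAN_RAW);

	if (s < 0)
		return -1;
	addr.can_ifindex = if_nametoindex(ifname);
	if (addr.can_ifindex == 0 ||
	    bind(s, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(s);
		return -1;
	}
	return s;
}

/* Send up to count frames, one every interval_us microseconds. Absolute
 * deadlines (TIMER_ABSTIME) keep the period from drifting by the time
 * spent in write(). Returns the number of frames actually written. */
long send_cyclic(int s, long interval_us, long count)
{
	struct can_frame f;
	struct timespec next;
	long sent = 0;

	fill_test_frame(&f);
	clock_gettime(CLOCK_MONOTONIC, &next);
	while (sent < count) {
		next.tv_nsec += interval_us * 1000L;
		while (next.tv_nsec >= 1000000000L) {
			next.tv_nsec -= 1000000000L;
			next.tv_sec++;
		}
		clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
		if (write(s, &f, sizeof(f)) != (ssize_t)sizeof(f))
			break;   /* e.g. ENOBUFS when the TX queue is full */
		sent++;
	}
	return sent;
}
```

Usage would be something like s = open_tx_socket("can0") followed by send_cyclic(s, 1000, n) for a 1 ms period. A long-running sender would probably retry on ENOBUFS instead of stopping.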

We have now discovered that the problem is not triggered by CPU pinning, as we initially thought. It can be reproducibly triggered by high network load on Ethernet, generated for example by iperf. Other load on the CPU or on the data flash does not affect it. The only configuration in which everything is OK is when the test program is executed with a high (80) RT priority on an RT-patched kernel and the whole system is limited to a single CPU by the isolcpus=1 kernel option (this case initially pointed us to an SMP-related problem). Errors are most common when the test program is pinned to CPU1 by taskset and the rest of the system is limited to CPU0.
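For completeness, the pinning and priority part of that setup can also be done from inside the program. A minimal sketch, assuming Linux and glibc; pin_and_prioritize() is a hypothetical helper, roughly equivalent to running the binary under taskset -c <cpu> and chrt -f <prio>. The isolcpus=1 part is a boot parameter and cannot be set this way.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>

/* Pin the calling thread to one CPU and switch it to SCHED_FIFO with the
 * given priority. Returns 0 on success or a negative errno value. */
int pin_and_prioritize(int cpu, int prio)
{
	cpu_set_t set;
	struct sched_param sp;

	/* Validate arguments before touching the scheduler. */
	if (cpu < 0 || cpu >= CPU_SETSIZE)
		return -EINVAL;
	if (prio < sched_get_priority_min(SCHED_FIFO) ||
	    prio > sched_get_priority_max(SCHED_FIFO))
		return -EINVAL;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	if (sched_setaffinity(0, sizeof(set), &set) < 0)
		return -errno;

	sp.sched_priority = prio;
	if (sched_setscheduler(0, SCHED_FIFO, &sp) < 0)
		return -errno;   /* typically -EPERM without CAP_SYS_NICE */
	return 0;
}
```

With the reported setup this would be called as pin_and_prioritize(1, 80) before entering the send loop.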

Steps to reproduce:

- On the monitoring Linux system connected to the CAN bus, configure the can0 interface and run:

root@monitor:~# ip link set can0 type can bitrate 1000000 restart-ms 100
root@monitor:~# ip link set can0 up
root@monitor:~# candump can0,0:0,#FFFFFFFF | grep -v "11 22 33 44 55 66 77 88"

- On the system under test, configure the can0 interface, build the test program (https://gist.github.com/vstk/f9e4b7c5646bedfedd42) and run it:

root@cyclone-soc:~# ./canbench
# canbench [ <rt-prio> [ <interval-us> ] ]
iface can0 at index 2 OK
interval: 1.000000 ms

- Generate network load on the system under test:

root@cyclone-soc:~# iperf -c 192.168.1.1

- Data errors appear on the monitoring system within a few seconds ("11 22 33 44 55 66 77 88" is the test pattern generated by the test program):

root@monitor:~# candump can0,0:0,#FFFFFFFF | grep -v "11 22 33 44 55 66 77 88"
  can0  266   [8]  11 22 33 44 00 00 77 88
  can0  266   [8]  11 22 33 44 01 00 77 88
  can0  266   [8]  11 22 33 44 01 00 77 88
  can0  266   [8]  0E 00 33 44 55 66 77 88
  can0  266   [8]  11 22 33 44 01 00 77 88
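In the five frames above, the corrupted bytes come in aligned pairs (bytes 0-1 or 4-5). When collecting more samples, a small helper that turns each received payload into a bit mask of wrong positions makes such a pattern easy to tally. corrupted_byte_mask() below is a hypothetical helper, not part of can-utils:

```c
#include <stdint.h>

/* Test pattern generated by the test program. */
static const uint8_t pattern[8] = {
	0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88
};

/* Bit i of the result is set when payload byte i differs from the pattern;
 * 0 means the frame is good. */
unsigned corrupted_byte_mask(const uint8_t data[8])
{
	unsigned mask = 0;
	int i;

	for (i = 0; i < 8; i++)
		if (data[i] != pattern[i])
			mask |= 1u << i;
	return mask;
}
```

For the frames above this yields 0x30 for the "11 22 33 44 00 00 77 88" and "11 22 33 44 01 00 77 88" lines and 0x03 for "0E 00 33 44 55 66 77 88".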

As a next step, I plan to check the data inside the driver just before it is written to the hardware, to rule out an error in the network stack above the driver. Any other ideas?

Vlastimil Setka

