From: Oliver Hartkopp <socketcan@hartkopp.net>
To: Tom Evans <tom_usenet@optusnet.com.au>,
	Stephane Grosjean <s.grosjean@peak-system.com>,
	Marc Kleine-Budde <mkl@pengutronix.de>
Cc: "linux-can@vger.kernel.org" <linux-can@vger.kernel.org>,
	Manfred Schlaegl <manfred.schlaegl@gmx.at>
Subject: Re: [BULK]Re: [PATCH] can: fix loss of frames due to wrong assumption in raw_rcv
Date: Sun, 05 Jul 2015 20:21:22 +0200
Message-ID: <559975A2.9020300@hartkopp.net>
In-Reply-To: <559885DC.8040208@optusnet.com.au>

On 05.07.2015 03:18, Tom Evans wrote:
> On 5/07/2015 2:54 AM, Oliver Hartkopp wrote:
>> Hi Stephane,
>> ...
>> While testing the patches
>> ...
>> I discovered an increase of out-of-order CAN frame receptions.
>> My setup is a core i7 with a PCAN USB and a PCAN USB pro connected to my
>> full busload CAN source (1MBit/s, ~8008 frames/s).
> 
> Out of order reception is guaranteed with some CAN hardware and driver 
> software, such as the MCP2515 controller and Linux. The chip doesn't implement 
> a FIFO, but has two receive buffers which can give message swaps quite easily. 
> This can be fixed in the driver, but nobody has. Details here:
> 
> http://www.microchip.com/forums/m620741.aspx
> 

Ugh. When reading "Out of order reception is guaranteed" I assumed a typo %-(
I don't have any MCP2515 hardware here. Any volunteers out there to fix that?

> The PCAN-USB uses an SJA1000 which doesn't have that problem. It has a 64 byte 
> FIFO. What is inside the PCAN-USB Pro isn't documented on their web page, but 
> it may be faster or have less transaction overhead or latency or something. 

IIRC it's some NXP LPC17xx CPU with two CAN interfaces. And yes, it should be
faster than the SJA1000/C161 combo inside the PCAN-USB.

> The PCAN-USB Pro is "no longer manufactured", the replacement "PCAN-USB Pro 
> FD" has an FPGA controller.

I did my first tests with the PCAN-USB Pro FD - but to check the effect in
kernel versions < v4.0 I swapped over to the standard USB Pro due to the
missing FD support in older kernels.

>  > It's more with the PCAN USB and very few with PCAN USB pro.
> 
> I'd guess the Pro has less overhead and can get messages over USB faster than 
> the other one.
> 
>  > I'm a bit confused as this effect seems to increase with Linux kernel
>  > version numbers.
> 
> As for the later kernels being worse, that looks like a simple case of 
> "bloat", with them taking longer to get around to servicing the interrupts and 
> reading the messages. Earlier ones are probably reading CAN messages one at a 
> time, with each one getting through the stack before the next one arrives. 
> Later kernels are probably reading them in bursts. Slower controllers 
> (PCAN-USB) expose this sooner.
> 
> Can you drop back to a single core to see if this is a multicore problem? It 
> will either fix it or make it worse if it is a loading/delay problem.

Good idea!

I took my old 2005 Samsung X20 with 1.73GHz Pentium M and Xubuntu 14.04 ...

Neither the stock Xubuntu 3.13 kernel nor 4.1.1 showed the out-of-order
issue. There were 'only' two sporadic drops with the PCAN-USB in more than
three hours of testing:

 drop detected: expected 224 received 18   (50 frames lost)
 drop detected: expected 251 received 252  (1 frame lost)

The drops emerged only on the PCAN USB interface in this case.
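
For reference, the "frames lost" figures in the two log lines are consistent
with a rolling 8-bit counter in the test frames. A minimal Python sketch of
that arithmetic, assuming the counter wraps at 256 (an assumption that matches
the numbers shown; frames_lost() is a hypothetical helper, not the actual test
tool):

```python
def frames_lost(expected, received, modulus=256):
    """Frames missing between the expected and the received counter value.

    Assumes a rolling counter that wraps at `modulus` (256 is an assumption
    that matches the two log lines in this mail).
    """
    return (received - expected) % modulus


# The two drops reported above.
for expected, received in [(224, 18), (251, 252)]:
    lost = frames_lost(expected, received)
    print(f"expected {expected} received {received}: {lost} lost")
```

Note that a counter like this cannot tell a swap from a drop on its own: a
frame arriving one position too late shows up as a gap of almost a full
counter cycle.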

> Reordering packets should be considered a serious bug as some CAN protocols 
> can't handle this at all.

Yes, definitely. E.g. ISO 15765-2 will not work with out-of-order frames.
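
To illustrate why: ISO 15765-2 consecutive frames carry a 4-bit sequence
number (SN) in the low nibble of the PCI byte (0x2N), starting at 1 after the
first frame and wrapping from 15 to 0. A receiver that sees an unexpected SN
treats it as an error. A simplified Python sketch of that check, assuming
normal addressing (PCI in the first data byte); the function name is
hypothetical and this is not the kernel implementation:

```python
def first_wrong_sn(pci_bytes):
    """Index of the first consecutive frame with an unexpected SN, else None.

    pci_bytes: the PCI byte (0x2N, N = sequence number) of each consecutive
    frame in reception order. The first consecutive frame after a first
    frame carries SN 1; the SN wraps from 15 to 0.
    """
    expected_sn = 1
    for index, pci in enumerate(pci_bytes):
        if pci >> 4 != 0x2:
            raise ValueError(f"frame {index} is not a consecutive frame")
        if pci & 0x0F != expected_sn:
            return index
        expected_sn = (expected_sn + 1) & 0x0F
    return None


print(first_wrong_sn([0x21, 0x22, 0x23]))  # frames in order
print(first_wrong_sn([0x21, 0x23, 0x22]))  # two frames swapped
```

A single swapped pair is enough to make the check fail, so the whole
segmented transfer is lost, not just one frame.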

Going back to the latest 4.2-merge kernel with all the CAN fixes and the core
i7 SMP setup, I tried to assign the interrupt from the USB host controller to
a specific CPU using the documentation in

	https://www.kernel.org/doc/Documentation/IRQ-affinity.txt

My USB controller is on IRQ 28:

# cat /proc/interrupts 
           CPU0       CPU1       CPU2       CPU3       
  0:         23          0          0          0  IR-IO-APIC   2-edge      timer
(..)
 28:      10114      53480     566927    1634485  IR-PCI-MSI 327680-edge      xhci_hcd
(..)

With

# echo 1 > /proc/irq/28/smp_affinity

I assigned IRQ 28 to CPU0 (the value is a hex CPU bitmask, so 1 selects
CPU0 only) and it now looks like this:

# cat /proc/interrupts 
           CPU0       CPU1       CPU2       CPU3       
  0:         23          0          0          0  IR-IO-APIC   2-edge      timer
(..)
 28:    9072996      53480     766233    2125901  IR-PCI-MSI 327680-edge      xhci_hcd
(..)

and all the out-of-order receptions were totally gone! \o/
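
The shift is easier to see as per-CPU deltas between the two snapshots of the
IRQ 28 line. A small Python sketch that computes them from the lines quoted
above (irq_counts() is a hypothetical helper; the CPU count of 4 is taken from
the header shown):

```python
def irq_counts(line, num_cpus):
    """Parse the per-CPU counters from one /proc/interrupts line."""
    fields = line.split()
    # fields[0] is the IRQ number with a trailing colon, e.g. "28:".
    return [int(value) for value in fields[1:1 + num_cpus]]


before = " 28:      10114      53480     566927    1634485  IR-PCI-MSI 327680-edge      xhci_hcd"
after = " 28:    9072996      53480     766233    2125901  IR-PCI-MSI 327680-edge      xhci_hcd"

deltas = [new - old
          for old, new in zip(irq_counts(before, 4), irq_counts(after, 4))]
for cpu, delta in enumerate(deltas):
    print(f"CPU{cpu}: +{delta}")
```

Nearly all new interrupts land on CPU0. The mail does not say how much time
passed between the first snapshot and the echo, so the smaller increases on
CPU2 and CPU3 may simply predate the affinity change.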

If nailing the CAN controller/driver interrupt to a specific CPU fixes the
out-of-order reception, we need to check whether we can do this by default.

Otherwise new multicore embedded systems like the i.MX6 Quad will run into
this problem.
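
Until there is a default in the kernel, the same pinning can be scripted from
user space. A hedged Python sketch, assuming a small system where the mask
fits into one 32-bit word and the handler name is the last field of the
/proc/interrupts line; all three helper names are hypothetical:

```python
def affinity_mask(cpus):
    """Hex bitmask for /proc/irq/<n>/smp_affinity: bit N set selects CPU N."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def find_irqs(proc_interrupts_text, handler):
    """IRQ numbers whose /proc/interrupts line ends with `handler`."""
    irqs = []
    for line in proc_interrupts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        number = fields[0].rstrip(":")
        # Skip the CPU header and named rows such as NMI or LOC.
        if number.isdigit() and fields[-1] == handler:
            irqs.append(int(number))
    return irqs


def pin_irq(irq, cpus):
    """Write the mask, like the echo command above. Needs root."""
    with open(f"/proc/irq/{irq}/smp_affinity", "w") as affinity_file:
        affinity_file.write(affinity_mask(cpus))
```

For the setup in this mail, find_irqs() on the contents of /proc/interrupts
with "xhci_hcd" would give the IRQ to pass to pin_irq(irq, [0]).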

Asking Google about it led to

http://stackoverflow.com/questions/11858487/change-smp-affinity-from-linux-device-driver

and finally to irq_set_affinity()

http://lxr.free-electrons.com/source/kernel/irq/manage.c#L182

which drivers can call from kernel context.

Do you think this idea is worth pursuing?

Regards,
Oliver
