linux-can.vger.kernel.org archive mirror
* wrong CAN frame order in network layer due to SMP?
@ 2016-11-24 15:49 Alexander Stein
  2016-11-25 11:46 ` Oliver Hartkopp
From: Alexander Stein @ 2016-11-24 15:49 UTC
  To: linux-can; +Cc: Daniel Krüger

Hi,

I ran into a rather interesting problem while doing a CAN burst test.
The test sends a burst of CAN frames, each carrying a counter, and the 
receiver checks that the counters arrive in the correct order.
The receiver reported frames out of order:
> Error on MSG ID 0x251. Got counter 141606 and expected 141605
> Error on MSG ID 0x251. Got counter 141605 and expected 141607
> Error on MSG ID 0x251. Got counter 141607 and expected 141606
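
A minimal sketch of the kind of check that would produce the three error lines above. The test program itself is not shown in this mail, so the function name and the resync rule (after every frame, expect the received counter plus one) are assumptions inferred from the error output:

```python
def check_counters(frames, first_expected):
    """Return the error lines a receiver would report for out-of-order counters.

    frames: iterable of (can_id, counter) in the order they were received.
    first_expected: dict mapping can_id to the first counter value expected.
    Assumption: after every frame the receiver expects counter + 1 next.
    """
    expected = dict(first_expected)  # next expected counter, per CAN ID
    errors = []
    for can_id, counter in frames:
        want = expected.get(can_id, counter)
        if counter != want:
            errors.append(
                f"Error on MSG ID 0x{can_id:x}. "
                f"Got counter {counter} and expected {want}"
            )
        expected[can_id] = counter + 1
    return errors


# Counters in the order the receiver saw them
received = [(0x251, 141606), (0x251, 141605), (0x251, 141607)]
for line in check_counters(received, {0x251: 141605}):
    print(line)
```

With this rule a single swapped pair of frames yields three consecutive errors, which matches the report.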

Here is the corresponding part from "candump -t d -a -x any -l"
> (1479993291.766686) can0 251#00022926
> (1479993291.766574) can0 251#00022925
> (1479993291.766816) can0 251#00022927

And in fact they are in the wrong order. But the timestamps are correct, i.e. if 
you reorder the frames by timestamp, the counters are in order. This also 
shows that both sockets (the test receiver and candump) see the same wrong order!
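
The claim can be checked mechanically on the three candump log lines quoted above. The sketch below assumes the whole payload is a big-endian counter (which agrees with the counters in the error messages); the helper name is hypothetical:

```python
from decimal import Decimal


def parse_candump_log(line):
    """Parse a '(timestamp) iface id#data' line, tolerating a leading '> ' quote."""
    fields = line.lstrip("> ").split()
    timestamp = Decimal(fields[0].strip("()"))  # Decimal keeps all microsecond digits
    can_id, data = fields[2].split("#")
    # Assumption: the payload as a whole is the big-endian frame counter
    return timestamp, fields[1], int(can_id, 16), int(data, 16)


log = [
    "> (1479993291.766686) can0 251#00022926",
    "> (1479993291.766574) can0 251#00022925",
    "> (1479993291.766816) can0 251#00022927",
]
frames = [parse_candump_log(line) for line in log]

arrival = [f[3] for f in frames]
by_time = [f[3] for f in sorted(frames, key=lambda f: f[0])]
print(arrival)  # [141606, 141605, 141607]
print(by_time)  # [141605, 141606, 141607]
```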

The driver used is systec_can (downloadable at [1] if you like).
But the driver is not to blame: in the URB completion callback an SKB is created 
and passed to the network stack by netif_rx().
The timestamp is set by net_timestamp_check() right at the beginning of 
netif_rx_internal(), so at this point the frames are still in order.
In enqueue_to_backlog() the SKBs are enqueued in a per-CPU softnet_data. I 
don't know the details.
Now per_cpu gave me a hint that the problem might be caused by SMP. So I took 7 of 
my 8 cores offline:
> for i in $(seq 1 7); do echo 0 > /sys/bus/cpu/devices/cpu${i}/online; done
Repeating the test run resulted in no order problems. My guess is that the USB 
interrupt is handled on different cores and that gathering the SKBs to be put 
into the sockets is somewhat racy.
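
To make the guess concrete, here is a deliberately simplified toy model, not the kernel's actual backlog processing: each frame is queued on the CPU that handled its interrupt, and each per-CPU queue is then drained completely, one CPU after the other. The function name, the CPU assignment and the drain order are all assumptions chosen for illustration:

```python
from collections import deque


def deliver(frames, irq_cpus, n_cpus):
    """Toy model of per-CPU backlogs.

    Frame i is queued on CPU irq_cpus[i]; afterwards every CPU's queue is
    drained completely, CPU by CPU. This is NOT how the kernel schedules
    backlog processing; it only shows that per-CPU FIFOs do not preserve
    global order.
    """
    backlogs = [deque() for _ in range(n_cpus)]
    for frame, cpu in zip(frames, irq_cpus):
        backlogs[cpu].append(frame)
    delivered = []
    for backlog in backlogs:
        while backlog:
            delivered.append(backlog.popleft())
    return delivered


frames = [141605, 141606, 141607]           # order in which timestamps were taken
print(deliver(frames, [0, 0, 0], 1))        # [141605, 141606, 141607]
print(deliver(frames, [1, 0, 1], 2))        # [141606, 141605, 141607]
```

With a single CPU the order is always preserved; with two CPUs the hypothetical assignment above reproduces the order seen in the candump output.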
This is supported by the fact that after putting the cores online again:
> for i in $(seq 1 7); do echo 1 > /sys/bus/cpu/devices/cpu${i}/online; done
there are still no order problems. This in turn is caused by the fact 
(probably a bug in the ACPI tables of my local machine) that any CPU put back 
online serves no IO-APIC or MSI IRQs. This can easily be checked by running:
> watch -d -n 1 cat /proc/interrupts
Note that interrupts are handled on different CPUs. Take some CPUs (or all but 
one) offline and bring them online again. The ones which went offline don't 
serve any IO-APIC or MSI IRQs (including the one for ehci_hcd).
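
Instead of watching the output by eye, two snapshots of /proc/interrupts can be compared. The sketch below is one way to do that; the function names are hypothetical and the sample snapshots are invented values for illustration, not output from the machine described here:

```python
def parse_interrupts(text):
    """Map each IRQ label to its per-CPU counts from /proc/interrupts text."""
    lines = text.strip().splitlines()
    cpus = lines[0].split()                  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        counts = []
        for token in rest.split()[:len(cpus)]:
            if not token.isdigit():          # rows like ERR/MIS have fewer columns
                break
            counts.append(int(token))
        table[label.strip()] = dict(zip(cpus, counts))
    return table


def silent_cpus(before, after, irq):
    """CPUs whose count for `irq` did not change between two snapshots."""
    return [cpu for cpu, n in after[irq].items() if n == before[irq].get(cpu)]


# Invented sample data: IRQ 16 only advances on CPU0
BEFORE = """
           CPU0       CPU1
 16:       1000        500   IO-APIC  16-fasteoi   ehci_hcd:usb1
"""
AFTER = """
           CPU0       CPU1
 16:       1400        500   IO-APIC  16-fasteoi   ehci_hcd:usb1
"""
print(silent_cpus(parse_interrupts(BEFORE), parse_interrupts(AFTER), "16"))  # ['CPU1']
```

On a real system the two snapshots would be read from /proc/interrupts a few seconds apart while traffic is running.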
Back to my CAN problem: in that state only a single core handles USB IRQs and 
there is apparently no softnet_data race.
The wrong CAN frame ordering was observed on kernel 4.8.9-gentoo, but I 
was also able to reproduce it on 3.14.58-gentoo-r1. 3.12.52-gentoo-r1 
apparently does not suffer from the problem; at least 3 runs completed without 
errors. On the affected kernels the problem occurred nearly every time.

Any idea what went wrong in the network code that gathers the SKBs, 
which might result in the wrong order?

Best regards,
Alexander

[1] http://www.systec-electronic.com/en/products/industrial-communication/interfaces-and-gateways/can-usb-adapter-usb-canmodul1
-- 
Dipl.-Inf. Alexander Stein
SYS TEC electronic GmbH
alexander.stein@systec-electronic.com

Legal and Commercial Address:
Am Windrad 2
08468 Heinsdorfergrund
Germany

Office: +49 (0) 3765 38600-0
Fax:    +49 (0) 3765 38600-4100
 
Managing Directors:
	Director Technology/CEO: Dipl.-Phys. Siegmar Schmidt;
	Director Commercial Affairs/COO: Dipl. Ing. (FH) Armin von Collrepp
Commercial Registry:
	Amtsgericht Chemnitz, HRB 28082; USt.-Id Nr. DE150534010

