linux-can.vger.kernel.org archive mirror
From: Alexander Stein <alexander.stein@systec-electronic.com>
To: linux-can@vger.kernel.org
Cc: "Daniel Krüger" <daniel.krueger@systec-electronic.com>
Subject: wrong CAN frame order in network layer due to SMP?
Date: Thu, 24 Nov 2016 16:49:13 +0100
Message-ID: <1864402.pXgGBBp51L@ws-stein>

Hi,

I ran into a rather interesting problem while doing a CAN burst test.
The test sends a burst of CAN frames, each carrying a counter, and the
receiver checks that the counters arrive in the correct order.
The receiver reported errors about wrong order:
> Error on MSG ID 0x251. Got counter 141606 and expected 141605
> Error on MSG ID 0x251. Got counter 141605 and expected 141607
> Error on MSG ID 0x251. Got counter 141607 and expected 141606
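
The three error lines are consistent with a receiver that, after every frame,
expects the next counter to be the received value plus one. The following
Python sketch is an assumption about how such a checker might behave, not the
actual test tool; check_counters is a hypothetical name.

```python
def check_counters(msg_id, counters, first_expected):
    """Return the error lines a simple in-order checker would report.

    Assumption: after each frame the checker expects `received + 1`,
    whether or not the received value matched the expectation.
    """
    errors = []
    expected = first_expected
    for got in counters:
        if got != expected:
            errors.append(
                f"Error on MSG ID 0x{msg_id:x}. "
                f"Got counter {got} and expected {expected}"
            )
        expected = got + 1
    return errors


# Arrival order from the report: frames 141605 and 141606 are swapped.
errors = check_counters(0x251, [141606, 141605, 141607], first_expected=141605)
for line in errors:
    print(line)
```

Under that assumption, a single swap of two adjacent frames produces three
consecutive error lines.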

Here is the corresponding part from "candump -t d -a -x any -l"
> (1479993291.766686) can0 251#00022926
> (1479993291.766574) can0 251#00022925
> (1479993291.766816) can0 251#00022927

And in fact they are in the wrong order. But the timestamps are correct, i.e. if
you reorder the frames by timestamp, the order of the counters is ok. This also
shows that the order is wrong on both sockets in the same way!
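
To make the timestamp argument concrete, the three candump lines quoted above
can be parsed and sorted by timestamp. This sketch assumes the payload is a
32-bit big-endian counter, which matches the counter values in the error
messages; parse_log_line is a hypothetical helper.

```python
import re
from decimal import Decimal

# Matches candump log lines such as "(1479993291.766686) can0 251#00022926".
LOG_LINE = re.compile(r"\((\d+\.\d+)\)\s+(\S+)\s+([0-9A-Fa-f]+)#([0-9A-Fa-f]*)")


def parse_log_line(line):
    """Return (timestamp, interface, can_id, counter) for one log line."""
    match = LOG_LINE.search(line)
    if match is None:
        raise ValueError(f"not a candump log line: {line!r}")
    ts, iface, can_id, data = match.groups()
    # Assumption: the whole payload is one big-endian counter.
    counter = int.from_bytes(bytes.fromhex(data), "big")
    return Decimal(ts), iface, int(can_id, 16), counter


log = [
    "(1479993291.766686) can0 251#00022926",
    "(1479993291.766574) can0 251#00022925",
    "(1479993291.766816) can0 251#00022927",
]

frames = [parse_log_line(line) for line in log]
arrival = [frame[3] for frame in frames]
by_time = [frame[3] for frame in sorted(frames, key=lambda frame: frame[0])]

print(arrival)  # [141606, 141605, 141607]
print(by_time)  # [141605, 141606, 141607]
```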

The driver used is systec_can (downloadable at [1] if you like).
But the driver is not to blame: in the URB completion callback an SKB is
created and passed to the network stack via netif_rx().
The timestamp is set by net_timestamp_check() right at the beginning of
netif_rx_internal(), so at this point the frames are still in order.
In enqueue_to_backlog() the SKBs are queued in a per_cpu softnet_data. I
don't know the details.
The per_cpu part hinted that the problem might be caused by SMP, so I took 7
of my 8 cores offline:
> for i in $(seq 1 7); do echo 0 > /sys/bus/cpu/devices/cpu${i}/online; done
Repeating the test run resulted in no order problems. My guess is that the USB
interrupt is handled on different cores and that gathering the SKBs to be put
into the sockets is somewhat racy.
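
The suspected mechanism can be illustrated with a toy model. This is not
kernel code: it only assumes that each CPU appends frames to its own FIFO
after the timestamp has been taken, and that the queues are drained
independently, so the drain order between CPUs is not tied to arrival order.
All names and the CPU assignment of the frames are hypothetical.

```python
from collections import deque


def deliver(frames, num_cpus, drain_order):
    """Toy model of per-CPU backlog queues.

    frames:      (timestamp, cpu, counter) tuples in arrival order
    drain_order: CPU indices; each entry drains that CPU's queue completely
    Returns the counters in the order they would reach the sockets.
    """
    backlog = [deque() for _ in range(num_cpus)]
    for timestamp, cpu, counter in frames:
        # The timestamp is already fixed here, before the frame is queued.
        backlog[cpu].append((timestamp, counter))

    delivered = []
    for cpu in drain_order:
        while backlog[cpu]:
            _, counter = backlog[cpu].popleft()
            delivered.append(counter)
    return delivered


# Timestamps are the microsecond parts from the candump log; which CPU took
# which interrupt is made up for illustration.
smp_frames = [(766574, 0, 141605), (766686, 1, 141606), (766816, 0, 141607)]
smp = deliver(smp_frames, num_cpus=2, drain_order=[1, 0])

# Same frames, but every interrupt handled on CPU 0.
up_frames = [(ts, 0, counter) for ts, _, counter in smp_frames]
up = deliver(up_frames, num_cpus=1, drain_order=[0])

print(smp)  # [141606, 141605, 141607]
print(up)   # [141605, 141606, 141607]
```

In this model the order within one CPU's queue is always preserved, so a
single CPU cannot reorder frames, while two CPUs can.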
The guess is supported by the fact that after putting the cores online again:
> for i in $(seq 1 7); do echo 1 > /sys/bus/cpu/devices/cpu${i}/online; done
there are still no order problems. That, however, is because any CPU put back
online no longer serves IO-APIC or MSI IRQs (probably a bug in the ACPI tables
of my local machine). This is easy to check with:
> watch -d -n 1 cat /proc/interrupts
You will see that interrupts are handled on different CPUs. Put some CPUs (or
all but one) offline and online again: the ones that went offline no longer
serve any IO-APIC or MSI IRQs (including the one for ehci_hcd).
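
The same check can be scripted instead of watched by eye. This is a minimal
sketch that assumes the usual /proc/interrupts layout (a header row of CPU
names, then one row per IRQ with one count per CPU); irq_counts_per_cpu is a
hypothetical helper.

```python
def irq_counts_per_cpu(text, needle):
    """Return {irq: {cpu_name: count}} for rows whose text contains `needle`.

    `text` is the content of /proc/interrupts.
    """
    lines = text.splitlines()
    if not lines:
        return {}
    cpus = lines[0].split()  # header row: CPU0 CPU1 ...
    result = {}
    for line in lines[1:]:
        if needle not in line:
            continue
        irq, _, rest = line.partition(":")
        fields = rest.split()
        # The first len(cpus) numeric fields are the per-CPU counters.
        counts = [int(field) for field in fields[:len(cpus)] if field.isdigit()]
        result[irq.strip()] = dict(zip(cpus, counts))
    return result
```

Calling it twice, a few seconds apart, with the content of /proc/interrupts
and "ehci_hcd" as the needle shows which CPUs' counters are still increasing
for the USB host controller.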
Back to my CAN problem: with only a single core handling USB IRQs, there is
apparently no softnet_data race.
The test showing the wrong CAN frame order was done on kernel 4.8.9-gentoo,
but I was also able to reproduce the problem on 3.14.58-gentoo-r1.
3.12.52-gentoo-r1 apparently does not suffer from it; at least 3 tries ran
without errors. On the affected kernels the problem occurred nearly every time.

Any idea what went wrong in the network code that gathers the SKBs
and might result in the wrong order?

Best regards,
Alexander

[1] http://www.systec-electronic.com/en/products/industrial-communication/interfaces-and-gateways/can-usb-adapter-usb-canmodul1
-- 
Dipl.-Inf. Alexander Stein
SYS TEC electronic GmbH
alexander.stein@systec-electronic.com

Legal and Commercial Address:
Am Windrad 2
08468 Heinsdorfergrund
Germany

Office: +49 (0) 3765 38600-0
Fax:    +49 (0) 3765 38600-4100
 
Managing Directors:
	Director Technology/CEO: Dipl.-Phys. Siegmar Schmidt;
	Director Commercial Affairs/COO: Dipl. Ing. (FH) Armin von Collrepp
Commercial Registry:
	Amtsgericht Chemnitz, HRB 28082; USt.-Id Nr. DE150534010
