From: Alexander Stein <alexander.stein@systec-electronic.com>
To: linux-can@vger.kernel.org
Cc: "Daniel Krüger" <daniel.krueger@systec-electronic.com>
Subject: wrong CAN frame order in network layer due to SMP?
Date: Thu, 24 Nov 2016 16:49:13 +0100
Message-ID: <1864402.pXgGBBp51L@ws-stein>
Hi,
I ran into a rather interesting problem while doing a CAN burst test.
The test sends a burst of CAN frames, each carrying a counter, and the
receiver checks that the counters arrive in the correct order.
Now I got errors about wrong order:
> Error on MSG ID 0x251. Got counter 141606 and expected 141605
> Error on MSG ID 0x251. Got counter 141605 and expected 141607
> Error on MSG ID 0x251. Got counter 141607 and expected 141606
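The receiver-side check described above can be sketched roughly as follows. This is not the actual test tool, only a minimal illustration. It assumes that after every frame the receiver expects `received + 1`, which is consistent with the three error lines quoted above. The function name `check_counters` and the preceding frame 141604 are made up for the example.

```python
def check_counters(counters, first_expected):
    """Return one error string per counter that differs from the expected value.

    Assumption: after each frame the receiver expects `received + 1`
    (this matches the quoted error lines, but the real tool is not shown).
    """
    errors = []
    expected = first_expected
    for got in counters:
        if got != expected:
            errors.append(f"Got counter {got} and expected {expected}")
        # Resynchronise on whatever was actually received.
        expected = got + 1
    return errors


if __name__ == "__main__":
    # 141604 is a hypothetical in-order frame preceding the swapped pair.
    for line in check_counters([141604, 141606, 141605, 141607], 141604):
        print("Error on MSG ID 0x251. " + line)
```

With this resync rule a single swapped pair produces three errors, not one, which is why the log above shows three lines for one swap.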
Here is the corresponding part from "candump -t d -a -x any -l"
> (1479993291.766686) can0 251#00022926
> (1479993291.766574) can0 251#00022925
> (1479993291.766816) can0 251#00022927
And in fact they are in the wrong order. But the timestamps are correct, i.e. if
you reorder the frames by timestamp, the order of the counters is OK. This also
shows that the order is wrong on both sockets in the same way!
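The observation above can be reproduced offline from the log. The sketch below parses the three candump lines quoted above, decodes the counter, and compares arrival order with timestamp order. It assumes the whole payload is one big-endian integer (0x00022926 = 141606, which matches the error messages). The helper names `parse_line` and `is_consecutive` are made up for this example.

```python
import re
from decimal import Decimal

# Matches lines such as "(1479993291.766686) can0 251#00022926".
LOG_LINE = re.compile(r"\((\d+\.\d+)\)\s+(\S+)\s+([0-9A-Fa-f]+)#([0-9A-Fa-f]+)")


def parse_line(line):
    """Return (timestamp, interface, CAN ID, counter) for one log line."""
    match = LOG_LINE.search(line)
    if match is None:
        raise ValueError(f"not a candump log line: {line!r}")
    stamp, iface, can_id, data = match.groups()
    # Assumption: the payload is a single big-endian counter.
    return Decimal(stamp), iface, int(can_id, 16), int(data, 16)


def is_consecutive(counters):
    """True if every counter is exactly one larger than its predecessor."""
    return all(b == a + 1 for a, b in zip(counters, counters[1:]))


SAMPLE = [
    "(1479993291.766686) can0 251#00022926",
    "(1479993291.766574) can0 251#00022925",
    "(1479993291.766816) can0 251#00022927",
]

if __name__ == "__main__":
    frames = [parse_line(line) for line in SAMPLE]
    arrival = [frame[3] for frame in frames]
    by_time = [frame[3] for frame in sorted(frames, key=lambda f: f[0])]
    print("arrival order ok:  ", is_consecutive(arrival))
    print("timestamp order ok:", is_consecutive(by_time))
```

Timestamps are kept as `Decimal` so that the microsecond digits are compared exactly.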
The driver in use is systec_can (downloadable at [1] if you like).
But the driver is not to blame: in the URB complete callback an SKB is created
and passed to the network stack via netif_rx().
The timestamp is set by net_timestamp_check() right at the beginning of
netif_rx_internal(), so at this point the frames are still in order.
In enqueue_to_backlog() the SKBs are enqueued in some per-CPU softnet_data. I
don't know the details.
Now per_cpu gave me a hint that the problem might be caused by SMP. So I took
7 of my 8 cores offline:
> for i in $(seq 1 7); do echo 0 > /sys/bus/cpu/devices/cpu${i}/online; done
Repeating the test run resulted in no order problems. My guess is that the USB
interrupt is handled on different cores and that gathering the SKBs to be put
into the sockets is somewhat racy.
This is supported by the fact that after putting the cores online again:
> for i in $(seq 1 7); do echo 1 > /sys/bus/cpu/devices/cpu${i}/online; done
there are still no order problems. This in turn is caused by the fact
(probably a bug in the ACPI tables of my local machine) that any CPU put back
online serves no IO-APIC or MSI IRQs. This can easily be checked by calling:
> watch -d -n 1 cat /proc/interrupts
You will see that interrupts are handled on different CPUs. Put some CPUs (or
all but one) offline and online again. The ones which went offline don't serve
any IO-APIC or MSI IRQs (including the one for ehci_hcd).
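For a less manual check than watching the screen, the per-CPU counters of one IRQ can be pulled out of /proc/interrupts with a few lines of Python. This is only a sketch: the function name `irq_counts` is made up, and the sample text is invented for illustration and is not output from my machine.

```python
def irq_counts(proc_interrupts_text, needle):
    """Return {irq_label: {cpu_name: count}} for lines mentioning `needle`."""
    lines = proc_interrupts_text.splitlines()
    cpus = lines[0].split()  # header row: CPU0 CPU1 ...
    result = {}
    for line in lines[1:]:
        if needle not in line:
            continue
        # The first colon ends the IRQ label; later colons belong to the name.
        label, _, rest = line.partition(":")
        counts = rest.split()[:len(cpus)]
        if len(counts) < len(cpus) or not all(c.isdigit() for c in counts):
            continue  # summary rows without one counter per CPU
        result[label.strip()] = dict(zip(cpus, map(int, counts)))
    return result


# Invented sample in the usual /proc/interrupts layout (illustration only).
SAMPLE = """\
           CPU0       CPU1
 16:       5000          0   IO-APIC  16-fasteoi   ehci_hcd:usb1
 29:        120         80   PCI-MSI 512000-edge   eth0
"""

if __name__ == "__main__":
    for irq, per_cpu in irq_counts(SAMPLE, "ehci_hcd").items():
        print(irq, per_cpu)
```

On a live system, pass `open("/proc/interrupts").read()` instead of `SAMPLE` and compare two snapshots taken a second apart. A CPU whose counter never changes is not serving that IRQ.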
Back to my CAN problem: Only a single core handles USB IRQs and there is
apparently no softnet_data race.
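To illustrate the suspicion, here is a toy model of per-CPU backlog queues. It is not kernel code and makes no claim about what the network stack really does. It only shows that if frames land in different per-CPU FIFOs and these FIFOs are drained independently, the overall order can change even though each FIFO stays ordered. The CPU assignment and the drain order are invented so that the result matches the candump excerpt above.

```python
from collections import deque


def deliver(frames, irq_cpus, drain_order):
    """Toy model: enqueue each frame on the CPU that took its IRQ, then drain.

    frames      -- counters in true arrival order
    irq_cpus    -- CPU index that handled the IRQ for each frame (assumed)
    drain_order -- order in which the per-CPU queues get processed (assumed)
    """
    backlogs = {}
    for counter, cpu in zip(frames, irq_cpus):
        backlogs.setdefault(cpu, deque()).append(counter)
    delivered = []
    for cpu in drain_order:
        queue = backlogs.get(cpu, deque())
        while queue:
            delivered.append(queue.popleft())
    return delivered


if __name__ == "__main__":
    frames = [141605, 141606, 141607]
    # IRQs spread over two CPUs, CPU1 happens to be drained first.
    print("two CPUs:", deliver(frames, [0, 1, 0], [1, 0]))
    # All IRQs on one CPU: a single FIFO, order is preserved.
    print("one CPU: ", deliver(frames, [0, 0, 0], [0]))
```

In this model the single-CPU case cannot reorder anything, which would fit the observation that the problem disappears once only one core handles the USB IRQs.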
The test showing wrong CAN frame ordering was done on kernel 4.8.9-gentoo, but
I was also able to reproduce the problem on 3.14.58-gentoo-r1.
3.12.52-gentoo-r1 apparently does not suffer from it; at least 3 tries ran
without errors. On the buggy kernels the problem occurred nearly every time.
Any idea what goes wrong in the network code when gathering the SKBs,
which might result in the wrong order?
Best regards,
Alexander
[1] http://www.systec-electronic.com/en/products/industrial-communication/interfaces-and-gateways/can-usb-adapter-usb-canmodul1
--
Dipl.-Inf. Alexander Stein
SYS TEC electronic GmbH
alexander.stein@systec-electronic.com
Legal and Commercial Address:
Am Windrad 2
08468 Heinsdorfergrund
Germany
Office: +49 (0) 3765 38600-0
Fax: +49 (0) 3765 38600-4100
Managing Directors:
Director Technology/CEO: Dipl.-Phys. Siegmar Schmidt;
Director Commercial Affairs/COO: Dipl. Ing. (FH) Armin von Collrepp
Commercial Registry:
Amtsgericht Chemnitz, HRB 28082; USt.-Id Nr. DE150534010