All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alexander Stein <alexander.stein@systec-electronic.com>
To: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: linux-can@vger.kernel.org,
	"Daniel Krüger" <daniel.krueger@systec-electronic.com>
Subject: Re: wrong CAN frame order in network layer due to SMP?
Date: Mon, 28 Nov 2016 10:01:24 +0100	[thread overview]
Message-ID: <3312577.WzoMqUrz0A@ws-stein> (raw)
In-Reply-To: <153c7653-ab74-fd0c-c605-7bafe5e44297@hartkopp.net>

On Friday 25 November 2016 12:46:08, Oliver Hartkopp wrote:
> Hello Alexander,
> 
> On 11/24/2016 04:49 PM, Alexander Stein wrote:
> > Back to my CAN problem: Only a single core handles USB IRQs and there is
> > apparently no softnet_data race.
> > The test about wrong CAN frame ordering was done on kernel 4.8.9-gentoo
> > but I was also able to reproduce this problem on 3.14.58-gentoo-r1.
> > 3.12.52-gentoo- r1 apparently does not suffer from that problem, at least
> > 3 tries were without errors. In buggy kernels this problems occured next
> > to every time.
> > 
> > Any idea what got wrong in the network code about gathering the SKBs
> > which might result in wrong order?
> 
> I detected a similar issue in some 3.1x kernel and asked this question:
> 
> http://marc.info/?l=linux-can&m=143637774606287&w=2
> 
> When you follow the entire discussion at
> 
> http://marc.info/?t=143637789700002&r=1&w=2
> 
> you will see that they pushed me to implement NAPI on all CAN interfaces
> which neither makes no sense for CAN controllers that do not have a RX
> FIFO (e.g. sja1000) nor fixes the issue at it's root cause.
> 
> Your findings bring up the problem again - good :-)

Too bad I didn't know of that post earlier, well never searched for it :-/

> When you look at the networking guys that like to speed up TCP traffic
> and also put skbs into percpu queues that are related to the receiving
> socket(!!!) instance then it should be possible to put CAN skbs related
> to their CAN interfaces into a percpu queue (to suppress out-of-order
> reception).

But wouldn't using queues related to sockets result in different orderings in 
different sockets? I've yet to find an erroneous rest run with a non-
conforming candump.
Anyway I don't yet fully understand the complete code and/or data flow up to 
the socket once netif_rx() is called.

> IMO the difference is not to queue the skbs for a specific socket but
> for a specific interface.
> The 'endpoint' of CAN frames where they have to be in order is can_rcv()
> in af_can.c and not any TCP instance that needs to reassemble the TCP
> traffic for a specific socket.

Sure, TCP can handle OOO pretty fine. Even for UDP this is not a problem at 
all. But isn't using raw sockets on ethernet in promiscuous mode a somewhat 
similar scenario? Or to put it in another way: Wouldn't tcpdump or wireshark 
suffer from the same problem?

> Can you check whether my suggestion with skb_set_hash() in
> alloc_can_skb() works for you?

For ease of use I didn't change alloc_can_skb() but rather used the patched 
inlined below. Using this change and (!)
> echo f >  /sys/class/net/can0/queues/rx-0/rps_cpus

3 test runs didn't raise any OOO errors.

But shouldn't the hash type be rather PKT_HASH_TYPE_L4? Otherwise 
skb_get_hash() doesn't use skb->hash directly (or at all?). I am aware that L4 
is semantically wrong though.

> In any way I think we should start a new attempt to make clear that the
> skbs have to be in order for a specific interface at can_rcv().
> And we need some solution that is enabled by default and fits to the
> netdev guys mindset.

It should not only be enabled by default but rather the only solution with no 
way to be disabled/wrong.

Best regards,
Alexander

diff --git a/systec_can.c b/systec_can.c
index b6d9b74..51b2bf6 100644
--- a/systec_can.c
+++ b/systec_can.c
@@ -978,6 +978,8 @@ static void systec_can_rx_can_msg(struct systec_can_chan 
*chan, u8 *msg_buf)
                 return;
         }
 
+        skb_set_hash(skb, chan->netdev->ifindex, PKT_HASH_TYPE_L2);
+
         /* get size of data part of CAN message */
         cf->can_dlc = get_can_dlc(msg->format & USBCAN_DATAFF_DLC);
-- 
Dipl.-Inf. Alexander Stein
SYS TEC electronic GmbH
alexander.stein@systec-electronic.com

Legal and Commercial Address:
Am Windrad 2
08468 Heinsdorfergrund
Germany

Office: +49 (0) 3765 38600-0
Fax:    +49 (0) 3765 38600-4100
 
Managing Directors:
	Director Technology/CEO: Dipl.-Phys. Siegmar Schmidt;
	Director Commercial Affairs/COO: Dipl. Ing. (FH) Armin von Collrepp
Commercial Registry:
	Amtsgericht Chemnitz, HRB 28082; USt.-Id Nr. DE150534010


  reply	other threads:[~2016-11-28  9:01 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-24 15:49 wrong CAN frame order in network layer due to SMP? Alexander Stein
2016-11-25 11:46 ` Oliver Hartkopp
2016-11-28  9:01   ` Alexander Stein [this message]
2016-11-28 20:36     ` Oliver Hartkopp
2016-11-29 10:30       ` Alexander Stein
2016-11-29 19:48         ` Oliver Hartkopp
2016-11-30  7:23           ` Alexander Stein

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3312577.WzoMqUrz0A@ws-stein \
    --to=alexander.stein@systec-electronic.com \
    --cc=daniel.krueger@systec-electronic.com \
    --cc=linux-can@vger.kernel.org \
    --cc=socketcan@hartkopp.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.