* socket can receive order
From: Daniel Squires @ 2015-09-08  9:42 UTC
To: linux-can

Hi all,

new to this list.

Just a quick question at present: when using recv on a socket that is
bound to a CAN interface, should the packets be received in the order
they came off the wire, or is this not guaranteed?

For example, is this valid, or an error in some part of the system?

candump can3 -tz
<snip>
  (003.088648) can3 043 [8] F7 2D 00 00 00 00 00 00
  (003.089149) can3 045 [8] F9 2D 00 00 00 00 00 00
  (003.088897) can3 044 [8] F8 2D 00 00 00 00 00 00

The packets were sent to the wire in order. The CAN ID is incremented
with each send to ensure mailbox/arbitration details don't mess up the
order on to the wire. The packets were seen coming off the wire over USB
in Wireshark in the correct order, but both my test utility (which aborts
when something unexpected happens) and candump see this out-of-sequence
result.

Note that the timestamps confirm that something saw them in the correct
order, but recv returns them out of order.

Another identical receiver with another instance of candump sees these
packets in the expected order, but sees other packets at other times out
of order, i.e. it appears random.

Thanks
-- 
Dan Squires
Engineered Arts Ltd.
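The symptom described above (frames delivered in an order that disagrees with their timestamps) can be checked mechanically. What follows is only a minimal sketch in C, not Daniel's actual test utility: the function names `parse_candump_ts` and `first_timestamp_regression` are hypothetical, and it assumes every line starts with a `candump -tz` style `(seconds.microseconds)` timestamp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Parse the "(sss.uuuuuu)" timestamp at the start of a candump -tz line
 * into microseconds. Returns 0 on success, -1 if the line does not match. */
static int parse_candump_ts(const char *line, long long *usec_out)
{
    long long sec, usec;

    assert(line != NULL && usec_out != NULL);
    if (sscanf(line, " (%lld.%6lld)", &sec, &usec) != 2)
        return -1;
    *usec_out = sec * 1000000LL + usec;
    return 0;
}

/* Walk the lines in the order they were received and return the index of
 * the first line whose timestamp is earlier than the previous one, or -1
 * if the timestamps never go backwards. Unparseable lines are skipped. */
static int first_timestamp_regression(const char *const *lines, size_t n)
{
    long long prev = 0, ts;
    int have_prev = 0;

    for (size_t i = 0; i < n; i++) {
        if (parse_candump_ts(lines[i], &ts) != 0)
            continue;
        if (have_prev && ts < prev)
            return (int)i;
        prev = ts;
        have_prev = 1;
    }
    return -1;
}
```

Fed the three candump lines from the report, the helper flags the third line (index 2), because its timestamp 003.088897 is earlier than the 003.089149 of the line delivered before it.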
* Re: socket can receive order
From: Marc Kleine-Budde @ 2015-09-08 10:01 UTC
To: Daniel Squires, linux-can

On 09/08/2015 11:42 AM, Daniel Squires wrote:
> Hi all,
>
> new to this list.
>
> Just a quick question at present, when using recv on a socket that is
> bound to a can interface, should the packets be received in the order
> they came off the wire? or is this not guaranteed?

Should be guaranteed. Which CAN core are you using? What's your kernel
version?

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |
* Re: socket can receive order
From: Daniel Squires @ 2015-09-08 10:41 UTC
To: Marc Kleine-Budde, linux-can

Hi Marc,

I should have mentioned that this "issue" seems to show up only on our
application PC (an Intel NUC). On my laptop and desktop PC I have not
seen it happen. Both the application PC (NUC) and the laptop are running
Ubuntu kernel 3.19.0-26-generic.

The NUC has the kernel rebuilt without xhci due to problems it causes
with another USB peripheral.

I am not entirely sure what you mean by which CAN core I am using, but if
it helps, I am opening the socket as follows:

    sock = socket(PF_CAN,SOCK_RAW,CAN_RAW);

in a small standalone test application which I wrote after having
difficulty with our main application.

I am using custom hardware/firmware with the kernel module found
here: https://github.com/fabiobaltieri/open-usb-can
though it has a small change to stop the net queue at the top of
open_usb_can_start_xmit, as otherwise it's prone to losing TX packets
when loaded.

I can see the packets coming in the correct order in Wireshark and it is
not immediately obvious to me how the kernel module could mix up the
order, so it seems that it must be something that happens at the socket
level?

On the top level I am using CANFestival for the CANopen implementation,
so it has occurred to me I could implement a CANFestival "driver" using
libusb and completely bypass the kernel module and SocketCAN layers,
but I hope not to have to do this.

On 08/09/15 11:01, Marc Kleine-Budde wrote:
> Should be guaranteed. Which CAN core are you using? What's your kernel
> version?

-- 
Dan Squires
Engineered Arts Ltd.
* Re: socket can receive order
From: Marc Kleine-Budde @ 2015-09-08 11:13 UTC
To: Daniel Squires, linux-can, Oliver Hartkopp

On 09/08/2015 12:41 PM, Daniel Squires wrote:
> I am not entirely sure what you mean by which can core I am using but if
> it helps i am opening the socket as follows :

I mean what kind of CAN adapter...

> I am using custom hardware/firmware and am using the kernel module found
> here : https://github.com/fabiobaltieri/open-usb-can
> though it has a small change to stop the net queue at the top of
> open_usb_can_start_xmit as otherwise it's prone to losing TX packets
> when loaded.

Yes, this looks racy - you should ask them to mainline the working driver.

> I can see the packets coming in the correct order in wireshark and it is
> not immediately obvious to me how the kernel module could mix up the
> order, so it seems that it must be something that happens at the socket
> level?

The kernel module "produces" the CAN frames, so if you see them in the
correct order in wireshark, they have left the module in the right order.

> candump can3 -tz
> <snip>
>   (003.088648) can3 043 [8] F7 2D 00 00 00 00 00 00
>   (003.089149) can3 045 [8] F9 2D 00 00 00 00 00 00
>   (003.088897) can3 044 [8] F8 2D 00 00 00 00 00 00

The timestamps are in the correct order. Maybe Oliver can help here,
he's an expert when it comes to strange reordering :)

> On the top level I am using CANFestival for CANOpen implementation, so
> it has occurred to me I could implement a CANFestival "driver" using
> libusb and completely bypass the kernel module and socket can layers,
> but I hope not to have to do this.

Na, you don't want to do this.

Marc
* Re: socket can receive order
From: Daniel Squires @ 2015-09-08 11:17 UTC
To: Marc Kleine-Budde, linux-can, Oliver Hartkopp

On 08/09/15 12:13, Marc Kleine-Budde wrote:
> The kernel module "produces" the CAN frames, so if you see them in the
> correct order in wireshark, they have left the module in the right order.

Sorry, I should have been clearer here: in Wireshark I was looking at the
USB frames, not the CAN frames. However, I think what you say still stands
due to the timestamps being in the correct order.

-- 
Dan Squires
Engineered Arts Ltd.
* Re: socket can receive order
From: Marc Kleine-Budde @ 2015-09-08 11:20 UTC
To: Daniel Squires, linux-can, Oliver Hartkopp

On 09/08/2015 01:17 PM, Daniel Squires wrote:
>> The kernel module "produces" the CAN frames, so if you see them in the
>> correct order in wireshark, they have left the module in the right order.
>
> Sorry , I should have been clearer here, in wireshark was looking at the
> USB frames not the CAN frames. however I think what you say still stands
> due to the time stamps being in the correct order.

Thanks for the clarification. Can you have a look at the CAN interface
with wireshark, too?

Marc
* Re: socket can receive order
From: Daniel Squires @ 2015-09-08 11:37 UTC
To: Marc Kleine-Budde, linux-can, Oliver Hartkopp

On 08/09/15 12:20, Marc Kleine-Budde wrote:
> Thanks for the clarification. Can you have a look at the CAN interface
> with wireshark, too?

Wireshark shows the packets in the same order as candump. However, it
seems the timestamps are in the order Wireshark got the packets, rather
than when they were generated.

A couple of other observations: it seems to take longer for an
out-of-order packet to happen whilst Wireshark is capturing, and on one
occasion my application saw an out-of-order packet which candump showed
as being in the correct order! This is a first, and I wonder if it is
also related to Wireshark capturing at the same time.

-- 
Dan Squires
Engineered Arts Ltd.
* Re: socket can receive order
From: Oliver Hartkopp @ 2015-09-08 16:56 UTC
To: Daniel Squires, Marc Kleine-Budde, linux-can

Hi all,

On 08.09.2015 13:17, Daniel Squires wrote:
> On 08/09/15 12:13, Marc Kleine-Budde wrote:
>>> I can see the packets coming in the correct order in wireshark and it is
>>> not immediately obvious to me how the kernel module could mix up the
>>> order, so it seems that it must be something that happens at the socket
>>> level?
>> The kernel module "produces" the CAN frames, so if you see them in the
>> correct order in wireshark, they have left the module in the right order.

Yes. This is trivial.

But Daniel is right to ask about frame reordering at the socket level
or, better said, reordering outside the driver level.

>> The timestamps are in the correct order. Maybe Oliver can help here,
>> he's an expert when it comes to strange reordering :)

Will try - see below.

>>> On the top level I am using CANFestival for CANOpen implementation, so
>>> it has occurred to me I could implement a CANFestival "driver" using
>>> libusb and completely bypass the kernel module and socket can layers,
>>> but I hope not to have to do this.
>> Na, you don't want to do this.

The point is that it would not help either. Even if you are using the
PF_PACKET socket (which wireshark does), bypassing the CAN network layer
modules (can, can_raw) doesn't fix the problem.

I discussed the problem on the netdev ML as I discovered an out-of-order
issue when fixing the CAN_RAW join feature.

When you have a multicore SMP processor the interrupt can be processed by
different CPUs, which can lead to packet reordering when using netif_rx()
on driver level.

The discussion ended with the networking guys pointing me to use NAPI,
which does not really help, e.g. there's only one USB network adapter in
linux/drivers/net that uses it, and it is a complete mess.

My suggestion was to set a hash value into the socket buffer (skb) at
driver level, which is used for generating a 'flow' for IP traffic too.
You can generate flows by hashes to put all traffic from a specific IP
into the same per-cpu input queue, which helps TCP assemble the packets
for this IP address in the softirq in the correct order (aha!).

See http://marc.info/?l=linux-netdev&m=143689694125450&w=2

I assume the networking guys interpreted my suggestion as a hack as they
are not aware how 'addressing' is done in CAN. They only know about IP ...

NAPI is not really a valid solution for CAN USB adapters and I think I'll
have to restart the discussion, as out-of-order frames are a no-go for CAN:
they kill ISO15765-2 and (obviously) CANopen segmentation.

I assume Daniel uses a multicore system, right?

If so, please try the 'hack' I suggested on the netdev ML to see if it
fixes your problem. It might help for the discussion too.

Regards,
Oliver
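To illustrate why a per-interface hash would keep frames in order, here is a simplified model in C of how a hash could select a per-CPU input queue. It is a sketch under assumptions, not kernel code: `scale_to_range` reproduces the arithmetic of the kernel's `reciprocal_scale()` as far as I know it, `pick_rps_cpu` is a hypothetical name, and the real steering code does considerably more (flow tables, CPU online checks).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Map a 32-bit value into [0, range) without a division
 * (the arithmetic reciprocal_scale() is understood to use). */
static uint32_t scale_to_range(uint32_t val, uint32_t range)
{
    return (uint32_t)(((uint64_t)val * range) >> 32);
}

/* Simplified model: choose one of the CPUs enabled for steering based on
 * the hash stored in the socket buffer. Identical hashes always yield the
 * same CPU, so all frames of one "flow" share one input queue. */
static int pick_rps_cpu(uint32_t skb_hash, const int *cpus, uint32_t ncpus)
{
    assert(cpus != NULL && ncpus > 0);
    return cpus[scale_to_range(skb_hash, ncpus)];
}
```

In this model, if every frame from an interface carries the same hash (for example the interface index), every frame lands in the same CPU's queue and cannot overtake its predecessor, whereas frames with differing hashes may be spread over several CPUs.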
* Re: socket can receive order
From: Austin Schuh @ 2015-09-09  2:30 UTC
To: Oliver Hartkopp, Daniel Squires, Marc Kleine-Budde, linux-can

On Tue, Sep 8, 2015 at 9:56 AM Oliver Hartkopp <socketcan@hartkopp.net> wrote:
> When you have a multicore SMP processor the interrupt can be processed by
> different CPUs, which can lead to packet reordering when using netif_rx() on
> driver level.
>
> [...]
>
> I assume Daniel uses a multicore system, right?
>
> If so, please try the 'hack' I suggested on the netdev ML if it fixes your
> problem. It might help for the discussion too.

On our boxes, I've been setting the affinity for both the IRQ thread
(we are running an RT kernel) and the interrupt to the same single
core. Would that help here?

We've seen CAN packets get significantly delayed, causing overruns, due
to Ethernet load and both CAN and Ethernet sharing the same softirq.
Our solution has been to set the affinity for each of those to
different cores to keep them isolated.

Austin
* Re: socket can receive order
From: Brian Silverman @ 2015-09-09  3:10 UTC
To: Austin Schuh
Cc: Oliver Hartkopp, Daniel Squires, Marc Kleine-Budde, linux-can

Another thing Austin and I do is set up RPS (receive packet steering)
for the Ethernet interfaces so the hardware sends the physical
Ethernet interrupts to that same core which isn't involved in
servicing the CAN interfaces at all.

On Tue, Sep 8, 2015 at 10:30 PM, Austin Schuh <austin@peloton-tech.com> wrote:
> On our boxes, I've been setting the affinity for both the IRQ thread
> (we are running a RT kernel), and the interrupt to the same single
> core. Would that help here?
>
> We've seen CAN packets get significantly delayed causing overruns due
> to Ethernet load and both CAN and ethernet sharing the same softirq.
> Our solution has been to set the affinity for each of those to
> different cores to keep them isolated.
>
> Austin
* Re: socket can receive order
From: Oliver Hartkopp @ 2015-09-09 16:23 UTC
To: Brian Silverman, Austin Schuh, Daniel Squires
Cc: Marc Kleine-Budde, linux-can

On 09.09.2015 05:10, Brian Silverman wrote:
> Another thing Austin and I do is set up RPS (receive packet steering)
> for the Ethernet interfaces so the hardware sends the physical
> Ethernet interrupts to that same core which isn't involved in
> servicing the CAN interfaces at all.
>
> On Tue, Sep 8, 2015 at 10:30 PM, Austin Schuh <austin@peloton-tech.com> wrote:
>>
>> On our boxes, I've been setting the affinity for both the IRQ thread
>> (we are running a RT kernel), and the interrupt to the same single
>> core. Would that help here?

Yes, it does.

Nailing the interrupts from Ethernet and CAN interfaces to different CPUs
by setting irq_affinity is a valid but pretty hard solution. Therefore I
was trying to use hash-based RPS to fix the out-of-order problem and let
the kernel networking do the (hopefully optimal) rest.

>> We've seen CAN packets get significantly delayed causing overruns due
>> to Ethernet load and both CAN and ethernet sharing the same softirq.
>> Our solution has been to set the affinity for each of those to
>> different cores to keep them isolated.

Yes. That's a good point. Because you split up Ethernet and CAN onto
different CPUs, the softirq is also running on different CPUs.

This could also be the solution for Daniel's problem!

My suggested solution (aka the 'hack',
http://marc.info/?l=linux-netdev&m=143689694125450&w=2 ) with hash-based
RPS does not split the Ethernet/CAN traffic among CPUs: depending on the
IP hashes, some of the Ethernet traffic can be pushed onto the same CPU
we use for the CAN interface.

So it's a softer solution which at least fixes out-of-order for CAN
interfaces.

Regards,
Oliver

ps. There were some performance tests (vanilla and RT kernel) from the
University of Prague where you can see the impact of additional Ethernet
load:

http://rtime.felk.cvut.cz/can/
http://rtime.felk.cvut.cz/can/benchmark/3.0/
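The fixed-affinity approach discussed above amounts to writing a CPU bitmask for each interrupt. The following C sketch shows one way this could be done through the `/proc/irq/<n>/smp_affinity` interface; it is an illustration under assumptions, not the setup Austin and Brian actually use. The helper names are hypothetical, the IRQ number has to be looked up on the target machine (for a USB CAN adapter it would presumably be the USB host controller's interrupt), and writing the file requires root.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Build "/proc/irq/<irq>/smp_affinity" into buf. Returns 0 on success,
 * -1 if the buffer is too small. */
static int irq_affinity_path(unsigned irq, char *buf, size_t len)
{
    int n = snprintf(buf, len, "/proc/irq/%u/smp_affinity", irq);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Format a hex bitmask selecting exactly one CPU
 * (cpu 0 -> "1", cpu 2 -> "4"). Returns 0 on success. */
static int single_cpu_mask(unsigned cpu, char *buf, size_t len)
{
    assert(cpu < 32);
    int n = snprintf(buf, len, "%x", 1u << cpu);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Pin one IRQ to one CPU by writing the mask. Returns 0 on success. */
static int pin_irq_to_cpu(unsigned irq, unsigned cpu)
{
    char path[64], mask[16];

    if (irq_affinity_path(irq, path, sizeof path) != 0 ||
        single_cpu_mask(cpu, mask, sizeof mask) != 0)
        return -1;

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fputs(mask, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Calling `pin_irq_to_cpu()` once for the CAN-related interrupt and once for the Ethernet interrupt, with different CPU numbers, would give the kind of isolation described in this message.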
* Re: socket can receive order
From: Daniel Squires @ 2015-09-09 12:05 UTC
To: Austin Schuh, Oliver Hartkopp, Marc Kleine-Budde, linux-can

On 09/09/15 03:30, Austin Schuh wrote:
> On Tue, Sep 8, 2015 at 9:56 AM Oliver Hartkopp <socketcan@hartkopp.net> wrote:
>> The point this that it would not help either - even if you are using the
>> PF_PACKET socket (which wireshark does) - bypassing the CAN network layer
>> modules (can, can_raw) doesn't fix the problem.

I meant to bypass ALL the kernel CAN / socket layers and go directly from
USB frames to the application, which I think would avoid the problem,
though it also renders tools such as wireshark and can-utils useless, so
I would rather avoid it.

The USB frames appear to arrive in order, as the timestamps (as shown by
candump) are in order, though the packets come out of recv() out of
order. Further testing reveals that some of them are significantly
delayed at the application level, by tens of ms, and in that time many
newer packets are received promptly (in under a ms).

>> I assume Daniel uses a multicore system, right?

Correct, a Core i5 in this case.

-- 
Dan Squires
Engineered Arts Ltd.
* Re: socket can receive order 2015-09-09 2:30 ` Austin Schuh 2015-09-09 3:10 ` Brian Silverman 2015-09-09 12:05 ` Daniel Squires @ 2015-09-09 16:14 ` Daniel Squires 2015-09-09 16:31 ` Oliver Hartkopp 2015-09-17 19:18 ` Oliver Hartkopp 2 siblings, 2 replies; 20+ messages in thread From: Daniel Squires @ 2015-09-09 16:14 UTC (permalink / raw) To: Austin Schuh, Oliver Hartkopp, Marc Kleine-Budde, linux-can The Hack seems to work, its been a short test of a half hour so far, but before it happened reliably after a few seconds. On 09/09/15 03:30, Austin Schuh wrote: > On Tue, Sep 8, 2015 at 9:56 AM Oliver Hartkopp <socketcan@hartkopp.net> wrote: >> Hi all, >> >> On 08.09.2015 13:17, Daniel Squires wrote: >>> On 08/09/15 12:13, Marc Kleine-Budde wrote: >>>>> I can see the packets coming in the correct order in wireshark and it is >>>>> not immediately obvious to me how the kernel module could mix up the >>>>> order, so it seems that it must be something that happens at the socket >>>>> level? >>>> The kernel module "produces" the CAN frames, so if you see them in the >>>> correct order in wireshark, they have left the module in the right order. >> Yes. This is trivial. >> >> But Daniel is right to ask about the frame reordering on socket level - better >> say - reordering outside the driver level. >> >>> Sorry , I should have been clearer here, in wireshark was looking at the USB >>> frames not the CAN frames. however I think what you say still stands due to >>> the time stamps being in the correct order. >>>>> candump can3 -tz >>>>> <snip> >>>>> (003.088648) can3 043 [8] F7 2D 00 00 00 00 00 00 >>>>> (003.089149) can3 045 [8] F9 2D 00 00 00 00 00 00 >>>>> (003.088897) can3 044 [8] F8 2D 00 00 00 00 00 00 >>>> The timestamps are in the correct order. Maybe Oliver can help here, >>>> he's an expert when it comes to strange reordering :) >> Will try - see below. 
>> >>>>> On the top level I am using CANFestival for CANOpen implementation, so >>>>> it has occurred to me I could implement a CANFestival "driver" using >>>>> libusb and completely bypass the kernel module and socket can layers, >>>>> but I hope not to have to do this. >>>> Na, you don't want to do this. >> The point this that it would not help either - even if you are using the >> PF_PACKET socket (which wireshark does) - bypassing the CAN network layer >> modules (can, can_raw) doesn't fix the problem. >> >> I discussed the problem on netdev ML as I discovered a out-of-order issue when >> fixing the CAN_RAW join feature. >> >> When you have a multicore SMP processor the interrupt can be processed by >> different CPUs, which can lead to packet reordering when using netif_ix() on >> driver level. >> >> The discussion ended with the networking guys pointing me to use NAPI which >> does not really help, e.g. there's only one USB network adapter in >> linux/drivers/net which is a complete mess. >> >> My suggestion was to set a hash value into the socket buffer (skb) at driver >> level, which is used for generating a 'flow' for IP traffic too. You can >> generate flows by hashes to put all traffic from a specific IP into the same >> per-cpu input queue to help TCP assembling the packets in the softirq for this >> IP address in correct order (aha!). >> >> See http://marc.info/?l=linux-netdev&m=143689694125450&w=2 >> >> I assume the networking guys interpreted my suggestion as hack as they are not >> aware how 'addressing' is done in CAN. They only know about IP ... >> >> NAPI is not really a valid solution for CAN USB adapters and I think I'll have >> to restart the discussion as out-of-order frames are a no-go for CAN as it >> kills ISO15765-2 and (obviously) CANopen segmentation. >> >> I assume Daniel uses a multicore system, right? >> >> If so, please try the 'hack' I suggested on the netdev ML if it fixes your >> problem. It might help for the discussion too. 
>> >> Regards, >> Oliver > On our boxes, I've been setting the affinity for both the IRQ thread > (we are running a RT kernel), and the interrupt to the same single > core. Would that help here? > > We've seen CAN packets get significantly delayed causing overruns due > to Ethernet load and both CAN and ethernet sharing the same softirq. > Our solution has been to set the affinity for each of those to > different cores to keep them isolated. > > Austin > -- Dan Squires Engineered Arts Ltd. ^ permalink raw reply [flat|nested] 20+ messages in thread
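As a rough illustration of the flow-hash idea Oliver describes in the quoted text, here is a toy user-space model in C. It is not kernel code: `ifindex_hash()` and `pick_cpu()` are hypothetical names, the multiplicative hash is an arbitrary choice, and the multiply-and-shift scaling is only assumed to resemble what the receive path does. The sketch shows just one thing: frames that carry the same hash always map to the same CPU queue, which is what would keep them in order.

```c
#include <stdint.h>

/* Hypothetical per-interface hash: every frame received on the same
 * interface gets the same value (assumption: derived from ifindex only). */
static uint32_t ifindex_hash(int ifindex)
{
    return (uint32_t)ifindex * 2654435761u; /* arbitrary multiplicative hash */
}

/* Toy model: scale a 32-bit hash onto the range [0, ncpus).
 * The same hash always selects the same CPU. */
static unsigned pick_cpu(uint32_t hash, unsigned ncpus)
{
    return (unsigned)(((uint64_t)hash * ncpus) >> 32);
}
```

Calling `pick_cpu(ifindex_hash(ifindex), ncpus)` once per received frame returns the same CPU number every time for a given interface, whereas frames handed to whichever CPU happened to take the interrupt can end up in different per-CPU queues.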
* Re: socket can receive order 2015-09-09 16:14 ` Daniel Squires @ 2015-09-09 16:31 ` Oliver Hartkopp 2015-09-17 19:18 ` Oliver Hartkopp 1 sibling, 0 replies; 20+ messages in thread From: Oliver Hartkopp @ 2015-09-09 16:31 UTC (permalink / raw) To: Daniel Squires, Austin Schuh, Marc Kleine-Budde, linux-can On 09.09.2015 18:14, Daniel Squires wrote: > The Hack seems to work, its been a short test of a half hour so far, but > before it happened reliably after a few seconds. Great! You should not see any out-of-order frames anymore. I obviously have to start a new attempt to push that single line of source code into mainline :-) If it doesn't help to fix your latency problem under ethernet load, you might check the fixed irq_affinity setting for separating ethernet/CAN CPUs that Brian and Austin were suggesting. Thanks for the feedback, Oliver ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: socket can receive order 2015-09-09 16:14 ` Daniel Squires 2015-09-09 16:31 ` Oliver Hartkopp @ 2015-09-17 19:18 ` Oliver Hartkopp 1 sibling, 0 replies; 20+ messages in thread From: Oliver Hartkopp @ 2015-09-17 19:18 UTC (permalink / raw) To: Daniel Squires, Austin Schuh, Marc Kleine-Budde, linux-can

Hello Daniel,

On 09.09.2015 18:14, Daniel Squires wrote:
> The Hack seems to work, its been a short test of a half hour so far, but
> before it happened reliably after a few seconds.
>
>> On Tue, Sep 8, 2015 at 9:56 AM Oliver Hartkopp <socketcan@hartkopp.net> wrote:
>>>
>>> See http://marc.info/?l=linux-netdev&m=143689694125450&w=2
>>>
>>> If so, please try the 'hack' I suggested on the netdev ML if it fixes your
>>> problem. It might help for the discussion too.

In the referenced posting above I suggested setting

    skb_set_hash(skb, dev->ifindex, PKT_HASH_TYPE_L2);

to create an interface-specific hash for the socket buffer, and then enabling receive packet steering (RPS) with

    echo f > /sys/class/net/can0/queues/rx-0/rps_cpus

To create a proper patch and description I evaluated some more skb_set_hash() parameters and finally discovered that setting the skb hash seems to be obsolete ... %-)

Can you confirm that

    echo f > /sys/class/net/can0/queues/rx-0/rps_cpus

already fixes the out-of-order issue even without setting the skb hash?

If so we could give a general recommendation for multi-core CPU system users to enable RPS for CAN interfaces by setting the specific sysfs entry.

Regards,
Oliver

^ permalink raw reply [flat|nested] 20+ messages in thread
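For anyone who wants to apply the RPS setting from a program instead of the shell, a minimal C sketch might look like the following. The helper names `rps_path()` and `write_cpu_mask()` are made up for illustration, only the `rx-0` queue from Oliver's command is handled, and writing to the real sysfs file is assumed to need root.

```c
#include <stdio.h>

/* Build the sysfs path of the RPS CPU mask for an interface's rx-0 queue.
 * Returns 0 on success, -1 if the buffer is too small. */
static int rps_path(char *buf, size_t len, const char *ifname)
{
    int n = snprintf(buf, len, "/sys/class/net/%s/queues/rx-0/rps_cpus", ifname);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a CPU bitmask in hex to the given file, like `echo f > path`.
 * Returns 0 on success, -1 on any I/O error. */
static int write_cpu_mask(const char *path, unsigned long mask)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "%lx\n", mask) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

With these helpers, building the path for `"can0"` and writing the mask `0xf` corresponds to the `echo f` command above, which allows CPUs 0 to 3 to be used for steering.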
* Re: socket can receive order 2015-09-08 11:13 ` Marc Kleine-Budde 2015-09-08 11:17 ` Daniel Squires @ 2015-09-08 11:46 ` Wolfgang Grandegger 2015-09-08 11:49 ` Daniel Squires ` (2 more replies) 1 sibling, 3 replies; 20+ messages in thread From: Wolfgang Grandegger @ 2015-09-08 11:46 UTC (permalink / raw) To: Marc Kleine-Budde, Daniel Squires, linux-can, Oliver Hartkopp Am 08.09.2015 um 13:13 schrieb Marc Kleine-Budde: > On 09/08/2015 12:41 PM, Daniel Squires wrote: >> On my laptop and Desktop PC I have not seen it happen. > >> Both the application PC (NUC) and the Laptop are running Ubuntu kernel >> 3.19.0-26-generic >> >> The NUC has the kernel rebuilt without xhci due to problems it causes >> with another USB peripheral. >> >> I am not entirely sure what you mean by which can core I am using but if >> it helps i am opening the socket as follows : > > I mean what kind of CAN adapter... "https://github.com/fabiobaltieri/open-usb-can" seems to use a MCP2515 controller. Wolfgang. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: socket can receive order 2015-09-08 11:46 ` Wolfgang Grandegger @ 2015-09-08 11:49 ` Daniel Squires 2015-09-08 11:56 ` Marc Kleine-Budde 2015-09-10 2:29 ` Tom Evans 2 siblings, 0 replies; 20+ messages in thread From: Daniel Squires @ 2015-09-08 11:49 UTC (permalink / raw) To: Wolfgang Grandegger, Marc Kleine-Budde, linux-can, Oliver Hartkopp Whilst I am using the kernel module of that project at present, the firmware and hardware are not from that project, but instead based around an STM32 MCU. On 08/09/15 12:46, Wolfgang Grandegger wrote: > > > Am 08.09.2015 um 13:13 schrieb Marc Kleine-Budde: >> On 09/08/2015 12:41 PM, Daniel Squires wrote: >>> On my laptop and Desktop PC I have not seen it happen. >> >>> Both the application PC (NUC) and the Laptop are running Ubuntu kernel >>> 3.19.0-26-generic >>> >>> The NUC has the kernel rebuilt without xhci due to problems it causes >>> with another USB peripheral. >>> >>> I am not entirely sure what you mean by which can core I am using >>> but if >>> it helps i am opening the socket as follows : >> >> I mean what kind of CAN adapter... > > "https://github.com/fabiobaltieri/open-usb-can" seems to use a MCP2515 > controller. > > Wolfgang. > -- Dan Squires Engineered Arts Ltd. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: socket can receive order 2015-09-08 11:46 ` Wolfgang Grandegger 2015-09-08 11:49 ` Daniel Squires @ 2015-09-08 11:56 ` Marc Kleine-Budde 2015-09-10 2:29 ` Tom Evans 2 siblings, 0 replies; 20+ messages in thread From: Marc Kleine-Budde @ 2015-09-08 11:56 UTC (permalink / raw) To: Wolfgang Grandegger, Daniel Squires, linux-can, Oliver Hartkopp [-- Attachment #1: Type: text/plain, Size: 583 bytes --] On 09/08/2015 01:46 PM, Wolfgang Grandegger wrote: >> I mean what kind of CAN adapter... > > "https://github.com/fabiobaltieri/open-usb-can" seems to use a MCP2515 > controller. ACK. But from the Linux driver side it's a USB device and Daniel is using the driver from github. Thanks, Marc -- Pengutronix e.K. | Marc Kleine-Budde | Industrial Linux Solutions | Phone: +49-231-2826-924 | Vertretung West/Dortmund | Fax: +49-5121-206917-5555 | Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de | [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 455 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: socket can receive order 2015-09-08 11:46 ` Wolfgang Grandegger 2015-09-08 11:49 ` Daniel Squires 2015-09-08 11:56 ` Marc Kleine-Budde @ 2015-09-10 2:29 ` Tom Evans 2015-09-10 8:08 ` Daniel Squires 2 siblings, 1 reply; 20+ messages in thread From: Tom Evans @ 2015-09-10 2:29 UTC (permalink / raw) To: Daniel Squires, linux-can On 08/09/15 21:46, Wolfgang Grandegger wrote: > > Am 08.09.2015 um 13:13 schrieb Marc Kleine-Budde: >> On 09/08/2015 12:41 PM, Daniel Squires wrote: >>> On my laptop and Desktop PC I have not seen it happen. >> I mean what kind of CAN adapter... > > "https://github.com/fabiobaltieri/open-usb-can" seems to use a MCP2515 > controller. http://fabiobaltieri.com/2013/07/23/hacking-into-a-vehicle-can-bus-toyothack-and-socketcan/#more-1419 "my own open hardware USB AVR + MCP2515 interface", "the performances are not that good above 250kbps", "It’s tempting to use an SPI controller (the MCP2515 is very common), but that has terrible performances on highly loaded fast busses, and you will end up with problem such as RX buffer underruns and out-of-order frames." He means "overruns". The MCP2515 doesn't have a FIFO. Messages have to be read out over a slow SPI bus one bit at a time within one message time or it overruns. Or two if the BUKT bit is set, but that risks reading messages in the wrong order. The design uses an ATMEGA32U2 and an MCP2515. I can't see why it shouldn't be able to buffer messages from the MCP2515 at relatively high data rates, if the code is well written. From my experience though, code for the MCP2515 is seldom "well written". It is too easy to fall into a trap and get the message arrival order wrong. This is unlikely to be related to the OP's problem, but just something to be aware of. For anybody still coding and debugging MCP2515 stuff: http://www.microchip.com/forums/m620741.aspx > otherwise its prone to loosing TX packets when loaded. 
Do you know about having to do something like the following to stop CAN Transmit Drops? The networking stack defaults to DROPPING CAN transmit frames before blocking the socket if you don't.

    /bin/echo 256 > /sys/class/net/can0/tx_queue_len
    ...
    int sndbuf = (250 + 8) * 256;
    socklen_t socklen = sizeof(sndbuf);
    /* Minimum socket buffer to try and get it blocking */
    rc = setsockopt(pSkt->skt, SOL_SOCKET, SO_SNDBUF,
                    &sndbuf, sizeof(sndbuf));

http://socket-can.996257.n3.nabble.com/Solving-ENOBUFS-returned-by-write-td2886.html

Tom

^ permalink raw reply [flat|nested] 20+ messages in thread
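Tom's fragment can be wrapped into two small helpers, sketched below under some assumptions: the function names are hypothetical, the per-frame estimate of (250 + 8) bytes is taken directly from his snippet, and `fd` would normally be a PF_CAN raw socket already bound to the interface (the helpers themselves accept any socket). On Linux the kernel may adjust the requested size, so reading the value back is a useful sanity check.

```c
#include <sys/socket.h>

/* Request a send buffer sized for `frames` queued frames, using the
 * (250 + 8) bytes-per-frame estimate from the snippet above.
 * Returns the result of setsockopt(): 0 on success, -1 on error. */
static int set_sndbuf_for_frames(int fd, int frames)
{
    int sndbuf = (250 + 8) * frames;
    return setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof(sndbuf));
}

/* Read back the send buffer size the kernel actually applied,
 * or -1 if getsockopt() fails. */
static int get_sndbuf(int fd)
{
    int val = 0;
    socklen_t len = sizeof(val);
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &val, &len) != 0)
        return -1;
    return val;
}
```

Calling `set_sndbuf_for_frames(fd, 256)` matches the `tx_queue_len` of 256 in the shell line; whether the socket then blocks instead of returning ENOBUFS still depends on the interface queue length being raised as well.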
* Re: socket can receive order 2015-09-10 2:29 ` Tom Evans @ 2015-09-10 8:08 ` Daniel Squires 0 siblings, 0 replies; 20+ messages in thread From: Daniel Squires @ 2015-09-10 8:08 UTC (permalink / raw) To: tom_usenet, linux-can On 10/09/15 03:29, Tom Evans wrote: > On 08/09/15 21:46, Wolfgang Grandegger wrote: >> >> Am 08.09.2015 um 13:13 schrieb Marc Kleine-Budde: >>> On 09/08/2015 12:41 PM, Daniel Squires wrote: >>>> On my laptop and Desktop PC I have not seen it happen. >>> I mean what kind of CAN adapter... >> >> "https://github.com/fabiobaltieri/open-usb-can" seems to use a MCP2515 >> controller. > > http://fabiobaltieri.com/2013/07/23/hacking-into-a-vehicle-can-bus-toyothack-and-socketcan/#more-1419 > > > "my own open hardware USB AVR + MCP2515 interface", "the performances > are not that good above 250kbps", "It’s tempting to use an SPI > controller (the MCP2515 is very common), but that has terrible > performances on highly loaded fast busses, and you will end up with > problem such as RX buffer underruns and out-of-order frames." > > He means "overruns". The MCP2515 doesn't have a FIFO. Messages have to > be read out over a slow SPI bus one bit at a time within one message > time or it overruns. Or two if the BUKT bit is set, but that risks > reading messages in the wrong order. > > The design uses an ATMEGA32U2 and an MCP2515. I can't see why it > shouldn't be able to buffer messages from the MCP2515 at relatively > high data rates, if the code is well written. From my experience > though, code for the MCP2515 is seldom "well written". It is too easy > to fall into a trap and get the message arrival order wrong. > > This is unlikely to be related to the OP's problem, but just something > to be aware of. Yes, I had read that whole article and didn't use the hardware / firmware there for those reasons. I needed 1Mbit also. I am using the STM32F4 Discovery boards with a CAN phy attached. 
I didn't know where to start with the kernel module, which is why I am using the one from there at present. It would be nice to get a "standardised" USB class kernel module, but I guess that would require input from the USB implementers group.

>
> For anybody still coding and debugging MCP2515 stuff:
>
> http://www.microchip.com/forums/m620741.aspx
>
> > otherwise its prone to loosing TX packets when loaded.
>
> Do you know about having to do something like the following to stop
> CAN Transmit Drops? The networking stack defaults to DROPPING CAN
> transmit frames before blocking the socket if you don't.
>
> /bin/echo 256 > /sys/class/net/can0/tx_queue_len
> ...
> int sndbuf = (250 + 8) * 256;
> socklen_t socklen = sizeof(sndbuf);
> /* Minimum socket buffer to try and get it blocking */
> rc = setsockopt(pSkt->skt, SOL_SOCKET, SO_SNDBUF,
> &sndbuf, sizeof(sndbuf));
>

I hadn't noticed it could be done in that way. I had been using the ip utility, but was aware of the 10-frame default queue size and that it could be changed.

In my very basic OOO test app I'm actually sending packets with incrementing values until there is no space (send returns ENOBUFS), then doing the receives and checking the values until there is nothing to receive, before continuing to send from the previous failed value.

>
> http://socket-can.996257.n3.nabble.com/Solving-ENOBUFS-returned-by-write-td2886.html
>
>
> Tom
>
>
>

-- Dan Squires Engineered Arts Ltd.

^ permalink raw reply [flat|nested] 20+ messages in thread
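The receive-side check Daniel describes could be sketched roughly as below. This is not his test utility: `first_out_of_order()` is a hypothetical helper that assumes each received value should be exactly one more than the previous one, as with the incrementing CAN IDs in the candump output at the start of the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the index of the first value that is not exactly previous + 1,
 * or -1 if the whole sequence is in order (including empty and
 * single-element sequences). */
static long first_out_of_order(const uint32_t *vals, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (vals[i] != vals[i - 1] + 1)
            return (long)i;
    return -1;
}
```

Fed the IDs from the original report (0x043, 0x045, 0x044), the helper flags index 1, the position where 0x045 arrived ahead of 0x044.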