* Stuck vcan
@ 2014-03-11 0:22 Austin Schuh
2014-03-11 7:05 ` Oliver Hartkopp
0 siblings, 1 reply; 7+ messages in thread
From: Austin Schuh @ 2014-03-11 0:22 UTC (permalink / raw)
To: linux-can
I am seeing the following.
$ uname -a
Linux vpc3 3.10-3-rt-amd64 #1 SMP PREEMPT RT Debian 3.10.11-1
(2013-09-10) x86_64 GNU/Linux
syslog:
Mar 10 16:29:56 vpc3 kernel: [355322.806011] Dead loop on virtual device
vcan2, fix it urgently!
Mar 10 16:49:12 vpc3 kernel: [356480.638351] Dead loop on virtual device
vcan2, fix it urgently!
austin[4262] vpc3 ~
$ ifconfig vcan2
vcan2 Link encap:UNSPEC HWaddr
00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
UP RUNNING NOARP MTU:16 Metric:1
RX packets:76160257 errors:0 dropped:0 overruns:0 frame:0
TX packets:76160257 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:609282056 (581.0 MiB) TX bytes:609282056 (581.0 MiB)
Any thoughts? I don't see this happen very often. The write call on the
socket fails when this happens. It is generally when I'm sending a large
number of messages on multiple vcans from multiple processes at once.
Austin

* Re: Stuck vcan
2014-03-11 0:22 Stuck vcan Austin Schuh
@ 2014-03-11 7:05 ` Oliver Hartkopp
2014-03-11 20:37 ` Austin Schuh
0 siblings, 1 reply; 7+ messages in thread
From: Oliver Hartkopp @ 2014-03-11 7:05 UTC (permalink / raw)
To: Austin Schuh, linux-can

Hi Austin,

On 11.03.2014 01:22, Austin Schuh wrote:

> $ Linux vpc3 3.10-3-rt-amd64 #1 SMP PREEMPT RT Debian 3.10.11-1

Oh, one of these -rt kernels again %-)

> Mar 10 16:49:12 vpc3 kernel: [356480.638351] Dead loop on virtual device
> vcan2, fix it urgently!

The vcan driver usually does not set the IFF_ECHO flag:

>
> austin[4262] vpc3 ~
> $ ifconfig vcan2
> vcan2 Link encap:UNSPEC HWaddr
> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
> UP RUNNING NOARP MTU:16 Metric:1
  ^^
Which would be indicated here then.

Therefore the 'shortcut' in af_can.c is used to provide the echo:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/net/can/af_can.c#n283

which increases performance but may lead to funny effects as the sending
process leads to a reception event which is probably already handled by the
receiver when the write() of the sender returns.

Two things I would suggest:

1. Let the vcan really handle the frames by enabling IFF_ECHO
2. Add a queue (queue length != 0) to the vcan interface

For 1: Insert the vcan module with the parameter echo=1, e.g.

# rmmod vcan
# modprobe vcan echo=1
# ip link add type vcan
# ip link show vcan0
26: vcan0: <NOARP,ECHO> mtu 16 qdisc noop state DOWN mode DEFAULT group default
link/can

And see if this already fixes your problem ('ECHO' is enabled here).

For 2: Add a queue length to your vcan with

# ip link set vcan0 txqueuelen 10
# ip link show vcan0
26: vcan0: <NOARP,ECHO> mtu 16 qdisc noop state DOWN mode DEFAULT group default qlen 10
link/can

This urges the network subsystem to use the normal packet flow, as a queue
length of zero is only feasible for software devices like loopback / vcan ...

Please give feedback if it solves the problem in your -rt kernel.

Best regards,
Oliver
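Editorial sketch: the ECHO flag that Oliver points to in the `ip link show` output can also be checked from a program. The snippet below is a minimal sketch, assuming that the kernel exposes the interface flags as a hexadecimal value in /sys/class/net/<iface>/flags and that IFF_ECHO is bit 18, as defined in <linux/if.h>. The helper names FlagsHaveEcho and InterfaceHasEcho are made up for illustration and do not come from the thread.

```cpp
#include <cassert>
#include <cstdlib>
#include <fstream>
#include <string>

// Value of IFF_ECHO as defined in <linux/if.h>, repeated here so that the
// sketch does not depend on the kernel headers being installed.
constexpr unsigned long kIffEcho = 1UL << 18;

// Hypothetical helper: returns true if the hex flags string (for example the
// contents of /sys/class/net/vcan0/flags) has the IFF_ECHO bit set.
bool FlagsHaveEcho(const std::string &hex_flags) {
  // strtoul() with base 16 accepts an optional "0x" prefix.
  const unsigned long flags = std::strtoul(hex_flags.c_str(), nullptr, 16);
  return (flags & kIffEcho) != 0;
}

// Hypothetical helper: reads the flags of `iface` from sysfs. Returns false
// if the file cannot be read, for example when the interface does not exist.
bool InterfaceHasEcho(const std::string &iface) {
  assert(!iface.empty());
  std::ifstream file("/sys/class/net/" + iface + "/flags");
  std::string hex_flags;
  if (!(file >> hex_flags)) {
    return false;
  }
  return FlagsHaveEcho(hex_flags);
}
```

Calling InterfaceHasEcho("vcan0") after `modprobe vcan echo=1` would be one way to confirm that suggestion 1 took effect before starting a test run.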

* Re: Stuck vcan
2014-03-11 7:05 ` Oliver Hartkopp
@ 2014-03-11 20:37 ` Austin Schuh
2014-03-11 20:56 ` Marc Kleine-Budde
2014-03-11 21:29 ` Oliver Hartkopp
0 siblings, 2 replies; 7+ messages in thread
From: Austin Schuh @ 2014-03-11 20:37 UTC (permalink / raw)
To: Oliver Hartkopp; +Cc: linux-can

On Tue, Mar 11, 2014 at 12:05 AM, Oliver Hartkopp
<socketcan@hartkopp.net> wrote:
> Hi Austin,
>
> On 11.03.2014 01:22, Austin Schuh wrote:
>
>> $ Linux vpc3 3.10-3-rt-amd64 #1 SMP PREEMPT RT Debian 3.10.11-1
>
> Oh, one of these -rt kernels again %-)
>
>> Mar 10 16:49:12 vpc3 kernel: [356480.638351] Dead loop on virtual device
>> vcan2, fix it urgently!
>
> The vcan driver usually does not set the IFF_ECHO flag:
>>
>> austin[4262] vpc3 ~
>> $ ifconfig vcan2
>> vcan2 Link encap:UNSPEC HWaddr
>> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
>> UP RUNNING NOARP MTU:16 Metric:1
>   ^^
> Which would be indicated here then.
>
> Therefore the 'shortcut' in af_can.c is used to provide the echo:
>
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/net/can/af_can.c#n283
>
> which increases performance but may lead to funny effects as the sending
> process leads to a reception event which is probably already handled by the
> receiver when the write() of the sender returns.
>
> Two things I would suggest:
>
> 1. Let the vcan really handle the frames by enabling IFF_ECHO
> 2. Add a queue (queue length != 0) to the vcan interface
>
> For 1: Insert the vcan module with the parameter echo=1, e.g.
>
> # rmmod vcan
> # modprobe vcan echo=1
> # ip link add type vcan
> # ip link show vcan0
> 26: vcan0: <NOARP,ECHO> mtu 16 qdisc noop state DOWN mode DEFAULT group default
> link/can
>
> And see if this already fixes your problem ('ECHO' is enabled here).
>
> For 2: Add a queue length to your vcan with
>
> # ip link set vcan0 txqueuelen 10
> # ip link show vcan0
> 26: vcan0: <NOARP,ECHO> mtu 16 qdisc noop state DOWN mode DEFAULT group default qlen 10
> link/can
>
> This urges the network subsystem to use the normal packet flow, as a queue
> length of zero is only feasible for software devices like loopback / vcan ...
>
> Please give feedback if it solves the problem in your -rt kernel.
>
> Best regards,
> Oliver
>

Hi Oliver,

I'm not trying to cause problems ;)

Neither of those helped.

# ifconfig vcan0
vcan0 Link encap:UNSPEC HWaddr
00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
UP RUNNING NOARP MTU:16 Metric:1
RX packets:428406 errors:0 dropped:0 overruns:0 frame:0
TX packets:428406 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:3427248 (3.2 MiB) TX bytes:3427248 (3.2 MiB)

# ip link show vcan0
17: vcan0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc noqueue state UNKNOWN mode DEFAULT
link/can

# ifconfig vcan0
vcan0 Link encap:UNSPEC HWaddr
00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
UP RUNNING NOARP MTU:16 Metric:1
RX packets:1400094 errors:0 dropped:0 overruns:0 frame:0
TX packets:1400094 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:10
RX bytes:11200752 (10.6 MiB) TX bytes:11200752 (10.6 MiB)

# ip link show vcan0
17: vcan0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc noqueue state UNKNOWN mode DEFAULT qlen 10
link/can

Both times, I get the following.

Mar 11 13:28:58 aschuh-peloton kernel: [96229.460901] Dead loop on virtual device vcan0, fix it urgently!
Mar 11 13:28:59 aschuh-peloton kernel: [96230.313875] Dead loop on virtual device vcan0, fix it urgently!
Mar 11 13:28:59 aschuh-peloton kernel: [96230.319494] Dead loop on virtual device vcan0, fix it urgently!
Mar 11 13:29:00 aschuh-peloton kernel: [96231.072617] Dead loop on virtual device vcan0, fix it urgently!

Luckily, it is easy to reproduce. I have some trivial code which sends out
10,000 msgs/sec on the vcan. I run 3+ copies of the code on a dual core
machine with hyperthreads enabled. More parallel sends should cause it to
occur more frequently.

Austin

* Re: Stuck vcan
2014-03-11 20:37 ` Austin Schuh
@ 2014-03-11 20:56 ` Marc Kleine-Budde
2014-03-11 21:54 ` Austin Schuh
2014-03-11 21:29 ` Oliver Hartkopp
1 sibling, 1 reply; 7+ messages in thread
From: Marc Kleine-Budde @ 2014-03-11 20:56 UTC (permalink / raw)
To: Austin Schuh, Oliver Hartkopp; +Cc: linux-can

[-- Attachment #1: Type: text/plain, Size: 2288 bytes --]

On 03/11/2014 09:37 PM, Austin Schuh wrote:
> I'm not trying to cause problems ;)

:D

> Neither of those helped.
>
> # ifconfig vcan0
> vcan0 Link encap:UNSPEC HWaddr
> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
> UP RUNNING NOARP MTU:16 Metric:1
> RX packets:428406 errors:0 dropped:0 overruns:0 frame:0
> TX packets:428406 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:0
> RX bytes:3427248 (3.2 MiB) TX bytes:3427248 (3.2 MiB)
>
> # ip link show vcan0
> 17: vcan0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc noqueue state UNKNOWN
> mode DEFAULT
> link/can
>
> # ifconfig vcan0
> vcan0 Link encap:UNSPEC HWaddr
> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
> UP RUNNING NOARP MTU:16 Metric:1
> RX packets:1400094 errors:0 dropped:0 overruns:0 frame:0
> TX packets:1400094 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:10
> RX bytes:11200752 (10.6 MiB) TX bytes:11200752 (10.6 MiB)
>
> # ip link show vcan0
> 17: vcan0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc noqueue state UNKNOWN
> mode DEFAULT qlen 10
> link/can
>
> Both times, I get the following.
>
> Mar 11 13:28:58 aschuh-peloton kernel: [96229.460901] Dead loop on
> virtual device vcan0, fix it urgently!
> Mar 11 13:28:59 aschuh-peloton kernel: [96230.313875] Dead loop on
> virtual device vcan0, fix it urgently!
> Mar 11 13:28:59 aschuh-peloton kernel: [96230.319494] Dead loop on
> virtual device vcan0, fix it urgently!
> Mar 11 13:29:00 aschuh-peloton kernel: [96231.072617] Dead loop on
> virtual device vcan0, fix it urgently!
>
> Luckily, it is easy to reproduce. I have some trivial code which
> sends out 10,000 msgs/sec on the vcan. I run 3+ copies of the code on
> a dual core machine with hyperthreads enabled. More parallel sends
> should cause it to occur more frequently.

Can you share the code?

Marc

--
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 242 bytes --]

* Re: Stuck vcan
2014-03-11 20:56 ` Marc Kleine-Budde
@ 2014-03-11 21:54 ` Austin Schuh
0 siblings, 0 replies; 7+ messages in thread
From: Austin Schuh @ 2014-03-11 21:54 UTC (permalink / raw)
To: Marc Kleine-Budde; +Cc: Oliver Hartkopp, linux-can

[-- Attachment #1: Type: text/plain, Size: 2465 bytes --]

On Tue, Mar 11, 2014 at 1:56 PM, Marc Kleine-Budde <mkl@pengutronix.de> wrote:
> On 03/11/2014 09:37 PM, Austin Schuh wrote:
>> I'm not trying to cause problems ;)
>
> :D
>
>> Neither of those helped.
>>
>> # ifconfig vcan0
>> vcan0 Link encap:UNSPEC HWaddr
>> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
>> UP RUNNING NOARP MTU:16 Metric:1
>> RX packets:428406 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:428406 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:0
>> RX bytes:3427248 (3.2 MiB) TX bytes:3427248 (3.2 MiB)
>>
>> # ip link show vcan0
>> 17: vcan0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc noqueue state UNKNOWN
>> mode DEFAULT
>> link/can
>>
>> # ifconfig vcan0
>> vcan0 Link encap:UNSPEC HWaddr
>> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
>> UP RUNNING NOARP MTU:16 Metric:1
>> RX packets:1400094 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:1400094 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:10
>> RX bytes:11200752 (10.6 MiB) TX bytes:11200752 (10.6 MiB)
>>
>> # ip link show vcan0
>> 17: vcan0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc noqueue state UNKNOWN
>> mode DEFAULT qlen 10
>> link/can
>>
>> Both times, I get the following.
>>
>> Mar 11 13:28:58 aschuh-peloton kernel: [96229.460901] Dead loop on
>> virtual device vcan0, fix it urgently!
>> Mar 11 13:28:59 aschuh-peloton kernel: [96230.313875] Dead loop on
>> virtual device vcan0, fix it urgently!
>> Mar 11 13:28:59 aschuh-peloton kernel: [96230.319494] Dead loop on
>> virtual device vcan0, fix it urgently!
>> Mar 11 13:29:00 aschuh-peloton kernel: [96231.072617] Dead loop on
>> virtual device vcan0, fix it urgently!
>>
>> Luckily, it is easy to reproduce. I have some trivial code which
>> sends out 10,000 msgs/sec on the vcan. I run 3+ copies of the code on
>> a dual core machine with hyperthreads enabled. More parallel sends
>> should cause it to occur more frequently.
>
> Can you share the code?
>
> Marc
>
> --
> Pengutronix e.K.                  | Marc Kleine-Budde           |
> Industrial Linux Solutions        | Phone: +49-231-2826-924     |
> Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
> Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |
>

The attached code reproduces the problem for me. I removed all the extra
dependencies.

Austin

[-- Attachment #2: bus_load_linux_can.cc --]
[-- Type: text/x-c++src, Size: 1873 bytes --]

#include <linux/can.h>
#include <linux/can/raw.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <unistd.h>
#include <assert.h>
#include <stdio.h>
#include <string.h>

#include <string>
#include <iostream>

// A CAN bus which sends using SocketCAN to the specified interface.
class SocketCanBus {
 public:
  explicit SocketCanBus(const ::std::string &iface)
      : socket_(-1), iface_(iface) {}
  ~SocketCanBus() {
    if (socket_ != -1) {
      close(socket_);
    }
  }

  void Initialize();

  bool Send(const struct can_frame &frame)
      __attribute__((warn_unused_result));

 private:
  // The socket, or -1 until Initialize() has been called.
  int socket_;
  ::std::string iface_;
};

void SocketCanBus::Initialize() {
  socket_ = socket(PF_CAN, SOCK_RAW, CAN_RAW);
  assert(socket_ >= 0);

  // Look up the interface index. The call is kept outside of assert() so
  // that it still runs when NDEBUG is defined.
  struct ifreq ifr;
  memset(&ifr, 0, sizeof(ifr));
  strncpy(ifr.ifr_name, iface_.c_str(), sizeof(ifr.ifr_name) - 1);
  const int ioctl_result = ioctl(socket_, SIOCGIFINDEX, &ifr);
  assert(ioctl_result == 0);
  (void)ioctl_result;

  // Zero initialize the sockaddr, mainly for valgrind, but it can't hurt.
  struct sockaddr_can addr;
  memset(&addr, 0, sizeof(addr));
  addr.can_family = AF_CAN;
  addr.can_ifindex = ifr.ifr_ifindex;

  ::std::cout << "Connecting to " << iface_ << ::std::endl;
  const int bind_result =
      bind(socket_, (struct sockaddr *)&addr, sizeof(addr));
  assert(bind_result == 0);
  (void)bind_result;
}

bool SocketCanBus::Send(const struct can_frame &frame) {
  assert(socket_ != -1);
  return write(socket_, &frame, sizeof(struct can_frame)) ==
         static_cast<ssize_t>(sizeof(struct can_frame));
}

int main(int argc, char *argv[]) {
  SocketCanBus bus("vcan0");
  bus.Initialize();

  while (true) {
    // Sleep 100 us between frames, i.e. roughly 10,000 frames per second.
    usleep(1000000 / 10000);
    struct can_frame frame;
    memset(&frame, 0, sizeof(frame));
    frame.can_id = 0x100;
    frame.can_dlc = 8;
    for (int i = 0; i < 8; ++i) {
      frame.data[i] = i;
    }
    if (!bus.Send(frame)) {
      ::std::cout << "Failed to send the CAN frame." << ::std::endl;
    }
  }
  return 0;
}
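Editorial sketch: the original report says that the write call on the socket fails, but the reproducer only prints a generic message, so the thread never shows which error the kernel returned. The following is a minimal sketch of how a sender could capture that error. The helper names WriteOrErrno and ReportSendError are hypothetical, and mapping a short write to EIO is a choice made here for illustration, not something taken from the thread.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <iostream>
#include <unistd.h>

// Hypothetical helper: writes `size` bytes to `fd`. Returns 0 on success,
// the errno value if write() failed, or EIO if fewer bytes were written.
int WriteOrErrno(int fd, const void *data, size_t size) {
  assert(data != nullptr);
  const ssize_t written = write(fd, data, size);
  if (written < 0) {
    return errno;
  }
  if (static_cast<size_t>(written) != size) {
    return EIO;
  }
  return 0;
}

// Hypothetical helper: logs a failed send together with the reason.
void ReportSendError(int error) {
  std::cerr << "Failed to send the CAN frame: " << std::strerror(error)
            << " (errno " << error << ")" << std::endl;
}
```

In the reproducer, Send() could call WriteOrErrno(socket_, &frame, sizeof(frame)) and pass any nonzero result to ReportSendError(), so that the log shows the error code next to each "Dead loop" kernel message.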

* Re: Stuck vcan
2014-03-11 20:37 ` Austin Schuh
2014-03-11 20:56 ` Marc Kleine-Budde
@ 2014-03-11 21:29 ` Oliver Hartkopp
2014-03-11 21:55 ` Austin Schuh
1 sibling, 1 reply; 7+ messages in thread
From: Oliver Hartkopp @ 2014-03-11 21:29 UTC (permalink / raw)
To: Austin Schuh; +Cc: linux-can

On 11.03.2014 21:37, Austin Schuh wrote:

> # ip link show vcan0
> 17: vcan0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc noqueue state UNKNOWN
> mode DEFAULT qlen 10
> link/can
>
> Both times, I get the following.
>
> Mar 11 13:28:58 aschuh-peloton kernel: [96229.460901] Dead loop on
> virtual device vcan0, fix it urgently!

As you can see above the qdisc is 'noqueue'

Please check out the __dev_queue_xmit() function in

http://lxr.free-electrons.com/source/net/core/dev.c#L2782

which creates the 'fix urgently' message.

I assume when creating a qdisc which is not 'noqueue' the handling for skb
queueing is forced.

See:
http://rtime.felk.cvut.cz/can/socketcan-qdisc-final.pdf

Regards,
Oliver
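Editorial sketch: Oliver points at the queueless ('noqueue') path of __dev_queue_xmit() as the source of the message. The toy model below is a userspace illustration of the kind of check that path performs, written from a reading of kernels of that era: a transmit attempt is rejected as recursion when the CPU making it is already recorded as the owner of the transmit lock. All type and function names are invented for this sketch, it is not kernel code, and the thread does not establish whether this check is what misfires on the -rt kernel.

```cpp
#include <cassert>
#include <functional>

// Toy stand-in for the per-queue state; not the kernel's data structure.
struct ToyTxQueue {
  int xmit_lock_owner = -1;  // CPU currently transmitting, or -1 if none.
};

enum class ToyXmitResult { kTransmitted, kDeadLoop };

// Models a transmit on a device without a queueing discipline. `driver_xmit`
// stands in for the driver's transmit function and may itself try to send.
ToyXmitResult ToyQueuelessXmit(ToyTxQueue *queue, int cpu,
                               const std::function<void()> &driver_xmit) {
  assert(queue != nullptr);
  if (queue->xmit_lock_owner == cpu) {
    // The same CPU is already inside a transmit on this queue. The model
    // treats this as recursion, the case where the kernel logs "Dead loop".
    return ToyXmitResult::kDeadLoop;
  }
  queue->xmit_lock_owner = cpu;  // Take the transmit lock.
  driver_xmit();
  queue->xmit_lock_owner = -1;   // Release it again.
  return ToyXmitResult::kTransmitted;
}
```

Note that the check in this model keys on the CPU number and not on the sending task, which is why it can only tell "this CPU is already transmitting" and not who is doing the transmitting.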

* Re: Stuck vcan
2014-03-11 21:29 ` Oliver Hartkopp
@ 2014-03-11 21:55 ` Austin Schuh
0 siblings, 0 replies; 7+ messages in thread
From: Austin Schuh @ 2014-03-11 21:55 UTC (permalink / raw)
To: Oliver Hartkopp; +Cc: linux-can

On Tue, Mar 11, 2014 at 2:29 PM, Oliver Hartkopp <socketcan@hartkopp.net> wrote:
>
> On 11.03.2014 21:37, Austin Schuh wrote:
>
>> # ip link show vcan0
>> 17: vcan0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc noqueue state UNKNOWN
>> mode DEFAULT qlen 10
>> link/can
>>
>> Both times, I get the following.
>>
>> Mar 11 13:28:58 aschuh-peloton kernel: [96229.460901] Dead loop on
>> virtual device vcan0, fix it urgently!
>
> As you can see above the qdisc is 'noqueue'
>
> Please check out the __dev_queue_xmit() function in
>
> http://lxr.free-electrons.com/source/net/core/dev.c#L2782
>
> which creates the 'fix urgently' message.
>
> I assume when creating a qdisc which is not 'noqueue' the handling for skb
> queueing is forced.
>
> See:
> http://rtime.felk.cvut.cz/can/socketcan-qdisc-final.pdf
>
> Regards,
> Oliver
>

Thanks for the pointers Oliver. I'll take a look and see what I can find.
It will probably be a week or two before I can really sit down and dig
through this.

Austin