* Stuck vcan
@ 2014-03-11 0:22 Austin Schuh
2014-03-11 7:05 ` Oliver Hartkopp
0 siblings, 1 reply; 7+ messages in thread
From: Austin Schuh @ 2014-03-11 0:22 UTC (permalink / raw)
To: linux-can
I am seeing the following
$ uname -a
Linux vpc3 3.10-3-rt-amd64 #1 SMP PREEMPT RT Debian 3.10.11-1
(2013-09-10) x86_64 GNU/Linux
syslog:
Mar 10 16:29:56 vpc3 kernel: [355322.806011] Dead loop on virtual device
vcan2, fix it urgently!
Mar 10 16:49:12 vpc3 kernel: [356480.638351] Dead loop on virtual device
vcan2, fix it urgently!
austin[4262] vpc3 ~
$ ifconfig vcan2
vcan2 Link encap:UNSPEC HWaddr
00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
UP RUNNING NOARP MTU:16 Metric:1
RX packets:76160257 errors:0 dropped:0 overruns:0 frame:0
TX packets:76160257 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:609282056 (581.0 MiB) TX bytes:609282056 (581.0 MiB)
Any thoughts? I don't see this happen very often. The write call on the
socket fails when this happens. It is generally when I'm sending a large
number of messages on multiple vcans from multiple processes at once.
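
The post does not say which error the failing write returns. A minimal sketch of how the errno could be captured so it can be matched against the syslog message is shown below; the helper name SendOrDescribeError is hypothetical and not part of the original code.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <string>

#include <linux/can.h>
#include <unistd.h>

// Hypothetical helper (not from the original post): writes one CAN frame to
// fd and returns an empty string on success, or a description of the failure.
std::string SendOrDescribeError(int fd, const struct can_frame &frame) {
  const ssize_t written = write(fd, &frame, sizeof(frame));
  if (written == static_cast<ssize_t>(sizeof(frame))) {
    return "";  // The whole frame was accepted.
  }
  if (written < 0) {
    // Save errno immediately; later library calls may overwrite it.
    const int saved_errno = errno;
    return std::string("write failed: ") + strerror(saved_errno) +
           " (errno " + std::to_string(saved_errno) + ")";
  }
  return "short write of " + std::to_string(written) + " bytes";
}
```

Logging the returned string next to a timestamp would make it easier to line up each failed write with a "Dead loop" line in the kernel log.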
Austin
* Re: Stuck vcan
2014-03-11 0:22 Stuck vcan Austin Schuh
@ 2014-03-11 7:05 ` Oliver Hartkopp
2014-03-11 20:37 ` Austin Schuh
0 siblings, 1 reply; 7+ messages in thread
From: Oliver Hartkopp @ 2014-03-11 7:05 UTC (permalink / raw)
To: Austin Schuh, linux-can
Hi Austin,
On 11.03.2014 01:22, Austin Schuh wrote:
> $ Linux vpc3 3.10-3-rt-amd64 #1 SMP PREEMPT RT Debian 3.10.11-1
Oh, one of these -rt kernels again %-)
> Mar 10 16:49:12 vpc3 kernel: [356480.638351] Dead loop on virtual device
> vcan2, fix it urgently!
The vcan driver usually does not set the IFF_ECHO flag:
>
> austin[4262] vpc3 ~
> $ ifconfig vcan2
> vcan2 Link encap:UNSPEC HWaddr
> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
> UP RUNNING NOARP MTU:16 Metric:1
^^
Which would be indicated here then.
Therefore the 'shortcut' in af_can.c is used to provide the echo:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/net/can/af_can.c#n283
which increases performance but may lead to funny effects as the sending
process leads to a reception event which is probably already handled by the
receiver when the write() of the sender returns.
Two things I would suggest:
1. Let the vcan really handle the frames by enabling IFF_ECHO
2. Add a queue (queue length != 0) to the vcan interface
For 1: Insert the vcan module with the parameter echo=1, e.g.
# rmmod vcan
# modprobe vcan echo=1
# ip link add type vcan
# ip link show vcan0
26: vcan0: <NOARP,ECHO> mtu 16 qdisc noop state DOWN mode DEFAULT group default
link/can
And see if this already fixes your problem ('ECHO' is enabled here).
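
To check the ECHO flag from a program rather than from `ip link show`, one option is to read the interface flags from sysfs. The sketch below assumes that /sys/class/net/<iface>/flags holds the flags as a hex string and that IFF_ECHO is bit 18, as defined in the kernel's linux/if.h; the function names are illustrative. The SIOCGIFFLAGS ioctl returns flags in a 16-bit field, so it cannot report this bit, which may be why ifconfig does not list it.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// IFF_ECHO as defined in the kernel's <linux/if.h> (bit 18). It is defined
// locally here to avoid header conflicts between <linux/if.h> and <net/if.h>.
constexpr unsigned long kIffEcho = 1UL << 18;

// Parses the hex string found in /sys/class/net/<iface>/flags, e.g. "0x500c1".
bool FlagsHaveEcho(const std::string &hex_flags) {
  return (std::stoul(hex_flags, nullptr, 16) & kIffEcho) != 0;
}

// Returns true if the interface reports IFF_ECHO, and false if it does not
// or if the sysfs file cannot be read.
bool InterfaceHasEcho(const std::string &iface) {
  std::ifstream in("/sys/class/net/" + iface + "/flags");
  std::string hex_flags;
  if (!(in >> hex_flags)) {
    return false;
  }
  return FlagsHaveEcho(hex_flags);
}
```

Calling InterfaceHasEcho("vcan0") after reloading the module with echo=1 should then agree with the `ECHO` entry in the `ip link show` output.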
For 2: Add a queue length to your vcan with
# ip link set vcan0 txqueuelen 10
# ip link show vcan0
26: vcan0: <NOARP,ECHO> mtu 16 qdisc noop state DOWN mode DEFAULT group default qlen 10
link/can
This forces the network subsystem to use the normal packet flow, as a queue
length of zero is only feasible for software devices like loopback / vcan ...
Please let me know whether this solves the problem in your -rt kernel.
Best regards,
Oliver
* Re: Stuck vcan
2014-03-11 7:05 ` Oliver Hartkopp
@ 2014-03-11 20:37 ` Austin Schuh
2014-03-11 20:56 ` Marc Kleine-Budde
2014-03-11 21:29 ` Oliver Hartkopp
0 siblings, 2 replies; 7+ messages in thread
From: Austin Schuh @ 2014-03-11 20:37 UTC (permalink / raw)
To: Oliver Hartkopp; +Cc: linux-can
On Tue, Mar 11, 2014 at 12:05 AM, Oliver Hartkopp
<socketcan@hartkopp.net> wrote:
> Hi Austin,
>
>
>
> On 11.03.2014 01:22, Austin Schuh wrote:
>
>> $ Linux vpc3 3.10-3-rt-amd64 #1 SMP PREEMPT RT Debian 3.10.11-1
>
> Oh, one of these -rt kernels again %-)
>
>> Mar 10 16:49:12 vpc3 kernel: [356480.638351] Dead loop on virtual device
>> vcan2, fix it urgently!
>
> The vcan driver usually does not set the IFF_ECHO flag:
>>
>> austin[4262] vpc3 ~
>> $ ifconfig vcan2
>> vcan2 Link encap:UNSPEC HWaddr
>> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
>> UP RUNNING NOARP MTU:16 Metric:1
> ^^
> Which would be indicated here then.
>
> Therefore the 'shortcut' in af_can.c is used to provide the echo:
>
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/net/can/af_can.c#n283
>
> which increases performance but may lead to funny effects as the sending
> process leads to a reception event which is probably already handled by the
> receiver when the write() of the sender returns.
>
> Two things I would suggest:
>
> 1. Let the vcan really handle the frames by enabling IFF_ECHO
> 2. Add a queue (queue length != 0) to the vcan interface
>
> For 1: Insert the vcan module with the parameter echo=1, e.g.
>
> # rmmod vcan
> # modprobe vcan echo=1
> # ip link add type vcan
> # ip link show vcan0
> 26: vcan0: <NOARP,ECHO> mtu 16 qdisc noop state DOWN mode DEFAULT group default
> link/can
>
> And see if this already fixes your problem ('ECHO' is enabled here).
>
> For 2: Add a queue length to your vcan with
>
> # ip link set vcan0 txqueuelen 10
> # ip link show vcan0
> 26: vcan0: <NOARP,ECHO> mtu 16 qdisc noop state DOWN mode DEFAULT group default qlen 10
> link/can
>
> This forces the network subsystem to use the normal packet flow, as a queue
> length of zero is only feasible for software devices like loopback / vcan ...
>
> Please let me know whether this solves the problem in your -rt kernel.
>
> Best regards,
> Oliver
>
Hi Oliver,
I'm not trying to cause problems ;)
Neither of those helped.
# ifconfig vcan0
vcan0 Link encap:UNSPEC HWaddr
00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
UP RUNNING NOARP MTU:16 Metric:1
RX packets:428406 errors:0 dropped:0 overruns:0 frame:0
TX packets:428406 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:3427248 (3.2 MiB) TX bytes:3427248 (3.2 MiB)
# ip link show vcan0
17: vcan0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc noqueue state UNKNOWN
mode DEFAULT
link/can
# ifconfig vcan0
vcan0 Link encap:UNSPEC HWaddr
00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
UP RUNNING NOARP MTU:16 Metric:1
RX packets:1400094 errors:0 dropped:0 overruns:0 frame:0
TX packets:1400094 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:10
RX bytes:11200752 (10.6 MiB) TX bytes:11200752 (10.6 MiB)
# ip link show vcan0
17: vcan0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc noqueue state UNKNOWN
mode DEFAULT qlen 10
link/can
Both times, I get the following.
Mar 11 13:28:58 aschuh-peloton kernel: [96229.460901] Dead loop on
virtual device vcan0, fix it urgently!
Mar 11 13:28:59 aschuh-peloton kernel: [96230.313875] Dead loop on
virtual device vcan0, fix it urgently!
Mar 11 13:28:59 aschuh-peloton kernel: [96230.319494] Dead loop on
virtual device vcan0, fix it urgently!
Mar 11 13:29:00 aschuh-peloton kernel: [96231.072617] Dead loop on
virtual device vcan0, fix it urgently!
Luckily, it is easy to reproduce. I have some trivial code which
sends out 10,000 msgs/sec on the vcan. I run 3+ copies of the code on
a dual core machine with hyperthreads enabled. More parallel sends
should cause it to occur more frequently.
Austin
* Re: Stuck vcan
2014-03-11 20:37 ` Austin Schuh
@ 2014-03-11 20:56 ` Marc Kleine-Budde
2014-03-11 21:54 ` Austin Schuh
2014-03-11 21:29 ` Oliver Hartkopp
1 sibling, 1 reply; 7+ messages in thread
From: Marc Kleine-Budde @ 2014-03-11 20:56 UTC (permalink / raw)
To: Austin Schuh, Oliver Hartkopp; +Cc: linux-can
On 03/11/2014 09:37 PM, Austin Schuh wrote:
> I'm not trying to cause problems ;)
:D
> Neither of those helped.
>
> # ifconfig vcan0
> vcan0 Link encap:UNSPEC HWaddr
> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
> UP RUNNING NOARP MTU:16 Metric:1
> RX packets:428406 errors:0 dropped:0 overruns:0 frame:0
> TX packets:428406 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:0
> RX bytes:3427248 (3.2 MiB) TX bytes:3427248 (3.2 MiB)
>
> # ip link show vcan0
> 17: vcan0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc noqueue state UNKNOWN
> mode DEFAULT
> link/can
>
> # ifconfig vcan0
> vcan0 Link encap:UNSPEC HWaddr
> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
> UP RUNNING NOARP MTU:16 Metric:1
> RX packets:1400094 errors:0 dropped:0 overruns:0 frame:0
> TX packets:1400094 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:10
> RX bytes:11200752 (10.6 MiB) TX bytes:11200752 (10.6 MiB)
>
> # ip link show vcan0
> 17: vcan0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc noqueue state UNKNOWN
> mode DEFAULT qlen 10
> link/can
>
> Both times, I get the following.
>
> Mar 11 13:28:58 aschuh-peloton kernel: [96229.460901] Dead loop on
> virtual device vcan0, fix it urgently!
> Mar 11 13:28:59 aschuh-peloton kernel: [96230.313875] Dead loop on
> virtual device vcan0, fix it urgently!
> Mar 11 13:28:59 aschuh-peloton kernel: [96230.319494] Dead loop on
> virtual device vcan0, fix it urgently!
> Mar 11 13:29:00 aschuh-peloton kernel: [96231.072617] Dead loop on
> virtual device vcan0, fix it urgently!
>
> Luckily, it is easy to reproduce. I have some trivial code which
> sends out 10,000 msgs/sec on the vcan. I run 3+ copies of the code on
> a dual core machine with hyperthreads enabled. More parallel sends
> should cause it to occur more frequently.
Can you share the code?
Marc
--
Pengutronix e.K. | Marc Kleine-Budde |
Industrial Linux Solutions | Phone: +49-231-2826-924 |
Vertretung West/Dortmund | Fax: +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de |
* Re: Stuck vcan
2014-03-11 20:37 ` Austin Schuh
2014-03-11 20:56 ` Marc Kleine-Budde
@ 2014-03-11 21:29 ` Oliver Hartkopp
2014-03-11 21:55 ` Austin Schuh
1 sibling, 1 reply; 7+ messages in thread
From: Oliver Hartkopp @ 2014-03-11 21:29 UTC (permalink / raw)
To: Austin Schuh; +Cc: linux-can
On 11.03.2014 21:37, Austin Schuh wrote:
> # ip link show vcan0
> 17: vcan0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc noqueue state UNKNOWN
> mode DEFAULT qlen 10
> link/can
>
> Both times, I get the following.
>
> Mar 11 13:28:58 aschuh-peloton kernel: [96229.460901] Dead loop on
> virtual device vcan0, fix it urgently!
As you can see above the qdisc is 'noqueue'
Please check out the __dev_queue_xmit() function in
http://lxr.free-electrons.com/source/net/core/dev.c#L2782
which creates the 'fix urgently' message.
I assume that attaching a qdisc other than 'noqueue' forces the skbs through
the normal queueing path.
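
The decision described above can be sketched as a small user-space model. This is a simplified reading of the transmit path in net/core/dev.c from kernels of that era, not kernel code: the type and function names are illustrative, and the recursion limit and stopped-queue checks are left out.

```cpp
#include <cassert>

enum class XmitOutcome { kEnqueued, kDirectXmit, kDeadLoop, kDeviceDown };

// Illustrative state only; these are not the kernel's structures.
struct TxState {
  bool qdisc_has_enqueue;  // false for the 'noqueue' qdisc
  bool device_up;          // IFF_UP is set
  int xmit_lock_owner;     // CPU recorded as holding the tx lock, -1 if none
};

// Models which branch a frame takes when handed to the device.
XmitOutcome ClassifyXmit(const TxState &state, int this_cpu) {
  if (state.qdisc_has_enqueue) {
    return XmitOutcome::kEnqueued;  // Normal queued packet flow.
  }
  if (!state.device_up) {
    return XmitOutcome::kDeviceDown;
  }
  if (state.xmit_lock_owner != this_cpu) {
    return XmitOutcome::kDirectXmit;  // Take the lock and transmit directly.
  }
  // The lock is already recorded as held by this CPU: the kernel treats this
  // as recursion and prints "Dead loop on virtual device ...".
  return XmitOutcome::kDeadLoop;
}
```

In this model the message can only appear on the 'noqueue' branch, which is consistent with the suggestion to attach a real qdisc.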
See:
http://rtime.felk.cvut.cz/can/socketcan-qdisc-final.pdf
Regards,
Oliver
* Re: Stuck vcan
2014-03-11 20:56 ` Marc Kleine-Budde
@ 2014-03-11 21:54 ` Austin Schuh
0 siblings, 0 replies; 7+ messages in thread
From: Austin Schuh @ 2014-03-11 21:54 UTC (permalink / raw)
To: Marc Kleine-Budde; +Cc: Oliver Hartkopp, linux-can
On Tue, Mar 11, 2014 at 1:56 PM, Marc Kleine-Budde <mkl@pengutronix.de> wrote:
> On 03/11/2014 09:37 PM, Austin Schuh wrote:
>> I'm not trying to cause problems ;)
>
> :D
>
>> Neither of those helped.
>>
>> # ifconfig vcan0
>> vcan0 Link encap:UNSPEC HWaddr
>> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
>> UP RUNNING NOARP MTU:16 Metric:1
>> RX packets:428406 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:428406 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:0
>> RX bytes:3427248 (3.2 MiB) TX bytes:3427248 (3.2 MiB)
>>
>> # ip link show vcan0
>> 17: vcan0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc noqueue state UNKNOWN
>> mode DEFAULT
>> link/can
>>
>> # ifconfig vcan0
>> vcan0 Link encap:UNSPEC HWaddr
>> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
>> UP RUNNING NOARP MTU:16 Metric:1
>> RX packets:1400094 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:1400094 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:10
>> RX bytes:11200752 (10.6 MiB) TX bytes:11200752 (10.6 MiB)
>>
>> # ip link show vcan0
>> 17: vcan0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc noqueue state UNKNOWN
>> mode DEFAULT qlen 10
>> link/can
>>
>> Both times, I get the following.
>>
>> Mar 11 13:28:58 aschuh-peloton kernel: [96229.460901] Dead loop on
>> virtual device vcan0, fix it urgently!
>> Mar 11 13:28:59 aschuh-peloton kernel: [96230.313875] Dead loop on
>> virtual device vcan0, fix it urgently!
>> Mar 11 13:28:59 aschuh-peloton kernel: [96230.319494] Dead loop on
>> virtual device vcan0, fix it urgently!
>> Mar 11 13:29:00 aschuh-peloton kernel: [96231.072617] Dead loop on
>> virtual device vcan0, fix it urgently!
>>
>> Luckily, it is easy to reproduce. I have some trivial code which
>> sends out 10,000 msgs/sec on the vcan. I run 3+ copies of the code on
>> a dual core machine with hyperthreads enabled. More parallel sends
>> should cause it to occur more frequently.
>
> Can you share the code?
>
> Marc
>
> --
> Pengutronix e.K. | Marc Kleine-Budde |
> Industrial Linux Solutions | Phone: +49-231-2826-924 |
> Vertretung West/Dortmund | Fax: +49-5121-206917-5555 |
> Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de |
>
The attached code reproduces the problem for me. I removed all the
extra dependencies.
Austin
[-- Attachment #2: bus_load_linux_can.cc --]
[-- Type: text/x-c++src, Size: 1873 bytes --]
#include <linux/can.h>
#include <linux/can/raw.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <unistd.h>
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <string>
#include <iostream>

// A CAN bus which sends using SocketCAN to the specified interface.
class SocketCanBus {
 public:
  // socket_ starts at -1 so the destructor is safe before Initialize().
  explicit SocketCanBus(const ::std::string &iface)
      : socket_(-1), iface_(iface) {}
  ~SocketCanBus() {
    if (socket_ != -1) {
      close(socket_);
    }
  }

  void Initialize();
  bool Send(const struct can_frame &frame)
      __attribute__((warn_unused_result));

 private:
  // The socket, or -1 until Initialize() has opened it.
  int socket_;
  ::std::string iface_;
};

void SocketCanBus::Initialize() {
  socket_ = socket(PF_CAN, SOCK_RAW, CAN_RAW);
  if (socket_ < 0) {
    perror("socket");
    exit(1);
  }

  // Look up the interface index. The calls are kept out of assert() so that
  // they still run when the program is built with NDEBUG.
  struct ifreq ifr;
  memset(&ifr, 0, sizeof(ifr));
  strncpy(ifr.ifr_name, iface_.c_str(), IFNAMSIZ - 1);
  if (ioctl(socket_, SIOCGIFINDEX, &ifr) != 0) {
    perror("ioctl(SIOCGIFINDEX)");
    exit(1);
  }

  // Zero initialize the sockaddr, mainly for valgrind, but it can't hurt.
  struct sockaddr_can addr;
  memset(&addr, 0, sizeof(addr));
  addr.can_family = AF_CAN;
  addr.can_ifindex = ifr.ifr_ifindex;

  ::std::cout << "Connecting to " << iface_ << ::std::endl;
  if (bind(socket_, (struct sockaddr *)&addr, sizeof(addr)) != 0) {
    perror("bind");
    exit(1);
  }
}

bool SocketCanBus::Send(const struct can_frame &frame) {
  assert(socket_ != -1);
  return write(socket_, &frame, sizeof(struct can_frame)) ==
         static_cast<ssize_t>(sizeof(struct can_frame));
}

int main(int argc, char *argv[]) {
  SocketCanBus bus("vcan0");
  bus.Initialize();

  while (true) {
    // 100 us between frames, i.e. roughly 10,000 frames per second.
    usleep(1000000 / 10000);
    struct can_frame frame;
    memset(&frame, 0, sizeof(frame));
    frame.can_id = 0x100;
    frame.can_dlc = 8;
    for (int i = 0; i < 8; ++i) {
      frame.data[i] = i;
    }
    if (!bus.Send(frame)) {
      ::std::cout << "Failed to send the CAN frame." << ::std::endl;
    }
  }
  return 0;
}
* Re: Stuck vcan
2014-03-11 21:29 ` Oliver Hartkopp
@ 2014-03-11 21:55 ` Austin Schuh
0 siblings, 0 replies; 7+ messages in thread
From: Austin Schuh @ 2014-03-11 21:55 UTC (permalink / raw)
To: Oliver Hartkopp; +Cc: linux-can
On Tue, Mar 11, 2014 at 2:29 PM, Oliver Hartkopp <socketcan@hartkopp.net> wrote:
>
>
> On 11.03.2014 21:37, Austin Schuh wrote:
>
>> # ip link show vcan0
>> 17: vcan0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc noqueue state UNKNOWN
>> mode DEFAULT qlen 10
>> link/can
>>
>> Both times, I get the following.
>>
>> Mar 11 13:28:58 aschuh-peloton kernel: [96229.460901] Dead loop on
>> virtual device vcan0, fix it urgently!
>
> As you can see above the qdisc is 'noqueue'
>
> Please check out the __dev_queue_xmit() function in
>
> http://lxr.free-electrons.com/source/net/core/dev.c#L2782
>
> which creates the 'fix urgently' message.
>
> I assume that attaching a qdisc other than 'noqueue' forces the skbs through
> the normal queueing path.
>
> See:
> http://rtime.felk.cvut.cz/can/socketcan-qdisc-final.pdf
>
> Regards,
> Oliver
>
Thanks for the pointers, Oliver. I'll take a look and see what I can
find. It will probably be a week or two before I can really sit down
and dig through this.
Austin