* Stuck vcan
@ 2014-03-11  0:22 Austin Schuh
  2014-03-11  7:05 ` Oliver Hartkopp
  0 siblings, 1 reply; 7+ messages in thread
From: Austin Schuh @ 2014-03-11  0:22 UTC (permalink / raw)
  To: linux-can

I am seeing the following:

$ uname -a
Linux vpc3 3.10-3-rt-amd64 #1 SMP PREEMPT RT Debian 3.10.11-1
(2013-09-10) x86_64 GNU/Linux

syslog:
Mar 10 16:29:56 vpc3 kernel: [355322.806011] Dead loop on virtual device
vcan2, fix it urgently!
Mar 10 16:49:12 vpc3 kernel: [356480.638351] Dead loop on virtual device
vcan2, fix it urgently!

austin[4262] vpc3 ~
$ ifconfig vcan2
vcan2     Link encap:UNSPEC  HWaddr
00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
          UP RUNNING NOARP  MTU:16  Metric:1
          RX packets:76160257 errors:0 dropped:0 overruns:0 frame:0
          TX packets:76160257 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:609282056 (581.0 MiB)  TX bytes:609282056 (581.0 MiB)

Any thoughts?  I don't see this happen very often.  The write call on the
socket fails when this happens.  It generally happens when I'm sending a
large number of messages on multiple vcans from multiple processes at once.
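
Since the failing write() is the only symptom user space sees, it can help to capture errno at the moment it fails. The following is a minimal sketch, assuming an already bound CAN_RAW socket; `SendResult` and `SendFrame` are hypothetical helper names, not part of any library, and which errno actually shows up is not stated in this thread.

```cpp
#include <linux/can.h>
#include <unistd.h>
#include <cerrno>
#include <cstring>
#include <string>

// Outcome of one send attempt (hypothetical helper type).
struct SendResult {
  bool ok;    // true if the whole frame was written
  int error;  // errno captured right after write(), 0 on success

  std::string Describe() const {
    return ok ? "ok" : std::string("write failed: ") + strerror(error);
  }
};

// Writes one classic CAN frame to `fd` and captures errno on failure.
SendResult SendFrame(int fd, const struct can_frame &frame) {
  errno = 0;
  const ssize_t n = write(fd, &frame, sizeof(frame));
  if (n == static_cast<ssize_t>(sizeof(frame))) {
    return SendResult{true, 0};
  }
  // A short write is not expected on a CAN_RAW socket; report it as EIO.
  return SendResult{false, n < 0 ? errno : EIO};
}
```

Logging `Describe()` next to the kernel timestamps would show whether each "Dead loop" message lines up with a failed write.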

Austin

* Re: Stuck vcan
  2014-03-11  0:22 Stuck vcan Austin Schuh
@ 2014-03-11  7:05 ` Oliver Hartkopp
  2014-03-11 20:37   ` Austin Schuh
  0 siblings, 1 reply; 7+ messages in thread
From: Oliver Hartkopp @ 2014-03-11  7:05 UTC (permalink / raw)
  To: Austin Schuh, linux-can

Hi Austin,



On 11.03.2014 01:22, Austin Schuh wrote:

> $ Linux vpc3 3.10-3-rt-amd64 #1 SMP PREEMPT RT Debian 3.10.11-1

Oh, one of these -rt kernels again %-)

> Mar 10 16:49:12 vpc3 kernel: [356480.638351] Dead loop on virtual device
> vcan2, fix it urgently!

The vcan driver usually does not set the IFF_ECHO flag:
> 
> austin[4262] vpc3 ~
> $ ifconfig vcan2
> vcan2     Link encap:UNSPEC  HWaddr
> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
>           UP RUNNING NOARP  MTU:16  Metric:1
                            ^^
Which would be indicated here then.

Therefore the 'shortcut' in af_can.c is used to provide the echo:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/net/can/af_can.c#n283

which increases performance but may lead to funny effects: the send itself
triggers a reception event, which the receiver has probably already handled
by the time the sender's write() returns.
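
From user space, the echo of sent frames is controlled per socket through two CAN_RAW options. The sketch below shows how a test program could switch them; `ConfigureEcho` is a hypothetical helper name, while the option names come from <linux/can/raw.h>. It only changes what the socket receives and is not claimed to affect the kernel path discussed here.

```cpp
#include <linux/can.h>
#include <linux/can/raw.h>
#include <sys/socket.h>

// Sets the two echo-related options on a CAN_RAW socket.
//   loopback: deliver frames sent on this socket to other local sockets
//             (CAN_RAW_LOOPBACK, enabled by default).
//   recv_own: also deliver them back to the sending socket itself
//             (CAN_RAW_RECV_OWN_MSGS, disabled by default).
// Returns true only if both setsockopt() calls succeed.
bool ConfigureEcho(int fd, bool loopback, bool recv_own) {
  const int lb = loopback ? 1 : 0;
  const int own = recv_own ? 1 : 0;
  if (setsockopt(fd, SOL_CAN_RAW, CAN_RAW_LOOPBACK, &lb, sizeof(lb)) != 0) {
    return false;
  }
  return setsockopt(fd, SOL_CAN_RAW, CAN_RAW_RECV_OWN_MSGS, &own,
                    sizeof(own)) == 0;
}
```

With `recv_own` enabled, a sender can read back its own frames and compare their arrival with the return of write().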

Two things I would suggest:

1. Let the vcan really handle the frames by enabling IFF_ECHO
2. Add a queue (queue length != 0) to the vcan interface

For 1: Insert the vcan module with the parameter echo=1, e.g.

# rmmod vcan
# modprobe vcan echo=1
# ip link add type vcan
# ip link show vcan0
26: vcan0: <NOARP,ECHO> mtu 16 qdisc noop state DOWN mode DEFAULT group default 
    link/can 

And see if this already fixes your problem ('ECHO' is enabled here).
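
The same check can be done from a program. This is a minimal sketch that assumes the interface flags are readable as a hex string from /sys/class/net/<iface>/flags and that IFF_ECHO is bit 18 as in <linux/if.h>; `FlagsHaveEcho` and `InterfaceHasEcho` are hypothetical helper names.

```cpp
#include <cstdlib>
#include <fstream>
#include <string>

// Value of IFF_ECHO as defined in <linux/if.h>, spelled out here so the
// sketch does not depend on kernel headers.
constexpr unsigned long kIffEcho = 1UL << 18;

// Parses the hex text found in the sysfs flags file (e.g. "0x40081") and
// reports whether the IFF_ECHO bit is set.
bool FlagsHaveEcho(const std::string &flags_text) {
  const unsigned long flags = std::strtoul(flags_text.c_str(), nullptr, 16);
  return (flags & kIffEcho) != 0;
}

// Reads the flags of `iface` from sysfs. Returns false if the file cannot
// be read, for example because the interface does not exist.
bool InterfaceHasEcho(const std::string &iface) {
  std::ifstream in("/sys/class/net/" + iface + "/flags");
  std::string text;
  if (!(in >> text)) {
    return false;
  }
  return FlagsHaveEcho(text);
}
```

For example, `InterfaceHasEcho("vcan0")` would be expected to return true only after the module was loaded with echo=1.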

For 2: Add a queue length to your vcan with

# ip link set vcan0 txqueuelen 10
# ip link show vcan0
26: vcan0: <NOARP,ECHO> mtu 16 qdisc noop state DOWN mode DEFAULT group default qlen 10
    link/can 

This urges the network subsystem to use the normal packet flow, as a queue
length of zero is only feasible for software devices like loopback / vcan ...
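
To confirm from a program that the queue length took effect, the value can be read back with the SIOCGIFTXQLEN ioctl. A minimal sketch, assuming any datagram socket can be used as the ioctl handle; `GetTxQueueLen` is a hypothetical helper name.

```cpp
#include <net/if.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstring>
#include <string>

// Returns the transmit queue length of `iface`, or -1 on any error
// (bad name, no socket available, unknown interface).
int GetTxQueueLen(const std::string &iface) {
  if (iface.empty() || iface.size() >= IFNAMSIZ) {
    return -1;
  }
  // Interface ioctls are not tied to a protocol; a datagram socket is
  // enough to issue them.
  const int fd = socket(AF_INET, SOCK_DGRAM, 0);
  if (fd < 0) {
    return -1;
  }
  struct ifreq ifr;
  memset(&ifr, 0, sizeof(ifr));
  strncpy(ifr.ifr_name, iface.c_str(), IFNAMSIZ - 1);
  const int rc = ioctl(fd, SIOCGIFTXQLEN, &ifr);
  close(fd);
  return rc == 0 ? ifr.ifr_qlen : -1;
}
```

After the `ip link set vcan0 txqueuelen 10` command above, `GetTxQueueLen("vcan0")` would be expected to return 10.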

Please let me know whether this solves the problem in your -rt kernel.

Best regards,
Oliver


* Re: Stuck vcan
  2014-03-11  7:05 ` Oliver Hartkopp
@ 2014-03-11 20:37   ` Austin Schuh
  2014-03-11 20:56     ` Marc Kleine-Budde
  2014-03-11 21:29     ` Oliver Hartkopp
  0 siblings, 2 replies; 7+ messages in thread
From: Austin Schuh @ 2014-03-11 20:37 UTC (permalink / raw)
  To: Oliver Hartkopp; +Cc: linux-can

On Tue, Mar 11, 2014 at 12:05 AM, Oliver Hartkopp
<socketcan@hartkopp.net> wrote:
> Hi Austin,
>
>
>
> On 11.03.2014 01:22, Austin Schuh wrote:
>
>> $ Linux vpc3 3.10-3-rt-amd64 #1 SMP PREEMPT RT Debian 3.10.11-1
>
> Oh, one of these -rt kernels again %-)
>
>> Mar 10 16:49:12 vpc3 kernel: [356480.638351] Dead loop on virtual device
>> vcan2, fix it urgently!
>
> The vcan driver usually does not set the IFF_ECHO flag:
>>
>> austin[4262] vpc3 ~
>> $ ifconfig vcan2
>> vcan2     Link encap:UNSPEC  HWaddr
>> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
>>           UP RUNNING NOARP  MTU:16  Metric:1
>                             ^^
> Which would be indicated here then.
>
> Therefore the 'shortcut' in af_can.c is used to provide the echo:
>
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/net/can/af_can.c#n283
>
> which increases performance but may lead to funny effects as the sending
> process leads to a reception event which is probably already handled by the
> receiver when the write() of the sender returns.
>
> Two things I would suggest:
>
> 1. Let the vcan really handle the frames by enabling IFF_ECHO
> 2. Add a queue (queue length != 0) to the vcan interface
>
> For 1: Insert the vcan module with the parameter echo=1, e.g.
>
> # rmmod vcan
> # modprobe vcan echo=1
> # ip link add type vcan
> # ip link show vcan0
> 26: vcan0: <NOARP,ECHO> mtu 16 qdisc noop state DOWN mode DEFAULT group default
>     link/can
>
> And see if this already fixes your problem ('ECHO' is enabled here).
>
> For 2: Add a queue length to your vcan with
>
> # ip link set vcan0 txqueuelen 10
> # ip link show vcan0
> 26: vcan0: <NOARP,ECHO> mtu 16 qdisc noop state DOWN mode DEFAULT group default qlen 10
>     link/can
>
> This urges the network subsystem to use the normal packet flow, as a queue
> length of zero is only feasible for software devices like loopback / vcan ...
>
> Please let me know whether this solves the problem in your -rt kernel.
>
> Best regards,
> Oliver
>

Hi Oliver,

I'm not trying to cause problems ;)

Neither of those helped.

# ifconfig vcan0
vcan0     Link encap:UNSPEC  HWaddr
00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
          UP RUNNING NOARP  MTU:16  Metric:1
          RX packets:428406 errors:0 dropped:0 overruns:0 frame:0
          TX packets:428406 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:3427248 (3.2 MiB)  TX bytes:3427248 (3.2 MiB)

# ip link show vcan0
17: vcan0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc noqueue state UNKNOWN
mode DEFAULT
    link/can

# ifconfig vcan0
vcan0     Link encap:UNSPEC  HWaddr
00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
          UP RUNNING NOARP  MTU:16  Metric:1
          RX packets:1400094 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1400094 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:10
          RX bytes:11200752 (10.6 MiB)  TX bytes:11200752 (10.6 MiB)

# ip link show vcan0
17: vcan0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc noqueue state UNKNOWN
mode DEFAULT qlen 10
    link/can

Both times, I get the following.

Mar 11 13:28:58 aschuh-peloton kernel: [96229.460901] Dead loop on
virtual device vcan0, fix it urgently!
Mar 11 13:28:59 aschuh-peloton kernel: [96230.313875] Dead loop on
virtual device vcan0, fix it urgently!
Mar 11 13:28:59 aschuh-peloton kernel: [96230.319494] Dead loop on
virtual device vcan0, fix it urgently!
Mar 11 13:29:00 aschuh-peloton kernel: [96231.072617] Dead loop on
virtual device vcan0, fix it urgently!

Luckily, it is easy to reproduce.  I have some trivial code which
sends out 10,000 msgs/sec on the vcan.  I run 3+ copies of the code on
a dual core machine with hyperthreads enabled.  More parallel sends
should cause it to occur more frequently.
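
The "several copies at once" setup can also be driven from a single launcher. The sketch below forks the sender processes and sums up their failed sends; `RunSenderProcesses` is a hypothetical helper name, and the `send` callback stands in for whatever actually writes a frame. At 10,000 msgs/sec the period between sends is 1000000 / 10000 = 100 microseconds.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <functional>
#include <vector>

// Forks `senders` child processes. Each child calls `send` `frames` times,
// sleeping `period_us` microseconds between calls, and reports its failure
// count through a pipe. Returns the total number of failed sends, or -1 if
// the pipe or one of the forks could not be created.
long RunSenderProcesses(int senders, long frames, unsigned period_us,
                        const std::function<bool()> &send) {
  int fds[2];
  if (pipe(fds) != 0) {
    return -1;
  }
  std::vector<pid_t> children;
  for (int p = 0; p < senders; ++p) {
    const pid_t pid = fork();
    if (pid < 0) {
      break;
    }
    if (pid == 0) {  // Child: send, report, and exit without returning.
      close(fds[0]);
      long failures = 0;
      for (long i = 0; i < frames; ++i) {
        if (!send()) ++failures;
        if (period_us > 0) usleep(period_us);
      }
      const ssize_t n = write(fds[1], &failures, sizeof(failures));
      _exit(n == static_cast<ssize_t>(sizeof(failures)) ? 0 : 1);
    }
    children.push_back(pid);
  }
  close(fds[1]);
  // Every child writes exactly one fixed-size record, so read them one by
  // one until all write ends are closed.
  long total = 0;
  long one = 0;
  while (read(fds[0], &one, sizeof(one)) ==
         static_cast<ssize_t>(sizeof(one))) {
    total += one;
  }
  close(fds[0]);
  for (pid_t pid : children) {
    waitpid(pid, nullptr, 0);
  }
  return children.size() == static_cast<size_t>(senders) ? total : -1;
}
```

To match separate program instances, the `send` callback should open its own socket inside the child rather than share one created before the fork.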

Austin

* Re: Stuck vcan
  2014-03-11 20:37   ` Austin Schuh
@ 2014-03-11 20:56     ` Marc Kleine-Budde
  2014-03-11 21:54       ` Austin Schuh
  2014-03-11 21:29     ` Oliver Hartkopp
  1 sibling, 1 reply; 7+ messages in thread
From: Marc Kleine-Budde @ 2014-03-11 20:56 UTC (permalink / raw)
  To: Austin Schuh, Oliver Hartkopp; +Cc: linux-can

On 03/11/2014 09:37 PM, Austin Schuh wrote:
> I'm not trying to cause problems ;)

:D

> Neither of those helped.
> 
> # ifconfig vcan0
> vcan0     Link encap:UNSPEC  HWaddr
> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
>           UP RUNNING NOARP  MTU:16  Metric:1
>           RX packets:428406 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:428406 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:3427248 (3.2 MiB)  TX bytes:3427248 (3.2 MiB)
> 
> # ip link show vcan0
> 17: vcan0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc noqueue state UNKNOWN
> mode DEFAULT
>     link/can
> 
> # ifconfig vcan0
> vcan0     Link encap:UNSPEC  HWaddr
> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
>           UP RUNNING NOARP  MTU:16  Metric:1
>           RX packets:1400094 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:1400094 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:10
>           RX bytes:11200752 (10.6 MiB)  TX bytes:11200752 (10.6 MiB)
> 
> # ip link show vcan0
> 17: vcan0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc noqueue state UNKNOWN
> mode DEFAULT qlen 10
>     link/can
> 
> Both times, I get the following.
> 
> Mar 11 13:28:58 aschuh-peloton kernel: [96229.460901] Dead loop on
> virtual device vcan0, fix it urgently!
> Mar 11 13:28:59 aschuh-peloton kernel: [96230.313875] Dead loop on
> virtual device vcan0, fix it urgently!
> Mar 11 13:28:59 aschuh-peloton kernel: [96230.319494] Dead loop on
> virtual device vcan0, fix it urgently!
> Mar 11 13:29:00 aschuh-peloton kernel: [96231.072617] Dead loop on
> virtual device vcan0, fix it urgently!
> 
> Luckily, it is easy to reproduce.  I have some trivial code which
> sends out 10,000 msgs/sec on the vcan.  I run 3+ copies of the code on
> a dual core machine with hyperthreads enabled.  More parallel sends
> should cause it to occur more frequently.

Can you share the code?

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


* Re: Stuck vcan
  2014-03-11 20:37   ` Austin Schuh
  2014-03-11 20:56     ` Marc Kleine-Budde
@ 2014-03-11 21:29     ` Oliver Hartkopp
  2014-03-11 21:55       ` Austin Schuh
  1 sibling, 1 reply; 7+ messages in thread
From: Oliver Hartkopp @ 2014-03-11 21:29 UTC (permalink / raw)
  To: Austin Schuh; +Cc: linux-can



On 11.03.2014 21:37, Austin Schuh wrote:

> # ip link show vcan0
> 17: vcan0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc noqueue state UNKNOWN
> mode DEFAULT qlen 10
>     link/can
> 
> Both times, I get the following.
> 
> Mar 11 13:28:58 aschuh-peloton kernel: [96229.460901] Dead loop on
> virtual device vcan0, fix it urgently!

As you can see above, the qdisc is 'noqueue'.

Please check out the __dev_queue_xmit() function in

http://lxr.free-electrons.com/source/net/core/dev.c#L2782

which emits the 'fix it urgently' message.

I assume that attaching a qdisc other than 'noqueue' forces the normal skb
queueing path.
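
The decision made for a queue-less device can be pictured with a small user-space model. This is a simplified sketch of how that branch of __dev_queue_xmit() reads in kernels of this era, not the kernel code itself: `TxQueueModel`, `XmitResult` and `XmitWithoutQueue` are hypothetical names, locking is reduced to an owner field, and the recursion limit of 10 is an assumption.

```cpp
// Outcomes of the queue-less transmit path in this simplified model.
enum class XmitResult { kSent, kAsksToQueue, kDeadLoop, kDeviceDown };

// Hypothetical stand-in for the per-queue state the kernel keeps.
struct TxQueueModel {
  int xmit_lock_owner = -1;  // CPU currently holding the TX lock, -1 if none
  bool stopped = false;      // true if the driver has stopped the queue
};

// Models what happens to a frame on a device whose qdisc has no enqueue
// function ('noqueue'). `recursion_depth` is the nesting level of
// transmits on this CPU; `limit` is the assumed recursion limit.
XmitResult XmitWithoutQueue(TxQueueModel &txq, bool dev_up, int cpu,
                            int recursion_depth, int limit = 10) {
  if (!dev_up) {
    return XmitResult::kDeviceDown;
  }
  if (txq.xmit_lock_owner == cpu || recursion_depth > limit) {
    // Corresponds to "Dead loop on virtual device %s, fix it urgently!":
    // the frame is dropped and the caller gets an error.
    return XmitResult::kDeadLoop;
  }
  txq.xmit_lock_owner = cpu;  // take the TX lock
  const bool sent = !txq.stopped;
  txq.xmit_lock_owner = -1;   // release it again
  // The unsent case corresponds to "Virtual device %s asks to queue packet!".
  return sent ? XmitResult::kSent : XmitResult::kAsksToQueue;
}
```

In this model the message appears whenever the transmitting CPU is already recorded as the lock owner, which would be consistent with the failed write() reported at the start of the thread.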

See:
http://rtime.felk.cvut.cz/can/socketcan-qdisc-final.pdf

Regards,
Oliver


* Re: Stuck vcan
  2014-03-11 20:56     ` Marc Kleine-Budde
@ 2014-03-11 21:54       ` Austin Schuh
  0 siblings, 0 replies; 7+ messages in thread
From: Austin Schuh @ 2014-03-11 21:54 UTC (permalink / raw)
  To: Marc Kleine-Budde; +Cc: Oliver Hartkopp, linux-can

On Tue, Mar 11, 2014 at 1:56 PM, Marc Kleine-Budde <mkl@pengutronix.de> wrote:
> On 03/11/2014 09:37 PM, Austin Schuh wrote:
>> I'm not trying to cause problems ;)
>
> :D
>
>> Neither of those helped.
>>
>> # ifconfig vcan0
>> vcan0     Link encap:UNSPEC  HWaddr
>> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
>>           UP RUNNING NOARP  MTU:16  Metric:1
>>           RX packets:428406 errors:0 dropped:0 overruns:0 frame:0
>>           TX packets:428406 errors:0 dropped:0 overruns:0 carrier:0
>>           collisions:0 txqueuelen:0
>>           RX bytes:3427248 (3.2 MiB)  TX bytes:3427248 (3.2 MiB)
>>
>> # ip link show vcan0
>> 17: vcan0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc noqueue state UNKNOWN
>> mode DEFAULT
>>     link/can
>>
>> # ifconfig vcan0
>> vcan0     Link encap:UNSPEC  HWaddr
>> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
>>           UP RUNNING NOARP  MTU:16  Metric:1
>>           RX packets:1400094 errors:0 dropped:0 overruns:0 frame:0
>>           TX packets:1400094 errors:0 dropped:0 overruns:0 carrier:0
>>           collisions:0 txqueuelen:10
>>           RX bytes:11200752 (10.6 MiB)  TX bytes:11200752 (10.6 MiB)
>>
>> # ip link show vcan0
>> 17: vcan0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc noqueue state UNKNOWN
>> mode DEFAULT qlen 10
>>     link/can
>>
>> Both times, I get the following.
>>
>> Mar 11 13:28:58 aschuh-peloton kernel: [96229.460901] Dead loop on
>> virtual device vcan0, fix it urgently!
>> Mar 11 13:28:59 aschuh-peloton kernel: [96230.313875] Dead loop on
>> virtual device vcan0, fix it urgently!
>> Mar 11 13:28:59 aschuh-peloton kernel: [96230.319494] Dead loop on
>> virtual device vcan0, fix it urgently!
>> Mar 11 13:29:00 aschuh-peloton kernel: [96231.072617] Dead loop on
>> virtual device vcan0, fix it urgently!
>>
>> Luckily, it is easy to reproduce.  I have some trivial code which
>> sends out 10,000 msgs/sec on the vcan.  I run 3+ copies of the code on
>> a dual core machine with hyperthreads enabled.  More parallel sends
>> should cause it to occur more frequently.
>
> Can you share the code?
>
> Marc
>
> --
> Pengutronix e.K.                  | Marc Kleine-Budde           |
> Industrial Linux Solutions        | Phone: +49-231-2826-924     |
> Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
> Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |
>

The attached code reproduces the problem for me.  I removed all the
extra dependencies.

Austin

[-- Attachment #2: bus_load_linux_can.cc --]
[-- Type: text/x-c++src, Size: 1873 bytes --]

#include <linux/can.h>
#include <linux/can/raw.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <unistd.h>
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <string>
#include <iostream>

// A can bus which sends using SocketCan to the specified interface.
class SocketCanBus {
 public:
  // socket_ starts at -1 so the destructor is safe before Initialize().
  explicit SocketCanBus(const ::std::string &iface)
      : socket_(-1), iface_(iface) {}

  ~SocketCanBus() {
    if (socket_ != -1) {
      close(socket_);
    }
  }

  void Initialize();
  bool Send(const struct can_frame &frame)
      __attribute__((warn_unused_result));

 private:
  // The socket, or -1 until Initialize() has opened it.
  int socket_;
  ::std::string iface_;
};

// Prints the failing call together with errno and exits.  The setup calls
// are checked this way instead of inside assert() so that they are not
// compiled out with -DNDEBUG.
static void Die(const char *what) {
  perror(what);
  exit(EXIT_FAILURE);
}

void SocketCanBus::Initialize() {
  socket_ = socket(PF_CAN, SOCK_RAW, CAN_RAW);
  if (socket_ < 0) Die("socket");

  // Look up the interface index.  ifr_name must stay NUL-terminated.
  struct ifreq ifr;
  memset(&ifr, 0, sizeof(ifr));
  strncpy(ifr.ifr_name, iface_.c_str(), sizeof(ifr.ifr_name) - 1);
  if (ioctl(socket_, SIOCGIFINDEX, &ifr) != 0) Die("ioctl(SIOCGIFINDEX)");

  // Zero initialize the whole sockaddr, mainly for valgrind, but it can't
  // hurt.
  struct sockaddr_can addr;
  memset(&addr, 0, sizeof(addr));
  addr.can_family = AF_CAN;
  addr.can_ifindex = ifr.ifr_ifindex;

  ::std::cout << "Connecting to " << iface_ << ::std::endl;

  if (bind(socket_, (struct sockaddr *)&addr, sizeof(addr)) != 0) Die("bind");
}

bool SocketCanBus::Send(const struct can_frame &frame) {
  assert(socket_ != -1);
  return write(socket_, &frame, sizeof(struct can_frame)) ==
         static_cast<ssize_t>(sizeof(struct can_frame));
}

int main(int argc, char *argv[]) {
  SocketCanBus bus("vcan0");
  bus.Initialize();

  while (true) {
    // 100 us between frames, i.e. roughly 10,000 frames per second.
    usleep(1000000 / 10000);
    struct can_frame frame;
    memset(&frame, 0, sizeof(frame));
    frame.can_id = 0x100;
    frame.can_dlc = 8;
    for (int i = 0; i < 8; ++i) {
      frame.data[i] = i;
    }
    if (!bus.Send(frame)) {
      ::std::cout << "Failed to send the CAN frame." << ::std::endl;
    }
  }
  return 0;
}

* Re: Stuck vcan
  2014-03-11 21:29     ` Oliver Hartkopp
@ 2014-03-11 21:55       ` Austin Schuh
  0 siblings, 0 replies; 7+ messages in thread
From: Austin Schuh @ 2014-03-11 21:55 UTC (permalink / raw)
  To: Oliver Hartkopp; +Cc: linux-can

On Tue, Mar 11, 2014 at 2:29 PM, Oliver Hartkopp <socketcan@hartkopp.net> wrote:
>
>
> On 11.03.2014 21:37, Austin Schuh wrote:
>
>> # ip link show vcan0
>> 17: vcan0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc noqueue state UNKNOWN
>> mode DEFAULT qlen 10
>>     link/can
>>
>> Both times, I get the following.
>>
>> Mar 11 13:28:58 aschuh-peloton kernel: [96229.460901] Dead loop on
>> virtual device vcan0, fix it urgently!
>
> As you can see above, the qdisc is 'noqueue'.
>
> Please check out the __dev_queue_xmit() function in
>
> http://lxr.free-electrons.com/source/net/core/dev.c#L2782
>
> which emits the 'fix it urgently' message.
>
> I assume that attaching a qdisc other than 'noqueue' forces the normal skb
> queueing path.
>
> See:
> http://rtime.felk.cvut.cz/can/socketcan-qdisc-final.pdf
>
> Regards,
> Oliver
>

Thanks for the pointers, Oliver.  I'll take a look and see what I can
find.  It will probably be a week or two before I can really sit down
and dig through this.

Austin
