Re: [Socketcan-users] Message stalls in SocketCan Layer?

linux-can.vger.kernel.org archive mirror

* Re: [Socketcan-users] Message stalls in SocketCan Layer?
       [not found] ` <4F2AA5DF.9080304@grandegger.com>
@ 2012-02-03 16:08   ` Michael Jäntsch
  2012-02-03 16:57     ` Oliver Hartkopp
  0 siblings, 1 reply; 5+ messages in thread
From: Michael Jäntsch @ 2012-02-03 16:08 UTC
  Cc: linux-can

Hi,

Thanks Wolfgang, I didn't know the socket-can mailing list was deprecated.

On 02.02.2012 16:03, Wolfgang Grandegger wrote:
> On 02/01/2012 10:42 AM, Michael Jäntsch wrote:
>> Hi everyone,
>>
>> for about a month now, I have a problem with messages that get stalled,
>> causing the select system call on several CAN sockets to time out. We're
>> using the PEAK PCAN-USB interface on an Ubuntu 11.10 (kernel:
>> 3.0.0-15-generic). I've been using the system for about 2 years now and
>> there were no relevant changes to the protocol or software lately. This
>> problem occurs quite frequently now.
> When did the reported problems start to show up? After switching to a
> new kernel?

Yes, it all started when I switched to kernel 3.0.0-15-generic. However,
weirdly enough, there is one computer with this kernel where the problem
doesn't appear and two where it does. I think I will confirm this by
going back to an older kernel and seeing if the problem disappears.

>> What happens is, that our master sends out a message and receives 13
>> reply messages from 13 different nodes on the bus, at a frequency of
>> 50Hz. This works for some time (minutes) and then a timeout on the
>> select system call occurs that reads from the 13 sockets. Wireshark
>> shows that there is an unusual time delay between the messages. They get
>> sent at the same time and are normally received within the time of
>> 2-4ms. When the timeout occurs, this time is 30ms. When I turn on the
>> debug information in the peak driver I see debug output with time
>> stamps. On this layer the times are still ok when the timeout occurs.
>> Talking to the peak support, I got pointed at the socket can layer which
>> in this case might be causing the problem. However, I have no idea how
>> to debug this and find out more about the problem.
> Hm, is there some other activity blocking the kernel? You could use
> ftrace to find out what's going on.

I don't know, but it does look like something is blocking my kernel.
However, I didn't really want to go into kernel debugging/tracing.
If you can give me some hints as to where I should start, I could give
it a shot...

thanks a lot
Michael

-- 
Technische Universität München
Michael Jäntsch
Fakultät für Informatik
Robotics and Embedded Systems
Parkring 13
85748 Garching bei München
Tel: + 49.89.289.17626
Fax: + 49.89.289.17637
michael.jaentsch@in.tum.de
www6.in.tum.de




* Re: [Socketcan-users] Message stalls in SocketCan Layer?
  2012-02-03 16:08   ` [Socketcan-users] Message stalls in SocketCan Layer? Michael Jäntsch
@ 2012-02-03 16:57     ` Oliver Hartkopp
  2012-02-03 19:41       ` Wolfgang Grandegger
  0 siblings, 1 reply; 5+ messages in thread
From: Oliver Hartkopp @ 2012-02-03 16:57 UTC
  To: Michael Jäntsch; +Cc: linux-can

Hi Michael,

On 03.02.2012 17:08, Michael Jäntsch wrote:

> yes, it all started when I switched to kernel 3.0.0-15-generic. However,
> weirdly enough there is one computer with this kernel where the problem
> doesn't appear and two where it does. I think I will confirm this by
> going back to an older kernel and seeing if the problem disappears.

Please consider that, if it works on other machines with the same kernel,
there could be some changes within the USB subsystem, or some interrupt or
ACPI/power-saving issues, too.

Going back to an older kernel is a good check. I don't know whether you can
also go forward (3.1, 3.2) with Ubuntu 11.10.

Maybe there's some USB debugging available to check whether the URB has been
sent properly to the USB wire and if the interrupt of the USB controller
reached the system after sending.

Generally, I would not assume that there's a problem in the PF_CAN netlayer in
this particular case.

Regards,
Oliver


* Re: [Socketcan-users] Message stalls in SocketCan Layer?
  2012-02-03 16:57     ` Oliver Hartkopp
@ 2012-02-03 19:41       ` Wolfgang Grandegger
  2012-02-13 10:00         ` Michael Jäntsch
  0 siblings, 1 reply; 5+ messages in thread
From: Wolfgang Grandegger @ 2012-02-03 19:41 UTC
  To: Oliver Hartkopp; +Cc: Michael Jäntsch, linux-can

On 02/03/2012 05:57 PM, Oliver Hartkopp wrote:
> Hi Michael,
> 
> On 03.02.2012 17:08, Michael Jäntsch wrote:
> 
> 
>> yes, it all started when I switched to kernel 3.0.0-15-generic. However,
>> weirdly enough there is one computer with this kernel where the problem
>> doesn't appear and two where it does. I think I will confirm this by
>> going back to an older kernel and seeing if the problem disappears.
> 
> 
> please consider that if it works on other machines with the same kernel that
> there could be some changes within the USB subsystem or some interrupt or
> ACPI/powersaving issues too.
> 
> Going back to an older Kernel is a good check - don't know, if you also can go
> further (3.1, 3.2) with Ubuntu 11.10 too ?!?
> 
> Maybe there's some USB debugging available to check whether the URB has been
> sent properly to the USB wire and if the interrupt of the USB controller
> reached the system after sending.

IIUC, he said that the time stamps indicated that the USB transfers
happened in time. The delays occurred later. That's weird, indeed.

Wolfgang.


* Re: [Socketcan-users] Message stalls in SocketCan Layer?
  2012-02-03 19:41       ` Wolfgang Grandegger
@ 2012-02-13 10:00         ` Michael Jäntsch
  2012-02-13 10:45           ` Wolfgang Grandegger
  0 siblings, 1 reply; 5+ messages in thread
From: Michael Jäntsch @ 2012-02-13 10:00 UTC
  To: Wolfgang Grandegger; +Cc: Oliver Hartkopp, linux-can

Hi,

Sorry for the late reply. I wanted to test the older kernel and
therefore needed to set up the OS from scratch, which I didn't have time
to do right away. I have now tested it with Ubuntu 11.04 and kernel
2.6.38-13. The problem appears a lot less often: instead of every few
minutes, it now shows up roughly every half an hour. So maybe this has
been there all along but did not appear frequently enough to be noticed.
For the reasons I already mentioned, I'm still pretty sure this has
nothing to do with our code.
Does anybody have an idea how to go about debugging this?

On 03.02.2012 20:41, Wolfgang Grandegger wrote:
> On 02/03/2012 05:57 PM, Oliver Hartkopp wrote:
>> please consider that if it works on other machines with the same kernel that
>> there could be some changes within the USB subsystem or some interrupt or
>> ACPI/powersaving issues too.
>>
>> Going back to an older Kernel is a good check - don't know, if you also can go
>> further (3.1, 3.2) with Ubuntu 11.10 too ?!?
>>
>> Maybe there's some USB debugging available to check whether the URB has been
>> sent properly to the USB wire and if the interrupt of the USB controller
>> reached the system after sending.
> IIUC, he said that the time stamps indicated that the USB transfers have
> happened in time. The delays occurred later. That's weird, indeed.

Wolfgang is right, the debug output of the driver indicated that the USB
transfers happened in time, which is why the guys from PEAK sent me to
you. This looks to me like the messages get stalled somewhere in the
PF_CAN netlayer, but I'm also not a hundred percent sure how to verify
this, as this is of course not that reproducible. Any ideas?

thanks
Michael

-- 
Technische Universität München
Michael Jäntsch
Fakultät für Informatik
Robotics and Embedded Systems
Parkring 13
85748 Garching bei München
Tel: + 49.89.289.17626
Fax: + 49.89.289.17637
michael.jaentsch@in.tum.de
www6.in.tum.de




* Re: [Socketcan-users] Message stalls in SocketCan Layer?
  2012-02-13 10:00         ` Michael Jäntsch
@ 2012-02-13 10:45           ` Wolfgang Grandegger
  0 siblings, 0 replies; 5+ messages in thread
From: Wolfgang Grandegger @ 2012-02-13 10:45 UTC
  To: Michael Jäntsch; +Cc: Oliver Hartkopp, linux-can

Hi Michael,

On 02/13/2012 11:00 AM, Michael Jäntsch wrote:
> Hi,
> 
> sorry for the late reply. I wanted to test the older kernel and
> therefore needed to setup the OS from scratch which I didn't have time
> to do right away. Now, I tested it with Ubuntu 11.04 and kernel 2.6.38-13.
> The problem appears a lot less often. From the time frame of a few
> minutes we went to something more like half an hour. So maybe this has
> been there all along but did not appear frequently enough to be noticed.
> For the reasons I already mentioned I'm still pretty sure this has
> nothing to do with our code.
> Does anybody have an idea how to go about debugging this?
> 
> On 03.02.2012 20:41, Wolfgang Grandegger wrote:
>> On 02/03/2012 05:57 PM, Oliver Hartkopp wrote:
>>> please consider that if it works on other machines with the same kernel that
>>> there could be some changes within the USB subsystem or some interrupt or
>>> ACPI/powersaving issues too.
>>>
>>> Going back to an older Kernel is a good check - don't know, if you also can go
>>> further (3.1, 3.2) with Ubuntu 11.10 too ?!?
>>>
>>> Maybe there's some USB debugging available to check whether the URB has been
>>> sent properly to the USB wire and if the interrupt of the USB controller
>>> reached the system after sending.
>> IIUC, he said that the time stamps indicated that the USB transfers have
>> happened in time. The delays occurred later. That's weird, indeed.
> Wolfgang is right, the debug output of the driver indicated that the USB
> transfers happened in time, which is why the guys from PEAK sent me to
> you. This looks to me like the messages get stalled somewhere in the
> PF_CAN netlayer, but I'm also not a hundred percent sure how to verify
> this, as this is of course not that reproducible. Any ideas?

Ftrace may help to find the problem. It's not too difficult to use, I
think. I assume that function tracing is enabled in your kernel. You
should find in your ".config":

CONFIG_DEBUG_FS=y
CONFIG_FTRACE=y
CONFIG_FUNCTION_TRACER=y
CONFIG_DYNAMIC_FTRACE=y

With that kernel, activate FTRACE as shown below:

Mount DebugFS file system
# mount -t debugfs none /sys/kernel/debug
Stop tracing
# echo 0 > /sys/kernel/debug/tracing/tracing_on
Clear trace buffer
# echo 0 > /sys/kernel/debug/tracing/trace
If necessary, increase buffer size
# echo 16384 > /sys/kernel/debug/tracing/buffer_size_kb
Use function tracer
# cat /sys/kernel/debug/tracing/available_tracers
... function ...
# echo function > /sys/kernel/debug/tracing/current_tracer
Finally, enable tracing
# echo 1 > /sys/kernel/debug/tracing/tracing_on
Inspect trace buffer
# cat /sys/kernel/debug/tracing/trace

This will create a huge amount of data. As a first step, you may want to
restrict tracing to the functions relevant for CAN (via
/sys/kernel/debug/tracing/set_ftrace_filter) to debug the timing.
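
If you prefer to do the filtering from the application itself, so that the
trace setup lives next to the code under test, a minimal sketch could look
like the following. It relies on set_ftrace_filter accepting glob patterns,
which needs CONFIG_DYNAMIC_FTRACE. The helper names and the "can_*" pattern
are only assumptions for illustration; which symbols actually matter on
your system (PF_CAN core, the PEAK driver) should be checked against
/sys/kernel/debug/tracing/available_filter_functions.

```c
#include <assert.h>
#include <stdio.h>

/* Write `value` to the file `name` inside `tracing_dir`.
 * Returns 0 on success, -1 on any error. */
static int write_tracing_file(const char *tracing_dir, const char *name,
                              const char *value)
{
    char path[512];
    FILE *f;
    int n, rc;

    assert(tracing_dir != NULL && name != NULL && value != NULL);

    n = snprintf(path, sizeof(path), "%s/%s", tracing_dir, name);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;              /* path would not fit into the buffer */

    f = fopen(path, "w");       /* "w" replaces the previous contents */
    if (f == NULL)
        return -1;

    rc = (fprintf(f, "%s\n", value) < 0) ? -1 : 0;
    if (fclose(f) != 0)         /* write errors may only surface on flush */
        rc = -1;
    return rc;
}

/* Hypothetical helper: trace only functions whose names start with "can_".
 * The pattern is an assumption, not a verified list of relevant symbols. */
static int trace_only_can_functions(const char *tracing_dir)
{
    return write_tracing_file(tracing_dir, "set_ftrace_filter", "can_*");
}
```

It would be called as trace_only_can_functions("/sys/kernel/debug/tracing")
before tracing_on is set to 1, and it needs root permissions just like the
shell commands.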

When you notice the long delay in your application, you should write "0"
to /sys/kernel/debug/tracing/tracing_on from the application context
(using fopen, etc.) to freeze the trace. That should give you more or
less detailed information on what has happened. I think it's not too
complicated. Interpreting the traces might be harder, though.
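
As a rough sketch of that idea, assuming your application waits on the CAN
sockets with select() and that debugfs is mounted at /sys/kernel/debug, the
timeout branch could freeze the trace like this. The names freeze_trace()
and select_or_freeze() are made up for this example and are not part of any
library; the path is passed in as a parameter so the code can also be tried
against an ordinary file.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/select.h>

/* Location of the switch when debugfs is mounted at /sys/kernel/debug. */
#define TRACING_ON_PATH "/sys/kernel/debug/tracing/tracing_on"

/* Write "0" to the given tracing_on file to stop the tracer.
 * Returns 0 on success, -1 on failure (e.g. not root, debugfs not mounted). */
static int freeze_trace(const char *path)
{
    FILE *f;
    int rc;

    assert(path != NULL);

    f = fopen(path, "w");
    if (f == NULL)
        return -1;

    rc = (fputs("0\n", f) == EOF) ? -1 : 0;
    if (fclose(f) != 0)         /* write errors may only surface on flush */
        rc = -1;
    return rc;
}

/* Wait for readable sockets like select() does. If the call times out,
 * freeze the trace right away so the buffer still holds the events that
 * led up to the stall. Returns the result of select() unchanged. */
static int select_or_freeze(int nfds, fd_set *readfds,
                            struct timeval *timeout,
                            const char *tracing_on_path)
{
    int ready = select(nfds, readfds, NULL, NULL, timeout);

    if (ready == 0)             /* 0 means the timeout expired */
        (void)freeze_trace(tracing_on_path);
    return ready;
}
```

In the read loop, select_or_freeze(maxfd + 1, &fds, &tv, TRACING_ON_PATH)
would replace the plain select() call. The trace can then be read from
/sys/kernel/debug/tracing/trace at leisure, since nothing overwrites it
once tracing_on is 0.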

A useful link is:

http://lxr.linux.no/#linux+v2.6.37/Documentation/trace/ftrace.txt

There are also a few nice articles about Ftrace at LWN.

Wolfgang.
