From: Thierry Bultel
Date: Tue, 20 Mar 2012 18:26:23 +0100
Message-ID: <4F68BDBF.2030002@domain.hid>
To: xenomai@xenomai.org, Jean-Baptiste Tredez
List-Id: Help regarding installation and common use of Xenomai
Subject: [Xenomai-help] CAN: Locked in rtdm_sem_timeddown of tx_sem; Lost TX IRQ ?

Hello,

the issue I am describing here happens on a dual-core Atom (without hyperthreading).

It is easy to reproduce with Linux 2.6.32.7 + Xenomai 2.5.2, which was my initial configuration until
I remembered that Philippe told us that SMP was correctly supported from 2.6.38.8 onwards.

However, I have been able to reproduce it with Linux 2.6.38.8 + Xenomai 2.6 as well. Only once, but I did.

I am using CAN with an IXXAT PCI-04 board.
There is a single thread per bus.

With the old kernel, communication stops after about 400-500 seconds under heavy load. After some
analysis, I found that my process was stuck at:

rtcan_raw.c

    /* Try to pass the guard in order to access the controller */
    ret = rtdm_sem_timeddown(&dev->tx_sem, timeout, NULL);


The Refcount shown in /proc/rtcan/rtcan0/info is 1.

The workaround I found is to set the timeout to a non-zero value with the appropriate ioctl and,
when a timeout occurs, to stop and restart the bus. This destroys and re-creates the semaphore,
after which communication works again.

From reading the code, the only explanation I can see is that a TX interrupt is lost, since tx_sem
is released from the TX interrupt path.
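
To make the suspected sequence concrete, here is a toy, single-threaded C model of the guard
accounting. It is only a sketch under assumptions: every toy_* name and the tx_result enum are
hypothetical and are not part of the rtcan driver, the guard is assumed to hold a single TX slot,
and a "lost interrupt" is simply simulated by a flag. It shows why one lost TX-done interrupt
leaves the guard at zero forever, and why the timeout plus bus restart gets things going again.

```c
#include <assert.h>
#include <stdbool.h>

/* Outcome of one simulated send attempt (hypothetical names). */
enum tx_result { TX_OK, TX_TIMEOUT, TX_STUCK_FOREVER };

/* Toy stand-in for the device: only the TX guard count is modelled. */
struct toy_can_dev {
    int tx_sem;    /* free TX slots; assumed to be 1 after bus start */
    int restarts;  /* how many times the workaround fired */
};

/* Bus (re)start re-creates the guard with one free slot. */
static void toy_bus_start(struct toy_can_dev *dev)
{
    dev->tx_sem = 1;
}

/* Models the TX-done interrupt handler releasing the guard. */
static void toy_tx_done_irq(struct toy_can_dev *dev)
{
    assert(dev->tx_sem == 0); /* fires only while the slot is held */
    dev->tx_sem++;
}

/*
 * Models one send: take the guard, hand the frame to the controller,
 * then let the TX-done IRQ fire unless irq_lost is set.
 * timeout == 0 stands for "wait forever"; any other value is treated
 * as a finite wait (non-blocking mode is not modelled).
 */
static enum tx_result toy_send(struct toy_can_dev *dev, long timeout,
                               bool irq_lost)
{
    if (dev->tx_sem == 0) {
        /* In this model nobody is left to release the guard. */
        return timeout == 0 ? TX_STUCK_FOREVER : TX_TIMEOUT;
    }
    dev->tx_sem--;
    if (!irq_lost)
        toy_tx_done_irq(dev);
    return TX_OK;
}

/* The workaround: on timeout, restart the bus and retry once. */
static enum tx_result toy_send_recovering(struct toy_can_dev *dev,
                                          long timeout)
{
    enum tx_result res = toy_send(dev, timeout, false);

    if (res == TX_TIMEOUT) {
        toy_bus_start(dev); /* stop + start: the guard is re-created */
        dev->restarts++;
        res = toy_send(dev, timeout, false);
    }
    return res;
}
```

Calling toy_send() once with irq_lost set and then again with timeout 0 reproduces the hang as
TX_STUCK_FOREVER, whereas toy_send_recovering() with a non-zero timeout restarts the toy bus and
succeeds. The real driver is of course concurrent, so this only illustrates the counting, not
the SMP race that may be losing the interrupt.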

I do not have many other ways to analyze this more deeply, so any advice would be greatly appreciated.

Cheers,
Thierry