Message-ID: <45F7DEA8.2050309@domain.hid>
Date: Wed, 14 Mar 2007 12:38:16 +0100
From: Wolfgang Grandegger
Subject: Re: [Xenomai-help] RT-Socket-CAN bus error handling (was CAN errors and real-time behaviour (IRQ raise forever and may lock system))
References: <45EC4D36.2000600@domain.hid>
List-Id: Help regarding installation and common use of Xenomai
To: Sebastian Smolorz
Cc: xenomai@xenomai.org

Sebastian Smolorz wrote:
> Stéphane ANCELOT wrote:
>> Sebastian Smolorz wrote:
>>> Stéphane ANCELOT wrote:
>>>> Sebastian Smolorz wrote:
>>>>> Note that the current implementation of RT-Socket-CAN shows this
>>>>> behaviour on purpose. See also [1] ("may flood!"). Whether this is the
>>>>> right handling or not may be discussed here. I admit that the current
>>>>> implementation forces an application developer to take more
>>>>> responsibility but that is not a bug of the underlying driver/stack per
>>>>> se. Look, you don't connect anything to the CAN bus, start a
>>>>> *real-time* application which sends a message to a non-existent CAN
>>>>> node. This is an error situation and it is more than ever for a
>>>>> real-time task. So the proper reaction for a RT-application would be to
>>>>> handle those errors and e.g. shut down the CAN interface which in this
>>>>> case will force the CAN hardware to stop its endless attempts to send
>>>>> the message.
>>>> I agree and this is what I was doing, however this does not seem to
>>>> work as expected in the driver.
>>> What does not work? The shutdown and stopping transmitting the CAN
>>> messages?
>>>
>>> --
>>> Sebastian
>> Yes, this is exactly what has happened to me and rolland problem, one
>> rtcansend launched and BEI interrupt come always....
>
> Yes, I know.
> But when you stop the CAN interface in such a situation the
> interrupts must disappear because the controller does not try to send the
> message any more.
>
>> Since the error management should be done by the application process, I
>> think that BUS ERROR INTERRUPT can be reported however the ECC reading
>> must not be done by the interrupt routine.
>
> I don't think that reading the ECC is the critical point, rather the interrupt
> flooding is.
>
>> Since it permits a next bus error interrupt. The ECC reading should be
>> left to user application e.g. through an ioctl.
>
> Error reporting in RT-Socket-CAN is the same as in Socket-CAN for plain Linux.
> It is done via error frames sent to the application. So your suggestion would
> break the API here and frankly is not necessary. You have several
> possibilities to detect a bus error due to a disconnected bus and can handle
> the situation properly (e.g. restart the interface). If a series of error
> frames are generated which shows you TX bus errors with missing
> acknowledgments you can be quite sure that no other node is connected to the
> bus.
>
>> This may be an option or an error mode selectable by the programmer at
>> startup.
>>
>> What do you think?
>
> Last summer we had a discussion about the BEI issue on the socketcan-ML. Two
> additional handling policies popped up:
> 1. The interface could restart itself after an amount of BEIs, thus taking
> responsibility from the user application.
> 2. The BEI could be completely disabled if no one is interested in this type
> of error frame.

I tried to implement 2. for the SJA1000, but re-enabling the BEI on the fly
does not work. :-( The controller requires a restart of the device to get
the bus error reporting back to work.
> Maybe it is time to think about the implementation of these policies as more
> and more users seem to run into the BEI issue with a disconnected bus.
> Wolfgang, Jan, what is your opinion?

Well, solution 2. with the limitations mentioned above is therefore less
attractive because it interrupts the CAN traffic. The Socket-CAN
implementation actually restarts the CAN controller after a certain
number of bus error interrupts (200 by default), which matches your first
policy above. But in RT-Socket-CAN, we deliberately do not restart the
device automatically. Therefore I tend to just stop the device instead. It's
then up to the application to restart it. What do you think?

Wolfgang.
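
For illustration only, the "stop after a number of bus errors" idea discussed
above could be sketched in C roughly as follows. This is not RT-Socket-CAN or
Socket-CAN code: the struct, the function names and the SKETCH_* constants are
hypothetical, and the flag values are assumed to mirror the Socket-CAN
error-frame bits, so please check them against your own headers. The limit is
a parameter (200 is the Socket-CAN default mentioned above). Resetting the
counter when a regular frame arrives is an assumption of this sketch, not
something stated in the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values mirroring the Socket-CAN error-frame bits (verify locally). */
#define SKETCH_ERR_FLAG     0x20000000U /* CAN ID marks an error frame */
#define SKETCH_ERR_ACK      0x00000020U /* no acknowledgment on transmission */
#define SKETCH_ERR_BUSERROR 0x00000080U /* bus error reported by controller */

/* Hypothetical policy state, one instance per CAN interface. */
struct buserr_policy {
    unsigned int limit; /* consecutive bus errors tolerated before stopping */
    unsigned int count; /* consecutive bus errors seen so far */
};

void buserr_policy_init(struct buserr_policy *p, unsigned int limit)
{
    p->limit = limit;
    p->count = 0;
}

/*
 * Feed the CAN ID of every received frame, error frames included.
 * Returns true once the limit is reached, i.e. when the caller should
 * stop the interface and decide for itself when to restart it.
 */
bool buserr_policy_feed(struct buserr_policy *p, uint32_t can_id)
{
    if (!(can_id & SKETCH_ERR_FLAG)) {
        /* Regular frame: some other node is alive, start counting afresh. */
        p->count = 0;
        return false;
    }
    if (can_id & (SKETCH_ERR_BUSERROR | SKETCH_ERR_ACK))
        p->count++;
    /* Other error classes are neither counted nor reset the counter. */
    return p->count >= p->limit;
}
```

An application would call buserr_policy_feed() from its receive loop and, when
it returns true, stop the device through whatever mode-setting call its stack
provides. Keeping the decision in a small pure function leaves the restart
policy with the application, which is the division of responsibility argued
for above.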