From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <45FE8AA2.1030507@domain.hid>
Date: Mon, 19 Mar 2007 14:05:38 +0100
From: Jan Kiszka
Subject: Re: [Xenomai-help] RT-Socket-CAN bus error handling (was CAN errors and real-time behaviour (IRQ raise forever and may lock system))
References: <45FDA81F.2080004@domain.hid> <45FE7578.4000306@domain.hid>
In-Reply-To: <45FE7578.4000306@domain.hid>
Sender: jan.kiszka@domain.hid
List-Id: Help regarding installation and common use of Xenomai
To: Wolfgang Grandegger
Cc: xenomai@xenomai.org

Wolfgang Grandegger wrote:
> Sebastian Smolorz wrote:
>> Sebastian Smolorz wrote:
>>> Hi Jan,
>>>
>>> Jan Kiszka wrote:
>>>> Wolfgang Grandegger wrote:
>>>>> you know, on the SJA1000 the bus error interrupt can result in high
>>>>> error interrupt rates and even hang the system on slow processors.
>>>>> Just unplugging the CAN cable can cause such interrupt flooding.
>>>>> This problem popped up again recently and Sebastian proposed:
>>>>>> Last summer we had a discussion about the BEI issue on the
>>>>>> socketcan-ML. Two additional handling policies popped up:
>>>>>> 1. The interface could restart itself after an amount of BEIs, thus
>>>>>> taking responsibility from the user application.
>>>>>> 2. The BEI could be completely disabled if no one is interested in
>>>>>> this type of error frame.
>>>>> As 2. is also my preferred solution, I have implemented it. The only
>>>>> downside is that you do not see the error counter increasing when
>>>>> /proc/rtcan/devices is inspected.
>>>>> We also discussed 1., but RT-Socket-CAN does not restart the CAN
>>>>> controller on purpose, and just stopping it requires user
>>>>> intervention.
>>>> And if there is someone listening, how is the flooding issue on cable
>>>> unplug etc. solved by option 2?
>>> Hm, maybe we could implement 1 additionally (but without automatic
>>> restart)?
>>
>> A more precise suggestion: What about letting BEIs appear until
>> passive mode is reached and if the TX error counter doesn't count up
>> any more (indication of start-up situation discovered by the SJA1000)
>> the driver ceases to read out ECC any further (thanks Stephane for the
>> hint). The controller would be still operating but not reporting BEIs
>> any more. There has to be some mechanism to let BEIs through after the
>> situation has normalized. Maybe the driver could check inside the
>> interrupt handler if active mode was reached again after the above
>> situation occurred.
>
> Well, this is rather sophisticated and needs some more careful
> evaluation. We might also reach the passive level slowly without
> flooding. Furthermore, the method should also be applicable for other
> controllers. What is the current behaviour of other controllers?
>
> Let's implement 1. and downscaled printk and wait for the users'
> reaction, see also my other mail. Then we should bring up this
> discussion again on the Socket-CAN-ML to negotiate a common solution.

Instead of waiting on some user triggering a (potential) latency mine, I
would prefer that we experimentally evaluate the effect, e.g. via an
I-pipe tracer dump on a faster and a slower box. I would offer to run
some demo code here on our PC104 Phytec boards as well.

The problem is to define what degree of error-related IRQ load is
generally acceptable.
We surely can't do this, so we have to document the effect /at least/
and help the users to check it on their own - or we have to avoid it /
make it insignificant compared to normal CAN operation (I'm still in
favour of this path).

Jan
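
For reference, Sebastian's suggestion quoted above (let BEIs through
until passive mode is reached and the TX error counter stops counting,
then mute them until active mode is seen again) could be sketched
roughly as follows. This is only an illustration of the gating logic,
not RT-Socket-CAN driver code: `bei_gate`, `bei_gate_update` and the
state enum are made-up names, and the caller is assumed to have already
read the controller state and the TX error counter from the hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; none of this is taken from the RT-Socket-CAN sources. */
enum can_err_state { ERR_ACTIVE, ERR_PASSIVE };

struct bei_gate {
    bool    muted;      /* true while bus error reports are suppressed */
    uint8_t last_txerr; /* TX error counter seen on the previous call */
};

/*
 * Intended to be called from the interrupt handler with the current
 * controller state and TX error counter (both read by the caller).
 * Returns true if this bus error should be reported, i.e. the ECC
 * should be read out and an error frame delivered.
 */
static bool bei_gate_update(struct bei_gate *g,
                            enum can_err_state state, uint8_t txerr)
{
    if (state == ERR_ACTIVE) {
        /* Situation has normalized: let bus errors through again. */
        g->muted = false;
    } else if (!g->muted && txerr == g->last_txerr) {
        /* Passive mode and the TX error counter no longer counts up. */
        g->muted = true;
    }
    g->last_txerr = txerr;
    return !g->muted;
}
```

Note that this does not address Wolfgang's objection: the passive level
may also be reached slowly without any flooding, and the sketch would
still mute reports in that case once the counter stalls.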