* [Xenomai-help] CAN errors and real-time behaviour
@ 2007-03-03 14:09 roland Tollenaar
  2007-03-05  8:49 ` Stéphane ANCELOT
  0 siblings, 1 reply; 38+ messages in thread
From: roland Tollenaar @ 2007-03-03 14:09 UTC
To: xenomai

Hi,

I thought I would put this in a separate thread.

The experiment works as follows. I have a 1 ms and a 2 ms RT periodic task. In the first (1 ms) task I only read out the message buffer; that is the only work done in the task. In the 2 ms task I do nothing.

I always read out the measured period times. The measured value is written into a variable that is displayed by a separate thread outside the RT tasks, so the display does not influence the measurement (unlike printf, which is said to do so).

There is nothing connected to the rtcan2 device (Peak dongle). The application runs fine, with the task periods relatively well maintained (fluctuating by about 0.003 ms) around 1 ms and 2 ms.

The moment I write to the device using rtcansend, e.g.:

  ./rtcansend rtcan2 -i 0x700 0x03 0x02

the bus error and the protocol error come up. From that moment onwards the message buffer gets flooded and never stops being flooded. The period times then fluctuate badly, by up to 0.2 ms around their nominal values.

This is not desirable behaviour. Firstly, I would think it is not necessary to have the message buffer flooded all the time. How do I change that so that I only pick up an error once in response to a failed send?

Secondly, what am I doing wrong that breaks the real-time behaviour? If the bus gives an error in one part of the process, I don't want other processes that may have nothing to do with the CAN bus to misbehave.

I suspect that if I can prevent the message buffer from flooding forever and manage to clean it out, the behaviour will be better, because while it is flooded, messages get sent to dmesg (well, to wherever dmesg reads from), and this may explain the behaviour.

Can anyone comment on this, please?

Regards,
Roland.
^ permalink raw reply [flat|nested] 38+ messages in thread
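For readers who want to reproduce the measurement described in the first post, the bookkeeping could look roughly like the sketch below. The poster's actual code is not shown in the thread, so this is only an illustration in plain C: `period_stats`, `period_stats_init` and `period_stats_update` are hypothetical names, and timestamps are assumed to be nanosecond counts taken at each task activation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bookkeeping for one periodic task (not from the original post). */
struct period_stats {
    int64_t nominal_ns;  /* expected period, e.g. 1000000 for 1 ms */
    int64_t last_ns;     /* timestamp of the previous activation, -1 if none yet */
    int64_t max_dev_ns;  /* largest |measured - nominal| seen so far */
};

static void period_stats_init(struct period_stats *s, int64_t nominal_ns)
{
    assert(nominal_ns > 0);
    s->nominal_ns = nominal_ns;
    s->last_ns = -1;
    s->max_dev_ns = 0;
}

/* Call once per activation with the current time.
 * Returns the measured period, or 0 on the very first call. */
static int64_t period_stats_update(struct period_stats *s, int64_t now_ns)
{
    int64_t period = 0;

    if (s->last_ns >= 0) {
        int64_t dev;

        period = now_ns - s->last_ns;
        dev = period - s->nominal_ns;
        if (dev < 0)
            dev = -dev;
        if (dev > s->max_dev_ns)
            s->max_dev_ns = dev;
    }
    s->last_ns = now_ns;
    return period;
}
```

The RT task would only call `period_stats_update`; a separate non-RT thread would read `max_dev_ns` for display, which keeps any printing out of the timed path as the post describes.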
* Re: [Xenomai-help] CAN errors and real-time behaviour
  2007-03-03 14:09 [Xenomai-help] CAN errors and real-time behaviour roland Tollenaar
@ 2007-03-05  8:49 ` Stéphane ANCELOT
  2007-03-05  9:26   ` Roland Tollenaar
  2007-03-05 10:39   ` [Xenomai-help] CAN errors and real-time behaviour (IRQ raise forever and may lock system) Stéphane ANCELOT
  0 siblings, 2 replies; 38+ messages in thread
From: Stéphane ANCELOT @ 2007-03-05 8:49 UTC
To: roland Tollenaar; +Cc: xenomai

Hi,

Maybe this is related to your problem. I am trying to deal with some problems with CAN applications when NOTHING IS CONNECTED to my system.

Like you, I have a first task that reads the CAN bus. A second task does nothing but wait for a semaphore. A third task starts by sending two messages on the CAN bus (the second message gets an error) and then enters a 5 ms periodic loop.

The major problem is as follows: I launch my RT task, no problem. I can do things in my Linux console. Now I start X: X begins to launch and then freezes. If I plug in the CAN bus, "magic" happens: X manages to launch and everything runs normally.

Important note on the behaviour: CAN RX has a timeout of almost 100 ms and TX of 10 ms. If something goes wrong, task 2 sleeps (rt_task_sleep) for a while.

I think the problem is related to the CAN driver's behaviour. Do you think it has the same origin as your problem?
* Re: [Xenomai-help] CAN errors and real-time behaviour
  2007-03-05  8:49 ` Stéphane ANCELOT
@ 2007-03-05  9:26   ` Roland Tollenaar
  2007-03-05 10:39   ` [Xenomai-help] CAN errors and real-time behaviour (IRQ raise forever and may lock system) Stéphane ANCELOT
  1 sibling, 0 replies; 38+ messages in thread
From: Roland Tollenaar @ 2007-03-05 9:26 UTC
To: Stéphane ANCELOT; +Cc: xenomai

Hi Stéphane,

I have no idea whether the problems are related; I am too much of a novice myself. However, I can confirm a difference in behaviour between having the bus live and having an unplugged, and thus unterminated, bus.

From the incidents I have seen, this change in behaviour seems to be related to the buffer being flooded with error messages when the bus is down, which in turn results in system messages. But whether that would block X from starting up, for example, I don't know.

From what I gather, X is a normal program under Linux and as such will not have priority over your real-time tasks. Perhaps with the bus down you also have syslogd busier than a one-legged man in an ass-kicking contest, which, along with the priority demanded by your RT tasks, leaves no time for X. Once your bus is plugged in, syslogd stops complaining, processor time is freed up, and X gets some quality one-on-one time with the CPU?

But others would be better judges than I at this stage.

Regards,
Roland.
* Re: [Xenomai-help] CAN errors and real-time behaviour (IRQ raise forever and may lock system)
  2007-03-05  8:49 ` Stéphane ANCELOT
  2007-03-05  9:26   ` Roland Tollenaar
@ 2007-03-05 10:39   ` Stéphane ANCELOT
  2007-03-05 11:26     ` Sebastian Smolorz
  1 sibling, 1 reply; 38+ messages in thread
From: Stéphane ANCELOT @ 2007-03-05 10:39 UTC
To: roland Tollenaar; +Cc: xenomai

I have checked what happens when nothing is plugged into the CAN bus:

A bus error occurs. When a bus error occurs, the ECC register contains the possible error information. Once we read the ECC register (in the ISR), the error code register mechanism is enabled again. Thus another bus error is captured and a new BEI (bus error interrupt) occurs.

In this case the system spends its time in the ISR of the CAN driver. That leaves enough time to run a Linux console, but not to launch other tasks like X.

It would be necessary to find a way to manage this kind of problem and to avoid leaving the bus error interrupt enabled forever.

Best regards,
steph
* Re: [Xenomai-help] CAN errors and real-time behaviour (IRQ raise forever and may lock system)
  2007-03-05 10:39   ` [Xenomai-help] CAN errors and real-time behaviour (IRQ raise forever and may lock system) Stéphane ANCELOT
@ 2007-03-05 11:26     ` Sebastian Smolorz
  2007-03-05 11:42       ` Roland Tollenaar
  2007-03-05 14:57       ` Stéphane ANCELOT
  0 siblings, 2 replies; 38+ messages in thread
From: Sebastian Smolorz @ 2007-03-05 11:26 UTC
To: Stéphane ANCELOT; +Cc: xenomai

Stéphane ANCELOT wrote:
> [...]
> It would be necessary to find a way to manage this kind of problem and
> avoiding enabling forever Bus Error Interrupt.

Note that the current implementation of RT-Socket-CAN shows this behaviour on purpose. See also [1] ("may flood!"). Whether this is the right handling or not may be discussed here. I admit that the current implementation forces an application developer to take more responsibility, but that is not a bug of the underlying driver/stack per se.

Look, you don't connect anything to the CAN bus and start a *real-time* application which sends a message to a non-existent CAN node. This is an error situation, all the more so for a real-time task. So the proper reaction for an RT application would be to handle those errors and, e.g., shut down the CAN interface, which in this case will force the CAN hardware to stop its endless attempts to send the message.

[1] http://www.xenomai.org/documentation/trunk/html/api/group__rtcan.html#g0b068b1221129441b89967ee2ddb9f44

--
Sebastian
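The reaction suggested in this message (handle the error frames in the application and shut the interface down) could be structured roughly as in the sketch below. It is not code from the thread or from RT-Socket-CAN: `can_guard`, `can_stop_fn` and `count_stop_calls` are hypothetical names, and the actual stop operation is hidden behind a callback. In a real application that callback would presumably issue the mode-change call documented in the RT-Socket-CAN API; that part is an assumption and should be checked against [1].

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hook that stops the CAN interface; returns 0 on success. */
typedef int (*can_stop_fn)(void *ctx);

struct can_guard {
    can_stop_fn stop;      /* how to stop the interface (assumed, see lead-in) */
    void *ctx;             /* opaque argument passed to the hook */
    bool stopped;          /* true once the interface has been stopped */
    unsigned errors_seen;  /* error frames seen since the last (re)start */
};

static void can_guard_init(struct can_guard *g, can_stop_fn stop, void *ctx)
{
    assert(g != NULL && stop != NULL);
    g->stop = stop;
    g->ctx = ctx;
    g->stopped = false;
    g->errors_seen = 0;
}

/* Call for every error frame the application receives.
 * Returns true only for the frame that actually triggered the stop,
 * so the fault is reported to the rest of the application once. */
static bool can_guard_on_error(struct can_guard *g)
{
    g->errors_seen++;
    if (g->stopped)
        return false;          /* already handled, ignore the flood */
    if (g->stop(g->ctx) != 0)
        return false;          /* stop failed, retry on the next error frame */
    g->stopped = true;
    return true;
}

/* Call after the application has restarted the interface. */
static void can_guard_restart(struct can_guard *g)
{
    g->stopped = false;
    g->errors_seen = 0;
}

/* Stand-in stop hook for illustration: only counts how often it is called. */
static int count_stop_calls(void *ctx)
{
    ++*(int *)ctx;
    return 0;
}
```

Keeping the stop operation behind a callback means the receive task only decides *when* to stop; each CAN interface gets its own `can_guard`, so a fault on one bus does not have to affect the handling of another.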
* Re: [Xenomai-help] CAN errors and real-time behaviour (IRQ raise forever and may lock system)
  2007-03-05 11:26     ` Sebastian Smolorz
@ 2007-03-05 11:42       ` Roland Tollenaar
  2007-03-05 12:01         ` Sebastian Smolorz
  2007-03-05 14:57       ` Stéphane ANCELOT
  1 sibling, 1 reply; 38+ messages in thread
From: Roland Tollenaar @ 2007-03-05 11:42 UTC
To: Sebastian Smolorz, Xenomai-help

Hi,

my comments:

> [...] I admit that the current implementation forces an application
> developer to take more responsibility but that is not a bug of the
> underlying driver/stack per se.

Agreed.

> [...] So the proper reaction for a RT-application would be to handle
> those errors and e.g. shut down the CAN interface which in this case
> will force the CAN hardware to stop its endless attempts to send the
> message.

Just an example: say, for argument's sake, that my application is running two CAN buses. One is addressed from one task and the other from another task. The tasks belong to the same physical process (machine) but are otherwise unrelated. Say one is controlling the quality of the mayonnaise, the other that of the ketchup. If the CAN bus of the ketchup gets unplugged, I don't want a batch of bad mayo as well. :) At the moment this is what seems to be happening.

IMHO it would be nice if the warnings did get into the message buffer (assuming that they cannot easily be expelled elsewhere) but the overflowing did not result in triggering non-application warning or error handling mechanisms?

Kind regards,
Roland.
* Re: [Xenomai-help] CAN errors and real-time behaviour (IRQ raise forever and may lock system)
  2007-03-05 11:42       ` Roland Tollenaar
@ 2007-03-05 12:01         ` Sebastian Smolorz
  2007-03-05 12:16           ` Roland Tollenaar
  0 siblings, 1 reply; 38+ messages in thread
From: Sebastian Smolorz @ 2007-03-05 12:01 UTC
To: rolandtollenaar; +Cc: xenomai

Roland Tollenaar wrote:
> IMHO it would be nice if the warning did get into the message buffer
> (assuming that they cannot -easily- be expelled elsewhere) but that the
> overflowing does not result in triggering non-application

You mean the log printing? Simply don't set CONFIG_XENO_DRIVERS_CAN_DEBUG.

> warning or
> error handling mechanisms.

Error handling is the duty of your application. The CAN stack doesn't do this for you.

--
Sebastian
* Re: [Xenomai-help] CAN errors and real-time behaviour (IRQ raise forever and may lock system)
  2007-03-05 12:01         ` Sebastian Smolorz
@ 2007-03-05 12:16           ` Roland Tollenaar
  2007-03-05 12:48             ` Sebastian Smolorz
  0 siblings, 1 reply; 38+ messages in thread
From: Roland Tollenaar @ 2007-03-05 12:16 UTC
To: Sebastian Smolorz, Xenomai-help

Hi,

Sebastian Smolorz wrote:
> You mean the log printing? Simply don't set CONFIG_XENO_DRIVERS_CAN_DEBUG.

Aha! This sounds like something that may sort out my little problem. Where do I set this parameter? I certainly did not turn it on, so it must be in some default configuration. Is it set when compiling the kernel? Where? How do I unset it?

Thanks.

Roland
* Re: [Xenomai-help] CAN errors and real-time behaviour (IRQ raise forever and may lock system)
  2007-03-05 12:16           ` Roland Tollenaar
@ 2007-03-05 12:48             ` Sebastian Smolorz
  2007-03-05 13:13               ` Roland Tollenaar
  0 siblings, 1 reply; 38+ messages in thread
From: Sebastian Smolorz @ 2007-03-05 12:48 UTC
To: rolandtollenaar; +Cc: Xenomai-help

Roland Tollenaar wrote:
> Aha! This sounds like something that may sort out my little problem.
> Where do I set this parameter? I certainly did not turn it on so it must
> be in some default configuration. Is it on compiling the kernel? Where?
> how do I unset it?

It's in the same kernel config menu where you activated RT-Socket-CAN.

--
Sebastian
* Re: [Xenomai-help] CAN errors and real-time behaviour (IRQ raise forever and may lock system)
  2007-03-05 12:48             ` Sebastian Smolorz
@ 2007-03-05 13:13               ` Roland Tollenaar
  0 siblings, 0 replies; 38+ messages in thread
From: Roland Tollenaar @ 2007-03-05 13:13 UTC
To: Sebastian Smolorz; +Cc: Xenomai-help

Thanks. Will rebuild the modules.

Regards,
Roland

Sebastian Smolorz wrote:
> It's in the same kernel config menu where you activated RT-Socket-CAN.
* Re: [Xenomai-help] CAN errors and real-time behaviour (IRQ raise forever and may lock system)
  2007-03-05 11:26     ` Sebastian Smolorz
  2007-03-05 11:42       ` Roland Tollenaar
@ 2007-03-05 14:57       ` Stéphane ANCELOT
  2007-03-05 14:42         ` Sebastian Smolorz
  1 sibling, 1 reply; 38+ messages in thread
From: Stéphane ANCELOT @ 2007-03-05 14:57 UTC
To: Sebastian Smolorz; +Cc: xenomai

Sebastian Smolorz wrote:
> [...] So the proper reaction for a RT-application would be to handle
> those errors and e.g. shut down the CAN interface which in this case
> will force the CAN hardware to stop its endless attempts to send the
> message.

I agree, and this is what I was doing. However, this does not seem to work as expected in the driver.
* Re: [Xenomai-help] CAN errors and real-time behaviour (IRQ raise forever and may lock system)
  2007-03-05 14:57       ` Stéphane ANCELOT
@ 2007-03-05 14:42         ` Sebastian Smolorz
  2007-03-05 17:02           ` Stéphane ANCELOT
  0 siblings, 1 reply; 38+ messages in thread
From: Sebastian Smolorz @ 2007-03-05 14:42 UTC
To: Stéphane ANCELOT; +Cc: xenomai

Stéphane ANCELOT wrote:
> I agree and this is what I was doing , however this does not seem to
> work as expected in the driver.

What does not work? The shutdown and stopping the transmission of the CAN messages?

--
Sebastian
* Re: [Xenomai-help] CAN errors and real-time behaviour (IRQ raise forever and may lock system)
  2007-03-05 14:42         ` Sebastian Smolorz
@ 2007-03-05 17:02           ` Stéphane ANCELOT
  2007-03-06  9:36             ` Sebastian Smolorz
  0 siblings, 1 reply; 38+ messages in thread
From: Stéphane ANCELOT @ 2007-03-05 17:02 UTC
To: Sebastian Smolorz; +Cc: xenomai

Sebastian Smolorz wrote:
> What does not work? The shutdown and stopping transmitting the CAN
> messages?

Yes, this is exactly what happened with my problem and Roland's: one rtcansend is launched and the BEI keeps coming.

Since error management should be done by the application process, I think the bus error interrupt can be reported, but the ECC register must not be read by the interrupt routine, since reading it permits the next bus error interrupt. Reading the ECC should be left to the user application, e.g. through an ioctl.

This could be an option, or an error mode selectable by the programmer at startup.

What do you think?

Regards,
steph
* Re: [Xenomai-help] CAN errors and real-time behaviour (IRQ raise forever and may lock system)
  2007-03-05 17:02           ` Stéphane ANCELOT
@ 2007-03-06  9:36             ` Sebastian Smolorz
  2007-03-10 20:53               ` Wolfgang Grandegger
  2007-03-14 11:38               ` [Xenomai-help] RT-Socket-CAN bus error handling (was CAN errors and real-time behaviour (IRQ raise forever and may lock system)) Wolfgang Grandegger
  0 siblings, 2 replies; 38+ messages in thread
From: Sebastian Smolorz @ 2007-03-06 9:36 UTC
To: Stéphane ANCELOT; +Cc: xenomai

Stéphane ANCELOT wrote:
> Yes, this is exactly what has happened to me and rolland problem , one
> rtcansend launched and BEI interrupt come always....

Yes, I know. But when you stop the CAN interface in such a situation, the interrupts must disappear because the controller no longer tries to send the message.

> since the error management should be done by application process, I
> think that BUS ERROR INTERRUPT can be reported however the ECC reading
> must not be done by the interrupt routine.

I don't think that reading the ECC is the critical point; rather, the interrupt flooding is.

> Since it permits a next bus error interrupt. the ECC reading should be
> left to user application eg through an ioctl.

Error reporting in RT-Socket-CAN is the same as in Socket-CAN for plain Linux. It is done via error frames sent to the application. So your suggestion would break the API here and, frankly, is not necessary. You have several possibilities to detect a bus error due to a disconnected bus and can handle the situation properly (e.g. restart the interface). If a series of error frames is generated which shows TX bus errors with missing acknowledgments, you can be quite sure that no other node is connected to the bus.

> This may be an option or a error mode selectable by the programmer at
> startup .
>
> what do you think ?

Last summer we had a discussion about the BEI issue on the socketcan ML. Two additional handling policies popped up:

1. The interface could restart itself after a certain number of BEIs, thus taking responsibility away from the user application.
2. The BEI could be completely disabled if no one is interested in this type of error frame.

Maybe it is time to think about implementing these policies, as more and more users seem to run into the BEI issue with a disconnected bus. Wolfgang, Jan, what is your opinion?

--
Sebastian
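The detection described in this message (a series of error frames showing TX bus errors with missing acknowledgments) could be sketched as below. This is only an illustration under stated assumptions: the `EX_ERR_*` bits are stand-ins defined locally for the example, not the real definitions, and how a missing acknowledgment is actually encoded in an error frame depends on the driver and should be taken from the CAN headers. `no_peer_detector`, `no_peer_init` and `no_peer_feed` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for error-class bits in the CAN ID of an error
 * frame. Use the definitions from your CAN headers in real code. */
#define EX_ERR_FLAG     0x20000000u  /* frame is an error frame */
#define EX_ERR_ACK      0x00000020u  /* no ACK received on transmission */
#define EX_ERR_BUSERROR 0x00000080u  /* bus error */

struct no_peer_detector {
    unsigned threshold;  /* consecutive no-ACK bus errors needed to decide */
    unsigned run;        /* current length of the run, capped at threshold */
};

static void no_peer_init(struct no_peer_detector *d, unsigned threshold)
{
    assert(threshold > 0);
    d->threshold = threshold;
    d->run = 0;
}

/* Feed the CAN ID of every received frame (data or error).
 * Returns true while at least `threshold` consecutive error frames with
 * bus error and missing ACK have been seen, i.e. while it looks as if
 * no other node is connected. */
static bool no_peer_feed(struct no_peer_detector *d, uint32_t can_id)
{
    bool no_ack = (can_id & EX_ERR_FLAG) &&
                  (can_id & EX_ERR_BUSERROR) &&
                  (can_id & EX_ERR_ACK);

    if (no_ack) {
        if (d->run < d->threshold)
            d->run++;
    } else {
        d->run = 0;  /* a data frame or another error class breaks the run */
    }
    return d->run >= d->threshold;
}
```

Once `no_peer_feed` returns true, the application could stop or restart the interface, which corresponds to policy 1 above implemented in the application rather than in the driver.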
* Re: [Xenomai-help] CAN errors and real-time behaviour (IRQ raise forever and may lock system) 2007-03-06 9:36 ` Sebastian Smolorz @ 2007-03-10 20:53 ` Wolfgang Grandegger 2007-03-14 11:38 ` [Xenomai-help] RT-Socket-CAN bus error handling (was CAN errors and real-time behaviour (IRQ raise forever and may lock system)) Wolfgang Grandegger 1 sibling, 0 replies; 38+ messages in thread From: Wolfgang Grandegger @ 2007-03-10 20:53 UTC (permalink / raw) To: Sebastian Smolorz; +Cc: xenomai Sebastian Smolorz wrote: > Stéphane ANCELOT wrote: >> Sebastian Smolorz wrote: >>> Stéphane ANCELOT wrote: >>>> Sebastian Smolorz wrote: >>>>> Note that the current implementation of RT-Socket-CAN shows this >>>>> behaviour on purpose. See also [1] ("may flood!"). Whether this is the >>>>> right handling or not may be discussed here. I admit that the current >>>>> implementation forces an application developer to take more >>>>> responsibility but that is not a bug of the underlying driver/stack per >>>>> se. Look, you don't connect anything to the CAN bus, start a >>>>> *real-time* application which sends a message to a non-existent CAN >>>>> node. This is an error situation an it is more than ever for a >>>>> real-time task. So the proper reaction for a RT-application would be to >>>>> handle those errors and e.g. shut down the CAN interface which in this >>>>> case will force the CAN hardware to stop its endless attempts to send >>>>> the message. >>>> I agree and this is what I was doing , however this does not seem to >>>> work as expected in the driver. >>> What does not work? The shutdown and stopping transmitting the CAN >>> messages? >>> >>> -- >>> Sebastian >> Yes, this is exactly what has happened to me and rolland problem , one >> rtcansend launched and BEI interrupt come always.... > > Yes, I know. But when you stop the CAN interface in such a situation the > interrupts must disappear because the controller does not try to send the > message any more. 
>
>> since the error management should be done by the application process, I
>> think that BUS ERROR INTERRUPT can be reported however the ECC reading
>> must not be done by the interrupt routine.
>
> I don't think that reading the ECC is the critical point, rather the interrupt
> flooding is.
>
>> Since it permits a next bus error interrupt, the ECC reading should be
>> left to the user application, e.g. through an ioctl.
>
> Error reporting in RT-Socket-CAN is the same as in Socket-CAN for plain Linux.
> It is done via error frames sent to the application. So your suggestion would
> break the API here and frankly is not necessary. You have several
> possibilities to detect a bus error due to a disconnected bus and can handle
> the situation properly (e.g. restart the interface). If a series of error
> frames is generated that shows TX bus errors with missing
> acknowledgments, you can be quite sure that no other node is connected to the
> bus.
>
>> This may be an option or an error mode selectable by the programmer at
>> startup.
>>
>> What do you think?
>
> Last summer we had a discussion about the BEI issue on the socketcan-ML. Two
> additional handling policies popped up:
> 1. The interface could restart itself after an amount of BEIs, thus taking
> responsibility from the user application.
> 2. The BEI could be completely disabled if no one is interested in this type
> of error frame.

I personally prefer 2. More below...

>
> Maybe it is time to think about the implementation of these policies as more
> and more users seem to run into the BEI issue with a disconnected bus.
> Wolfgang, Jan, what is your opinion?

Just trying to catch up with this issue. As you mention, it has already been discussed on the Socket-CAN mailing list. Just follow https://lists.berlios.de/pipermail/socketcan-core/2006-July/000215.html. I also realized that it is easily possible to flood the system with bus error interrupts (BEI), especially on low-end systems.
The problem gets worse when the error frames are delivered to the socket, or when printk debug messages are generated due to buffer overflows. The latter can be suppressed by disabling RTCAN debugging (via XENO_DRIVERS_CAN_DEBUG). Then the system normally will _not_ hang because the interrupt rate is not critical. Nevertheless, this issue should be re-discussed on the Socket-CAN mailing list. I will not accept an rt-only solution.

Wolfgang.

^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Xenomai-help] RT-Socket-CAN bus error handling (was CAN errors and real-time behaviour (IRQ raise forever and may lock system)) 2007-03-06 9:36 ` Sebastian Smolorz 2007-03-10 20:53 ` Wolfgang Grandegger @ 2007-03-14 11:38 ` Wolfgang Grandegger 2007-03-14 12:51 ` Sebastian Smolorz 1 sibling, 1 reply; 38+ messages in thread From: Wolfgang Grandegger @ 2007-03-14 11:38 UTC (permalink / raw) To: Sebastian Smolorz; +Cc: xenomai Sebastian Smolorz wrote: > Stéphane ANCELOT wrote: >> Sebastian Smolorz wrote: >>> Stéphane ANCELOT wrote: >>>> Sebastian Smolorz wrote: >>>>> Note that the current implementation of RT-Socket-CAN shows this >>>>> behaviour on purpose. See also [1] ("may flood!"). Whether this is the >>>>> right handling or not may be discussed here. I admit that the current >>>>> implementation forces an application developer to take more >>>>> responsibility but that is not a bug of the underlying driver/stack per >>>>> se. Look, you don't connect anything to the CAN bus, start a >>>>> *real-time* application which sends a message to a non-existent CAN >>>>> node. This is an error situation an it is more than ever for a >>>>> real-time task. So the proper reaction for a RT-application would be to >>>>> handle those errors and e.g. shut down the CAN interface which in this >>>>> case will force the CAN hardware to stop its endless attempts to send >>>>> the message. >>>> I agree and this is what I was doing , however this does not seem to >>>> work as expected in the driver. >>> What does not work? The shutdown and stopping transmitting the CAN >>> messages? >>> >>> -- >>> Sebastian >> Yes, this is exactly what has happened to me and rolland problem , one >> rtcansend launched and BEI interrupt come always.... > > Yes, I know. But when you stop the CAN interface in such a situation the > interrupts must disappear because the controller does not try to send the > message any more. 
>
>> since the error management should be done by the application process, I
>> think that BUS ERROR INTERRUPT can be reported however the ECC reading
>> must not be done by the interrupt routine.
>
> I don't think that reading the ECC is the critical point, rather the interrupt
> flooding is.
>
>> Since it permits a next bus error interrupt, the ECC reading should be
>> left to the user application, e.g. through an ioctl.
>
> Error reporting in RT-Socket-CAN is the same as in Socket-CAN for plain Linux.
> It is done via error frames sent to the application. So your suggestion would
> break the API here and frankly is not necessary. You have several
> possibilities to detect a bus error due to a disconnected bus and can handle
> the situation properly (e.g. restart the interface). If a series of error
> frames is generated that shows TX bus errors with missing
> acknowledgments, you can be quite sure that no other node is connected to the
> bus.
>
>> This may be an option or an error mode selectable by the programmer at
>> startup.
>>
>> What do you think?
>
> Last summer we had a discussion about the BEI issue on the socketcan-ML. Two
> additional handling policies popped up:
> 1. The interface could restart itself after an amount of BEIs, thus taking
> responsibility from the user application.
> 2. The BEI could be completely disabled if no one is interested in this type
> of error frame.

I tried to implement 2. for the SJA1000, but re-enabling the BEI on the fly does not work :-(. The controller requires a restart of the device to get the bus error reporting back to work.

> Maybe it is time to think about the implementation of these policies as more
> and more users seem to run into the BEI issue with a disconnected bus.
> Wolfgang, Jan, what is your opinion?

Well, solution 2. with the limitations mentioned above is therefore less attractive because it interrupts the CAN traffic.
The Socket-CAN implementation actually restarts the CAN controller after a certain number of bus error interrupts (200 by default), which matches your first policy above. But in RT-Socket-CAN, we do not automatically restart the device, on purpose. Therefore I tend to just stop the device. It's then up to the application to restart it. What do you think?

Wolfgang.

^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Xenomai-help] RT-Socket-CAN bus error handling (was CAN errors and real-time behaviour (IRQ raise forever and may lock system)) 2007-03-14 11:38 ` [Xenomai-help] RT-Socket-CAN bus error handling (was CAN errors and real-time behaviour (IRQ raise forever and may lock system)) Wolfgang Grandegger @ 2007-03-14 12:51 ` Sebastian Smolorz 2007-03-14 13:18 ` Wolfgang Grandegger 2007-03-17 11:56 ` Wolfgang Grandegger 0 siblings, 2 replies; 38+ messages in thread From: Sebastian Smolorz @ 2007-03-14 12:51 UTC (permalink / raw) To: Wolfgang Grandegger; +Cc: xenomai Wolfgang Grandegger wrote: > Sebastian Smolorz wrote: > > > > Last summer we had a discussion about the BEI issue on the socketcan-ML. > > Two additional handling policies popped up: > > 1. The interface could restart itself after an amount of BEIs, thus > > taking responsibility from the user application. > > 2. The BEI could be completely disabled if no one is interested in this > > type of error frame. > > I tried to implement 2. for SJA1000, but re-enabling the BIE on the fly > does not work. :-(. The controller requires a re-start of the device to > get the bus error reporting back to work. Oh, really? I wasn't aware of this. > > > Maybe it is time to think about the implementation of these policies as > > more and more users seem to run into the BEI issue with a disconnected > > bus. Wolfgang, Jan, what is your opinion? > > Well, solution 2. with the limitations mentioned above is therefore less > attractive because it interrupts the CAN traffic. True. > The Socket-CAN > implementation actually restarts the CAN controller after a certain > amount of bus error interrupts (200 by default) which matches your first > policy above. But in RT-Socket-CAN, we do not automatically re-start the > device by purpose. Therefore I tend to just stop the device. It's then > up to the application to restart it. What do you think? 
No fundamental objections, but it would be best if the application were informed of this special situation, e.g. through an error frame with the meaning "controller was stopped because of a disconnected bus after trying to send the same message 200 times".

A question pops up in this context: Why do we define CAN_ERR_RESTARTED if we never restart the controller automatically? Only to be compatible with Socket-CAN? Then I would propose to extend the documentation by pointing out that this error will not appear under RT-Socket-CAN.

--
Sebastian

^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Xenomai-help] RT-Socket-CAN bus error handling (was CAN errors and real-time behaviour (IRQ raise forever and may lock system)) 2007-03-14 12:51 ` Sebastian Smolorz @ 2007-03-14 13:18 ` Wolfgang Grandegger 2007-03-14 13:24 ` Sebastian Smolorz 2007-03-17 11:56 ` Wolfgang Grandegger 1 sibling, 1 reply; 38+ messages in thread From: Wolfgang Grandegger @ 2007-03-14 13:18 UTC (permalink / raw) To: Sebastian Smolorz; +Cc: xenomai Sebastian Smolorz wrote: > Wolfgang Grandegger wrote: >> Sebastian Smolorz wrote: >>> Last summer we had a discussion about the BEI issue on the socketcan-ML. >>> Two additional handling policies popped up: >>> 1. The interface could restart itself after an amount of BEIs, thus >>> taking responsibility from the user application. >>> 2. The BEI could be completely disabled if no one is interested in this >>> type of error frame. >> I tried to implement 2. for SJA1000, but re-enabling the BIE on the fly >> does not work. :-(. The controller requires a re-start of the device to >> get the bus error reporting back to work. > > Oh, really? I wasn't aware of this. I was surprised as well. The bus error interrupt can be disabled but not enabled in active mode. > >>> Maybe it is time to think about the implementation of these policies as >>> more and more users seem to run into the BEI issue with a disconnected >>> bus. Wolfgang, Jan, what is your opinion? >> Well, solution 2. with the limitations mentioned above is therefore less >> attractive because it interrupts the CAN traffic. > > True. > >> The Socket-CAN >> implementation actually restarts the CAN controller after a certain >> amount of bus error interrupts (200 by default) which matches your first >> policy above. But in RT-Socket-CAN, we do not automatically re-start the >> device by purpose. Therefore I tend to just stop the device. It's then >> up to the application to restart it. What do you think? 
> > No fundamental objections but it would be best if an application would be > informed of this special situation e.g. through an error frame with the > meaning "controller was stopped because of a disconnected bus after trying to > send 200 times the same message". D'accord, but we need another error definition for it, e.g. CAN_ERR_BUSERROR_FLOOD. I also plan to reset the error count in case of a successful transmission (counting only successive errors). > A question pops up in this context: Why do we define CAN_ERR_RESTARTED if we > never do this? Only to be compatible with Socket-CAN? Then I would propose to > extend the documentation by pointing out that this will not appear under > RT-Socket-CAN. It's just copied from the Socket-CAN implementation and some doc would be nice, indeed. Wolfgang. ^ permalink raw reply [flat|nested] 38+ messages in thread
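The policy discussed in this message (count only successive bus errors, reset the count on a successful transmission, stop the device at a limit and tell the application) can be sketched as a small state machine. This is only an illustration under assumptions, not the driver code: every `ex_` name is hypothetical, the limit is a parameter rather than the 200 mentioned for Socket-CAN, and sending the proposed flood error frame is reduced to a return value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-device state for the "stop after N successive bus
 * errors" policy. */
struct ex_buserr_policy {
    unsigned int limit;       /* successive errors tolerated before stopping */
    unsigned int successive;  /* bus errors seen since the last good TX */
    int stopped;              /* 1 once the device has been stopped */
};

enum ex_action {
    EX_NO_ACTION,        /* keep the controller running */
    EX_STOP_AND_NOTIFY   /* stop the controller, queue one error frame */
};

void ex_policy_init(struct ex_buserr_policy *p, unsigned int limit)
{
    assert(p != NULL && limit > 0);
    p->limit = limit;
    p->successive = 0;
    p->stopped = 0;
}

/* Called for each bus error interrupt. Reports the stop exactly once. */
enum ex_action ex_on_bus_error(struct ex_buserr_policy *p)
{
    assert(p != NULL);

    if (p->stopped)
        return EX_NO_ACTION;

    if (++p->successive < p->limit)
        return EX_NO_ACTION;

    p->stopped = 1;
    return EX_STOP_AND_NOTIFY;
}

/* Called after a successful transmission: only successive errors count. */
void ex_on_tx_success(struct ex_buserr_policy *p)
{
    assert(p != NULL);
    p->successive = 0;
}

/* Called when the application restarts the device. */
void ex_on_restart(struct ex_buserr_policy *p)
{
    assert(p != NULL);
    p->successive = 0;
    p->stopped = 0;
}
```

Keeping the restart as a separate, explicit call mirrors the point made in the thread that RT-Socket-CAN leaves the restart to the application.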
* Re: [Xenomai-help] RT-Socket-CAN bus error handling (was CAN errors and real-time behaviour (IRQ raise forever and may lock system)) 2007-03-14 13:18 ` Wolfgang Grandegger @ 2007-03-14 13:24 ` Sebastian Smolorz 0 siblings, 0 replies; 38+ messages in thread From: Sebastian Smolorz @ 2007-03-14 13:24 UTC (permalink / raw) To: Wolfgang Grandegger; +Cc: xenomai Wolfgang Grandegger wrote: > Sebastian Smolorz wrote: > > Wolfgang Grandegger wrote: > >> The Socket-CAN > >> implementation actually restarts the CAN controller after a certain > >> amount of bus error interrupts (200 by default) which matches your first > >> policy above. But in RT-Socket-CAN, we do not automatically re-start the > >> device by purpose. Therefore I tend to just stop the device. It's then > >> up to the application to restart it. What do you think? > > > > No fundamental objections but it would be best if an application would be > > informed of this special situation e.g. through an error frame with the > > meaning "controller was stopped because of a disconnected bus after > > trying to send 200 times the same message". > > D'accord, but we need another error definition for it, e.g. > CAN_ERR_BUSERROR_FLOOD. Agreed. > I also plan to reset the error count in case of > a successful transmission (counting only successive errors). That would make sense. -- Sebastian ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Xenomai-help] RT-Socket-CAN bus error handling (was CAN errors and real-time behaviour (IRQ raise forever and may lock system)) 2007-03-14 12:51 ` Sebastian Smolorz 2007-03-14 13:18 ` Wolfgang Grandegger @ 2007-03-17 11:56 ` Wolfgang Grandegger 2007-03-18 10:22 ` Jan Kiszka 2007-03-19 8:49 ` Stéphane ANCELOT 1 sibling, 2 replies; 38+ messages in thread From: Wolfgang Grandegger @ 2007-03-17 11:56 UTC (permalink / raw) To: Sebastian Smolorz; +Cc: xenomai [-- Attachment #1: Type: text/plain, Size: 2989 bytes --] Hi Sebastian, Sebastian Smolorz wrote: > Wolfgang Grandegger wrote: >> Sebastian Smolorz wrote: >>> Last summer we had a discussion about the BEI issue on the socketcan-ML. >>> Two additional handling policies popped up: >>> 1. The interface could restart itself after an amount of BEIs, thus >>> taking responsibility from the user application. >>> 2. The BEI could be completely disabled if no one is interested in this >>> type of error frame. >> I tried to implement 2. for SJA1000, but re-enabling the BIE on the fly >> does not work. :-(. The controller requires a re-start of the device to >> get the bus error reporting back to work. > > Oh, really? I wasn't aware of this. Well, I got it working. Reading the ECC register after re-enabling the bus error interrupts fixed the problem: if (CAN_STATE_OPERATING(dev->state)) { chip->write_reg(dev, SJA_IER, chip->ier); /* update on the fly */ chip->read_reg(dev, SJA_ECC); } >>> Maybe it is time to think about the implementation of these policies as >>> more and more users seem to run into the BEI issue with a disconnected >>> bus. Wolfgang, Jan, what is your opinion? >> Well, solution 2. with the limitations mentioned above is therefore less >> attractive because it interrupts the CAN traffic. > > True. Back to our preferred solution 1. 
Attached is a patch for review including some other fixes and suggestions accumulated over time:

* ksrc/drivers/can/*: To avoid unnecessary bus error interrupt flooding, the option CONFIG_XENO_DRIVERS_CAN_BUS_ERR now allows to enable bus error interrupts "on demand" only if an application is interested in such errors. It is automatically selected for CAN controllers supporting bus error interrupts like the SJA1000.
* include/rtdm/rtcan.h: Add some doc on bus-off and bus-error error conditions and the restart policy.
* src/utils/can/rtcanconfig.c: Controller mode settings and doc has been corrected.

>> The Socket-CAN
>> implementation actually restarts the CAN controller after a certain
>> amount of bus error interrupts (200 by default) which matches your first
>> policy above. But in RT-Socket-CAN, we do not automatically re-start the
>> device by purpose. Therefore I tend to just stop the device. It's then
>> up to the application to restart it. What do you think?
>
> No fundamental objections but it would be best if an application would be
> informed of this special situation e.g. through an error frame with the
> meaning "controller was stopped because of a disconnected bus after trying to
> send 200 times the same message".
>
> A question pops up in this context: Why do we define CAN_ERR_RESTARTED if we
> never do this? Only to be compatible with Socket-CAN? Then I would propose to
> extend the documentation by pointing out that this will not appear under
> RT-Socket-CAN.

Let's wait and see whether solution 1. is sufficient. Maybe we need 2. later as well.

Wolfgang.
[-- Attachment #2: xenomai-rtcan-bus-error.patch --] [-- Type: text/x-patch, Size: 15558 bytes --] Index: include/rtdm/rtcan.h =================================================================== --- include/rtdm/rtcan.h (revision 2299) +++ include/rtdm/rtcan.h (working copy) @@ -598,16 +598,16 @@ typedef struct can_frame { /** * CAN error mask * - * A CAN error mask (see @ref Errors) can be set with @c setsockopt. This - * mask is then used to decided if error frames are send to this socket - * in case of error condidtions. The error frames are marked with the - * @ref CAN_ERR_FLAG of @ref CAN_xxx_FLAG and must be handled by the - * application properly. A detailed description of the error can be - * found in the @c can_id and the @c data fields of struct can_frame - * (see @ref Errors for futher details). + * A CAN error mask (see @ref Errors) can be set with @c setsockopt. This + * mask is then used to decide if error frames are delivered to this socket + * in case of error condidtions. The error frames are marked with the + * @ref CAN_ERR_FLAG of @ref CAN_xxx_FLAG and must be handled by the + * application properly. A detailed description of the errors can be + * found in the @c can_id and the @c data fields of struct can_frame + * (see @ref Errors for futher details). * * @n - * @param [in] level @b SOL_CAN_RAW + * @param [in] level @b SOL_CAN_RAW * * @param [in] optname @b CAN_RAW_ERR_FILTER * @@ -1062,7 +1062,19 @@ typedef struct can_frame { /*! * @anchor Errors @name Error mask * Error class (mask) in @c can_id field of struct can_frame to - * be used with @ref CAN_RAW_ERR_FILTER. + * be used with @ref CAN_RAW_ERR_FILTER. + * + * @b Note: In case of a bus-off error condition (@ref CAN_ERR_BUSOFF), the + * CAN controller is @b not restarted automatically. It is the application's + * responsibility to react appropriately, e.g. calling @ref CAN_MODE_START. 
+ * + * @b Note: Bus error interrupts (@ref CAN_ERR_BUSERROR) are normally enabled + * "on demand" only if the application is interested in such errors using + * @ref CAN_RAW_ERR_FILTER. This avoids unnessecary bus error interrupt + * flooding. Flooding can still occur if the error interrupt rate is high + * and especially if the RT-Socket-CAN debugging option + * (@c CONFIG_XENO_DRIVERS_CAN_DEBUG) is enabled resulting in socket buffer + * overflow messages. Alternatively, you could check @c /proc/rtcan/sockets. * @{ */ /** TX timeout (netdevice driver) */ @@ -1074,7 +1086,7 @@ typedef struct can_frame { /** Controller problems (see @ref Error1 "data[1]") */ #define CAN_ERR_CRTL 0x00000004U -/** Protocol violations (see @ref Error2 "data[2]", +/** Protocol violations (see @ref Error2 "data[2]", @ref Error3 "data[3]") */ #define CAN_ERR_PROT 0x00000008U @@ -1088,14 +1100,14 @@ typedef struct can_frame { #define CAN_ERR_BUSOFF 0x00000040U /** Bus error (may flood!) */ -#define CAN_ERR_BUSERROR 0x00000080U +#define CAN_ERR_BUSERROR 0x00000080U /** Controller restarted */ #define CAN_ERR_RESTARTED 0x00000100U /** Omit EFF, RTR, ERR flags */ #define CAN_ERR_MASK 0x1FFFFFFFU - + /** @} */ /*! Index: ChangeLog =================================================================== --- ChangeLog (revision 2299) +++ ChangeLog (working copy) @@ -1,3 +1,17 @@ +2007-03-17 Wolfgang Grandegger <wg@domain.hid> + + * ksrc/drivers/can/*: To avoid unnecessary bus error interrupt + flooding, the option CONFIG_XENO_DRIVERS_CAN_BUS_ERR now allows to + enable bus error interrupts "on demand" only if an application is + interested in such errors. It is automatically selected for CAN + controllers supporting bus error interrupts like the SJA1000. + + * include/rtdm/rtcan.h: Add some doc on bus-off and bus-error error + conditions and the restart policy. + + * src/utils/can/rtcanconfig.c: Controller mode settings and doc + has been corrected. 
+ 2007-03-12 Paul Corner <paul_c@domain.hid> * debian/: Add rule set for generating a series of Debian Index: src/utils/can/rtcanconfig.c =================================================================== --- src/utils/can/rtcanconfig.c (revision 2299) +++ src/utils/can/rtcanconfig.c (working copy) @@ -41,7 +41,7 @@ static void print_usage(char *prg) "Options:\n" " -v, --verbose be verbose\n" " -h, --help this help\n" - " -c, --ctrlmode=M1:M2:... listenonly or loopback mode\n" + " -c, --ctrlmode=CTRLMODE listenonly, loopback or none\n" " -b, --baudrate=BPS baudrate in bits/sec\n" " -B, --bittime=BTR0:BTR1 BTR or standard bit-time\n" " -B, --bittime=BRP:PROP_SEG:PHASE_SEG1:PHASE_SEG2:SJW:SAM\n", @@ -73,8 +73,10 @@ int string_to_ctrlmode(char *str) return CAN_CTRLMODE_LISTENONLY; else if ( !strcmp(str, "loopback") ) return CAN_CTRLMODE_LOOPBACK; + else if ( !strcmp(str, "none") ) + return 0; - return 0; + return -1; } int main(int argc, char *argv[]) @@ -137,7 +139,12 @@ int main(int argc, char *argv[]) break; case 'c': - new_ctrlmode |= string_to_ctrlmode(optarg); + ret = string_to_ctrlmode(optarg); + if (ret == -1) { + print_usage(argv[0]); + exit(0); + } + new_ctrlmode |= ret; set_ctrlmode = 1; break; Index: ksrc/drivers/can/Kconfig =================================================================== --- ksrc/drivers/can/Kconfig (revision 2299) +++ ksrc/drivers/can/Kconfig (working copy) @@ -49,6 +49,17 @@ config XENO_DRIVERS_CAN_MAX_RECEIVERS The driver maintains a receive filter list per device for fast access. +config XENO_DRIVERS_CAN_BUS_ERR + depends on XENO_DRIVERS_CAN + bool + default n + help + + To avoid unnecessary bus error interrupt flooding, this option enables + bus error interrupts "on demand" only if an application is interested + in such errors. It is automatically selected for CAN controllers + supporting bus error interrupts like the SJA1000. 
+ config XENO_DRIVERS_CAN_VIRT depends on XENO_DRIVERS_CAN tristate "Virtual CAN bus driver" Index: ksrc/drivers/can/rtcan_dev.h =================================================================== --- ksrc/drivers/can/rtcan_dev.h (revision 2299) +++ ksrc/drivers/can/rtcan_dev.h (working copy) @@ -142,6 +142,10 @@ struct rtcan_device { #ifdef CONFIG_PROC_FS struct proc_dir_entry *proc_root; #endif +#ifdef CONFIG_XENO_DRIVERS_CAN_BUS_ERR + int bus_err_users; + void (*do_set_bus_err)(struct rtcan_device *dev, int enable); +#endif #ifdef CONFIG_XENO_DRIVERS_CAN_LOOPBACK struct rtcan_skb tx_skb; struct rtcan_socket *tx_socket; Index: ksrc/drivers/can/rtcan_raw_dev.c =================================================================== --- ksrc/drivers/can/rtcan_raw_dev.c (revision 2299) +++ ksrc/drivers/can/rtcan_raw_dev.c (working copy) @@ -312,3 +312,49 @@ int rtcan_raw_ioctl_dev(struct rtdm_dev_ return ret; } + +void rtcan_raw_set_err_mask(struct rtcan_socket *sock, + can_err_mask_t err_mask) +{ +#ifdef CONFIG_XENO_DRIVERS_CAN_BUS_ERR + int i, begin, end; + struct rtcan_device *dev; + rtdm_lockctx_t lock_ctx; + int ifindex = atomic_read(&sock->ifindex); + + if ((sock->err_mask & CAN_ERR_BUSERROR) != (err_mask & CAN_ERR_BUSERROR)) { + + if (ifindex) { + begin = ifindex; + end = ifindex; + } else { + begin = 1; + end = RTCAN_MAX_DEVICES; + } + + for (i = begin; i <= end; i++) { + if ((dev = rtcan_dev_get_by_index(i)) == NULL) + continue; + + if (dev->do_set_bus_err) { + rtdm_lock_get_irqsave(&dev->device_lock, lock_ctx); + if (err_mask & CAN_ERR_BUSERROR) { + if (dev->bus_err_users == 0) + dev->do_set_bus_err(dev, 1); + dev->bus_err_users++; + } else { + if (dev->bus_err_users > 0) { + dev->bus_err_users--; + if (dev->bus_err_users == 0) + dev->do_set_bus_err(dev, 0); + } + } + rtdm_lock_put_irqrestore(&dev->device_lock, lock_ctx); + } + rtcan_dev_dereference(dev); + } + } +#endif /* CONFIG_XENO_DRIVERS_CAN_BUS_ERR*/ + + sock->err_mask = err_mask; +} Index: 
ksrc/drivers/can/rtcan_raw.c =================================================================== --- ksrc/drivers/can/rtcan_raw.c (revision 2299) +++ ksrc/drivers/can/rtcan_raw.c (working copy) @@ -116,9 +116,9 @@ static void rtcan_rcv_deliver(struct rtc sock->recv_tail = (sock->recv_tail + cpy_size) & (RTCAN_RXBUF_SIZE - 1); - /*Notify the delivery of the message */ + /* Notify the delivery of the message */ rtdm_sem_up(&sock->recv_sem); - + } else { /* Overflow of socket's ring buffer! */ sock->rx_buf_full++; @@ -242,6 +242,9 @@ static int rtcan_raw_close(struct rtdm_d /* Get lock for reception lists */ rtdm_lock_get_irqsave(&rtcan_recv_list_lock, lock_ctx); + /* Reset error mask? */ + rtcan_raw_set_err_mask(sock, 0x0); + /* Check if socket is bound */ if (rtcan_sock_is_bound(sock)) rtcan_raw_unbind(sock); @@ -378,7 +381,7 @@ static int rtcan_raw_setsockopt(struct r /* Get lock for reception lists */ rtdm_lock_get_irqsave(&rtcan_recv_list_lock, lock_ctx); - sock->err_mask = err_mask; + rtcan_raw_set_err_mask(sock, err_mask); rtdm_lock_put_irqrestore(&rtcan_recv_list_lock, lock_ctx); break; Index: ksrc/drivers/can/rtcan_version.h =================================================================== --- ksrc/drivers/can/rtcan_version.h (revision 2299) +++ ksrc/drivers/can/rtcan_version.h (working copy) @@ -22,6 +22,6 @@ #define RTCAN_MAJOR_VER 0 #define RTCAN_MINOR_VER 90 -#define RTCAN_BUGFIX_VER 1 +#define RTCAN_BUGFIX_VER 2 #endif /* __RTCAN_VERSION_H_ */ Index: ksrc/drivers/can/rtcan_raw.h =================================================================== --- ksrc/drivers/can/rtcan_raw.h (revision 2299) +++ ksrc/drivers/can/rtcan_raw.h (working copy) @@ -32,6 +32,9 @@ void rtcan_raw_remove_filter(struct rtca void rtcan_rcv(struct rtcan_device *rtcandev, struct rtcan_skb *skb); +void rtcan_raw_set_err_mask(struct rtcan_socket *sock, + can_err_mask_t err_mask); + void rtcan_loopback(struct rtcan_device *rtcandev); #ifdef CONFIG_XENO_DRIVERS_CAN_LOOPBACK #define 
rtcan_loopback_enabled(sock) (sock->loopback) Index: ksrc/drivers/can/sja1000/Kconfig =================================================================== --- ksrc/drivers/can/sja1000/Kconfig (revision 2299) +++ ksrc/drivers/can/sja1000/Kconfig (working copy) @@ -1,6 +1,7 @@ config XENO_DRIVERS_CAN_SJA1000 depends on XENO_DRIVERS_CAN tristate "Philips SJA1000 CAN controller" + select XENO_DRIVERS_CAN_BUS_ERR config XENO_DRIVERS_CAN_SJA1000_ISA depends on XENO_DRIVERS_CAN_SJA1000 Index: ksrc/drivers/can/sja1000/rtcan_sja1000.c =================================================================== --- ksrc/drivers/can/sja1000/rtcan_sja1000.c (revision 2299) +++ ksrc/drivers/can/sja1000/rtcan_sja1000.c (working copy) @@ -66,7 +66,7 @@ /* Value for the interrupt enable register */ #define SJA1000_IER SJA_IER_RIE | SJA_IER_TIE | \ SJA_IER_EIE | SJA_IER_WUIE | \ - SJA_IER_EPIE | SJA_IER_BEIE | \ + SJA_IER_EPIE | \ SJA_IER_ALIE | SJA_IER_DOIE static char *sja_ctrl_name = "SJA1000"; @@ -247,7 +247,7 @@ static inline void rtcan_sja_err_interru else /* Bus-off recovery complete, enable all interrupts again */ - chip->write_reg(dev, SJA_IER, SJA1000_IER); + chip->write_reg(dev, SJA_IER, chip->ier); } if (state != dev->state && @@ -439,7 +439,7 @@ static int rtcan_sja_mode_stop(struct rt } else { ret = -EAGAIN; /* Enable interrupts again as we did not succeed */ - chip->write_reg(dev, SJA_IER, SJA1000_IER); + chip->write_reg(dev, SJA_IER, chip->ier); } out: @@ -489,7 +489,7 @@ static int rtcan_sja_mode_start(struct r /* Set up sender "mutex" */ rtdm_sem_init(&dev->tx_sem, 1); /* Enable interrupts */ - chip->write_reg(dev, SJA_IER, SJA1000_IER); + chip->write_reg(dev, SJA_IER, chip->ier); /* Clear reset mode bit in SJA1000 */ chip->write_reg(dev, SJA_MOD, mod_reg); @@ -621,12 +621,26 @@ int rtcan_sja_set_bit_time(struct rtcan_ chip->write_reg(dev, SJA_BTR0, btr0); chip->write_reg(dev, SJA_BTR1, btr1); - + return 0; } +/* + * Enable/disable bus error reporting + */ +void 
rtcan_sja_set_bus_err(struct rtcan_device *dev, int enable) +{ + struct rtcan_sja1000 *chip = (struct rtcan_sja1000 *)dev->priv; - + if (enable) + chip->ier |= SJA_IER_BEIE; + else + chip->ier &= ~SJA_IER_BEIE; + if (CAN_STATE_OPERATING(dev->state)) { + chip->write_reg(dev, SJA_IER, chip->ier); /* update on the fly */ + chip->read_reg(dev, SJA_ECC); + } +} /* * Start a transmission to a SJA1000 device @@ -722,7 +736,7 @@ static void sja1000_chip_config(struct r int rtcan_sja1000_register(struct rtcan_device *dev) { - int ret; + int ret; struct rtcan_sja1000 *chip = dev->priv; if (chip == NULL) @@ -745,8 +759,10 @@ int rtcan_sja1000_register(struct rtcan_ dev->do_set_mode = rtcan_sja_set_mode; dev->do_get_state = rtcan_sja_get_state; dev->do_set_bit_time = rtcan_sja_set_bit_time; + dev->do_set_bus_err = rtcan_sja_set_bus_err; + chip->ier = SJA1000_IER; - ret = rtdm_irq_request(&dev->irq_handle, + ret = rtdm_irq_request(&dev->irq_handle, chip->irq_num, rtcan_sja_interrupt, chip->irq_flags, sja_ctrl_name, dev); if (ret) { Index: ksrc/drivers/can/sja1000/rtcan_sja1000_proc.c =================================================================== --- ksrc/drivers/can/sja1000/rtcan_sja1000_proc.c (revision 2299) +++ ksrc/drivers/can/sja1000/rtcan_sja1000_proc.c (working copy) @@ -36,20 +36,20 @@ static int rtcan_sja_proc_regs(char *buf struct rtcan_sja1000 *chip = (struct rtcan_sja1000 *)dev->priv; int i; RTCAN_PROC_PRINT_VARS(80); - + if (!RTCAN_PROC_PRINT("SJA1000 registers")) goto done; for (i = 0; i < 0x20; i++) { if ((i % 0x10) == 0) { if (!RTCAN_PROC_PRINT("\n%02x:", i)) - goto done; + goto done; } if (!RTCAN_PROC_PRINT(" %02x", chip->read_reg(dev, i))) - goto done; + goto done; } if (!RTCAN_PROC_PRINT("\n")) goto done; - + done: RTCAN_PROC_PRINT_DONE; } Index: ksrc/drivers/can/sja1000/Config.in =================================================================== --- ksrc/drivers/can/sja1000/Config.in (revision 2299) +++ ksrc/drivers/can/sja1000/Config.in (working copy) 
@@ -4,6 +4,10 @@
 dep_tristate 'Philips SJA1000 CAN controller' CONFIG_XENO_DRIVERS_CAN_SJA1000 $CONFIG_XENO_DRIVERS_CAN
+if [ "$CONFIG_XENO_DRIVERS_CAN_SJA1000" != "n" ]; then
+	define_bool CONFIG_XENO_DRIVERS_CAN_BUS_ERR y
+fi
+
 dep_tristate ' Standard ISA controllers' CONFIG_XENO_DRIVERS_CAN_SJA1000_ISA $CONFIG_XENO_DRIVERS_CAN_SJA1000
 if [ "$CONFIG_XENO_DRIVERS_CAN_SJA1000_ISA" != "n" ]; then
 	int ' Maximum number of controllers' CONFIG_XENO_DRIVERS_CAN_SJA1000_ISA_MAX_DEV 4
Index: ksrc/drivers/can/sja1000/rtcan_sja1000.h
===================================================================
--- ksrc/drivers/can/sja1000/rtcan_sja1000.h (revision 2299)
+++ ksrc/drivers/can/sja1000/rtcan_sja1000.h (working copy)
@@ -30,6 +30,7 @@ struct rtcan_sja1000 {
 	unsigned short irq_flags;
 	unsigned char ocr;
 	unsigned char cdr;
+	unsigned char ier;
 };
 
 int rtcan_sja_create_proc(struct rtcan_device* dev);

^ permalink raw reply [flat|nested] 38+ messages in thread
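The patch above keeps a shadow copy of the interrupt enable register (the new `ier` field) and flips the bus error bit in it on demand. The following is only a stand-alone sketch of that idea, not the driver code: `struct fake_sja1000`, `fake_set_bus_err` and the plain register array are hypothetical stand-ins, and the register addresses and the `0x80` bit value are assumed SJA1000 PeliCAN values that should be checked against the data sheet.

```c
#include <stdint.h>

/* Assumed SJA1000 PeliCAN values, for illustration only. */
#define SJA_IER       4     /* interrupt enable register address */
#define SJA_ECC       12    /* error code capture register address */
#define SJA_IER_BEIE  0x80  /* bus error interrupt enable bit */

/* Hypothetical stand-in for the chip: a plain register file. */
struct fake_sja1000 {
    uint8_t regs[32];  /* simulated hardware registers */
    uint8_t ier;       /* shadow copy of IER, like the field added by the patch */
    int operating;     /* non-zero while the controller is running */
};

/* Enable or disable the bus error interrupt in the shadow register and,
 * if the controller is running, push the new value to the hardware. */
static void fake_set_bus_err(struct fake_sja1000 *chip, int enable)
{
    if (enable)
        chip->ier |= SJA_IER_BEIE;
    else
        chip->ier &= (uint8_t)~SJA_IER_BEIE;

    if (chip->operating) {
        chip->regs[SJA_IER] = chip->ier;  /* update on the fly */
        (void)chip->regs[SJA_ECC];        /* the patch also reads ECC at this point */
    }
}
```

While the controller is stopped only the shadow value changes, so in this sketch the hardware register is left untouched until something writes the shadow value back.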
* Re: [Xenomai-help] RT-Socket-CAN bus error handling (was CAN errors and real-time behaviour (IRQ raise forever and may lock system)) 2007-03-17 11:56 ` Wolfgang Grandegger @ 2007-03-18 10:22 ` Jan Kiszka 2007-03-18 11:33 ` Wolfgang Grandegger 2007-03-19 8:49 ` Stéphane ANCELOT 1 sibling, 1 reply; 38+ messages in thread From: Jan Kiszka @ 2007-03-18 10:22 UTC (permalink / raw) To: Wolfgang Grandegger; +Cc: xenomai [-- Attachment #1: Type: text/plain, Size: 825 bytes --] Wolfgang Grandegger wrote: > Back to our preferred solution 1. Attached is a patch for review > including some other fixes and suggestions accumulated over time: > > * ksrc/drivers/can/*: To avoid unnecessary bus error interrupt > flooding, the option CONFIG_XENO_DRIVERS_CAN_BUS_ERR now allows to > enable bus error interrupts "on demand" only if an application is > interested in such errors. It is automatically selected for CAN > controllers supporting bus error interrupts like the SJA1000. Jumping into this more or less blindly: Could you explain to me (as well as the poor CAN users...) what the downsides of enabling CONFIG_XENO_DRIVERS_CAN_BUS_ERR are? If there isn't anything significant, I would strongly vote for keeping the switch forest as small as possible. Jan [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 250 bytes --] ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Xenomai-help] RT-Socket-CAN bus error handling (was CAN errors and real-time behaviour (IRQ raise forever and may lock system)) 2007-03-18 10:22 ` Jan Kiszka @ 2007-03-18 11:33 ` Wolfgang Grandegger 2007-03-18 20:59 ` Jan Kiszka 0 siblings, 1 reply; 38+ messages in thread From: Wolfgang Grandegger @ 2007-03-18 11:33 UTC (permalink / raw) To: Jan Kiszka; +Cc: xenomai Jan Kiszka wrote: > Wolfgang Grandegger wrote: >> Back to our preferred solution 1. Attached is a patch for review >> including some other fixes and suggestions accumulated over time: >> >> * ksrc/drivers/can/*: To avoid unnecessary bus error interrupt >> flooding, the option CONFIG_XENO_DRIVERS_CAN_BUS_ERR now allows to >> enable bus error interrupts "on demand" only if an application is >> interested in such errors. It is automatically selected for CAN >> controllers supporting bus error interrupts like the SJA1000. > > Jumping into this more or less blindly: Could you explain to me (as well > as the poor CAN users...) what the downsides of enabling > CONFIG_XENO_DRIVERS_CAN_BUS_ERR are? If there isn't anything > significant, I would strongly vote for keeping the switch forest as > small as possible. The user has not to care and cannot even enable this option. It is auto selected for SJA1000. The purpose of this config option is to suppress correlated code for builds without SJA1000. About the functionality: as you know, on the SJA1000 the bus error interrupt can result in high error interrupt rates and even hang the system on slow processors. Just unplugging the CAN cable can cause such interrupt flooding. This problem popped up again recently and Sebastian proposed: > Last summer we had a discussion about the BEI issue on the > socketcan-ML. Two additional handling policies popped up: > 1. The interface could restart itself after an amount of BEIs, thus > taking responsibility from the user application. > 2. 
The BEI could be completely disabled if no one is interested in > this type of error frame. As 2. is also my preferred solution, I have implemented it. The only downside is that you do not see the error counter increasing when /proc/rtcan/devices is inspected. We also discussed 1., but RT-Socket-CAN does not restart the CAN controller on purpose and just stopping it requires user intervention. Wolfgang. ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Xenomai-help] RT-Socket-CAN bus error handling (was CAN errors and real-time behaviour (IRQ raise forever and may lock system)) 2007-03-18 11:33 ` Wolfgang Grandegger @ 2007-03-18 20:59 ` Jan Kiszka 2007-03-19 8:21 ` Sebastian Smolorz 0 siblings, 1 reply; 38+ messages in thread From: Jan Kiszka @ 2007-03-18 20:59 UTC (permalink / raw) To: Wolfgang Grandegger; +Cc: xenomai [-- Attachment #1: Type: text/plain, Size: 2487 bytes --] Wolfgang Grandegger wrote: > Jan Kiszka wrote: >> Wolfgang Grandegger wrote: >>> Back to our preferred solution 1. Attached is a patch for review >>> including some other fixes and suggestions accumulated over time: >>> >>> * ksrc/drivers/can/*: To avoid unnecessary bus error interrupt >>> flooding, the option CONFIG_XENO_DRIVERS_CAN_BUS_ERR now allows to >>> enable bus error interrupts "on demand" only if an application is >>> interested in such errors. It is automatically selected for CAN >>> controllers supporting bus error interrupts like the SJA1000. >> >> Jumping into this more or less blindly: Could you explain to me (as well >> as the poor CAN users...) what the downsides of enabling >> CONFIG_XENO_DRIVERS_CAN_BUS_ERR are? If there isn't anything >> significant, I would strongly vote for keeping the switch forest as >> small as possible. > > The user has not to care and cannot even enable this option. It is auto > selected for SJA1000. The purpose of this config option is to suppress > correlated code for builds without SJA1000. About the functionality: as Hmm, probably the attached help text confused me (who should see it BTW?). > you know, on the SJA1000 the bus error interrupt can result in high > error interrupt rates and even hang the system on slow processors. Just > unplugging the CAN cable can cause such interrupt flooding. This problem > popped up again recently and Sebastian proposed: > >> Last summer we had a discussion about the BEI issue on the >> socketcan-ML. Two additional handling policies popped up: >> 1. 
The interface could restart itself after an amount of BEIs, thus >> taking responsibility from the user application. >> 2. The BEI could be completely disabled if no one is interested in >> this type of error frame. > > As 2. is also my preferred solution, I have implemented it. The only > downside is that you do not see the error counter increasing when > /proc/rtcan/devices is inspected. We also discussed 1., but > RT-Socket-CAN does not restart the CAN controller on purpose and just > stopping it requires user intervention. And if there is someone listening, how is the flooding issue on cable unplug etc. solved by option 2? What about something like option 3: After the first error occurred that may mark the beginning of a flood, disable that error interrupt until the next stop/start cycle or the user has read the event? Jan [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 250 bytes --] ^ permalink raw reply [flat|nested] 38+ messages in thread
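The "option 3" proposed in the message above (report the first bus error, then keep the error interrupt off until the user has read the event or the controller went through a stop/start cycle) can be modelled as a one-shot gate. This is a minimal sketch under that reading; `struct bei_gate` and the `bei_gate_*` functions are hypothetical names and not part of RT-Socket-CAN.

```c
#include <stdbool.h>

/* Hypothetical one-shot gate for bus error interrupts (BEIs). */
struct bei_gate {
    bool armed;         /* true while the BEI would be enabled */
    unsigned reported;  /* number of error events passed on to the user */
};

static void bei_gate_init(struct bei_gate *g)
{
    g->armed = true;
    g->reported = 0;
}

/* Called for each bus error. Returns true if the error is reported. */
static bool bei_gate_on_error(struct bei_gate *g)
{
    if (!g->armed)
        return false;  /* still waiting for the user or a restart */
    g->armed = false;  /* a driver would clear the BEI enable bit here */
    g->reported++;
    return true;
}

/* Called when the user has read the event or the controller was restarted. */
static void bei_gate_rearm(struct bei_gate *g)
{
    g->armed = true;
}
```

With such a gate the error interrupt load is bounded by how often the application reads error events, at the price of losing all bus errors between the first one and the re-arm.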
* Re: [Xenomai-help] RT-Socket-CAN bus error handling (was CAN errors and real-time behaviour (IRQ raise forever and may lock system)) 2007-03-18 20:59 ` Jan Kiszka @ 2007-03-19 8:21 ` Sebastian Smolorz 2007-03-19 8:50 ` Sebastian Smolorz ` (2 more replies) 0 siblings, 3 replies; 38+ messages in thread From: Sebastian Smolorz @ 2007-03-19 8:21 UTC (permalink / raw) To: Jan Kiszka; +Cc: xenomai Hi Jan, Jan Kiszka wrote: > Wolfgang Grandegger wrote: > > you know, on the SJA1000 the bus error interrupt can result in high > > error interrupt rates and even hang the system on slow processors. Just > > unplugging the CAN cable can cause such interrupt flooding. This problem > > > > popped up again recently and Sebastian proposed: > >> Last summer we had a discussion about the BEI issue on the > >> socketcan-ML. Two additional handling policies popped up: > >> 1. The interface could restart itself after an amount of BEIs, thus > >> taking responsibility from the user application. > >> 2. The BEI could be completely disabled if no one is interested in > >> this type of error frame. > > > > As 2. is also my preferred solution, I have implemented it. The only > > downside is that you do not see the error counter increasing when > > /proc/rtcan/devices is inspected. We also discussed 1., but > > RT-Socket-CAN does not restart the CAN controller on purpose and just > > stopping it requires user intervention. > > And if there is someone listening, how is the flooding issue on cable > unplug etc. solved by option 2? Hm, maybe we could implement 1 additionally (but without automatic restart)? > > What about something like option 3: After the first error occurred that > may mark the beginning of a flood, disable that error interrupt until > the next stop/start cycle or the user has read the event? IIRC, there is no way to distinguish a "normal" bus error (acknowledge) appearing during normal operation from one occurring when the cable is unplugged.
The best indication is a high number of consecutive BEIs. -- Sebastian ^ permalink raw reply [flat|nested] 38+ messages in thread
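The remark that a high number of consecutive BEIs is the best flood indication could be turned into a simple counter, sketched below. Everything here is an assumption for illustration: the names are hypothetical, the threshold of 100 is arbitrary and would need tuning, and "consecutive" is taken to mean "no successfully transferred frame in between".

```c
#include <stdbool.h>

#define BEI_FLOOD_THRESHOLD 100u  /* assumed value, would need tuning */

/* Hypothetical detector for a run of consecutive bus errors. */
struct bei_counter {
    unsigned consecutive;  /* bus errors since the last good frame */
    bool flooding;         /* latched once the threshold is reached */
};

/* A frame was sent or received without error: the run is broken. */
static void bei_counter_on_frame_ok(struct bei_counter *c)
{
    c->consecutive = 0;
}

/* A bus error occurred. Returns true once a flood has been detected. */
static bool bei_counter_on_bus_error(struct bei_counter *c)
{
    if (!c->flooding && ++c->consecutive >= BEI_FLOOD_THRESHOLD)
        c->flooding = true;  /* a driver might disable the BEI or stop here */
    return c->flooding;
}

/* Explicit reset, e.g. after the controller has been restarted. */
static void bei_counter_reset(struct bei_counter *c)
{
    c->consecutive = 0;
    c->flooding = false;
}
```

The flood flag is latched on purpose in this sketch, so a single good frame after the threshold does not silently re-enable reporting; whether that is the desired policy is exactly what the thread is still debating.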
* Re: [Xenomai-help] RT-Socket-CAN bus error handling (was CAN errors and real-time behaviour (IRQ raise forever and may lock system)) 2007-03-19 8:21 ` Sebastian Smolorz @ 2007-03-19 8:50 ` Sebastian Smolorz 2007-03-19 11:35 ` Wolfgang Grandegger 2007-03-19 8:54 ` Wolfgang Grandegger 2007-03-19 16:48 ` Stéphane ANCELOT 2 siblings, 1 reply; 38+ messages in thread From: Sebastian Smolorz @ 2007-03-19 8:50 UTC (permalink / raw) To: Jan Kiszka; +Cc: xenomai Sebastian Smolorz wrote: > Hi Jan, > > Jan Kiszka wrote: > > Wolfgang Grandegger wrote: > > > you know, on the SJA1000 the bus error interrupt can result in high > > > error interrupt rates and even hang the system on slow processors. Just > > > unplugging the CAN cable can cause such interrupt flooding. This > > > problem > > > > > > popped up again recently and Sebastian proposed: > > >> Last summer we had a discussion about the BEI issue on the > > >> socketcan-ML. Two additional handling policies popped up: > > >> 1. The interface could restart itself after an amount of BEIs, thus > > >> taking responsibility from the user application. > > >> 2. The BEI could be completely disabled if no one is interested in > > >> this ype of error frame. > > > > > > As 2. is also my preferred solution, I have implemented it. The only > > > downside is that you do not see the error counter increasing when > > > /proc/rtcan/devices is inspected. We also discussed 1., but > > > RT-Socket-CAN does not restart the CAN controller by purpose and just > > > stoppping it requires user intervention. > > > > And if there is someone listening, how is the flooding issue on cable > > unplug etc. solved by option 2? > > Hm, maybe we could implement 1 additionally (but without automatical > restart)? 
A more precise suggestion: what about letting BEIs through until passive mode is reached? If the TX error counter then stops counting up (the SJA1000's indication of a start-up situation), the driver stops reading out ECC (thanks Stephane for the hint). The controller would still be operating but would no longer report BEIs. There has to be some mechanism to let BEIs through again after the situation has normalized. Maybe the driver could check inside the interrupt handler whether active mode has been reached again after the above situation occurred. -- Sebastian ^ permalink raw reply [flat|nested] 38+ messages in thread
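The heuristic suggested above can be written down as a small filter: report BEIs until the controller is error passive and the TX error counter has stopped increasing, and resume reporting once the controller is error active again. This is a sketch of that reading only; `struct bei_filter` and both functions are hypothetical, and how a driver would obtain `error_passive` and `txerr` from the chip is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical filter state for the passive-mode heuristic. */
struct bei_filter {
    uint8_t last_txerr;  /* TX error counter seen at the previous BEI */
    bool suppressed;     /* true while BEIs are not reported */
};

/* Called on each bus error. Returns true if this BEI should be reported. */
static bool bei_filter_on_bus_error(struct bei_filter *f,
                                    bool error_passive, uint8_t txerr)
{
    if (!error_passive)
        f->suppressed = false;      /* active again: report as usual */
    else if (txerr <= f->last_txerr)
        f->suppressed = true;       /* passive and counter stuck: stop reporting */
    f->last_txerr = txerr;
    return !f->suppressed;
}

/* Called from other interrupt sources to notice the return to active mode. */
static void bei_filter_on_other_irq(struct bei_filter *f, bool error_passive)
{
    if (!error_passive)
        f->suppressed = false;
}
```

The sketch does not handle a bus that drifts slowly into the passive state without flooding, so it should be read as an illustration of the proposal rather than a finished policy.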
* Re: [Xenomai-help] RT-Socket-CAN bus error handling (was CAN errors and real-time behaviour (IRQ raise forever and may lock system)) 2007-03-19 8:50 ` Sebastian Smolorz @ 2007-03-19 11:35 ` Wolfgang Grandegger 2007-03-19 11:46 ` Sebastian Smolorz 2007-03-19 13:05 ` Jan Kiszka 0 siblings, 2 replies; 38+ messages in thread From: Wolfgang Grandegger @ 2007-03-19 11:35 UTC (permalink / raw) To: Sebastian Smolorz; +Cc: xenomai, Jan Kiszka Sebastian Smolorz wrote: > Sebastian Smolorz wrote: >> Hi Jan, >> >> Jan Kiszka wrote: >>> Wolfgang Grandegger wrote: >>>> you know, on the SJA1000 the bus error interrupt can result in high >>>> error interrupt rates and even hang the system on slow processors. Just >>>> unplugging the CAN cable can cause such interrupt flooding. This >>>> problem >>>> >>>> popped up again recently and Sebastian proposed: >>>>> Last summer we had a discussion about the BEI issue on the >>>>> socketcan-ML. Two additional handling policies popped up: >>>>> 1. The interface could restart itself after an amount of BEIs, thus >>>>> taking responsibility from the user application. >>>>> 2. The BEI could be completely disabled if no one is interested in >>>>> this ype of error frame. >>>> As 2. is also my preferred solution, I have implemented it. The only >>>> downside is that you do not see the error counter increasing when >>>> /proc/rtcan/devices is inspected. We also discussed 1., but >>>> RT-Socket-CAN does not restart the CAN controller by purpose and just >>>> stoppping it requires user intervention. >>> And if there is someone listening, how is the flooding issue on cable >>> unplug etc. solved by option 2? >> Hm, maybe we could implement 1 additionally (but without automatical >> restart)? 
> > A more precise suggestion: What about letting BEIs appear until passive mode > is reached and if the TX error counter doesn't count up any more (indication > of start-up situation discovered by the SJA1000) the driver ceases to read > out ECC any further (thanks Stephane for the hint). The controller would be > still operating but not reporting BEIs any more. There has to be some > mechanism to let BEIs through after the situation has normalized. Maybe the > driver could check inside the interrupt handler if active mode was reached > again after the above situation occurred. Well, this is rather sophisticated and needs some more careful evaluation. We might also reach the passive level slowly without flooding. Furthermore, the method should also be applicable to other controllers. Let's implement 1. and scaled-down printk output and wait for the users' reaction (see also my other mail). Then we should bring up this discussion again on the Socket-CAN-ML to negotiate a common solution. OK? Wolfgang. > ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Xenomai-help] RT-Socket-CAN bus error handling (was CAN errors and real-time behaviour (IRQ raise forever and may lock system)) 2007-03-19 11:35 ` Wolfgang Grandegger @ 2007-03-19 11:46 ` Sebastian Smolorz 2007-03-19 13:05 ` Jan Kiszka 1 sibling, 0 replies; 38+ messages in thread From: Sebastian Smolorz @ 2007-03-19 11:46 UTC (permalink / raw) To: Wolfgang Grandegger; +Cc: xenomai, Jan Kiszka Wolfgang Grandegger wrote: > Sebastian Smolorz wrote: > > Sebastian Smolorz wrote: > >> Hi Jan, > >> > >> Jan Kiszka wrote: > >>> Wolfgang Grandegger wrote: > >>>> you know, on the SJA1000 the bus error interrupt can result in high > >>>> error interrupt rates and even hang the system on slow processors. > >>>> Just unplugging the CAN cable can cause such interrupt flooding. This > >>>> problem > >>>> > >>>> popped up again recently and Sebastian proposed: > >>>>> Last summer we had a discussion about the BEI issue on the > >>>>> socketcan-ML. Two additional handling policies popped up: > >>>>> 1. The interface could restart itself after an amount of BEIs, thus > >>>>> taking responsibility from the user application. > >>>>> 2. The BEI could be completely disabled if no one is interested in > >>>>> this ype of error frame. > >>>> > >>>> As 2. is also my preferred solution, I have implemented it. The only > >>>> downside is that you do not see the error counter increasing when > >>>> /proc/rtcan/devices is inspected. We also discussed 1., but > >>>> RT-Socket-CAN does not restart the CAN controller by purpose and just > >>>> stoppping it requires user intervention. > >>> > >>> And if there is someone listening, how is the flooding issue on cable > >>> unplug etc. solved by option 2? > >> > >> Hm, maybe we could implement 1 additionally (but without automatical > >> restart)? 
> > > > A more precise suggestion: What about letting BEIs appear until passive > > mode is reached and if the TX error counter doesn't count up any more > > (indication of start-up situation discovered by the SJA1000) the driver > > ceases to read out ECC any further (thanks Stephane for the hint). The > > controller would be still operating but not reporting BEIs any more. > > There has to be some mechanism to let BEIs through after the situation > > has normalized. Maybe the driver could check inside the interrupt handler > > if active mode was reached again after the above situation occured. > > Well, this is rather sophisticated and needs some more careful > evaluation. We might also reach the passive level slowly without > flooding. Furthermore, the method should also be applicable for other > controllers. > > Let's implement 1. and downscaled printk and wait for the users reaction > , see also my other mail. Then we should bring up this discussion again > on the Socket-CAN-ML to negotiate a common solution. > > OK? OK. It was only a hot-of-the-brain-proposal. ;-) -- Sebastian ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Xenomai-help] RT-Socket-CAN bus error handling (was CAN errors and real-time behaviour (IRQ raise forever and may lock system)) 2007-03-19 11:35 ` Wolfgang Grandegger 2007-03-19 11:46 ` Sebastian Smolorz @ 2007-03-19 13:05 ` Jan Kiszka 2007-03-19 20:44 ` Wolfgang Grandegger 1 sibling, 1 reply; 38+ messages in thread From: Jan Kiszka @ 2007-03-19 13:05 UTC (permalink / raw) To: Wolfgang Grandegger; +Cc: xenomai [-- Attachment #1: Type: text/plain, Size: 3106 bytes --] Wolfgang Grandegger wrote: > Sebastian Smolorz wrote: >> Sebastian Smolorz wrote: >>> Hi Jan, >>> >>> Jan Kiszka wrote: >>>> Wolfgang Grandegger wrote: >>>>> you know, on the SJA1000 the bus error interrupt can result in high >>>>> error interrupt rates and even hang the system on slow processors. >>>>> Just >>>>> unplugging the CAN cable can cause such interrupt flooding. This >>>>> problem >>>>> >>>>> popped up again recently and Sebastian proposed: >>>>>> Last summer we had a discussion about the BEI issue on the >>>>>> socketcan-ML. Two additional handling policies popped up: >>>>>> 1. The interface could restart itself after an amount of BEIs, thus >>>>>> taking responsibility from the user application. >>>>>> 2. The BEI could be completely disabled if no one is interested in >>>>>> this ype of error frame. >>>>> As 2. is also my preferred solution, I have implemented it. The only >>>>> downside is that you do not see the error counter increasing when >>>>> /proc/rtcan/devices is inspected. We also discussed 1., but >>>>> RT-Socket-CAN does not restart the CAN controller by purpose and just >>>>> stoppping it requires user intervention. >>>> And if there is someone listening, how is the flooding issue on cable >>>> unplug etc. solved by option 2? >>> Hm, maybe we could implement 1 additionally (but without automatical >>> restart)? 
>> >> A more precise suggestion: What about letting BEIs appear until >> passive mode is reached and if the TX error counter doesn't count up >> any more (indication of start-up situation discovered by the SJA1000) >> the driver ceases to read out ECC any further (thanks Stephane for the >> hint). The controller would be still operating but not reporting BEIs >> any more. There has to be some mechanism to let BEIs through after the >> situation has normalized. Maybe the driver could check inside the >> interrupt handler if active mode was reached again after the above >> situation occured. > > Well, this is rather sophisticated and needs some more careful > evaluation. We might also reach the passive level slowly without > flooding. Furthermore, the method should also be applicable for other > controllers. What is the current behaviour of other controllers? > > Let's implement 1. and downscaled printk and wait for the users reaction > , see also my other mail. Then we should bring up this discussion again > on the Socket-CAN-ML to negotiate a common solution. Instead of waiting on some user triggering a (potential) latency mine, I would prefer that we experimentally evaluate the effect. E.g. via an I-pipe tracer dump on a faster and a slower box. I would offer to run some demo code here on our PC104 Phytec boards as well. The problem is to define what degree of error-related IRQ load is generally acceptable. We surely can't do this, so we have to document the effect /at least/ and help the users to check it on their own - or we have to avoid it / make it insignificant compared to normal CAN operation (I'm still in favour of this path). Jan [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 250 bytes --] ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Xenomai-help] RT-Socket-CAN bus error handling (was CAN errors and real-time behaviour (IRQ raise forever and may lock system)) 2007-03-19 13:05 ` Jan Kiszka @ 2007-03-19 20:44 ` Wolfgang Grandegger 2007-03-19 21:19 ` Wolfgang Grandegger 0 siblings, 1 reply; 38+ messages in thread From: Wolfgang Grandegger @ 2007-03-19 20:44 UTC (permalink / raw) To: Jan Kiszka; +Cc: xenomai Jan Kiszka wrote: > Wolfgang Grandegger wrote: >> Sebastian Smolorz wrote: >>> Sebastian Smolorz wrote: >>>> Hi Jan, >>>> >>>> Jan Kiszka wrote: >>>>> Wolfgang Grandegger wrote: >>>>>> you know, on the SJA1000 the bus error interrupt can result in high >>>>>> error interrupt rates and even hang the system on slow processors. >>>>>> Just >>>>>> unplugging the CAN cable can cause such interrupt flooding. This >>>>>> problem >>>>>> >>>>>> popped up again recently and Sebastian proposed: >>>>>>> Last summer we had a discussion about the BEI issue on the >>>>>>> socketcan-ML. Two additional handling policies popped up: >>>>>>> 1. The interface could restart itself after an amount of BEIs, thus >>>>>>> taking responsibility from the user application. >>>>>>> 2. The BEI could be completely disabled if no one is interested in >>>>>>> this ype of error frame. >>>>>> As 2. is also my preferred solution, I have implemented it. The only >>>>>> downside is that you do not see the error counter increasing when >>>>>> /proc/rtcan/devices is inspected. We also discussed 1., but >>>>>> RT-Socket-CAN does not restart the CAN controller by purpose and just >>>>>> stoppping it requires user intervention. >>>>> And if there is someone listening, how is the flooding issue on cable >>>>> unplug etc. solved by option 2? >>>> Hm, maybe we could implement 1 additionally (but without automatical >>>> restart)? 
>>> A more precise suggestion: What about letting BEIs appear until >>> passive mode is reached and if the TX error counter doesn't count up >>> any more (indication of start-up situation discovered by the SJA1000) >>> the driver ceases to read out ECC any further (thanks Stephane for the >>> hint). The controller would be still operating but not reporting BEIs >>> any more. There has to be some mechanism to let BEIs through after the >>> situation has normalized. Maybe the driver could check inside the >>> interrupt handler if active mode was reached again after the above >>> situation occured. >> Well, this is rather sophisticated and needs some more careful >> evaluation. We might also reach the passive level slowly without >> flooding. Furthermore, the method should also be applicable for other >> controllers. > > What is the current behaviour of other controllers? Most do not have such detailed error reporting via bus error interrupts. I know just the i82527 reporting bus errors as well. >> Let's implement 1. and downscaled printk and wait for the users reaction >> , see also my other mail. Then we should bring up this discussion again >> on the Socket-CAN-ML to negotiate a common solution. > > Instead of waiting on some user triggering a (potential) latency mine, I > would prefer that we experimentally evaluate the effect. E.g. via an > I-pipe tracer dump on a faster and a slower box. I would offer to run > some demo code here on our PC104 Phytec boards as well. I think we should first run the latency test concurrently and if we discover high latencies an IPIPE trace helps locating the latency peaks. > The problem is to define what degree of error-related IRQ load is > generally acceptable. We surely can't do this, so we have to document > the effect /at least/ and help the users to check it on their own - or > we have to avoid it / make it insignificant compared to normal CAN > operation (I'm still in favour of this path). 
We speak about a pathological situation and therefore I do not share your concerns. When there are electrical problems or even the cable is not connected, we do have an abnormal mode of operation and CAN related real-time is broken anyhow. The bus error messages are then useful for analyzing the problem. The effect of the bus error interrupts on non-CAN related latencies is another issue but I think it's not that critical either (handling a bus error just requires the reading of 2 SJA1000 registers). But I agree, a more detailed analysis of "bus error flooding" would help to understand the impact on the real-time behavior. Wolfgang. ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Xenomai-help] RT-Socket-CAN bus error handling (was CAN errors and real-time behaviour (IRQ raise forever and may lock system)) 2007-03-19 20:44 ` Wolfgang Grandegger @ 2007-03-19 21:19 ` Wolfgang Grandegger 2007-03-19 22:25 ` Jan Kiszka 0 siblings, 1 reply; 38+ messages in thread From: Wolfgang Grandegger @ 2007-03-19 21:19 UTC (permalink / raw) To: Wolfgang Grandegger; +Cc: xenomai, Jan Kiszka Wolfgang Grandegger wrote: > Jan Kiszka wrote: >> Wolfgang Grandegger wrote: >>> Sebastian Smolorz wrote: >>>> Sebastian Smolorz wrote: >>>>> Hi Jan, >>>>> >>>>> Jan Kiszka wrote: >>>>>> Wolfgang Grandegger wrote: >>>>>>> you know, on the SJA1000 the bus error interrupt can result in high >>>>>>> error interrupt rates and even hang the system on slow processors. >>>>>>> Just >>>>>>> unplugging the CAN cable can cause such interrupt flooding. This >>>>>>> problem >>>>>>> >>>>>>> popped up again recently and Sebastian proposed: >>>>>>>> Last summer we had a discussion about the BEI issue on the >>>>>>>> socketcan-ML. Two additional handling policies popped up: >>>>>>>> 1. The interface could restart itself after an amount of BEIs, thus >>>>>>>> taking responsibility from the user application. >>>>>>>> 2. The BEI could be completely disabled if no one is interested in >>>>>>>> this ype of error frame. >>>>>>> As 2. is also my preferred solution, I have implemented it. The only >>>>>>> downside is that you do not see the error counter increasing when >>>>>>> /proc/rtcan/devices is inspected. We also discussed 1., but >>>>>>> RT-Socket-CAN does not restart the CAN controller by purpose and >>>>>>> just >>>>>>> stoppping it requires user intervention. >>>>>> And if there is someone listening, how is the flooding issue on cable >>>>>> unplug etc. solved by option 2? >>>>> Hm, maybe we could implement 1 additionally (but without automatical >>>>> restart)? 
>>>> A more precise suggestion: What about letting BEIs appear until >>>> passive mode is reached and if the TX error counter doesn't count up >>>> any more (indication of start-up situation discovered by the SJA1000) >>>> the driver ceases to read out ECC any further (thanks Stephane for the >>>> hint). The controller would be still operating but not reporting BEIs >>>> any more. There has to be some mechanism to let BEIs through after the >>>> situation has normalized. Maybe the driver could check inside the >>>> interrupt handler if active mode was reached again after the above >>>> situation occured. >>> Well, this is rather sophisticated and needs some more careful >>> evaluation. We might also reach the passive level slowly without >>> flooding. Furthermore, the method should also be applicable for other >>> controllers. >> >> What is the current behaviour of other controllers? > > Most do not have such detailed error reporting via bus error interrupts. > I know just the i82527 reporting bus errors as well. > >>> Let's implement 1. and downscaled printk and wait for the users reaction >>> , see also my other mail. Then we should bring up this discussion again >>> on the Socket-CAN-ML to negotiate a common solution. >> >> Instead of waiting on some user triggering a (potential) latency mine, I >> would prefer that we experimentally evaluate the effect. E.g. via an >> I-pipe tracer dump on a faster and a slower box. I would offer to run >> some demo code here on our PC104 Phytec boards as well. > > I think we should first run the latency test concurrently and if we > discover high latencies an IPIPE trace helps locating the latency peaks. > >> The problem is to define what degree of error-related IRQ load is >> generally acceptable. 
We surely can't do this, so we have to document >> the effect /at least/ and help the users to check it on their own - or >> we have to avoid it / make it insignificant compared to normal CAN >> operation (I'm still in favour of this path). > > We speak about a pathological situation and therefore I do not share > your concerns. When there are electrical problems or even the cable is > not connected, we do have an abnormal mode of operation and CAN related > real-time is broken anyhow. The bus error messages are then useful for > analyzing the problem. The effect of the bus error interrupts on non-CAN > related latencies is another issue but I think it's not that critical > either (handling a bus error just requires the reading of 2 SJA1000 > registers). But I agree, a more detailed analysis of "bus error > flooding" would help to understand the impact on the real-time behavior. Also be aware that heavy CAN traffic can cause similar latencies, and when there is more than one CAN controller, they can accumulate (as I have observed with my PCAN dongle tests). Here an IRQ service task or threaded IRQs would help. Maybe this is the right way to go. Wolfgang. ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [Xenomai-help] RT-Socket-CAN bus error handling (was CAN errors and real-time behaviour (IRQ raise forever and may lock system)) 2007-03-19 21:19 ` Wolfgang Grandegger @ 2007-03-19 22:25 ` Jan Kiszka 2007-03-20 6:53 ` Wolfgang Grandegger 0 siblings, 1 reply; 38+ messages in thread From: Jan Kiszka @ 2007-03-19 22:25 UTC (permalink / raw) To: Wolfgang Grandegger; +Cc: xenomai [-- Attachment #1: Type: text/plain, Size: 1884 bytes --] Wolfgang Grandegger wrote: >>> The problem is to define what degree of error-related IRQ load is >>> generally acceptable. We surely can't do this, so we have to document >>> the effect /at least/ and help the users to check it on their own - or >>> we have to avoid it / make it insignificant compared to normal CAN >>> operation (I'm still in favour of this path). >> >> We speak about a pathological situation and therefore I do not share >> your concerns. When there are electrical problems or even the cable is >> not connected, we do have an abnormal mode of operation and CAN >> related real-time is broken anyhow. The bus error messages are then >> useful for analyzing the problem. The effect of the bus error I do understand, and that's why I was looking for some solution that rate-controls such IRQs deterministically, but doesn't switch them off altogether. >> interrupts on non-CAN related latencies is another issue but I think >> it's not that critical either (handling a bus error just requires the >> reading of 2 SJA1000 registers). But I agree, a more detailed analysis >> of "bus error flooding" would help to understand the impact on the >> real-time behavior. > > And also be aware, that heavy CAN traffic can cause similar latencies as > well and when there is more than one CAN controller, they can accumulate > (as I have observed with my PCAN dongle tests). Here a IRQ service task Well, this argumentation doesn't help if some concrete CAN bus was specified to _not_ deliver such high traffic. > or threaded IRQs would help. 
> Maybe this is the right way to go.

Again: threaded IRQs are no magic bullet. First of all, they add latencies, especially on low-end systems. And they can only help if IRQ priorities can actually be lowered appropriately (or if you apply round-robin or a similar CPU bandwidth policy).

Jan
* Re: [Xenomai-help] RT-Socket-CAN bus error handling (was CAN errors and real-time behaviour (IRQ raise forever and may lock system))
From: Wolfgang Grandegger @ 2007-03-20 6:53 UTC
To: Jan Kiszka; +Cc: xenomai

Jan Kiszka wrote:
> Wolfgang Grandegger wrote:
>>>> The problem is to define what degree of error-related IRQ load is
>>>> generally acceptable. We surely can't do this, so we have to document
>>>> the effect /at least/ and help the users to check it on their own - or
>>>> we have to avoid it / make it insignificant compared to normal CAN
>>>> operation (I'm still in favour of this path).
>>> We speak about a pathological situation and therefore I do not share
>>> your concerns. When there are electrical problems or even the cable is
>>> not connected, we do have an abnormal mode of operation and CAN
>>> related real-time is broken anyhow. The bus error messages are then
>>> useful for analyzing the problem. The effect of the bus error
>
> I do understand, and that's why I was looking for some solution that
> rate-controls such IRQs deterministically, but doesn't switch them off
> altogether.

OK, then let's first check if we need bus error rate control at all for the sake of real-time. I will do some tests a.s.a.p. I still believe that most of the reported problems are due to heavy printk debugging output.

>>> interrupts on non-CAN related latencies is another issue but I think
>>> it's not that critical either (handling a bus error just requires the
>>> reading of 2 SJA1000 registers). But I agree, a more detailed analysis
>>> of "bus error flooding" would help to understand the impact on the
>>> real-time behavior.
>> And also be aware, that heavy CAN traffic can cause similar latencies as
>> well and when there is more than one CAN controller, they can accumulate
>> (as I have observed with my PCAN dongle tests).
>> Here a IRQ service task
>
> Well, this argumentation doesn't help if some concrete CAN bus was
> specified to _not_ deliver such high traffic.
>
>> or threaded IRQs would help. Maybe this is the right way to go.
>
> Again: threaded IRQs are no magic bullet. First of all, they add
> latencies, specifically on low-end. And they can only help if IRQ
> priorities can actually be lowered appropriately (or if you apply
> round-robin or a similar CPU bandwidth policy).

I know, but if servicing the interrupts takes too long and other tasks should not suffer from that, it's the only reasonable solution, AFAIS.

Nevertheless, I think we all agree that the patch for 2., the on-demand bus error interrupts, should be committed, right?

Wolfgang.
* Re: [Xenomai-help] RT-Socket-CAN bus error handling (was CAN errors and real-time behaviour (IRQ raise forever and may lock system))
From: Wolfgang Grandegger @ 2007-03-19 8:54 UTC
To: Sebastian Smolorz; +Cc: xenomai, Jan Kiszka

Sebastian Smolorz wrote:
> Hi Jan,
>
> Jan Kiszka wrote:
>> Wolfgang Grandegger wrote:
>>> you know, on the SJA1000 the bus error interrupt can result in high
>>> error interrupt rates and even hang the system on slow processors. Just
>>> unplugging the CAN cable can cause such interrupt flooding. This problem
>>> popped up again recently and Sebastian proposed:
>>>> Last summer we had a discussion about the BEI issue on the
>>>> socketcan-ML. Two additional handling policies popped up:
>>>> 1. The interface could restart itself after an amount of BEIs, thus
>>>> taking responsibility from the user application.
>>>> 2. The BEI could be completely disabled if no one is interested in
>>>> this type of error frame.
>>> As 2. is also my preferred solution, I have implemented it. The only
>>> downside is that you do not see the error counter increasing when
>>> /proc/rtcan/devices is inspected. We also discussed 1., but
>>> RT-Socket-CAN does not restart the CAN controller on purpose and just
>>> stopping it requires user intervention.
>> And if there is someone listening, how is the flooding issue on cable
>> unplug etc. solved by option 2?
>
> Hm, maybe we could implement 1 additionally (but without automatic restart)?
>
>> What about something like option 3: After the first error occurred that
>> may mark the beginning of a flood, disable that error interrupt until
>> the next stop/start cycle or the user has read the event?
>
> IIRC, there is no possibility to detect a "normal" bus error (acknowledge)
> appearing during normal operation from the one occurring when the cable is
> plugged off. The best indication is a high number of consecutive BEIs.

I agree. But the controller internally counts the errors as well, which is reflected by the change of the state to warning or passive. If the application is interested in more details, it could listen for error messages.

Let's summarize the situation with 2. (on-request bus errors) available:

- Bus error interrupts are suppressed unless an application really requests them.

- If an application listens for error messages, a high interrupt rate could cause the socket buffer to overflow, resulting in lost messages. As far as I have seen, this is not yet a real problem, but it gets worse when debugging is configured and printk messages are generated:

        /* Overflow of socket's ring buffer! */
        sock->rx_buf_full++;
        RTCAN_RTDM_DBG("%s: socket buffer overflow (fd=%d), message "
                       "discarded\n", rtcan_proto_raw_dev.driver_name,
                       context->fd);

  This can indeed hang the system, and I tend just to downscale the frequency of the log output by, let's say, a factor of 10 or 20 and to add to the log: "Not all overflows are listed. Please inspect /proc/rtcan/sockets!"

Concerning 1. (stopping the device after n bus errors): I think this conflicts somehow with 2. because the application explicitly wants to receive them. If it notices a high rate, it could react appropriately.

For the moment I think 2. and downscaled printk's are already a big improvement and should make most users happy. Let's wait for some real-world application requiring solution 1.

Wolfgang.
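The log downscaling suggested in this message could be sketched roughly as follows. This is only an illustration, not the RT-Socket-CAN code: `overflow_stats`, `note_overflow`, `simulate_overflows` and `LOG_INTERVAL` are hypothetical names, and `printf` stands in for the driver's debug macro. Every overflow is still counted, but only every 20th one produces a log line.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical setting: report only every LOG_INTERVAL-th overflow. */
#define LOG_INTERVAL 20

struct overflow_stats {
    unsigned long rx_buf_full;  /* every overflow is still counted */
    unsigned long logged;       /* log lines actually emitted */
};

/*
 * Record one socket buffer overflow. Returns 1 if a log line was
 * emitted for it, 0 if the message was suppressed.
 */
static int note_overflow(struct overflow_stats *st, int fd)
{
    assert(st != NULL);

    st->rx_buf_full++;

    /* Log the 1st, 21st, 41st, ... overflow and stay silent otherwise. */
    if ((st->rx_buf_full - 1) % LOG_INTERVAL != 0)
        return 0;

    st->logged++;
    printf("socket buffer overflow (fd=%d), message discarded; "
           "%lu overflows so far. Not all overflows are listed. "
           "Please inspect /proc/rtcan/sockets!\n",
           fd, st->rx_buf_full);
    return 1;
}

/* Helper for experiments: feed n overflows, return the lines logged. */
static unsigned long simulate_overflows(unsigned long n)
{
    struct overflow_stats st = { 0, 0 };
    unsigned long i;

    for (i = 0; i < n; i++)
        note_overflow(&st, 3);  /* fd 3 is an arbitrary example value */
    return st.logged;
}
```

Logging the first overflow immediately keeps the initial event visible, while a sustained flood costs one log line per 20 discarded messages. A time-based limit would be another option, but it needs a clock source that is safe to read in the overflow path.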
* Re: [Xenomai-help] RT-Socket-CAN bus error handling (was CAN errors and real-time behaviour (IRQ raise forever and may lock system))
From: Stéphane ANCELOT @ 2007-03-19 16:48 UTC
To: Sebastian Smolorz; +Cc: xenomai, Jan Kiszka

>
> IIRC, there is no possibility to detect a "normal" bus error (acknowledge)
> appearing during normal operation from the one occurring when the cable is
> plugged off. The best indication is a high number of consecutive BEIs.
>

I do not agree:

Normal case:
In a normal bus error condition, if the error repeats, the chip will go to the bus-off state (unfortunately I don't know how to simulate this...).

Unplugged case (easy to simulate):
When the cable is not plugged in, the chip will not go to the bus-off condition.

The CAN 2.0 spec says:

Start-up / Wake-up:
If during system start-up only 1 node is online, and if this node transmits some message, it will get no acknowledgement, detect an error and repeat the message. It can become 'error passive' but not 'bus off' due to this reason.
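The difference between the two cases follows from the fault confinement rules of the CAN 2.0 specification, and a small model may make it easier to see. The thresholds (error passive above 127, bus off above 255) and the step of 8 per transmit error are taken from the specification; the function names are invented for this sketch, and receive-side counting as well as the decrement after a successful transmission are deliberately left out.

```c
#include <assert.h>

enum can_state { CAN_ERROR_ACTIVE, CAN_ERROR_PASSIVE, CAN_BUS_OFF };

/* Map the transmit/receive error counters to the node state. */
static enum can_state can_classify(unsigned tec, unsigned rec)
{
    if (tec > 255)
        return CAN_BUS_OFF;
    if (tec > 127 || rec > 127)
        return CAN_ERROR_PASSIVE;
    return CAN_ERROR_ACTIVE;
}

/*
 * Account for one failed transmission and return the new TEC.
 * missing_ack_only is nonzero when the only error is a missing ACK and
 * no dominant bit is seen during the passive error flag, which is what
 * a lone node on an unplugged bus observes.
 */
static unsigned can_tx_error(unsigned tec, int missing_ack_only)
{
    assert(tec <= 255);  /* a bus-off node no longer transmits */

    /* Spec exception: an error-passive transmitter does not count
     * such ACK errors, so its TEC stops growing. */
    if (missing_ack_only && can_classify(tec, 0) == CAN_ERROR_PASSIVE)
        return tec;
    return tec + 8;
}

/* Apply up to n consecutive transmit errors, starting from TEC = 0. */
static enum can_state simulate_tx_errors(unsigned n, int missing_ack_only)
{
    unsigned tec = 0;
    unsigned i;

    for (i = 0; i < n && can_classify(tec, 0) != CAN_BUS_OFF; i++)
        tec = can_tx_error(tec, missing_ack_only);
    return can_classify(tec, 0);
}
```

In this model 32 ordinary transmit errors in a row lead to bus off, whereas a lone node stalls at TEC = 128 and keeps retransmitting in the error passive state, which is why the unplugged cable produces an endless stream of bus errors.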
* Re: [Xenomai-help] RT-Socket-CAN bus error handling (was CAN errors and real-time behaviour (IRQ raise forever and may lock system))
From: Sebastian Smolorz @ 2007-03-19 16:56 UTC
To: Stéphane ANCELOT; +Cc: xenomai, Jan Kiszka

Stéphane ANCELOT wrote:
> > IIRC, there is no possibility to detect a "normal" bus error
> > (acknowledge) appearing during normal operation from the one occurring
> > when the cable is plugged off. The best indication is a high number of
> > consecutive BEIs.
>
> I do not agree :
> case normal :
> In normal bus error condition, if error repeats the chip will go to
> busoff state (unfortunately I don't know how to simulate this...)
>
> case unplugged (easy to simulate):
> when the cable is not plugged it will not go to busoff condition.

I know that. Unfortunately, you took my above answer out of context. I replied to:

> What about something like option 3: After the first error occurred that
> may mark the beginning of a flood, disable that error interrupt until
> the next stop/start cycle or the user has read the event?

I was referring to *one* bus error. If one bus error occurs, how will the driver know it is the beginning of a flood due to an unplugged cable? It can only tell after several consecutive ones.

--
Sebastian
* Re: [Xenomai-help] RT-Socket-CAN bus error handling (was CAN errors and real-time behaviour (IRQ raise forever and may lock system))
From: Jan Kiszka @ 2007-03-19 17:33 UTC
To: Sebastian Smolorz; +Cc: xenomai

Sebastian Smolorz wrote:
> Stéphane ANCELOT wrote:
>>> IIRC, there is no possibility to detect a "normal" bus error
>>> (acknowledge) appearing during normal operation from the one occurring
>>> when the cable is plugged off. The best indication is a high number of
>>> consecutive BEIs.
>> I do not agree :
>> case normal :
>> In normal bus error condition, if error repeats the chip will go to
>> busoff state (unfortunately I don't know how to simulate this...)
>>
>> case unplugged (easy to simulate):
>> when the cable is not plugged it will not go to busoff condition.
>
> I know that. Unfortunately, you took my above answer out of context. I replied
> to:
>
>> What about something like option 3: After the first error occurred that
>> may mark the beginning of a flood, disable that error interrupt until
>> the next stop/start cycle or the user has read the event?
>
> I was referring to *one* bus error. If one bus error occurs how will the
> driver know it is the beginning of a flood due to an unplugged cable? It can
> only tell after several consecutive ones.

As the damn SJA1000 doesn't seem to help us here, I was suggesting to play safe: disable that particular IRQ source until an administrator reset, or until some task reads the generated error frame AND (that's probably required too) continues to ask for the next one. The latter condition would allow the error frame reader to fully control the IRQ rate.

Note that this is not just about avoiding total lock-up. Even a specific period of an abnormal IRQ load can kill an RT system.

Jan
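The policy Jan describes (mask the bus error source after one event, and re-arm it only when a reader has consumed the error frame and asks for the next one) might be modelled like this. It is a stand-alone sketch, not driver code: `bei_gate` and its functions are invented for illustration, the hardware masking is reduced to a flag, and a real blocking read would wait instead of returning false.

```c
#include <assert.h>
#include <stdbool.h>

struct bei_gate {
    bool armed;             /* bus error IRQ source currently enabled */
    bool frame_pending;     /* one error frame is waiting for the reader */
    unsigned long handled;  /* bus error IRQs actually serviced */
};

static void bei_gate_init(struct bei_gate *g)
{
    assert(g);
    g->armed = true;
    g->frame_pending = false;
    g->handled = 0;
}

/* The controller signals a bus error. Returns true if it was serviced. */
static bool bei_gate_irq(struct bei_gate *g)
{
    if (!g->armed)
        return false;       /* source is masked: nothing reaches the CPU */

    g->handled++;
    g->frame_pending = true;
    g->armed = false;       /* mask until the reader catches up */
    return true;
}

/*
 * The reader fetches the error frame and thereby asks for the next one.
 * Returns true if a frame was delivered.
 */
static bool bei_gate_read(struct bei_gate *g)
{
    if (!g->frame_pending)
        return false;

    g->frame_pending = false;
    g->armed = true;        /* re-arm: the reader controls the rate */
    return true;
}

/*
 * Helper for experiments: `errors` bus errors arrive and the reader
 * reads once every `read_every` errors (0 means it never reads).
 * Returns the number of IRQs that were serviced.
 */
static unsigned long simulate_flood(unsigned long errors,
                                    unsigned long read_every)
{
    struct bei_gate g;
    unsigned long i;

    bei_gate_init(&g);
    for (i = 1; i <= errors; i++) {
        bei_gate_irq(&g);
        if (read_every != 0 && i % read_every == 0)
            bei_gate_read(&g);
    }
    return g.handled;
}
```

With no reader, a flood of any length costs exactly one serviced interrupt; with a reader, the number of serviced interrupts is bounded by the number of reads, which is the deterministic rate control asked for in the thread.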
* Re: [Xenomai-help] RT-Socket-CAN bus error handling (was CAN errors and real-time behaviour (IRQ raise forever and may lock system))
From: Stéphane ANCELOT @ 2007-03-19 8:49 UTC
To: Wolfgang Grandegger; +Cc: xenomai

Hi,

only one comment:
It is not absolutely necessary to disable the bus error interrupt: "a new bus error interrupt is not possible until the ECC register is read out once".

Simply not reading the ECC register in the ISR will prevent new BEIs from being generated.

Best Regards
Steph

Wolfgang Grandegger wrote:
> Hi Sebastian,
>
> Sebastian Smolorz wrote:
>> Wolfgang Grandegger wrote:
>>> Sebastian Smolorz wrote:
>>>> Last summer we had a discussion about the BEI issue on the
>>>> socketcan-ML.
>>>> Two additional handling policies popped up:
>>>> 1. The interface could restart itself after an amount of BEIs, thus
>>>> taking responsibility from the user application.
>>>> 2. The BEI could be completely disabled if no one is interested in this
>>>> type of error frame.
>>> I tried to implement 2. for SJA1000, but re-enabling the BEI on the fly
>>> does not work. :-(. The controller requires a re-start of the device to
>>> get the bus error reporting back to work.
>>
>> Oh, really? I wasn't aware of this.
>
> Well, I got it working. Reading the ECC register after re-enabling the
> bus error interrupts fixed the problem:
>
>     if (CAN_STATE_OPERATING(dev->state)) {
>         chip->write_reg(dev, SJA_IER, chip->ier); /* update on the fly */
>         chip->read_reg(dev, SJA_ECC);
>     }
>
>>>> Maybe it is time to think about the implementation of these policies as
>>>> more and more users seem to run into the BEI issue with a disconnected
>>>> bus. Wolfgang, Jan, what is your opinion?
>>> Well, solution 2.
with the limitations mentioned above is therefore less >>> attractive because it interrupts the CAN traffic. >> >> True. > > Back to our preferred solution 1. Attached is a patch for review > including some other fixes and suggestions accumulated over time: > > * ksrc/drivers/can/*: To avoid unnecessary bus error interrupt > flooding, the option CONFIG_XENO_DRIVERS_CAN_BUS_ERR now allows to > enable bus error interrupts "on demand" only if an application is > interested in such errors. It is automatically selected for CAN > controllers supporting bus error interrupts like the SJA1000. > > * include/rtdm/rtcan.h: Add some doc on bus-off and bus-error error > conditions and the restart policy. > > * src/utils/can/rtcanconfig.c: Controller mode settings and doc > has been corrected. > >>> The Socket-CAN implementation actually restarts the CAN controller >>> after a certain >>> amount of bus error interrupts (200 by default) which matches your first >>> policy above. But in RT-Socket-CAN, we do not automatically re-start the >>> device by purpose. Therefore I tend to just stop the device. It's then >>> up to the application to restart it. What do you think? >> >> No fundamental objections but it would be best if an application would >> be informed of this special situation e.g. through an error frame with >> the meaning "controller was stopped because of a disconnected bus >> after trying to send 200 times the same message". >> >> A question pops up in this context: Why do we define CAN_ERR_RESTARTED >> if we never do this? Only to be compatible with Socket-CAN? Then I >> would propose to extend the documentation by pointing out that this >> will not appear under RT-Socket-CAN. > > Let's wait if solution 1. is sufficient. maybe we need 2. later as well. > > Wolfgang. > ^ permalink raw reply [flat|nested] 38+ messages in thread
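The "on demand" enabling described in the quoted patch notes, combined with the ECC read that turned out to be necessary, could be sketched as a listener count in front of the interrupt enable register. This is a mock for illustration only: `mock_chip`, `bus_err_get` and `bus_err_put` are hypothetical names, the register file is a plain array rather than real hardware, and the register numbers and the enable bit follow the SJA1000 PeliCAN layout as I understand it, so check them against the data sheet before relying on them.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed SJA1000 PeliCAN layout; verify against the data sheet. */
#define REG_IER   4      /* interrupt enable register */
#define REG_ECC   12     /* error code capture register */
#define IER_BEIE  0x80   /* bus error interrupt enable bit */

struct mock_chip {
    uint8_t regs[32];    /* fake register file */
    unsigned listeners;  /* sockets that asked for bus error frames */
    unsigned ecc_reads;  /* how often ECC was read, for inspection */
};

static void chip_write(struct mock_chip *c, unsigned reg, uint8_t val)
{
    assert(reg < sizeof(c->regs));
    c->regs[reg] = val;
}

static uint8_t chip_read(struct mock_chip *c, unsigned reg)
{
    assert(reg < sizeof(c->regs));
    if (reg == REG_ECC)
        c->ecc_reads++;
    return c->regs[reg];
}

/* A socket starts listening for bus errors. */
static void bus_err_get(struct mock_chip *c)
{
    if (c->listeners++ == 0) {
        /* First listener: enable the interrupt on the fly ... */
        chip_write(c, REG_IER, chip_read(c, REG_IER) | IER_BEIE);
        /* ... and read ECC once so that error capture is re-armed. */
        (void)chip_read(c, REG_ECC);
    }
}

/* A socket stops listening for bus errors. */
static void bus_err_put(struct mock_chip *c)
{
    assert(c->listeners > 0);
    if (--c->listeners == 0)
        chip_write(c, REG_IER,
                   chip_read(c, REG_IER) & (uint8_t)~IER_BEIE);
}

/* Returns 1 if the bus error interrupt is currently enabled. */
static int bei_enabled(const struct mock_chip *c)
{
    return (c->regs[REG_IER] & IER_BEIE) != 0;
}
```

The counter means the interrupt stays enabled as long as at least one socket wants bus error frames and is switched off again when the last one goes away, so an unplugged cable generates no bus error interrupts while nobody is listening.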
* Re: [Xenomai-help] RT-Socket-CAN bus error handling (was CAN errors and real-time behaviour (IRQ raise forever and may lock system)) 2007-03-19 8:49 ` Stéphane ANCELOT @ 2007-03-19 8:30 ` Wolfgang Grandegger 0 siblings, 0 replies; 38+ messages in thread From: Wolfgang Grandegger @ 2007-03-19 8:30 UTC (permalink / raw) To: Stéphane ANCELOT; +Cc: xenomai Stéphane ANCELOT wrote: > Hi, > only one comment : > It is not anbsolutely necessary to disable bus error interrupt " a new > bus error in terrupt is not possible until the ecc register is read out > once". > > only disabling reading of ecc in isr will disable new bei generation. Ah, good hint. It might make the implementation simpler. I have to check. Wolfgang. > Wolfgang Grandegger wrote: >> Hi Sebastian, >> >> Sebastian Smolorz wrote: >>> Wolfgang Grandegger wrote: >>>> Sebastian Smolorz wrote: >>>>> Last summer we had a discussion about the BEI issue on the >>>>> socketcan-ML. >>>>> Two additional handling policies popped up: >>>>> 1. The interface could restart itself after an amount of BEIs, thus >>>>> taking responsibility from the user application. >>>>> 2. The BEI could be completely disabled if no one is interested in >>>>> this >>>>> type of error frame. >>>> I tried to implement 2. for SJA1000, but re-enabling the BIE on the fly >>>> does not work. :-(. The controller requires a re-start of the device to >>>> get the bus error reporting back to work. >>> >>> Oh, really? I wasn't aware of this. >> >> Well, I got it working. Reading the ECC register after re-enabling the >> bus error interrupts fixed the problem: >> >> if (CAN_STATE_OPERATING(dev->state)) { >> chip->write_reg(dev, SJA_IER, chip->ier); /* update on the fly */ >> chip->read_reg(dev, SJA_ECC); >> } >> >>>>> Maybe it is time to think about the implementation of these >>>>> policies as >>>>> more and more users seem to run into the BEI issue with a disconnected >>>>> bus. Wolfgang, Jan, what is your opinion? >>>> Well, solution 2. 
with the limitations mentioned above is therefore >>>> less >>>> attractive because it interrupts the CAN traffic. >>> >>> True. >> >> Back to our preferred solution 1. Attached is a patch for review >> including some other fixes and suggestions accumulated over time: >> >> * ksrc/drivers/can/*: To avoid unnecessary bus error interrupt >> flooding, the option CONFIG_XENO_DRIVERS_CAN_BUS_ERR now allows to >> enable bus error interrupts "on demand" only if an application is >> interested in such errors. It is automatically selected for CAN >> controllers supporting bus error interrupts like the SJA1000. >> >> * include/rtdm/rtcan.h: Add some doc on bus-off and bus-error error >> conditions and the restart policy. >> >> * src/utils/can/rtcanconfig.c: Controller mode settings and doc >> has been corrected. >> >>>> The Socket-CAN implementation actually restarts the CAN controller >>>> after a certain >>>> amount of bus error interrupts (200 by default) which matches your >>>> first >>>> policy above. But in RT-Socket-CAN, we do not automatically re-start >>>> the >>>> device by purpose. Therefore I tend to just stop the device. It's then >>>> up to the application to restart it. What do you think? >>> >>> No fundamental objections but it would be best if an application >>> would be informed of this special situation e.g. through an error >>> frame with the meaning "controller was stopped because of a >>> disconnected bus after trying to send 200 times the same message". >>> >>> A question pops up in this context: Why do we define >>> CAN_ERR_RESTARTED if we never do this? Only to be compatible with >>> Socket-CAN? Then I would propose to extend the documentation by >>> pointing out that this will not appear under RT-Socket-CAN. >> >> Let's wait if solution 1. is sufficient. maybe we need 2. later as well. >> >> Wolfgang. >> > > ^ permalink raw reply [flat|nested] 38+ messages in thread