Philippe Gerum wrote:
> On Tue, 2007-02-13 at 19:17 +0100, Jan Kiszka wrote:
>> Philippe Gerum wrote:
>>> On Tue, 2007-02-13 at 15:28 +0100, Jan Kiszka wrote:
>>>> Philippe Gerum wrote:
>>>>> On Tue, 2007-02-13 at 15:07 +0100, Jan Kiszka wrote:
>>>>>> Stephan Zimmermann wrote:
>>>>>>> I wasn't able to isolate the section of my code that causes the crash by
>>>>>>> now. The only thing I figured out by now is that the particular crash
>>>>>>> does not happen with 2.3.x rev 2077.
>>>>>>> So I guess some change from 2077 to the 2139 revision did break something.
>>>>>> Could you track the issue down a bit more? There are not too many
>>>>>> "interesting" changes to 2.3.x. A few milestones I found in the ChangeLog:
>>>>>>
>>>>>> - 2092: Allow sleeping scheduler locks
>>>>>> - 2108: Before RPI rework
>>>>>>
>>>>>> Anything after 2108 only makes sense to dissect when you switch on
>>>>> s,on,off,
>>>> Nope. If this switch is off, RPI is enabled while known to be buggy, right?
>>>>
>>> Btw, RPI was not so buggy that it could cause crashes; it was failing
>>> to _always_ keep a thread's priority consistent across domain migration,
>>> which is quite different. IOW, do not start switching on RPIDISABLE
>>> blindly when your box goes south, it's most likely unrelated to what has
>>> been fixed recently.
>> Well, I recalled some temporary locking changes on RPI somewhere in this
>> period. My feeling was to better exclude their potential side-effects
>> from the testing rounds.
>>
> 
> This temporary issue that was introduced by #2109 while rewriting the
> RPI support has been fixed by #2139, and caused a debug assertion to
> trigger (btw, Stephan, make sure to enable CONFIG_XENO_OPT_DEBUG when
> testing). So if the tested revisions did include #2139, then we have the
> same crash happening, regardless of whether we use the old or (totally)
> new RPI implementation anyway.
> 
> My point is that we need to be extra careful about not leading people to
> disable RPI blindly (at least in its recent and more complete
> incarnation) while it's actually _that_ feature which happens to prevent
> priority inversions when secondary mode is involved, e.g:
> https://mail.gna.org/public/xenomai-core/2007-01/msg00081.html
> 

Hi again,
we did some testing with the different revisions you suggested on
several different systems. I will try to summarize what we observed.

Test systems:
--------------------------------------------
SMP1 - AMDx2 with Kernel 2.6.19
SMP2 - AMDx2 with Kernel 2.6.17
PM1 - Celeron M with Kernel 2.6.17 (ETX Industrial Board)
PM2 - Pentium M with Kernel 2.6.17 (Notebook)

Tested revisions were 2077, 2092, 2108 and 2184.

The test consisted of starting and terminating our application
repeatedly, reloading the Xenomai modules between runs.

Results:
--------------------------------------------
SMP1:
The application runs as intended. While rebooting the system, the kernel
showed some backtraces related to bad pages. Revision 2184 leads to a
completely crashed system, as I described in the original mail (2187 on
kernel 2.6.20 does that too). It is possible to work around this crash
by disabling priority coupling (i.e. by activating the corresponding
option in the kernel config).

SMP2:
Same as SMP1.

PM1:
The application runs fine, but only once; then the system crashes in
various ways, most often leaving a backtrace like the one I posted
before on a totally frozen console.

PM2:
The application runs fine; only when rebooting can one see some
backtraces on the console. The behaviour is the same as on the SMP
systems, but without the fatal crash when 'priority coupling' is enabled.

Conclusion:
--------------------------------------------
I guess that there are two bugs causing these effects: one related to
priority coupling, and one that looks like a wild pointer in kernel
space to me. I was unable to produce demo code that demonstrates the
effects reliably, but I have attached some code that causes the same
backtraces and crashes to appear once in a while (tested on PM1). One
will have to run it in a loop to see the effect.
(reload modules - run program - reload ...)
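
The reload/run cycle above could be scripted roughly as follows. This is
only a sketch of the loop, not the harness we actually used: the function
name reload_and_run, the module names and the program path shown in the
comment are assumptions and must be adapted to the local setup.

```python
import subprocess
from typing import Callable, List, Sequence

Command = Sequence[str]


def _default_run(cmd: Command) -> int:
    """Run one command and return its exit status."""
    return subprocess.run(list(cmd)).returncode


def reload_and_run(
    unload: List[Command],
    load: List[Command],
    app: Command,
    cycles: int,
    run: Callable[[Command], int] = _default_run,
) -> int:
    """Repeat: unload modules, load modules, run the program.

    Returns the number of cycles that completed with the program
    exiting with status 0. Stops at the first failed load or run.
    """
    completed = 0
    for _ in range(cycles):
        for cmd in unload:
            # Unload failures are ignored: on the first pass the
            # modules may simply not be loaded yet.
            run(cmd)
        for cmd in load:
            if run(cmd) != 0:
                return completed
        if run(app) != 0:
            return completed
        completed += 1
    return completed


# Hypothetical invocation (module names and path are placeholders):
#   reload_and_run(
#       unload=[["rmmod", "some_xenomai_module"]],
#       load=[["modprobe", "some_xenomai_module"]],
#       app=["./attached_program"],
#       cycles=100,
#   )
```

Note that a hard freeze of the box will of course also stop the script,
so the returned count is only useful when the program itself fails or
the modules refuse to load.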

Additionally, I have attached the .config of the PM1 machine, because
this one crashes most often.

Greetings, Stephan

P.S.: To run the attached program, you need a
magic-parallel-port-interrupt-loopback-device. (Connecting pins 9 and 10
of the parallel port will do the trick.)