[Xenomai-help] hard lock-up


* [Xenomai-help] hard lock-up
@ 2007-08-09  9:11 andy motten
  2007-08-09  9:42 ` Gilles Chanteperdrix
  0 siblings, 1 reply; 25+ messages in thread
From: andy motten @ 2007-08-09  9:11 UTC (permalink / raw)
  To: xenomai

[-- Attachment #1.1: Type: text/plain, Size: 3527 bytes --]

hello,

This week we installed a snapshot of the Xenomai 2.3.x branch, using kernel
version 2.6.20.9 and the ipipe-1.8-05 patch. While running the included
xeno-test program (and sometimes just after it has finished) we observe what
look like random hard lock-ups of the system. We found no indication of the
cause in the kernel messages.

Other symptoms are:
 - When loading the Xenomai modules, we get a warning about the failure of
"enabling SMI workaround".
 - 2 applications (Orocos and CANFestival), which make use of Xenomai, did
not pass their tests.

We investigated the failed Orocos tests further by debugging one of them with
gdb; the results are shown below:

        gdb ./event-test

        Couldn't get registers: No such process.
        (gdb) bt
        Cannot fetch general-purpose registers for thread -1238001984: generic error
        Cannot fetch general-purpose registers for thread -1238001984: generic error

The latest trunk (revision 2900) with kernel patch ipipe-1.9-03 and kernel
2.6.22.1 does not solve this problem. The symptoms stay the same.

When we use the Xenomai 2.2.x branch (Last Changed Rev: 2358) with kernel
patch ipipe-1.5-02 and kernel 2.6.17.14, everything seems to be ok (except
for the SMI workaround problem).

After executing xeno-test, I printed the message buffer of the kernel (with
dmesg):

        For Kernel 2.6.22.1 - ipipe-1.9-03 - Xenomai 2.3.x
         ---------------------------------------------------
        Xenomai: SMI-enabled chipset found, enabling SMI workaround.
        Xenomai: SMI workaround failed!
        Xenomai: starting native API services.
        Xenomai: starting RTDM services.
        Xenomai: stopping RTDM services.
        Xenomai: starting RTDM services.
        Xenomai: stopping RTDM services.
        Xenomai: starting RTDM services.
        Xenomai: stopping RTDM services.
        Xenomai: starting POSIX services.
        Xenomai: starting RTDM services.
        Xenomai: POSIX: destroyed thread ec961320
        Xenomai: stopping RTDM services.
        Xenomai: stopping POSIX services.
        Xenomai: starting POSIX services.
        Xenomai: POSIX: destroyed thread ec961320
        Xenomai: stopping POSIX services.

        For Kernel 2.6.17.14 - ipipe-1.5-02 - Xenomai 2.2.x
         ----------------------------------------------------
        Xenomai: SMI-enabled chipset found, enabling SMI workaround.
        Xenomai: SMI workaround failed!
        Xenomai: starting native API services.
        Xenomai: starting RTDM services.
        Xenomai: stopping RTDM services.
        Xenomai: starting RTDM services.
        Xenomai: stopping RTDM services.
        Xenomai: starting RTDM services.
        Xenomai: stopping RTDM services.
        Xenomai: starting POSIX services.
        Xenomai: starting RTDM services.
        Xenomai: stopping RTDM services.
        Xenomai: stopping POSIX services.
        Xenomai: starting POSIX services.
        Xenomai: stopping POSIX services.

There seems to be a difference between the two versions (the "Xenomai: POSIX:
destroyed thread" lines). Is this normal or part of the problem?
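
The comparison of the two dmesg excerpts above can also be done mechanically.
A minimal sketch in Python, assuming each log has been saved as plain text; the
helper name only_in_first and the shortened log strings are illustrative only
and are not part of xeno-test or Xenomai:

```python
def only_in_first(first_log, second_log):
    """Return the distinct lines of first_log that never occur in second_log."""
    seen_in_second = {line.strip() for line in second_log.splitlines()}
    result = []
    for line in first_log.splitlines():
        line = line.strip()
        # Skip blank lines, lines present in the other log, and repeats.
        if line and line not in seen_in_second and line not in result:
            result.append(line)
    return result


# Shortened excerpts of the two logs shown above (assumption: trimmed by hand).
log_23x = """\
Xenomai: starting POSIX services.
Xenomai: starting RTDM services.
Xenomai: POSIX: destroyed thread ec961320
Xenomai: stopping RTDM services.
Xenomai: stopping POSIX services.
"""
log_22x = """\
Xenomai: starting POSIX services.
Xenomai: starting RTDM services.
Xenomai: stopping RTDM services.
Xenomai: stopping POSIX services.
"""

extra = only_in_first(log_23x, log_22x)
print(extra)
```

On these excerpts the only line reported is the "destroyed thread" message; the
comparison ignores ordering and repeat counts, so it only shows which messages
are new, not how often they occur.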

Is there a possibility that this is a bug in Xenomai or in the ipipe patch?
Using the Xenomai 2.2.x branch is not an option since we need the RT CAN
drivers provided in the Xenomai 2.3.x branch.

Thanks in advance, andy

PS: attached are the .config files of the two kernels and the config files
of Xenomai

[-- Attachment #1.2: Type: text/html, Size: 5909 bytes --]

[-- Attachment #2: config_files.zip --]
[-- Type: application/zip, Size: 51715 bytes --]


* Re: [Xenomai-help] hard lock-up
  2007-08-09  9:11 [Xenomai-help] hard lock-up andy motten
@ 2007-08-09  9:42 ` Gilles Chanteperdrix
  2007-08-09 11:24   ` Jan Kiszka
  0 siblings, 1 reply; 25+ messages in thread
From: Gilles Chanteperdrix @ 2007-08-09  9:42 UTC (permalink / raw)
  To: andy motten; +Cc: xenomai

On 8/9/07, andy motten <andy.motten@domain.hid> wrote:
> There seems to be a difference between the 2 versions (Xenomai: POSIX:
> destroyed thread), Is this normal or part of the problem ?

No, this is normal; the message is harmless and only appears if you
select POSIX skin debugging.

> Is there a possibility that this is a bug in Xenomai or in the ipipe patch?
> Using the Xenomai 2.2.x branch is not an option since we need the RT can
> drivers provided in the Xenomai 2.3.x branch.

There is certainly a bug somewhere. Could you enable the Linux debug
options, its NMI watchdog (by adding nmi_watchdog=1 to the kernel command
line), and the Xenomai debug options, in order to find where the lock-up
occurs?
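
To double-check after a reboot that the option really reached the kernel,
something like the following sketch could be used. It is Python; the function
name nmi_watchdog_setting and the example command line are made up for
illustration, and on a live system the string would be read from
/proc/cmdline instead:

```python
def nmi_watchdog_setting(cmdline):
    """Return the value given to nmi_watchdog= on a kernel command line, or None."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "nmi_watchdog" and sep:
            return value
    return None


# Hypothetical command line; a literal string keeps the sketch runnable anywhere.
example = "root=/dev/sda1 ro nmi_watchdog=1"
print(nmi_watchdog_setting(example))
```

A result of None would mean the parameter is missing from the command line
that was actually booted.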

-- 
                                               Gilles Chanteperdrix



* Re: [Xenomai-help] hard lock-up
  2007-08-09  9:42 ` Gilles Chanteperdrix
@ 2007-08-09 11:24   ` Jan Kiszka
  2007-08-09 16:09     ` andy motten
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Kiszka @ 2007-08-09 11:24 UTC (permalink / raw)
  To: andy motten; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 987 bytes --]

Gilles Chanteperdrix wrote:
> On 8/9/07, andy motten <andy.motten@domain.hid> wrote:
>> There seems to be a difference between the 2 versions (Xenomai: POSIX:
>> destroyed thread), Is this normal or part of the problem ?
> 
> No, this is normal, this message is harmless, it only appears if you
> select posix skin debugging.
> 
>> Is there a possibility that this is a bug in Xenomai or in the ipipe patch?
>> Using the Xenomai 2.2.x branch is not an option since we need the RT can
>> drivers provided in the Xenomai 2.3.x branch.
> 
> There is certainly a bug somewhere. Could you enable Linux debugs, its
> NMI watchdog (by adding nmi_watchdog=1 on the kernel command line), as
> well as Xenomai debugs ? In order to find where the lock-up occurs.
> 

And have a look at linux/Documentation/serial-console.txt to capture the
"last words" (without having to take pictures of your monitor) in case
the lock-up remains hard even with debugging switches on.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 250 bytes --]


* Re: [Xenomai-help] hard lock-up
  2007-08-09 11:24   ` Jan Kiszka
@ 2007-08-09 16:09     ` andy motten
  2007-08-09 16:22       ` Philippe Gerum
  2007-08-09 16:26       ` Gilles Chanteperdrix
  0 siblings, 2 replies; 25+ messages in thread
From: andy motten @ 2007-08-09 16:09 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai


[-- Attachment #1.1: Type: text/plain, Size: 3525 bytes --]

Hello, thanks for the fast response.

We have followed your advice:
 - enabled the Linux debugs, the NMI watchdog and the Xenomai debugs.
 - set up a serial console.

The linux kernel used now is 2.6.20.9 with ipipe patch 1.8-08.
We use branch 2.3.x of Xenomai (Last changed Revision 2898)

While running xeno-test, the system locks up just after the latency test.
(Below is a small sample of the kernel messages received over the serial
connection; the complete kernel log is attached.)

The kernel output is sometimes cut off after "NMI early shots: 0", so we
don't always get all the messages.

thanks and greetings, andy


NMI watchdog detected timer latency above 100 us
NMI early shots: 0
NMI early shots: 0
NMI early shots: 0
NMI early shots: 0
NMI early shots: 0
NMI early shots: 0
NMI early shots: 0
NMI early shots: 0
NMI early shots: 0
NMI early shots: 0
NMI early shots: 0
NMI early shots: 0
NMI early shots: 0
NMI early shots: 0
NMI early shots: 0
NMI early shots: 0
NMI early shots: 0
NMI early shots: 0
NMI early shots: 0
NMI early shots: 0
NMI early shots: 0
NMI early shots: 0
NMI early shots: 0
NMI early shots: 0
 [<c027887f>] nmi_stack_correct+0x26/0x2b
 [<c0212746>] i8042_panic_blink+0x46/0x135
 [<c01ce319>] delay_tsc+0x5/0x13
 [<c01154ab>] panic+0x104/0x119
 [<c0212746>] i8042_panic_blink+0x46/0x135
 [<c01188f9>] do_exit+0x717/0x8d3
 [<c01cdfa6>] vscnprintf+0x14/0x22
 [<c0115ee7>] printk+0x9b/0xab
 [<c0212746>] i8042_panic_blink+0x46/0x135
 [<c0104fc6>] die_nmi+0xe9/0xf0
 [<c0212746>] i8042_panic_blink+0x46/0x135
 [<c021ded3>] rthal_latency_above_max+0x38/0x41
 [<c01cdfa6>] vscnprintf+0x14/0x22
 [<c0115ee7>] printk+0x9b/0xab
 [<c021e427>] rthal_nmi_watchdog_tick+0x51/0x194
 [<c01046e5>] do_nmi+0x75/0x296
 [<c010d42a>] __ipipe_handle_irq+0x76/0x1c8
 [<c027887f>] nmi_stack_correct+0x26/0x2b
 [<c0212746>] i8042_panic_blink+0x46/0x135
 [<c01ce319>] delay_tsc+0x5/0x13
 [<c01154ab>] panic+0x104/0x119
 [<c0212746>] i8042_panic_blink+0x46/0x135
 [<c01188f9>] do_exit+0x717/0x8d3
 [<c01cdfa6>] vscnprintf+0x14/0x22
 [<c0115ee7>] printk+0x9b/0xab
 [<c0212746>] i8042_panic_blink+0x46/0x135
 [<c0104fc6>] die_nmi+0xe9/0xf0
 [<c0212746>] i8042_panic_blink+0x46/0x135
 [<c021ded3>] rthal_latency_above_max+0x38/0x41
 [<c01cdfa6>] vscnprintf+0x14/0x22
 [<c0115ee7>] printk+0x9b/0xab
 [<c021e427>] rthal_nmi_watchdog_tick+0x51/0x194
 [<c01046e5>] do_nmi+0x75/0x296
 [<c010d42a>] __ipipe_handle_irq+0x76/0x1c8
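
The 34 frames above are the same 17-frame sequence printed twice. A rough
sketch of how such repetition can be detected automatically in a captured
trace (Python; the names FRAME, parse_frames and shortest_period are
illustrative, and the sample is a hand-shortened excerpt, not the full trace):

```python
import re

# Matches lines such as " [<c0104fc6>] die_nmi+0xe9/0xf0".
FRAME = re.compile(r"\[<([0-9a-f]+)>\]\s+(\S+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def parse_frames(text):
    """Return the symbol name of every backtrace frame found in text."""
    return [match.group(2) for match in FRAME.finditer(text)]


def shortest_period(symbols):
    """Return the smallest p such that symbols[i] == symbols[i + p] throughout."""
    n = len(symbols)
    for p in range(1, n):
        if all(symbols[i] == symbols[i + p] for i in range(n - p)):
            return p
    return n  # no repetition found


# Hand-shortened excerpt (assumption): four frames of the trace, repeated.
sample = """
 [<c0104fc6>] die_nmi+0xe9/0xf0
 [<c021ded3>] rthal_latency_above_max+0x38/0x41
 [<c021e427>] rthal_nmi_watchdog_tick+0x51/0x194
 [<c01046e5>] do_nmi+0x75/0x296
 [<c0104fc6>] die_nmi+0xe9/0xf0
 [<c021ded3>] rthal_latency_above_max+0x38/0x41
 [<c021e427>] rthal_nmi_watchdog_tick+0x51/0x194
 [<c01046e5>] do_nmi+0x75/0x296
"""

symbols = parse_frames(sample)
print(len(symbols), shortest_period(symbols))
```

A period shorter than the trace length only says that the same call sequence
shows up more than once; it does not by itself say why.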

2007/8/9, Jan Kiszka <jan.kiszka@domain.hid>:
>
> Gilles Chanteperdrix wrote:
> > On 8/9/07, andy motten <andy.motten@domain.hid> wrote:
> >> There seems to be a difference between the 2 versions (Xenomai: POSIX:
> >> destroyed thread), Is this normal or part of the problem ?
> >
> > No, this is normal, this message is harmless, it only appears if you
> > select posix skin debugging.
> >
> >> Is there a possibility that this is a bug in Xenomai or in the ipipe patch?
> >> Using the Xenomai 2.2.x branch is not an option since we need the RT can
> >> drivers provided in the Xenomai 2.3.x branch.
> >
> > There is certainly a bug somewhere. Could you enable Linux debugs, its
> > NMI watchdog (by adding nmi_watchdog=1 on the kernel command line), as
> > well as Xenomai debugs ? In order to find where the lock-up occurs.
> >
>
> And have a look at linux/Documentation/serial-console.txt to capture the
> "last words" (without having to take pictures of your monitor) in case
> the lock-up remains hard even with debugging switches on.
>
> Jan
>
>
>

[-- Attachment #1.2: Type: text/html, Size: 4580 bytes --]

[-- Attachment #2: kernel-2.6.10.9-ipipe-1.8-08_gdb_latency_hardlockup.zip --]
[-- Type: application/zip, Size: 8534 bytes --]


* Re: [Xenomai-help] hard lock-up
  2007-08-09 16:09     ` andy motten
@ 2007-08-09 16:22       ` Philippe Gerum
  2007-08-10  7:32         ` Jan Kiszka
  2007-08-09 16:26       ` Gilles Chanteperdrix
  1 sibling, 1 reply; 25+ messages in thread
From: Philippe Gerum @ 2007-08-09 16:22 UTC (permalink / raw)
  To: andy motten; +Cc: xenomai, Jan Kiszka

On Thu, 2007-08-09 at 18:09 +0200, andy motten wrote:
> Hello, thanks for the fast response.
> 
> We have followed your advice:
>  - enabled the linux debugs, nmi_watchdog debugs and Xenomai debugs.
>  - make use of a serial console.
> 
> The linux kernel used now is 2.6.20.9 with ipipe patch 1.8-08.
> We use branch 2.3.x of Xenomai (Last changed Revision 2898)
> 
> While running xeno-test, the system locks-up just after the latency
> test.

Confirmed here. Houston, we do have a problem with the switchtest, or
something this test triggers (and not necessarily the FPU mgmt)...

Could you try switching on the Xenomai watchdog knob from the nucleus
debug options and see if something changes? TIA,

> (below is a small sample of the kernel messages received through the
> serial connection, the complete kernel message is included in
> attachment) 
> 
> The kernel message is sometimes cut off after "NMI early shots: 0". So
> we don't always get all the messages through.
> 
> thanks and greetings, andy
> 
> 
> NMI watchdog detected timer latency above 100 us 
> NMI early shots: 0
> NMI early shots: 0
> NMI early shots: 0
> NMI early shots: 0
> NMI early shots: 0
> NMI early shots: 0
> NMI early shots: 0
> NMI early shots: 0
> NMI early shots: 0
> NMI early shots: 0
> NMI early shots: 0
> NMI early shots: 0
> NMI early shots: 0
> NMI early shots: 0
> NMI early shots: 0
> NMI early shots: 0
> NMI early shots: 0
> NMI early shots: 0
> NMI early shots: 0
> NMI early shots: 0
> NMI early shots: 0 
> NMI early shots: 0
> NMI early shots: 0
> NMI early shots: 0
>  [<c027887f>] nmi_stack_correct+0x26/0x2b
>  [<c0212746>] i8042_panic_blink+0x46/0x135
>  [<c01ce319>] delay_tsc+0x5/0x13
>  [<c01154ab>] panic+0x104/0x119 
>  [<c0212746>] i8042_panic_blink+0x46/0x135
>  [<c01188f9>] do_exit+0x717/0x8d3
>  [<c01cdfa6>] vscnprintf+0x14/0x22
>  [<c0115ee7>] printk+0x9b/0xab
>  [<c0212746>] i8042_panic_blink+0x46/0x135 
>  [<c0104fc6>] die_nmi+0xe9/0xf0
>  [<c0212746>] i8042_panic_blink+0x46/0x135
>  [<c021ded3>] rthal_latency_above_max+0x38/0x41
>  [<c01cdfa6>] vscnprintf+0x14/0x22
>  [<c0115ee7>] printk+0x9b/0xab 
>  [<c021e427>] rthal_nmi_watchdog_tick+0x51/0x194
>  [<c01046e5>] do_nmi+0x75/0x296
>  [<c010d42a>] __ipipe_handle_irq+0x76/0x1c8
>  [<c027887f>] nmi_stack_correct+0x26/0x2b
>  [<c0212746>] i8042_panic_blink+0x46/0x135 
>  [<c01ce319>] delay_tsc+0x5/0x13
>  [<c01154ab>] panic+0x104/0x119
>  [<c0212746>] i8042_panic_blink+0x46/0x135
>  [<c01188f9>] do_exit+0x717/0x8d3
>  [<c01cdfa6>] vscnprintf+0x14/0x22 
>  [<c0115ee7>] printk+0x9b/0xab
>  [<c0212746>] i8042_panic_blink+0x46/0x135
>  [<c0104fc6>] die_nmi+0xe9/0xf0
>  [<c0212746>] i8042_panic_blink+0x46/0x135
>  [<c021ded3>] rthal_latency_above_max+0x38/0x41 
>  [<c01cdfa6>] vscnprintf+0x14/0x22
>  [<c0115ee7>] printk+0x9b/0xab
>  [<c021e427>] rthal_nmi_watchdog_tick+0x51/0x194
>  [<c01046e5>] do_nmi+0x75/0x296
>  [<c010d42a>] __ipipe_handle_irq+0x76/0x1c8 
> 
> 2007/8/9, Jan Kiszka <jan.kiszka@domain.hid>:
>         Gilles Chanteperdrix wrote:
>         > On 8/9/07, andy motten <andy.motten@domain.hid> wrote:
>         >> There seems to be a difference between the 2 versions
>         (Xenomai: POSIX: 
>         >> destroyed thread), Is this normal or part of the problem ?
>         >
>         > No, this is normal, this message is harmless, it only
>         appears if you
>         > select posix skin debugging.
>         >
>         >> Is there a possibility that this is a bug in Xenomai or in
>         the ipipe patch? 
>         >> Using the Xenomai 2.2.x branch is not an option since we
>         need the RT can
>         >> drivers provided in the Xenomai 2.3.x branch.
>         >
>         > There is certainly a bug somewhere. Could you enable Linux
>         debugs, its 
>         > NMI watchdog (by adding nmi_watchdog=1 on the kernel command
>         line), as
>         > well as Xenomai debugs ? In order to find where the lock-up
>         occurs.
>         >
>         
>         And have a look at linux/Documentation/serial-console.txt to
>         capture the
>         "last words" (without having to take pictures of your monitor)
>         in case
>         the lock-up remains hard even with debugging switches on.
>         
>         Jan
>         
>         
> 
-- 
Philippe.





* Re: [Xenomai-help] hard lock-up
  2007-08-09 16:09     ` andy motten
  2007-08-09 16:22       ` Philippe Gerum
@ 2007-08-09 16:26       ` Gilles Chanteperdrix
  1 sibling, 0 replies; 25+ messages in thread
From: Gilles Chanteperdrix @ 2007-08-09 16:26 UTC (permalink / raw)
  To: andy motten; +Cc: xenomai, Jan Kiszka

On 8/9/07, andy motten <andy.motten@domain.hid> wrote:
> Hello, thanks for the fast response.
>
> We have followed your advice:
>  - enabled the linux debugs, nmi_watchdog debugs and Xenomai debugs.
>  - make use of a serial console.

Please disable Xenomai nmi watchdog, it does not seem to work
correctly on your machine.

-- 
                                               Gilles Chanteperdrix



* Re: [Xenomai-help] hard lock-up
  2007-08-09 16:22       ` Philippe Gerum
@ 2007-08-10  7:32         ` Jan Kiszka
  2007-08-10  7:54           ` Klaas Gadeyne
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Kiszka @ 2007-08-10  7:32 UTC (permalink / raw)
  To: rpm; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 891 bytes --]

Philippe Gerum wrote:
> On Thu, 2007-08-09 at 18:09 +0200, andy motten wrote:
>> Hello, thanks for the fast response.
>>
>> We have followed your advice:
>>  - enabled the linux debugs, nmi_watchdog debugs and Xenomai debugs.
>>  - make use of a serial console.
>>
>> The linux kernel used now is 2.6.20.9 with ipipe patch 1.8-08.
>> We use branch 2.3.x of Xenomai (Last changed Revision 2898)
>>
>> While running xeno-test, the system locks-up just after the latency
>> test.
> 
> Confirmed here. Houston, we do have a problem with the switchtest, or
> something this test triggers (and not necessarily the FPU mgmt)...

Any news on this? Does it already happen by just running switchtest
stand-alone (it doesn't do so here at least)? Is this a pure 2.3.x
issue, or is it also visible for 2.4? /me currently rebuilds some trees
to assess the local situation.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 249 bytes --]


* Re: [Xenomai-help] hard lock-up
  2007-08-10  7:32         ` Jan Kiszka
@ 2007-08-10  7:54           ` Klaas Gadeyne
  2007-08-10 15:05             ` andy motten
  0 siblings, 1 reply; 25+ messages in thread
From: Klaas Gadeyne @ 2007-08-10  7:54 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

On Fri, 10 Aug 2007, Jan Kiszka wrote:
> Philippe Gerum wrote:
>>> The linux kernel used now is 2.6.20.9 with ipipe patch 1.8-08.
>>> We use branch 2.3.x of Xenomai (Last changed Revision 2898)
>>>
>>> While running xeno-test, the system locks-up just after the latency
>>> test.
>>
>> Confirmed here. Houston, we do have a problem with the switchtest, or
>> something this test triggers (and not necessarily the FPU mgmt)...
>
> Any news on this? Does it already happen by just running switchtest
> stand-alone (it doesn't do so here at least)? Is this a pure 2.3.x
> issue, or is it also visible for 2.4? /me currently rebuilds some trees
> to assess the local situation.

Andy will probably continue testing today, so more info will follow, but (see
<https://mail.gna.org/public/xenomai-help/2007-08/msg00044.html>) at
first sight we encountered similar issues on trunk (2.2.x seems unaffected).

Klaas




* Re: [Xenomai-help] hard lock-up
  2007-08-10  7:54           ` Klaas Gadeyne
@ 2007-08-10 15:05             ` andy motten
  2007-08-10 15:12               ` Jan Kiszka
  0 siblings, 1 reply; 25+ messages in thread
From: andy motten @ 2007-08-10 15:05 UTC (permalink / raw)
  To: Klaas Gadeyne; +Cc: xenomai, Jan Kiszka

[-- Attachment #1: Type: text/plain, Size: 1794 bytes --]

2007/8/10, Klaas Gadeyne <klaas.gadeyne@domain.hid>:
>
> On Fri, 10 Aug 2007, Jan Kiszka wrote:
> > Philippe Gerum wrote:
> >>> The linux kernel used now is 2.6.20.9 with ipipe patch 1.8-08.
> >>> We use branch 2.3.x of Xenomai (Last changed Revision 2898)
> >>>
> >>> While running xeno-test, the system locks-up just after the latency
> >>> test.
> >>
> >> Confirmed here. Houston, we do have a problem with the switchtest, or
> >> something this test triggers (and not necessarily the FPU mgmt)...
> >
> > Any news on this? Does it already happen by just running switchtest
> > stand-alone (it doesn't do so here at least)? Is this a pure 2.3.x
> > issue, or is it also visible for 2.4? /me currently rebuilds some trees
> > to assess the local situation.
>
> Andy will probably continue testing today, so more info will follow, but
> (see
> <https://mail.gna.org/public/xenomai-help/2007-08/msg00044.html>) at
> first sight we encountered similar issues on trunk (2.2.x seems
> unaffected).
>
> Klaas

We have disabled the NMI watchdog, as Gilles Chanteperdrix suggested in his
previous message.

For the moment we have no hard lock-ups anymore. The Linux kernel used is
still 2.6.20.9 with ipipe patch 1.8-08, and we use branch 2.3.x of Xenomai
(Last Changed Revision 2898). The only thing we have changed is turning off
the "NMI watchdog".

We see no obvious reason why this happened, but it may have something to do
with the AGP bus (it is turned OFF in this kernel; it was already OFF in the
previous tests, but it was ON in the first tests).

To check whether the hard lock-ups are really gone, the system will stay on
over the weekend, continuously running xeno-test under high load. We will
report the results after the weekend.

thanks and greetings, andy

[-- Attachment #2: Type: text/html, Size: 2343 bytes --]


* Re: [Xenomai-help] hard lock-up
  2007-08-10 15:05             ` andy motten
@ 2007-08-10 15:12               ` Jan Kiszka
  2007-08-13  7:06                 ` Klaas Gadeyne
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Kiszka @ 2007-08-10 15:12 UTC (permalink / raw)
  To: andy motten; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 2147 bytes --]

andy motten wrote:
> 2007/8/10, Klaas Gadeyne <klaas.gadeyne@domain.hid>:
>> On Fri, 10 Aug 2007, Jan Kiszka wrote:
>>> Philippe Gerum wrote:
>>>>> The linux kernel used now is 2.6.20.9 with ipipe patch 1.8-08.
>>>>> We use branch 2.3.x of Xenomai (Last changed Revision 2898)
>>>>>
>>>>> While running xeno-test, the system locks-up just after the latency
>>>>> test.
>>>> Confirmed here. Houston, we do have a problem with the switchtest, or
>>>> something this test triggers (and not necessarily the FPU mgmt)...
>>> Any news on this? Does it already happen by just running switchtest
>>> stand-alone (it doesn't do so here at least)? Is this a pure 2.3.x
>>> issue, or is it also visible for 2.4? /me currently rebuilds some trees
>>> to assess the local situation.
>> Andy will probably continue testing today, so more info will follow, but
>> (see
>> <https://mail.gna.org/public/xenomai-help/2007-08/msg00044.html>) at
>> first sight we encountered similar issues on trunk (2.2.x seems
>> unaffected).
>>
>> Klaas
> 
> 
> We have disabled NMI watchdog like Gilles Chanteperdrix suggested in his
> previous message.
> 
> For the moment We have no hard locks anymore. The linux kernel used is still
> 2.6.20.9 with ipipe patch 1.8-08.
> We use branch 2.3.x of Xenomai (Last changed Revision 2898). The only thing
> We have changed is turning off the "NMI watchdog".
> 
> There is no obvious reason why this happened (according to us), but it may
> have something to do with the AGP bus (we turned it OFF in this kernel, it
> was already turned OFF in the previous tests, but it was ON in the first
> tests).

Yet another broken x86 system...?

> 
> For seeing if this system will not hard lock anymore, the system will stay
> on this weekend (continuously running xeno-tests with high loads). We will
> inform you of the results after the weekend.

BTW, given that the SMI workaround is non-functional on your system: Do
you get reasonable latencies ATM? It looked to me like the NMI watchdog
might have triggered due to SMI-related delays.

[/me still wonders what Philippe found...]

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 250 bytes --]


* Re: [Xenomai-help] hard lock-up
  2007-08-10 15:12               ` Jan Kiszka
@ 2007-08-13  7:06                 ` Klaas Gadeyne
  2007-08-13  7:19                   ` Gilles Chanteperdrix
  0 siblings, 1 reply; 25+ messages in thread
From: Klaas Gadeyne @ 2007-08-13  7:06 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

On Fri, 10 Aug 2007, Jan Kiszka wrote:
[..]
> BTW, given that the SMI workaround is infunctional on your system: Do
> you get reasonable latencies ATM? It looked to me like the NMI watchdog
> might have triggered due to SMI-related delays.

We do get very reasonable latencies once the AGP functionality is
turned off (otherwise, latencies are pathetic).  Note that we haven't
sorted out whether the buggy behaviour and huge latencies are due to
the AGP bus in general or to the specific intel_something_else_here
driver.

Klaas


* Re: [Xenomai-help] hard lock-up
  2007-08-13  7:06                 ` Klaas Gadeyne
@ 2007-08-13  7:19                   ` Gilles Chanteperdrix
  2007-08-13 15:10                     ` andy motten
  0 siblings, 1 reply; 25+ messages in thread
From: Gilles Chanteperdrix @ 2007-08-13  7:19 UTC (permalink / raw)
  To: Klaas Gadeyne; +Cc: xenomai, Jan Kiszka

Klaas Gadeyne wrote:
 > On Fri, 10 Aug 2007, Jan Kiszka wrote:
 > [..]
 > > BTW, given that the SMI workaround is infunctional on your system: Do
 > > you get reasonable latencies ATM? It looked to me like the NMI watchdog
 > > might have triggered due to SMI-related delays.
 > 
 > We do get very reasonable latencies once the AGP functionality is
 > turned off (otherwise, latencies are pathetic).  Note that we haven't
 > sorted out whether the buggy behaviour and huge latencies are due to
 > the AGP bus in general or too the specific intel_something_els_here
 > driver.

Did you try the "noaccel" option, as explained in the FAQ ?

-- 


					    Gilles Chanteperdrix.



* Re: [Xenomai-help] hard lock-up
  2007-08-13  7:19                   ` Gilles Chanteperdrix
@ 2007-08-13 15:10                     ` andy motten
  2007-08-13 17:01                       ` Jan Kiszka
  0 siblings, 1 reply; 25+ messages in thread
From: andy motten @ 2007-08-13 15:10 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Jan Kiszka, xenomai

[-- Attachment #1: Type: text/plain, Size: 1095 bytes --]

> Klaas Gadeyne wrote:
> > On Fri, 10 Aug 2007, Jan Kiszka wrote:
> > [..]
> > > BTW, given that the SMI workaround is infunctional on your system: Do
> > > you get reasonable latencies ATM? It looked to me like the NMI watchdog
> > > might have triggered due to SMI-related delays.
> >
> > We do get very reasonable latencies once the AGP functionality is
> > turned off (otherwise, latencies are pathetic).  Note that we haven't
> > sorted out whether the buggy behaviour and huge latencies are due to
> > the AGP bus in general or too the specific intel_something_els_here
> > driver.
>
> Did you try the "noaccel" option, as explained in the FAQ ?
>

Yes, but there was no change in performance.

We have tried several configurations of the system, and we get no hard
lock-ups anymore.
Even the configurations that used to lock up are now working without a
problem.

Maybe we forgot something, but to be honest, we have no idea what triggered
this change in behavior.
Thanks for all the help.
As soon as we know something more about this 'random' behavior, we will post
it.

greetings, andy

[-- Attachment #2: Type: text/html, Size: 1430 bytes --]


* Re: [Xenomai-help] hard lock-up
  2007-08-13 15:10                     ` andy motten
@ 2007-08-13 17:01                       ` Jan Kiszka
  2007-08-14 15:26                         ` andy motten
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Kiszka @ 2007-08-13 17:01 UTC (permalink / raw)
  To: andy motten; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 1421 bytes --]

andy motten wrote:
>> Klaas Gadeyne wrote:
>>> On Fri, 10 Aug 2007, Jan Kiszka wrote:
>>> [..]
>>>> BTW, given that the SMI workaround is infunctional on your system: Do
>>>> you get reasonable latencies ATM? It looked to me like the NMI watchdog
>>>> might have triggered due to SMI-related delays.
>>> We do get very reasonable latencies once the AGP functionality is
>>> turned off (otherwise, latencies are pathetic).  Note that we haven't
>>> sorted out whether the buggy behaviour and huge latencies are due to
>>> the AGP bus in general or too the specific intel_something_els_here
>>> driver.
>> Did you try the "noaccel" option, as explained in the FAQ ?
>>
> 
> Yes, but there was no change in performance.

You mean latency?

> 
> We have tried several configurations of the system, and we get no hard
> lockups anymore.
> Even the configurations that used to lockup are working now without a
> problem.
> 
> Maybe we forget something, but to be honest, we have no idea what triggered
> this change in behavior.
> Thanks for all the help.
> As soon as we know something more about this 'random' behavior, We will post
> it.

And the pathetic latencies might be worth a look using the latency
tracer. Just to be sure it's no software issue. Is there any known
non-broken real-time Linux installation for your board, or did you just
try it out for the first time?

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 250 bytes --]


* Re: [Xenomai-help] hard lock-up
  2007-08-13 17:01                       ` Jan Kiszka
@ 2007-08-14 15:26                         ` andy motten
  2007-08-27 13:27                           ` andy motten
  0 siblings, 1 reply; 25+ messages in thread
From: andy motten @ 2007-08-14 15:26 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 2125 bytes --]

2007/8/13, Jan Kiszka <jan.kiszka@domain.hid>:
>
> andy motten wrote:
> >> Klaas Gadeyne wrote:
> >>> On Fri, 10 Aug 2007, Jan Kiszka wrote:
> >>> [..]
> >>>> BTW, given that the SMI workaround is infunctional on your system: Do
> >>>> you get reasonable latencies ATM? It looked to me like the NMI watchdog
> >>>> might have triggered due to SMI-related delays.
> >>> We do get very reasonable latencies once the AGP functionality is
> >>> turned off (otherwise, latencies are pathetic).  Note that we haven't
> >>> sorted out whether the buggy behaviour and huge latencies are due to
> >>> the AGP bus in general or too the specific intel_something_els_here
> >>> driver.
> >> Did you try the "noaccel" option, as explained in the FAQ ?
> >>
> >
> > Yes, but there was no change in performance.
>
> You mean latency?


Yes, I meant latency. There is no difference with or without the option
"noaccel".

>
> > We have tried several configurations of the system, and we get no hard
> > lockups anymore.
> > Even the configurations that used to lockup are working now without a
> > problem.
> >
> > Maybe we forget something, but to be honest, we have no idea what triggered
> > this change in behavior.
> > Thanks for all the help.
> > As soon as we know something more about this 'random' behavior, We will post
> > it.


The lock-ups are back (very randomly). Disabling the AGP bus didn't make a
difference.

> And the pathetic latencies might be worth a look using the latency
> tracer.


I will run several tests continuously on the PC (including "latency -f") for
the rest of this week, since I am not in the office during this period (and
so not near this problematic PC).
I hope the latency tracer will give us a hint about the cause of the hard
lock-ups, if one occurs during this period.

> Just to be sure it's no software issue. Is there any known
> non-broken real-time Linux installation for your board, or did you just
> try it out for the first time?


This is the first time that Xenomai has been installed AND tested thoroughly
on this pc.

andy

[-- Attachment #2: Type: text/html, Size: 3131 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Xenomai-help] hard lock-up
  2007-08-14 15:26                         ` andy motten
@ 2007-08-27 13:27                           ` andy motten
  2007-08-27 16:55                             ` Jan Kiszka
  0 siblings, 1 reply; 25+ messages in thread
From: andy motten @ 2007-08-27 13:27 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai


[-- Attachment #1.1: Type: text/plain, Size: 10230 bytes --]

>
>
> I will run several tests continuously on the pc (including "latency -f")
> for the rest of this week, since I am not in the office during this period
> (and so not in the neighborhood of this problematic pc).
> I hope (not in vain) that the latency tracer will give us a hint about the
> reason for the hard lock-ups (if a hard lock-up happens during this
> period).
>
> andy
>


Hello,

Since we are having a hard time tracking down the hard lock-ups, we have
taken a closer look at the failed tests of orocos (maybe the source of the
problem is the same). These failures occur during the make check execution.

    The following tests FAILED:
        2 - task-test (OTHER-FAULT)
        3 - event-test (OTHER-FAULT)
        4 - taskcontext-test (OTHER-FAULT)

When we run a single test, e.g. task-test, we get the following message:
Killed
The OROCOS messages are then:

0.000 [ Info   ][Logger] Successfully extracted environment variable
ORO_LOGLEVEL
0.001 [ Info   ][Logger]  OROCOS version '1.2.1' compiled with GCC
4.1.2.Orocos Logging Activated at level : [ Debug  ] ( 6 )
0.001 [ Info   ][Logger] Reference System Time is : 880886725351 ticks (
315.369 seconds ).
0.002 [ Info   ][Logger] Logging is relative to this time.
0.002 [ Info   ][Logger] Xenomai Periodic Timer runs in preemptive
'one-shot' mode.
0.003 [ Debug  ][Logger] Xenomai Timer and Main Task Created
0.003 [ Debug  ][Logger] MainThread started.
0.003 [ Debug  ][Logger] Starting StartStopManager.
0.004 [ Info   ][Toolkit] Loading Tool RealTime.
0.005 [ Debug  ][Toolkit] Registered Type 'int' to the Orocos Type System.
0.005 [ Debug  ][Toolkit] Registered Type 'uint' to the Orocos Type System.
0.006 [ Debug  ][Toolkit] Registered Type 'double' to the Orocos Type
System.
0.006 [ Debug  ][Toolkit] Registered Type 'bool' to the Orocos Type System.
0.006 [ Debug  ][Toolkit] Registered Type 'PropertyBag' to the Orocos Type
System.
0.007 [ Debug  ][Toolkit] Registered Type 'float' to the Orocos Type System.
0.007 [ Debug  ][Toolkit] Registered Type 'char' to the Orocos Type System.
0.008 [ Debug  ][Toolkit] Registered Type 'array' to the Orocos Type System.
0.008 [ Debug  ][Toolkit] Registered Type 'string' to the Orocos Type
System.
0.010 [ Debug  ][./task-test::main()] ORO_main starting...
0.010 [ Info   ][./task-test::main()] LogLevel unaltered by test-runner.
0.011 [ Info   ][./task-test::main()] Creating PeriodicThread for scheduler:
0
0.012 [ Info   ][TimerThreadInstance] PeriodicThread created with scheduler
type '0', priority 15 and period 0.01.
0.013 [ Debug  ][Logger] Periodic Thread TimerThreadInstance started.
0.014 [ Info   ][PThread] PeriodicThread created with scheduler type '0',
priority 99 and period 0.1.
0.014 [ Debug  ][Logger] Periodic Thread PThread started.
0.115 [ Debug  ][Logger] Periodic Thread PThread stopping... done.
0.115 [ Debug  ][Logger] Periodic Thread PThread started.
1.216 [ Debug  ][Logger] Periodic Thread PThread stopping... done.
1.216 [ Debug  ][~PeriodicThread] Terminating PThread

On the serial console we get the following listing (complete listing in
appendix):

Xenomai: starting native API services.
I-pipe: Detected illicit call from domain 'Xenomai'
        into a service reserved for domain 'Linux' and below.
       f635be74 00000000 00000000 52544149 f635be98 c0104789 c02cfa4f
c02f5b80
       f6c4e2f0 f635beb0 c0137d69 c02c256c c02c1186 c02c01b8 f8c0b280
f635bebc
       c0132981 f60a1730 f635bed8 f8bd8570 c010ef8c 00000000 f60a0120
f8beefe0
Call Trace:
 [<c0103ffb>] show_trace_log_lvl+0x1f/0x35
 [<c01040bb>] show_stack_log_lvl+0xaa/0xcf
 [<c0104789>] show_stack+0x2f/0x36
 [<c0137d69>] ipipe_check_context+0x7a/0x81
 [<c0132981>] module_put+0x19/0x7d
 [<f8bd8570>] xnshadow_unmap+0xbc/0xff [xeno_nucleus]
 [<f8bfdc75>] __shadow_delete_hook+0x25/0x27 [xeno_native]
 [<f8bd1454>] xnpod_delete_thread+0x1b9/0x2aa [xeno_nucleus]
 [<f8bfc36b>] rt_task_delete+0x140/0x145 [xeno_native]
 [<f8bfe02a>] __rt_task_delete+0x58/0x69 [xeno_native]
 [<f8bd8165>] hisyscall_event+0x185/0x291 [xeno_nucleus]
 [<c0138940>] __ipipe_dispatch_event+0xc0/0x1da
 [<c010ed6b>] __ipipe_syscall_root+0x43/0x10a
 [<c0102e79>] system_call+0x29/0x41
 =======================
I-pipe tracer log (30 points):
 | # *func                    0 ipipe_trace_panic_freeze+0x9
(ipipe_check_context+0x3f)
 | # *func                    0 ipipe_check_context+0xc (module_put+0x19)
 | # *func                    0 module_put+0x9 (xnshadow_unmap+0xbc
[xeno_nucleus])
 | # *func                    0 xnshadow_unmap+0xe [xeno_nucleus]
(__shadow_delete_hook+0x25 [xeno_native])
 | # *func                   -1 __shadow_delete_hook+0x8 [xeno_native]
(xnpod_delete_thread+0x1b9 [xeno_nucleus])
 | # *func                   -1 xnsynch_release_all_ownerships+0xe
[xeno_nucleus] (xnpod_delete_thread+0x157 [xeno_nucleus])
 | # *func                   -1 xntimer_do_stop_aperiodic+0xe [xeno_nucleus]
(xnpod_delete_thread+0x2a1 [xeno_nucleus])
 | # *func                   -2 xnpod_delete_thread+0xe [xeno_nucleus]
(rt_task_delete+0x140 [xeno_native])
 | # *func                   -2 __ipipe_schedule_irq+0xe
(rthal_apc_schedule+0x83)
 | # *func                   -3 rthal_apc_schedule+0xa
(schedule_linux_call+0x98 [xeno_nucleus])
 | # *func                   -3 schedule_linux_call+0xe [xeno_nucleus]
(xnshadow_send_sig+0x20 [xeno_nucleus])
 | # *func                   -3 xnshadow_send_sig+0x9 [xeno_nucleus]
(rt_task_delete+0x138 [xeno_native])
 | # *func                   -3 __native_task_safewait+0xb [xeno_native]
(rt_task_delete+0xd1 [xeno_native])
 | + *begin   0x80000000     -3 rt_task_delete+0xc8 [xeno_native]
(__rt_task_delete+0x58 [xeno_native])
   + *func                   -4 rt_task_delete+0xe [xeno_native]
(__rt_task_delete+0x58 [xeno_native])
 | + *end     0x80000000     -4 __ipipe_restore_pipelineno_nucleus])
 | # *func                   -4 __ipipe_restore_pipeline_head+0xd
(xnregistry_fetch+0x5f [xeno_nucleus])
 | + *begin   0x80000000     -4 xnregistry_fetch+0x83 [xeno_nucleus]
(__rt_task_delete+0x4a [xeno_native])
   + *func                   -4 xnregistry_fetch+0x9 [xeno_nucleus]
(__rt_task_delete+0x4a [xeno_native])
   + *func                   -5 __copy_from_user_ll_nozero+0xa
(__rt_task_delete+0             -5 __rt_task_delete+0xc [xeno_native]
(hisyscall_event+0x185 [xeno_nucleus])
   + *func                   -5 hisyscall_event+0xe [xeno_nucleus]
(__ipipe_dispatch_event+0xc0)
 | + *end     0x80000001     -5 __ipipe_dispatch_event+0x1a2
(__ipipe_syscall_root+0x43)
 | + *begin   0x80000001     -5 __ipipe_dispatch_event+0x1b2
(__ipipe_syscall_root+0x43)
   + *func                   -6 __ipipe_dispatch_event+0xe
(__ipipe_syscall_root+0x43)
   + *func                   -6 __ipipe_syscall_root+0xa (system_call+0x29)
 | + *end     0x80000001    -18 __ipipe_dispatch_event+0x169
(__ipipe_syscall_root+0x43)
 | + *begin   0x80000001    -18 __ipipe_dispatch_event+0x193
(__ipipe_syscall_root+0x43)
 | + *end     0x80000000    -18 __ipipe_restore_pipeline_head+0x6d
(rt_sem_p+0x61 [xeno_native])
 | # *func                  -18 __ipipe_restore_pipeline_head+0xd
(rt_sem_p+0x61 [xeno_native])
BUG: unable to handle kernel NULL pointer dereference at virtual address
00000004
 printing eip:
f8bfc99c
*pde = 00000000
Oops: 0002 [#1]
PREEMPT
Modules linked in: xeno_native xeno_nucleus nfs lockd nfs_acl sunrpc ipv6 lp
parport usbkbd usbmouse usbhid e100 mii evdev psmouse ehci_hcd uhci_hcd
usbcore pcspkr
CPU:    0
EIP:    0060:[<f8bfc99c>]    Not tainted VLI
EFLAGS: 00010046   (2.6.20.14-ipipe-1.8-05 #1)
EIP is at rt_task_create+0x192/0x24b [xeno_native]
eax: f60a1724   ebx: f60a1d20   ecx: 00000001   edx: 00000000
esi: 00000000   edi: f60a1d24   ebp: f1aa7ee4   esp: f1aa7eb4
ds: 007b   es: 007b   ss: 0068
Process TimerThreadInst (pid: 2732, ti=f1aa6000 task=f7ce61b0
task.ti=f1aa6000)
Stack: 00300000 00000000 00000600 f60a1d20 00000000 f60a1d30 f1aa7f08
00000000
       00000000 00300000 fffffff4 f60a1d20 f1aa7f50 f8bff1e3 0000000f
00300000
       f1aa7fb8 08098654 0000000f bf9effd0 c01e2a96 656d6954 72685472
49646165
Call Trace:
 [<c0103ffb>] show_trace_log_lvl+0x1f/0x35
 [<c01040bb>] show_stack_log_lvl+0xaa/0xcf
 [<c01042aa>] show_registers+0x1ca/0x392
 [<c0104588>] die+0x116/0x246
 [<c011143a>] do_page_fault+0x287/0x61d
 [<c010ee95>] __ipipe_handle_exception+0x63/0x136
 [<c028f83d>] error_code+0x79/0x88
 [<f8bff1e3>] __rt_task_create+0xd5/0x16c [xeno_native]
 [<f8bd7f22>] losyscall_event+0xaf/0x16d [xeno_nucleus]
 [<c0138940>] __ipipe_dispatch_event+0xc0/0x1da
 [<c010ed6b>] __ipipe_syscall_root+0x43/0x10a
 [<c0102e79>] system_call+0x29/0x41
 =======================
Code: 00 00 00 a1 04 f6 2f c0 2d 80 99 00 00 0f ba 28 00 19 d2 89 d6 83 e6
01 c7 03 01 01 55 55 a1 50 ae c0 f8 89 47 04 8b 10 89 53 04 <89> 7a 04 89 38
83 05 54 ae c0 f8 01 8b 0d 04 f6 2f c0 81 e9 80
EIP: [<f8bfc99c>] rt_task_create+0x192/0x24b [xeno_native] SS:ESP
0068:f1aa7eb4

We have pinpointed the Xenomai revision at which this problem started: r2433.
From this revision on, orocos make check fails; before it, orocos make check
succeeds.
We use linux kernel 2.6.20.9 with ipipe 1.8-00 and Xenomai 2.3.x. These are
the changes according to the log files:

------------------------------------------------------------------------
r2433 | rpm | 2007-05-11 10:45:43 +0200 (Fri, 11 May 2007) | 1 line

Defer thread memory release
------------------------------------------------------------------------


2007-05-11  Philippe Gerum  <rpm@xenomai.org>

    * include/nucleus/heap.h (xnfreesafe): Use xnpod_current_p() when
    checking for deferral.

    * include/nucleus/pod.h (xnpod_current_p): Give exec mode
    awareness to this predicate, checking for primary/secondary mode
    of shadows.

2007-05-11  Gilles Chanteperdrix  <gilles.chanteperdrix@xenomai.org>

    * ksrc/skins: Always defer thread memory release in deletion hook
    by calling xnheap_schedule_free() instead of xnfreesafe().


Orocos uses the native skin, so my guess is that these changes shouldn't
affect how orocos works.
Did orocos use something that it shouldn't have, or does orocos use something
else that changed between r2429 and r2433?

thanks in advance, andy
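
For readers following the revision hunt above: narrowing a range of SVN
revisions down to the first failing one is a binary search. Below is a
minimal sketch in C, assuming there is exactly one pass-to-fail transition
inside the range. The function names and the stand-in predicate are
hypothetical; in practice the predicate would mean "check out the revision,
rebuild, and run make check". The range 2400..2500 is an arbitrary example,
not taken from this thread.

```c
#include <assert.h>

/* Hypothetical predicate: nonzero if the test suite passes at `rev`. */
typedef int (*rev_passes_fn)(int rev);

/*
 * Return the first failing revision in (good, bad].
 * Assumptions: `good` passes, `bad` fails, and there is a single
 * transition from passing to failing somewhere in between.
 */
static int first_bad_revision(int good, int bad, rev_passes_fn passes)
{
    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;

        if (passes(mid))
            good = mid;   /* transition lies after mid */
        else
            bad = mid;    /* transition lies at or before mid */
    }
    return bad;
}

/*
 * Stand-in for "build this revision and run make check". It models the
 * report above: everything before r2433 passes, r2433 and later fail.
 */
static int passes_before_r2433(int rev)
{
    return rev < 2433;
}

/* Usage: search an arbitrary example range around the reported revision. */
static void check_example(void)
{
    assert(first_bad_revision(2400, 2500, passes_before_r2433) == 2433);
}
```

Each step halves the range, so a span of 100 revisions needs about seven
build-and-test cycles instead of up to 100.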

[-- Attachment #1.2: Type: text/html, Size: 14835 bytes --]

[-- Attachment #2: linux-2.6.20.14-ipipe-1.8-05.zip --]
[-- Type: application/zip, Size: 8199 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Xenomai-help] hard lock-up
  2007-08-27 13:27                           ` andy motten
@ 2007-08-27 16:55                             ` Jan Kiszka
  2007-08-28 10:06                               ` andy motten
  2007-08-29  6:11                               ` Jan Kiszka
  0 siblings, 2 replies; 25+ messages in thread
From: Jan Kiszka @ 2007-08-27 16:55 UTC (permalink / raw)
  To: andy motten; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 4778 bytes --]

andy motten wrote:
>>
>> I will run several tests continuously on the pc (including "latency -f")
>> for the rest of this week, since I am not in the office during this period
>> (and so not in the neighborhood of this problematic pc).
>> I hope (not in vain) that the latency tracer will give us a hint about the
>> reason for the hard lock-ups (if a hard lock-up happens during this
>> period).
>>
>> andy
>>
> 
> 
> Hello,
> 
> Since we are having a hard time tracking down the hard lock-ups, we have
> taken a closer look at the failed tests of orocos (maybe the source of the
> problem is the same). These failures occur during the make check execution.
> 
>     The following tests FAILED:
>         2 - task-test (OTHER-FAULT)
>         3 - event-test (OTHER-FAULT)
>         4 - taskcontext-test (OTHER-FAULT)
> 
> When we run a single test, e.g. task-test, we get the following message:
> Killed
> The OROCOS messages are then:
> 
> 0.000 [ Info   ][Logger] Successfully extracted environment variable
> ORO_LOGLEVEL
> 0.001 [ Info   ][Logger]  OROCOS version '1.2.1' compiled with GCC
> 4.1.2.Orocos Logging Activated at level : [ Debug  ] ( 6 )
> 0.001 [ Info   ][Logger] Reference System Time is : 880886725351 ticks (
> 315.369 seconds ).
> 0.002 [ Info   ][Logger] Logging is relative to this time.
> 0.002 [ Info   ][Logger] Xenomai Periodic Timer runs in preemptive
> 'one-shot' mode.
> 0.003 [ Debug  ][Logger] Xenomai Timer and Main Task Created
> 0.003 [ Debug  ][Logger] MainThread started.
> 0.003 [ Debug  ][Logger] Starting StartStopManager.
> 0.004 [ Info   ][Toolkit] Loading Tool RealTime.
> 0.005 [ Debug  ][Toolkit] Registered Type 'int' to the Orocos Type System.
> 0.005 [ Debug  ][Toolkit] Registered Type 'uint' to the Orocos Type System.
> 0.006 [ Debug  ][Toolkit] Registered Type 'double' to the Orocos Type
> System.
> 0.006 [ Debug  ][Toolkit] Registered Type 'bool' to the Orocos Type System.
> 0.006 [ Debug  ][Toolkit] Registered Type 'PropertyBag' to the Orocos Type
> System.
> 0.007 [ Debug  ][Toolkit] Registered Type 'float' to the Orocos Type System.
> 0.007 [ Debug  ][Toolkit] Registered Type 'char' to the Orocos Type System.
> 0.008 [ Debug  ][Toolkit] Registered Type 'array' to the Orocos Type System.
> 0.008 [ Debug  ][Toolkit] Registered Type 'string' to the Orocos Type
> System.
> 0.010 [ Debug  ][./task-test::main()] ORO_main starting...
> 0.010 [ Info   ][./task-test::main()] LogLevel unaltered by test-runner.
> 0.011 [ Info   ][./task-test::main()] Creating PeriodicThread for scheduler:
> 0
> 0.012 [ Info   ][TimerThreadInstance] PeriodicThread created with scheduler
> type '0', priority 15 and period 0.01.
> 0.013 [ Debug  ][Logger] Periodic Thread TimerThreadInstance started.
> 0.014 [ Info   ][PThread] PeriodicThread created with scheduler type '0',
> priority 99 and period 0.1.
> 0.014 [ Debug  ][Logger] Periodic Thread PThread started.
> 0.115 [ Debug  ][Logger] Periodic Thread PThread stopping... done.
> 0.115 [ Debug  ][Logger] Periodic Thread PThread started.
> 1.216 [ Debug  ][Logger] Periodic Thread PThread stopping... done.
> 1.216 [ Debug  ][~PeriodicThread] Terminating PThread
> 
> On the serial console we get the following listing (complete listing in
> appendix):
> 
> Xenomai: starting native API services.
> I-pipe: Detected illicit call from domain 'Xenomai'
>         into a service reserved for domain 'Linux' and below.
>        f635be74 00000000 00000000 52544149 f635be98 c0104789 c02cfa4f
> c02f5b80
>        f6c4e2f0 f635beb0 c0137d69 c02c256c c02c1186 c02c01b8 f8c0b280
> f635bebc
>        c0132981 f60a1730 f635bed8 f8bd8570 c010ef8c 00000000 f60a0120
> f8beefe0
> Call Trace:
>  [<c0103ffb>] show_trace_log_lvl+0x1f/0x35
>  [<c01040bb>] show_stack_log_lvl+0xaa/0xcf
>  [<c0104789>] show_stack+0x2f/0x36
>  [<c0137d69>] ipipe_check_context+0x7a/0x81
>  [<c0132981>] module_put+0x19/0x7d
>  [<f8bd8570>] xnshadow_unmap+0xbc/0xff [xeno_nucleus]
>  [<f8bfdc75>] __shadow_delete_hook+0x25/0x27 [xeno_native]
>  [<f8bd1454>] xnpod_delete_thread+0x1b9/0x2aa [xeno_nucleus]
>  [<f8bfc36b>] rt_task_delete+0x140/0x145 [xeno_native]
>  [<f8bfe02a>] __rt_task_delete+0x58/0x69 [xeno_native]
>  [<f8bd8165>] hisyscall_event+0x185/0x291 [xeno_nucleus]
>  [<c0138940>] __ipipe_dispatch_event+0xc0/0x1da
>  [<c010ed6b>] __ipipe_syscall_root+0x43/0x10a
>  [<c0102e79>] system_call+0x29/0x41
>  =======================

That specific Xenomai bug should be fixed in 2.4, please check your
testcase against -rc1 e.g. Unfortunately we have no backport of the fix
in 2.3 yet. Can't tell right now if this is tricky, but this test
demonstrates that $SOMETHING should be done...

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 250 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Xenomai-help] hard lock-up
  2007-08-27 16:55                             ` Jan Kiszka
@ 2007-08-28 10:06                               ` andy motten
  2007-08-28 11:32                                 ` Jan Kiszka
  2007-08-29  6:11                               ` Jan Kiszka
  1 sibling, 1 reply; 25+ messages in thread
From: andy motten @ 2007-08-28 10:06 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai


[-- Attachment #1.1: Type: text/plain, Size: 2647 bytes --]

>
> That specific Xenomai bug should be fixed in 2.4, please check your
> testcase against -rc1 e.g. Unfortunately we have no backport of the fix
> in 2.3 yet. Can't tell right now if this is tricky, but this test
> demonstrates that $SOMETHING should be done...
>
> Jan


We have tried xenomai 2.4 rc1 (last changed revision 2865) with linux kernel
2.6.20.9 (ipipe 1.8-06) and linux kernel 2.6.22.1 (ipipe 1.9-01).
The orocos tests do not work with this release either:

    The following tests FAILED:
        2 - task-test (OTHER-FAULT)
        3 - event-test (OTHER-FAULT)
        4 - taskcontext-test (OTHER-FAULT)

And we get the following message after a single test: Killed. The OROCOS
messages are the same as with previous versions.
This time there is no extra information on the serial console after running
the orocos tests (except "cleaning up sem" and "cleaning up mutex").

There is however a bug indication on the serial console after loading
xeno-native (see listing below, complete listing in appendix). Can this have
anything to do with the failure of the orocos tests?

I-pipe: Domain Xenomai registered.
Xenomai: hal/x86 started.
I-pipe: Domain IShield registered.
Xenomai: real-time nucleus v2.4-rc1 (Bells Of Lal) loaded.
Xenomai: SMI-enabled chipset found, enabling SMI workaround.
Xenomai: SMI workaround failed!
Xenomai: starting native API services.
BUG: sleeping function called from invalid context at mm/slab.c:3024
in_atomic():0, irqs_disabled():1
 [<c0103ab8>] show_trace_log_lvl+0x1f/0x35
 [<c01047fd>] show_trace+0x17/0x19
 [<c0104911>] dump_stack+0x1a/0x1c
 [<c0112a2a>] __might_sleep+0xc0/0xd0
 [<c0158edc>] kmem_cache_alloc+0xbc/0xdb
 [<c016d206>] d_alloc+0x23/0x190
 [<c01638ab>] do_lookup+0x117/0x168
 [<c01651aa>] __link_path_walk+0x7cb/0xcd7
 [<c01656ff>] link_path_walk+0x49/0xc4
 [<c0165797>] path_walk+0x1d/0x1f
 [<c0165945>] do_path_lookup+0x7d/0x1b3
 [<c016628a>] __user_walk_fd+0x37/0x4f
 [<c015fbb8>] vfs_lstat_fd+0x1d/0x43
 [<c015fc58>] vfs_lstat+0x16/0x18
 [<c015fc73>] sys_lstat64+0x19/0x2d
 [<c0102979>] sysenter_past_esp+0x6e/0x72
 =======================
WARNING: at kernel/softirq.c:138 local_bh_enable()
 [<c0103ab8>] show_trace_log_lvl+0x1f/0x35
 [<c01047fd>] show_trace+0x17/0x19
 [<c0104911>] dump_stack+0x1a/0x1c
 [<c011b2a3>] local_bh_enable+0xa3/0xb2
 [<c023d6df>] lock_sock_nested+0xbe/0xc6
 [<c023ae21>] sock_fasync+0x46/0x14d
 [<c023c2ca>] sock_close+0x1e/0x42
 [<c015d5fd>] __fput+0x62/0x15c
 [<c015d759>] fput+0x1d/0x1f
 [<c015afb2>] filp_close+0x46/0x6c
 [<c015c0de>] sys_close+0x6f/0xb7
 [<c0102979>] sysenter_past_esp+0x6e/0x72
 =======================

greetings, andy

[-- Attachment #1.2: Type: text/html, Size: 3561 bytes --]

[-- Attachment #2: linux-2.6.22.1-ipipe-1.9-01.zip --]
[-- Type: application/zip, Size: 4484 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Xenomai-help] hard lock-up
  2007-08-28 10:06                               ` andy motten
@ 2007-08-28 11:32                                 ` Jan Kiszka
  2007-08-29 11:36                                   ` andy motten
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Kiszka @ 2007-08-28 11:32 UTC (permalink / raw)
  To: andy motten; +Cc: Xenomai

[-- Attachment #1: Type: text/plain, Size: 3496 bytes --]

andy motten wrote:
>> That specific Xenomai bug should be fixed in 2.4, please check your
>> testcase against -rc1 e.g. Unfortunately we have no backport of the fix
>> in 2.3 yet. Can't tell right now if this is tricky, but this test
>> demonstrates that $SOMETHING should be done...
>>
>> Jan
> 
> 
> We have tried xenomai 2.4 rc1 (last changed revision 2865) with linux kernel
> 2.6.20.9 (ipipe 1.8-06) and linux kernel 2.6.22.1 (ipipe 1.9-01).
> The orocos tests do not work with this release either:
> 
>     The following tests FAILED:
>         2 - task-test (OTHER-FAULT)
>         3 - event-test (OTHER-FAULT)
>         4 - taskcontext-test (OTHER-FAULT)
> 
> And we get the following message after a single test: Killed. The OROCOS
> messages are the same as with previous versions.
> This time there is no extra information on the serial console after running
> the orocos tests (except "cleaning up sem" and "cleaning up mutex").

So your test apps are stopping due to some segfault or so? Is the behaviour
the same with both kernels? Can you try to catch the problem with gdb
(to see what causes the termination)? That may only make sense if the
kernel bug below is not visible on a specific setup, though.

> 
> There is however a bug indication on the serial console after loading
> xeno-native (see listing below, complete listing in appendix). Can this have
> anything to do with the failure of the orocos tests?
> 
> I-pipe: Domain Xenomai registered.
> Xenomai: hal/x86 started.
> I-pipe: Domain IShield registered.
> Xenomai: real-time nucleus v2.4-rc1 (Bells Of Lal) loaded.
> Xenomai: SMI-enabled chipset found, enabling SMI workaround.
> Xenomai: SMI workaround failed!
> Xenomai: starting native API services.
> BUG: sleeping function called from invalid context at mm/slab.c:3024
> in_atomic():0, irqs_disabled():1
>  [<c0103ab8>] show_trace_log_lvl+0x1f/0x35
>  [<c01047fd>] show_trace+0x17/0x19
>  [<c0104911>] dump_stack+0x1a/0x1c
>  [<c0112a2a>] __might_sleep+0xc0/0xd0
>  [<c0158edc>] kmem_cache_alloc+0xbc/0xdb
>  [<c016d206>] d_alloc+0x23/0x190
>  [<c01638ab>] do_lookup+0x117/0x168
>  [<c01651aa>] __link_path_walk+0x7cb/0xcd7
>  [<c01656ff>] link_path_walk+0x49/0xc4
>  [<c0165797>] path_walk+0x1d/0x1f
>  [<c0165945>] do_path_lookup+0x7d/0x1b3
>  [<c016628a>] __user_walk_fd+0x37/0x4f
>  [<c015fbb8>] vfs_lstat_fd+0x1d/0x43
>  [<c015fc58>] vfs_lstat+0x16/0x18
>  [<c015fc73>] sys_lstat64+0x19/0x2d
>  [<c0102979>] sysenter_past_esp+0x6e/0x72
>  =======================
> WARNING: at kernel/softirq.c:138 local_bh_enable()
>  [<c0103ab8>] show_trace_log_lvl+0x1f/0x35
>  [<c01047fd>] show_trace+0x17/0x19
>  [<c0104911>] dump_stack+0x1a/0x1c
>  [<c011b2a3>] local_bh_enable+0xa3/0xb2
>  [<c023d6df>] lock_sock_nested+0xbe/0xc6
>  [<c023ae21>] sock_fasync+0x46/0x14d
>  [<c023c2ca>] sock_close+0x1e/0x42
>  [<c015d5fd>] __fput+0x62/0x15c
>  [<c015d759>] fput+0x1d/0x1f
>  [<c015afb2>] filp_close+0x46/0x6c
>  [<c015c0de>] sys_close+0x6f/0xb7
>  [<c0102979>] sysenter_past_esp+0x6e/0x72
>  =======================

There is a pending Linux-IRQ-state issue for at least 2.6.22. I came
across it 2 weeks ago, but no one had the time to dig deeper so far (see
also xenomai-core). However, I did not see this with 2.6.20. Is your
observation kernel-version-independent? Maybe it's a race that I just
miss with my 2.6.20 kernel (it was already tricky to reproduce with
2.6.22).

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 250 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Xenomai-help] hard lock-up
  2007-08-27 16:55                             ` Jan Kiszka
  2007-08-28 10:06                               ` andy motten
@ 2007-08-29  6:11                               ` Jan Kiszka
  2007-08-29 13:40                                 ` andy motten
  1 sibling, 1 reply; 25+ messages in thread
From: Jan Kiszka @ 2007-08-29  6:11 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai


[-- Attachment #1.1: Type: text/plain, Size: 5109 bytes --]

Jan Kiszka wrote:
> andy motten wrote:
>>> I will run several tests continuously on the pc (including "latency -f")
>>> for the rest of this week, since I am not in the office during this period
>>> (and so not in the neighborhood of this problematic pc).
>>> I hope (not in vain) that the latency tracer will give us a hint about the
>>> reason for the hard lock-ups (if a hard lock-up happens during this
>>> period).
>>>
>>> andy
>>>
>>
>> Hello,
>>
>> Since we are having a hard time tracking down the hard lock-ups, we have
>> taken a closer look at the failed tests of orocos (maybe the source of the
>> problem is the same). These failures occur during the make check execution.
>>
>>     The following tests FAILED:
>>         2 - task-test (OTHER-FAULT)
>>         3 - event-test (OTHER-FAULT)
>>         4 - taskcontext-test (OTHER-FAULT)
>>
>> When we run a single test, e.g. task-test, we get the following message:
>> Killed
>> The OROCOS messages are then:
>>
>> 0.000 [ Info   ][Logger] Successfully extracted environment variable
>> ORO_LOGLEVEL
>> 0.001 [ Info   ][Logger]  OROCOS version '1.2.1' compiled with GCC
>> 4.1.2.Orocos Logging Activated at level : [ Debug  ] ( 6 )
>> 0.001 [ Info   ][Logger] Reference System Time is : 880886725351 ticks (
>> 315.369 seconds ).
>> 0.002 [ Info   ][Logger] Logging is relative to this time.
>> 0.002 [ Info   ][Logger] Xenomai Periodic Timer runs in preemptive
>> 'one-shot' mode.
>> 0.003 [ Debug  ][Logger] Xenomai Timer and Main Task Created
>> 0.003 [ Debug  ][Logger] MainThread started.
>> 0.003 [ Debug  ][Logger] Starting StartStopManager.
>> 0.004 [ Info   ][Toolkit] Loading Tool RealTime.
>> 0.005 [ Debug  ][Toolkit] Registered Type 'int' to the Orocos Type System.
>> 0.005 [ Debug  ][Toolkit] Registered Type 'uint' to the Orocos Type System.
>> 0.006 [ Debug  ][Toolkit] Registered Type 'double' to the Orocos Type
>> System.
>> 0.006 [ Debug  ][Toolkit] Registered Type 'bool' to the Orocos Type System.
>> 0.006 [ Debug  ][Toolkit] Registered Type 'PropertyBag' to the Orocos Type
>> System.
>> 0.007 [ Debug  ][Toolkit] Registered Type 'float' to the Orocos Type System.
>> 0.007 [ Debug  ][Toolkit] Registered Type 'char' to the Orocos Type System.
>> 0.008 [ Debug  ][Toolkit] Registered Type 'array' to the Orocos Type System.
>> 0.008 [ Debug  ][Toolkit] Registered Type 'string' to the Orocos Type
>> System.
>> 0.010 [ Debug  ][./task-test::main()] ORO_main starting...
>> 0.010 [ Info   ][./task-test::main()] LogLevel unaltered by test-runner.
>> 0.011 [ Info   ][./task-test::main()] Creating PeriodicThread for scheduler:
>> 0
>> 0.012 [ Info   ][TimerThreadInstance] PeriodicThread created with scheduler
>> type '0', priority 15 and period 0.01.
>> 0.013 [ Debug  ][Logger] Periodic Thread TimerThreadInstance started.
>> 0.014 [ Info   ][PThread] PeriodicThread created with scheduler type '0',
>> priority 99 and period 0.1.
>> 0.014 [ Debug  ][Logger] Periodic Thread PThread started.
>> 0.115 [ Debug  ][Logger] Periodic Thread PThread stopping... done.
>> 0.115 [ Debug  ][Logger] Periodic Thread PThread started.
>> 1.216 [ Debug  ][Logger] Periodic Thread PThread stopping... done.
>> 1.216 [ Debug  ][~PeriodicThread] Terminating PThread
>>
>> On the serial console we get the following listing (complete listing in
>> appendix):
>>
>> Xenomai: starting native API services.
>> I-pipe: Detected illicit call from domain 'Xenomai'
>>         into a service reserved for domain 'Linux' and below.
>>        f635be74 00000000 00000000 52544149 f635be98 c0104789 c02cfa4f
>> c02f5b80
>>        f6c4e2f0 f635beb0 c0137d69 c02c256c c02c1186 c02c01b8 f8c0b280
>> f635bebc
>>        c0132981 f60a1730 f635bed8 f8bd8570 c010ef8c 00000000 f60a0120
>> f8beefe0
>> Call Trace:
>>  [<c0103ffb>] show_trace_log_lvl+0x1f/0x35
>>  [<c01040bb>] show_stack_log_lvl+0xaa/0xcf
>>  [<c0104789>] show_stack+0x2f/0x36
>>  [<c0137d69>] ipipe_check_context+0x7a/0x81
>>  [<c0132981>] module_put+0x19/0x7d
>>  [<f8bd8570>] xnshadow_unmap+0xbc/0xff [xeno_nucleus]
>>  [<f8bfdc75>] __shadow_delete_hook+0x25/0x27 [xeno_native]
>>  [<f8bd1454>] xnpod_delete_thread+0x1b9/0x2aa [xeno_nucleus]
>>  [<f8bfc36b>] rt_task_delete+0x140/0x145 [xeno_native]
>>  [<f8bfe02a>] __rt_task_delete+0x58/0x69 [xeno_native]
>>  [<f8bd8165>] hisyscall_event+0x185/0x291 [xeno_nucleus]
>>  [<c0138940>] __ipipe_dispatch_event+0xc0/0x1da
>>  [<c010ed6b>] __ipipe_syscall_root+0x43/0x10a
>>  [<c0102e79>] system_call+0x29/0x41
>>  =======================
> 
> That specific Xenomai bug should be fixed in 2.4, please check your
> testcase against -rc1 e.g. Unfortunately we have no backport of the fix
> in 2.3 yet. Can't tell right now if this is tricky, but this test
> demonstrates that $SOMETHING should be done...

OK, in order to start fixing things: Here comes a back-port of the 2.4
patch to 2.3.x-SVN, moving module_put out of RT context. Be warned, it's
an early-morning hack, not even compile-tested. Feedback welcome!

Thanks,
Jan
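
The idea described above is that code running in the real-time domain must
not call a Linux-only service such as module_put() directly. Instead it
posts a request, and a handler running in Linux context processes it later.
Below is a minimal user-space sketch of that pattern in C. All names
(lo_queue, lo_post, lo_drain, linux_only_service) are hypothetical stand-ins
and not the Xenomai API. The single-threaded ring buffer also leaves out the
locking and per-CPU handling that real kernel code would need.

```c
#include <assert.h>

#define RING_SIZE 8u /* hypothetical ring size; must be a power of two */

/* Hypothetical request types, loosely modelled on a "lo-stage" queue. */
enum req_type { REQ_WAKEUP = 1, REQ_UNMAP = 2 };

struct request {
    enum req_type type;
    unsigned arg;            /* payload, e.g. an identifier to release */
};

struct lo_queue {
    struct request req[RING_SIZE];
    unsigned in;             /* total requests posted */
    unsigned out;            /* total requests drained */
};

/* Stand-in for a service that is only legal in Linux context. */
static int linux_only_calls;
static unsigned last_arg;

static void linux_only_service(unsigned arg)
{
    linux_only_calls++;
    last_arg = arg;
}

/*
 * "Real-time" side: record the request and return at once.
 * Returns 0 on success, -1 if the ring is full.
 */
static int lo_post(struct lo_queue *q, enum req_type type, unsigned arg)
{
    struct request *r;

    if (q->in - q->out == RING_SIZE)
        return -1;

    r = &q->req[q->in % RING_SIZE];
    r->type = type;
    r->arg = arg;
    q->in++;
    return 0;
}

/*
 * "Linux" side: run later, when calling Linux-only services is safe.
 * Returns the number of requests handled.
 */
static int lo_drain(struct lo_queue *q)
{
    int handled = 0;

    while (q->out != q->in) {
        struct request *r = &q->req[q->out % RING_SIZE];

        q->out++;
        if (r->type == REQ_UNMAP)
            linux_only_service(r->arg);   /* the deferred call */
        handled++;
    }
    return handled;
}

/* Usage: post from the "RT" side, check nothing ran yet, then drain. */
static int demo(void)
{
    struct lo_queue q = { .in = 0, .out = 0 };

    assert(lo_post(&q, REQ_UNMAP, 0x1234u) == 0);
    assert(linux_only_calls == 0);   /* nothing executed at post time */
    return lo_drain(&q);             /* the deferred call runs here */
}
```

The point of the split is that lo_post() does nothing but store data, so it
is safe from any context, while everything with Linux-side effects is
confined to lo_drain().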

[-- Attachment #1.2: postpone-module_put.patch --]
[-- Type: text/plain, Size: 4400 bytes --]

Index: xenomai-2.3.x/ChangeLog
===================================================================
--- xenomai-2.3.x/ChangeLog	(Revision 2954)
+++ xenomai-2.3.x/ChangeLog	(Arbeitskopie)
@@ -1,3 +1,8 @@
+2007-08-29  Jan Kiszka  <jan.kiszka@domain.hid>
+
+	* ksrc/nucleus/shadow.c: Postpone module_put() to the lo-stage
+	APC handler (back-ported from 2.4).
+
 2007-08-24  Wolfgang Grandegger  <wg@domain.hid>
 
 	* ksrc/drivers/can/rtcan_socket.c: protect the list of RTCAN
Index: xenomai-2.3.x/ksrc/nucleus/shadow.c
===================================================================
--- xenomai-2.3.x/ksrc/nucleus/shadow.c	(Revision 2954)
+++ xenomai-2.3.x/ksrc/nucleus/shadow.c	(Arbeitskopie)
@@ -99,6 +99,7 @@ static struct __lostagerq {
 #define LO_RENICE_REQ 2
 #define LO_SIGGRP_REQ 3
 #define LO_SIGTHR_REQ 4
+#define LO_UNMAP_REQ  5
 		int type;
 		struct task_struct *task;
 		int arg;
@@ -753,6 +754,28 @@ void xnshadow_reset_shield(void)
 
 #endif /* CONFIG_XENO_OPT_ISHIELD */
 
+static void xnshadow_dereference_skin(unsigned magic)
+{
+	unsigned muxid;
+
+	for (muxid = 0; muxid < XENOMAI_MUX_NR; muxid++) {
+		if (muxtable[muxid].magic == magic) {
+			if (xnarch_atomic_dec_and_test(&muxtable[0].refcnt))
+				xnarch_atomic_dec(&muxtable[0].refcnt);
+			if (xnarch_atomic_dec_and_test(&muxtable[muxid].refcnt))
+
+				/* We were the last thread, decrement the counter,
+				   since it was incremented by the xn_sys_bind
+				   operation. */
+				xnarch_atomic_dec(&muxtable[muxid].refcnt);
+			if (muxtable[muxid].module)
+				module_put(muxtable[muxid].module);
+
+			break;
+		}
+	}
+}
+
 static void lostage_handler(void *cookie)
 {
 	int cpuid = smp_processor_id(), reqnum, sig;
@@ -777,6 +800,12 @@ static void lostage_handler(void *cookie
 
 			goto do_wakeup;
 
+		case LO_UNMAP_REQ:
+
+			xnshadow_dereference_skin(
+				(unsigned)rq->req[reqnum].arg);
+
+		/* fall through */
 		case LO_WAKEUP_REQ:
 
 			/* We need to downgrade the root thread
@@ -1256,7 +1285,6 @@ int xnshadow_map(xnthread_t *thread, xnc
 void xnshadow_unmap(xnthread_t *thread)
 {
 	struct task_struct *p;
-	unsigned muxid, magic;
 
 	if (XENO_DEBUG(NUCLEUS) &&
 	    !testbits(xnpod_current_sched()->status, XNKCOUT))
@@ -1264,25 +1292,6 @@ void xnshadow_unmap(xnthread_t *thread)
 
 	p = xnthread_archtcb(thread)->user_task;
 
-	magic = xnthread_get_magic(thread);
-
-	for (muxid = 0; muxid < XENOMAI_MUX_NR; muxid++) {
-		if (muxtable[muxid].magic == magic) {
-			if (xnarch_atomic_dec_and_test(&muxtable[0].refcnt))
-				xnarch_atomic_dec(&muxtable[0].refcnt);
-			if (xnarch_atomic_dec_and_test(&muxtable[muxid].refcnt))
-
-				/* We were the last thread, decrement the counter,
-				   since it was incremented by the xn_sys_bind
-				   operation. */
-				xnarch_atomic_dec(&muxtable[muxid].refcnt);
-			if (muxtable[muxid].module)
-				module_put(muxtable[muxid].module);
-
-			break;
-		}
-	}
-
 	xnthread_clear_state(thread, XNMAPPED);
 	rpi_pop(thread);
 
@@ -1298,13 +1307,7 @@ void xnshadow_unmap(xnthread_t *thread)
 
 	xnshadow_thrptd(p) = NULL;
 
-	if (p->state != TASK_RUNNING)
-		/* If the shadow is being unmapped in primary mode or blocked
-		   in secondary mode, the associated Linux task should also
-		   die. In the former case, the zombie Linux side returning to
-		   user-space will be trapped and exited inside the pod's
-		   rescheduling routines. */
-		schedule_linux_call(LO_WAKEUP_REQ, p, 0);
+	schedule_linux_call(LO_UNMAP_REQ, p, xnthread_get_magic(thread));
 }
 
 int xnshadow_wait_barrier(struct pt_regs *regs)
@@ -2010,6 +2013,7 @@ RTHAL_DECLARE_EVENT(losyscall_event);
 static inline void do_taskexit_event(struct task_struct *p)
 {
 	xnthread_t *thread = xnshadow_thread(p); /* p == current */
+	unsigned magic;
 	spl_t s;
 
 	if (!thread)
@@ -2018,6 +2022,8 @@ static inline void do_taskexit_event(str
 	if (xnpod_shadow_p())
 		xnshadow_relax(0);
 
+	magic = xnthread_get_magic(thread);
+
 	xnlock_get_irqsave(&nklock, s);
 	/* Prevent wakeup call from xnshadow_unmap(). */
 	xnshadow_thrptd(p) = NULL;
@@ -2028,6 +2034,7 @@ static inline void do_taskexit_event(str
 	xnlock_put_irqrestore(&nklock, s);
 	xnpod_schedule();
 
+	xnshadow_dereference_skin(magic);
 	xnltt_log_event(xeno_ev_shadowexit, thread->name);
 }
 
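The deferral this patch performs can be illustrated outside the kernel. What follows is a minimal user-space sketch in C, not Xenomai code: `struct lo_queue`, `schedule_linux_call_sketch()`, `lostage_handler_sketch()` and the two counters are hypothetical stand-ins, and the ring size is an assumption. It only shows the idea that the real-time side records an unmap request, while the Linux-side handler later drops the module reference and then falls through to the wake-up path.

```c
#include <assert.h>

/* Request types, modelled on the LO_*_REQ constants in the patch. */
enum lo_req_type { LO_WAKEUP_REQ = 1, LO_UNMAP_REQ = 5 };

#define LO_MAX_REQUESTS 8	/* assumed ring size, must be a power of two */

struct lo_request {
	enum lo_req_type type;
	int task_id;		/* stands in for struct task_struct * */
	unsigned arg;		/* skin magic for LO_UNMAP_REQ */
};

struct lo_queue {
	struct lo_request req[LO_MAX_REQUESTS];
	unsigned in, out;	/* free-running producer/consumer indices */
};

/* Counters standing in for the real side effects. */
static int module_puts;	/* would be module_put() on the skin module */
static int wakeups;	/* would be the Linux task wake-up */

/* Real-time side: only records the request, touches no Linux service. */
static void schedule_linux_call_sketch(struct lo_queue *q,
				       enum lo_req_type type,
				       int task_id, unsigned arg)
{
	assert(q->in - q->out < LO_MAX_REQUESTS);	/* queue not full */
	q->req[q->in & (LO_MAX_REQUESTS - 1)] =
		(struct lo_request){ type, task_id, arg };
	q->in++;
}

/* Linux side: drains the queue in a context where module_put() is legal. */
static void lostage_handler_sketch(struct lo_queue *q)
{
	while (q->out != q->in) {
		struct lo_request *rq = &q->req[q->out & (LO_MAX_REQUESTS - 1)];

		q->out++;
		switch (rq->type) {
		case LO_UNMAP_REQ:
			module_puts++;	/* deferred reference drop */
			/* fall through */
		case LO_WAKEUP_REQ:
			wakeups++;
			break;
		default:
			break;
		}
	}
}
```

Queuing an `LO_UNMAP_REQ` leaves `module_puts` untouched until `lostage_handler_sketch()` runs, which is the property the back-port relies on. The per-CPU handling and locking of the real handler are not modelled here.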

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 250 bytes --]


* Re: [Xenomai-help] hard lock-up
  2007-08-28 11:32                                 ` Jan Kiszka
@ 2007-08-29 11:36                                   ` andy motten
  0 siblings, 0 replies; 25+ messages in thread
From: andy motten @ 2007-08-29 11:36 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai

[-- Attachment #1: Type: text/plain, Size: 3976 bytes --]

>
> > We have tried xenomai 2.4 rc1 (last changed revision 2865) with linux kernel
> > 2.6.20.9 (ipipe 1.8-06) and linux kernel 2.6.22.1 (ipipe 1.9-01).
> > The orocos tests are also not working with this release:
> >
> >     The following tests FAILED:
> >         2 - task-test (OTHER-FAULT)
> >         3 - event-test (OTHER-FAULT)
> >         4 - taskcontext-test (OTHER-FAULT)
> >
> > Running a single test prints only the message "Killed". The
> > OROCOS messages are the same as with previous versions.
> > This time there is no extra information on the serial console after running
> > the orocos tests (except "cleaning up sem" and "cleaning up mutex").
>
> So your test apps are stopping due to some segfault or so? Is the behaviour
> the same on both kernels? Can you try to catch the problem with gdb
> (to see what causes the termination)? That may only make sense if the
> kernel bug below is not visible on a specific setup, though.

The only information provided by the backtrace of gdb is:

        (gdb)
        Cannot fetch general-purpose registers for thread -1237948736: generic error
        Cannot fetch general-purpose registers for thread -1237948736: generic error

We are unsure about the purpose of some of the Xenomai function calls used in
the Orocos OS layer
(http://svn.mech.kuleuven.be/websvn/orocos/trunk/rtt/src/os/xenomai/fosi_internal.cpp).
We will wait until the author comes back from holiday to check the code in
more detail.

>
> > There is however a bug indication on the serial console after loading
> > xeno-native (see listing below, complete listing in appendix). Can this have
> > anything to do with the failure of the orocos tests?
> >
> > I-pipe: Domain Xenomai registered.
> > Xenomai: hal/x86 started.
> > I-pipe: Domain IShield registered.
> > Xenomai: real-time nucleus v2.4-rc1 (Bells Of Lal) loaded.
> > Xenomai: SMI-enabled chipset found, enabling SMI workaround.
> > Xenomai: SMI workaround failed!
> > Xenomai: starting native API services.
> > BUG: sleeping function called from invalid context at mm/slab.c:3024
> > in_atomic():0, irqs_disabled():1
> >  [<c0103ab8>] show_trace_log_lvl+0x1f/0x35
> >  [<c01047fd>] show_trace+0x17/0x19
> >  [<c0104911>] dump_stack+0x1a/0x1c
> >  [<c0112a2a>] __might_sleep+0xc0/0xd0
> >  [<c0158edc>] kmem_cache_alloc+0xbc/0xdb
> >  [<c016d206>] d_alloc+0x23/0x190
> >  [<c01638ab>] do_lookup+0x117/0x168
> >  [<c01651aa>] __link_path_walk+0x7cb/0xcd7
> >  [<c01656ff>] link_path_walk+0x49/0xc4
> >  [<c0165797>] path_walk+0x1d/0x1f
> >  [<c0165945>] do_path_lookup+0x7d/0x1b3
> >  [<c016628a>] __user_walk_fd+0x37/0x4f
> >  [<c015fbb8>] vfs_lstat_fd+0x1d/0x43
> >  [<c015fc58>] vfs_lstat+0x16/0x18
> >  [<c015fc73>] sys_lstat64+0x19/0x2d
> >  [<c0102979>] sysenter_past_esp+0x6e/0x72
> >  =======================
> > WARNING: at kernel/softirq.c:138 local_bh_enable()
> >  [<c0103ab8>] show_trace_log_lvl+0x1f/0x35
> >  [<c01047fd>] show_trace+0x17/0x19
> >  [<c0104911>] dump_stack+0x1a/0x1c
> >  [<c011b2a3>] local_bh_enable+0xa3/0xb2
> >  [<c023d6df>] lock_sock_nested+0xbe/0xc6
> >  [<c023ae21>] sock_fasync+0x46/0x14d
> >  [<c023c2ca>] sock_close+0x1e/0x42
> >  [<c015d5fd>] __fput+0x62/0x15c
> >  [<c015d759>] fput+0x1d/0x1f
> >  [<c015afb2>] filp_close+0x46/0x6c
> >  [<c015c0de>] sys_close+0x6f/0xb7
> >  [<c0102979>] sysenter_past_esp+0x6e/0x72
> >  =======================
>
> There is a pending Linux-IRQ-state issue for at least 2.6.22. I came
> across it 2 weeks ago, but no one had the time to dig deeper so far (see
> also xenomai-core). However, I did not see this with 2.6.20. Is your
> observation kernel-version-independent? Maybe it's a race that I just
> miss with my 2.6.20 kernel (it was already tricky to reproduce with
> 2.6.22).
>

This observation is kernel-independent: it happens with both Linux kernel
2.6.20.9 (ipipe 1.8-06) and Linux kernel 2.6.22.1 (ipipe 1.9-01).
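
As background to the first warning quoted above: the message prints the two flags the check looks at, and in this listing `in_atomic()` is 0 while `irqs_disabled()` is 1, so the interrupt state alone triggers it. A minimal sketch of that predicate in plain user-space C follows. `struct cpu_state` and `might_sleep_violation()` are hypothetical names, and the kernel's real `__might_sleep()` applies further conditions that are left out here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the two flags printed by the warning. */
struct cpu_state {
	bool in_atomic;		/* running in an atomic section */
	bool irqs_disabled;	/* Linux sees interrupts as disabled */
};

/* Returns true when a call that may sleep would be reported. */
static bool might_sleep_violation(const struct cpu_state *s)
{
	assert(s);
	return s->in_atomic || s->irqs_disabled;
}
```

With the values from the listing (`in_atomic` false, `irqs_disabled` true) the function reports a violation; with both flags clear it does not.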

greetings, andy


* Re: [Xenomai-help] hard lock-up
  2007-08-29  6:11                               ` Jan Kiszka
@ 2007-08-29 13:40                                 ` andy motten
  2007-08-29 14:12                                   ` Jan Kiszka
  0 siblings, 1 reply; 25+ messages in thread
From: andy motten @ 2007-08-29 13:40 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai

[-- Attachment #1: Type: text/plain, Size: 10787 bytes --]

> Jan Kiszka wrote:
> > andy motten wrote:
> >>> I will run several tests continuously on the PC (including "latency -f")
> >>> for the rest of this week, since I am not in the office during this period
> >>> (and so not near this problematic PC).
> >>> I hope (and hopefully not in vain) that the latency tracer will give us a
> >>> hint about the cause of the hard lock-ups (if one happens during
> >>> this period).
> >>>
> >>> andy
> >>>
> >>
> >> Hello,
> >>
> >> Since we are having a hard time tracking down the hard lock-ups, we have
> >> taken a closer look at the failed Orocos tests (maybe the source of the
> >> problem is the same). These failures occur while running make check.
> >>
> >>     The following tests FAILED:
> >>         2 - task-test (OTHER-FAULT)
> >>         3 - event-test (OTHER-FAULT)
> >>         4 - taskcontext-test (OTHER-FAULT)
> >>
> >> When we run a single test, e.g. task-test, we get the following
> >> message: Killed
> >> The OROCOS messages are then:
> >>
> >> 0.000 [ Info   ][Logger] Successfully extracted environment variable ORO_LOGLEVEL
> >> 0.001 [ Info   ][Logger]  OROCOS version '1.2.1' compiled with GCC 4.1.2.Orocos Logging Activated at level : [ Debug  ] ( 6 )
> >> 0.001 [ Info   ][Logger] Reference System Time is : 880886725351 ticks ( 315.369 seconds ).
> >> 0.002 [ Info   ][Logger] Logging is relative to this time.
> >> 0.002 [ Info   ][Logger] Xenomai Periodic Timer runs in preemptive 'one-shot' mode.
> >> 0.003 [ Debug  ][Logger] Xenomai Timer and Main Task Created
> >> 0.003 [ Debug  ][Logger] MainThread started.
> >> 0.003 [ Debug  ][Logger] Starting StartStopManager.
> >> 0.004 [ Info   ][Toolkit] Loading Tool RealTime.
> >> 0.005 [ Debug  ][Toolkit] Registered Type 'int' to the Orocos Type System.
> >> 0.005 [ Debug  ][Toolkit] Registered Type 'uint' to the Orocos Type System.
> >> 0.006 [ Debug  ][Toolkit] Registered Type 'double' to the Orocos Type System.
> >> 0.006 [ Debug  ][Toolkit] Registered Type 'bool' to the Orocos Type System.
> >> 0.006 [ Debug  ][Toolkit] Registered Type 'PropertyBag' to the Orocos Type System.
> >> 0.007 [ Debug  ][Toolkit] Registered Type 'float' to the Orocos Type System.
> >> 0.007 [ Debug  ][Toolkit] Registered Type 'char' to the Orocos Type System.
> >> 0.008 [ Debug  ][Toolkit] Registered Type 'array' to the Orocos Type System.
> >> 0.008 [ Debug  ][Toolkit] Registered Type 'string' to the Orocos Type System.
> >> 0.010 [ Debug  ][./task-test::main()] ORO_main starting...
> >> 0.010 [ Info   ][./task-test::main()] LogLevel unaltered by test-runner.
> >> 0.011 [ Info   ][./task-test::main()] Creating PeriodicThread for scheduler: 0
> >> 0.012 [ Info   ][TimerThreadInstance] PeriodicThread created with scheduler type '0', priority 15 and period 0.01.
> >> 0.013 [ Debug  ][Logger] Periodic Thread TimerThreadInstance started.
> >> 0.014 [ Info   ][PThread] PeriodicThread created with scheduler type '0', priority 99 and period 0.1.
> >> 0.014 [ Debug  ][Logger] Periodic Thread PThread started.
> >> 0.115 [ Debug  ][Logger] Periodic Thread PThread stopping... done.
> >> 0.115 [ Debug  ][Logger] Periodic Thread PThread started.
> >> 1.216 [ Debug  ][Logger] Periodic Thread PThread stopping... done.
> >> 1.216 [ Debug  ][~PeriodicThread] Terminating PThread
> >>
> >> On the serial console we get the following listing (complete listing in
> >> appendix):
> >>
> >> Xenomai: starting native API services.
> >> I-pipe: Detected illicit call from domain 'Xenomai'
> >>         into a service reserved for domain 'Linux' and below.
> >>        f635be74 00000000 00000000 52544149 f635be98 c0104789 c02cfa4f c02f5b80
> >>        f6c4e2f0 f635beb0 c0137d69 c02c256c c02c1186 c02c01b8 f8c0b280 f635bebc
> >>        c0132981 f60a1730 f635bed8 f8bd8570 c010ef8c 00000000 f60a0120 f8beefe0
> >> Call Trace:
> >>  [<c0103ffb>] show_trace_log_lvl+0x1f/0x35
> >>  [<c01040bb>] show_stack_log_lvl+0xaa/0xcf
> >>  [<c0104789>] show_stack+0x2f/0x36
> >>  [<c0137d69>] ipipe_check_context+0x7a/0x81
> >>  [<c0132981>] module_put+0x19/0x7d
> >>  [<f8bd8570>] xnshadow_unmap+0xbc/0xff [xeno_nucleus]
> >>  [<f8bfdc75>] __shadow_delete_hook+0x25/0x27 [xeno_native]
> >>  [<f8bd1454>] xnpod_delete_thread+0x1b9/0x2aa [xeno_nucleus]
> >>  [<f8bfc36b>] rt_task_delete+0x140/0x145 [xeno_native]
> >>  [<f8bfe02a>] __rt_task_delete+0x58/0x69 [xeno_native]
> >>  [<f8bd8165>] hisyscall_event+0x185/0x291 [xeno_nucleus]
> >>  [<c0138940>] __ipipe_dispatch_event+0xc0/0x1da
> >>  [<c010ed6b>] __ipipe_syscall_root+0x43/0x10a
> >>  [<c0102e79>] system_call+0x29/0x41
> >>  =======================
> >
> > That specific Xenomai bug should be fixed in 2.4; please check your
> > test case against e.g. -rc1. Unfortunately we have no backport of the fix
> > in 2.3 yet. Can't tell right now if this is tricky, but this test
> > demonstrates that $SOMETHING should be done...
>
> OK, in order to start fixing things: Here comes a back-port of the 2.4
> patch to 2.3.x-SVN, moving module_put out of RT context. Be warned, it's
> an early-morning hack, not even compile-tested. Feedback welcome!
>
> Thanks,
> Jan
>
> [...]


This patch is working with Linux kernel 2.6.20.9 and ipipe 1.8-08.

thanks andy


* Re: [Xenomai-help] hard lock-up
  2007-08-29 13:40                                 ` andy motten
@ 2007-08-29 14:12                                   ` Jan Kiszka
  2007-08-29 14:23                                     ` Philippe Gerum
  2007-08-29 14:23                                     ` andy motten
  0 siblings, 2 replies; 25+ messages in thread
From: Jan Kiszka @ 2007-08-29 14:12 UTC (permalink / raw)
  To: andy motten; +Cc: Xenomai

[-- Attachment #1: Type: text/plain, Size: 5999 bytes --]

andy motten wrote:
>> Jan Kiszka wrote:
>> [...]
> 
> 
> This patch is working with linux kernel 2.6.20.9 and ipipe 1.8-08.

Means the OROCOS test now runs fine against Xenomai 2.3.x at least?

Philippe, can we merge the patch?

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 250 bytes --]


* Re: [Xenomai-help] hard lock-up
  2007-08-29 14:12                                   ` Jan Kiszka
@ 2007-08-29 14:23                                     ` Philippe Gerum
  2007-08-29 14:23                                     ` andy motten
  1 sibling, 0 replies; 25+ messages in thread
From: Philippe Gerum @ 2007-08-29 14:23 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai

On Wed, 2007-08-29 at 16:12 +0200, Jan Kiszka wrote:

> Means the OROCOS test now runs fine against Xenomai 2.3.x at least?
> 
> Philippe, can we merge the patch?

Yes, this one is pending in my queue for 2.3.4.

-- 
Philippe.





* Re: [Xenomai-help] hard lock-up
  2007-08-29 14:12                                   ` Jan Kiszka
  2007-08-29 14:23                                     ` Philippe Gerum
@ 2007-08-29 14:23                                     ` andy motten
  1 sibling, 0 replies; 25+ messages in thread
From: andy motten @ 2007-08-29 14:23 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai

[-- Attachment #1: Type: text/plain, Size: 7387 bytes --]

2007/8/29, Jan Kiszka <jan.kiszka@domain.hid>:
>
> andy motten wrote:
> >> Jan Kiszka wrote:
> >> [...]
> >
> >
> > This patch is working with linux kernel 2.6.20.9 and ipipe 1.8-08.
>
> Means the OROCOS test now runs fine against Xenomai 2.3.x at least?


No, sorry for the confusion. I only meant that we no longer get the I-pipe
message (see listing below). The OROCOS tests are still NOT running
correctly.

I-pipe: Detected illicit call from domain 'Xenomai'
        into a service reserved for domain 'Linux' and below.
       f635be74 00000000 00000000 52544149 f635be98 c0104789 c02cfa4f c02f5b80
       f6c4e2f0 f635beb0 c0137d69 c02c256c c02c1186 c02c01b8 f8c0b280 f635bebc
       c0132981 f60a1730 f635bed8 f8bd8570 c010ef8c 00000000 f60a0120 f8beefe0
Call Trace:
 [<c0103ffb>] show_trace_log_lvl+0x1f/0x35
 [<c01040bb>] show_stack_log_lvl+0xaa/0xcf
 [<c0104789>] show_stack+0x2f/0x36
 [<c0137d69>] ipipe_check_context+0x7a/0x81
 [<c0132981>] module_put+0x19/0x7d
 [<f8bd8570>] xnshadow_unmap+0xbc/0xff [xeno_nucleus]
 [<f8bfdc75>] __shadow_delete_hook+0x25/0x27 [xeno_native]
 [<f8bd1454>] xnpod_delete_thread+0x1b9/0x2aa [xeno_nucleus]
 [<f8bfc36b>] rt_task_delete+0x140/0x145 [xeno_native]
 [<f8bfe02a>] __rt_task_delete+0x58/0x69 [xeno_native]
 [<f8bd8165>] hisyscall_event+0x185/0x291 [xeno_nucleus]
 [<c0138940>] __ipipe_dispatch_event+0xc0/0x1da
 [<c010ed6b>] __ipipe_syscall_root+0x43/0x10a
 [<c0102e79>] system_call+0x29/0x41
 =======================

[-- Attachment #2: Type: text/html, Size: 14927 bytes --]


Thread overview: 25+ messages
2007-08-09  9:11 [Xenomai-help] hard lock-up andy motten
2007-08-09  9:42 ` Gilles Chanteperdrix
2007-08-09 11:24   ` Jan Kiszka
2007-08-09 16:09     ` andy motten
2007-08-09 16:22       ` Philippe Gerum
2007-08-10  7:32         ` Jan Kiszka
2007-08-10  7:54           ` Klaas Gadeyne
2007-08-10 15:05             ` andy motten
2007-08-10 15:12               ` Jan Kiszka
2007-08-13  7:06                 ` Klaas Gadeyne
2007-08-13  7:19                   ` Gilles Chanteperdrix
2007-08-13 15:10                     ` andy motten
2007-08-13 17:01                       ` Jan Kiszka
2007-08-14 15:26                         ` andy motten
2007-08-27 13:27                           ` andy motten
2007-08-27 16:55                             ` Jan Kiszka
2007-08-28 10:06                               ` andy motten
2007-08-28 11:32                                 ` Jan Kiszka
2007-08-29 11:36                                   ` andy motten
2007-08-29  6:11                               ` Jan Kiszka
2007-08-29 13:40                                 ` andy motten
2007-08-29 14:12                                   ` Jan Kiszka
2007-08-29 14:23                                     ` Philippe Gerum
2007-08-29 14:23                                     ` andy motten
2007-08-09 16:26       ` Gilles Chanteperdrix
