* [Xenomai-help] xeno-2.3.1 shared interrups BUG?
@ 2007-06-19  7:23 apittaluga
  2007-06-19  7:44 ` Jan Kiszka
  2007-06-20 22:10 ` Jan Kiszka
  0 siblings, 2 replies; 9+ messages in thread
From: apittaluga @ 2007-06-19  7:23 UTC (permalink / raw)
  To: xenomai

[-- Attachment #1: Type: text/plain, Size: 4041 bytes --]

Hi,

when I run a simple test application that spawns a periodic task writing
to a serial interface, the system hangs while performing rt_dev_close.
The test program ran fine with xeno 2.2.6 with "Shared Interrupts"
enabled, and also with xeno 2.3.1 with "Shared Interrupts" disabled. It
fails with xeno 2.3.1 with "Shared Interrupts" enabled, so the problem
seems to be in the shared interrupt handling code.
The kernel is 2.6.20, Adeos-patched.

Any suggestions?

Many thanks

Here follows the kernel dump:

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000008
printing eip:
*pde = 00000000
Oops: 0000 [#1]
SMP
Modules linked in: xeno_16550A ipv6 nfs lockd sunrpc ide_scsi i2c_i801 i2c_core sg shpchp rng_core evdev ehci_hcd uhci_hcd intel_agp agpgart e1000 serio_raw pcspkr
CPU: 0
EIP: 0060: Not tainted VLI
EFLAGS: 00010046 (2.6.20.1-xeno-2.3.1 #16)
EIP is at xnintr_edge_shirq_handler+0xda/0x2f0
eax: 00000000 ebx: 00000000 ecx: f2d58074 edx: c0529080
esi: c05290c0 edi: 69bfb728 ebp: 000000c9 esp: c04b9f08
ds: 007b es: 007b ss: 0068
I-pipe domain Xenomai
Process modprobe (pid: 1876, ti=f7a44000 task=f7d7d030 task.ti=f7a44000)
Stack: c047d100 c0487400 00000001 00000001 00000001 f2d58050 c0527a10 c05293f8
00000004 c04fbc38 c051d100 00000004 00000000 c0143fa8 00000000 c047d100
c051d100 c04fbc38 00000000 00000004 c0527400 c0112de2 c03bc88a 00000000
Call Trace:
__ipipe_dispatch_wired+0xdB/0x120
__ipipe_handle_irq+0x72/0x2b0
schedule+0x41a/0x880
common_interrupt+0x21/0x38
mwait_idle_with_hints+0x3f/0x50
mwait_idle+0x0/0x10
cpu_idle+0x6f/0x90
start_kernel+0x1d0/0x240
unknown_bootoption+0x0/0x190
=======================
Code: 03 00 00 89 f0 89 96 b4 03 00 00 8b 15 80 87 52 c0 29 d0 83 e8 40 c1 f8 04 69 c0 ab aa aa aa 8d 44 18 24 87 86 b8 03 00 00 89 d9 <ff> 53 08 09 44 24 10 0f b6 d0 83 fa 02 0f 84 04 01 00 00 4a 0f
EIP: xnintr_edge_shirq_handler+0xda/0x2f0 SS:ESP 0068:c04b9f08
Kernel panic - not syncing: Attempted to kill the idle task!

hardware:
Intel Core DUO Processor Single Board Computer

here follows the test application:

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <stdlib.h>

#include <signal.h>
#include <sys/mman.h>

#include <native/task.h>
#include <native/timer.h>
#include <rtdm/rtserial.h>

#define FALSE 0
#define TRUE 1
#define BAUDRATE B115200

#define WRITE_FILE "rtser0"

RT_TASK rt_writer_tid;
int fd;

static const struct rtser_config write_config = {
    0xFFDF, /* config_mask */
    115200, /* baud_rate */
    RTSER_DEF_PARITY, /* parity */
    RTSER_DEF_BITS, /* data_bits */
    RTSER_DEF_STOPB, /* stop_bits */
    RTSER_DEF_HAND, /* handshake */
    RTSER_DEF_FIFO_DEPTH, /* fifo_depth */
    RTSER_DEF_TIMEOUT, /* rx_timeout */
    RTSER_DEF_TIMEOUT, /* tx_timeout */
    RTSER_DEF_TIMEOUT, /* event_timeout */
    RTSER_DEF_TIMESTAMP_HISTORY /* timestamp_history */
};

void rt_writer (void *cookie)
{
    int error;
    int res;
    char* msg = "abrac";
    error = rt_task_set_periodic(NULL,
                                 TM_NOW,
                                 rt_timer_ns2ticks(32000000));

    for (;;) {
        error = rt_task_wait_period(NULL);
        res = rt_dev_write(fd, msg, strlen(msg));

    }
}

void cleanup_upon_sig(int sig __attribute__((unused)))
{
    rt_dev_close(fd);

    exit(0);
}

int main(int argc, char** argv)
{
    int error;

    mlockall(MCL_CURRENT|MCL_FUTURE);

    signal(SIGINT, cleanup_upon_sig);
    signal(SIGTERM, cleanup_upon_sig);
    signal(SIGHUP, cleanup_upon_sig);
    signal(SIGALRM, cleanup_upon_sig);

    fd = rt_dev_open(WRITE_FILE, 0);
    if (fd < 0) {
        perror(WRITE_FILE);
        cleanup_upon_sig(0);
    }
    error = rt_dev_ioctl(fd, RTSER_RTIOC_SET_CONFIG, &write_config);
    if (error) {
        printf("error while RTSER_RTIOC_SET_CONFIG, code %d\n",error);
        cleanup_upon_sig(0);
    }

    error = rt_task_spawn(&rt_writer_tid,"rt_writer",0,99,T_FPU,
                          rt_writer, NULL);
    if (error) {
        printf("rt_task_spawn: code %d\n",error);
        return 2;
    }
    pause();
    exit(0);

}


Alessandro Pittaluga

Alenia Aeronautica
Avionic System Qualification
Test Systems
Corso Marche, 41
10146 Torino (Italy)
Phone +39-011-756.2915
+39-011-996.0714
Fax +39-011-756.2517


* Re: [Xenomai-help] xeno-2.3.1 shared interrups BUG?
  2007-06-19  7:23 [Xenomai-help] xeno-2.3.1 shared interrups BUG? apittaluga
@ 2007-06-19  7:44 ` Jan Kiszka
  2007-06-19 11:52   ` Dmitry Adamushko
  2007-06-20 22:10 ` Jan Kiszka
  1 sibling, 1 reply; 9+ messages in thread
From: Jan Kiszka @ 2007-06-19  7:44 UTC (permalink / raw)
  To: apittaluga; +Cc: xenomai-help

[-- Attachment #1: Type: text/plain, Size: 4991 bytes --]

apittaluga@domain.hid wrote:
> Hi,
> running a simple test application which spawns a periodic task writing on a
> serial interface
> the system hangs performing the rt_dev_close.
> The test program ran fine with xeno 2.2.6 with "Shared Interrupts" enabled,
> so as with
> xeno 2.3.1 with "Shared Interrupts" disabled. It fails with xeno 2.3.1 with
> "Shared Interrupts" enabled, so the problem seems to be in the shared
> interrupts handling area.
> kernel is 2.6.20 adeos patched
> 
> Any suggestion?

Argh, not good. Sounds like our IRQ detachment code is still racy.

> 
> Many Thanks
> 
> here follows the kernel dumps:
> 
> BUG: unable to handle kernel NULL pointer dereference at virtual address 00000008
> printing eip:
> *pde = 00000000
> Oops: 0000 [#1]
> SMP
> Modules linked in: xeno_16550A ipv6 nfs lockd sunrpc ide_scsi i2c_i801 i2c_core sg shpchp rng_core evdev ehci_hcd uhci_hcd intel_agp agpgart e1000 serio_raw pcspkr
> CPU: 0
> EIP: 0060: Not tainted VLI
> EFLAGS: 00010046 (2.6.20.1-xeno-2.3.1 #16)
> EIP is at xnintr_edge_shirq_handler+0xda/0x2f0

Do you know how to resolve this address into source code?
CONFIG_DEBUG_INFO needs to be on; then "gdb vmlinux" followed by
"disassemble xnintr_edge_shirq_handler" will give you that context.
Please post the full disassembly of that function. That may help us by
pointing at the variable that is causing the oops here.

> eax: 00000000 ebx: 00000000 ecx: f2d58074 edx: c0529080
> esi: c05290c0 edi: 69bfb728 ebp: 000000c9 esp: c04b9f08
> ds: 007b es: 007b ss: 0068
> I-pipe domain Xenomai
> Process modprobe (pid: 1876, ti=f7a44000 task=f7d7d030 task.ti=f7a44000)
> Stack: c047d100 c0487400 00000001 00000001 00000001 f2d58050 c0527a10 c05293f8
> 00000004 c04fbc38 c051d100 00000004 00000000 c0143fa8 00000000 c047d100
> c051d100 c04fbc38 00000000 00000004 c0527400 c0112de2 c03bc88a 00000000
> Call Trace:
> __ipipe_dispatch_wired+0xdB/0x120
> __ipipe_handle_irq+0x72/0x2b0
> schedule+0x41a/0x880
> common_interrupt+0x21/0x38
> mwait_idle_with_hints+0x3f/0x50
> mwait_idle+0x0/0x10
> cpu_idle+0x6f/0x90
> start_kernel+0x1d0/0x240
> unknown_bootoption+0x0/0x190
> =======================
> Code: 03 00 00 89 f0 89 96 b4 03 00 00 8b 15 80 87 52 c0 29 d0 83 e8 40 c1 f8 04 69 c0 ab aa aa aa 8d 44 18 24 87 86 b8 03 00 00 89 d9 <ff> 53 08 09 44 24 10 0f b6 d0 83 fa 02 0f 84 04 01 00 00 4a 0f
> EIP: xnintr_edge_shirq_handler+0xda/0x2f0 SS:ESP 0068:c04b9f08
> Kernel panic - not syncing: Attempted to kill the idle task!

Jan



* Re: [Xenomai-help] xeno-2.3.1 shared interrups BUG?
  2007-06-19  7:44 ` Jan Kiszka
@ 2007-06-19 11:52   ` Dmitry Adamushko
  2007-06-19 12:14     ` Jan Kiszka
  0 siblings, 1 reply; 9+ messages in thread
From: Dmitry Adamushko @ 2007-06-19 11:52 UTC (permalink / raw)
  To: apittaluga; +Cc: Xenomai help, Jan Kiszka

[-- Attachment #1: Type: text/plain, Size: 1427 bytes --]

On 19/06/07, Jan Kiszka <jan.kiszka@domain.hid> wrote:
> apittaluga@domain.hid wrote:
> > Hi,
> > running a simple test application which spawns a periodic task writing on a
> > serial interface
> > the system hangs performing the rt_dev_close.
> > The test program ran fine with xeno 2.2.6 with "Shared Interrupts" enabled,
> > so as with
> > xeno 2.3.1 with "Shared Interrupts" disabled. It fails with xeno 2.3.1 with
> > "Shared Interrupts" enabled, so the problem seems to be in the shared
> > interrupts handling area.
> > kernel is 2.6.20 adeos patched
> >
> > Any suggestion?

Does the fix below eliminate the problem?

The problem is (allegedly) caused by the following reinitialization at
the end of the loop:

...
                if (!(intr = intr->next))
                        intr = shirq->handlers;
...

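'end' may still point to one of the elements, while shirq->handlers may
have become NULL (all elements have been deleted). In that case 'intr'
is NULL but differs from 'end', so the loop continues and dereferences
NULL.

(Whitespace-damaged version; a clean one is attached.)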

--- ksrc/nucleus/intr.c-orig    2007-06-19 13:44:55.090623404 +0200
+++ ksrc/nucleus/intr.c 2007-06-19 13:45:53.867440067 +0200
@@ -273,7 +273,7 @@ static void xnintr_edge_shirq_handler(un
        xnintr_shirq_lock(shirq);
        intr = shirq->handlers;

-       while (intr != end) {
+       while (intr && intr != end) {
                int ret, code;

                xnstat_runtime_switch(sched,
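
For readers who want to see the failure mode in isolation, here is a minimal user-space sketch. It is not the nucleus code: `struct handler`, `struct line`, `dispatch()`, `run_scenario()` and `MAX_ROUNDS` are hypothetical stand-ins, the ISR return codes are simplified, and the detach that would really happen on another CPU is simulated by an ISR that empties the chain itself. The unguarded variant returns -1 at the point where the original loop would call through a NULL pointer; the guarded variant simply leaves the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the real Xenomai definitions. */
enum { ISR_NONE = 0, ISR_HANDLED = 1 };
#define MAX_ROUNDS 128 /* plays the role of MAX_EDGEIRQ_COUNTER */

struct line;

struct handler {
    struct handler *next;
    struct line *line;
    int (*isr)(struct handler *self);
};

struct line {
    struct handler *handlers; /* head of the chain, NULL when empty */
};

/* Simulates a detach of the last handler racing with the dispatch loop. */
static int isr_none_then_detach(struct handler *self)
{
    self->line->handlers = NULL;
    return ISR_NONE;
}

/*
 * Simplified edge-triggered dispatch loop. With guarded == 0 it uses the
 * original test (intr != end); otherwise it uses (intr && intr != end).
 * Returns the number of ISR calls, or -1 where the unguarded loop would
 * have dereferenced NULL.
 */
static int dispatch(struct line *l, int guarded)
{
    struct handler *intr, *end = NULL;
    int counter = 0;

    assert(l != NULL);
    intr = l->handlers;

    for (;;) {
        if (guarded && intr == NULL)
            break;          /* the added "intr &&" test */
        if (intr == end)
            break;          /* the original termination test */
        if (intr == NULL)
            return -1;      /* the original code would call intr->isr here */

        if (intr->isr(intr) == ISR_HANDLED)
            end = NULL;
        else if (end == NULL)
            end = intr;

        if (counter++ > MAX_ROUNDS)
            break;

        if (!(intr = intr->next))
            intr = l->handlers;
    }
    return counter;
}

/* One handler that reports "not mine" and is detached while the loop runs. */
static int run_scenario(int guarded)
{
    struct line l = { NULL };
    struct handler h = { NULL, &l, isr_none_then_detach };

    l.handlers = &h;
    return dispatch(&l, guarded);
}
```

Calling `run_scenario(0)` exercises the original loop condition and `run_scenario(1)` the patched one. The sketch only covers the case where the whole chain disappears.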


-- 
Best regards,
Dmitry Adamushko

[-- Attachment #2: fix-edge_shared_intr.patch --]
[-- Type: text/x-patch, Size: 350 bytes --]

--- ksrc/nucleus/intr.c-orig	2007-06-19 13:44:55.090623404 +0200
+++ ksrc/nucleus/intr.c	2007-06-19 13:45:53.867440067 +0200
@@ -273,7 +273,7 @@ static void xnintr_edge_shirq_handler(un
 	xnintr_shirq_lock(shirq);
 	intr = shirq->handlers;
 
-	while (intr != end) {
+	while (intr && intr != end) {
 		int ret, code;
 
 		xnstat_runtime_switch(sched,


* Re: [Xenomai-help] xeno-2.3.1 shared interrups BUG?
  2007-06-19 11:52   ` Dmitry Adamushko
@ 2007-06-19 12:14     ` Jan Kiszka
  2007-06-19 12:20       ` Jan Kiszka
  2007-06-19 12:41       ` Dmitry Adamushko
  0 siblings, 2 replies; 9+ messages in thread
From: Jan Kiszka @ 2007-06-19 12:14 UTC (permalink / raw)
  To: Dmitry Adamushko; +Cc: Xenomai help, apittaluga

[-- Attachment #1: Type: text/plain, Size: 1938 bytes --]

Dmitry Adamushko wrote:
> On 19/06/07, Jan Kiszka <jan.kiszka@domain.hid> wrote:
>> apittaluga@domain.hid wrote:
>> > Hi,
>> > running a simple test application which spawns a periodic task
>> writing on a
>> > serial interface
>> > the system hangs performing the rt_dev_close.
>> > The test program ran fine with xeno 2.2.6 with "Shared Interrupts"
>> enabled,
>> > so as with
>> > xeno 2.3.1 with "Shared Interrupts" disabled. It fails with xeno
>> 2.3.1 with
>> > "Shared Interrupts" enabled, so the problem seems to be in the shared
>> > interrupts handling area.
>> > kernel is 2.6.20 adeos patched
>> >
>> > Any suggestion?
> 
> Does the fix below eliminate the problem?
> 
> The problem (allegedly) is cause by the following reinitialization at
> the end of the loop:
> 
> ...
>                if (!(intr = intr->next))
>                        intr = shirq->handlers;
> ...
> 
> 'end' may point to some of the elements ... and shirq->handlers may
> become NULL (all elements have been deleted)..

Good catch.

> 
> (white-space damaged version.. enclosed a normal one)
> 
> --- ksrc/nucleus/intr.c-orig    2007-06-19 13:44:55.090623404 +0200
> +++ ksrc/nucleus/intr.c 2007-06-19 13:45:53.867440067 +0200
> @@ -273,7 +273,7 @@ static void xnintr_edge_shirq_handler(un
>        xnintr_shirq_lock(shirq);
>        intr = shirq->handlers;
> 
> -       while (intr != end) {
> +       while (intr && intr != end) {
>                int ret, code;
> 
>                xnstat_runtime_switch(sched,
> 
> 

But your patch looks incomplete: What if someone removes "end" but
leaves other handlers behind while we are looping? Neither intr would
then become NULL nor would we hit the end again. This seems to be more
tricky...

Quick idea: mark the xnintr object as being removed, check for this
state of "end" at the end of the while loop and null'ify "end" in this case.

Jan



* Re: [Xenomai-help] xeno-2.3.1 shared interrups BUG?
  2007-06-19 12:14     ` Jan Kiszka
@ 2007-06-19 12:20       ` Jan Kiszka
  2007-06-19 12:41       ` Dmitry Adamushko
  1 sibling, 0 replies; 9+ messages in thread
From: Jan Kiszka @ 2007-06-19 12:20 UTC (permalink / raw)
  To: Dmitry Adamushko; +Cc: Xenomai help, apittaluga

[-- Attachment #1: Type: text/plain, Size: 2126 bytes --]

Jan Kiszka wrote:
> Dmitry Adamushko wrote:
>> On 19/06/07, Jan Kiszka <jan.kiszka@domain.hid> wrote:
>>> apittaluga@domain.hid wrote:
>>>> Hi,
>>>> running a simple test application which spawns a periodic task
>>> writing on a
>>>> serial interface
>>>> the system hangs performing the rt_dev_close.
>>>> The test program ran fine with xeno 2.2.6 with "Shared Interrupts"
>>> enabled,
>>>> so as with
>>>> xeno 2.3.1 with "Shared Interrupts" disabled. It fails with xeno
>>> 2.3.1 with
>>>> "Shared Interrupts" enabled, so the problem seems to be in the shared
>>>> interrupts handling area.
>>>> kernel is 2.6.20 adeos patched
>>>>
>>>> Any suggestion?
>> Does the fix below eliminate the problem?
>>
>> The problem (allegedly) is cause by the following reinitialization at
>> the end of the loop:
>>
>> ...
>>                if (!(intr = intr->next))
>>                        intr = shirq->handlers;
>> ...
>>
>> 'end' may point to some of the elements ... and shirq->handlers may
>> become NULL (all elements have been deleted)..
> 
> Good catch.
> 
>> (white-space damaged version.. enclosed a normal one)
>>
>> --- ksrc/nucleus/intr.c-orig    2007-06-19 13:44:55.090623404 +0200
>> +++ ksrc/nucleus/intr.c 2007-06-19 13:45:53.867440067 +0200
>> @@ -273,7 +273,7 @@ static void xnintr_edge_shirq_handler(un
>>        xnintr_shirq_lock(shirq);
>>        intr = shirq->handlers;
>>
>> -       while (intr != end) {
>> +       while (intr && intr != end) {
>>                int ret, code;
>>
>>                xnstat_runtime_switch(sched,
>>
>>
> 
> But your patch looks incomplete: What if someone removes "end" but
> leaves other handlers behind while we are looping? Neither intr would
> then become NULL nor would we hit the end again. This seems to be more
> tricky...
> 
> Quick idea: mark the xnintr object as being removed, check for this
> state of "end" at the end of the while loop and null'ify "end" in this case.

May also race. So we need both your test and mine, I think. This looks
increasingly ugly, screaming for something different.

Jan



* Re: [Xenomai-help] xeno-2.3.1 shared interrups BUG?
  2007-06-19 12:14     ` Jan Kiszka
  2007-06-19 12:20       ` Jan Kiszka
@ 2007-06-19 12:41       ` Dmitry Adamushko
  2007-06-19 13:00         ` Jan Kiszka
  1 sibling, 1 reply; 9+ messages in thread
From: Dmitry Adamushko @ 2007-06-19 12:41 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai help, apittaluga

On 19/06/07, Jan Kiszka <jan.kiszka@domain.hid> wrote:
> Dmitry Adamushko wrote:
> > On 19/06/07, Jan Kiszka <jan.kiszka@domain.hid> wrote:
> >> apittaluga@domain.hid wrote:
> >> > Hi,
> >> > running a simple test application which spawns a periodic task
> >> writing on a
> >> > serial interface
> >> > the system hangs performing the rt_dev_close.
> >> > The test program ran fine with xeno 2.2.6 with "Shared Interrupts"
> >> enabled,
> >> > so as with
> >> > xeno 2.3.1 with "Shared Interrupts" disabled. It fails with xeno
> >> 2.3.1 with
> >> > "Shared Interrupts" enabled, so the problem seems to be in the shared
> >> > interrupts handling area.
> >> > kernel is 2.6.20 adeos patched
> >> >
> >> > Any suggestion?
> >
> > Does the fix below eliminate the problem?
> >
> > The problem (allegedly) is cause by the following reinitialization at
> > the end of the loop:
> >
> > ...
> >                if (!(intr = intr->next))
> >                        intr = shirq->handlers;
> > ...
> >
> > 'end' may point to some of the elements ... and shirq->handlers may
> > become NULL (all elements have been deleted)..
>
> Good catch.
>
> >
> > (white-space damaged version.. enclosed a normal one)
> >
> > --- ksrc/nucleus/intr.c-orig    2007-06-19 13:44:55.090623404 +0200
> > +++ ksrc/nucleus/intr.c 2007-06-19 13:45:53.867440067 +0200
> > @@ -273,7 +273,7 @@ static void xnintr_edge_shirq_handler(un
> >        xnintr_shirq_lock(shirq);
> >        intr = shirq->handlers;
> >
> > -       while (intr != end) {
> > +       while (intr && intr != end) {
> >                int ret, code;
> >
> >                xnstat_runtime_switch(sched,
> >
> >
>
> But your patch looks incomplete: What if someone removes "end" but
> leaves other handlers behind while we are looping? Neither intr would
> then become NULL nor would we hit the end again. This seems to be more
> tricky...

Yeah... what about something like this? (Quick approach; if it is not
OK, I will have to work it out thoroughly :-)


--- ksrc/nucleus/intr.c-orig    2007-06-19 13:44:55.090623404 +0200
+++ ksrc/nucleus/intr.c 2007-06-19 14:38:36.073535255 +0200
@@ -259,7 +259,7 @@ static void xnintr_edge_shirq_handler(un
        xnstat_runtime_t *prev;
        xnticks_t start;
        xnintr_shirq_t *shirq = &xnshirqs[irq];
-       xnintr_t *intr, *end = NULL;
+       xnintr_t *intr, *end = NULL, *old_end = NULL;
        int s = 0, counter = 0;

        xnarch_memory_barrier();
@@ -273,7 +273,7 @@ static void xnintr_edge_shirq_handler(un
        xnintr_shirq_lock(shirq);
        intr = shirq->handlers;

-       while (intr != end) {
+       while (intr && intr != end) {
                int ret, code;

                xnstat_runtime_switch(sched,
@@ -297,8 +297,14 @@ static void xnintr_edge_shirq_handler(un
                if (counter++ > MAX_EDGEIRQ_COUNTER)
                        break;

-               if (!(intr = intr->next))
+               if (!(intr = intr->next)) {
                        intr = shirq->handlers;
+
+                       /* 'end' has been removed in the mean time. */
+                       if (end && old_end == end)
+                               intr = NULL;
+                       old_end = end;
+               }
        }

        xnintr_shirq_unlock(shirq);


>
> Jan
>

-- 
Best regards,
Dmitry Adamushko



* Re: [Xenomai-help] xeno-2.3.1 shared interrups BUG?
  2007-06-19 12:41       ` Dmitry Adamushko
@ 2007-06-19 13:00         ` Jan Kiszka
  2007-06-19 13:28           ` Dmitry Adamushko
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Kiszka @ 2007-06-19 13:00 UTC (permalink / raw)
  To: Dmitry Adamushko; +Cc: Xenomai help, apittaluga

[-- Attachment #1: Type: text/plain, Size: 4098 bytes --]

Dmitry Adamushko wrote:
> On 19/06/07, Jan Kiszka <jan.kiszka@domain.hid> wrote:
>> Dmitry Adamushko wrote:
>> > On 19/06/07, Jan Kiszka <jan.kiszka@domain.hid> wrote:
>> >> apittaluga@domain.hid wrote:
>> >> > Hi,
>> >> > running a simple test application which spawns a periodic task
>> >> writing on a
>> >> > serial interface
>> >> > the system hangs performing the rt_dev_close.
>> >> > The test program ran fine with xeno 2.2.6 with "Shared Interrupts"
>> >> enabled,
>> >> > so as with
>> >> > xeno 2.3.1 with "Shared Interrupts" disabled. It fails with xeno
>> >> 2.3.1 with
>> >> > "Shared Interrupts" enabled, so the problem seems to be in the
>> shared
>> >> > interrupts handling area.
>> >> > kernel is 2.6.20 adeos patched
>> >> >
>> >> > Any suggestion?
>> >
>> > Does the fix below eliminate the problem?
>> >
>> > The problem (allegedly) is cause by the following reinitialization at
>> > the end of the loop:
>> >
>> > ...
>> >                if (!(intr = intr->next))
>> >                        intr = shirq->handlers;
>> > ...
>> >
>> > 'end' may point to some of the elements ... and shirq->handlers may
>> > become NULL (all elements have been deleted)..
>>
>> Good catch.
>>
>> >
>> > (white-space damaged version.. enclosed a normal one)
>> >
>> > --- ksrc/nucleus/intr.c-orig    2007-06-19 13:44:55.090623404 +0200
>> > +++ ksrc/nucleus/intr.c 2007-06-19 13:45:53.867440067 +0200
>> > @@ -273,7 +273,7 @@ static void xnintr_edge_shirq_handler(un
>> >        xnintr_shirq_lock(shirq);
>> >        intr = shirq->handlers;
>> >
>> > -       while (intr != end) {
>> > +       while (intr && intr != end) {
>> >                int ret, code;
>> >
>> >                xnstat_runtime_switch(sched,
>> >
>> >
>>
>> But your patch looks incomplete: What if someone removes "end" but
>> leaves other handlers behind while we are looping? Neither intr would
>> then become NULL nor would we hit the end again. This seems to be more
>> tricky...
> 
> Yeah.. what's about smth like this? (quick approach: if not ok, will
> have to elaborate it thoroughly :-)
> 
> 
> --- ksrc/nucleus/intr.c-orig    2007-06-19 13:44:55.090623404 +0200
> +++ ksrc/nucleus/intr.c 2007-06-19 14:38:36.073535255 +0200
> @@ -259,7 +259,7 @@ static void xnintr_edge_shirq_handler(un
>        xnstat_runtime_t *prev;
>        xnticks_t start;
>        xnintr_shirq_t *shirq = &xnshirqs[irq];
> -       xnintr_t *intr, *end = NULL;
> +       xnintr_t *intr, *end = NULL, *old_end = NULL;
>        int s = 0, counter = 0;
> 
>        xnarch_memory_barrier();
> @@ -273,7 +273,7 @@ static void xnintr_edge_shirq_handler(un
>        xnintr_shirq_lock(shirq);
>        intr = shirq->handlers;
> 
> -       while (intr != end) {
> +       while (intr && intr != end) {
>                int ret, code;
> 
>                xnstat_runtime_switch(sched,
> @@ -297,8 +297,14 @@ static void xnintr_edge_shirq_handler(un
>                if (counter++ > MAX_EDGEIRQ_COUNTER)
>                        break;
> 
> -               if (!(intr = intr->next))
> +               if (!(intr = intr->next)) {
>                        intr = shirq->handlers;
> +
> +                       /* 'end' has been removed in the mean time. */
> +                       if (end && old_end == end)
> +                               intr = NULL;
> +                       old_end = end;
> +               }

"end" may still remain stuck on an xnintr object that was removed.
Unless some other element becomes "end" while continuing with the chain,
I don't see a way out of this loop.

I currently have a new approach in mind:

- work with two chains, one is active and remains so as long as we
  iterate over it in the handler, the other gets modified and then marked
  active for succeeding handler entries

- turn the shared-edge chains into rings (make the last point to the
  first) so that we can drop "if (!(intr = intr->next))"

The latter is actually an unrelated optimisation, I just wanted to save
the idea. :)
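
A rough user-space sketch of the second bullet only (the ring), not of the two-chain scheme. `struct node`, `ring_attach()`, `ring_walk()` and `ring_demo()` are hypothetical names for illustration and are not taken from the nucleus. The point is that once the last element links back to the first, advancing is a plain `cur = cur->next` with no NULL test and no reload of the head.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element type; the real xnintr_t carries much more state. */
struct node {
    struct node *next;
    int id;
};

/* Append n to the ring whose first element is *head (NULL means empty). */
static void ring_attach(struct node **head, struct node *n)
{
    struct node *last;

    assert(head != NULL && n != NULL);

    if (*head == NULL) {
        n->next = n;        /* a single element points to itself */
        *head = n;
        return;
    }

    last = *head;
    while (last->next != *head)
        last = last->next;

    n->next = *head;        /* the new last element closes the ring */
    last->next = n;
}

/* Follow `steps` links from start and return the id reached. */
static int ring_walk(const struct node *start, int steps)
{
    const struct node *cur = start;

    while (steps-- > 0)
        cur = cur->next;    /* no "if (!cur) cur = head" needed */
    return cur->id;
}

/* Three elements, four steps: 1 -> 2 -> 3 -> 1 -> 2. */
static int ring_demo(void)
{
    struct node a = { NULL, 1 }, b = { NULL, 2 }, c = { NULL, 3 };
    struct node *head = NULL;

    ring_attach(&head, &a);
    ring_attach(&head, &b);
    ring_attach(&head, &c);
    return ring_walk(head, 4);
}
```

Removal from such a ring still has to be made safe against a concurrent walker, which is what the first bullet is about; the sketch does not attempt that.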

Jan



* Re: [Xenomai-help] xeno-2.3.1 shared interrups BUG?
  2007-06-19 13:00         ` Jan Kiszka
@ 2007-06-19 13:28           ` Dmitry Adamushko
  0 siblings, 0 replies; 9+ messages in thread
From: Dmitry Adamushko @ 2007-06-19 13:28 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai help, apittaluga

On 19/06/07, Jan Kiszka <jan.kiszka@domain.hid> wrote:
> > [ ... ]
> >
> > --- ksrc/nucleus/intr.c-orig    2007-06-19 13:44:55.090623404 +0200
> > +++ ksrc/nucleus/intr.c 2007-06-19 14:38:36.073535255 +0200
> > @@ -259,7 +259,7 @@ static void xnintr_edge_shirq_handler(un
> >        xnstat_runtime_t *prev;
> >        xnticks_t start;
> >        xnintr_shirq_t *shirq = &xnshirqs[irq];
> > -       xnintr_t *intr, *end = NULL;
> > +       xnintr_t *intr, *end = NULL, *old_end = NULL;
> >        int s = 0, counter = 0;
> >
> >        xnarch_memory_barrier();
> > @@ -273,7 +273,7 @@ static void xnintr_edge_shirq_handler(un
> >        xnintr_shirq_lock(shirq);
> >        intr = shirq->handlers;
> >
> > -       while (intr != end) {
> > +       while (intr && intr != end) {
> >                int ret, code;
> >
> >                xnstat_runtime_switch(sched,
> > @@ -297,8 +297,14 @@ static void xnintr_edge_shirq_handler(un
> >                if (counter++ > MAX_EDGEIRQ_COUNTER)
> >                        break;
> >
> > -               if (!(intr = intr->next))
> > +               if (!(intr = intr->next)) {
> >                        intr = shirq->handlers;
> > +
> > +                       /* 'end' has been removed in the mean time. */
> > +                       if (end && old_end == end)
> > +                               intr = NULL;
> > +                       old_end = end;
> > +               }
>
> "end" may still remain stuck on an xnintr object that was removed.
> Unless some other element becomes "end" while continuing with the chain,
> I don't see a way out of this loop.

But that's why there is the 'old_end == end' test: if we have finished
iterating over all the handlers, and on the current pass N 'end' is the
same as it was on pass N-1, then 'end' has (probably) been removed.
Although 'old_end' should be cleared when 'end' changes... hmm?

IOW, it catches the moment when a full pass has taken place and 'end'
is the same as it was on pass N-1; moreover, 'end' has _not_ changed
during pass N.

At the same time, the exit condition of

...
while (intr && intr != end) {
...

didn't trigger, so 'end' is invalid. We set 'intr = NULL' and the loop ends.

I'll take a closer look later, but I think it should work. Although
maybe I just can't sanely estimate the whole ugliness of this fix at the
moment :-)



-


--- ksrc/nucleus/intr.c-orig    2007-06-19 13:44:55.090623404 +0200
+++ ksrc/nucleus/intr.c 2007-06-19 15:17:55.787849783 +0200
@@ -259,7 +259,7 @@ static void xnintr_edge_shirq_handler(un
        xnstat_runtime_t *prev;
        xnticks_t start;
        xnintr_shirq_t *shirq = &xnshirqs[irq];
-       xnintr_t *intr, *end = NULL;
+       xnintr_t *intr, *end = NULL, *old_end = NULL;
        int s = 0, counter = 0;

        xnarch_memory_barrier();
@@ -273,7 +273,7 @@ static void xnintr_edge_shirq_handler(un
        xnintr_shirq_lock(shirq);
        intr = shirq->handlers;

-       while (intr != end) {
+       while (intr && intr != end) {
                int ret, code;

                xnstat_runtime_switch(sched,
@@ -291,14 +291,22 @@ static void xnintr_edge_shirq_handler(un
                                &intr->stat[xnsched_cpu(sched)].account,
                                start);
                        start = xnstat_runtime_now();
-               } else if (code == XN_ISR_NONE && end == NULL)
+               } else if (code == XN_ISR_NONE && end == NULL) {
                        end = intr;
+                       old_end = NULL;
+               }

                if (counter++ > MAX_EDGEIRQ_COUNTER)
                        break;

-               if (!(intr = intr->next))
+               if (!(intr = intr->next)) {
                        intr = shirq->handlers;
+
+                       /* 'end' has been removed in the mean time. */
+                       if (end && old_end == end)
+                               intr = NULL;
+                       old_end = end;
+               }
        }

        xnintr_shirq_unlock(shirq);
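
To make the 'old_end' reasoning easier to check, here is a small user-space simulation of the loop above. It is only a sketch under assumptions: the types, `dispatch()`, `run_scenario()` and `MAX_ROUNDS` are hypothetical stand-ins, the statistics code is left out, and the concurrent detach is simulated by the first ISR unlinking itself while its successor stays in the chain. With the tracking enabled the loop stops after one extra full pass; without it, the loop only stops at the round limit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the real Xenomai definitions. */
enum { ISR_NONE = 0, ISR_HANDLED = 1 };
#define MAX_ROUNDS 128 /* plays the role of MAX_EDGEIRQ_COUNTER */

struct line;

struct handler {
    struct handler *next;
    struct line *line;
    int (*isr)(struct handler *self);
};

struct line {
    struct handler *handlers;
};

/* Reports "not mine" and unlinks itself; its successor stays attached. */
static int isr_none_then_unlink_self(struct handler *self)
{
    self->line->handlers = self->next;
    return ISR_NONE;
}

static int isr_none(struct handler *self)
{
    (void)self;
    return ISR_NONE;
}

/* Returns the number of ISR calls made before the loop terminated. */
static int dispatch(struct line *l, int track_old_end)
{
    struct handler *intr, *end = NULL, *old_end = NULL;
    int counter = 0;

    assert(l != NULL);
    intr = l->handlers;

    while (intr && intr != end) {
        if (intr->isr(intr) == ISR_HANDLED) {
            end = NULL;
        } else if (end == NULL) {
            end = intr;
            old_end = NULL;
        }

        if (counter++ > MAX_ROUNDS)
            break;

        if (!(intr = intr->next)) {
            intr = l->handlers;

            if (track_old_end) {
                /* Full pass done and 'end' unchanged: assume it was removed. */
                if (end && old_end == end)
                    intr = NULL;
                old_end = end;
            }
        }
    }
    return counter;
}

/* Chain a -> b; a becomes 'end' and is removed during its own ISR call. */
static int run_scenario(int track_old_end)
{
    struct line l = { NULL };
    struct handler b = { NULL, &l, isr_none };
    struct handler a = { &b, &l, isr_none_then_unlink_self };

    l.handlers = &a;
    return dispatch(&l, track_old_end);
}
```

In this scenario the tracked variant calls a once and b twice before giving up, while the untracked variant keeps calling b until the counter limit is exceeded. The sketch says nothing about other interleavings.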



-- 
Best regards,
Dmitry Adamushko



* Re: [Xenomai-help] xeno-2.3.1 shared interrups BUG?
  2007-06-19  7:23 [Xenomai-help] xeno-2.3.1 shared interrups BUG? apittaluga
  2007-06-19  7:44 ` Jan Kiszka
@ 2007-06-20 22:10 ` Jan Kiszka
  1 sibling, 0 replies; 9+ messages in thread
From: Jan Kiszka @ 2007-06-20 22:10 UTC (permalink / raw)
  To: apittaluga; +Cc: Xenomai

[-- Attachment #1: Type: text/plain, Size: 5618 bytes --]

apittaluga@domain.hid wrote:
> Hi,
> running a simple test application which spawns a periodic task writing on a
> serial interface
> the system hangs performing the rt_dev_close.
> The test program ran fine with xeno 2.2.6 with "Shared Interrupts" enabled,
> so as with
> xeno 2.3.1 with "Shared Interrupts" disabled. It fails with xeno 2.3.1 with
> "Shared Interrupts" enabled, so the problem seems to be in the shared
> interrupts handling area.
> kernel is 2.6.20 adeos patched
> 
> Any suggestion?

OK, the commission came to the conclusion that we may have a solution:

Could you please test this patch

    https://mail.gna.org/public/xenomai-core/2007-06/msg00091.html

and report the result to us? Oh... wait... you are on 2.3.1. Then use the
patch below. Applies against 2.3.x-SVN, but should work with the release
as well.

Thanks,
Jan
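
The patch that follows swaps the 'active' counter for a per-IRQ lock: the handler holds it for the whole walk, and attach/detach take it around the single pointer update. A minimal user-space sketch of that pattern, under stated assumptions: a C11 `atomic_flag` spinlock stands in for `xnlock_t`, all names (`struct line`, `line_lock()`, `line_attach()`, `line_detach()`, `line_dispatch()`, `lock_demo()`) are hypothetical, and callers of attach/detach are assumed to be serialised among themselves by some outer lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct handler {
    struct handler *next;
    int hits;
};

struct line {
    struct handler *handlers;
    atomic_flag lock;       /* hypothetical stand-in for xnlock_t */
};

static void line_lock(struct line *l)
{
    while (atomic_flag_test_and_set_explicit(&l->lock, memory_order_acquire))
        ;                   /* spin until the holder releases the lock */
}

static void line_unlock(struct line *l)
{
    atomic_flag_clear_explicit(&l->lock, memory_order_release);
}

/* Append h; only the pointer update is done under the per-line lock. */
static void line_attach(struct line *l, struct handler *h)
{
    struct handler **p = &l->handlers;

    while (*p)
        p = &(*p)->next;

    h->next = NULL;
    line_lock(l);
    *p = h;
    line_unlock(l);
}

/* Unlink h; returns 0 on success, -1 if h is not in the chain. */
static int line_detach(struct line *l, struct handler *h)
{
    struct handler **p = &l->handlers, *e;

    while ((e = *p) != NULL) {
        if (e == h) {
            line_lock(l);
            *p = e->next;
            line_unlock(l);
            return 0;
        }
        p = &e->next;
    }
    return -1;
}

/* Walk the chain with the lock held, so it cannot change underneath us. */
static int line_dispatch(struct line *l)
{
    struct handler *h;
    int calls = 0;

    line_lock(l);
    for (h = l->handlers; h; h = h->next) {
        h->hits++;
        calls++;
    }
    line_unlock(l);
    return calls;
}

/* Two handlers dispatched, one detached, then one dispatched: 2 then 1. */
static int lock_demo(void)
{
    struct line l;
    struct handler a = { NULL, 0 }, b = { NULL, 0 };
    int before, after, rc;

    l.handlers = NULL;
    atomic_flag_clear(&l.lock);

    line_attach(&l, &a);
    line_attach(&l, &b);
    before = line_dispatch(&l);

    rc = line_detach(&l, &a);
    assert(rc == 0);
    (void)rc;

    after = line_dispatch(&l);
    return before * 10 + after;
}
```

The demo is single-threaded, so it shows the structure of the locking rather than proving the absence of races.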


---
 ksrc/nucleus/intr.c |   68 ++++++++++++++++------------------------------------
 1 file changed, 21 insertions(+), 47 deletions(-)

Index: xenomai-2.3.x/ksrc/nucleus/intr.c
===================================================================
--- xenomai-2.3.x.orig/ksrc/nucleus/intr.c
+++ xenomai-2.3.x/ksrc/nucleus/intr.c
@@ -145,39 +145,13 @@ typedef struct xnintr_shirq {
 	xnintr_t *handlers;
 	int unhandled;
 #ifdef CONFIG_SMP
-	atomic_counter_t active;
+	xnlock_t lock;
 #endif
 
 } xnintr_shirq_t;
 
 static xnintr_shirq_t xnshirqs[RTHAL_NR_IRQS];
 
-static inline void xnintr_shirq_lock(xnintr_shirq_t *shirq)
-{
-#ifdef CONFIG_SMP
-	xnarch_atomic_inc(&shirq->active);
-	xnarch_memory_barrier();
-#endif
-}
-
-static inline void xnintr_shirq_unlock(xnintr_shirq_t *shirq)
-{
-#ifdef CONFIG_SMP
-	xnarch_memory_barrier();
-	xnarch_atomic_dec(&shirq->active);
-#endif
-}
-
-void xnintr_synchronize(xnintr_t *intr)
-{
-#ifdef CONFIG_SMP
-	xnintr_shirq_t *shirq = &xnshirqs[intr->irq];
-
-	while (xnarch_atomic_get(&shirq->active))
-		cpu_relax();
-#endif
-}
-
 #if defined(CONFIG_XENO_OPT_SHIRQ_LEVEL)
 /*
  * Low-level interrupt handler dispatching the user-defined ISRs for
@@ -201,7 +175,7 @@ static void xnintr_shirq_handler(unsigne
 
 	++sched->inesting;
 
-	xnintr_shirq_lock(shirq);
+	xnlock_get(&shirq->lock);
 	intr = shirq->handlers;
 
 	while (intr) {
@@ -222,7 +196,7 @@ static void xnintr_shirq_handler(unsigne
 		intr = intr->next;
 	}
 
-	xnintr_shirq_unlock(shirq);
+	xnlock_put(&shirq->lock);
 
 	if (unlikely(s == XN_ISR_NONE)) {
 		if (++shirq->unhandled == XNINTR_MAX_UNHANDLED) {
@@ -272,7 +246,7 @@ static void xnintr_edge_shirq_handler(un
 
 	++sched->inesting;
 
-	xnintr_shirq_lock(shirq);
+	xnlock_get(&shirq->lock);
 	intr = shirq->handlers;
 
 	while (intr != end) {
@@ -303,7 +277,7 @@ static void xnintr_edge_shirq_handler(un
 			intr = shirq->handlers;
 	}
 
-	xnintr_shirq_unlock(shirq);
+	xnlock_put(&shirq->lock);
 
 	if (counter > MAX_EDGEIRQ_COUNTER)
 		xnlogerr
@@ -386,9 +360,12 @@ static inline int xnintr_irq_attach(xnin
 
 	__setbits(intr->flags, XN_ISR_ATTACHED);
 
-	/* Add a given interrupt object. */
 	intr->next = NULL;
+
+	/* Add a given interrupt object. */
+	xnlock_get(&shirq->lock);
 	*p = intr;
+	xnlock_put(&shirq->lock);
 
 	return 0;
 }
@@ -409,8 +386,10 @@ static inline int xnintr_irq_detach(xnin
 
 	while ((e = *p) != NULL) {
 		if (e == intr) {
-			/* Remove a given interrupt object from the list. */
+			/* Remove the given interrupt object from the list. */
+			xnlock_get(&shirq->lock);
 			*p = e->next;
+			xnlock_put(&shirq->lock);
 
 			/* Release the IRQ line if this was the last user */
 			if (shirq->handlers == NULL)
@@ -429,12 +408,8 @@ static inline int xnintr_irq_detach(xnin
 int xnintr_mount(void)
 {
 	int i;
-	for (i = 0; i < RTHAL_NR_IRQS; ++i) {
-		xnshirqs[i].handlers = NULL;
-#ifdef CONFIG_SMP
-		xnarch_atomic_set(&xnshirqs[i].active, 0);
-#endif
-	}
+	for (i = 0; i < RTHAL_NR_IRQS; ++i)
+		xnlock_init(&xnshirqs[i].lock);
 	return 0;
 }
 
@@ -450,7 +425,6 @@ static inline int xnintr_irq_detach(xnin
 	return xnarch_release_irq(intr->irq);
 }
 
-void xnintr_synchronize(xnintr_t *intr) {}
 int xnintr_mount(void) { return 0; }
 
 #endif /* CONFIG_XENO_OPT_SHIRQ_LEVEL || CONFIG_XENO_OPT_SHIRQ_EDGE */
@@ -626,6 +600,9 @@ int xnintr_destroy(xnintr_t *intr)
  * a low-level error occurred while attaching the interrupt. -EBUSY is
  * specifically returned if the interrupt object was already attached.
  *
+ * @note The caller <b>must not</b> hold nklock when invoking this service,
+ * this would cause deadlocks.
+ *
  * Environments:
  *
  * This service can be called from:
@@ -654,7 +631,7 @@ int xnintr_attach(xnintr_t *intr, void *
 
 	if (!err)
 		xnintr_stat_counter_inc();
-	
+
 	xnlock_put_irqrestore(&intrlock, s);
 
 	return err;
@@ -678,6 +655,9 @@ int xnintr_attach(xnintr_t *intr, void *
  * a non-attached interrupt object leads to a null-effect and returns
  * 0.
  *
+ * @note The caller <b>must not</b> hold nklock when invoking this service,
+ * this would cause deadlocks.
+ *
  * Environments:
  *
  * This service can be called from:
@@ -703,12 +683,6 @@ int xnintr_detach(xnintr_t *intr)
 
 	xnlock_put_irqrestore(&intrlock, s);
 
-	/* The idea here is to keep a detached interrupt object valid as long
-	   as the corresponding irq handler is running. This is one of the
-	   requirements to iterate over the xnintr_shirq_t::handlers list in
-	   xnintr_irq_handler() in a lockless way. */
-	xnintr_synchronize(intr);
-
 	return err;
 }
 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 250 bytes --]
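
Editor's note on the patch above: it drops the per-IRQ atomic "active"
counter (and the xnintr_synchronize() busy-wait in xnintr_detach()) and
instead takes a per-IRQ lock both in the shared handlers and around the
list updates in attach/detach. The following is only a minimal user-space
sketch of that pattern, assuming a pthread mutex as a stand-in for
xnlock_t. The names struct handler, struct shared_irq, shared_irq_*() and
count_isr() are hypothetical and are not part of Xenomai.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for xnintr_t / xnintr_shirq_t; NOT the Xenomai types. */
struct handler {
	struct handler *next;
	int (*isr)(void *cookie);	/* returns 1 if it handled the event */
	void *cookie;
};

struct shared_irq {
	struct handler *handlers;	/* singly linked list of attached handlers */
	pthread_mutex_t lock;		/* plays the role of xnlock_t in the patch */
};

void shared_irq_init(struct shared_irq *irq)
{
	irq->handlers = NULL;
	pthread_mutex_init(&irq->lock, NULL);
}

/* Append h to the list; the list is only modified with the lock held. */
void shared_irq_attach(struct shared_irq *irq, struct handler *h)
{
	struct handler **p;

	h->next = NULL;
	pthread_mutex_lock(&irq->lock);
	for (p = &irq->handlers; *p != NULL; p = &(*p)->next)
		;
	*p = h;
	pthread_mutex_unlock(&irq->lock);
}

/* Unlink h; returns 1 if it was attached, 0 otherwise. Once this returns,
 * no dispatcher can still be walking over h, because dispatch holds the
 * same lock for its whole traversal. */
int shared_irq_detach(struct shared_irq *irq, struct handler *h)
{
	struct handler **p;
	int found = 0;

	pthread_mutex_lock(&irq->lock);
	for (p = &irq->handlers; *p != NULL; p = &(*p)->next) {
		if (*p == h) {
			*p = h->next;
			found = 1;
			break;
		}
	}
	pthread_mutex_unlock(&irq->lock);
	return found;
}

/* Call every attached ISR under the lock; returns how many handled it. */
int shared_irq_dispatch(struct shared_irq *irq)
{
	struct handler *h;
	int handled = 0;

	pthread_mutex_lock(&irq->lock);
	for (h = irq->handlers; h != NULL; h = h->next)
		handled += h->isr(h->cookie);
	pthread_mutex_unlock(&irq->lock);
	return handled;
}

/* Example ISR for illustration: bumps the int counter passed as cookie. */
int count_isr(void *cookie)
{
	++*(int *)cookie;
	return 1;
}
```

The sketch holds its single mutex across the whole list walk in attach
and detach, whereas the patch only wraps the pointer store in shirq->lock
(the walk there appears to be serialized by intrlock already). Note also
that in this sketch an ISR must not call shared_irq_attach() or
shared_irq_detach() on its own line, since the mutex is not recursive.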


Thread overview: 9+ messages
2007-06-19  7:23 [Xenomai-help] xeno-2.3.1 shared interrups BUG? apittaluga
2007-06-19  7:44 ` Jan Kiszka
2007-06-19 11:52   ` Dmitry Adamushko
2007-06-19 12:14     ` Jan Kiszka
2007-06-19 12:20       ` Jan Kiszka
2007-06-19 12:41       ` Dmitry Adamushko
2007-06-19 13:00         ` Jan Kiszka
2007-06-19 13:28           ` Dmitry Adamushko
2007-06-20 22:10 ` Jan Kiszka
