* [Xenomai-core] [BUG?] registry usage + module removal causes kernel oops (xenomai native)
From: Thomas Wiedemann @ 2007-01-11 17:44 UTC
To: xenomai
Hi again,
I observed another wrong(?) behaviour of Xenomai, triggered by wrong
behaviour in our own code :) Resources (tested with mutexes) are not
deleted when the process that created them exits without cleaning up
(for example, because it crashes).
For anonymous objects, I don't see why this should be defined
behaviour, since there is no way to reuse those objects afterwards.
I therefore consider this a bug, as it will eventually eat up all
memory.
Or are there technical reasons for this?
Another bug appeared for objects registered in the registry. When
using xeno-native and xeno-rtdm, the order of module removal seems to
matter. I appended a small code sample that registers a mutex in
the registry. After the program exits, the modules cannot be unloaded
in the order
1) xeno-native
2) xeno-rtdm;
rmmod then ends up with a segmentation fault (dmesg output appended).
Unloading them the other way around works fine.
I tested this on xenomai 2.3.0/linux 2.6.19 and 2.2.5-svn/linux2.6.17.14.
Thanks for fixing the previously posted bug ;)
Thomas Wiedemann
-------------
/* The test code: */
/* compile with:
 *   gcc -g -o xenwatchtest xenwatchtest.c \
 *       `/usr/xenomai/bin/xeno-config --xeno-cflags` \
 *       `/usr/xenomai/bin/xeno-config --xeno-ldflags` -lnative
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <native/task.h>
#include <native/mutex.h>

int main(void)
{
    RT_TASK rttask;
    RT_MUTEX *m;
    int i;

    /* Lock all current and future pages into RAM. */
    mlockall(MCL_CURRENT | MCL_FUTURE);

    /* Turn the calling Linux task into a Xenomai real-time task. */
    if (rt_task_shadow(&rttask, "test", 50, 0) < 0) {
        fprintf(stderr, "Error creating rt task!\n");
        exit(1);
    }

    m = malloc(sizeof(*m));
    if (m == NULL) {
        perror("malloc");
        exit(1);
    }

    /* Creating a named mutex also registers it in the registry. */
    i = rt_mutex_create(m, "ich_bin_ein_berliner");
    if (i != 0) {
        fprintf(stderr, "rt_mutex_create(): %s\n", strerror(-i));
    }

    /* do not clean up ... */
    exit(0);
}
-- end of test code
-- dmesg output after 'rmmod xeno-native ; rmmod xeno-rtdm':
[ 175.831909] Xenomai: stopping native API services.
[ 178.671282] Xenomai: stopping RTDM services.
[ 178.867527] BUG: unable to handle kernel paging request at virtual address debb7e54
[ 178.867533] printing eip:
[ 178.867535] deb8e663
[ 178.867537] *pde = 01435067
[ 178.867539] *pte = 00000000
[ 178.867542] Oops: 0000 [#1]
[ 178.867547] Modules linked in: xeno_rtdm xeno_nucleus loop i2c_i801 iTCO_wdt e100 ehci_hcd mii 8250_pnp 8250 serial_core softdog
[ 178.867561] CPU: 0
[ 178.867562] EIP: 0060:[<deb8e663>] Tainted: GF VLI
[ 178.867563] EFLAGS: 00010282 (2.6.19-x5 #1)
[ 178.867589] EIP is at xnregistry_cleanup+0x43/0x140 [xeno_nucleus]
[ 178.867594] eax: dc300b58 ebx: dc300928 ecx: c03c8640 edx: debb7e54
[ 178.867598] esi: 00000000 edi: 0000002e ebp: d5972000 esp: d5973f48
[ 178.867602] ds: 007b es: 007b ss: 0068
[ 178.867606] Process rmmod (pid: 2503, ti=d5972000 task=c1502a90 task.ti=d5972000)
[ 178.867610] Stack: 00000000 00000000 00000003 deb86992 de86b340 00000000 c01327bb 6f6e6578
[ 178.867618] 6474725f d8bd006d c0148651 b7ef2000 dd1a1040 c0149037 ffffffff b7ef3000
[ 178.867626] dd1a1044 d6c83a58 b7ef3000 d6c83a64 d6c83a58 001a1040 de86b340 00000880
[ 178.867634] Call Trace:
[ 178.867637] [<deb86992>] xnpod_shutdown+0x122/0x1a0 [xeno_nucleus]
[ 178.867655] [<c01327bb>] sys_delete_module+0x12b/0x1a0
[ 178.867663] [<c0148651>] remove_vma+0x41/0x50
[ 178.867670] [<c0149037>] do_munmap+0x197/0x1f0
[ 178.867677] [<c0102cac>] sysenter_past_esp+0x65/0x69
[ 178.867683] =======================
[ 178.867685] Code: 90 8d b4 26 00 00 00 00 8b 1c ba 85 db 74 50 89 f6 8d bc 27 00 00 00 00 8b 03 8b 73 04 85 c0 74 21 8b 50 40 85 d2 74 1a 8b 40 0c <8b> 12 e8 96 57 5f e1 8b 03 8b 40 40 8b 50 08 4a 85 d2 89 50 08
[ 178.867712] EIP: [<deb8e663>] xnregistry_cleanup+0x43/0x140 [xeno_nucleus] SS:ESP 0068:d5973f48
* Re: [Xenomai-core] [BUG?] registry usage + module removal causes kernel oops (xenomai native)
From: Jan Kiszka @ 2007-01-11 18:05 UTC
To: Thomas Wiedemann; +Cc: xenomai
Thomas Wiedemann wrote:
> Hi again,
>
> I observed another wrong(?) behaviour of xenomai, caused by a wrong
> behaviour in our code :) The resources (tested with mutexes) are not
> deleted after the process which created them exits without cleaning up
> (for example, it crashes).
>
> For anonymous objects, i don't see a reason why this would be a
> defined behaviour, since there is no way to reuse those objects.
> Therefore, I consider this to be a bug, as it will finally eat up all
> memory.
> Or are there any technical reasons for this?
-ENOTIMPLEMENTEDYET
We already have this desired auto-cleanup for the POSIX skin, but we are
lacking it for the others. The native objects should be straightforward;
stale RTDM handles still require a bit of thought (and the solution of
another issue). Time will come.
For now, we help ourselves here by keeping track of those resources in
userspace at framework level and releasing them - when required - inside a
SIGSEGV signal handler. Of course, this does not cover all faults.
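The userspace workaround described above could look roughly like the following. This is only a minimal sketch, not the framework code referred to in this mail: the tracking table, `track_resource()`, `release_all()`, `install_fault_cleanup()` and `count_release()` are all hypothetical names invented for illustration. In a real application the release callback would presumably wrap something like rt_mutex_delete(), and it would have to be safe to call from a signal handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

#define MAX_TRACKED 32

/* Hypothetical tracking table: one release callback per live resource. */
struct tracked_res {
    void (*release)(void *obj);
    void *obj;
};

static struct tracked_res tracked[MAX_TRACKED];
static volatile sig_atomic_t n_tracked;

/* Remember a resource; returns 0 on success, -1 if the table is full. */
int track_resource(void (*release)(void *obj), void *obj)
{
    assert(release != NULL);
    if (n_tracked >= MAX_TRACKED)
        return -1;
    tracked[n_tracked].release = release;
    tracked[n_tracked].obj = obj;
    n_tracked++;
    return 0;
}

/* Release everything in reverse creation order; returns the number released. */
int release_all(void)
{
    int released = 0;

    while (n_tracked > 0) {
        n_tracked--;
        tracked[n_tracked].release(tracked[n_tracked].obj);
        released++;
    }
    return released;
}

/* Fault handler: clean up, then re-raise the signal with its default action. */
static void on_fault(int sig)
{
    release_all();
    signal(sig, SIG_DFL);
    raise(sig);
}

/* Install the handler for SIGSEGV; returns 0 on success, -1 on error. */
int install_fault_cleanup(void)
{
    struct sigaction sa;

    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Stand-in release callback for demonstration: just counts its calls. */
void count_release(void *obj)
{
    ++*(int *)obj;
}
```

As noted, this covers only the faults for which a handler is installed; a process killed with SIGKILL, for instance, never gets the chance to run release_all().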
>
> Another bug appeared for objects registered at the registry. When
> using xeno-native and xeno-rtdm, the order of removal seems to be
> important. I appended a small code sample to register a mutex at
> the registry. After the program exits, the modules can not be unloaded
> in the order
> 1) xeno-native
> 2) xeno-rtdm,
> but the other way around works fine. Instead, rmmod ends up with a
> segmentation fault, dmesg output appended.
>
> I tested this on xenomai 2.3.0/linux 2.6.19 and 2.2.5-svn/linux2.6.17.14.
>
Looks like it's related to the left-over mutex in the registry. Probably
the other order just papers over the issue. $Someone should give your code a
try; maybe I can check later with a debugger.
Jan
* Re: [Xenomai-core] [BUG?] registry usage + module removal causes kernel oops (xenomai native)
From: Jan Kiszka @ 2007-01-11 23:34 UTC
To: Thomas Wiedemann; +Cc: xenomai
Jan Kiszka wrote:
> Thomas Wiedemann wrote:
>> Another bug appeared for objects registered at the registry. When
>> using xeno-native and xeno-rtdm, the order of removal seems to be
>> important. I appended a small code sample to register a mutex at
>> the registry. After the program exits, the modules can not be unloaded
>> in the order
>> 1) xeno-native
>> 2) xeno-rtdm,
>> but the other way around works fine. Instead, rmmod ends up with a
>> segmentation fault, dmesg output appended.
>>
>> I tested this on xenomai 2.3.0/linux 2.6.19 and 2.2.5-svn/linux2.6.17.14.
>>
>
> Looks like it's related to the left-over mutex in the registry. Probably
> the other order just papers the issue. $Someone should give your code a
> try, maybe I can check later with a debugger.
Yet another reason to address auto-cleanup soon: I thought I remembered
that the native skin keeps track of resources in a global list and kills
them on rmmod, but that's only true for a few.
Instead, stale named resources are kept in the registry. When the last
skin unregisters, xnregistry_cleanup() is called. It tries to remove the
stale entries from /proc, but their roots may already have been removed
(here: the native skin's proc root).
Philippe, this makes the sweep loop in xnregistry_cleanup() rather fragile
and of questionable use. I guess all skins have to handle this on their
own, and we should just bark loudly here if something remained.
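To illustrate the "bark loudly" idea, here is a small userspace toy model. It is not the nucleus code: `toyreg_enter()`, `toyreg_remove()` and `toyreg_report_leftovers()` are hypothetical names, and the fixed-size table is an assumption made for brevity. The point it tries to show is that the final sweep touches only data the registry itself owns (a copied key) and never follows pointers into an owner that may already be gone.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOYREG_SLOTS 8
#define TOYREG_KEYLEN 32

/* Hypothetical registry slot holding only registry-owned data. */
struct toyreg_entry {
    char key[TOYREG_KEYLEN]; /* private copy of the object name */
    int used;
};

static struct toyreg_entry toyreg[TOYREG_SLOTS];

/* Register a name; returns the slot index, or -1 if the table is full. */
int toyreg_enter(const char *key)
{
    assert(key != NULL);
    for (int i = 0; i < TOYREG_SLOTS; i++) {
        if (!toyreg[i].used) {
            snprintf(toyreg[i].key, sizeof(toyreg[i].key), "%s", key);
            toyreg[i].used = 1;
            return i;
        }
    }
    return -1;
}

/* Unregister a name; returns 0 on success, -1 if it was not found. */
int toyreg_remove(const char *key)
{
    assert(key != NULL);
    for (int i = 0; i < TOYREG_SLOTS; i++) {
        if (toyreg[i].used && strcmp(toyreg[i].key, key) == 0) {
            toyreg[i].used = 0;
            return 0;
        }
    }
    return -1;
}

/*
 * Final sweep: report what was left behind instead of trying to tear it
 * down. Only the copied key is read, so nothing owned by an already
 * unloaded skin is dereferenced. Returns the number of stale entries.
 */
int toyreg_report_leftovers(FILE *out)
{
    int stale = 0;

    for (int i = 0; i < TOYREG_SLOTS; i++) {
        if (toyreg[i].used) {
            fprintf(out, "registry: stale entry '%s' left behind\n",
                    toyreg[i].key);
            stale++;
        }
    }
    return stale;
}
```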
Jan
* Re: [Xenomai-core] [BUG?] registry usage + module removal causes kernel oops (xenomai native)
From: Philippe Gerum @ 2007-01-12 9:11 UTC
To: Jan Kiszka; +Cc: Thomas Wiedemann, xenomai
On Fri, 2007-01-12 at 00:34 +0100, Jan Kiszka wrote:
> Jan Kiszka wrote:
> > Thomas Wiedemann wrote:
> >> Another bug appeared for objects registered at the registry. When
> >> using xeno-native and xeno-rtdm, the order of removal seems to be
> >> important. I appended a small code sample to register a mutex at
> >> the registry. After the program exits, the modules can not be unloaded
> >> in the order
> >> 1) xeno-native
> >> 2) xeno-rtdm,
> >> but the other way around works fine. Instead, rmmod ends up with a
> >> segmentation fault, dmesg output appended.
> >>
> >> I tested this on xenomai 2.3.0/linux 2.6.19 and 2.2.5-svn/linux2.6.17.14.
> >>
> >
> > Looks like it's related to the left-over mutex in the registry. Probably
> > the other order just papers the issue. $Someone should give your code a
> > try, maybe I can check later with a debugger.
>
> Yet another reason to address auto-cleanup soon: I thought to remember
> the native skin keeps track of resources in a global list and kills them
> on rmmod, but that's only true for a few.
>
> Instead stalled named resources are kept in the registry. On
> unregistration of the last skin xnregistry_cleanup() is called. It tries
> to kill the stalled entries from proc but their roots may have already
> been removed (here: the native skin's proc root).
>
> Philippe, this makes the sweep loop in xnregistry_cleanup rather fragile
> and of questionable use.
It has never been meant as something skins should rely on or "use".
Interfaces that do not clean up after themselves before leaving are just ugly
examples of unleashed procrastination, and it happens that I wrote one
of them. The cleanup loop in the registry unload code is solely a best
effort that may or may not work depending on the setup (i.e. modular vs.
static); this is _not_ an exported or exportable "feature".
> I guess all skins have to handle this on their
> own, and we should just bark loudly here if something remained.
Yes. The registry has to maintain the basic information and likely
provide a simple mechanism on top of that which would allow the skins to
clean up properly, but the skins would have to trigger this cleanup on
their own. I'm working on this.
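One way to picture such a mechanism is sketched below as a userspace toy. This is an assumption about the general shape only, not the design being worked on: `skinreg_add()`, `skinreg_flush_owner()`, the integer owner id and `count_dtor()` are hypothetical. The registry keeps the bookkeeping (owner, object, destructor), and each skin asks for its own entries to be flushed from its exit path, while its code and data are still present.

```c
#include <assert.h>
#include <stddef.h>

#define SKINREG_SLOTS 16

typedef void (*skinreg_dtor_t)(void *obj);

/* Hypothetical bookkeeping kept by the registry for each object. */
struct skinreg_slot {
    int owner;            /* skin id; 0 marks a free slot */
    void *obj;
    skinreg_dtor_t dtor;  /* supplied by the owning skin */
};

static struct skinreg_slot skinreg[SKINREG_SLOTS];

/* Record an object for a skin; returns 0 on success, -1 if full. */
int skinreg_add(int owner, void *obj, skinreg_dtor_t dtor)
{
    assert(owner != 0 && dtor != NULL);
    for (int i = 0; i < SKINREG_SLOTS; i++) {
        if (skinreg[i].owner == 0) {
            skinreg[i].owner = owner;
            skinreg[i].obj = obj;
            skinreg[i].dtor = dtor;
            return 0;
        }
    }
    return -1;
}

/*
 * Meant to be triggered by the skin itself on unload: destroy every
 * object it still owns and free the slots. Returns the number flushed.
 */
int skinreg_flush_owner(int owner)
{
    int flushed = 0;

    for (int i = 0; i < SKINREG_SLOTS; i++) {
        if (owner != 0 && skinreg[i].owner == owner) {
            skinreg[i].dtor(skinreg[i].obj);
            skinreg[i].owner = 0;
            flushed++;
        }
    }
    return flushed;
}

/* Stand-in destructor for demonstration: just counts its calls. */
void count_dtor(void *obj)
{
    ++*(int *)obj;
}
```

With this split, the order in which the skins go away no longer matters to the registry, since each one has removed its own entries before its destructors disappear.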
--
Philippe.