From: Jan Kiszka <jan.kiszka@siemens.com>
To: Philippe Gerum <rpm@xenomai.org>, Xenomai <xenomai@xenomai.org>
Subject: Re: [Xenomai] Xenomai 3: smokey test sched_tp causes oops when run in gdb
Date: Mon, 16 Mar 2015 17:42:15 +0100 [thread overview]
Message-ID: <550707E7.1000107@siemens.com> (raw)
In-Reply-To: <55070020.70002@xenomai.org>
On 2015-03-16 17:09, Philippe Gerum wrote:
> On 03/16/2015 05:00 PM, Jan Kiszka wrote:
>> On 2015-03-16 16:31, Jan Kiszka wrote:
>>> On 2015-03-16 15:43, Philippe Gerum wrote:
>>>> On 03/11/2015 03:47 PM, Jan Kiszka wrote:
>>>>> Hi Philippe,
>>>>>
>>>>> just happened to trigger the oops below by running
>>>>>
>>>>> gdb --args smokey --run=8
>>>>>
>>>>> That run already has troubles and generates different output than
>>>>> running the test without gdb surveillance, probably due to unexpected
>>>>> mode switches.
>>>>
>>>> Clearly, yes. GDB causes the test program to leave primary mode, which
>>>> changes the scheduling order, and therefore the output which depends on it.
>>>>
>>>> But the real problem is that running the test again
>>>>> afterwards, with or without gdb, causes the oops. Registers contain
>>>>> suspicious "dead" patterns, thus we access invalid list elements. Do we
>>>>> miss a cleanup when terminating smokey in the gdb session?
>>>>>
>>>>
>>>> I could not reproduce this bug yet.
>>>>
>>>> There is no reason for ptracing the application to have any impact on
>>>> the housekeeping chores when it exits. The backtrace shows that
>>>> xnsched_tp_set_schedule() is walking through tp->threads, which seems to
>>>> link to a stale tcb. xnsched_tp_forget() would then be called twice,
>>>> leading to the fault.
>>>>
>>>> Normally, a thread that undergoes TP scheduling should be automatically
>>>> removed from tp->threads upon exit after this sequence took place:
>>>>
>>>> handle_taskexit_event -> __xnthread_cleanup -> cleanup_tcb ->
>>>> xnsched_forget -> xnsched_tp_forget
>>>>
>>>> For that bug to happen, either this assumption has to be wrong, or
>>>> xnsched_set_policy() is being silly at some point.
>>>>
>>>> Is this 100% reproducible on your end, and does this require the initial
>>>> gdb run to show up, or would that break even when running the sched_tp
>>>> twice without gdb?
>>>
>>> It is always reproducible, also with current next branch. And you need
>>> to run gdb beforehand, yes.
>>>
>>> I'll see if I can look into details.
>>
>> During cleanup of the first run under gdb, I get this one as expected
>> (and two more hits for thread and C):
>>
>> Breakpoint 1, xnsched_tp_forget (thread=0xffff88003ad07040) at ../kernel/xenomai/sched-tp.c:175
>> 175 {
>> (gdb) p thread->name
>> $3 = "threadA", '\000' <repeats 24 times>
>> (gdb) bt
>> #0 xnsched_tp_forget (thread=0xffff88003ad07040) at ../kernel/xenomai/sched-tp.c:175
>> #1 0xffffffff8114b19f in xnsched_forget (thread=<optimized out>) at ../include/xenomai/cobalt/kernel/sched.h:603
>> #2 cleanup_tcb (thread=<optimized out>) at ../kernel/xenomai/thread.c:467
>> #3 __xnthread_cleanup (curr=0xffff88003ad07040) at ../kernel/xenomai/thread.c:486
>> #4 0xffffffff811794fd in handle_taskexit_event (p=<optimized out>) at ../kernel/xenomai/posix/process.c:1028
>> #5 0xffffffff8117b49d in ipipe_kevent_hook (kevent=<optimized out>, data=0xffff88003cfcb870) at ../kernel/xenomai/posix/process.c:1228
>> #6 0xffffffff810fc6d1 in __ipipe_notify_kevent (kevent=<optimized out>, data=0xffff88003cfcb870) at ../kernel/ipipe/core.c:1092
>> #7 0xffffffff81050702 in do_exit (code=0) at ../kernel/exit.c:717
>> #8 0xffffffff810518a7 in SYSC_exit (error_code=<optimized out>) at ../kernel/exit.c:855
>> #9 SyS_exit (error_code=<optimized out>) at ../kernel/exit.c:853
>> #10 <signal handler called>
>> #11 0x00007ffff7354146 in ?? ()
>> #12 0xffff88003cfcde10 in ?? ()
>> #13 0xffffffff81a09260 in ?? ()
>> #14 0x0000000000000000 in ?? ()
>> (gdb) c
>> Continuing.
>>
>>
>> But then, when I start the test again (with or without gdb), I also get
>> this right at the beginning:
>>
>>
>> Breakpoint 1, xnsched_tp_forget (thread=0xffff88003ad07040) at ../kernel/xenomai/sched-tp.c:175
>> 175 {
>> (gdb) bt
>> #0 xnsched_tp_forget (thread=0xffff88003ad07040) at ../kernel/xenomai/sched-tp.c:175
>> #1 0xffffffff8113ebae in xnsched_forget (thread=<optimized out>) at ../include/xenomai/cobalt/kernel/sched.h:603
>> #2 xnsched_set_policy (thread=0xffff88003ad07040, sched_class=0xffffffff81a2bbe0 <xnsched_class_rt>, p=0xffff88003b813e00) at ../kernel/xenomai/sched.c:403
>> #3 0xffffffff8115184f in xnsched_tp_set_schedule (sched=0xffff88003ad07040, gps=0xffff88003ad08080) at ../kernel/xenomai/sched-tp.c:260
>> #4 0xffffffff8117c5df in set_tp_config (len=<optimized out>, config=<optimized out>, cpu=<optimized out>) at ../kernel/xenomai/posix/sched.c:284
>
> Yes, this one is the weird one. Normally, we should not find any TCB
> lingering in tp->threads, once threads A, B and C have exited and
> unlinked from it via xnsched_forget().
>
> That call on behalf of xnsched_tp_set_schedule() is aimed at moving all
> threads currently undergoing a TP schedule to the RT class, since we are
> about to change the scheduling data (i.e. time windows and partitions).
> Why tp->threads is not empty when running xnsched_tp_set_schedule() at
> the next program invocation is what needs to be explained.
Under gdb supervision, some threads are added to the tp pool multiple
times:
Breakpoint 1, xnsched_tp_declare (thread=0xffff88003ad07040, p=0xffff88003a7b7e20) at ../kernel/xenomai/sched-tp.c:160
160 {
(gdb) bt
#0 xnsched_tp_declare (thread=0xffff88003ad07040, p=0xffff88003a7b7e20) at ../kernel/xenomai/sched-tp.c:160
#1 0xffffffff8113eb45 in xnsched_declare (p=<optimized out>, thread=<optimized out>, sched_class=<optimized out>) at ../include/xenomai/cobalt/kernel/sched.h:540
#2 xnsched_set_policy (thread=0xffff88003ad07040, sched_class=0xffff88003a7b7e20, p=0xffff88003a7b7e20) at ../kernel/xenomai/sched.c:390
#3 0xffffffff8114b7cf in __xnthread_set_schedparam (thread=0xffff88003ad07040, sched_class=<optimized out>, sched_param=<optimized out>) at ../kernel/xenomai/thread.c:1802
#4 0xffffffff8114b8cc in xnthread_set_schedparam (thread=0xffff88003ad07040, sched_class=0xffffffff81a2e400 <xnsched_class_tp>, sched_param=0xffff88003a7b7e20) at ../kernel/xenomai/thread.c:1772
#5 0xffffffff8118c2c3 in pthread_setschedparam_ex (param_ex=<optimized out>, policy=<optimized out>, thread=<optimized out>) at ../kernel/xenomai/posix/thread.c:271
#6 __cobalt_thread_setschedparam_ex (pth=<optimized out>, policy=<optimized out>, param_ex=<optimized out>, u_winoff=<optimized out>, u_promoted=<optimized out>) at ../kernel/xenomai/posix/thread.c:481
#7 0xffffffff8118c614 in cobalt_thread_setschedparam_ex (pth=140737352652544, policy=11, u_param=<optimized out>, u_winoff=0x7ffff7e94bec, u_promoted=0x7ffff7e94be8) at ../kernel/xenomai/posix/thread.c:509
#8 0xffffffff8118475a in handle_root_syscall (ipd=<optimized out>, regs=<optimized out>) at ../kernel/xenomai/posix/syscall.c:1058
#9 ipipe_syscall_hook (ipd=<optimized out>, regs=0xffff88003a7b7f58) at ../kernel/xenomai/posix/syscall.c:1107
#10 0xffffffff810fde9f in __ipipe_notify_syscall (regs=<optimized out>) at ../kernel/ipipe/core.c:1006
#11 <signal handler called>
#12 0x00007ffff77b7d25 in ?? ()
#13 0x5a5a5a5a00000001 in ?? ()
#14 0x0000000000000000 in ?? ()
(gdb) c
Continuing.
Breakpoint 1, xnsched_tp_declare (thread=0xffff88003ad07040, p=0xffff88003a7b7e20) at ../kernel/xenomai/sched-tp.c:160
160 {
(gdb) bt
#0 xnsched_tp_declare (thread=0xffff88003ad07040, p=0xffff88003a7b7e20) at ../kernel/xenomai/sched-tp.c:160
#1 0xffffffff8113eb45 in xnsched_declare (p=<optimized out>, thread=<optimized out>, sched_class=<optimized out>) at ../include/xenomai/cobalt/kernel/sched.h:540
#2 xnsched_set_policy (thread=0xffff88003ad07040, sched_class=0xffff88003a7b7e20, p=0xffff88003a7b7e20) at ../kernel/xenomai/sched.c:390
#3 0xffffffff8114b7cf in __xnthread_set_schedparam (thread=0xffff88003ad07040, sched_class=<optimized out>, sched_param=<optimized out>) at ../kernel/xenomai/thread.c:1802
#4 0xffffffff8114b8cc in xnthread_set_schedparam (thread=0xffff88003ad07040, sched_class=0xffffffff81a2e400 <xnsched_class_tp>, sched_param=0xffff88003a7b7e20) at ../kernel/xenomai/thread.c:1772
#5 0xffffffff8118c2c3 in pthread_setschedparam_ex (param_ex=<optimized out>, policy=<optimized out>, thread=<optimized out>) at ../kernel/xenomai/posix/thread.c:271
#6 __cobalt_thread_setschedparam_ex (pth=<optimized out>, policy=<optimized out>, param_ex=<optimized out>, u_winoff=<optimized out>, u_promoted=<optimized out>) at ../kernel/xenomai/posix/thread.c:481
#7 0xffffffff8118c614 in cobalt_thread_setschedparam_ex (pth=140737352652544, policy=11, u_param=<optimized out>, u_winoff=0x7ffff7e94bec, u_promoted=0x7ffff7e94be8) at ../kernel/xenomai/posix/thread.c:509
#8 0xffffffff8118475a in handle_root_syscall (ipd=<optimized out>, regs=<optimized out>) at ../kernel/xenomai/posix/syscall.c:1058
#9 ipipe_syscall_hook (ipd=<optimized out>, regs=0xffff88003a7b7f58) at ../kernel/xenomai/posix/syscall.c:1107
#10 0xffffffff810fde9f in __ipipe_notify_syscall (regs=<optimized out>) at ../kernel/ipipe/core.c:1006
#11 <signal handler called>
#12 0x00007ffff77b7d25 in ?? ()
#13 0x5a5a5a5a00000001 in ?? ()
#14 0x0000000000000000 in ?? ()
Could it be that some syscall is restarted after being interrupted by
ptrace?
Jan
--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
next prev parent reply other threads:[~2015-03-16 16:42 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-03-11 14:47 [Xenomai] Xenomai 3: smokey test sched_tp causes oops when run in gdb Jan Kiszka
2015-03-11 15:12 ` Philippe Gerum
2015-03-16 14:43 ` Philippe Gerum
2015-03-16 15:31 ` Jan Kiszka
2015-03-16 16:00 ` Jan Kiszka
2015-03-16 16:02 ` Jan Kiszka
2015-03-16 16:09 ` Philippe Gerum
2015-03-16 16:42 ` Jan Kiszka [this message]
2015-03-16 17:16 ` Jan Kiszka
2015-03-16 19:24 ` Philippe Gerum
2015-03-16 19:35 ` Jan Kiszka
2015-03-16 19:41 ` Philippe Gerum
2015-03-16 19:44 ` Jan Kiszka
2015-03-16 20:00 ` Philippe Gerum
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=550707E7.1000107@siemens.com \
--to=jan.kiszka@siemens.com \
--cc=rpm@xenomai.org \
--cc=xenomai@xenomai.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.