From mboxrd@z Thu Jan 1 00:00:00 1970 Message-ID: <518C97FB.5010507@mitrol.it> Date: Fri, 10 May 2013 08:47:23 +0200 From: Paolo Minazzi MIME-Version: 1.0 References: <51826EEA.1090202@mitrol.it> <51830D98.7090309@xenomai.org> <5183CDD6.80400@mitrol.it> <518A06C9.50204@mitrol.it> <518A4C09.8030906@xenomai.org> <518A505C.2090207@mitrol.it> <518A52A7.5000801@xenomai.org> <518A5600.20508@mitrol.it> <518A6195.7030206@mitrol.it> <518A77F8.3070404@xenomai.org> <518A78DF.7020300@xenomai.org> <518BA678.9030406@xenomai.org> <518BAA18.2090209@mitrol.it> <518BAB98.4030108@xenomai.org> <518BACDB.8040203@xenomai.org> <518BADE8.90102@xenomai.org> In-Reply-To: <518BADE8.90102@xenomai.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Xenomai] Re : Sporadic problem : rt_task_sleep locked after debugging List-Id: Discussions about the Xenomai project List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Philippe Gerum , Gilles Chanteperdrix Cc: xenomai@xenomai.org Il 09/05/2013 16.08, Philippe Gerum ha scritto: > On 05/09/2013 04:04 PM, Philippe Gerum wrote: >> On 05/09/2013 03:58 PM, Gilles Chanteperdrix wrote: >>> On 05/09/2013 03:52 PM, Paolo Minazzi wrote: >>> >>>> Il 09/05/2013 15.36, Philippe Gerum ha scritto: >>>>> On 05/08/2013 06:10 PM, Philippe Gerum wrote: >>>>>> On 05/08/2013 06:06 PM, Philippe Gerum wrote: >>>>>>> On 05/08/2013 04:30 PM, Paolo Minazzi wrote: >>>>>>>> I think to be very near to the solution of this problem. >>>>>>>> Thanks to Gilles for his patience. >>>>>>>> >>>>>>>> Now I will retry to make a summary of the problem. >>>>>>>> >>>>>>> >>>>>>> >>>>>>>> The thread 1 finds thread 70 in debug mode ! >>>>>>>> >>>>>>> Which is expected. thread 70 has to be scheduled in with no pending >>>>>>> ptrace signals for leaving this mode, and this may happen long after >>>>>>> the truckload of other threads releases the CPU. >>>>>>> >>>>>>>> My patch adjust this problem. >>>>>>>> >>>>>>>> I realize that it is a very special case, but it is my case. >>>>>>>> >>>>>>>> I'd like to know if the patch is valid or can be written in a >>>>>>>> different >>>>>>>> way. >>>>>>>> For example, I could insert my patch directly in >>>>>>>> xnpod_delete_thread(). >>>>>>>> >>>>>>>> The function unlock_timers() cannot be called from >>>>>>>> xenomai-2.5.6/ksrc/skins/native/task.c >>>>>>>> because it is defined static. This is a detail. There are simple >>>>>>>> ways to >>>>>>>> solve this. >>>>>>>> >>>>>>> No, really the patch is wrong, but what you expose does reveal a bug >>>>>>> in the Xenomai core for sure. As Gilles told you, you would be only >>>>>>> papering over that real bug, which would likely show up in a >>>>>>> different >>>>>>> situation. >>>>>>> >>>>>>> First we need to check for a lock imbalance, I don't think that code >>>>>>> is particularly safe. >>>>>> I mean a lock imbalance introduced by an unexpected race between the >>>>>> locking/unlocking calls. The assertions introduced by this patch might >>>>>> help detecting this, with some luck. >>>>>> >>>>> Could you apply that patch below, and report whether some task triggers >>>>> the message it introduces, when things go wrong with gdb? TIA, >>>>> >>>>> diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c >>>>> index 868f98f..2da3265 100644 >>>>> --- a/ksrc/nucleus/pod.c >>>>> +++ b/ksrc/nucleus/pod.c >>>>> @@ -1215,6 +1215,10 @@ void xnpod_delete_thread(xnthread_t *thread) >>>>> #else /* !CONFIG_XENO_HW_UNLOCKED_SWITCH */ >>>>> } else { >>>>> #endif /* !CONFIG_XENO_HW_UNLOCKED_SWITCH */ >>>>> + if (xnthread_test_state(thread, XNSHADOW|XNMAPPED) == >>>>> XNSHADOW) >>>>> + printk(KERN_WARNING "%s: deleting unmapped shadow %s\n", >>>>> + __func__, thread->name); >>>>> + >>>>> xnpod_run_hooks(&nkpod->tdeleteq, thread, "DELETE"); >>>>> >>>>> xnsched_forget(thread); >>>>> >>>> I have tried but no messages on the console. >>>> This time I'm sure. To be sure I have added other messages and I see >>>> them. >>>> Paolo >>> >>> On my side, I have run your example with CONFIG_XENO_OPT_DEBUG_NUCLEUS >>> turned on, and I get the following message in the same conditions as >>> you: >>> >>> Xenomai: xnshadow_unmap invoked for a non-current task (t=demo0/p=demo0) >>> Master time base: clock=8300431919 >>> (...) >>> [] (unwind_backtrace+0x0/0xf4) from [] >>> (show_stack+0x20/0x24) >>> [] (show_stack+0x20/0x24) from [] >>> (xnshadow_unmap+0x1e0/0x270) >>> [] (xnshadow_unmap+0x1e0/0x270) from [] >>> (__shadow_delete_hook+0x4c/0x54) >>> [] (__shadow_delete_hook+0x4c/0x54) from [] >>> (xnpod_fire_callouts+0x44/0x80) >>> [] (xnpod_fire_callouts+0x44/0x80) from [] >>> (xnpod_delete_thread+0x91c/0x15b4) >>> [] (xnpod_delete_thread+0x91c/0x15b4) from [] >>> (rt_task_delete+0x100/0x344) >>> [] (rt_task_delete+0x100/0x344) from [] >>> (__rt_task_delete+0x8c/0x90) >>> [] (__rt_task_delete+0x8c/0x90) from [] >>> (losyscall_event+0xd8/0x268) >>> [] (losyscall_event+0xd8/0x268) from [] >>> (__ipipe_dispatch_event+0x100/0x264) >>> [] (__ipipe_dispatch_event+0x100/0x264) from [] >>> (__ipipe_syscall_root+0x64/0x13c) >>> [] (__ipipe_syscall_root+0x64/0x13c) from [] >>> (vector_swi+0x6c/0xac) >>> >>> So, maybe the XNSHADOW bit is not set yet? Because the deletion hooks >>> are definitely run when the bug happens. >>> >> XNSHADOW is set early when the tcb is created over the userland >> trampoline routine. The problem is that such thread may be still bearing >> XNDORMANT, so it would not be caught in the userland kill redirect, >> earlier in xnpod_delete(). This would explain in turn why the latest >> assertion did not trigger. Ok, that would be a bug. Need to think about >> this now. >> > Ok, does this one trigger? > > diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c > index 868f98f..9f14bf1 100644 > --- a/ksrc/nucleus/pod.c > +++ b/ksrc/nucleus/pod.c > @@ -1215,6 +1215,10 @@ void xnpod_delete_thread(xnthread_t *thread) > #else /* !CONFIG_XENO_HW_UNLOCKED_SWITCH */ > } else { > #endif /* !CONFIG_XENO_HW_UNLOCKED_SWITCH */ > + if (xnthread_test_state(thread, XNSHADOW|XNDORMANT) == (XNSHADOW|XNDORMANT)) > + printk(KERN_WARNING "%s: deleting dormant shadow %s\n", > + __func__, thread->name); > + > xnpod_run_hooks(&nkpod->tdeleteq, thread, "DELETE"); > > xnsched_forget(thread); > Ok, it seems trigger. This is my output on "c" command og gdb. /D # gdbserver :8888 prova Process prova created; pid = 295 Listening on port 8888 Remote debugging from host 198.18.0.1 xnpod_delete_thread: deleting dormant shadow demo0 xnpod_delete_thread: deleting dormant shadow demo1 xnpod_delete_thread: deleting dormant shadow demo2 xnpod_delete_thread: deleting dormant shadow demo3 xnpod_delete_thread: deleting dormant shadow demo4 xnpod_delete_thread: deleting dormant shadow demo5 xnpod_delete_thread: deleting dormant shadow demo6 xnpod_delete_thread: deleting dormant shadow demo7 xnpod_delete_thread: deleting dormant shadow demo8 xnpod_delete_thread: deleting dormant shadow demo9 xnpod_delete_thread: deleting dormant shadow demo10 xnpod_delete_thread: deleting dormant shadow demo11 xnpod_delete_thread: deleting dormant shadow demo12 .... Paolo