From: Jan Kiszka <jan.kiszka@domain.hid>
To: Jan Kiszka <jan.kiszka@domain.hid>
Cc: Xenomai <xenomai@xenomai.org>
Subject: Re: [Xenomai-help] hard lock-up
Date: Wed, 29 Aug 2007 08:11:08 +0200
Message-ID: <46D50DFC.5080304@domain.hid>
In-Reply-To: <46D301EE.5020805@domain.hid>


[-- Attachment #1.1: Type: text/plain, Size: 5109 bytes --]

Jan Kiszka wrote:
> andy motten wrote:
>>> I will run several tests continuously on the PC (including "latency -f")
>>> for the rest of this week, since I am out of the office during this period
>>> (and so nowhere near the problematic PC).
>>> I hope (and hope not in vain) that the latency tracer will give us a
>>> hint about the cause of the hard lock-ups, if one happens during
>>> this period.
>>>
>>> andy
>>>
>>
>> Hello,
>>
>> Since we are having a hard time tracking down the hard lock-ups, we have taken a
>> closer look at the failed Orocos tests (maybe the source of the problem
>> is the same). These failures occur during the make check execution.
>>
>>     The following tests FAILED:
>>         2 - task-test (OTHER-FAULT)
>>         3 - event-test (OTHER-FAULT)
>>         4 - taskcontext-test (OTHER-FAULT)
>>
>> When we run a single test, e.g. task-test, we get the following
>> message: Killed
>> The OROCOS messages are then:
>>
>> 0.000 [ Info   ][Logger] Successfully extracted environment variable
>> ORO_LOGLEVEL
>> 0.001 [ Info   ][Logger]  OROCOS version '1.2.1' compiled with GCC
>> 4.1.2.Orocos Logging Activated at level : [ Debug  ] ( 6 )
>> 0.001 [ Info   ][Logger] Reference System Time is : 880886725351 ticks (
>> 315.369 seconds ).
>> 0.002 [ Info   ][Logger] Logging is relative to this time.
>> 0.002 [ Info   ][Logger] Xenomai Periodic Timer runs in preemptive
>> 'one-shot' mode.
>> 0.003 [ Debug  ][Logger] Xenomai Timer and Main Task Created
>> 0.003 [ Debug  ][Logger] MainThread started.
>> 0.003 [ Debug  ][Logger] Starting StartStopManager.
>> 0.004 [ Info   ][Toolkit] Loading Tool RealTime.
>> 0.005 [ Debug  ][Toolkit] Registered Type 'int' to the Orocos Type System.
>> 0.005 [ Debug  ][Toolkit] Registered Type 'uint' to the Orocos Type System.
>> 0.006 [ Debug  ][Toolkit] Registered Type 'double' to the Orocos Type
>> System.
>> 0.006 [ Debug  ][Toolkit] Registered Type 'bool' to the Orocos Type System.
>> 0.006 [ Debug  ][Toolkit] Registered Type 'PropertyBag' to the Orocos Type
>> System.
>> 0.007 [ Debug  ][Toolkit] Registered Type 'float' to the Orocos Type System.
>> 0.007 [ Debug  ][Toolkit] Registered Type 'char' to the Orocos Type System.
>> 0.008 [ Debug  ][Toolkit] Registered Type 'array' to the Orocos Type System.
>> 0.008 [ Debug  ][Toolkit] Registered Type 'string' to the Orocos Type
>> System.
>> 0.010 [ Debug  ][./task-test::main()] ORO_main starting...
>> 0.010 [ Info   ][./task-test::main()] LogLevel unaltered by test-runner.
>> 0.011 [ Info   ][./task-test::main()] Creating PeriodicThread for scheduler:
>> 0
>> 0.012 [ Info   ][TimerThreadInstance] PeriodicThread created with scheduler
>> type '0', priority 15 and period 0.01.
>> 0.013 [ Debug  ][Logger] Periodic Thread TimerThreadInstance started.
>> 0.014 [ Info   ][PThread] PeriodicThread created with scheduler type '0',
>> priority 99 and period 0.1.
>> 0.014 [ Debug  ][Logger] Periodic Thread PThread started.
>> 0.115 [ Debug  ][Logger] Periodic Thread PThread stopping... done.
>> 0.115 [ Debug  ][Logger] Periodic Thread PThread started.
>> 1.216 [ Debug  ][Logger] Periodic Thread PThread stopping... done.
>> 1.216 [ Debug  ][~PeriodicThread] Terminating PThread
>>
>> On the serial console we get the following listing (complete listing in
>> appendix):
>>
>> Xenomai: starting native API services.
>> I-pipe: Detected illicit call from domain 'Xenomai'
>>         into a service reserved for domain 'Linux' and below.
>>        f635be74 00000000 00000000 52544149 f635be98 c0104789 c02cfa4f
>> c02f5b80
>>        f6c4e2f0 f635beb0 c0137d69 c02c256c c02c1186 c02c01b8 f8c0b280
>> f635bebc
>>        c0132981 f60a1730 f635bed8 f8bd8570 c010ef8c 00000000 f60a0120
>> f8beefe0
>> Call Trace:
>>  [<c0103ffb>] show_trace_log_lvl+0x1f/0x35
>>  [<c01040bb>] show_stack_log_lvl+0xaa/0xcf
>>  [<c0104789>] show_stack+0x2f/0x36
>>  [<c0137d69>] ipipe_check_context+0x7a/0x81
>>  [<c0132981>] module_put+0x19/0x7d
>>  [<f8bd8570>] xnshadow_unmap+0xbc/0xff [xeno_nucleus]
>>  [<f8bfdc75>] __shadow_delete_hook+0x25/0x27 [xeno_native]
>>  [<f8bd1454>] xnpod_delete_thread+0x1b9/0x2aa [xeno_nucleus]
>>  [<f8bfc36b>] rt_task_delete+0x140/0x145 [xeno_native]
>>  [<f8bfe02a>] __rt_task_delete+0x58/0x69 [xeno_native]
>>  [<f8bd8165>] hisyscall_event+0x185/0x291 [xeno_nucleus]
>>  [<c0138940>] __ipipe_dispatch_event+0xc0/0x1da
>>  [<c010ed6b>] __ipipe_syscall_root+0x43/0x10a
>>  [<c0102e79>] system_call+0x29/0x41
>>  =======================
> 
> That specific Xenomai bug should be fixed in 2.4; please check your
> test case against e.g. -rc1. Unfortunately we have no back-port of the fix
> for 2.3 yet. Can't tell right now if this is tricky, but this test
> demonstrates that $SOMETHING should be done...

OK, in order to start fixing things: Here comes a back-port of the 2.4
patch to 2.3.x-SVN, moving module_put out of RT context. Be warned, it's
an early-morning hack, not even compile-tested. Feedback welcome!
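
For readers who want the idea without the Xenomai details, here is a minimal user-space sketch of the pattern the patch relies on: the restricted (RT) path only queues a request, and a handler that runs later in the ordinary context performs the call that is not allowed from RT. Every name in it (defer_request, drain_deferred, release_module, the REQ_* constants, the in_rt_context flag) is hypothetical and exists only for illustration; none of it is Xenomai or I-pipe API, and the real lo-stage queue is per-CPU and triggered through an APC rather than drained by hand.

```c
#include <stddef.h>

#define DEFER_MAX  8   /* hypothetical queue depth */
#define REQ_WAKEUP 1   /* hypothetical request types, loosely mirroring LO_*_REQ */
#define REQ_UNMAP  2

struct deferred_req {
	int type;
	unsigned arg;
};

static struct deferred_req queue[DEFER_MAX];
static size_t q_in, q_out;

static int in_rt_context;  /* stand-in for "running in the RT domain" */
static int puts_done;      /* releases performed from the ordinary context */
static int wakeups_done;   /* wakeups performed by the handler */
static int illicit_calls;  /* releases attempted from the RT context */

/* Stand-in for a service that must not be called from the RT context. */
static void release_module(unsigned magic)
{
	(void)magic;
	if (in_rt_context)
		illicit_calls++;
	else
		puts_done++;
}

/* Safe from the RT context: only records what has to be done later. */
static int defer_request(int type, unsigned arg)
{
	if (q_in - q_out == DEFER_MAX)
		return -1;	/* queue full */
	queue[q_in % DEFER_MAX] = (struct deferred_req){ type, arg };
	q_in++;
	return 0;
}

/* RT-side unmap: queues the release instead of performing it directly. */
static void unmap_from_rt(unsigned magic)
{
	in_rt_context = 1;
	defer_request(REQ_UNMAP, magic);
	in_rt_context = 0;
}

/* Runs later in the ordinary context and processes the queued requests. */
static void drain_deferred(void)
{
	while (q_out != q_in) {
		struct deferred_req *rq = &queue[q_out % DEFER_MAX];

		q_out++;
		switch (rq->type) {
		case REQ_UNMAP:
			release_module(rq->arg);
			/* fall through: an unmap request also implies a wakeup */
		case REQ_WAKEUP:
			wakeups_done++;
			break;
		}
	}
}
```

Calling unmap_from_rt() and then drain_deferred() performs the release exactly once, outside the simulated RT context, which is the property the back-port is after.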

Thanks,
Jan

[-- Attachment #1.2: postpone-module_put.patch --]
[-- Type: text/plain, Size: 4400 bytes --]

Index: xenomai-2.3.x/ChangeLog
===================================================================
--- xenomai-2.3.x/ChangeLog	(Revision 2954)
+++ xenomai-2.3.x/ChangeLog	(Arbeitskopie)
@@ -1,3 +1,8 @@
+2007-08-29  Jan Kiszka  <jan.kiszka@domain.hid>
+
+	* ksrc/nucleus/shadow.c: Postpone module_put() to the lo-stage
+	APC handler (back-ported from 2.4).
+
 2007-08-24  Wolfgang Grandegger  <wg@domain.hid>
 
 	* ksrc/drivers/can/rtcan_socket.c: protect the list of RTCAN
Index: xenomai-2.3.x/ksrc/nucleus/shadow.c
===================================================================
--- xenomai-2.3.x/ksrc/nucleus/shadow.c	(Revision 2954)
+++ xenomai-2.3.x/ksrc/nucleus/shadow.c	(Arbeitskopie)
@@ -99,6 +99,7 @@ static struct __lostagerq {
 #define LO_RENICE_REQ 2
 #define LO_SIGGRP_REQ 3
 #define LO_SIGTHR_REQ 4
+#define LO_UNMAP_REQ  5
 		int type;
 		struct task_struct *task;
 		int arg;
@@ -753,6 +754,28 @@ void xnshadow_reset_shield(void)
 
 #endif /* CONFIG_XENO_OPT_ISHIELD */
 
+static void xnshadow_dereference_skin(unsigned magic)
+{
+	unsigned muxid;
+
+	for (muxid = 0; muxid < XENOMAI_MUX_NR; muxid++) {
+		if (muxtable[muxid].magic == magic) {
+			if (xnarch_atomic_dec_and_test(&muxtable[0].refcnt))
+				xnarch_atomic_dec(&muxtable[0].refcnt);
+			if (xnarch_atomic_dec_and_test(&muxtable[muxid].refcnt))
+
+				/* We were the last thread, decrement the counter,
+				   since it was incremented by the xn_sys_bind
+				   operation. */
+				xnarch_atomic_dec(&muxtable[muxid].refcnt);
+			if (muxtable[muxid].module)
+				module_put(muxtable[muxid].module);
+
+			break;
+		}
+	}
+}
+
 static void lostage_handler(void *cookie)
 {
 	int cpuid = smp_processor_id(), reqnum, sig;
@@ -777,6 +800,12 @@ static void lostage_handler(void *cookie
 
 			goto do_wakeup;
 
+		case LO_UNMAP_REQ:
+
+			xnshadow_dereference_skin(
+				(unsigned)rq->req[reqnum].arg);
+
+		/* fall through */
 		case LO_WAKEUP_REQ:
 
 			/* We need to downgrade the root thread
@@ -1256,7 +1285,6 @@ int xnshadow_map(xnthread_t *thread, xnc
 void xnshadow_unmap(xnthread_t *thread)
 {
 	struct task_struct *p;
-	unsigned muxid, magic;
 
 	if (XENO_DEBUG(NUCLEUS) &&
 	    !testbits(xnpod_current_sched()->status, XNKCOUT))
@@ -1264,25 +1292,6 @@ void xnshadow_unmap(xnthread_t *thread)
 
 	p = xnthread_archtcb(thread)->user_task;
 
-	magic = xnthread_get_magic(thread);
-
-	for (muxid = 0; muxid < XENOMAI_MUX_NR; muxid++) {
-		if (muxtable[muxid].magic == magic) {
-			if (xnarch_atomic_dec_and_test(&muxtable[0].refcnt))
-				xnarch_atomic_dec(&muxtable[0].refcnt);
-			if (xnarch_atomic_dec_and_test(&muxtable[muxid].refcnt))
-
-				/* We were the last thread, decrement the counter,
-				   since it was incremented by the xn_sys_bind
-				   operation. */
-				xnarch_atomic_dec(&muxtable[muxid].refcnt);
-			if (muxtable[muxid].module)
-				module_put(muxtable[muxid].module);
-
-			break;
-		}
-	}
-
 	xnthread_clear_state(thread, XNMAPPED);
 	rpi_pop(thread);
 
@@ -1298,13 +1307,7 @@ void xnshadow_unmap(xnthread_t *thread)
 
 	xnshadow_thrptd(p) = NULL;
 
-	if (p->state != TASK_RUNNING)
-		/* If the shadow is being unmapped in primary mode or blocked
-		   in secondary mode, the associated Linux task should also
-		   die. In the former case, the zombie Linux side returning to
-		   user-space will be trapped and exited inside the pod's
-		   rescheduling routines. */
-		schedule_linux_call(LO_WAKEUP_REQ, p, 0);
+	schedule_linux_call(LO_UNMAP_REQ, p, xnthread_get_magic(thread));
 }
 
 int xnshadow_wait_barrier(struct pt_regs *regs)
@@ -2010,6 +2013,7 @@ RTHAL_DECLARE_EVENT(losyscall_event);
 static inline void do_taskexit_event(struct task_struct *p)
 {
 	xnthread_t *thread = xnshadow_thread(p); /* p == current */
+	unsigned magic;
 	spl_t s;
 
 	if (!thread)
@@ -2018,6 +2022,8 @@ static inline void do_taskexit_event(str
 	if (xnpod_shadow_p())
 		xnshadow_relax(0);
 
+	magic = xnthread_get_magic(thread);
+
 	xnlock_get_irqsave(&nklock, s);
 	/* Prevent wakeup call from xnshadow_unmap(). */
 	xnshadow_thrptd(p) = NULL;
@@ -2028,6 +2034,7 @@ static inline void do_taskexit_event(str
 	xnlock_put_irqrestore(&nklock, s);
 	xnpod_schedule();
 
+	xnshadow_dereference_skin(magic);
 	xnltt_log_event(xeno_ev_shadowexit, thread->name);
 }
 

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 250 bytes --]

