From mboxrd@z Thu Jan 1 00:00:00 1970 Message-ID: <481F2FF8.6040903@domain.hid> Date: Mon, 05 May 2008 18:04:08 +0200 From: Jan Kiszka MIME-Version: 1.0 References: <481F2B6A.802@domain.hid> In-Reply-To: <481F2B6A.802@domain.hid> Content-Type: text/plain; charset=ISO-8859-15 Content-Transfer-Encoding: 7bit Subject: Re: [Xenomai-core] Houston, we have a circular problem List-Id: "Xenomai life and development \(bug reports, patches, discussions\)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Xenomai-core@domain.hid Jan Kiszka wrote: > Hi, > > after hacking away the barriers I-pipe erected in front of lockdep > (patches will follow on adeos-main), I was finally able to "visualize" a > bit more what our colleagues see in reality on SMP: some ugly, not yet > understood circular dependency when running some Xenomai app under gdb. > What lockdep tries to tell us remains unclear, unfortunately: > > [ 874.356703] > [ 874.356957] ======================================================= Got it! [ 0.000000] [ 0.000000] ======================================================= [ 0.000000] [ INFO: possible circular locking dependency detected ] [ 0.000000] 2.6.24.6-xeno_64 #313 [ 0.000000] ------------------------------------------------------- [ 0.000000] gdb/4385 is trying to acquire lock: [ 0.000000] ((spinlock_t *)&sighand->siglock){....}, at: [] schedule_event+0x7f/0x578 [ 0.000000] [ 0.000000] but task is already holding lock: [ 0.000000] (&rq->rq_lock_key){++..}, at: [] schedule+0x176/0x7ff [ 0.000000] [ 0.000000] which lock already depends on the new lock. [ 0.000000] [ 0.000000] [ 0.000000] the existing dependency chain (in reverse order) is: [ 0.000000] [ 0.000000] -> #2 (&rq->rq_lock_key){++..}: [ 0.000000] [] __lock_acquire+0xb91/0xd80 [ 0.000000] [] lock_acquire+0x9d/0xbc [ 0.000000] [] task_rq_lock+0x7f/0xb8 [ 0.000000] [] _spin_lock+0x2a/0x36 [ 0.000000] [] task_rq_lock+0x7f/0xb8 [ 0.000000] [] try_to_wake_up+0x29/0x306 [ 0.000000] [] default_wake_function+0x12/0x14 [ 0.000000] [] __wake_up_common+0x4b/0x7a [ 0.000000] [] complete+0x3d/0x51 [ 0.000000] [] migration_thread+0x0/0x22b [ 0.000000] [] kthread+0x2c/0x7c [ 0.000000] [] child_rip+0xa/0x12 [ 0.000000] [] restore_args+0x0/0x30 [ 0.000000] [] kthread+0x0/0x7c [ 0.000000] [] child_rip+0x0/0x12 [ 0.000000] [] 0xffffffffffffffff [ 0.000000] [ 0.000000] -> #1 ((spinlock_t *)&q->lock){++..}: [ 0.000000] [] __lock_acquire+0xb91/0xd80 [ 0.000000] [] lock_acquire+0x9d/0xbc [ 0.000000] [] __wake_up_sync+0x23/0x53 [ 0.000000] [] _spin_lock_irqsave+0x69/0x79 [ 0.000000] [] __wake_up_sync+0x23/0x53 [ 0.000000] [] do_notify_parent+0x1ea/0x207 [ 0.000000] [] kmem_cache_free+0xc6/0xcf [ 0.000000] [] _raw_write_lock+0xe/0x90 [ 0.000000] [] do_exit+0x5fd/0x7e0 [ 0.000000] [] do_exit+0x707/0x7e0 [ 0.000000] [] __call_usermodehelper+0x0/0x61 [ 0.000000] [] request_module+0x0/0x166 [ 0.000000] [] child_rip+0xa/0x12 [ 0.000000] [] restore_args+0x0/0x30 [ 0.000000] [] ____call_usermodehelper+0x0/0x183 [ 0.000000] [] child_rip+0x0/0x12 [ 0.000000] [] 0xffffffffffffffff [ 0.000000] [ 0.000000] -> #0 ((spinlock_t *)&sighand->siglock){....}: [ 0.000000] [] print_circular_bug_entry+0x4d/0x54 [ 0.000000] [] __lock_acquire+0xa93/0xd80 [ 0.000000] [] lock_acquire+0x9d/0xbc [ 0.000000] [] schedule_event+0x7f/0x578 [ 0.000000] [] _spin_lock+0x2a/0x36 [ 0.000000] [] schedule_event+0x7f/0x578 [ 0.000000] [] __ipipe_dispatch_event+0xe4/0x1db [ 0.000000] [] schedule+0x5a8/0x7ff [ 0.000000] [] do_wait+0xb5e/0xc4c [ 0.000000] [] _raw_read_unlock+0xe/0x2d [ 0.000000] [] do_wait+0xb8d/0xc4c [ 0.000000] [] default_wake_function+0x0/0x14 [ 0.000000] [] mcount+0x4c/0x72 [ 0.000000] [] sys_wait4+0x2d/0x2f [ 0.000000] [] system_call+0x92/0x97 [ 0.000000] [] 0xffffffffffffffff [ 0.000000] [ 0.000000] other info that might help us debug this: [ 0.000000] [ 0.000000] 1 lock held by gdb/4385: [ 0.000000] #0: (&rq->rq_lock_key){++..}, at: [] schedule+0x176/0x7ff [ 0.000000] [ 0.000000] stack backtrace: [ 0.000000] Pid: 4385, comm: gdb Not tainted 2.6.24.6-xeno_64 #313 [ 0.000000] [ 0.000000] Call Trace: [ 0.000000] [] print_circular_bug_tail+0x75/0x80 [ 0.000000] [] print_circular_bug_entry+0x4d/0x54 [ 0.000000] [] __lock_acquire+0xa93/0xd80 [ 0.000000] [] lock_acquire+0x9d/0xbc [ 0.000000] [] schedule_event+0x7f/0x578 [ 0.000000] [] _spin_lock+0x2a/0x36 [ 0.000000] [] schedule_event+0x7f/0x578 [ 0.000000] [] __ipipe_dispatch_event+0xe4/0x1db [ 0.000000] [] schedule+0x5a8/0x7ff [ 0.000000] [] do_wait+0xb5e/0xc4c [ 0.000000] [] _raw_read_unlock+0xe/0x2d [ 0.000000] [] do_wait+0xb8d/0xc4c [ 0.000000] [] default_wake_function+0x0/0x14 [ 0.000000] [] mcount+0x4c/0x72 [ 0.000000] [] sys_wait4+0x2d/0x2f [ 0.000000] [] system_call+0x92/0x97 [ 0.000000] My quick translation is that we must not send signals from the schedule_event callback, at least as that hook is currently placed. Any ideas? Or better interpretations? Jan -- Siemens AG, Corporate Technology, CT SE 2 Corporate Competence Center Embedded Linux