* do_syslog/__down_trylock lockup in current BK @ 2002-09-09 18:53 Ingo Molnar 2002-09-09 19:01 ` Linus Torvalds 0 siblings, 1 reply; 15+ messages in thread From: Ingo Molnar @ 2002-09-09 18:53 UTC (permalink / raw) To: linux-kernel; +Cc: Linus Torvalds i can reproduce the following lockup in BK-current, SMP: >>EIP; c0106c57 <__down_trylock+a7/b4> <===== Trace; c0120316 <do_syslog+16/6a0> Trace; c01088c8 <show_registers+198/1c0> Trace; c011ef16 <do_fork+436/b20> Trace; c011f41a <do_fork+93a/b20> Trace; c0125755 <release_resource+15/50> Trace; c0126686 <proc_doutsstring+76/c0> Trace; c0127c54 <__constant_copy_from_user+24/98> Trace; c0107cf3 <handle_signal+113/1a0> Trace; c012d572 <sys_setreuid+22/170> Trace; c0107fce <do_signal+24e/3c0> it could be related to the signal changes - but at first sight it looks like some sort of printk related lockup. Ingo ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: do_syslog/__down_trylock lockup in current BK 2002-09-09 18:53 do_syslog/__down_trylock lockup in current BK Ingo Molnar @ 2002-09-09 19:01 ` Linus Torvalds 2002-09-09 19:08 ` Ingo Molnar 0 siblings, 1 reply; 15+ messages in thread From: Linus Torvalds @ 2002-09-09 19:01 UTC (permalink / raw) To: Ingo Molnar; +Cc: linux-kernel On Mon, 9 Sep 2002, Ingo Molnar wrote: > > i can reproduce the following lockup in BK-current, SMP: Hmm. The only potential lockup source I could see is that semaphore waitqueue spinlock. > >>EIP; c0106c57 <__down_trylock+a7/b4> <===== Are you sure your system.map is correct? __down_trylock() should _not_ be that big - it's just 67 bytes for me (and apparently almost three times the size for you). Spinlock debugging or something? Linus ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: do_syslog/__down_trylock lockup in current BK 2002-09-09 19:01 ` Linus Torvalds @ 2002-09-09 19:08 ` Ingo Molnar 2002-09-09 19:18 ` Ingo Molnar 0 siblings, 1 reply; 15+ messages in thread From: Ingo Molnar @ 2002-09-09 19:08 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel On Mon, 9 Sep 2002, Linus Torvalds wrote: > > >>EIP; c0106c57 <__down_trylock+a7/b4> <===== > > Are you sure your system.map is correct? __down_trylock() should _not_ be > that big - it's just 67 bytes for me (and apparently almost three times > the size for you). Spinlock debugging or something? i have spinlock debugging enabled - but indeed the trace could be incorrect, let me re-check it. doh, wrong script called ... the right backtrace is: >>EIP; c0106c57 <__write_lock_failed+7/20> <===== Trace; c0120316 <.text.lock.exit+119/133> Trace; c01088c8 <common_interrupt+18/20> Trace; c011ef16 <exit_notify+16/230> Trace; c011f41a <do_exit+2ea/370> Trace; c0125755 <schedule_timeout+b5/c0> Trace; c0126686 <sig_exit+36/40> Trace; c0127c54 <get_signal_to_deliver+2d4/360> Trace; c0107cf3 <do_signal+c3/100> Trace; c012d572 <sys_futex+182/1b0> Trace; c0107fce <work_notifysig+13/15> (we definitely need kksymoops in the 2.5 kernel - it's just *so* much easier to debug various crashes with kksymoops enabled - especially when debugging over a serial line.) and this lockup definitely looks related to the signal changes. Ingo ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: do_syslog/__down_trylock lockup in current BK 2002-09-09 19:08 ` Ingo Molnar @ 2002-09-09 19:18 ` Ingo Molnar 2002-09-09 20:02 ` Ingo Molnar 0 siblings, 1 reply; 15+ messages in thread From: Ingo Molnar @ 2002-09-09 19:18 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel it's locking up here: 416 struct task_struct *p, *reaper = father; 417 struct list_head *_p; 418 419 write_lock_irq(&tasklist_lock); 420 421 if (father->exit_signal != -1) 422 reaper = prev_thread(reaper); (unfortunately i dont know what happened on the other CPU.) Ingo ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: do_syslog/__down_trylock lockup in current BK 2002-09-09 19:18 ` Ingo Molnar @ 2002-09-09 20:02 ` Ingo Molnar 2002-09-09 20:15 ` Daniel Jacobowitz 0 siblings, 1 reply; 15+ messages in thread From: Ingo Molnar @ 2002-09-09 20:02 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel the lockup, on both CPUs: >>EIP; c01200b1 <zap_thread+21/16d> <===== Trace; c0120092 <zap_thread+2/16d> Trace; c011f00c <exit_notify+10c/230> Trace; c011f41a <do_exit+2ea/370> Trace; c0125755 <schedule_timeout+b5/c0> Trace; c0126686 <sig_exit+36/40> Trace; c0127c54 <get_signal_to_deliver+2d4/360> Trace; c0107cf3 <do_signal+c3/100> Trace; c012d572 <sys_futex+182/1b0> Trace; c0107fce <work_notifysig+13/15> >>EIP; c0106c5f <__write_lock_failed+f/20> <===== Trace; c0120316 <.text.lock.exit+119/133> Trace; c011ef16 <exit_notify+16/230> Trace; c011f41a <do_exit+2ea/370> Trace; c0125755 <schedule_timeout+b5/c0> Trace; c0126686 <sig_exit+36/40> Trace; c0127c54 <get_signal_to_deliver+2d4/360> Trace; c0107cf3 <do_signal+c3/100> Trace; c012d572 <sys_futex+182/1b0> Trace; c0107fce <work_notifysig+13/15> ie. either the 'zap_again' produces an infinite loop, or one of the lists became corrupted. I suspect the former. Ingo ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: do_syslog/__down_trylock lockup in current BK 2002-09-09 20:02 ` Ingo Molnar @ 2002-09-09 20:15 ` Daniel Jacobowitz 2002-09-09 20:25 ` Ingo Molnar ` (2 more replies) 0 siblings, 3 replies; 15+ messages in thread From: Daniel Jacobowitz @ 2002-09-09 20:15 UTC (permalink / raw) To: Ingo Molnar; +Cc: Linus Torvalds, linux-kernel On Mon, Sep 09, 2002 at 10:02:11PM +0200, Ingo Molnar wrote: > > the lockup, on both CPUs: > > >>EIP; c01200b1 <zap_thread+21/16d> <===== > Trace; c0120092 <zap_thread+2/16d> > Trace; c011f00c <exit_notify+10c/230> > Trace; c011f41a <do_exit+2ea/370> > Trace; c0125755 <schedule_timeout+b5/c0> > Trace; c0126686 <sig_exit+36/40> > Trace; c0127c54 <get_signal_to_deliver+2d4/360> > Trace; c0107cf3 <do_signal+c3/100> > Trace; c012d572 <sys_futex+182/1b0> > Trace; c0107fce <work_notifysig+13/15> > > >>EIP; c0106c5f <__write_lock_failed+f/20> <===== > Trace; c0120316 <.text.lock.exit+119/133> > Trace; c011ef16 <exit_notify+16/230> > Trace; c011f41a <do_exit+2ea/370> > Trace; c0125755 <schedule_timeout+b5/c0> > Trace; c0126686 <sig_exit+36/40> > Trace; c0127c54 <get_signal_to_deliver+2d4/360> > Trace; c0107cf3 <do_signal+c3/100> > Trace; c012d572 <sys_futex+182/1b0> > Trace; c0107fce <work_notifysig+13/15> > > ie. either the 'zap_again' produces an infinite loop, or one of the lists > became corrupted. I suspect the former. How do you reproduce this? It's probably my fault, if it's stuck in zap_thread... but that's a pretty suspicious looking trace to me, if it goes from schedule_timeout to do_exit. -- Daniel Jacobowitz MontaVista Software Debian GNU/Linux Developer ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: do_syslog/__down_trylock lockup in current BK 2002-09-09 20:15 ` Daniel Jacobowitz @ 2002-09-09 20:25 ` Ingo Molnar 2002-09-09 20:33 ` Ingo Molnar 2002-09-09 20:43 ` Ingo Molnar 2 siblings, 0 replies; 15+ messages in thread From: Ingo Molnar @ 2002-09-09 20:25 UTC (permalink / raw) To: Daniel Jacobowitz; +Cc: Linus Torvalds, linux-kernel On Mon, 9 Sep 2002, Daniel Jacobowitz wrote: > How do you reproduce this? It's probably my fault, if it's stuck in > zap_thread... but that's a pretty suspicious looking trace to me, if it > goes from schedule_timeout to do_exit. (i think i found it - patch in a few minutes.) Ingo ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: do_syslog/__down_trylock lockup in current BK 2002-09-09 20:15 ` Daniel Jacobowitz 2002-09-09 20:25 ` Ingo Molnar @ 2002-09-09 20:33 ` Ingo Molnar 2002-09-09 20:40 ` Daniel Jacobowitz 2002-09-09 20:40 ` Linus Torvalds 2002-09-09 20:43 ` Ingo Molnar 2 siblings, 2 replies; 15+ messages in thread From: Ingo Molnar @ 2002-09-09 20:33 UTC (permalink / raw) To: Linus Torvalds; +Cc: Daniel Jacobowitz, linux-kernel the attached patch fixes one bug in the way we did zap_thread() - but this alone does not fix the lockup. the bug was that list_for_each_safe() is not 'safe enough' - zap_thread() drops the tasklist lock at which point anything might happen to the child list. the lockup is likely in the while loop - ie. zap_thread() not actually reparenting a thread and thus causing an infinite loop - is that possible? Ingo --- linux/kernel/exit.c.orig Mon Sep 9 21:59:24 2002 +++ linux/kernel/exit.c Mon Sep 9 22:24:41 2002 @@ -493,7 +493,6 @@ static void exit_notify(void) { struct task_struct *t; - struct list_head *_p, *_n; forget_original_parent(current); /* @@ -554,17 +553,16 @@ do_notify_parent(current, current->exit_signal); zap_again: - list_for_each_safe(_p, _n, ¤t->children) - zap_thread(list_entry(_p,struct task_struct,sibling), current, 0); - list_for_each_safe(_p, _n, ¤t->ptrace_children) - zap_thread(list_entry(_p,struct task_struct,ptrace_list), current, 1); + while (!list_empty(¤t->children)) + zap_thread(list_entry(current->children.next,struct task_struct,sibling), current, 0); + while (!list_empty(¤t->ptrace_children)) + zap_thread(list_entry(current->ptrace_children.next,struct task_struct,sibling), current, 0); /* * zap_thread might drop the tasklist lock, thus we could * have new children queued back from the ptrace list into the * child list: */ - if (unlikely(!list_empty(¤t->children) || - !list_empty(¤t->ptrace_children))) + if (unlikely(!list_empty(¤t->children))) goto zap_again; /* * No need to unlock IRQs, we'll schedule() immediately ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: do_syslog/__down_trylock lockup in current BK 2002-09-09 20:33 ` Ingo Molnar @ 2002-09-09 20:40 ` Daniel Jacobowitz 2002-09-09 20:58 ` Ingo Molnar 2002-09-09 20:40 ` Linus Torvalds 1 sibling, 1 reply; 15+ messages in thread From: Daniel Jacobowitz @ 2002-09-09 20:40 UTC (permalink / raw) To: Ingo Molnar; +Cc: Linus Torvalds, linux-kernel On Mon, Sep 09, 2002 at 10:33:17PM +0200, Ingo Molnar wrote: > > the attached patch fixes one bug in the way we did zap_thread() - but this > alone does not fix the lockup. > > the bug was that list_for_each_safe() is not 'safe enough' - zap_thread() > drops the tasklist lock at which point anything might happen to the child > list. > > the lockup is likely in the while loop - ie. zap_thread() not actually > reparenting a thread and thus causing an infinite loop - is that possible? Well, it shouldn't be. forget_original_parent should update real_parent for every child on either list, and then zap_thread unlinks each child from the current parent and links it to the new real_parent. A couple of printks in there should be able to work out if I'm wrong, though... > @@ -554,17 +553,16 @@ > do_notify_parent(current, current->exit_signal); > > zap_again: > - list_for_each_safe(_p, _n, ¤t->children) > - zap_thread(list_entry(_p,struct task_struct,sibling), current, 0); > - list_for_each_safe(_p, _n, ¤t->ptrace_children) > - zap_thread(list_entry(_p,struct task_struct,ptrace_list), current, 1); > + while (!list_empty(¤t->children)) > + zap_thread(list_entry(current->children.next,struct task_struct,sibling), current, 0); > + while (!list_empty(¤t->ptrace_children)) > + zap_thread(list_entry(current->ptrace_children.next,struct task_struct,sibling), current, 0); As Linus points out, typo right there on the last argument. -- Daniel Jacobowitz MontaVista Software Debian GNU/Linux Developer ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: do_syslog/__down_trylock lockup in current BK 2002-09-09 20:40 ` Daniel Jacobowitz @ 2002-09-09 20:58 ` Ingo Molnar 0 siblings, 0 replies; 15+ messages in thread From: Ingo Molnar @ 2002-09-09 20:58 UTC (permalink / raw) To: Daniel Jacobowitz; +Cc: Linus Torvalds, linux-kernel some more info about the state it was in: p (ld-linux.so.2, 4364/4357), father: (ld-linux.so.2, 4363/4357) kernel BUG at exit.c:470! Ingo ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: do_syslog/__down_trylock lockup in current BK 2002-09-09 20:33 ` Ingo Molnar 2002-09-09 20:40 ` Daniel Jacobowitz @ 2002-09-09 20:40 ` Linus Torvalds 2002-09-09 20:45 ` Ingo Molnar 1 sibling, 1 reply; 15+ messages in thread From: Linus Torvalds @ 2002-09-09 20:40 UTC (permalink / raw) To: Ingo Molnar; +Cc: Daniel Jacobowitz, linux-kernel On Mon, 9 Sep 2002, Ingo Molnar wrote: > > the lockup is likely in the while loop - ie. zap_thread() not actually > reparenting a thread and thus causing an infinite loop - is that possible? Hmm.. This patch changes the last argument of zap_thread() from a "1" to a "0" for the ptrace_children list. Was that intentional or a cut-and-paste error? If it was intentional, please remove the argument altogether, since it's now always 0. Linus ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: do_syslog/__down_trylock lockup in current BK 2002-09-09 20:40 ` Linus Torvalds @ 2002-09-09 20:45 ` Ingo Molnar 0 siblings, 0 replies; 15+ messages in thread From: Ingo Molnar @ 2002-09-09 20:45 UTC (permalink / raw) To: Linus Torvalds; +Cc: Daniel Jacobowitz, linux-kernel On Mon, 9 Sep 2002, Linus Torvalds wrote: > > the lockup is likely in the while loop - ie. zap_thread() not actually > > reparenting a thread and thus causing an infinite loop - is that possible? > > Hmm.. This patch changes the last argument of zap_thread() from a "1" to > a "0" for the ptrace_children list. Was that intentional or a > cut-and-paste error? [...] cut-and-paste error. New patch attached. (with the assert added as well) Ingo --- linux/kernel/exit.c.orig Mon Sep 9 21:59:24 2002 +++ linux/kernel/exit.c Mon Sep 9 22:38:44 2002 @@ -461,6 +461,8 @@ ptrace_unlink (p); list_del_init(&p->sibling); + if (p->parent == father && p->parent == p->real_parent) + BUG(); p->parent = p->real_parent; list_add_tail(&p->sibling, &p->parent->children); } @@ -493,7 +495,6 @@ static void exit_notify(void) { struct task_struct *t; - struct list_head *_p, *_n; forget_original_parent(current); /* @@ -554,17 +555,16 @@ do_notify_parent(current, current->exit_signal); zap_again: - list_for_each_safe(_p, _n, ¤t->children) - zap_thread(list_entry(_p,struct task_struct,sibling), current, 0); - list_for_each_safe(_p, _n, ¤t->ptrace_children) - zap_thread(list_entry(_p,struct task_struct,ptrace_list), current, 1); + while (!list_empty(¤t->children)) + zap_thread(list_entry(current->children.next,struct task_struct,sibling), current, 0); + while (!list_empty(¤t->ptrace_children)) + zap_thread(list_entry(current->ptrace_children.next,struct task_struct,sibling), current, 1); /* * zap_thread might drop the tasklist lock, thus we could * have new children queued back from the ptrace list into the * child list: */ - if (unlikely(!list_empty(¤t->children) || - !list_empty(¤t->ptrace_children))) + if (unlikely(!list_empty(¤t->children))) goto zap_again; /* * No need to unlock IRQs, we'll schedule() immediately ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: do_syslog/__down_trylock lockup in current BK 2002-09-09 20:15 ` Daniel Jacobowitz 2002-09-09 20:25 ` Ingo Molnar 2002-09-09 20:33 ` Ingo Molnar @ 2002-09-09 20:43 ` Ingo Molnar 2002-09-09 20:50 ` Daniel Jacobowitz 2 siblings, 1 reply; 15+ messages in thread From: Ingo Molnar @ 2002-09-09 20:43 UTC (permalink / raw) To: Daniel Jacobowitz; +Cc: Linus Torvalds, linux-kernel the following assert triggers and catches the lockup: --- linux/kernel/exit.c.orig Mon Sep 9 21:59:24 2002 +++ linux/kernel/exit.c Mon Sep 9 22:38:44 2002 @@ -461,6 +461,8 @@ ptrace_unlink (p); list_del_init(&p->sibling); + if (p->parent == father && p->parent == p->real_parent) + BUG(); p->parent = p->real_parent; list_add_tail(&p->sibling, &p->parent->children); } so somehow we can end up having parent == real_parent? Ingo ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: do_syslog/__down_trylock lockup in current BK 2002-09-09 20:43 ` Ingo Molnar @ 2002-09-09 20:50 ` Daniel Jacobowitz 2002-09-09 21:00 ` Ingo Molnar 0 siblings, 1 reply; 15+ messages in thread From: Daniel Jacobowitz @ 2002-09-09 20:50 UTC (permalink / raw) To: Ingo Molnar; +Cc: Linus Torvalds, linux-kernel On Mon, Sep 09, 2002 at 10:43:51PM +0200, Ingo Molnar wrote: > > the following assert triggers and catches the lockup: > > --- linux/kernel/exit.c.orig Mon Sep 9 21:59:24 2002 > +++ linux/kernel/exit.c Mon Sep 9 22:38:44 2002 > @@ -461,6 +461,8 @@ > ptrace_unlink (p); > > list_del_init(&p->sibling); > + if (p->parent == father && p->parent == p->real_parent) > + BUG(); > p->parent = p->real_parent; > list_add_tail(&p->sibling, &p->parent->children); > } > > so somehow we can end up having parent == real_parent? When is this happening? It's not necessarily a bug. If the process was traced, then __ptrace_unlink will set p->parent = p->real_parent when it unlinks. -- Daniel Jacobowitz MontaVista Software Debian GNU/Linux Developer ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: do_syslog/__down_trylock lockup in current BK 2002-09-09 20:50 ` Daniel Jacobowitz @ 2002-09-09 21:00 ` Ingo Molnar 0 siblings, 0 replies; 15+ messages in thread From: Ingo Molnar @ 2002-09-09 21:00 UTC (permalink / raw) To: Daniel Jacobowitz; +Cc: Linus Torvalds, linux-kernel On Mon, 9 Sep 2002, Daniel Jacobowitz wrote: > When is this happening? It's not necessarily a bug. If the process was > traced, then __ptrace_unlink will set p->parent = p->real_parent when it > unlinks. it's not traced. And if you look at the patch i've put the assert into the !traced branch ... Ingo ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2002-09-09 20:53 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2002-09-09 18:53 do_syslog/__down_trylock lockup in current BK Ingo Molnar 2002-09-09 19:01 ` Linus Torvalds 2002-09-09 19:08 ` Ingo Molnar 2002-09-09 19:18 ` Ingo Molnar 2002-09-09 20:02 ` Ingo Molnar 2002-09-09 20:15 ` Daniel Jacobowitz 2002-09-09 20:25 ` Ingo Molnar 2002-09-09 20:33 ` Ingo Molnar 2002-09-09 20:40 ` Daniel Jacobowitz 2002-09-09 20:58 ` Ingo Molnar 2002-09-09 20:40 ` Linus Torvalds 2002-09-09 20:45 ` Ingo Molnar 2002-09-09 20:43 ` Ingo Molnar 2002-09-09 20:50 ` Daniel Jacobowitz 2002-09-09 21:00 ` Ingo Molnar
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.