public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Re: [OOPS][2.5.66bk3+] run_timer_softirq - IRQ Mishandlings - New OOPS w/ timer
@ 2003-03-30 16:10 Shawn Starr
  2003-03-30 17:12 ` Petr Vandrovec
  2003-03-30 21:02 ` Roland Dreier
  0 siblings, 2 replies; 8+ messages in thread
From: Shawn Starr @ 2003-03-30 16:10 UTC (permalink / raw)
  To: Shawn Starr, Andrew Morton; +Cc: rml, linux-kernel

Function found was: delayed_work_timer_fn (kernel/workqueue.c)

free of pending timer at c7411150
function=c0138ba0
Call Trace:
 [<c014d4b4>] timer_hunt+0x84/0x90
 [<c0138ba0>] delayed_work_timer_fn+0x0/0x170
 [<c014fb08>] kfree+0x1c8/0x320
 [<c024f8a6>] release_dev+0x696/0x840
 [<c024f8a6>] release_dev+0x696/0x840
 [<c01566eb>] zap_pmd_range+0x4b/0x70
 [<c015675b>] unmap_page_range+0x4b/0x80
 [<c0250054>] tty_release+0x94/0x1b0
 [<c016db2c>] __fput+0xac/0x100
 [<c024ffc0>] tty_release+0x0/0x1b0
 [<c016db7b>] __fput+0xfb/0x100
 [<c016ba6c>] filp_close+0x15c/0x230
 [<c0125ebc>] put_files_struct+0x6c/0xe0
 [<c0127300>] do_exit+0x400/0xaa0
 [<c0255487>] read_chan+0x327/0x1010
 [<c0127bdb>] do_group_exit+0x1cb/0x210
 [<c01303e5>] dequeue_signal+0x35/0xa0
 [<c013319e>] get_signal_to_deliver+0x40e/0x920
 [<c010a08d>] do_signal+0xdd/0x110
 [<c024e22d>] tty_read+0x24d/0x2d0
 [<c0256170>] write_chan+0x0/0x240
 [<c016c641>] vfs_read+0xe1/0x1c0
 [<c010cd75>] do_IRQ+0x235/0x370
 [<c010a119>] do_notify_resume+0x59/0x5c
 [<c010a2f6>] work_notifysig+0x13/0x15

---- oops --
Unable to handle kernel paging request at virtual address 6b6b6b6f
 printing eip:
c012e9c7
*pde = 00000000
Oops: 0002 [#1]
CPU:    0
EIP:    0060:[<c012e9c7>]    Not tainted
EFLAGS: 00010002
EIP is at run_timer_softirq+0xe7/0x410
eax: c7411150   ebx: c7411154   ecx: 6b6b6b6b   edx: 6b6b6b6b
esi: 6b6b6b6b   edi: 6b6b6b6b   ebp: c7630000   esp: c7631c68
ds: 007b   es: 007b   ss: 0068
Process agetty (pid: 488, threadinfo=c7630000 task=c7626cc0)
Stack: 00000046 c010ac18 00000002 00000002 00000000 c04c9c40 fffffffd
c7630000
       00000040 0000007b 00000001 c04c9c48 fffffffd 00000046 c012964a
c04c9c48
       c7630000 c7630000 00000000 c04185e0 c010cd75 00000000 c7631cf0
c04185e0
Call Trace:
 [<c010ac18>] common_interrupt+0x18/0x20
 [<c012964a>] do_softirq+0x9a/0xa0
 [<c010cd75>] do_IRQ+0x235/0x370
 [<c010ac18>] common_interrupt+0x18/0x20
 [<c014007b>] sys_timer_delete+0xdb/0x210
 [<c014fb2c>] kfree+0x1ec/0x320
 [<c024f8a6>] release_dev+0x696/0x840
 [<c024f8a6>] release_dev+0x696/0x840
 [<c01566eb>] zap_pmd_range+0x4b/0x70
 [<c015675b>] unmap_page_range+0x4b/0x80
 [<c0250054>] tty_release+0x94/0x1b0
 [<c016db2c>] __fput+0xac/0x100
 [<c024ffc0>] tty_release+0x0/0x1b0
 [<c016db7b>] __fput+0xfb/0x100
 [<c016ba6c>] filp_close+0x15c/0x230
 [<c0125ebc>] put_files_struct+0x6c/0xe0
 [<c0127300>] do_exit+0x400/0xaa0
 [<c0255487>] read_chan+0x327/0x1010
 [<c0127bdb>] do_group_exit+0x1cb/0x210
 [<c01303e5>] dequeue_signal+0x35/0xa0
 [<c013319e>] get_signal_to_deliver+0x40e/0x920
 [<c010a08d>] do_signal+0xdd/0x110
 [<c024e22d>] tty_read+0x24d/0x2d0
 [<c0256170>] write_chan+0x0/0x240
 [<c016c641>] vfs_read+0xe1/0x1c0
 [<c010cd75>] do_IRQ+0x235/0x370
 [<c010a119>] do_notify_resume+0x59/0x5c
 [<c010a2f6>] work_notifysig+0x13/0x15

Code: 89 4a 04 89 11 c7 40 30 00 00 00 00 81 3d a0 9a 41 c0 3c 4b
 <0>Kernel panic: Aiee, killing interrupt handler!
kernel/timer.c:258: spin_lock(kernel/timer.c:c0419aa0) already locked by
kernel/timer.c/398
In interrupt handler - not syncing

ksymoops dump:

Code;  00000000 Before first symbol
00000000 <_EIP>:
Code;  00000000 Before first symbol
   0:   89 4a 04                  mov    %ecx,0x4(%edx)
Code;  00000003 Before first symbol
   3:   89 11                     mov    %edx,(%ecx)
Code;  00000005 Before first symbol
   5:   c7 40 30 00 00 00 00      movl   $0x0,0x30(%eax)
Code;  0000000c Before first symbol
   c:   81 3d a0 9a 41 c0 3c      cmpl   $0x4b3c,0xc0419aa0
Code;  00000013 Before first symbol
  13:   4b 00 00



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [OOPS][2.5.66bk3+] run_timer_softirq - IRQ Mishandlings - New OOPS w/ timer
  2003-03-30 16:10 [OOPS][2.5.66bk3+] run_timer_softirq - IRQ Mishandlings - New OOPS w/ timer Shawn Starr
@ 2003-03-30 17:12 ` Petr Vandrovec
  2003-03-30 21:02 ` Roland Dreier
  1 sibling, 0 replies; 8+ messages in thread
From: Petr Vandrovec @ 2003-03-30 17:12 UTC (permalink / raw)
  To: Shawn Starr; +Cc: Andrew Morton, rml, linux-kernel

On Sun, Mar 30, 2003 at 11:10:48AM -0500, Shawn Starr wrote:
> Function found was: delayed_work_timer_fn (kernel/workqueue.c)
> 
> free of pending timer at c7411150
> function=c0138ba0
> Call Trace:
>  [<c014d4b4>] timer_hunt+0x84/0x90
>  [<c0138ba0>] delayed_work_timer_fn+0x0/0x170
>  [<c014fb08>] kfree+0x1c8/0x320
>  [<c024f8a6>] release_dev+0x696/0x840
>  [<c024f8a6>] release_dev+0x696/0x840

drivers/char/tty_io.c schedules flush_to_ldisc work. This
work can potentially reschedule itself, making call to
flush_scheduled_work() at the end of release_dev()
useless :-( 

Unfortunately I have no idea what is that code trying to do -
we already did tty->ldisc.close() and replaced
tty->ldisc with something else when we call flush_scheduled_work(),
so I doubt that work we are waiting for can do anything
useful even if TTY_DONT_FLIP is not set...
					Petr Vandrovec
					vandrove@vc.cvut.cz


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [OOPS][2.5.66bk3+] run_timer_softirq - IRQ Mishandlings - New OOPS w/ timer
  2003-03-30 16:10 [OOPS][2.5.66bk3+] run_timer_softirq - IRQ Mishandlings - New OOPS w/ timer Shawn Starr
  2003-03-30 17:12 ` Petr Vandrovec
@ 2003-03-30 21:02 ` Roland Dreier
  2003-03-30 21:28   ` Shawn Starr
  1 sibling, 1 reply; 8+ messages in thread
From: Roland Dreier @ 2003-03-30 21:02 UTC (permalink / raw)
  To: Shawn Starr; +Cc: Andrew Morton, rml, linux-kernel

    Shawn> Function found was: delayed_work_timer_fn
    Shawn> (kernel/workqueue.c)

It looks to me like something is calling schedule_delayed_work()
(which calls queue_delayed_work(), which starts a timer) and then
freeing the work_struct before it's executed.

Here's a list of places that use schedule_delayed_work() where the
work_struct might be kmalloc()ed.  Are you using any of these drivers?
(Obviously you're using tty_io, so that bears some looking at)

    drivers/char/cyclades.c
    drivers/char/mxser.c
    drivers/char/tty_io.c
    drivers/isdn/i4l/isdn_tty.c
    drivers/message/fusion/mptlan.c
    drivers/net/hamradio/baycom_epp.c
    drivers/net/plip.c
    drivers/scsi/imm.c
    drivers/scsi/ppa.c

If tty_io.c is the problem, then maybe something like the patch below
will find the culprit.

  - Roland

===== drivers/char/tty_io.c 1.68 vs edited =====
--- 1.68/drivers/char/tty_io.c	Thu Mar 27 21:15:44 2003
+++ edited/drivers/char/tty_io.c	Sun Mar 30 12:51:00 2003
@@ -169,6 +169,10 @@
 
 static inline void free_tty_struct(struct tty_struct *tty)
 {
+	if (timer_pending(&tty->flip.work.timer)) {
+		printk(KERN_WARNING "freeing tty with pending flip work timer from [<%p>]\n",
+		       __builtin_return_address(0));
+	}
 	kfree(tty);
 }
 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [OOPS][2.5.66bk3+] run_timer_softirq - IRQ Mishandlings - New OOPS w/ timer
  2003-03-30 21:02 ` Roland Dreier
@ 2003-03-30 21:28   ` Shawn Starr
  2003-03-30 23:17     ` Andrew Morton
  0 siblings, 1 reply; 8+ messages in thread
From: Shawn Starr @ 2003-03-30 21:28 UTC (permalink / raw)
  To: Roland Dreier; +Cc: Andrew Morton, rml, linux-kernel

drivers/char/tty_io.c - Only

I bet it's this function, there's only a kfree, not destruction of any
timers.

Added this rebuilt kernel waiting :-)

Shawn.

----- Original Message -----
From: "Roland Dreier" <roland@topspin.com>
To: "Shawn Starr" <spstarr@sh0n.net>
Cc: "Andrew Morton" <akpm@digeo.com>; <rml@tech9.net>;
<linux-kernel@vger.kernel.org>
Sent: Sunday, March 30, 2003 4:02 PM
Subject: Re: [OOPS][2.5.66bk3+] run_timer_softirq - IRQ Mishandlings - New
OOPS w/ timer


>     Shawn> Function found was: delayed_work_timer_fn
>     Shawn> (kernel/workqueue.c)
>
> It looks to me like something is calling schedule_delayed_work()
> (which calls queue_delayed_work(), which starts a timer) and then
> freeing the work_struct before it's executed.
>
> Here's a list of places that use schedule_delayed_work() where the
> work_struct might be kmalloc()ed.  Are you using any of these drivers?
> (Obviously you're using tty_io, so that bears some looking at)
>
>     drivers/char/cyclades.c
>     drivers/char/mxser.c
>     drivers/char/tty_io.c
>     drivers/isdn/i4l/isdn_tty.c
>     drivers/message/fusion/mptlan.c
>     drivers/net/hamradio/baycom_epp.c
>     drivers/net/plip.c
>     drivers/scsi/imm.c
>     drivers/scsi/ppa.c
>
> If tty_io.c is the problem, then maybe something like the patch below
> will find the culprit.
>
>   - Roland
>
> ===== drivers/char/tty_io.c 1.68 vs edited =====
> --- 1.68/drivers/char/tty_io.c Thu Mar 27 21:15:44 2003
> +++ edited/drivers/char/tty_io.c Sun Mar 30 12:51:00 2003
> @@ -169,6 +169,10 @@
>
>  static inline void free_tty_struct(struct tty_struct *tty)
>  {
> + if (timer_pending(&tty->flip.work.timer)) {
> + printk(KERN_WARNING "freeing tty with pending flip work timer from
[<%p>]\n",
> +        __builtin_return_address(0));
> + }
>   kfree(tty);
>  }
>
>


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [OOPS][2.5.66bk3+] run_timer_softirq - IRQ Mishandlings - New OOPS w/ timer
  2003-03-30 21:28   ` Shawn Starr
@ 2003-03-30 23:17     ` Andrew Morton
  2003-03-31  0:00       ` Roland Dreier
                         ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Andrew Morton @ 2003-03-30 23:17 UTC (permalink / raw)
  To: Shawn Starr; +Cc: roland, rml, linux-kernel

"Shawn Starr" <spstarr@sh0n.net> wrote:
>
> drivers/char/tty_io.c - Only
> 
> I bet it's this function, there's only a kfree, not destruction of any
> timers.
> 

This is fairly foul.

--- 25/drivers/char/tty_io.c~a	2003-03-30 15:12:37.000000000 -0800
+++ 25-akpm/drivers/char/tty_io.c	2003-03-30 15:16:59.000000000 -0800
@@ -1288,6 +1288,8 @@ static void release_dev(struct file * fi
 	/*
 	 * Make sure that the tty's task queue isn't activated. 
 	 */
+	clear_bit(TTY_DONT_FLIP, &tty->flags);
+	del_timer_sync(&tty->flip.work.timer);
 	flush_scheduled_work();
 
 	/* 

_


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [OOPS][2.5.66bk3+] run_timer_softirq - IRQ Mishandlings - New OOPS w/ timer
  2003-03-30 23:17     ` Andrew Morton
@ 2003-03-31  0:00       ` Roland Dreier
  2003-03-31 14:08       ` Shawn Starr
  2003-04-06  0:35       ` [OOPS][2.5.66bk9+] " Shawn Starr
  2 siblings, 0 replies; 8+ messages in thread
From: Roland Dreier @ 2003-03-31  0:00 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Shawn Starr, rml, linux-kernel

  > --- 25/drivers/char/tty_io.c~a	2003-03-30 15:12:37.000000000 -0800
  > +++ 25-akpm/drivers/char/tty_io.c	2003-03-30 15:16:59.000000000 -0800
  > @@ -1288,6 +1288,8 @@ static void release_dev(struct file * fi
  >  	/*
  >  	 * Make sure that the tty's task queue isn't activated. 
  >  	 */
  > +	clear_bit(TTY_DONT_FLIP, &tty->flags);
  > +	del_timer_sync(&tty->flip.work.timer);
  >  	flush_scheduled_work();

I'm confused by this for two reasons:

First, from looking at workqueue.c (especially the comment in
queue_delayed_work() that says "Increase nr_queued so that the flush
function knows that there's something pending."), it seems like
flush_scheduled_work() should wait until even delayed work is done.
Given that, I don't think the del_timer_sync() should be there --
wouldn't flush_scheduled_work() block forever, since nr_queued can
never reach 0 now?

(I guess I'm assuming the real race is that tty_io.c calls
schedule_delayed_work() between flush_scheduled_work() and
release_mem() in release_dev())

Second, I don't see how it's _ever_ safe to call
flush_scheduled_work().  The comment in workqueue.c before
flush_workqueue() says "NOTE: if work is being added to the queue
constantly by some other context then this function might block
indefinitely."  But flush_scheduled_work() is flushing the keventd_wq,
which other code will definitely add work to.  If we're unlucky,
flush_scheduled_work() could block forever.  Am I just being paranoid?

 - Roland


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [OOPS][2.5.66bk3+] run_timer_softirq - IRQ Mishandlings - New OOPS w/ timer
  2003-03-30 23:17     ` Andrew Morton
  2003-03-31  0:00       ` Roland Dreier
@ 2003-03-31 14:08       ` Shawn Starr
  2003-04-06  0:35       ` [OOPS][2.5.66bk9+] " Shawn Starr
  2 siblings, 0 replies; 8+ messages in thread
From: Shawn Starr @ 2003-03-31 14:08 UTC (permalink / raw)
  To: Andrew Morton; +Cc: roland, rml, linux-kernel

I have applied this to my current tree, testing it out.

On Sun, 30 Mar 2003, Andrew Morton wrote:

> "Shawn Starr" <spstarr@sh0n.net> wrote:
> >
> > drivers/char/tty_io.c - Only
> >
> > I bet it's this function, there's only a kfree, not destruction of any
> > timers.
> >
>
> This is fairly foul.
>
> --- 25/drivers/char/tty_io.c~a	2003-03-30 15:12:37.000000000 -0800
> +++ 25-akpm/drivers/char/tty_io.c	2003-03-30 15:16:59.000000000 -0800
> @@ -1288,6 +1288,8 @@ static void release_dev(struct file * fi
>  	/*
>  	 * Make sure that the tty's task queue isn't activated.
>  	 */
> +	clear_bit(TTY_DONT_FLIP, &tty->flags);
> +	del_timer_sync(&tty->flip.work.timer);
>  	flush_scheduled_work();
>
>  	/*
>
> _
>
>
>


^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [OOPS][2.5.66bk9+] run_timer_softirq - IRQ Mishandlings - New OOPS w/ timer
  2003-03-30 23:17     ` Andrew Morton
  2003-03-31  0:00       ` Roland Dreier
  2003-03-31 14:08       ` Shawn Starr
@ 2003-04-06  0:35       ` Shawn Starr
  2 siblings, 0 replies; 8+ messages in thread
From: Shawn Starr @ 2003-04-06  0:35 UTC (permalink / raw)
  To: 'Andrew Morton'; +Cc: roland, rml, linux-kernel

Andrew, since the new timer changes made, Things are looking solid so far.
We might want to begin adding things into BK-current no?

Sshd hasn't hung, ttys haven't hung either.

Shawn.


-----Original Message-----
From: Andrew Morton [mailto:akpm@digeo.com] 
Sent: Sunday, March 30, 2003 6:18 PM
To: Shawn Starr
Cc: roland@topspin.com; rml@tech9.net; linux-kernel@vger.kernel.org
Subject: Re: [OOPS][2.5.66bk3+] run_timer_softirq - IRQ Mishandlings - New
OOPS w/ timer

"Shawn Starr" <spstarr@sh0n.net> wrote:
>
> drivers/char/tty_io.c - Only
> 
> I bet it's this function, there's only a kfree, not destruction of any
> timers.
> 

This is fairly foul.

--- 25/drivers/char/tty_io.c~a	2003-03-30 15:12:37.000000000 -0800
+++ 25-akpm/drivers/char/tty_io.c	2003-03-30 15:16:59.000000000 -0800
@@ -1288,6 +1288,8 @@ static void release_dev(struct file * fi
 	/*
 	 * Make sure that the tty's task queue isn't activated. 
 	 */
+	clear_bit(TTY_DONT_FLIP, &tty->flags);
+	del_timer_sync(&tty->flip.work.timer);
 	flush_scheduled_work();
 
 	/* 

_



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2003-04-06  0:23 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-03-30 16:10 [OOPS][2.5.66bk3+] run_timer_softirq - IRQ Mishandlings - New OOPS w/ timer Shawn Starr
2003-03-30 17:12 ` Petr Vandrovec
2003-03-30 21:02 ` Roland Dreier
2003-03-30 21:28   ` Shawn Starr
2003-03-30 23:17     ` Andrew Morton
2003-03-31  0:00       ` Roland Dreier
2003-03-31 14:08       ` Shawn Starr
2003-04-06  0:35       ` [OOPS][2.5.66bk9+] " Shawn Starr

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox