public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Correctly prevent IDE timer expiry function to run if request was already handled
@ 2007-04-06 17:36 Suleiman Souhlal
  2007-04-06 19:22 ` Sergei Shtylyov
  2007-04-06 20:51 ` Bartlomiej Zolnierkiewicz
  0 siblings, 2 replies; 7+ messages in thread
From: Suleiman Souhlal @ 2007-04-06 17:36 UTC (permalink / raw)
  To: bzolnier; +Cc: linux-ide, linux-kernel

It is possible for the timer expiry function to run even though the
request has already been handled: ide_timer_expiry() only checks that
the handler is not NULL, but it is possible that we have handled a
request (thus clearing the handler) and then started a new request
(thus starting the timer again, and setting a handler). 

A simple way to exhibit this is to set the DMA timeout to 1 jiffy and
run dd: The kernel will panic after a few minutes because
ide_timer_expiry() tries to add a timer when it's already active.

To fix this, we simply add a request generation count that gets
incremented at every interrupt, and check in ide_timer_expiry() that
we have not already handled a new interrupt before running the expiry
function.

Signed-off-by:	Suleiman Souhlal <suleiman@google.com>

---
 drivers/ide/ide-io.c   |    6 +++++-
 drivers/ide/ide-iops.c |    2 ++
 include/linux/ide.h    |    2 ++
 3 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c
index 0e02800..8670112 100644
--- a/drivers/ide/ide-io.c
+++ b/drivers/ide/ide-io.c
@@ -1226,6 +1226,7 @@ #if 1
 #endif
 				/* so that ide_timer_expiry knows what to do */
 				hwgroup->sleeping = 1;
+				hwgroup->req_gen_timer = hwgroup->req_gen;
 				mod_timer(&hwgroup->timer, sleep);
 				/* we purposely leave hwgroup->busy==1
 				 * while sleeping */
@@ -1411,7 +1412,8 @@ void ide_timer_expiry (unsigned long dat
 
 	spin_lock_irqsave(&ide_lock, flags);
 
-	if ((handler = hwgroup->handler) == NULL) {
+	if (((handler = hwgroup->handler) == NULL) ||
+	    (hwgroup->req_gen != hwgroup->req_gen_timer)) {
 		/*
 		 * Either a marginal timeout occurred
 		 * (got the interrupt just as timer expired),
@@ -1439,6 +1441,7 @@ void ide_timer_expiry (unsigned long dat
 				if ((wait = expiry(drive)) > 0) {
 					/* reset timer */
 					hwgroup->timer.expires  = jiffies + wait;
+					hwgroup->req_gen_timer = hwgroup->req_gen;
 					add_timer(&hwgroup->timer);
 					spin_unlock_irqrestore(&ide_lock, flags);
 					return;
@@ -1653,6 +1656,7 @@ #endif /* CONFIG_BLK_DEV_IDEPCI */
 		printk(KERN_ERR "%s: ide_intr: hwgroup->busy was 0 ??\n", drive->name);
 	}
 	hwgroup->handler = NULL;
+	hwgroup->req_gen++;
 	del_timer(&hwgroup->timer);
 	spin_unlock(&ide_lock);
 
diff --git a/drivers/ide/ide-iops.c b/drivers/ide/ide-iops.c
index 1ee53a5..3caa176 100644
--- a/drivers/ide/ide-iops.c
+++ b/drivers/ide/ide-iops.c
@@ -889,6 +889,7 @@ static void __ide_set_handler (ide_drive
 	hwgroup->handler	= handler;
 	hwgroup->expiry		= expiry;
 	hwgroup->timer.expires	= jiffies + timeout;
+	hwgroup->req_gen_timer = hwgroup->req_gen;
 	add_timer(&hwgroup->timer);
 }
 
@@ -929,6 +930,7 @@ void ide_execute_command(ide_drive_t *dr
 	hwgroup->handler	= handler;
 	hwgroup->expiry		= expiry;
 	hwgroup->timer.expires	= jiffies + timeout;
+	hwgroup->req_gen_timer = hwgroup->req_gen;
 	add_timer(&hwgroup->timer);
 	hwif->OUTBSYNC(drive, cmd, IDE_COMMAND_REG);
 	/* Drive takes 400nS to respond, we must avoid the IRQ being
diff --git a/include/linux/ide.h b/include/linux/ide.h
index 58564a1..d3bbc71 100644
--- a/include/linux/ide.h
+++ b/include/linux/ide.h
@@ -861,6 +861,8 @@ typedef struct hwgroup_s {
 	int (*expiry)(ide_drive_t *);
 		/* ide_system_bus_speed */
 	int pio_clock;
+	int req_gen;
+	int req_gen_timer;
 
 	unsigned char cmd_buf[4];
 } ide_hwgroup_t;

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] Correctly prevent IDE timer expiry function to run if request was already handled
  2007-04-06 17:36 [PATCH] Correctly prevent IDE timer expiry function to run if request was already handled Suleiman Souhlal
@ 2007-04-06 19:22 ` Sergei Shtylyov
  2007-04-06 20:55   ` Suleiman Souhlal
  2007-04-06 20:51 ` Bartlomiej Zolnierkiewicz
  1 sibling, 1 reply; 7+ messages in thread
From: Sergei Shtylyov @ 2007-04-06 19:22 UTC (permalink / raw)
  To: Suleiman Souhlal; +Cc: bzolnier, linux-ide, linux-kernel

Hello.

Suleiman Souhlal wrote:

> It is possible for the timer expiry function to run even though the
> request has already been handled: ide_timer_expiry() only checks that
> the handler is not NULL, but it is possible that we have handled a
> request (thus clearing the handler) and then started a new request
> (thus starting the timer again, and setting a handler). 

> A simple way to exhibit this is to set the DMA timeout to 1 jiffy and
> run dd: The kernel will panic after a few minutes because
> ide_timer_expiry() tries to add a timer when it's already active.

> To fix this, we simply add a request generation count that gets
> incremented at every interrupt, and check in ide_timer_expiry() that
> we have not already handled a new interrupt before running the expiry
> function.

    Couldn't this be addressed by simply changing add_timer() to mod_timer()?

MBR, Sergei

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Correctly prevent IDE timer expiry function to run if request was already handled
  2007-04-06 17:36 [PATCH] Correctly prevent IDE timer expiry function to run if request was already handled Suleiman Souhlal
  2007-04-06 19:22 ` Sergei Shtylyov
@ 2007-04-06 20:51 ` Bartlomiej Zolnierkiewicz
  1 sibling, 0 replies; 7+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2007-04-06 20:51 UTC (permalink / raw)
  To: Suleiman Souhlal; +Cc: linux-ide, linux-kernel


On Friday 06 April 2007, Suleiman Souhlal wrote:
> It is possible for the timer expiry function to run even though the
> request has already been handled: ide_timer_expiry() only checks that
> the handler is not NULL, but it is possible that we have handled a
> request (thus clearing the handler) and then started a new request
> (thus starting the timer again, and setting a handler). 
> 
> A simple way to exhibit this is to set the DMA timeout to 1 jiffy and
> run dd: The kernel will panic after a few minutes because
> ide_timer_expiry() tries to add a timer when it's already active.
> 
> To fix this, we simply add a request generation count that gets
> incremented at every interrupt, and check in ide_timer_expiry() that
> we have not already handled a new interrupt before running the expiry
> function.
> 
> Signed-off-by:	Suleiman Souhlal <suleiman@google.com>

applied, thanks for fixing this

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Correctly prevent IDE timer expiry function to run if  request was already handled
  2007-04-06 19:22 ` Sergei Shtylyov
@ 2007-04-06 20:55   ` Suleiman Souhlal
  2007-04-06 21:09     ` Sergei Shtylyov
  0 siblings, 1 reply; 7+ messages in thread
From: Suleiman Souhlal @ 2007-04-06 20:55 UTC (permalink / raw)
  To: Sergei Shtylyov; +Cc: Suleiman Souhlal, bzolnier, linux-ide, linux-kernel

On Fri, April 6, 2007 12:22 pm, Sergei Shtylyov wrote:
> Hello.
>
>
> Suleiman Souhlal wrote:
>
>
>> It is possible for the timer expiry function to run even though the
>> request has already been handled: ide_timer_expiry() only checks that the
>> handler is not NULL, but it is possible that we have handled a request
>> (thus clearing the handler) and then started a new request
>> (thus starting the timer again, and setting a handler).
>>
>
>> A simple way to exhibit this is to set the DMA timeout to 1 jiffy and
>> run dd: The kernel will panic after a few minutes because
>> ide_timer_expiry() tries to add a timer when it's already active.
>
>> To fix this, we simply add a request generation count that gets
>> incremented at every interrupt, and check in ide_timer_expiry() that we
>> have not already handled a new interrupt before running the expiry
>> function.
>
> Couldn't this be addressed by simply changing add_timer() to mod_timer()?

No, we don't want to run the expiry function at all, in this case, since
the   request might have correctly been handled already by the time we
would try to run the expiry function/restart the timer.

Also, if we just change the add_timer() to mod_timer(), we will just be
hiding the problem because you might end up changing the timeout of a
timer whose purpose is different (for a new request, for example).
The timer should not be active when ide_timer_expiry() tries to restart
it, since that function is called when the timer has expired (meaning it
is not active anymore).
The reason the timer could have been active at that point, before applying
this patch, is that we try to dispatch a new request after handling one.
The new request will then have its own expiry timer, along with a handler.
Since  before this patch ide_timer_expiry() only looked at whether or not
a handler was present, it would incorrectly think the request had not been
handled already, and incorrectly tried to restart the timer.

-- Suleiman


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Correctly prevent IDE timer expiry function to run if request was already handled
  2007-04-06 20:55   ` Suleiman Souhlal
@ 2007-04-06 21:09     ` Sergei Shtylyov
  2007-04-06 21:19       ` Interrupt posted but not delivered Kirill Yushkov
  2007-04-06 21:35       ` [PATCH] Correctly prevent IDE timer expiry function to run if request was already handled Bartlomiej Zolnierkiewicz
  0 siblings, 2 replies; 7+ messages in thread
From: Sergei Shtylyov @ 2007-04-06 21:09 UTC (permalink / raw)
  To: Suleiman Souhlal; +Cc: bzolnier, linux-ide, linux-kernel

Hello.

Suleiman Souhlal wrote:

>>>It is possible for the timer expiry function to run even though the
>>>request has already been handled: ide_timer_expiry() only checks that the
>>>handler is not NULL, but it is possible that we have handled a request
>>>(thus clearing the handler) and then started a new request
>>>(thus starting the timer again, and setting a handler).

>>>A simple way to exhibit this is to set the DMA timeout to 1 jiffy and
>>>run dd: The kernel will panic after a few minutes because
>>>ide_timer_expiry() tries to add a timer when it's already active.

>>>To fix this, we simply add a request generation count that gets
>>>incremented at every interrupt, and check in ide_timer_expiry() that we
>>>have not already handled a new interrupt before running the expiry
>>>function.

>>Couldn't this be addressed by simply changing add_timer() to mod_timer()?

> No, we don't want to run the expiry function at all, in this case, since
> the   request might have correctly been handled already by the time we
> would try to run the expiry function/restart the timer.

> Also, if we just change the add_timer() to mod_timer(), we will just be
> hiding the problem because you might end up changing the timeout of a
> timer whose purpose is different (for a new request, for example).
> The timer should not be active when ide_timer_expiry() tries to restart
> it, since that function is called when the timer has expired (meaning it
> is not active anymore).

    Yeah, that was stupid idea.  Been looking at network schedulers too much 
recently. :-)

> The reason the timer could have been active at that point, before applying
> this patch, is that we try to dispatch a new request after handling one.
> The new request will then have its own expiry timer, along with a handler.
> Since  before this patch ide_timer_expiry() only looked at whether or not
> a handler was present, it would incorrectly think the request had not been
> handled already, and incorrectly tried to restart the timer.

    Hm, I'm still not sure why this happens at all, probably need to try 
reproducing it (unless you post a stack trace :-).

> -- Suleiman

MBR, Sergei

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Interrupt posted but not delivered
  2007-04-06 21:09     ` Sergei Shtylyov
@ 2007-04-06 21:19       ` Kirill Yushkov
  2007-04-06 21:35       ` [PATCH] Correctly prevent IDE timer expiry function to run if request was already handled Bartlomiej Zolnierkiewicz
  1 sibling, 0 replies; 7+ messages in thread
From: Kirill Yushkov @ 2007-04-06 21:19 UTC (permalink / raw)
  To: linux-kernel; +Cc: cebbert

Hello, Chuck!
You wrote to LKML on 2007-03-26 0:28:05:

CE> I've now seen several reports of this message from the 3Com
CE> 3c59x driver.  This seems to be a very old problem that has
CE> somehow come back starting with kernel 2.6.19 AFAICT. What
CE> could cause this?
I too have the same problem on version 2.6.20

Apr  6 18:18:14 gw kernel: NETDEV WATCHDOG: eth0: transmit timed out
Apr  6 18:18:14 gw kernel: eth0: transmit timed out, tx_status 00 status 
e681.
Apr  6 18:18:14 gw kernel:   diagnostics: net 0cf6 media 8880 dma 0000003a 
fifo 8000
Apr  6 18:18:14 gw kernel: eth0: Interrupt posted but not delivered -- IRQ 
blocked by another device?
Apr  6 18:18:14 gw kernel:   Flags; bus-master 1, dirty 63886224(0) current 
63886224(0)
Apr  6 18:18:14 gw kernel:   Transmit list 00000000 vs. ffff8100b8967200.
Apr  6 18:18:14 gw kernel:   0: @ffff8100b8967200  length 80000032 status 
00010032
Apr  6 18:18:14 gw kernel:   1: @ffff8100b89672a0  length 80000032 status 
00010032
Apr  6 18:18:14 gw kernel:   2: @ffff8100b8967340  length 8000002e status 
0001002e
Apr  6 18:18:14 gw kernel:   3: @ffff8100b89673e0  length 80000032 status 
00010032
Apr  6 18:18:14 gw kernel:   4: @ffff8100b8967480  length 80000058 status 
00010058
Apr  6 18:18:14 gw kernel:   5: @ffff8100b8967520  length 800002bc status 
000102bc
Apr  6 18:18:14 gw kernel:   6: @ffff8100b89675c0  length 80000061 status 
00010061
Apr  6 18:18:14 gw kernel:   7: @ffff8100b8967660  length 800000b3 status 
000100b3
Apr  6 18:18:14 gw kernel:   8: @ffff8100b8967700  length 800000aa status 
000100aa
Apr  6 18:18:14 gw kernel:   9: @ffff8100b89677a0  length 8000005c status 
0001005c
Apr  6 18:18:14 gw kernel:   10: @ffff8100b8967840  length 800005ac status 
000105ac
Apr  6 18:18:14 gw kernel:   11: @ffff8100b89678e0  length 800000e8 status 
000100e8
Apr  6 18:18:14 gw kernel:   12: @ffff8100b8967980  length 800005ac status 
000105ac
Apr  6 18:18:14 gw kernel:   13: @ffff8100b8967a20  length 800005ac status 
000105ac
Apr  6 18:18:14 gw kernel:   14: @ffff8100b8967ac0  length 80000062 status 
80010062
Apr  6 18:18:14 gw kernel:   15: @ffff8100b8967b60  length 8000046a status 
8001046a
Apr  6 18:18:15 gw kernel: BUG: at include/net/dst.h:154 dst_release()
Apr  6 18:18:15 gw kernel:
Apr  6 18:18:15 gw kernel: Call Trace:
Apr  6 18:18:15 gw kernel:  <IRQ>  [<ffffffff8022285f>] 
__kfree_skb+0x3f/0x120
Apr  6 18:18:15 gw kernel:  [<ffffffff80427282>] 
__neigh_event_send+0x142/0x190
Apr  6 18:18:15 gw kernel:  [<ffffffff804471c0>] ip_finish_output+0x0/0x200
Apr  6 18:18:15 gw kernel:  [<ffffffff8024c89a>] 
neigh_resolve_output+0x7a/0x1a0
Apr  6 18:18:15 gw kernel:  [<ffffffff80229c44>] ip_output+0x244/0x280
Apr  6 18:18:15 gw kernel:  [<ffffffff881198bb>] :pptp:pptp_xmit+0x6bb/0x760
Apr  6 18:18:15 gw kernel:  [<ffffffff8810254f>] 
:ppp_generic:ppp_push+0x5f/0xe0
Apr  6 18:18:15 gw kernel:  [<ffffffff881024c4>] 
:ppp_generic:ppp_send_frame+0x4e4/0x510
Apr  6 18:18:15 gw kernel:  [<ffffffff88101f77>] 
:ppp_generic:ppp_xmit_process+0x37/0xa0
Apr  6 18:18:15 gw kernel:  [<ffffffff88101d91>] 
:ppp_generic:ppp_start_xmit+0x1e1/0x210
Apr  6 18:18:15 gw kernel:  [<ffffffff80421c4e>] 
dev_hard_start_xmit+0x8e/0x140
Apr  6 18:18:15 gw kernel:  [<ffffffff804324bf>] __qdisc_run+0xff/0x1c0
Apr  6 18:18:15 gw kernel:  [<ffffffff80227845>] dev_queue_xmit+0x145/0x280
Apr  6 18:18:15 gw kernel:  [<ffffffff80447352>] 
ip_finish_output+0x192/0x200
Apr  6 18:18:15 gw kernel:  [<ffffffff8043d810>] nf_reinject+0x190/0x1e0
Apr  6 18:18:15 gw kernel:  [<ffffffff804471c0>] ip_finish_output+0x0/0x200
Apr  6 18:18:15 gw kernel:  [<ffffffff8811009a>] :imq:imq_dev_xmit+0x4a/0x60
Apr  6 18:18:15 gw kernel:  [<ffffffff80421c4e>] 
dev_hard_start_xmit+0x8e/0x140
Apr  6 18:18:15 gw kernel:  [<ffffffff804322dc>] qdisc_restart1+0xdc/0x1c0

With best regards, Kirill Yushkov.  E-mail: mail@kirya.ru 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Correctly prevent IDE timer expiry function to run if request was already handled
  2007-04-06 21:09     ` Sergei Shtylyov
  2007-04-06 21:19       ` Interrupt posted but not delivered Kirill Yushkov
@ 2007-04-06 21:35       ` Bartlomiej Zolnierkiewicz
  1 sibling, 0 replies; 7+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2007-04-06 21:35 UTC (permalink / raw)
  To: Sergei Shtylyov; +Cc: Suleiman Souhlal, linux-ide, linux-kernel


On Friday 06 April 2007, Sergei Shtylyov wrote:
> Hello.
> 
> Suleiman Souhlal wrote:

> > The reason the timer could have been active at that point, before applying
> > this patch, is that we try to dispatch a new request after handling one.
> > The new request will then have its own expiry timer, along with a handler.
> > Since  before this patch ide_timer_expiry() only looked at whether or not
> > a handler was present, it would incorrectly think the request had not been
> > handled already, and incorrectly tried to restart the timer.
> 
>     Hm, I'm still not sure why this happens at all, probably need to try 
> reproducing it (unless you post a stack trace :-).

not a stack trace but should explain the issue a bit:

...
ide_timer_expiry
	-> just before taking ide_lock IRQ happens
...
ide_intr
	-> completes the command and queues the next one
...
ide_timer_expiry (resumed)
	-> takes ide_lock
	-> the bug happens ;)

should be even easier to trigger it on SMP since ide_timer_expiry
may be running at the same time as ide_intr

Bart

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2007-04-06 21:40 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-04-06 17:36 [PATCH] Correctly prevent IDE timer expiry function to run if request was already handled Suleiman Souhlal
2007-04-06 19:22 ` Sergei Shtylyov
2007-04-06 20:55   ` Suleiman Souhlal
2007-04-06 21:09     ` Sergei Shtylyov
2007-04-06 21:19       ` Interrupt posted but not delivered Kirill Yushkov
2007-04-06 21:35       ` [PATCH] Correctly prevent IDE timer expiry function to run if request was already handled Bartlomiej Zolnierkiewicz
2007-04-06 20:51 ` Bartlomiej Zolnierkiewicz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox