public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* mdelay causes BUG, please use udelay
@ 2002-08-20 23:50 Troy Wilson
  2002-08-21  1:00 ` Jeff Garzik
  0 siblings, 1 reply; 12+ messages in thread
From: Troy Wilson @ 2002-08-20 23:50 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-kernel, Martin.Bligh, tcw

Hi Jeff,

  We probably shouldn't be doing this, but we can at least avoid the BUG
caused by doing an mdelay in interrupt context if we change to udelay.

Thanks,

- Troy


diff -urN linux-2.5.31/drivers/net/e1000/e1000_hw.c linux-2.5.31.udelay/drivers/net/e1000/e1000_hw.c
--- linux-2.5.31/drivers/net/e1000/e1000_hw.c	Sat Aug 10 20:41:28 2002
+++ linux-2.5.31.udelay/drivers/net/e1000/e1000_hw.c	Tue Aug 20 18:12:20 2002
@@ -134,7 +134,7 @@
     /* Delay to allow any outstanding PCI transactions to complete before
      * resetting the device
      */ 
-    msec_delay(10);
+    usec_delay(10000);
 
     /* Issue a global reset to the MAC.  This will reset the chip's
      * transmit, receive, DMA, and link units.  It will not effect

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: mdelay causes BUG, please use udelay
@ 2002-08-21  0:20 Feldman, Scott
  2002-08-21  0:27 ` Benjamin LaHaise
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Feldman, Scott @ 2002-08-21  0:20 UTC (permalink / raw)
  To: 'Troy Wilson', Jeff Garzik; +Cc: linux-kernel, Martin.Bligh, tcw

>   We probably shouldn't be doing this, but we can at least 
> avoid the BUG caused by doing an mdelay in interrupt context 
> if we change to udelay.

That's BUG is my fault.  I put that in there so no one would tempted to call
msec_delay in the interrupt context.  As you've discovered, I missed one
case where msec_delay is called in interrupt context: tx_timeout.  Dangit!

> -    msec_delay(10);
> +    usec_delay(10000);

Jeff, 10000 seems on the border of what's OK.  If it's acceptable, then
let's go for that.  Otherwise, we're going to have to chain several
mod_timer callbacks together to do a controller reset.

-scott

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: mdelay causes BUG, please use udelay
  2002-08-21  0:20 Feldman, Scott
@ 2002-08-21  0:27 ` Benjamin LaHaise
  2002-08-21  0:28 ` H. Peter Anvin
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Benjamin LaHaise @ 2002-08-21  0:27 UTC (permalink / raw)
  To: Feldman, Scott
  Cc: 'Troy Wilson', Jeff Garzik, linux-kernel, Martin.Bligh,
	tcw

On Tue, Aug 20, 2002 at 05:20:02PM -0700, Feldman, Scott wrote:
> > -    msec_delay(10);
> > +    usec_delay(10000);
> 
> Jeff, 10000 seems on the border of what's OK.  If it's acceptable, then
> let's go for that.  Otherwise, we're going to have to chain several
> mod_timer callbacks together to do a controller reset.

10000 is beyond okay.  You've just lost multiple timer ticks.

		-ben
-- 
"You will be reincarnated as a toad; and you will be much happier."

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: mdelay causes BUG, please use udelay
  2002-08-21  0:20 Feldman, Scott
  2002-08-21  0:27 ` Benjamin LaHaise
@ 2002-08-21  0:28 ` H. Peter Anvin
  2002-08-21  0:28 ` Alan Cox
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: H. Peter Anvin @ 2002-08-21  0:28 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <288F9BF66CD9D5118DF400508B68C4460283E4AF@orsmsx113.jf.intel.com>
By author:    "Feldman, Scott" <scott.feldman@intel.com>
In newsgroup: linux.dev.kernel
> 
> > -    msec_delay(10);
> > +    usec_delay(10000);
> 
> Jeff, 10000 seems on the border of what's OK.  If it's acceptable, then
> let's go for that.  Otherwise, we're going to have to chain several
> mod_timer callbacks together to do a controller reset.
> 

10 ms in an interrupt context?  Surely you're joking...

	-hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt	<amsp@zytor.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: mdelay causes BUG, please use udelay
  2002-08-21  0:20 Feldman, Scott
  2002-08-21  0:27 ` Benjamin LaHaise
  2002-08-21  0:28 ` H. Peter Anvin
@ 2002-08-21  0:28 ` Alan Cox
  2002-08-21  0:59 ` Jeff Garzik
  2002-08-21 16:04 ` Martin J. Bligh
  4 siblings, 0 replies; 12+ messages in thread
From: Alan Cox @ 2002-08-21  0:28 UTC (permalink / raw)
  To: Feldman, Scott
  Cc: 'Troy Wilson', Jeff Garzik, linux-kernel, Martin.Bligh,
	tcw

On Wed, 2002-08-21 at 01:20, Feldman, Scott wrote:
> > +    usec_delay(10000);
> 
> Jeff, 10000 seems on the border of what's OK.  If it's acceptable, then
> let's go for that.  Otherwise, we're going to have to chain several
> mod_timer callbacks together to do a controller reset.

For some telco and embedded apps 10000 in an IRQ is borderline. One day
the timer stuff will be needed - how hard is it to fix right first time
?


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: mdelay causes BUG, please use udelay
  2002-08-21  0:20 Feldman, Scott
                   ` (2 preceding siblings ...)
  2002-08-21  0:28 ` Alan Cox
@ 2002-08-21  0:59 ` Jeff Garzik
  2002-08-21 16:04 ` Martin J. Bligh
  4 siblings, 0 replies; 12+ messages in thread
From: Jeff Garzik @ 2002-08-21  0:59 UTC (permalink / raw)
  To: Feldman, Scott
  Cc: 'Troy Wilson', linux-kernel, Martin.Bligh, tcw, Alan Cox

Feldman, Scott wrote:
>>-    msec_delay(10);
>>+    usec_delay(10000);
> 
> 
> Jeff, 10000 seems on the border of what's OK.  If it's acceptable, then
> let's go for that.  Otherwise, we're going to have to chain several
> mod_timer callbacks together to do a controller reset.


That definitely wants fixing.  Since I like doing resets and similar 
slow-paths in process context -- sleep for as long as you want -- I 
would say kick over to a function called via schedule_task()

Just make sure other parts of the driver that may be called 
asynchronously, such as ethtool ioctls, are disabled.  Remember that 
tx_timeout holds the dev->xmit_lock as well, so spending a long time in 
there is a bad idea in general.

I would probably call netif_carrier_off() first thing in tx_timeout, too.

	Jeff




^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: mdelay causes BUG, please use udelay
@ 2002-08-21  0:59 Feldman, Scott
  0 siblings, 0 replies; 12+ messages in thread
From: Feldman, Scott @ 2002-08-21  0:59 UTC (permalink / raw)
  To: 'Alan Cox'
  Cc: 'Troy Wilson', Jeff Garzik, linux-kernel, Martin.Bligh,
	tcw


> > Jeff, 10000 seems on the border of what's OK.  If it's acceptable, 
> > then let's go for that.  Otherwise, we're going to have to chain 
> > several mod_timer callbacks together to do a controller reset.
> 
> For some telco and embedded apps 10000 in an IRQ is 
> borderline. One day the timer stuff will be needed - how hard 
> is it to fix right first time ?

Ok, ok, 10000 is bad, even when reseting the part, no problem.  It's not to
hard to fix the right way; I'll work on a patch to give to Jeff.

-scott

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: mdelay causes BUG, please use udelay
  2002-08-20 23:50 mdelay causes BUG, please use udelay Troy Wilson
@ 2002-08-21  1:00 ` Jeff Garzik
  0 siblings, 0 replies; 12+ messages in thread
From: Jeff Garzik @ 2002-08-21  1:00 UTC (permalink / raw)
  To: Troy Wilson; +Cc: linux-kernel, Martin.Bligh, tcw

Troy Wilson wrote:
> -    msec_delay(10);
> +    usec_delay(10000);


not the right fix :)


^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: mdelay causes BUG, please use udelay
  2002-08-21  0:20 Feldman, Scott
                   ` (3 preceding siblings ...)
  2002-08-21  0:59 ` Jeff Garzik
@ 2002-08-21 16:04 ` Martin J. Bligh
  2002-08-21 16:45   ` Chris Friesen
  2002-08-21 17:21   ` Dave Hansen
  4 siblings, 2 replies; 12+ messages in thread
From: Martin J. Bligh @ 2002-08-21 16:04 UTC (permalink / raw)
  To: Feldman, Scott, 'Troy Wilson', Jeff Garzik; +Cc: linux-kernel, tcw

>> -    msec_delay(10);
>> +    usec_delay(10000);
> 
> Jeff, 10000 seems on the border of what's OK.  If it's acceptable, 
> then let's go for that.  Otherwise, we're going to have to chain 
> several mod_timer callbacks together to do a controller reset.

Whilst this sort of delay in interrupt context is undoubtedly bad
any way we do it, I'd question the context a little more before we
make a decision. This is called from e1000_reset_hw - are we likely
to ever actually call this except under initialisation? If we just
do it once on system boot, I'd say evil hacks (like this) are 
acceptable. If we're going to do this under load, it definitely
needs fixing.

FWIW, this is heavily tested under Apache webserver load on a maxed 
out 8 CPU system with at least 4 (8?) gigabit ethernet cards. Whilst
undoubtedly ugly, it's better than what we have now, so might I 
suggest that we do this for now until a real fix is forthcoming if
we decide it's needed?

Martin.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: mdelay causes BUG, please use udelay
  2002-08-21 16:04 ` Martin J. Bligh
@ 2002-08-21 16:45   ` Chris Friesen
  2002-08-21 16:56     ` Martin J. Bligh
  2002-08-21 17:21   ` Dave Hansen
  1 sibling, 1 reply; 12+ messages in thread
From: Chris Friesen @ 2002-08-21 16:45 UTC (permalink / raw)
  To: Martin J. Bligh
  Cc: Feldman, Scott, 'Troy Wilson', Jeff Garzik, linux-kernel,
	tcw

"Martin J. Bligh" wrote:

> Whilst this sort of delay in interrupt context is undoubtedly bad
> any way we do it, I'd question the context a little more before we
> make a decision. This is called from e1000_reset_hw - are we likely
> to ever actually call this except under initialisation?

I currently work on an embedded device and if we detect given network connection isn't working at
all our first response is to switch to a working connection, then we reload the device driver for
the non-working one.  Since we may be doing other things at the same time, having this stall the
machine for extended periods of time is definately not a good thing.

Chris

-- 
Chris Friesen                    | MailStop: 043/33/F10  
Nortel Networks                  | work: (613) 765-0557
3500 Carling Avenue              | fax:  (613) 765-2986
Nepean, ON K2H 8E9 Canada        | email: cfriesen@nortelnetworks.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: mdelay causes BUG, please use udelay
  2002-08-21 16:45   ` Chris Friesen
@ 2002-08-21 16:56     ` Martin J. Bligh
  0 siblings, 0 replies; 12+ messages in thread
From: Martin J. Bligh @ 2002-08-21 16:56 UTC (permalink / raw)
  To: Chris Friesen
  Cc: Feldman, Scott, 'Troy Wilson', Jeff Garzik, linux-kernel,
	tcw

> I currently work on an embedded device and if we detect given 
> network connection isn't working at all our first response is 
> to switch to a working connection, then we reload the device 
> driver for the non-working one.  Since we may be doing other 
> things at the same time, having this stall the machine for 
> extended periods of time is definately not a good thing.

That's great, IF you have a spare drop out to that net. 

M.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: mdelay causes BUG, please use udelay
  2002-08-21 16:04 ` Martin J. Bligh
  2002-08-21 16:45   ` Chris Friesen
@ 2002-08-21 17:21   ` Dave Hansen
  1 sibling, 0 replies; 12+ messages in thread
From: Dave Hansen @ 2002-08-21 17:21 UTC (permalink / raw)
  To: Martin J. Bligh
  Cc: Feldman, Scott, 'Troy Wilson', Jeff Garzik, linux-kernel,
	tcw

Martin J. Bligh wrote:
>>>-    msec_delay(10);
>>>+    usec_delay(10000);
>>
>>Jeff, 10000 seems on the border of what's OK.  If it's acceptable, 
>>then let's go for that.  Otherwise, we're going to have to chain 
>>several mod_timer callbacks together to do a controller reset.
> 
> Whilst this sort of delay in interrupt context is undoubtedly bad
> any way we do it, I'd question the context a little more before we
> make a decision. This is called from e1000_reset_hw - are we likely
> to ever actually call this except under initialisation?

It doesn't happen often, or under good circumstances.  In certain 
cases, the driver detects that something timed out and it assumes 
something on the card to be dead.  Instead of delaying the 
reinitialization of the dead card with a timer, they just do it during 
the interrupt where the problem was detected.


-- 
Dave Hansen
haveblue@us.ibm.com


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2002-08-21 17:18 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2002-08-20 23:50 mdelay causes BUG, please use udelay Troy Wilson
2002-08-21  1:00 ` Jeff Garzik
  -- strict thread matches above, loose matches on Subject: below --
2002-08-21  0:20 Feldman, Scott
2002-08-21  0:27 ` Benjamin LaHaise
2002-08-21  0:28 ` H. Peter Anvin
2002-08-21  0:28 ` Alan Cox
2002-08-21  0:59 ` Jeff Garzik
2002-08-21 16:04 ` Martin J. Bligh
2002-08-21 16:45   ` Chris Friesen
2002-08-21 16:56     ` Martin J. Bligh
2002-08-21 17:21   ` Dave Hansen
2002-08-21  0:59 Feldman, Scott

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox