* REGRESSION: panic on e1000 driver @ 2007-05-30 21:22 Doug Chapman 2007-05-30 22:57 ` Kok, Auke 2007-05-31 1:08 ` Herbert Xu 0 siblings, 2 replies; 17+ messages in thread From: Doug Chapman @ 2007-05-30 21:22 UTC (permalink / raw) To: doug.chapman, herbert, auke-jan.h.kok, netdev, e1000-devel All, I reported this a few weeks ago and it was fixed but it appears the offending code was again re-submitted. This causes a panic on HP Integrity servers and from what I hear many other platforms using e1000 as well. My original report was via kernel.org BZ: http://bugzilla.kernel.org/show_bug.cgi?id=8455 I am seeing the same panic with a kernel built from today's git pull. It was fixed by this commit: 3e1657c8ef53e1cd541cc1e420f3230dc075949b but once again broken just yesterday by the following commit. I have backed just this commit out and verified I no longer panic. commit 47313054352b879a2bc65379d55b05f48a0af7ec Author: Herbert Xu <herbert@gondor.apana.org.au> Date: Tue May 29 15:07:31 2007 -0700 e1000: restore netif_poll_enable call but make sure IRQs are off This restores the previously removed netif_poll_enable call in e1000_open. It's needed on all but the first call to e1000_open for a NIC as e1000_close always calls netif_poll_disable. netif_poll_enable can only be called safely if no polls have been scheduled. This should be the case as long as we don't enter our IRQ handler. In order to guarantee this we explicitly disable IRQs as early as possible when we're probing the NIC. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cc: "Kok, Auke" <auke-jan.h.kok@intel.com> Cc: Jeff Garzik <jeff@garzik.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jeff Garzik <jeff@garzik.org> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: REGRESSION: panic on e1000 driver 2007-05-30 21:22 REGRESSION: panic on e1000 driver Doug Chapman @ 2007-05-30 22:57 ` Kok, Auke 2007-05-30 23:31 ` Doug Chapman 2007-05-31 0:51 ` Herbert Xu 2007-05-31 1:08 ` Herbert Xu 1 sibling, 2 replies; 17+ messages in thread From: Kok, Auke @ 2007-05-30 22:57 UTC (permalink / raw) To: Doug Chapman, herbert; +Cc: e1000-devel, netdev, auke-jan.h.kok Doug Chapman wrote: > All, > > I reported this a few weeks ago and it was fixed but it appears the > offending code was again re-submitted. This causes a panic on HP > Integrity servers and from what I hear many other platforms using e1000 > as well. > > My original report was via kernel.org BZ: > http://bugzilla.kernel.org/show_bug.cgi?id=8455 > > I am seeing the same panic with a kernel built from today's git pull. > > > It was fixed by this commit: > 3e1657c8ef53e1cd541cc1e420f3230dc075949b > > > but once again broken just yesterday by the following commit. I have > backed just this commit out and verified I no longer panic. > > > commit 47313054352b879a2bc65379d55b05f48a0af7ec > Author: Herbert Xu <herbert@gondor.apana.org.au> > Date: Tue May 29 15:07:31 2007 -0700 > > e1000: restore netif_poll_enable call but make sure IRQs are off Hmm, we're making a mess of it. Herbert, wouldn't it just have been a lot easier to do just add a netif_poll_disable in e1000_probe, so that any and all other poll enable/disables are symmetric ? Something like this? Auke PS: apologies for the whitespace damage... it's a cut'n'paste --- Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com> diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c index 49be393..a8a3c5f 100644 --- a/drivers/net/e1000/e1000_main.c +++ b/drivers/net/e1000/e1000_main.c @@ -1149,6 +1149,7 @@ e1000_probe(struct pci_dev *pdev, /* tell the stack to leave us alone until e1000_open() is called */ netif_carrier_off(netdev); netif_stop_queue(netdev); + netif_poll_disable(netdev); DPRINTK(PROBE, INFO, "Intel(R) PRO/1000 Network Connection\n"); ------------------------------------------------------------------------- This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: REGRESSION: panic on e1000 driver 2007-05-30 22:57 ` Kok, Auke @ 2007-05-30 23:31 ` Doug Chapman 2007-05-31 0:51 ` Herbert Xu 1 sibling, 0 replies; 17+ messages in thread From: Doug Chapman @ 2007-05-30 23:31 UTC (permalink / raw) To: Kok, Auke; +Cc: e1000-devel, netdev, herbert, doug.chapman On Wed, 2007-05-30 at 15:57 -0700, Kok, Auke wrote: > Doug Chapman wrote: > > All, > > > > I reported this a few weeks ago and it was fixed but it appears the > > offending code was again re-submitted. This causes a panic on HP > > Integrity servers and from what I hear many other platforms using e1000 > > as well. > > > > My original report was via kernel.org BZ: > > http://bugzilla.kernel.org/show_bug.cgi?id=8455 > > > > I am seeing the same panic with a kernel built from today's git pull. > > > > > > It was fixed by this commit: > > 3e1657c8ef53e1cd541cc1e420f3230dc075949b > > > > > > but once again broken just yesterday by the following commit. I have > > backed just this commit out and verified I no longer panic. > > > > > > commit 47313054352b879a2bc65379d55b05f48a0af7ec > > Author: Herbert Xu <herbert@gondor.apana.org.au> > > Date: Tue May 29 15:07:31 2007 -0700 > > > > e1000: restore netif_poll_enable call but make sure IRQs are off > > Hmm, we're making a mess of it. > > Herbert, wouldn't it just have been a lot easier to do just add a > netif_poll_disable in e1000_probe, so that any and all other poll > enable/disables are symmetric ? Something like this? > > Auke > > > PS: apologies for the whitespace damage... it's a cut'n'paste > --- > > Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com> > > diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c > index 49be393..a8a3c5f 100644 > --- a/drivers/net/e1000/e1000_main.c > +++ b/drivers/net/e1000/e1000_main.c > @@ -1149,6 +1149,7 @@ e1000_probe(struct pci_dev *pdev, > /* tell the stack to leave us alone until e1000_open() is called */ > netif_carrier_off(netdev); > netif_stop_queue(netdev); > + netif_poll_disable(netdev); > > DPRINTK(PROBE, INFO, "Intel(R) PRO/1000 Network Connection\n"); Auke, I tested this and it does indeed fix the panic on my systems. I will leave the decision as to if it is the "right" thing to do to those who are more knowledgeable with the code. - Doug ------------------------------------------------------------------------- This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: REGRESSION: panic on e1000 driver 2007-05-30 22:57 ` Kok, Auke 2007-05-30 23:31 ` Doug Chapman @ 2007-05-31 0:51 ` Herbert Xu 2007-05-31 4:51 ` Kok, Auke 1 sibling, 1 reply; 17+ messages in thread From: Herbert Xu @ 2007-05-31 0:51 UTC (permalink / raw) To: Kok, Auke; +Cc: Doug Chapman, netdev, e1000-devel On Wed, May 30, 2007 at 03:57:13PM -0700, Kok, Auke wrote: > > Hmm, we're making a mess of it. Indeed :) > Herbert, wouldn't it just have been a lot easier to do just add a > netif_poll_disable in e1000_probe, so that any and all other poll > enable/disables are symmetric ? Something like this? I wish if it were as simple as that. As soon as register_netdev returns somebody else can invoke e1000_open so disabling poll here can be undesirable. In fact the existing netif_stop_queue and netif_carrier_off calls are also bad for the same reason. I'll keep looking to see if there's some other problem that we haven't uncovered yet. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: REGRESSION: panic on e1000 driver 2007-05-31 0:51 ` Herbert Xu @ 2007-05-31 4:51 ` Kok, Auke 2007-05-31 5:06 ` Herbert Xu 0 siblings, 1 reply; 17+ messages in thread From: Kok, Auke @ 2007-05-31 4:51 UTC (permalink / raw) To: Herbert Xu; +Cc: e1000-devel, netdev, Doug Chapman Herbert Xu wrote: > On Wed, May 30, 2007 at 03:57:13PM -0700, Kok, Auke wrote: >> Hmm, we're making a mess of it. > > Indeed :) > >> Herbert, wouldn't it just have been a lot easier to do just add a >> netif_poll_disable in e1000_probe, so that any and all other poll >> enable/disables are symmetric ? Something like this? > > I wish if it were as simple as that. As soon as register_netdev > returns somebody else can invoke e1000_open so disabling poll > here can be undesirable. In fact the existing netif_stop_queue > and netif_carrier_off calls are also bad for the same reason. this has been an age-old confusion that I never grasped either, so I perfectly understand why you added the explicit e1000_disable_irq call in the other patch (and think thats a great idea). But really, there should be a way for a driver to tell the stack that it should really keep it's hands off :) BTW e1000 currently triggers a single irq manually in the watchdog as link goes up, so that might be the one that is giving problems now. In any case I can't reproduce any of it - perhaps my hardware is too fast. Time to whip out the pIII :o Auke ------------------------------------------------------------------------- This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: REGRESSION: panic on e1000 driver 2007-05-31 4:51 ` Kok, Auke @ 2007-05-31 5:06 ` Herbert Xu 2007-05-31 14:54 ` Kok, Auke 0 siblings, 1 reply; 17+ messages in thread From: Herbert Xu @ 2007-05-31 5:06 UTC (permalink / raw) To: Kok, Auke; +Cc: Doug Chapman, netdev, e1000-devel On Wed, May 30, 2007 at 09:51:14PM -0700, Kok, Auke wrote: > > this has been an age-old confusion that I never grasped either, so I > perfectly understand why you added the explicit e1000_disable_irq call in > the other patch (and think thats a great idea). But really, there should be > a way for a driver to tell the stack that it should really keep it's hands > off :) Well yes, you can get the stack to keep away by not registering your device :) > BTW e1000 currently triggers a single irq manually in the watchdog as link > goes up, so that might be the one that is giving problems now. In any case > I can't reproduce any of it - perhaps my hardware is too fast. Time to whip > out the pIII :o Hmm, if it's triggered by the watchdog then that means the watchdog has been scheduled. However, it seems that the only way to schedule it is through an interrupt? Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: REGRESSION: panic on e1000 driver 2007-05-31 5:06 ` Herbert Xu @ 2007-05-31 14:54 ` Kok, Auke 2007-05-31 21:49 ` Herbert Xu 0 siblings, 1 reply; 17+ messages in thread From: Kok, Auke @ 2007-05-31 14:54 UTC (permalink / raw) To: Herbert Xu; +Cc: Doug Chapman, netdev, e1000-devel Herbert Xu wrote: > On Wed, May 30, 2007 at 09:51:14PM -0700, Kok, Auke wrote: >> this has been an age-old confusion that I never grasped either, so I >> perfectly understand why you added the explicit e1000_disable_irq call in >> the other patch (and think thats a great idea). But really, there should be >> a way for a driver to tell the stack that it should really keep it's hands >> off :) > > Well yes, you can get the stack to keep away by not registering your > device :) *blunt* so how about calling netif_poll_disable() before we register the net_device? >> BTW e1000 currently triggers a single irq manually in the watchdog as link >> goes up, so that might be the one that is giving problems now. In any case >> I can't reproduce any of it - perhaps my hardware is too fast. Time to whip >> out the pIII :o > > Hmm, if it's triggered by the watchdog then that means the watchdog has > been scheduled. However, it seems that the only way to schedule it is > through an interrupt? well no, if we make the watchdog (this is something I've already implemented locally and -mm has it for instance) run as delayed work we can just schedule a watchdog run instead of firing an interrupt. I'm just not sure that would relieve the situation Auke ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: REGRESSION: panic on e1000 driver 2007-05-31 14:54 ` Kok, Auke @ 2007-05-31 21:49 ` Herbert Xu 2007-05-31 22:11 ` Kok, Auke 0 siblings, 1 reply; 17+ messages in thread From: Herbert Xu @ 2007-05-31 21:49 UTC (permalink / raw) To: Kok, Auke; +Cc: Doug Chapman, netdev, e1000-devel On Thu, May 31, 2007 at 07:54:31AM -0700, Kok, Auke wrote: > > so how about calling netif_poll_disable() before we register the net_device? Yes that should work. Let's move the other two netif_ calls while we're at it. > well no, if we make the watchdog (this is something I've already > implemented locally and -mm has it for instance) run as delayed work we can > just schedule a watchdog run instead of firing an interrupt. OK. The only thing I'm worried about is if we're uncovering some underlying problem with the platform with respect to memory barriers then we should get to the bottom of it first before making this problem disappear. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: REGRESSION: panic on e1000 driver 2007-05-31 21:49 ` Herbert Xu @ 2007-05-31 22:11 ` Kok, Auke 0 siblings, 0 replies; 17+ messages in thread From: Kok, Auke @ 2007-05-31 22:11 UTC (permalink / raw) To: Herbert Xu; +Cc: Doug Chapman, netdev, e1000-devel Herbert Xu wrote: > On Thu, May 31, 2007 at 07:54:31AM -0700, Kok, Auke wrote: >> so how about calling netif_poll_disable() before we register the net_device? > > Yes that should work. Let's move the other two netif_ calls while we're > at it. > >> well no, if we make the watchdog (this is something I've already >> implemented locally and -mm has it for instance) run as delayed work we can >> just schedule a watchdog run instead of firing an interrupt. > > OK. The only thing I'm worried about is if we're uncovering some > underlying problem with the platform with respect to memory barriers > then we should get to the bottom of it first before making this problem > disappear. agreed. I will try to send a patch to do this later today. Auke ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: REGRESSION: panic on e1000 driver 2007-05-30 21:22 REGRESSION: panic on e1000 driver Doug Chapman 2007-05-30 22:57 ` Kok, Auke @ 2007-05-31 1:08 ` Herbert Xu 2007-05-31 15:16 ` Doug Chapman 1 sibling, 1 reply; 17+ messages in thread From: Herbert Xu @ 2007-05-31 1:08 UTC (permalink / raw) To: Doug Chapman; +Cc: auke-jan.h.kok, netdev, e1000-devel On Wed, May 30, 2007 at 05:22:30PM -0400, Doug Chapman wrote: > > but once again broken just yesterday by the following commit. I have > backed just this commit out and verified I no longer panic. Hmm, the only way I can see this happening is if the hardware signals an interrupt even though we've explicitly shut it off. Does this patch help? If it does help, does it produce the warning in dmesg? Thanks, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c index 9ec35b7..c6cf82f 100644 --- a/drivers/net/e1000/e1000_main.c +++ b/drivers/net/e1000/e1000_main.c @@ -625,15 +625,15 @@ e1000_down(struct e1000_adapter *adapter) { struct net_device *netdev = adapter->netdev; - /* signal that we're down so the interrupt handler does not - * reschedule our watchdog timer */ - set_bit(__E1000_DOWN, &adapter->flags); - #ifdef CONFIG_E1000_NAPI netif_poll_disable(netdev); #endif e1000_irq_disable(adapter); + /* signal that we're down so the interrupt handler does not + * reschedule our watchdog timer */ + set_bit(__E1000_DOWN, &adapter->flags); + del_timer_sync(&adapter->tx_fifo_stall_timer); del_timer_sync(&adapter->watchdog_timer); del_timer_sync(&adapter->phy_info_timer); @@ -1427,6 +1427,10 @@ e1000_open(struct net_device *netdev) * clean_rx handler before we do so. */ e1000_configure(adapter); +#ifdef CONFIG_E1000_NAPI + netif_poll_enable(netdev); +#endif + err = e1000_request_irq(adapter); if (err) goto err_req_irq; @@ -1434,10 +1438,6 @@ e1000_open(struct net_device *netdev) /* From here on the code is the same as e1000_up() */ clear_bit(__E1000_DOWN, &adapter->flags); -#ifdef CONFIG_E1000_NAPI - netif_poll_enable(netdev); -#endif - e1000_irq_enable(adapter); /* fire a link status change interrupt to start the watchdog */ @@ -3756,6 +3756,11 @@ e1000_intr_msi(int irq, void *data) #endif uint32_t icr = E1000_READ_REG(hw, ICR); + if (unlikely(test_bit(__E1000_DOWN, &adapter->flags) && + atomic_read(&adapter->irq_sem))) + printk(KERN_WARNING "e1000: Unexpected MSI icr=0x%x\n", + icr); + #ifdef CONFIG_E1000_NAPI /* read ICR disables interrupts using IAM, so keep up with our * enable/disable accounting */ @@ -3823,6 +3828,11 @@ e1000_intr(int irq, void *data) if (unlikely(!icr)) return IRQ_NONE; /* Not our interrupt */ + if (unlikely(test_bit(__E1000_DOWN, &adapter->flags) && + atomic_read(&adapter->irq_sem))) + printk(KERN_WARNING "e1000: Unexpected interrupt icr=0x%x\n", + icr); + #ifdef CONFIG_E1000_NAPI /* IMS will not auto-mask if INT_ASSERTED is not set, and if it is * not set, then the adapter didn't send an interrupt */ ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: REGRESSION: panic on e1000 driver 2007-05-31 1:08 ` Herbert Xu @ 2007-05-31 15:16 ` Doug Chapman 2007-05-31 15:23 ` [E1000-devel] " Kok, Auke 2007-05-31 22:10 ` Herbert Xu 0 siblings, 2 replies; 17+ messages in thread From: Doug Chapman @ 2007-05-31 15:16 UTC (permalink / raw) To: Herbert Xu; +Cc: e1000-devel, netdev, auke-jan.h.kok, doug.chapman On Thu, 2007-05-31 at 11:08 +1000, Herbert Xu wrote: > On Wed, May 30, 2007 at 05:22:30PM -0400, Doug Chapman wrote: > > > > but once again broken just yesterday by the following commit. I have > > backed just this commit out and verified I no longer panic. > > Hmm, the only way I can see this happening is if the hardware signals > an interrupt even though we've explicitly shut it off. Does this patch > help? If it does help, does it produce the warning in dmesg? > > Thanks, The patch does fix the panic. Here are all the e1000 messages from dmesg after a bootup. Note that only eth0 is actually configured and connected: # dmesg | grep e1000 e1000: 0000:01:02.0: e1000_probe: (PCI-X:66MHz:64-bit) 00:15:60:04:d7:f8 e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection e1000: 0000:01:02.1: e1000_probe: (PCI-X:66MHz:64-bit) 00:15:60:04:d7:f9 e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection e1000: 0000:15:02.0: e1000_probe: (PCI-X:66MHz:64-bit) 00:12:79:9e:b7:c4 e1000: eth2: e1000_probe: Intel(R) PRO/1000 Network Connection e1000: 0000:15:02.1: e1000_probe: (PCI-X:66MHz:64-bit) 00:12:79:9e:b7:c5 e1000: eth3: e1000_probe: Intel(R) PRO/1000 Network Connection e1000: Unexpected interrupt icr=0x4 e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX - Doug ------------------------------------------------------------------------- This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [E1000-devel] REGRESSION: panic on e1000 driver 2007-05-31 15:16 ` Doug Chapman @ 2007-05-31 15:23 ` Kok, Auke 2007-05-31 21:48 ` Herbert Xu 2007-05-31 22:10 ` Herbert Xu 1 sibling, 1 reply; 17+ messages in thread From: Kok, Auke @ 2007-05-31 15:23 UTC (permalink / raw) To: Doug Chapman, Herbert Xu; +Cc: e1000-devel, netdev Doug Chapman wrote: > On Thu, 2007-05-31 at 11:08 +1000, Herbert Xu wrote: >> On Wed, May 30, 2007 at 05:22:30PM -0400, Doug Chapman wrote: >>> but once again broken just yesterday by the following commit. I have >>> backed just this commit out and verified I no longer panic. >> Hmm, the only way I can see this happening is if the hardware signals >> an interrupt even though we've explicitly shut it off. Does this patch >> help? If it does help, does it produce the warning in dmesg? >> >> Thanks, > > > The patch does fix the panic. Here are all the e1000 messages from > dmesg after a bootup. Note that only eth0 is actually configured and > connected: > > > # dmesg | grep e1000 > e1000: 0000:01:02.0: e1000_probe: (PCI-X:66MHz:64-bit) 00:15:60:04:d7:f8 > e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection > e1000: 0000:01:02.1: e1000_probe: (PCI-X:66MHz:64-bit) 00:15:60:04:d7:f9 > e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection > e1000: 0000:15:02.0: e1000_probe: (PCI-X:66MHz:64-bit) 00:12:79:9e:b7:c4 > e1000: eth2: e1000_probe: Intel(R) PRO/1000 Network Connection > e1000: 0000:15:02.1: e1000_probe: (PCI-X:66MHz:64-bit) 00:12:79:9e:b7:c5 > e1000: eth3: e1000_probe: Intel(R) PRO/1000 Network Connection > e1000: Unexpected interrupt icr=0x4 that's indeed the link status change we fire manually to trigger the watchdog run: #define E1000_ICR_LSC 0x00000004 /* Link Status Change */ Auke ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [E1000-devel] REGRESSION: panic on e1000 driver 2007-05-31 15:23 ` [E1000-devel] " Kok, Auke @ 2007-05-31 21:48 ` Herbert Xu 2007-05-31 22:10 ` Kok, Auke 0 siblings, 1 reply; 17+ messages in thread From: Herbert Xu @ 2007-05-31 21:48 UTC (permalink / raw) To: Kok, Auke; +Cc: Doug Chapman, e1000-devel, netdev On Thu, May 31, 2007 at 08:23:24AM -0700, Kok, Auke wrote: > > >e1000: 0000:01:02.0: e1000_probe: (PCI-X:66MHz:64-bit) 00:15:60:04:d7:f8 > >e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection > >e1000: 0000:01:02.1: e1000_probe: (PCI-X:66MHz:64-bit) 00:15:60:04:d7:f9 > >e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection > >e1000: 0000:15:02.0: e1000_probe: (PCI-X:66MHz:64-bit) 00:12:79:9e:b7:c4 > >e1000: eth2: e1000_probe: Intel(R) PRO/1000 Network Connection > >e1000: 0000:15:02.1: e1000_probe: (PCI-X:66MHz:64-bit) 00:12:79:9e:b7:c5 > >e1000: eth3: e1000_probe: Intel(R) PRO/1000 Network Connection > >e1000: Unexpected interrupt icr=0x4 > > that's indeed the link status change we fire manually to trigger the > watchdog run: > > #define E1000_ICR_LSC 0x00000004 /* Link Status Change */ This still makes no sense. The only triggers I can find for this occur after e1000_irq_enable. So unless we've got a problem with memory barriers we shouldn't get the above printk. Is there another trigger that happens earlier? Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [E1000-devel] REGRESSION: panic on e1000 driver 2007-05-31 21:48 ` Herbert Xu @ 2007-05-31 22:10 ` Kok, Auke 0 siblings, 0 replies; 17+ messages in thread From: Kok, Auke @ 2007-05-31 22:10 UTC (permalink / raw) To: Herbert Xu; +Cc: Doug Chapman, e1000-devel, netdev Herbert Xu wrote: > On Thu, May 31, 2007 at 08:23:24AM -0700, Kok, Auke wrote: >>> e1000: 0000:01:02.0: e1000_probe: (PCI-X:66MHz:64-bit) 00:15:60:04:d7:f8 >>> e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection >>> e1000: 0000:01:02.1: e1000_probe: (PCI-X:66MHz:64-bit) 00:15:60:04:d7:f9 >>> e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection >>> e1000: 0000:15:02.0: e1000_probe: (PCI-X:66MHz:64-bit) 00:12:79:9e:b7:c4 >>> e1000: eth2: e1000_probe: Intel(R) PRO/1000 Network Connection >>> e1000: 0000:15:02.1: e1000_probe: (PCI-X:66MHz:64-bit) 00:12:79:9e:b7:c5 >>> e1000: eth3: e1000_probe: Intel(R) PRO/1000 Network Connection >>> e1000: Unexpected interrupt icr=0x4 >> that's indeed the link status change we fire manually to trigger the >> watchdog run: >> >> #define E1000_ICR_LSC 0x00000004 /* Link Status Change */ > > This still makes no sense. The only triggers I can find for this occur > after e1000_irq_enable. So unless we've got a problem with memory > barriers we shouldn't get the above printk. > > Is there another trigger that happens earlier? none that I know.... Auke ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: REGRESSION: panic on e1000 driver 2007-05-31 15:16 ` Doug Chapman 2007-05-31 15:23 ` [E1000-devel] " Kok, Auke @ 2007-05-31 22:10 ` Herbert Xu 2007-05-31 22:38 ` Doug Chapman 1 sibling, 1 reply; 17+ messages in thread From: Herbert Xu @ 2007-05-31 22:10 UTC (permalink / raw) To: Doug Chapman; +Cc: auke-jan.h.kok, netdev, e1000-devel On Thu, May 31, 2007 at 11:16:09AM -0400, Doug Chapman wrote: > > # dmesg | grep e1000 > e1000: 0000:01:02.0: e1000_probe: (PCI-X:66MHz:64-bit) 00:15:60:04:d7:f8 > e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection > e1000: 0000:01:02.1: e1000_probe: (PCI-X:66MHz:64-bit) 00:15:60:04:d7:f9 > e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection > e1000: 0000:15:02.0: e1000_probe: (PCI-X:66MHz:64-bit) 00:12:79:9e:b7:c4 > e1000: eth2: e1000_probe: Intel(R) PRO/1000 Network Connection > e1000: 0000:15:02.1: e1000_probe: (PCI-X:66MHz:64-bit) 00:12:79:9e:b7:c5 > e1000: eth3: e1000_probe: Intel(R) PRO/1000 Network Connection > e1000: Unexpected interrupt icr=0x4 > e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex, Flow > Control: RX OK that's interesting. Does this patch (please unapply the last one first) make the printk go away or does it trigger a backtrace? Thanks, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c index 9ec35b7..167de10 100644 --- a/drivers/net/e1000/e1000_main.c +++ b/drivers/net/e1000/e1000_main.c @@ -625,15 +625,15 @@ e1000_down(struct e1000_adapter *adapter) { struct net_device *netdev = adapter->netdev; - /* signal that we're down so the interrupt handler does not - * reschedule our watchdog timer */ - set_bit(__E1000_DOWN, &adapter->flags); - #ifdef CONFIG_E1000_NAPI netif_poll_disable(netdev); #endif e1000_irq_disable(adapter); + /* signal that we're down so the interrupt handler does not + * reschedule our watchdog timer */ + set_bit(__E1000_DOWN, &adapter->flags); + del_timer_sync(&adapter->tx_fifo_stall_timer); del_timer_sync(&adapter->watchdog_timer); del_timer_sync(&adapter->phy_info_timer); @@ -1328,6 +1328,7 @@ e1000_sw_init(struct e1000_adapter *adapter) /* Explicitly disable IRQ since the NIC can be in any state. */ atomic_set(&adapter->irq_sem, 0); e1000_irq_disable(adapter); + WARN_ON(E1000_READ_REG(&adapter->hw, ICR)); spin_lock_init(&adapter->stats_lock); @@ -1427,6 +1428,10 @@ e1000_open(struct net_device *netdev) * clean_rx handler before we do so. */ e1000_configure(adapter); +#ifdef CONFIG_E1000_NAPI + netif_poll_enable(netdev); +#endif + err = e1000_request_irq(adapter); if (err) goto err_req_irq; @@ -1434,9 +1439,7 @@ e1000_open(struct net_device *netdev) /* From here on the code is the same as e1000_up() */ clear_bit(__E1000_DOWN, &adapter->flags); -#ifdef CONFIG_E1000_NAPI - netif_poll_enable(netdev); -#endif + WARN_ON(E1000_READ_REG(&adapter->hw, ICR)); e1000_irq_enable(adapter); @@ -3756,6 +3759,13 @@ e1000_intr_msi(int irq, void *data) #endif uint32_t icr = E1000_READ_REG(hw, ICR); + if (unlikely(test_bit(__E1000_DOWN, &adapter->flags))) { + mb(); + if (unlikely(atomic_read(&adapter->irq_sem))) + printk(KERN_WARNING "e1000: Unexpected MSI icr=0x%x\n", + icr); + } + #ifdef CONFIG_E1000_NAPI /* read ICR disables interrupts using IAM, so keep up with our * enable/disable accounting */ @@ -3823,6 +3833,14 @@ e1000_intr(int irq, void *data) if (unlikely(!icr)) return IRQ_NONE; /* Not our interrupt */ + + if (unlikely(test_bit(__E1000_DOWN, &adapter->flags))) { + mb(); + if (unlikely(atomic_read(&adapter->irq_sem))) + printk(KERN_WARNING + "e1000: Unexpected interrupt icr=0x%x\n", icr); + } + #ifdef CONFIG_E1000_NAPI /* IMS will not auto-mask if INT_ASSERTED is not set, and if it is * not set, then the adapter didn't send an interrupt */ ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: REGRESSION: panic on e1000 driver 2007-05-31 22:10 ` Herbert Xu @ 2007-05-31 22:38 ` Doug Chapman 2007-06-01 0:34 ` Herbert Xu 0 siblings, 1 reply; 17+ messages in thread From: Doug Chapman @ 2007-05-31 22:38 UTC (permalink / raw) To: Herbert Xu; +Cc: auke-jan.h.kok, netdev, e1000-devel On Fri, 2007-06-01 at 08:10 +1000, Herbert Xu wrote: > On Thu, May 31, 2007 at 11:16:09AM -0400, Doug Chapman wrote: > > > > # dmesg | grep e1000 > > e1000: 0000:01:02.0: e1000_probe: (PCI-X:66MHz:64-bit) 00:15:60:04:d7:f8 > > e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection > > e1000: 0000:01:02.1: e1000_probe: (PCI-X:66MHz:64-bit) 00:15:60:04:d7:f9 > > e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection > > e1000: 0000:15:02.0: e1000_probe: (PCI-X:66MHz:64-bit) 00:12:79:9e:b7:c4 > > e1000: eth2: e1000_probe: Intel(R) PRO/1000 Network Connection > > e1000: 0000:15:02.1: e1000_probe: (PCI-X:66MHz:64-bit) 00:12:79:9e:b7:c5 > > e1000: eth3: e1000_probe: Intel(R) PRO/1000 Network Connection > > e1000: Unexpected interrupt icr=0x4 > > e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex, Flow > > Control: RX > > OK that's interesting. > > Does this patch (please unapply the last one first) make the printk > go away or does it trigger a backtrace? > > Thanks, I get a backtrace as it probes each e1000 device and I also still get the unexpected interrupt message. WARNING: at drivers/net/e1000/e1000_main.c:1331 e1000_sw_init() Call Trace: [<a000000100013c20>] show_stack+0x40/0xa0 sp=e0000017e50c7b30 bsp=e0000017e50c1120 [<a000000100013cb0>] dump_stack+0x30/0x60 sp=e0000017e50c7d00 bsp=e0000017e50c1108 [<a000000236c47a40>] e1000_probe+0xce0/0x1ee0 [e1000] sp=e0000017e50c7d00 bsp=e0000017e50c1098 [<a0000001002d5540>] pci_device_probe+0x260/0x3a0 sp=e0000017e50c7d10 bsp=e0000017e50c1050 [<a0000001003e2d10>] driver_probe_device+0x250/0x380 sp=e0000017e50c7de0 bsp=e0000017e50c1010 [<a0000001003e3120>] __driver_attach+0x100/0x1a0 sp=e0000017e50c7de0 bsp=e0000017e50c0fd8 [<a0000001003e10c0>] bus_for_each_dev+0x80/0x100 sp=e0000017e50c7de0 bsp=e0000017e50c0fa0 [<a0000001003e28e0>] driver_attach+0x40/0x60 sp=e0000017e50c7e00 bsp=e0000017e50c0f80 [<a0000001003e18b0>] bus_add_driver+0xf0/0x3c0 sp=e0000017e50c7e00 bsp=e0000017e50c0f40 [<a0000001003e3680>] driver_register+0x140/0x160 sp=e0000017e50c7e00 bsp=e0000017e50c0f20 [<a0000001002d5a10>] __pci_register_driver+0xd0/0x160 sp=e0000017e50c7e00 bsp=e0000017e50c0ee8 [<a0000002363400b0>] e1000_init_module+0xb0/0x1b0 [e1000] sp=e0000017e50c7e00 bsp=e0000017e50c0ec8 [<a0000001000cd440>] sys_init_module+0x2dc0/0x31a0 sp=e0000017e50c7e00 bsp=e0000017e50c0d48 [<a00000010000bae0>] ia64_ret_from_syscall+0x0/0x20 sp=e0000017e50c7e30 bsp=e0000017e50c0d48 - Doug ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: REGRESSION: panic on e1000 driver 2007-05-31 22:38 ` Doug Chapman @ 2007-06-01 0:34 ` Herbert Xu 0 siblings, 0 replies; 17+ messages in thread From: Herbert Xu @ 2007-06-01 0:34 UTC (permalink / raw) To: Doug Chapman; +Cc: e1000-devel, netdev, auke-jan.h.kok On Thu, May 31, 2007 at 06:38:28PM -0400, Doug Chapman wrote: > > I get a backtrace as it probes each e1000 device and I also still get > the unexpected interrupt message. > > > WARNING: at drivers/net/e1000/e1000_main.c:1331 e1000_sw_init() Thanks for testing! Although I still don't know what caused the interrupt in your case, it is clear that we need to be able to deal with interrupts as soon as the handler is registered since the cause register is not affected by e1000_irq_disable and a shared interrupt can easily be mistaken as our own. So Auke's solution of doing netif_poll_disable should fix this problem. In looking at this I've found a couple of other problems: 1) Race between IRQ handler and e1000_open: A shared/spurious interrupt can cause this: CPU0 CPU1 e1000_open request_irq spurious/shared IRQ e1000_interrupt e1000_irq_enable atomic_dec_* atomic_inc IMC <- ~0 IMS <- MASK So we end up with IRQs enabled when they shouldn't be. 2) Race between IRQ handler and e1000_clean (and other mgmt functions): Again shared/spurious interrupts may cause problems: CPU0 CPU1 e1000_clean do work spurious/shared IRQ e1000_interrupt clear ICR netif_rx_schedule_prep fails e1000_irq_enable netif_rx_complete e1000_irq_enable At this point IRQs are on but we've lost an interrupt. We can fix this by 1) Ignoring IRQs when irq_sem > 0. 2) Always generate an IRQ after e1000_irq_enable. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ------------------------------------------------------------------------- This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2007-06-01 0:34 UTC | newest] Thread overview: 17+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-05-30 21:22 REGRESSION: panic on e1000 driver Doug Chapman 2007-05-30 22:57 ` Kok, Auke 2007-05-30 23:31 ` Doug Chapman 2007-05-31 0:51 ` Herbert Xu 2007-05-31 4:51 ` Kok, Auke 2007-05-31 5:06 ` Herbert Xu 2007-05-31 14:54 ` Kok, Auke 2007-05-31 21:49 ` Herbert Xu 2007-05-31 22:11 ` Kok, Auke 2007-05-31 1:08 ` Herbert Xu 2007-05-31 15:16 ` Doug Chapman 2007-05-31 15:23 ` [E1000-devel] " Kok, Auke 2007-05-31 21:48 ` Herbert Xu 2007-05-31 22:10 ` Kok, Auke 2007-05-31 22:10 ` Herbert Xu 2007-05-31 22:38 ` Doug Chapman 2007-06-01 0:34 ` Herbert Xu
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).