* tulip_rxtx_stop() on Cobalt Qube2
From: Florian Fainelli @ 2009-05-01 10:38 UTC
To: netdev, grundler, kyle

Hi Grant, Kyle,

I just updated my Qube2 to run a 2.6.30-rc2-00476-gd678033 kernel (also seen
on a 2.6.29-rc2-00462-gfa04b54) and I now get the following message while
booting the box. I have not yet bisected the offending commit:

[snip]
Linux Tulip driver version 1.1.15-NAPI (Feb 27, 2007)
PCI: Enabling device 0000:00:07.0 (0045 -> 0047)
tulip0: Old format EEPROM on 'Cobalt Microserver' board. Using substitute media control info.
tulip0: EEPROM default media type Autosense.
tulip0: Index #0 - Media MII (#11) described by a 21142 MII PHY (3) block.
tulip0: MII transceiver #1 config 1000 status 7809 advertising 01e1.
eth0: Digital DS21142/43 Tulip rev 65 at MMIO 0x12082000, 00:10:e0:00:7d:1f, IRQ 19.
PCI: Enabling device 0000:00:0c.0 (0005 -> 0007)
tulip1: Old format EEPROM on 'Cobalt Microserver' board. Using substitute media control info.
tulip1: EEPROM default media type Autosense.
tulip1: Index #0 - Media MII (#11) described by a 21142 MII PHY (3) block.
tulip1: MII transceiver #1 config 1000 status 7809 advertising 01e1.
eth1: Digital DS21142/43 Tulip rev 65 at MMIO 0x12082400, 00:10:e0:00:88:b9, IRQ 20.
[snip]
0000:00:07.0: tulip_stop_rxtx() failed (CSR5 0xf0660000 CSR6 0xb20e2202)
eth0: Setting full-duplex based on MII#1 link partner capability of 45e1.

The interface is still fully functional. Should we increase the timeout in
tulip_stop_rxtx() to prevent this message from appearing?

Thanks a lot in advance for your answer.
--
Best regards, Florian Fainelli
Email : florian@openwrt.org
http://openwrt.org
-------------------------------

* Re: tulip_rxtx_stop() on Cobalt Qube2
From: Grant Grundler @ 2009-05-03 11:32 UTC
To: Florian Fainelli
Cc: netdev, grundler, kyle

On Fri, May 01, 2009 at 12:38:57PM +0200, Florian Fainelli wrote:
> Hi Grant, Kyle,
>
> I just updated my Qube2 to run a 2.6.30-rc2-00476-gd678033 kernel (also seen
> on a 2.6.29-rc2-00462-gfa04b54) and I now get the following message while
> booting the box. I have not yet bisected the offending commit:
>
> [snip]
> Linux Tulip driver version 1.1.15-NAPI (Feb 27, 2007)
> PCI: Enabling device 0000:00:07.0 (0045 -> 0047)
> tulip0: Old format EEPROM on 'Cobalt Microserver' board. Using substitute media control info.
> tulip0: EEPROM default media type Autosense.
> tulip0: Index #0 - Media MII (#11) described by a 21142 MII PHY (3) block.
> tulip0: MII transceiver #1 config 1000 status 7809 advertising 01e1.
> eth0: Digital DS21142/43 Tulip rev 65 at MMIO 0x12082000, 00:10:e0:00:7d:1f, IRQ 19.
> PCI: Enabling device 0000:00:0c.0 (0005 -> 0007)
> tulip1: Old format EEPROM on 'Cobalt Microserver' board. Using substitute media control info.
> tulip1: EEPROM default media type Autosense.
> tulip1: Index #0 - Media MII (#11) described by a 21142 MII PHY (3) block.
> tulip1: MII transceiver #1 config 1000 status 7809 advertising 01e1.
> eth1: Digital DS21142/43 Tulip rev 65 at MMIO 0x12082400, 00:10:e0:00:88:b9, IRQ 20.
> [snip]
> 0000:00:07.0: tulip_stop_rxtx() failed (CSR5 0xf0660000 CSR6 0xb20e2202)

I added the additional output. I'll need to grab the manuals and
look up the bits.

> eth0: Setting full-duplex based on MII#1 link partner capability of 45e1.
>
> The interface is still fully functional. Should we increase the timeout in
> tulip_stop_rxtx() to prevent this message from appearing?

I expect the timeout (1.3ms) is long enough to cover "normal" cases.
If something is taking longer than "normal", I'd like to know.

If increasing it to 1.5 or 2ms makes this go away, I don't
think I'll object.

hth,
grant

> Thanks a lot in advance for your answer.
> --
> Best regards, Florian Fainelli
> Email : florian@openwrt.org
> http://openwrt.org
> -------------------------------
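
The timeout under discussion is a polling budget: the stop routine clears the start bits and then re-reads the status register until both engines report "stopped" or the budget runs out. A minimal userspace sketch of such a bounded poll follows. It is not the driver code: `struct fake_nic`, `fake_read_csr5()` and `poll_engines_stopped()` are hypothetical names invented for illustration, the 10 us step is an assumption, and the two CSR5 state masks are assumed to match the ones in the driver's tulip.h. As a side note, one maximum-size frame at 10 Mbit/s takes 1500 * 8 / 10 = 1200 us, which would be consistent with a 1.3 ms budget, but that reading is also an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed CSR5 process-state fields (masks believed to match the
 * driver's tulip.h; treat them as assumptions). */
#define CSR5_TS 0x00700000u  /* transmit process state */
#define CSR5_RS 0x000e0000u  /* receive process state */

/* Hypothetical stand-in for a NIC: reports a busy state for
 * `busy_polls` reads, then reports both engines stopped.
 * A negative value models an engine that never stops. */
struct fake_nic {
    int busy_polls;
};

static uint32_t fake_read_csr5(struct fake_nic *nic)
{
    if (nic->busy_polls < 0)
        return 0xf0660000u;      /* value from the report; never changes */
    if (nic->busy_polls > 0) {
        nic->busy_polls--;
        return 0xf0660000u;
    }
    return 0xf0000000u;          /* TS and RS fields both zero */
}

/* Poll every step_us microseconds for at most budget_us microseconds.
 * Returns the simulated time waited in microseconds, or -1 on timeout. */
static long poll_engines_stopped(struct fake_nic *nic, long budget_us,
                                 long step_us)
{
    long waited = 0;

    assert(step_us > 0);
    while (fake_read_csr5(nic) & (CSR5_TS | CSR5_RS)) {
        if (waited >= budget_us)
            return -1;
        waited += step_us;       /* a driver would udelay(step_us) here */
    }
    return waited;
}
```

In this model a device that stops after three polls returns 30, while a device whose state fields never clear returns -1 for any budget, so a larger timeout only helps when the engine is slow rather than stuck.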

* Re: tulip_rxtx_stop() on Cobalt Qube2
From: Florian Fainelli @ 2009-05-03 19:45 UTC
To: Grant Grundler
Cc: netdev, kyle

Hi Grant,

On Sunday 03 May 2009 13:32:09, Grant Grundler wrote:
> On Fri, May 01, 2009 at 12:38:57PM +0200, Florian Fainelli wrote:
> I added the additional output. I'll need to grab the manuals and
> look up the bits.

That would be great, thanks.

> > eth0: Setting full-duplex based on MII#1 link partner capability of 45e1.
> >
> > The interface is still fully functional. Should we increase the timeout
> > in tulip_stop_rxtx() to prevent this message from appearing?
>
> I expect the timeout (1.3ms) is long enough to cover "normal" cases.
> If something is taking longer than "normal", I'd like to know.
>
> If increasing it to 1.5 or 2ms makes this go away, I don't
> think I'll object.

Neither 1.5 nor 2 ms prevents this message from showing up. I have not checked
CoLo, but maybe the bootloader has already configured the Ethernet interfaces
and therefore this check fails.
--
Best regards, Florian Fainelli
Email : florian@openwrt.org
http://openwrt.org
-------------------------------

* Re: tulip_rxtx_stop() on Cobalt Qube2
From: Grant Grundler @ 2009-05-31 1:40 UTC
To: Florian Fainelli
Cc: netdev, grundler, kyle

Florian,
Summary: proposed patch below for you to test (I've not tested it yet)
and some explanation on what I think is happening.

On Fri, May 01, 2009 at 12:38:57PM +0200, Florian Fainelli wrote:
...
> Linux Tulip driver version 1.1.15-NAPI (Feb 27, 2007)
...
> PCI: Enabling device 0000:00:0c.0 (0005 -> 0007)
> tulip1: Old format EEPROM on 'Cobalt Microserver' board. Using substitute media control info.
> tulip1: EEPROM default media type Autosense.
> tulip1: Index #0 - Media MII (#11) described by a 21142 MII PHY (3) block.
> tulip1: MII transceiver #1 config 1000 status 7809 advertising 01e1.
> eth1: Digital DS21142/43 Tulip rev 65 at MMIO 0x12082400, 00:10:e0:00:88:b9, IRQ 20.
> [snip]
> 0000:00:07.0: tulip_stop_rxtx() failed (CSR5 0xf0660000 CSR6 0xb20e2202)

Looking up these bits in the publicly available manual:
	ftp://download.intel.com/design/network/manuals/27807401.pdf

"Rev 65" == 0x41 == 21143-PD or 21143-TD (page 3-7)

Operation Mode Register (CSR6 - Offset 30H)
----
bit : Val
31  : 1   Special Capture Effect Enable
30  : 1   Receive All
13  : 1   ST - Start Transmission
 9  : 1   Full Duplex
 1  : 1   SR - Start Receive

Status Register (CSR5 - Offset 28H)
----
28-31 : reserved
23-25 : 0x6  TX State = Suspended--Transmit FIFO underflow, or an
             unavailable transmit descriptor
20-22 : 0x3  RX State = Running--Waiting for receive packet
    1 : 0x0  TX isn't stopped
    0 : 0x0  No TX Interrupt pending

The RX/TX engines are in a wedged state to begin with. :(

The normal calling path here is tulip_init() to register the driver callbacks,
and tulip_init_one() gets called immediately by the PCI subsystem.
tulip_up() gets called after netdev registration when someone ifconfig's
the device (ifconfig up). I don't know when else tulip_up() is called.

I have three ideas on how to fix this:
1) reset the RX/TX engines in tulip_init_one() before tulip_stop_rxtx().
2) reset the RX/TX engines in tulip_stop_rxtx() if they are "wedged".
3) remove the tulip_stop_rxtx() call in tulip_init_one().

(1) seems like a reasonable thing to do at init time anyway.
(2) feels like a pretty big hammer and I don't know all the side effects.
(3) tulip_up() will reset the RX/TX engine and call tulip_stop_rxtx()
    when the NIC is opened/ifconfig'd. Calling pci_set_master()
    will allow the device to scribble into host memory...probably
    should move that call to tulip_up() as well. But I don't see anything
    in tulip_init_one() that requires DMA (and thus pci_set_master()).

Is there any reason for NOT implementing (1) and (3)?

I still need to work out where to call pci_set_master() in
tulip_up(). If that's feasible,

hrm. WTF. tulip_stop_rxtx() is called again from tulip_up() right before
CSR6 is written. Earlier in tulip_up() we reset the chip. Removed.

Can you test the patch below?

I have not tested or even compiled this patch...will do so on parisc/ia64
machines once I get some feedback on this patch.

And I just noticed pci_clear_master() is not called *anywhere*. :(
Need to add such a call after tulip_stop_rxtx() some place (many places?).
This patch is just RFC and not suitable for merging upstream.

many thanks,
grant

diff --git a/drivers/net/tulip/tulip_core.c b/drivers/net/tulip/tulip_core.c
index 2abb5d3..1aa058e 100644
--- a/drivers/net/tulip/tulip_core.c
+++ b/drivers/net/tulip/tulip_core.c
@@ -470,11 +470,12 @@ media_picked:
 	tulip_select_media(dev, 1);
 
 	/* Start the chip's Tx to process setup frame. */
-	tulip_stop_rxtx(tp);
 	barrier();
 	udelay(5);
 	iowrite32(tp->csr6 | TxOn, ioaddr + CSR6);
 
+	pci_set_master(pdev);	/* enabled DMA */
+
 	/* Enable interrupts by setting the interrupt mask. */
 	iowrite32(tulip_tbl[tp->chip_id].valid_intrs, ioaddr + CSR5);
 	iowrite32(tulip_tbl[tp->chip_id].valid_intrs, ioaddr + CSR7);
@@ -1422,11 +1423,6 @@ static int __devinit tulip_init_one (struct pci_dev *pdev,
 		tulip_mwi_config (pdev, dev);
 #endif
 
-	/* Stop the chip's Tx and Rx processes. */
-	tulip_stop_rxtx(tp);
-
-	pci_set_master(pdev);
-
 #ifdef CONFIG_GSC
 	if (pdev->subsystem_vendor == PCI_VENDOR_ID_HP) {
 		switch (pdev->subsystem_device) {
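
The register decoding in the message above can be reproduced with a few shifts and masks. The sketch below is a standalone illustration, not driver code: the helper names are hypothetical, and the field positions are assumptions based on the masks believed to be used by the Linux tulip driver (TX state in bits 22:20, RX state in bits 19:17; TxOn 0x2000, FullDuplex 0x0200, RxOn 0x0002). The bit ranges listed in the message differ by a few positions, but with the assumed positions the reported value 0xf0660000 yields the same states quoted there, 0x6 for TX and 0x3 for RX.

```c
#include <stdint.h>

/* Assumed field positions (based on the driver's masks, not on the
 * bit ranges quoted in the message). */
#define CSR5_TS_SHIFT 20        /* transmit process state, 3 bits */
#define CSR5_RS_SHIFT 17        /* receive process state, 3 bits */
#define CSR6_TX_ON    0x2000u   /* ST, bit 13: start transmission */
#define CSR6_FDX      0x0200u   /* bit 9: full duplex */
#define CSR6_RX_ON    0x0002u   /* SR, bit 1: start receive */

/* Extract the 3-bit transmit process state from a CSR5 value. */
static unsigned csr5_tx_state(uint32_t csr5)
{
    return (csr5 >> CSR5_TS_SHIFT) & 0x7u;
}

/* Extract the 3-bit receive process state from a CSR5 value. */
static unsigned csr5_rx_state(uint32_t csr5)
{
    return (csr5 >> CSR5_RS_SHIFT) & 0x7u;
}

/* Nonzero state in either field means that engine has not stopped. */
static int csr5_engines_stopped(uint32_t csr5)
{
    return csr5_tx_state(csr5) == 0 && csr5_rx_state(csr5) == 0;
}

/* Report whether the given CSR6 flag bit is set. */
static int csr6_flag_set(uint32_t csr6, uint32_t flag)
{
    return (csr6 & flag) != 0;
}
```

For the values in the log, `csr5_tx_state(0xf0660000)` gives 6 and `csr5_rx_state(0xf0660000)` gives 3, and the ST, full-duplex and SR bits are all set in 0xb20e2202, so under these assumptions neither engine reports "stopped".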

* Re: tulip_rxtx_stop() on Cobalt Qube2
From: Florian Fainelli @ 2009-05-31 21:02 UTC
To: Grant Grundler
Cc: netdev, kyle

Hi Grant,

On Sunday 31 May 2009 03:40:36, Grant Grundler wrote:
> Florian,
> Summary: proposed patch below for you to test (I've not tested it yet)
> and some explanation on what I think is happening.
>
> On Fri, May 01, 2009 at 12:38:57PM +0200, Florian Fainelli wrote:
> ...
> > Linux Tulip driver version 1.1.15-NAPI (Feb 27, 2007)
> ...
> > PCI: Enabling device 0000:00:0c.0 (0005 -> 0007)
> > tulip1: Old format EEPROM on 'Cobalt Microserver' board. Using substitute media control info.
> > tulip1: EEPROM default media type Autosense.
> > tulip1: Index #0 - Media MII (#11) described by a 21142 MII PHY (3) block.
> > tulip1: MII transceiver #1 config 1000 status 7809 advertising 01e1.
> > eth1: Digital DS21142/43 Tulip rev 65 at MMIO 0x12082400, 00:10:e0:00:88:b9, IRQ 20.
> > [snip]
> > 0000:00:07.0: tulip_stop_rxtx() failed (CSR5 0xf0660000 CSR6 0xb20e2202)
>
> Looking up these bits in the publicly available manual:
> 	ftp://download.intel.com/design/network/manuals/27807401.pdf
>
> "Rev 65" == 0x41 == 21143-PD or 21143-TD (page 3-7)
>
> Operation Mode Register (CSR6 - Offset 30H)
> ----
> bit : Val
> 31  : 1   Special Capture Effect Enable
> 30  : 1   Receive All
> 13  : 1   ST - Start Transmission
>  9  : 1   Full Duplex
>  1  : 1   SR - Start Receive
>
> Status Register (CSR5 - Offset 28H)
> ----
> 28-31 : reserved
> 23-25 : 0x6  TX State = Suspended--Transmit FIFO underflow, or an
>              unavailable transmit descriptor
> 20-22 : 0x3  RX State = Running--Waiting for receive packet
>     1 : 0x0  TX isn't stopped
>     0 : 0x0  No TX Interrupt pending
>
> The RX/TX engines are in a wedged state to begin with. :(

I suppose this is due to the bootloader, either CoLo or the original Cobalt
microserver bootloader.

> The normal calling path here is tulip_init() to register the driver
> callbacks, and tulip_init_one() gets called immediately by the PCI subsystem.
> tulip_up() gets called after netdev registration when someone ifconfig's
> the device (ifconfig up). I don't know when else tulip_up() is called.
>
> I have three ideas on how to fix this:
> 1) reset the RX/TX engines in tulip_init_one() before tulip_stop_rxtx().
> 2) reset the RX/TX engines in tulip_stop_rxtx() if they are "wedged".
> 3) remove the tulip_stop_rxtx() call in tulip_init_one().
>
> (1) seems like a reasonable thing to do at init time anyway.
> (2) feels like a pretty big hammer and I don't know all the side effects.
> (3) tulip_up() will reset the RX/TX engine and call tulip_stop_rxtx()
>     when the NIC is opened/ifconfig'd. Calling pci_set_master()
>     will allow the device to scribble into host memory...probably
>     should move that call to tulip_up() as well. But I don't see anything
>     in tulip_init_one() that requires DMA (and thus pci_set_master()).
>
> Is there any reason for NOT implementing (1) and (3)?
>
> I still need to work out where to call pci_set_master() in
> tulip_up(). If that's feasible,
>
> hrm. WTF. tulip_stop_rxtx() is called again from tulip_up() right before
> CSR6 is written. Earlier in tulip_up() we reset the chip. Removed.
>
> Can you test the patch below?
>
> I have not tested or even compiled this patch...will do so on parisc/ia64
> machines once I get some feedback on this patch.
>
> And I just noticed pci_clear_master() is not called *anywhere*. :(
> Need to add such a call after tulip_stop_rxtx() some place (many places?).
> This patch is just RFC and not suitable for merging upstream.

The patch below does not help on my Qube2; the same message still appears.

> many thanks,
> grant
>
> diff --git a/drivers/net/tulip/tulip_core.c b/drivers/net/tulip/tulip_core.c
> index 2abb5d3..1aa058e 100644
> --- a/drivers/net/tulip/tulip_core.c
> +++ b/drivers/net/tulip/tulip_core.c
> @@ -470,11 +470,12 @@ media_picked:
>  	tulip_select_media(dev, 1);
>
>  	/* Start the chip's Tx to process setup frame. */
> -	tulip_stop_rxtx(tp);
>  	barrier();
>  	udelay(5);
>  	iowrite32(tp->csr6 | TxOn, ioaddr + CSR6);
>
> +	pci_set_master(pdev);	/* enabled DMA */
> +
>  	/* Enable interrupts by setting the interrupt mask. */
>  	iowrite32(tulip_tbl[tp->chip_id].valid_intrs, ioaddr + CSR5);
>  	iowrite32(tulip_tbl[tp->chip_id].valid_intrs, ioaddr + CSR7);
> @@ -1422,11 +1423,6 @@ static int __devinit tulip_init_one (struct pci_dev *pdev,
>  		tulip_mwi_config (pdev, dev);
>  #endif
>
> -	/* Stop the chip's Tx and Rx processes. */
> -	tulip_stop_rxtx(tp);
> -
> -	pci_set_master(pdev);
> -
>  #ifdef CONFIG_GSC
>  	if (pdev->subsystem_vendor == PCI_VENDOR_ID_HP) {
>  		switch (pdev->subsystem_device) {
--
Best regards, Florian Fainelli
Email : florian@openwrt.org
http://openwrt.org
-------------------------------

* Re: tulip_rxtx_stop() on Cobalt Qube2
From: Grant Grundler @ 2009-05-31 23:43 UTC
To: Florian Fainelli
Cc: Grant Grundler, netdev, kyle

On Sun, May 31, 2009 at 11:02:22PM +0200, Florian Fainelli wrote:
> Hi Grant,
...
> > The RX/TX engines are in a wedged state to begin with. :(
>
> I suppose this is due to the bootloader, either CoLo or the original Cobalt
> microserver bootloader.

Yeah - either bootloader or BIOS - whatever talked to the NIC most recently.

...
> > I have not tested or even compiled this patch...will do so on parisc/ia64
> > machines once I get some feedback on this patch.
> >
> > And I just noticed pci_clear_master() is not called *anywhere*. :(
> > Need to add such a call after tulip_stop_rxtx() some place (many places?).
> > This patch is just RFC and not suitable for merging upstream.
>
> The patch below does not help on my Qube2; the same message still appears.

Are you sure?

I thought I removed all calls to tulip_stop_rxtx() in the initialization
code path and didn't think it would get called. Did I overlook one?
Can you add "dump_stack()" to the tulip_stop_rxtx() failure case?

Can you also modify the driver version to make sure you are using
the correct/most recently built module?

And then please post the dmesg output from the driver again (plus 10
lines of output before and after).

thanks,
grant

* Re: tulip_rxtx_stop() on Cobalt Qube2
From: Florian Fainelli @ 2009-06-07 18:27 UTC
To: Grant Grundler
Cc: netdev, kyle

On Monday 01 June 2009 01:43:42, Grant Grundler wrote:
> On Sun, May 31, 2009 at 11:02:22PM +0200, Florian Fainelli wrote:
> > Hi Grant,
>
> ...
>
> > > The RX/TX engines are in a wedged state to begin with. :(
> >
> > I suppose this is due to the bootloader, either CoLo or the original
> > Cobalt microserver bootloader.
>
> Yeah - either bootloader or BIOS - whatever talked to the NIC most
> recently.
>
> ...
>
> > > I have not tested or even compiled this patch...will do so on
> > > parisc/ia64 machines once I get some feedback on this patch.
> > >
> > > And I just noticed pci_clear_master() is not called *anywhere*. :(
> > > Need to add such a call after tulip_stop_rxtx() some place (many
> > > places?). This patch is just RFC and not suitable for merging upstream.
> >
> > The patch below does not help on my Qube2; the same message still
> > appears.
>
> Are you sure?

Yes, I am; see below.

> I thought I removed all calls to tulip_stop_rxtx() in the initialization
> code path and didn't think it would get called. Did I overlook one?
> Can you add "dump_stack()" to the tulip_stop_rxtx() failure case?

Will do.

> Can you also modify the driver version to make sure you are using
> the correct/most recently built module?

See below the patch that was applied:

--- a/drivers/net/tulip/tulip_core.c
+++ b/drivers/net/tulip/tulip_core.c
@@ -15,11 +15,11 @@
 
 #define DRV_NAME	"tulip"
 #ifdef CONFIG_TULIP_NAPI
-#define DRV_VERSION    "1.1.15-NAPI" /* Keep at least for test */
+#define DRV_VERSION    "1.1.16-NAPI" /* Keep at least for test */
 #else
-#define DRV_VERSION	"1.1.15"
+#define DRV_VERSION	"1.1.16"
 #endif
-#define DRV_RELDATE	"Feb 27, 2007"
+#define DRV_RELDATE	"Mar 03, 2009"
 
 
 #include <linux/module.h>
@@ -470,11 +470,12 @@ media_picked:
 	tulip_select_media(dev, 1);
 
 	/* Start the chip's Tx to process setup frame. */
-	tulip_stop_rxtx(tp);
 	barrier();
 	udelay(5);
 	iowrite32(tp->csr6 | TxOn, ioaddr + CSR6);
 
+	pci_set_master(tp->pdev);/* enabled DMA */
+
 	/* Enable interrupts by setting the interrupt mask. */
 	iowrite32(tulip_tbl[tp->chip_id].valid_intrs, ioaddr + CSR5);
 	iowrite32(tulip_tbl[tp->chip_id].valid_intrs, ioaddr + CSR7);
@@ -1421,12 +1422,6 @@ static int __devinit tulip_init_one (struct pci_dev *pdev,
 	if (!force_csr0 && (tp->flags & HAS_PCI_MWI))
 		tulip_mwi_config (pdev, dev);
 #endif
-
-	/* Stop the chip's Tx and Rx processes. */
-	tulip_stop_rxtx(tp);
-
-	pci_set_master(pdev);
-
 #ifdef CONFIG_GSC
 	if (pdev->subsystem_vendor == PCI_VENDOR_ID_HP) {
 		switch (pdev->subsystem_device) {

> And then please post the dmesg output from the driver again (plus 10
> lines of output before and after).

Here comes the dmesg:

Linux Tulip driver version 1.1.16-NAPI (Mar 03, 2009)
PCI: Enabling device 0000:00:07.0 (0045 -> 0047)
tulip0: Old format EEPROM on 'Cobalt Microserver' board. Using substitute media control info.
tulip0: EEPROM default media type Autosense.
tulip0: Index #0 - Media MII (#11) described by a 21142 MII PHY (3) block.
tulip0: MII transceiver #1 config 1000 status 7809 advertising 01e1.
eth0: Digital DS21142/43 Tulip rev 65 at MMIO 0x12082000, 00:10:e0:00:7d:1f, IRQ 19.
PCI: Enabling device 0000:00:0c.0 (0005 -> 0007)
tulip1: Old format EEPROM on 'Cobalt Microserver' board. Using substitute media control info.
tulip1: EEPROM default media type Autosense.
tulip1: Index #0 - Media MII (#11) described by a 21142 MII PHY (3) block.
tulip1: MII transceiver #1 config 1000 status 7809 advertising 01e1.
eth1: Digital DS21142/43 Tulip rev 65 at MMIO 0x12082400, 00:10:e0:00:88:b9, IRQ 20.
[snip]
0000:00:07.0: tulip_stop_rxtx() failed (CSR5 0xf0660000 CSR6 0xb20e2202)
eth0: Setting full-duplex based on MII#1 link partner capability of 45e1.
NET: Registered protocol family 10
--
Best regards, Florian Fainelli
Email : florian@openwrt.org
http://openwrt.org
-------------------------------