* pch_gbe: oops with vlan (new) @ 2012-05-11 20:48 Andy Cress 2012-05-11 21:12 ` Eric Dumazet 0 siblings, 1 reply; 6+ messages in thread From: Andy Cress @ 2012-05-11 20:48 UTC (permalink / raw) To: netdev Folks, I am looking for help in debugging a pch_gbe driver oops/abort. Kernel: version 2.6.32-220.el6.i686 (RHEL6.2) Driver: pch_gbe version 0.91-NAPI (source tarball we used is at https://sendfile.kontron.com/message/24tdUi6MXklnUtBLnOsumq until May 16) NIC: 0b:00.1 Ethernet controller [0200]: Intel Corporation Platform Controller Hub EG20T Gigabit Ethernet Controller [8086:8802] (rev 02) Configuration, with VLAN: eth0 (not started) eth0.100 = 192.168.100.1 eth0.200 = 192.168.200.1 eth0.6 = 192.168.6.1 When starting the VLAN configuration, then doing a ping test for >= 5 minutes, I get a kernel oop/abort message as shown below. This does not happen without configuring VLAN. Where should I look for possible causes for a transmit queue timeout like this? I have contacted the OKI/LAPIS driver authors, but no response so far. I thought that this group might be able to comment from similar experiences. Andy May 11 11:06:09 kontron kernel: ------------[ cut here ]------------ May 11 11:06:09 kontron kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x1ec/0x200() (Not tainted) May 11 11:06:09 kontron kernel: Hardware name: N/A May 11 11:06:09 kontron kernel: NETDEV WATCHDOG: eth0 (pch_gbe): transmit queue 0 timed out May 11 11:06:09 kontron kernel: Modules linked in: fuse ip6table_filter ip6_tables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables tun bridge autofs4 sunrpc cpufreq_ondemand acpi_cpufreq mperf 8021q garp stp llc ipv6 ext3 jbd uinput ppdev parport_pc parport sg microcode pch_gbe(U) mii serio_raw snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc ext4 mbcache jbd2 sd_mod crc_t10dif ahci sdhci_pci sdhci mmc_core video output dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan] May 11 11:06:09 kontron kernel: Pid: 0, comm: swapper Not tainted 2.6.32-220.el6.i686 #1 May 11 11:06:09 kontron kernel: Call Trace: May 11 11:06:09 kontron kernel: [<c0454c81>] ? warn_slowpath_common+0x81/0xc0 May 11 11:06:09 kontron kernel: [<c07a16bc>] ? dev_watchdog+0x1ec/0x200 May 11 11:06:09 kontron kernel: [<c07a16bc>] ? dev_watchdog+0x1ec/0x200 May 11 11:06:09 kontron kernel: [<c0454d53>] ? warn_slowpath_fmt+0x33/0x40 May 11 11:06:09 kontron kernel: [<c07a16bc>] ? dev_watchdog+0x1ec/0x200 May 11 11:06:09 kontron kernel: [<c0471bfa>] ? insert_work+0x5a/0xb0 May 11 11:06:09 kontron kernel: [<c04656f9>] ? run_timer_softirq+0x139/0x2c0 May 11 11:06:09 kontron kernel: [<c0831315>] ? apic_timer_interrupt+0x31/0x38 May 11 11:06:09 kontron kernel: [<c07a14d0>] ? dev_watchdog+0x0/0x200 May 11 11:06:09 kontron kernel: [<c045be4a>] ? __do_softirq+0x8a/0x1a0 May 11 11:06:09 kontron kernel: [<c045bf9d>] ? do_softirq+0x3d/0x50 May 11 11:06:09 kontron kernel: [<c045c0f5>] ? irq_exit+0x65/0x70 May 11 11:06:09 kontron kernel: [<c0428473>] ? smp_apic_timer_interrupt+0x53/0x90 May 11 11:06:09 kontron kernel: [<c0831315>] ? apic_timer_interrupt+0x31/0x38 May 11 11:06:09 kontron kernel: [<c045007b>] ? throttle_cfs_rq+0x6b/0x130 May 11 11:06:09 kontron kernel: [<c064735f>] ? intel_idle+0xaf/0x140 May 11 11:06:09 kontron kernel: [<c075c282>] ? cpuidle_idle_call+0x72/0x100 May 11 11:06:09 kontron kernel: [<c0408964>] ? cpu_idle+0x94/0xd0 May 11 11:06:09 kontron kernel: [<c082a645>] ? start_secondary+0x20d/0x252 May 11 11:06:09 kontron kernel: ---[ end trace 3672ff56500ae344 ]--- May 11 11:06:09 kontron NetworkManager[1608]: <info> (eth0): carrier now OFF (device state 3) May 11 11:06:09 kontron NetworkManager[1608]: <info> (eth0): device state change: 3 -> 2 (reason 40) May 11 11:06:09 kontron NetworkManager[1608]: <info> (eth0): deactivating device (reason: 40). ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: pch_gbe: oops with vlan (new) 2012-05-11 20:48 pch_gbe: oops with vlan (new) Andy Cress @ 2012-05-11 21:12 ` Eric Dumazet 2012-05-11 22:10 ` Eric Dumazet 0 siblings, 1 reply; 6+ messages in thread From: Eric Dumazet @ 2012-05-11 21:12 UTC (permalink / raw) To: Andy Cress; +Cc: netdev On Fri, 2012-05-11 at 13:48 -0700, Andy Cress wrote: > Folks, > > I am looking for help in debugging a pch_gbe driver oops/abort. > > Kernel: version 2.6.32-220.el6.i686 (RHEL6.2) > Driver: pch_gbe version 0.91-NAPI (source tarball we used is at > https://sendfile.kontron.com/message/24tdUi6MXklnUtBLnOsumq until May > 16) > NIC: 0b:00.1 Ethernet controller [0200]: Intel Corporation Platform > Controller Hub EG20T Gigabit Ethernet Controller [8086:8802] (rev 02) > > Configuration, with VLAN: > eth0 (not started) > eth0.100 = 192.168.100.1 > eth0.200 = 192.168.200.1 > eth0.6 = 192.168.6.1 > > When starting the VLAN configuration, then doing a ping test for >= 5 > minutes, I get a kernel oop/abort message as shown below. This does not > happen without configuring VLAN. > Where should I look for possible causes for a transmit queue timeout > like this? > > I have contacted the OKI/LAPIS driver authors, but no response so far. > I thought that this group might be able to comment from similar > experiences. > > Andy typical sign of a buggy driver A quick look in current Linus tree show a non existent synchronization between ndo_start_xmit and TX completion. tx completion uses a tx_queue_lock spinlock for nothing but false sense of correctness. # find drivers/net/ethernet/oki-semi/pch_gbe -name "*.[ch]"|xargs grep -4 -n tx_queue_lock drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h-583- drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h-584-/** drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h-585- * struct pch_gbe_adapter - board specific private data structure drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h-586- * @stats_lock: Spinlock structure for status drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h:587: * @tx_queue_lock: Spinlock structure for transmit drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h-588- * @ethtool_lock: Spinlock structure for ethtool drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h-589- * @irq_sem: Semaphore for interrupt drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h-590- * @netdev: Pointer of network device structure drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h-591- * @pdev: Pointer of pci device structure -- drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h-608- */ drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h-609- drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h-610-struct pch_gbe_adapter { drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h-611- spinlock_t stats_lock; drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h:612: spinlock_t tx_queue_lock; drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h-613- spinlock_t ethtool_lock; drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h-614- atomic_t irq_sem; drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h-615- struct net_device *netdev; drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h-616- struct pci_dev *pdev; -- drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c-1641- netif_wake_queue(adapter->netdev); drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c-1642- adapter->stats.tx_restart_count++; drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c-1643- pr_debug("Tx wake queue\n"); drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c-1644- } drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c:1645: spin_lock(&adapter->tx_queue_lock); drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c-1646- tx_ring->next_to_clean = i; drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c:1647: spin_unlock(&adapter->tx_queue_lock); drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c-1648- pr_debug("next_to_clean : %d\n", tx_ring->next_to_clean); drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c-1649- return cleaned; drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c-1650-} drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c-1651- -- drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c-2036- pr_err("Unable to allocate memory for queues\n"); drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c-2037- return -ENOMEM; drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c-2038- } drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c-2039- spin_lock_init(&adapter->hw.miim_lock); drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c:2040: spin_lock_init(&adapter->tx_queue_lock); drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c-2041- spin_lock_init(&adapter->stats_lock); drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c-2042- spin_lock_init(&adapter->ethtool_lock); drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c-2043- atomic_set(&adapter->irq_sem, 0); drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c-2044- pch_gbe_irq_disable(adapter); ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: pch_gbe: oops with vlan (new) 2012-05-11 21:12 ` Eric Dumazet @ 2012-05-11 22:10 ` Eric Dumazet 2012-05-14 16:40 ` pch_gbe: oops with vlan (resolved) Andy Cress 0 siblings, 1 reply; 6+ messages in thread From: Eric Dumazet @ 2012-05-11 22:10 UTC (permalink / raw) To: Andy Cress; +Cc: netdev On Fri, 2012-05-11 at 23:13 +0200, Eric Dumazet wrote: > On Fri, 2012-05-11 at 13:48 -0700, Andy Cress wrote: > > Folks, > > > > I am looking for help in debugging a pch_gbe driver oops/abort. > > > > Kernel: version 2.6.32-220.el6.i686 (RHEL6.2) > > Driver: pch_gbe version 0.91-NAPI (source tarball we used is at > > https://sendfile.kontron.com/message/24tdUi6MXklnUtBLnOsumq until May > > 16) > > NIC: 0b:00.1 Ethernet controller [0200]: Intel Corporation Platform > > Controller Hub EG20T Gigabit Ethernet Controller [8086:8802] (rev 02) > > > > Configuration, with VLAN: > > eth0 (not started) > > eth0.100 = 192.168.100.1 > > eth0.200 = 192.168.200.1 > > eth0.6 = 192.168.6.1 > > > > When starting the VLAN configuration, then doing a ping test for >= 5 > > minutes, I get a kernel oop/abort message as shown below. This does not > > happen without configuring VLAN. > > Where should I look for possible causes for a transmit queue timeout > > like this? > > > > I have contacted the OKI/LAPIS driver authors, but no response so far. > > I thought that this group might be able to comment from similar > > experiences. > > > > Andy > > typical sign of a buggy driver > > A quick look in current Linus tree show a non existent synchronization > between ndo_start_xmit and TX completion. > > tx completion uses a tx_queue_lock spinlock for nothing but false sense > of correctness. Please try the following patch : (based on current net-next tree) Also this driver has a strange RX path : It does a copy of incoming frames on fixed size skbs, (2048+overhead -> kmalloc-4096 pool) instead of using skb of the right size... drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h | 2 drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c | 25 ++++------ 2 files changed, 11 insertions(+), 16 deletions(-) diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h index 9f3dbc4..b07311e 100644 --- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h +++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h @@ -584,7 +584,6 @@ struct pch_gbe_hw_stats { /** * struct pch_gbe_adapter - board specific private data structure * @stats_lock: Spinlock structure for status - * @tx_queue_lock: Spinlock structure for transmit * @ethtool_lock: Spinlock structure for ethtool * @irq_sem: Semaphore for interrupt * @netdev: Pointer of network device structure @@ -609,7 +608,6 @@ struct pch_gbe_hw_stats { struct pch_gbe_adapter { spinlock_t stats_lock; - spinlock_t tx_queue_lock; spinlock_t ethtool_lock; atomic_t irq_sem; struct net_device *netdev; diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c index 9dc7e50..3787c64 100644 --- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c +++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c @@ -645,14 +645,11 @@ static void pch_gbe_mac_set_pause_packet(struct pch_gbe_hw *hw) */ static int pch_gbe_alloc_queues(struct pch_gbe_adapter *adapter) { - int size; - - size = (int)sizeof(struct pch_gbe_tx_ring); - adapter->tx_ring = kzalloc(size, GFP_KERNEL); + adapter->tx_ring = kzalloc(sizeof(*adapter->tx_ring), GFP_KERNEL); if (!adapter->tx_ring) return -ENOMEM; - size = (int)sizeof(struct pch_gbe_rx_ring); - adapter->rx_ring = kzalloc(size, GFP_KERNEL); + + adapter->rx_ring = kzalloc(sizeof(*adapter->rx_ring), GFP_KERNEL); if (!adapter->rx_ring) { kfree(adapter->tx_ring); return -ENOMEM; @@ -1169,7 +1166,6 @@ static void pch_gbe_tx_queue(struct pch_gbe_adapter *adapter, struct sk_buff *tmp_skb; unsigned int frame_ctrl; unsigned int ring_num; - unsigned long flags; /*-- Set frame control --*/ frame_ctrl = 0; @@ -1216,14 +1212,14 @@ static void pch_gbe_tx_queue(struct pch_gbe_adapter *adapter, } } } - spin_lock_irqsave(&tx_ring->tx_lock, flags); + ring_num = tx_ring->next_to_use; if (unlikely((ring_num + 1) == tx_ring->count)) tx_ring->next_to_use = 0; else tx_ring->next_to_use = ring_num + 1; - spin_unlock_irqrestore(&tx_ring->tx_lock, flags); + buffer_info = &tx_ring->buffer_info[ring_num]; tmp_skb = buffer_info->skb; @@ -1525,7 +1521,7 @@ pch_gbe_alloc_rx_buffers_pool(struct pch_gbe_adapter *adapter, &rx_ring->rx_buff_pool_logic, GFP_KERNEL); if (!rx_ring->rx_buff_pool) { - pr_err("Unable to allocate memory for the receive poll buffer\n"); + pr_err("Unable to allocate memory for the receive pool buffer\n"); return -ENOMEM; } memset(rx_ring->rx_buff_pool, 0, size); @@ -1644,15 +1640,17 @@ pch_gbe_clean_tx(struct pch_gbe_adapter *adapter, pr_debug("called pch_gbe_unmap_and_free_tx_resource() %d count\n", cleaned_count); /* Recover from running out of Tx resources in xmit_frame */ + spin_lock(&tx_ring->tx_lock); if (unlikely(cleaned && (netif_queue_stopped(adapter->netdev)))) { netif_wake_queue(adapter->netdev); adapter->stats.tx_restart_count++; pr_debug("Tx wake queue\n"); } - spin_lock(&adapter->tx_queue_lock); + tx_ring->next_to_clean = i; - spin_unlock(&adapter->tx_queue_lock); + pr_debug("next_to_clean : %d\n", tx_ring->next_to_clean); + spin_unlock(&tx_ring->tx_lock); return cleaned; } @@ -2043,7 +2041,6 @@ static int pch_gbe_sw_init(struct pch_gbe_adapter *adapter) return -ENOMEM; } spin_lock_init(&adapter->hw.miim_lock); - spin_lock_init(&adapter->tx_queue_lock); spin_lock_init(&adapter->stats_lock); spin_lock_init(&adapter->ethtool_lock); atomic_set(&adapter->irq_sem, 0); @@ -2148,10 +2145,10 @@ static int pch_gbe_xmit_frame(struct sk_buff *skb, struct net_device *netdev) tx_ring->next_to_use, tx_ring->next_to_clean); return NETDEV_TX_BUSY; } - spin_unlock_irqrestore(&tx_ring->tx_lock, flags); /* CRC,ITAG no support */ pch_gbe_tx_queue(adapter, tx_ring, skb); + spin_unlock_irqrestore(&tx_ring->tx_lock, flags); return NETDEV_TX_OK; } ^ permalink raw reply related [flat|nested] 6+ messages in thread
* RE: pch_gbe: oops with vlan (resolved) 2012-05-11 22:10 ` Eric Dumazet @ 2012-05-14 16:40 ` Andy Cress 2012-05-14 19:26 ` [PATCH] pch_gbe: fix transmit races Eric Dumazet 0 siblings, 1 reply; 6+ messages in thread From: Andy Cress @ 2012-05-14 16:40 UTC (permalink / raw) To: Eric Dumazet; +Cc: netdev Eric, This patch resolved the driver oops (transmit queue timeout) with vlan. It makes a lot of sense that the wrong spin_locks caused this. Thanks a lot. Andy -----Original Message----- From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On Behalf Of Eric Dumazet Sent: Friday, May 11, 2012 6:11 PM To: Andy Cress Cc: netdev Subject: Re: pch_gbe: oops with vlan (new) On Fri, 2012-05-11 at 23:13 +0200, Eric Dumazet wrote: > On Fri, 2012-05-11 at 13:48 -0700, Andy Cress wrote: > > Folks, > > > > I am looking for help in debugging a pch_gbe driver oops/abort. > > > > Kernel: version 2.6.32-220.el6.i686 (RHEL6.2) > > Driver: pch_gbe version 0.91-NAPI (source tarball we used is at > > https://sendfile.kontron.com/message/24tdUi6MXklnUtBLnOsumq until May > > 16) > > NIC: 0b:00.1 Ethernet controller [0200]: Intel Corporation Platform > > Controller Hub EG20T Gigabit Ethernet Controller [8086:8802] (rev 02) > > > > Configuration, with VLAN: > > eth0 (not started) > > eth0.100 = 192.168.100.1 > > eth0.200 = 192.168.200.1 > > eth0.6 = 192.168.6.1 > > > > When starting the VLAN configuration, then doing a ping test for >= 5 > > minutes, I get a kernel oop/abort message as shown below. This does not > > happen without configuring VLAN. > > Where should I look for possible causes for a transmit queue timeout > > like this? > > > > I have contacted the OKI/LAPIS driver authors, but no response so far. > > I thought that this group might be able to comment from similar > > experiences. > > > > Andy > > typical sign of a buggy driver > > A quick look in current Linus tree show a non existent synchronization > between ndo_start_xmit and TX completion. > > tx completion uses a tx_queue_lock spinlock for nothing but false sense > of correctness. Please try the following patch : (based on current net-next tree) Also this driver has a strange RX path : It does a copy of incoming frames on fixed size skbs, (2048+overhead -> kmalloc-4096 pool) instead of using skb of the right size... drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h | 2 drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c | 25 ++++------ 2 files changed, 11 insertions(+), 16 deletions(-) diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h index 9f3dbc4..b07311e 100644 --- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h +++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h @@ -584,7 +584,6 @@ struct pch_gbe_hw_stats { /** * struct pch_gbe_adapter - board specific private data structure * @stats_lock: Spinlock structure for status - * @tx_queue_lock: Spinlock structure for transmit * @ethtool_lock: Spinlock structure for ethtool * @irq_sem: Semaphore for interrupt * @netdev: Pointer of network device structure @@ -609,7 +608,6 @@ struct pch_gbe_hw_stats { struct pch_gbe_adapter { spinlock_t stats_lock; - spinlock_t tx_queue_lock; spinlock_t ethtool_lock; atomic_t irq_sem; struct net_device *netdev; diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c index 9dc7e50..3787c64 100644 --- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c +++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c @@ -645,14 +645,11 @@ static void pch_gbe_mac_set_pause_packet(struct pch_gbe_hw *hw) */ static int pch_gbe_alloc_queues(struct pch_gbe_adapter *adapter) { - int size; - - size = (int)sizeof(struct pch_gbe_tx_ring); - adapter->tx_ring = kzalloc(size, GFP_KERNEL); + adapter->tx_ring = kzalloc(sizeof(*adapter->tx_ring), GFP_KERNEL); if (!adapter->tx_ring) return -ENOMEM; - size = (int)sizeof(struct pch_gbe_rx_ring); - adapter->rx_ring = kzalloc(size, GFP_KERNEL); + + adapter->rx_ring = kzalloc(sizeof(*adapter->rx_ring), GFP_KERNEL); if (!adapter->rx_ring) { kfree(adapter->tx_ring); return -ENOMEM; @@ -1169,7 +1166,6 @@ static void pch_gbe_tx_queue(struct pch_gbe_adapter *adapter, struct sk_buff *tmp_skb; unsigned int frame_ctrl; unsigned int ring_num; - unsigned long flags; /*-- Set frame control --*/ frame_ctrl = 0; @@ -1216,14 +1212,14 @@ static void pch_gbe_tx_queue(struct pch_gbe_adapter *adapter, } } } - spin_lock_irqsave(&tx_ring->tx_lock, flags); + ring_num = tx_ring->next_to_use; if (unlikely((ring_num + 1) == tx_ring->count)) tx_ring->next_to_use = 0; else tx_ring->next_to_use = ring_num + 1; - spin_unlock_irqrestore(&tx_ring->tx_lock, flags); + buffer_info = &tx_ring->buffer_info[ring_num]; tmp_skb = buffer_info->skb; @@ -1525,7 +1521,7 @@ pch_gbe_alloc_rx_buffers_pool(struct pch_gbe_adapter *adapter, &rx_ring->rx_buff_pool_logic, GFP_KERNEL); if (!rx_ring->rx_buff_pool) { - pr_err("Unable to allocate memory for the receive poll buffer\n"); + pr_err("Unable to allocate memory for the receive pool buffer\n"); return -ENOMEM; } memset(rx_ring->rx_buff_pool, 0, size); @@ -1644,15 +1640,17 @@ pch_gbe_clean_tx(struct pch_gbe_adapter *adapter, pr_debug("called pch_gbe_unmap_and_free_tx_resource() %d count\n", cleaned_count); /* Recover from running out of Tx resources in xmit_frame */ + spin_lock(&tx_ring->tx_lock); if (unlikely(cleaned && (netif_queue_stopped(adapter->netdev)))) { netif_wake_queue(adapter->netdev); adapter->stats.tx_restart_count++; pr_debug("Tx wake queue\n"); } - spin_lock(&adapter->tx_queue_lock); + tx_ring->next_to_clean = i; - spin_unlock(&adapter->tx_queue_lock); + pr_debug("next_to_clean : %d\n", tx_ring->next_to_clean); + spin_unlock(&tx_ring->tx_lock); return cleaned; } @@ -2043,7 +2041,6 @@ static int pch_gbe_sw_init(struct pch_gbe_adapter *adapter) return -ENOMEM; } spin_lock_init(&adapter->hw.miim_lock); - spin_lock_init(&adapter->tx_queue_lock); spin_lock_init(&adapter->stats_lock); spin_lock_init(&adapter->ethtool_lock); atomic_set(&adapter->irq_sem, 0); @@ -2148,10 +2145,10 @@ static int pch_gbe_xmit_frame(struct sk_buff *skb, struct net_device *netdev) tx_ring->next_to_use, tx_ring->next_to_clean); return NETDEV_TX_BUSY; } - spin_unlock_irqrestore(&tx_ring->tx_lock, flags); /* CRC,ITAG no support */ pch_gbe_tx_queue(adapter, tx_ring, skb); + spin_unlock_irqrestore(&tx_ring->tx_lock, flags); return NETDEV_TX_OK; } -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH] pch_gbe: fix transmit races 2012-05-14 16:40 ` pch_gbe: oops with vlan (resolved) Andy Cress @ 2012-05-14 19:26 ` Eric Dumazet 2012-05-15 17:43 ` David Miller 0 siblings, 1 reply; 6+ messages in thread From: Eric Dumazet @ 2012-05-14 19:26 UTC (permalink / raw) To: Andy Cress, David Miller; +Cc: netdev From: Eric Dumazet <edumazet@google.com> Andy reported pch_gbe triggered "NETDEV WATCHDOG" errors. May 11 11:06:09 kontron kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x1ec/0x200() (Not tainted) May 11 11:06:09 kontron kernel: Hardware name: N/A May 11 11:06:09 kontron kernel: NETDEV WATCHDOG: eth0 (pch_gbe): transmit queue 0 timed out It seems pch_gbe has a racy tx path (races with TX completion path) Remove tx_queue_lock lock since it has no purpose, we must use tx_lock instead. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Andy Cress <andy.cress@us.kontron.com> Tested-by: Andy Cress <andy.cress@us.kontron.com> --- patch rebased on net tree. drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h | 2 drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c | 25 ++++------ 2 files changed, 11 insertions(+), 16 deletions(-) diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h index dd14915..ba78174 100644 --- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h +++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h @@ -584,7 +584,6 @@ struct pch_gbe_hw_stats { /** * struct pch_gbe_adapter - board specific private data structure * @stats_lock: Spinlock structure for status - * @tx_queue_lock: Spinlock structure for transmit * @ethtool_lock: Spinlock structure for ethtool * @irq_sem: Semaphore for interrupt * @netdev: Pointer of network device structure @@ -609,7 +608,6 @@ struct pch_gbe_hw_stats { struct pch_gbe_adapter { spinlock_t stats_lock; - spinlock_t tx_queue_lock; spinlock_t ethtool_lock; atomic_t irq_sem; struct net_device *netdev; diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c index 8035e5f..1e38d50 100644 --- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c +++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c @@ -640,14 +640,11 @@ static void pch_gbe_mac_set_pause_packet(struct pch_gbe_hw *hw) */ static int pch_gbe_alloc_queues(struct pch_gbe_adapter *adapter) { - int size; - - size = (int)sizeof(struct pch_gbe_tx_ring); - adapter->tx_ring = kzalloc(size, GFP_KERNEL); + adapter->tx_ring = kzalloc(sizeof(*adapter->tx_ring), GFP_KERNEL); if (!adapter->tx_ring) return -ENOMEM; - size = (int)sizeof(struct pch_gbe_rx_ring); - adapter->rx_ring = kzalloc(size, GFP_KERNEL); + + adapter->rx_ring = kzalloc(sizeof(*adapter->rx_ring), GFP_KERNEL); if (!adapter->rx_ring) { kfree(adapter->tx_ring); return -ENOMEM; @@ -1162,7 +1159,6 @@ static void pch_gbe_tx_queue(struct pch_gbe_adapter *adapter, struct sk_buff *tmp_skb; unsigned int frame_ctrl; unsigned int ring_num; - unsigned long flags; /*-- Set frame control --*/ frame_ctrl = 0; @@ -1211,14 +1207,14 @@ static void pch_gbe_tx_queue(struct pch_gbe_adapter *adapter, } } } - spin_lock_irqsave(&tx_ring->tx_lock, flags); + ring_num = tx_ring->next_to_use; if (unlikely((ring_num + 1) == tx_ring->count)) tx_ring->next_to_use = 0; else tx_ring->next_to_use = ring_num + 1; - spin_unlock_irqrestore(&tx_ring->tx_lock, flags); + buffer_info = &tx_ring->buffer_info[ring_num]; tmp_skb = buffer_info->skb; @@ -1518,7 +1514,7 @@ pch_gbe_alloc_rx_buffers_pool(struct pch_gbe_adapter *adapter, &rx_ring->rx_buff_pool_logic, GFP_KERNEL); if (!rx_ring->rx_buff_pool) { - pr_err("Unable to allocate memory for the receive poll buffer\n"); + pr_err("Unable to allocate memory for the receive pool buffer\n"); return -ENOMEM; } memset(rx_ring->rx_buff_pool, 0, size); @@ -1637,15 +1633,17 @@ pch_gbe_clean_tx(struct pch_gbe_adapter *adapter, pr_debug("called pch_gbe_unmap_and_free_tx_resource() %d count\n", cleaned_count); /* Recover from running out of Tx resources in xmit_frame */ + spin_lock(&tx_ring->tx_lock); if (unlikely(cleaned && (netif_queue_stopped(adapter->netdev)))) { netif_wake_queue(adapter->netdev); adapter->stats.tx_restart_count++; pr_debug("Tx wake queue\n"); } - spin_lock(&adapter->tx_queue_lock); + tx_ring->next_to_clean = i; - spin_unlock(&adapter->tx_queue_lock); + pr_debug("next_to_clean : %d\n", tx_ring->next_to_clean); + spin_unlock(&tx_ring->tx_lock); return cleaned; } @@ -2037,7 +2035,6 @@ static int pch_gbe_sw_init(struct pch_gbe_adapter *adapter) return -ENOMEM; } spin_lock_init(&adapter->hw.miim_lock); - spin_lock_init(&adapter->tx_queue_lock); spin_lock_init(&adapter->stats_lock); spin_lock_init(&adapter->ethtool_lock); atomic_set(&adapter->irq_sem, 0); @@ -2142,10 +2139,10 @@ static int pch_gbe_xmit_frame(struct sk_buff *skb, struct net_device *netdev) tx_ring->next_to_use, tx_ring->next_to_clean); return NETDEV_TX_BUSY; } - spin_unlock_irqrestore(&tx_ring->tx_lock, flags); /* CRC,ITAG no support */ pch_gbe_tx_queue(adapter, tx_ring, skb); + spin_unlock_irqrestore(&tx_ring->tx_lock, flags); return NETDEV_TX_OK; } ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] pch_gbe: fix transmit races 2012-05-14 19:26 ` [PATCH] pch_gbe: fix transmit races Eric Dumazet @ 2012-05-15 17:43 ` David Miller 0 siblings, 0 replies; 6+ messages in thread From: David Miller @ 2012-05-15 17:43 UTC (permalink / raw) To: eric.dumazet; +Cc: andy.cress, netdev From: Eric Dumazet <eric.dumazet@gmail.com> Date: Mon, 14 May 2012 21:26:06 +0200 > From: Eric Dumazet <edumazet@google.com> > > Andy reported pch_gbe triggered "NETDEV WATCHDOG" errors. > > May 11 11:06:09 kontron kernel: WARNING: at net/sched/sch_generic.c:261 > dev_watchdog+0x1ec/0x200() (Not tainted) > May 11 11:06:09 kontron kernel: Hardware name: N/A > May 11 11:06:09 kontron kernel: NETDEV WATCHDOG: eth0 (pch_gbe): > transmit queue 0 timed out > > It seems pch_gbe has a racy tx path (races with TX completion path) > > Remove tx_queue_lock lock since it has no purpose, we must use tx_lock > instead. > > Signed-off-by: Eric Dumazet <edumazet@google.com> > Reported-by: Andy Cress <andy.cress@us.kontron.com> > Tested-by: Andy Cress <andy.cress@us.kontron.com> Applied. ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2012-05-15 17:43 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-05-11 20:48 pch_gbe: oops with vlan (new) Andy Cress 2012-05-11 21:12 ` Eric Dumazet 2012-05-11 22:10 ` Eric Dumazet 2012-05-14 16:40 ` pch_gbe: oops with vlan (resolved) Andy Cress 2012-05-14 19:26 ` [PATCH] pch_gbe: fix transmit races Eric Dumazet 2012-05-15 17:43 ` David Miller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).