Netdev List
 help / color / mirror / Atom feed
* [net-next-2.6 PATCH 15/23] igb: Remove invalid stats counters
From: Jeff Kirsher @ 2009-10-28  9:50 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, Alexander Duyck, Jeff Kirsher
In-Reply-To: <20091028094540.13156.2637.stgit@localhost.localdomain>

From: Alexander Duyck <alexander.h.duyck@intel.com>

There are several counters being used like they are static when in fact
they are clear on read.  In order to prevent the values from being
incorrect I am removing the defunct counters.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/igb/igb.h      |    6 ------
 drivers/net/igb/igb_main.c |   18 +-----------------
 2 files changed, 1 insertions(+), 23 deletions(-)

diff --git a/drivers/net/igb/igb.h b/drivers/net/igb/igb.h
index c27dc1a..b9fcfd3 100644
--- a/drivers/net/igb/igb.h
+++ b/drivers/net/igb/igb.h
@@ -269,10 +269,6 @@ struct igb_adapter {
 	/* TX */
 	struct igb_ring *tx_ring;      /* One per active queue */
 	unsigned long tx_queue_len;
-	u32 gotc;
-	u64 gotc_old;
-	u64 tpt_old;
-	u64 colc_old;
 	u32 tx_timeout_count;
 
 	/* RX */
@@ -280,8 +276,6 @@ struct igb_adapter {
 	int num_tx_queues;
 	int num_rx_queues;
 
-	u32 gorc;
-	u64 gorc_old;
 	u32 max_frame_size;
 	u32 min_frame_size;
 
diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index 5b0f939..cb1acca 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -2925,9 +2925,6 @@ static void igb_watchdog_task(struct work_struct *work)
 	int i;
 
 	link = igb_has_link(adapter);
-	if ((netif_carrier_ok(netdev)) && link)
-		goto link_up;
-
 	if (link) {
 		if (!netif_carrier_ok(netdev)) {
 			u32 ctrl;
@@ -2990,20 +2987,8 @@ static void igb_watchdog_task(struct work_struct *work)
 		}
 	}
 
-link_up:
 	igb_update_stats(adapter);
-
-	hw->mac.tx_packet_delta = adapter->stats.tpt - adapter->tpt_old;
-	adapter->tpt_old = adapter->stats.tpt;
-	hw->mac.collision_delta = adapter->stats.colc - adapter->colc_old;
-	adapter->colc_old = adapter->stats.colc;
-
-	adapter->gorc = adapter->stats.gorc - adapter->gorc_old;
-	adapter->gorc_old = adapter->stats.gorc;
-	adapter->gotc = adapter->stats.gotc - adapter->gotc_old;
-	adapter->gotc_old = adapter->stats.gotc;
-
-	igb_update_adaptive(&adapter->hw);
+	igb_update_adaptive(hw);
 
 	if (!netif_carrier_ok(netdev)) {
 		if (igb_desc_unused(tx_ring) + 1 < tx_ring->count) {
@@ -3875,7 +3860,6 @@ void igb_update_stats(struct igb_adapter *adapter)
 	adapter->stats.bptc += rd32(E1000_BPTC);
 
 	/* used for adaptive IFS */
-
 	hw->mac.tx_packet_delta = rd32(E1000_TPT);
 	adapter->stats.tpt += hw->mac.tx_packet_delta;
 	hw->mac.collision_delta = rd32(E1000_COLC);


^ permalink raw reply related

* [net-next-2.6 PATCH 16/23] igb: cleanup igb.h header whitespace and some structure formatting
From: Jeff Kirsher @ 2009-10-28  9:50 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, Alexander Duyck, Jeff Kirsher
In-Reply-To: <20091028094540.13156.2637.stgit@localhost.localdomain>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This patch changes the layout of the ring and adapter structs to fill a few
holes in the structure.  It also cleans up some whitespace and formatting
issues.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/igb/igb.h |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/igb/igb.h b/drivers/net/igb/igb.h
index b9fcfd3..3298f5a 100644
--- a/drivers/net/igb/igb.h
+++ b/drivers/net/igb/igb.h
@@ -185,16 +185,15 @@ struct igb_ring {
 	dma_addr_t dma;                /* phys address of the ring */
 	void *desc;                    /* descriptor ring memory */
 	unsigned int size;             /* length of desc. ring in bytes */
-	unsigned int count;            /* number of desc. in the ring */
+	u16 count;                     /* number of desc. in the ring */
 	u16 next_to_use;
 	u16 next_to_clean;
+	u8 queue_index;
+	u8 reg_idx;
 	void __iomem *head;
 	void __iomem *tail;
 	struct igb_buffer *buffer_info; /* array of buffer info structs */
 
-	u8 queue_index;
-	u8 reg_idx;
-
 	unsigned int total_bytes;
 	unsigned int total_packets;
 
@@ -249,6 +248,7 @@ struct igb_adapter {
 	u32 en_mng_pt;
 	u16 link_speed;
 	u16 link_duplex;
+
 	unsigned int total_tx_bytes;
 	unsigned int total_tx_packets;
 	unsigned int total_rx_bytes;
@@ -311,8 +311,8 @@ struct igb_adapter {
 	u32 eeprom_wol;
 
 	struct igb_ring *multi_tx_table[IGB_ABS_MAX_TX_QUEUES];
-	unsigned int tx_ring_count;
-	unsigned int rx_ring_count;
+	u16 tx_ring_count;
+	u16 rx_ring_count;
 	unsigned int vfs_allocated_count;
 	struct vf_data_storage *vf_data;
 };


^ permalink raw reply related

* [net-next-2.6 PATCH 17/23] igb: cleanup igb xmit frame path
From: Jeff Kirsher @ 2009-10-28  9:50 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, Alexander Duyck, Jeff Kirsher
In-Reply-To: <20091028094540.13156.2637.stgit@localhost.localdomain>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This patch cleans up the xmit frame path for igb to better handle xmit
frame errors and avoid null pointer exceptions.  It also cleans up some
whitespace issues found in the xmit frame path.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/igb/igb_main.c |   30 +++++++++++++++---------------
 1 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index cb1acca..8f8b7cc 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -3245,9 +3245,9 @@ set_itr_now:
 #define IGB_TX_FLAGS_VLAN		0x00000002
 #define IGB_TX_FLAGS_TSO		0x00000004
 #define IGB_TX_FLAGS_IPV4		0x00000008
-#define IGB_TX_FLAGS_TSTAMP             0x00000010
-#define IGB_TX_FLAGS_VLAN_MASK	0xffff0000
-#define IGB_TX_FLAGS_VLAN_SHIFT	16
+#define IGB_TX_FLAGS_TSTAMP		0x00000010
+#define IGB_TX_FLAGS_VLAN_MASK		0xffff0000
+#define IGB_TX_FLAGS_VLAN_SHIFT		        16
 
 static inline int igb_tso_adv(struct igb_ring *tx_ring,
 			      struct sk_buff *skb, u32 tx_flags, u8 *hdr_len)
@@ -3346,6 +3346,7 @@ static inline bool igb_tx_csum_adv(struct igb_ring *tx_ring,
 
 		if (tx_flags & IGB_TX_FLAGS_VLAN)
 			info |= (tx_flags & IGB_TX_FLAGS_VLAN_MASK);
+
 		info |= (skb_network_offset(skb) << E1000_ADVTXD_MACLEN_SHIFT);
 		if (skb->ip_summed == CHECKSUM_PARTIAL)
 			info |= skb_network_header_len(skb);
@@ -3462,17 +3463,17 @@ static inline int igb_tx_map_adv(struct igb_ring *tx_ring, struct sk_buff *skb,
 	tx_ring->buffer_info[i].skb = skb;
 	tx_ring->buffer_info[first].next_to_watch = i;
 
-	return count + 1;
+	return ++count;
 }
 
 static inline void igb_tx_queue_adv(struct igb_ring *tx_ring,
 				    int tx_flags, int count, u32 paylen,
 				    u8 hdr_len)
 {
-	union e1000_adv_tx_desc *tx_desc = NULL;
+	union e1000_adv_tx_desc *tx_desc;
 	struct igb_buffer *buffer_info;
 	u32 olinfo_status = 0, cmd_type_len;
-	unsigned int i;
+	unsigned int i = tx_ring->next_to_use;
 
 	cmd_type_len = (E1000_ADVTXD_DTYP_DATA | E1000_ADVTXD_DCMD_IFCS |
 			E1000_ADVTXD_DCMD_DEXT);
@@ -3505,18 +3506,18 @@ static inline void igb_tx_queue_adv(struct igb_ring *tx_ring,
 
 	olinfo_status |= ((paylen - hdr_len) << E1000_ADVTXD_PAYLEN_SHIFT);
 
-	i = tx_ring->next_to_use;
-	while (count--) {
+	do {
 		buffer_info = &tx_ring->buffer_info[i];
 		tx_desc = E1000_TX_DESC_ADV(*tx_ring, i);
 		tx_desc->read.buffer_addr = cpu_to_le64(buffer_info->dma);
 		tx_desc->read.cmd_type_len =
 			cpu_to_le32(cmd_type_len | buffer_info->length);
 		tx_desc->read.olinfo_status = cpu_to_le32(olinfo_status);
+		count--;
 		i++;
 		if (i == tx_ring->count)
 			i = 0;
-	}
+	} while (count > 0);
 
 	tx_desc->read.cmd_type_len |= cpu_to_le32(IGB_ADVTXD_DCMD);
 	/* Force memory writes to complete before letting h/w
@@ -3568,8 +3569,7 @@ netdev_tx_t igb_xmit_frame_ring_adv(struct sk_buff *skb,
 	unsigned int first;
 	unsigned int tx_flags = 0;
 	u8 hdr_len = 0;
-	int count = 0;
-	int tso = 0;
+	int tso = 0, count;
 	union skb_shared_tx *shtx = skb_tx(skb);
 
 	/* need: 1 descriptor per page,
@@ -3587,7 +3587,7 @@ netdev_tx_t igb_xmit_frame_ring_adv(struct sk_buff *skb,
 		tx_flags |= IGB_TX_FLAGS_TSTAMP;
 	}
 
-	if (adapter->vlgrp && vlan_tx_tag_present(skb)) {
+	if (vlan_tx_tag_present(skb) && adapter->vlgrp) {
 		tx_flags |= IGB_TX_FLAGS_VLAN;
 		tx_flags |= (vlan_tx_tag_get(skb) << IGB_TX_FLAGS_VLAN_SHIFT);
 	}
@@ -3598,6 +3598,7 @@ netdev_tx_t igb_xmit_frame_ring_adv(struct sk_buff *skb,
 	first = tx_ring->next_to_use;
 	if (skb_is_gso(skb)) {
 		tso = igb_tso_adv(tx_ring, skb, tx_flags, &hdr_len);
+
 		if (tso < 0) {
 			dev_kfree_skb_any(skb);
 			return NETDEV_TX_OK;
@@ -3611,12 +3612,11 @@ netdev_tx_t igb_xmit_frame_ring_adv(struct sk_buff *skb,
 		tx_flags |= IGB_TX_FLAGS_CSUM;
 
 	/*
-	 * count reflects descriptors mapped, if 0 then mapping error
+	 * count reflects descriptors mapped, if 0 or less then mapping error
 	 * has occured and we need to rewind the descriptor queue
 	 */
 	count = igb_tx_map_adv(tx_ring, skb, first);
-
-	if (!count) {
+	if (count <= 0) {
 		dev_kfree_skb_any(skb);
 		tx_ring->buffer_info[first].time_stamp = 0;
 		tx_ring->next_to_use = first;


^ permalink raw reply related

* Re: [PATCH net-next] bnx2x: Do Tx handling in a separate tasklet.
From: Vladislav Zolotarov @ 2009-10-28  9:54 UTC (permalink / raw)
  To: David Miller ; +Cc: Eilon Greenstein, netdev@vger.kernel.org

I'd like to start from your last remark: you r absolutely right, and this is the problem we have in the current net-next driver. More than that, this patch is fixing this problem: it moved liberation of Tx SKBs from hardIRQ context (ISR) to the softIRQ context (tasklet) thereby resolving the problem u've mentioned. So, total agreement with u on this one. I must have named the patch differently to emphasize it.

I'd like to summarize the patch I've sent:
- Take Tx SKB liberation out of hardIRQ.
- Instead schedule a DPC that handles Tx work.
- Optimize the access to status block indices: read only the index we are about to use in the current context.

So, could u, pls., apply the patch in order to fix the problem we currently have in bnx2x?

Bullet 1 is correct but not complete: what about SKB's needed for filling Rx ring, what about Tx-only scenarios where u'd prefer to liberate SKBs from start_xmit()? Generally, we'd like to do SKB liberation both from start_xmit and from NAPI. Making it straight forward would make us take a tx_lock from inside NAPI and this is what we surely don't wan't to do. We are working on this optimization at the moment and it will be the topic of one of the next patches.

Regarding the second bullet u wrote: saying "low CPU consumption" in regard of Tx work in my previous e-mail I meant that CPU per packet ratio for Tx is much lower than for Rx. Sorry for being unclear.

Best regards
vlad


-----Original Message-----
From: David Miller [mailto:davem@davemloft.net] 
Sent: Tuesday, October 27, 2009 12:28 AM
To: Vladislav Zolotarov
Cc: Eilon Greenstein; netdev@vger.kernel.org
Subject: Re: [PATCH net-next] bnx2x: Do Tx handling in a separate tasklet.

From: "Vladislav Zolotarov" <vladz@broadcom.com>
Date: Mon, 26 Oct 2009 07:42:27 -0700

> The separation of Tx and Rx interrupt handling gives us the
> possibility to properly affinitize the Rx (heavy CPU consuming task)
> and Tx (low CPU consuming task) and to ensure that Tx work is done
> not long after the Tx interrupt without interference of Rx work thus
> letting the user benefit from Tx coalescing configuration in order
> to achieve the best performance in each specific scenario. This is
> most important in heavy load scenarios with mixed traffic (UDP + TCP
> for instance). If we didn't separate Tx and Rx interrupt handling Tx
> coalescing configuration was not worth much.

There are other issues:

1) Actually, it makes sense to do TX and RX work together, since TX
   packet liberation makes fresh CPU local packets available for
   responses generated by RX packet reception.

2) TX packet liberation is not low CPU consumption, it has to perform
   many atomic instructions, reference socket state, enter the SLAB
   allocator, potentially liberate netfilter state, etc.

Using NAPI also moves the TX freeing into softirq context.

If you do it from a hardirq you are making it more expensive.  From
hardirq the free just puts the SKB on a list, schedules a softirq,
then does the real SKB free work from the softirq.

This needless SKB list management and softirq scheduling you'll
avoid if you do things from softirqs, and thus using NAPI makes
sense here.



^ permalink raw reply

* [net-next-2.6 PATCH 18/23] igb: cleanup clean_rx_irq_adv and alloc_rx_buffers_adv
From: Jeff Kirsher @ 2009-10-28  9:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, Alexander Duyck, Jeff Kirsher
In-Reply-To: <20091028094540.13156.2637.stgit@localhost.localdomain>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This patch cleans up some whitespace issues in clean_rx_irq_adv.  It also
adds NUMA aware page allocation and dma error handling to
alloc_rx_buffers_adv.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/igb/igb_main.c |   24 +++++++++++++++++++-----
 1 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index 8f8b7cc..d3e8316 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -4952,6 +4952,7 @@ static bool igb_clean_rx_irq_adv(struct igb_q_vector *q_vector,
 		i++;
 		if (i == rx_ring->count)
 			i = 0;
+
 		next_rxd = E1000_RX_DESC_ADV(*rx_ring, i);
 		prefetch(next_rxd);
 		next_buffer = &rx_ring->buffer_info[i];
@@ -4989,7 +4990,6 @@ static bool igb_clean_rx_irq_adv(struct igb_q_vector *q_vector,
 
 			skb->len += length;
 			skb->data_len += length;
-
 			skb->truesize += length;
 		}
 
@@ -5071,7 +5071,7 @@ void igb_alloc_rx_buffers_adv(struct igb_ring *rx_ring, int cleaned_count)
 
 		if ((bufsz < IGB_RXBUFFER_1024) && !buffer_info->page_dma) {
 			if (!buffer_info->page) {
-				buffer_info->page = alloc_page(GFP_ATOMIC);
+				buffer_info->page = netdev_alloc_page(netdev);
 				if (!buffer_info->page) {
 					rx_ring->rx_stats.alloc_failed++;
 					goto no_buffers;
@@ -5085,9 +5085,16 @@ void igb_alloc_rx_buffers_adv(struct igb_ring *rx_ring, int cleaned_count)
 					     buffer_info->page_offset,
 					     PAGE_SIZE / 2,
 					     PCI_DMA_FROMDEVICE);
+			if (pci_dma_mapping_error(rx_ring->pdev,
+			                          buffer_info->page_dma)) {
+				buffer_info->page_dma = 0;
+				rx_ring->rx_stats.alloc_failed++;
+				goto no_buffers;
+			}
 		}
 
-		if (!buffer_info->skb) {
+		skb = buffer_info->skb;
+		if (!skb) {
 			skb = netdev_alloc_skb_ip_align(netdev, bufsz);
 			if (!skb) {
 				rx_ring->rx_stats.alloc_failed++;
@@ -5095,10 +5102,18 @@ void igb_alloc_rx_buffers_adv(struct igb_ring *rx_ring, int cleaned_count)
 			}
 
 			buffer_info->skb = skb;
+		}
+		if (!buffer_info->dma) {
 			buffer_info->dma = pci_map_single(rx_ring->pdev,
 			                                  skb->data,
 							  bufsz,
 							  PCI_DMA_FROMDEVICE);
+			if (pci_dma_mapping_error(rx_ring->pdev,
+			                          buffer_info->dma)) {
+				buffer_info->dma = 0;
+				rx_ring->rx_stats.alloc_failed++;
+				goto no_buffers;
+			}
 		}
 		/* Refresh the desc even if buffer_addrs didn't change because
 		 * each write-back erases this info. */
@@ -5107,8 +5122,7 @@ void igb_alloc_rx_buffers_adv(struct igb_ring *rx_ring, int cleaned_count)
 			     cpu_to_le64(buffer_info->page_dma);
 			rx_desc->read.hdr_addr = cpu_to_le64(buffer_info->dma);
 		} else {
-			rx_desc->read.pkt_addr =
-			     cpu_to_le64(buffer_info->dma);
+			rx_desc->read.pkt_addr = cpu_to_le64(buffer_info->dma);
 			rx_desc->read.hdr_addr = 0;
 		}
 


^ permalink raw reply related

* [net-next-2.6 PATCH 19/23] igb: replace unecessary &adapter->hw with just hw where applicable
From: Jeff Kirsher @ 2009-10-28  9:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, Alexander Duyck, Jeff Kirsher
In-Reply-To: <20091028094540.13156.2637.stgit@localhost.localdomain>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This patch just cleans up some unecessary references to the adapter->hw
member when it has already been placed in a local variable named hw.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/igb/igb_main.c |   26 ++++++++++++--------------
 1 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index d3e8316..b2c0c97 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -1107,8 +1107,8 @@ int igb_up(struct igb_adapter *adapter)
 
 void igb_down(struct igb_adapter *adapter)
 {
-	struct e1000_hw *hw = &adapter->hw;
 	struct net_device *netdev = adapter->netdev;
+	struct e1000_hw *hw = &adapter->hw;
 	u32 tctl, rctl;
 	int i;
 
@@ -1271,10 +1271,10 @@ void igb_reset(struct igb_adapter *adapter)
 	}
 
 	/* Allow time for pending master requests to run */
-	adapter->hw.mac.ops.reset_hw(&adapter->hw);
+	hw->mac.ops.reset_hw(hw);
 	wr32(E1000_WUC, 0);
 
-	if (adapter->hw.mac.ops.init_hw(&adapter->hw))
+	if (hw->mac.ops.init_hw(hw))
 		dev_err(&adapter->pdev->dev, "Hardware Error\n");
 
 	igb_update_mng_vlan(adapter);
@@ -1282,8 +1282,8 @@ void igb_reset(struct igb_adapter *adapter)
 	/* Enable h/w to recognize an 802.1Q VLAN Ethernet packet */
 	wr32(E1000_VET, ETHERNET_IEEE_VLAN_TYPE);
 
-	igb_reset_adaptive(&adapter->hw);
-	igb_get_phy_info(&adapter->hw);
+	igb_reset_adaptive(hw);
+	igb_get_phy_info(hw);
 }
 
 static const struct net_device_ops igb_netdev_ops = {
@@ -1404,8 +1404,6 @@ static int __devinit igb_probe(struct pci_dev *pdev,
 	hw->subsystem_vendor_id = pdev->subsystem_vendor;
 	hw->subsystem_device_id = pdev->subsystem_device;
 
-	/* setup the private structure */
-	hw->back = adapter;
 	/* Copy the default MAC, PHY and NVM function pointers */
 	memcpy(&hw->mac.ops, ei->mac_ops, sizeof(hw->mac.ops));
 	memcpy(&hw->phy.ops, ei->phy_ops, sizeof(hw->phy.ops));
@@ -1460,7 +1458,7 @@ static int __devinit igb_probe(struct pci_dev *pdev,
 	if (adapter->hw.mac.type == e1000_82576)
 		netdev->features |= NETIF_F_SCTP_CSUM;
 
-	adapter->en_mng_pt = igb_enable_mng_pass_thru(&adapter->hw);
+	adapter->en_mng_pt = igb_enable_mng_pass_thru(hw);
 
 	/* before reading the NVM, reset the controller to put the device in a
 	 * known good starting state */
@@ -1705,8 +1703,8 @@ static void __devexit igb_remove(struct pci_dev *pdev)
 
 	unregister_netdev(netdev);
 
-	if (!igb_check_reset_block(&adapter->hw))
-		igb_reset_phy(&adapter->hw);
+	if (!igb_check_reset_block(hw))
+		igb_reset_phy(hw);
 
 	igb_clear_interrupt_scheme(adapter);
 
@@ -2928,9 +2926,9 @@ static void igb_watchdog_task(struct work_struct *work)
 	if (link) {
 		if (!netif_carrier_ok(netdev)) {
 			u32 ctrl;
-			hw->mac.ops.get_speed_and_duplex(&adapter->hw,
-						   &adapter->link_speed,
-						   &adapter->link_duplex);
+			hw->mac.ops.get_speed_and_duplex(hw,
+			                                 &adapter->link_speed,
+			                                 &adapter->link_duplex);
 
 			ctrl = rd32(E1000_CTRL);
 			/* Links status message must follow this format */
@@ -5552,7 +5550,7 @@ static int __igb_shutdown(struct pci_dev *pdev, bool *enable_wake)
 		wr32(E1000_CTRL, ctrl);
 
 		/* Allow time for pending master requests to run */
-		igb_disable_pcie_master(&adapter->hw);
+		igb_disable_pcie_master(hw);
 
 		wr32(E1000_WUC, E1000_WUC_PME_EN);
 		wr32(E1000_WUFC, wufc);


^ permalink raw reply related

* [net-next-2.6 PATCH 20/23] igb: add pci_dev in few spots to clean up use of dev_err/info/warn
From: Jeff Kirsher @ 2009-10-28  9:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, Alexander Duyck, Jeff Kirsher
In-Reply-To: <20091028094540.13156.2637.stgit@localhost.localdomain>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This patch relpaces several references to adapter->pdev->dev with just
pdev->dev.  This allows for cleanup of several multiline dev_err/info
calls.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/igb/igb_main.c |   24 ++++++++++++------------
 1 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index b2c0c97..264ff00 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -1173,6 +1173,7 @@ void igb_reinit_locked(struct igb_adapter *adapter)
 
 void igb_reset(struct igb_adapter *adapter)
 {
+	struct pci_dev *pdev = adapter->pdev;
 	struct e1000_hw *hw = &adapter->hw;
 	struct e1000_mac_info *mac = &hw->mac;
 	struct e1000_fc_info *fc = &hw->fc;
@@ -1275,7 +1276,7 @@ void igb_reset(struct igb_adapter *adapter)
 	wr32(E1000_WUC, 0);
 
 	if (hw->mac.ops.init_hw(hw))
-		dev_err(&adapter->pdev->dev, "Hardware Error\n");
+		dev_err(&pdev->dev, "Hardware Error\n");
 
 	igb_update_mng_vlan(adapter);
 
@@ -3704,17 +3705,18 @@ static struct net_device_stats *igb_get_stats(struct net_device *netdev)
 static int igb_change_mtu(struct net_device *netdev, int new_mtu)
 {
 	struct igb_adapter *adapter = netdev_priv(netdev);
+	struct pci_dev *pdev = adapter->pdev;
 	int max_frame = new_mtu + ETH_HLEN + ETH_FCS_LEN;
 	u32 rx_buffer_len, i;
 
 	if ((max_frame < ETH_ZLEN + ETH_FCS_LEN) ||
 	    (max_frame > MAX_JUMBO_FRAME_SIZE)) {
-		dev_err(&adapter->pdev->dev, "Invalid MTU setting\n");
+		dev_err(&pdev->dev, "Invalid MTU setting\n");
 		return -EINVAL;
 	}
 
 	if (max_frame > MAX_STD_JUMBO_FRAME_SIZE) {
-		dev_err(&adapter->pdev->dev, "MTU > 9216 not supported.\n");
+		dev_err(&pdev->dev, "MTU > 9216 not supported.\n");
 		return -EINVAL;
 	}
 
@@ -3739,7 +3741,7 @@ static int igb_change_mtu(struct net_device *netdev, int new_mtu)
 	if (netif_running(netdev))
 		igb_down(adapter);
 
-	dev_info(&adapter->pdev->dev, "changing MTU from %d to %d\n",
+	dev_info(&pdev->dev, "changing MTU from %d to %d\n",
 		 netdev->mtu, new_mtu);
 	netdev->mtu = new_mtu;
 
@@ -4053,6 +4055,7 @@ static int __igb_notify_dca(struct device *dev, void *data)
 {
 	struct net_device *netdev = dev_get_drvdata(dev);
 	struct igb_adapter *adapter = netdev_priv(netdev);
+	struct pci_dev *pdev = adapter->pdev;
 	struct e1000_hw *hw = &adapter->hw;
 	unsigned long event = *(unsigned long *)data;
 
@@ -4061,12 +4064,9 @@ static int __igb_notify_dca(struct device *dev, void *data)
 		/* if already enabled, don't do it again */
 		if (adapter->flags & IGB_FLAG_DCA_ENABLED)
 			break;
-		/* Always use CB2 mode, difference is masked
-		 * in the CB driver. */
-		wr32(E1000_DCA_CTRL, E1000_DCA_CTRL_DCA_MODE_CB2);
 		if (dca_add_requester(dev) == 0) {
 			adapter->flags |= IGB_FLAG_DCA_ENABLED;
-			dev_info(&adapter->pdev->dev, "DCA enabled\n");
+			dev_info(&pdev->dev, "DCA enabled\n");
 			igb_setup_dca(adapter);
 			break;
 		}
@@ -4076,7 +4076,7 @@ static int __igb_notify_dca(struct device *dev, void *data)
 			/* without this a class_device is left
 			 * hanging around in the sysfs model */
 			dca_remove_requester(dev);
-			dev_info(&adapter->pdev->dev, "DCA disabled\n");
+			dev_info(&pdev->dev, "DCA disabled\n");
 			adapter->flags &= ~IGB_FLAG_DCA_ENABLED;
 			wr32(E1000_DCA_CTRL, E1000_DCA_CTRL_DCA_MODE_DISABLE);
 		}
@@ -4471,7 +4471,7 @@ static void igb_rcv_msg_from_vf(struct igb_adapter *adapter, u32 vf)
 		retval = igb_set_vf_vlan(adapter, msgbuf, vf);
 		break;
 	default:
-		dev_err(&adapter->pdev->dev, "Unhandled Msg %08x\n", msgbuf[0]);
+		dev_err(&pdev->dev, "Unhandled Msg %08x\n", msgbuf[0]);
 		retval = -1;
 		break;
 	}
@@ -5472,6 +5472,7 @@ static void igb_restore_vlan(struct igb_adapter *adapter)
 
 int igb_set_spd_dplx(struct igb_adapter *adapter, u16 spddplx)
 {
+	struct pci_dev *pdev = adapter->pdev;
 	struct e1000_mac_info *mac = &adapter->hw.mac;
 
 	mac->autoneg = 0;
@@ -5495,8 +5496,7 @@ int igb_set_spd_dplx(struct igb_adapter *adapter, u16 spddplx)
 		break;
 	case SPEED_1000 + DUPLEX_HALF: /* not supported */
 	default:
-		dev_err(&adapter->pdev->dev,
-			"Unsupported Speed/Duplex configuration\n");
+		dev_err(&pdev->dev, "Unsupported Speed/Duplex configuration\n");
 		return -EINVAL;
 	}
 	return 0;


^ permalink raw reply related

* [net-next-2.6 PATCH 21/23] igb: limit minimum mtu to 68 to keep ip bound to interface
From: Jeff Kirsher @ 2009-10-28  9:52 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, Alexander Duyck, Jeff Kirsher
In-Reply-To: <20091028094540.13156.2637.stgit@localhost.localdomain>

From: Alexander Duyck <alexander.h.duyck@intel.com>

Limit the minimum mtu to 68 in order to prevent ip from being unbound from
the interface.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/igb/igb_main.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index 264ff00..846e64f 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -3709,8 +3709,7 @@ static int igb_change_mtu(struct net_device *netdev, int new_mtu)
 	int max_frame = new_mtu + ETH_HLEN + ETH_FCS_LEN;
 	u32 rx_buffer_len, i;
 
-	if ((max_frame < ETH_ZLEN + ETH_FCS_LEN) ||
-	    (max_frame > MAX_JUMBO_FRAME_SIZE)) {
+	if ((new_mtu < 68) || (max_frame > MAX_JUMBO_FRAME_SIZE)) {
 		dev_err(&pdev->dev, "Invalid MTU setting\n");
 		return -EINVAL;
 	}


^ permalink raw reply related

* [net-next-2.6 PATCH 22/23] igb: open up SCTP checksum offloads to all MACs 82576 and newer
From: Jeff Kirsher @ 2009-10-28  9:52 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, Alexander Duyck, Jeff Kirsher
In-Reply-To: <20091028094540.13156.2637.stgit@localhost.localdomain>

From: Alexander Duyck <alexander.h.duyck@intel.com>

Going forward the plan is to have the MACs support SCTP checksum offloads
so change the check from == to >=.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/igb/igb_main.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index 846e64f..1a6c074 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -1456,7 +1456,7 @@ static int __devinit igb_probe(struct pci_dev *pdev,
 	if (pci_using_dac)
 		netdev->features |= NETIF_F_HIGHDMA;
 
-	if (adapter->hw.mac.type == e1000_82576)
+	if (hw->mac.type >= e1000_82576)
 		netdev->features |= NETIF_F_SCTP_CSUM;
 
 	adapter->en_mng_pt = igb_enable_mng_pass_thru(hw);


^ permalink raw reply related

* [net-next-2.6 PATCH 23/23] igb: cleanup whitespace issues in igb_main.c
From: Jeff Kirsher @ 2009-10-28  9:52 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, Alexander Duyck, Jeff Kirsher
In-Reply-To: <20091028094540.13156.2637.stgit@localhost.localdomain>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This patch goes through and cleans up whitespace issues in igb_main.c
to help improve readability.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/igb/igb_main.c |   41 +++++++++++++++++++----------------------
 1 files changed, 19 insertions(+), 22 deletions(-)

diff --git a/drivers/net/igb/igb_main.c b/drivers/net/igb/igb_main.c
index 1a6c074..b044c98 100644
--- a/drivers/net/igb/igb_main.c
+++ b/drivers/net/igb/igb_main.c
@@ -1006,7 +1006,6 @@ static void igb_release_hw_control(struct igb_adapter *adapter)
 			ctrl_ext & ~E1000_CTRL_EXT_DRV_LOAD);
 }
 
-
 /**
  * igb_get_hw_control - get control of the h/w from f/w
  * @adapter: address of board private structure
@@ -1067,7 +1066,6 @@ static void igb_configure(struct igb_adapter *adapter)
  * igb_up - Open the interface and prepare it to handle traffic
  * @adapter: board private structure
  **/
-
 int igb_up(struct igb_adapter *adapter)
 {
 	struct e1000_hw *hw = &adapter->hw;
@@ -1288,7 +1286,7 @@ void igb_reset(struct igb_adapter *adapter)
 }
 
 static const struct net_device_ops igb_netdev_ops = {
-	.ndo_open 		= igb_open,
+	.ndo_open		= igb_open,
 	.ndo_stop		= igb_close,
 	.ndo_start_xmit		= igb_xmit_frame_adv,
 	.ndo_get_stats		= igb_get_stats,
@@ -1444,7 +1442,6 @@ static int __devinit igb_probe(struct pci_dev *pdev,
 	netdev->features |= NETIF_F_IPV6_CSUM;
 	netdev->features |= NETIF_F_TSO;
 	netdev->features |= NETIF_F_TSO6;
-
 	netdev->features |= NETIF_F_GRO;
 
 	netdev->vlan_features |= NETIF_F_TSO;
@@ -1569,7 +1566,6 @@ static int __devinit igb_probe(struct pci_dev *pdev,
 	}
 
 #endif
-
 	switch (hw->mac.type) {
 	case e1000_82576:
 		/*
@@ -1624,8 +1620,8 @@ static int __devinit igb_probe(struct pci_dev *pdev,
 	/* print bus type/speed/width info */
 	dev_info(&pdev->dev, "%s: (PCIe:%s:%s) %pM\n",
 		 netdev->name,
-		 ((hw->bus.speed == e1000_bus_speed_2500)
-		  ? "2.5Gb/s" : "unknown"),
+		 ((hw->bus.speed == e1000_bus_speed_2500) ? "2.5Gb/s" :
+		                                            "unknown"),
 		 ((hw->bus.width == e1000_bus_width_pcie_x4) ? "Width x4" :
 		  (hw->bus.width == e1000_bus_width_pcie_x2) ? "Width x2" :
 		  (hw->bus.width == e1000_bus_width_pcie_x1) ? "Width x1" :
@@ -1658,8 +1654,8 @@ err_sw_init:
 err_ioremap:
 	free_netdev(netdev);
 err_alloc_etherdev:
-	pci_release_selected_regions(pdev, pci_select_bars(pdev,
-	                             IORESOURCE_MEM));
+	pci_release_selected_regions(pdev,
+	                             pci_select_bars(pdev, IORESOURCE_MEM));
 err_pci_reg:
 err_dma:
 	pci_disable_device(pdev);
@@ -1723,11 +1719,12 @@ static void __devexit igb_remove(struct pci_dev *pdev)
 		dev_info(&pdev->dev, "IOV Disabled\n");
 	}
 #endif
+
 	iounmap(hw->hw_addr);
 	if (hw->flash_address)
 		iounmap(hw->flash_address);
-	pci_release_selected_regions(pdev, pci_select_bars(pdev,
-	                             IORESOURCE_MEM));
+	pci_release_selected_regions(pdev,
+	                             pci_select_bars(pdev, IORESOURCE_MEM));
 
 	free_netdev(netdev);
 
@@ -2288,9 +2285,7 @@ void igb_setup_rctl(struct igb_adapter *adapter)
 	 */
 	rctl |= E1000_RCTL_SECRC;
 
-	/*
-	 * disable store bad packets and clear size bits.
-	 */
+	/* disable store bad packets and clear size bits. */
 	rctl &= ~(E1000_RCTL_SBP | E1000_RCTL_SZ_256);
 
 	/* enable LPE to prevent packets larger than max_frame_size */
@@ -2916,7 +2911,8 @@ static void igb_watchdog(unsigned long data)
 static void igb_watchdog_task(struct work_struct *work)
 {
 	struct igb_adapter *adapter = container_of(work,
-					struct igb_adapter, watchdog_task);
+	                                           struct igb_adapter,
+                                                   watchdog_task);
 	struct e1000_hw *hw = &adapter->hw;
 	struct net_device *netdev = adapter->netdev;
 	struct igb_ring *tx_ring = adapter->tx_ring;
@@ -2935,14 +2931,14 @@ static void igb_watchdog_task(struct work_struct *work)
 			/* Links status message must follow this format */
 			printk(KERN_INFO "igb: %s NIC Link is Up %d Mbps %s, "
 				 "Flow Control: %s\n",
-			         netdev->name,
-				 adapter->link_speed,
-				 adapter->link_duplex == FULL_DUPLEX ?
+			       netdev->name,
+			       adapter->link_speed,
+			       adapter->link_duplex == FULL_DUPLEX ?
 				 "Full Duplex" : "Half Duplex",
-				 ((ctrl & E1000_CTRL_TFCE) && (ctrl &
-				 E1000_CTRL_RFCE)) ? "RX/TX" : ((ctrl &
-				 E1000_CTRL_RFCE) ? "RX" : ((ctrl &
-				 E1000_CTRL_TFCE) ? "TX" : "None")));
+			       ((ctrl & E1000_CTRL_TFCE) &&
+			        (ctrl & E1000_CTRL_RFCE)) ? "RX/TX" :
+			       ((ctrl & E1000_CTRL_RFCE) ?  "RX" :
+			       ((ctrl & E1000_CTRL_TFCE) ?  "TX" : "None")));
 
 			/* tweak tx_queue_len according to speed/duplex and
 			 * adjust the timeout factor */
@@ -3724,6 +3720,7 @@ static int igb_change_mtu(struct net_device *netdev, int new_mtu)
 
 	/* igb_down has a dependency on max_frame_size */
 	adapter->max_frame_size = max_frame;
+
 	/* NOTE: netdev_alloc_skb reserves 16 bytes, and typically NET_IP_ALIGN
 	 * means we reserve 2 more, this pushes us to allocate from the next
 	 * larger slab size.


^ permalink raw reply related

* Re: [PATCH net-next] bnx2x: Do Tx handling in a separate tasklet.
From: David Miller @ 2009-10-28  9:57 UTC (permalink / raw)
  To: vladz; +Cc: IMCEAMAILTO-davem+40davemloft+2Enet, eilong, netdev
In-Reply-To: <8628FE4E7912BF47A96AE7DD7BAC0AADCB2CFF2028@SJEXCHCCR02.corp.ad.broadcom.com>

From: "Vladislav Zolotarov" <vladz@broadcom.com>
Date: Wed, 28 Oct 2009 02:54:37 -0700

> I'd like to start from your last remark: you r absolutely right, and this is the problem we have in the current net-next driver. More than that, this patch is fixing this problem: it moved liberation of Tx SKBs from hardIRQ context (ISR) to the softIRQ context (tasklet) thereby resolving the problem u've mentioned. So, total agreement with u on this one. I must have named the patch differently to emphasize it.
> 
> I'd like to summarize the patch I've sent:
> - Take Tx SKB liberation out of hardIRQ.
> - Instead schedule a DPC that handles Tx work.
> - Optimize the access to status block indices: read only the index we are about to use in the current context.
> 
> So, could u, pls., apply the patch in order to fix the problem we currently have in bnx2x?

There is no reason not to use NAPI to achieve this objective and that's
the main objection I have to your patch.

Using NAPI will not only allow you to move the SKB freeing to softirq
context but it will also provide fairness between multiple NAPI
contexts active at the same time on the same cpu.

Furthermore, if you combine RX and TX NAPI work for a specific queue
into the same NAPI context, TX liberation can run first and provide
fresh CPU local SKBs for RX packet input processing created replies
to allocate.

You haven't addressed any of that, and I am not going to apply your
patch becuase I don't want your driver to set a precedence here.

^ permalink raw reply

* Re: [PATCH v3 1/7] Only parse time stamp TCP option in time wait sock
From: Gilad Ben-Yossef @ 2009-10-28 10:14 UTC (permalink / raw)
  To: William Allen Simpson; +Cc: netdev, ori, Yony Amit
In-Reply-To: <4AE5D4AE.2080108@gmail.com>

Hi William,


William Allen Simpson wrote:

> Gilad Ben-Yossef wrote:
>> Since we only use tcp_parse_options here to check for the exietence
>> of TCP timestamp option in the header, it is better to call with
>> the "established" flag on.
>>
> Please explain how this patch is required for the other patches?
Gladly (and suggestions to do it differently are welcome) :

For the purpose of the patch tcp_parse_options was changed to consult 
dst_entry options when parsing non established packets.

This means that for any place that we call tcp_parse_options with the 
established parameter set to false, we need to supply it with a dst_entry.

In all other locations in kernel code when tcp_parse_options is called 
such a dst_entry is easily available already.

The time wait mini socket exists so that we would not waste resource 
keeping around the full socket state of a "real socket". As such, it 
does not cache the dst_entry. Adding it to the TIME_WAIT mini sockets 
jsut for this purpose defeats the purpose of having a mini socket in the 
first place.

One other possible way to go about it is to re-compute the dst_entry at 
this location, but this seemed an expensive operation to perform for 
what should be a light weight operation. I asked myself if there might 
be another way?

So I took a good look at the code and discovered that there is no need 
to call tcp_parse_options there in "non established" mode at all.
>
> And more importantly, why it is better to call with established on?
Sure. This is kind of long written down, although it's really simple. I 
will try to describe it as best I can.

Take a look at what tcp_parse_options() does as a function -

It has only one output: changing the fields of the tcp_options_received 
struct  which it gets a pointer to as a parameter. It also has a single 
side effect: it updates the  SKB TCP control block sacked field, if a 
SACK option is detected in the packet header.

Its behavior is dictated by the established parameter. If false, it will 
try to parse all supported TCP options, if found in the packet header. 
If true it will only try to parse the time stamp and SACK options.

Now take a look what happens at tcp_timewait_state_process() when we 
call tcp_parse_options() -

We allocate (on stack) a temporary tcp_options_received struct, and if 
our TIME_WAIT mini socket had recent timestamp data 
(tcptw->tw_ts_recent_stamp), we call tcp_parse_options() with our 
temporay tcp_options_received struct.

Here is the important bit:  we never ever look at anything in the 
tcp_options_received struct after the call returns, except for the time 
stamp data if it is available!

So, passing established as false here makes us try to parse, if found in 
the packet, a bunch of options that we never ever look at the result of. 
(The fact that time wait minisocket  code also zeros the saw_tstamp 
before the call to tcp_parse_options although the same field is being 
zeroed again inside the function is just icing on the cake...)

I have one more issue to explain, and this is regarding the single side 
effect tcp_parse_option has - if the SACK option is found, 
tcp_parse_options updates the skb control block sacked field. However, 
not that it does this regardless of whether established is true or 
false, so it doesn't matter which we pass. (I will leave out the fact 
that whether or not the SACK option is parsed depends on a non 
initialized field of the tcp_options_received struct now as an obscure 
detail... nothing obviously looks at that later).

So bottom line: passing a true value in established does the exact same 
thing, result wise, as current code, except it does so in fewer cycles.

I do confess to having goofed here in one regard: the patch I posted did 
not set the tstamp_ok field of the tcp_options_received struct, which 
can lead to randomly not parsing the time stamp option even when you 
need to.

Perhaps this is what masks my intent. This is a bug of course and I'm 
grateful for you for helping me catch it :-)

I will send an updated patch set with this fixed ASAP.

> And most importantly, what end cases you considered, and how this
> interacts with the proposed rfc1323bis changes, especially on reset?
>
My whole point was that this "change" does not change the behavior of 
the code in any way. In fact, it is no different then a compile time 
optimization (don't execute code paths nothing later uses the result 
thereof) and if the compiler was smart enough, it would have done the 
same. So corner cases and RFC compliance stay exactly as before.

I hope I managed to explain myself better this time around and thanks 
again for taking the time to review this. ;-)

Gilad

-- 
Gilad Ben-Yossef
Chief Coffee Drinker & CTO
Codefidence Ltd.

Web:   http://codefidence.com
Cell:  +972-52-8260388
Skype: gilad_codefidence
Tel:   +972-8-9316883 ext. 201
Fax:   +972-8-9316884
Email: gilad@codefidence.com

Check out our Open Source technology and training blog - http://tuxology.net

	"The biggest risk you can take it is to take no risk."
		-- Mark Zuckerberg and probably others


^ permalink raw reply

* Re: [PATCH v3 4/7] Add the no SACK route option feature
From: Gilad Ben-Yossef @ 2009-10-28 10:18 UTC (permalink / raw)
  To: William Allen Simpson; +Cc: netdev, ori
In-Reply-To: <4AE5D089.2050606@gmail.com>

William Allen Simpson wrote:

> Gilad Ben-Yossef wrote:
>> Implement querying and acting upon the no sack bit in the features
>> field.
>>
>> Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
>> Sigend-off-by: Ori Finkelman <ori@comsleep.com>
>> Sigend-off-by: Yony Amit <yony@comsleep.com>
>>
> Please explain how this code turns SACK on when it is off globally?
>
> As both Eric and I asked?
It doesn't. Please see my discussion with Eric for the why.

In short, doing so introduce a very subtle change to what the existing 
interface do today, which will break
backwards compatibility by changing the meaning of writing zero to the 
relevant sysctl. I don't want to be hunt down by angry sys admins :-)

Thanks,
Gilad




-- 
Gilad Ben-Yossef
Chief Coffee Drinker & CTO
Codefidence Ltd.

Web:   http://codefidence.com
Cell:  +972-52-8260388
Skype: gilad_codefidence
Tel:   +972-8-9316883 ext. 201
Fax:   +972-8-9316884
Email: gilad@codefidence.com

Check out our Open Source technology and training blog - http://tuxology.net

	"The biggest risk you can take it is to take no risk."
		-- Mark Zuckerberg and probably others


^ permalink raw reply

* Re: [PATCH] Multicast packet reassembly can fail
From: Eric Dumazet @ 2009-10-28 10:18 UTC (permalink / raw)
  To: Steve Chen; +Cc: netdev
In-Reply-To: <1256683583.3153.389.camel@linux-1lbu>

Steve Chen a écrit :
> Multicast packet reassembly can fail
> 
> When multicast connections with multiple fragments are received by the same
> node from more than one Ethernet ports, race condition between fragments
> from each Ethernet port can cause fragment reassembly to fail leading to
> packet drop.  This is because packets from each Ethernet port appears identical
> to the the code that reassembles the Ethernet packet.
> 
> The solution is evaluate the Ethernet interface number in addition to all other
> parameters so that every packet can be uniquely identified.  The existing
> iif field in struct ipq is now used to generate the hash key, and iif is also
> used for comparison in case of hash collision.
> 
> Please note that q->saddr ^ (q->iif << 5) is now being passed into
> ipqhashfn to generate the hash key.  This is borrowed from the routing
> code.
> 
> Signed-off-by: Steve Chen <schen@mvista.com>
> Signed-off-by: Mark Huth <mhuth@mvista.com>
> 

This makes no sense to me, but I need to check the code.

How interface could matter in IP defragmentation ?
And why multicast is part of the equation ?

If defrag fails, this must be for other reason,
and probably needs another fix.

Check line 219 of net/ipv4/inet_fragment.c

#ifdef CONFIG_SMP
        /* With SMP race we have to recheck hash table, because
         * such entry could be created on other cpu, while we
         * promoted read lock to write lock.
         */
        hlist_for_each_entry(qp, n, &f->hash[hash], list) {
                if (qp->net == nf && f->match(qp, arg)) {
                        atomic_inc(&qp->refcnt);
                        write_unlock(&f->lock);
                        qp_in->last_in |= INET_FRAG_COMPLETE;   <<< HERE >>>
                        inet_frag_put(qp_in, f);
                        return qp;
                }
        }
#endif

I really wonder why we set INET_FRAG_COMPLETE here

^ permalink raw reply

* Re: [net-next-2.6 PATCH 01/23] igb: add support for seperate tx-usecs setting in ethtool
From: David Miller @ 2009-10-28 10:39 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, alexander.h.duyck
In-Reply-To: <20091028094540.13156.2637.stgit@localhost.localdomain>


All applied to net-next-2.6, but then I had to add the following
patch to kill a warning:

igb: Fix warnings in igb_set_ringparam()

drivers/net/igb/igb_ethtool.c: In function ‘igb_set_ringparam’:
drivers/net/igb/igb_ethtool.c:744: warning: comparison of distinct pointer types lacks a cast
drivers/net/igb/igb_ethtool.c:748: warning: comparison of distinct pointer types lacks a cast

Casts were to u16 on the constant, but the type of new_{r,t}x_count is
u32.  Cast to u32 instead.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/igb/igb_ethtool.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/igb/igb_ethtool.c b/drivers/net/igb/igb_ethtool.c
index d24b902..90b89a8 100644
--- a/drivers/net/igb/igb_ethtool.c
+++ b/drivers/net/igb/igb_ethtool.c
@@ -741,11 +741,11 @@ static int igb_set_ringparam(struct net_device *netdev,
 		return -EINVAL;
 
 	new_rx_count = min(ring->rx_pending, (u32)IGB_MAX_RXD);
-	new_rx_count = max(new_rx_count, (u16)IGB_MIN_RXD);
+	new_rx_count = max(new_rx_count, (u32)IGB_MIN_RXD);
 	new_rx_count = ALIGN(new_rx_count, REQ_RX_DESCRIPTOR_MULTIPLE);
 
 	new_tx_count = min(ring->tx_pending, (u32)IGB_MAX_TXD);
-	new_tx_count = max(new_tx_count, (u16)IGB_MIN_TXD);
+	new_tx_count = max(new_tx_count, (u32)IGB_MIN_TXD);
 	new_tx_count = ALIGN(new_tx_count, REQ_TX_DESCRIPTOR_MULTIPLE);
 
 	if ((new_tx_count == adapter->tx_ring_count) &&
-- 
1.6.5.1


^ permalink raw reply related

* Re: [PATCH] cnic: Fix L2CTX_STATUSB_NUM offset in context memory.
From: David Miller @ 2009-10-28 10:42 UTC (permalink / raw)
  To: mchan; +Cc: davem, netdev, benli
In-Reply-To: <1256662728-21864-1-git-send-email-mchan@broadcom.com>

From: "Michael Chan" <mchan@broadcom.com>
Date: Tue, 27 Oct 2009 08:58:48 -0800

> The BNX2_L2CTX_STATUSB_NUM definition needs to be changed to match
> the recent firmware update:
> 
> commit 078b0735881c7969aaf21469f3577831cddd9f8c
> bnx2: Update firmware to 5.0.0.j3.
> 
> Without the fix, bnx2 can crash intermittently in bnx2_rx_int() when
> iSCSI is enabled.
> 
> Signed-off-by: Michael Chan <mchan@broadcom.com>
> Signed-off-by: Benjamin Li <benli@broadcom.com>

Applied to net-2.6, but please be explicit about what tree you
want me to apply this to in the future.

Sure I could deduce this by running "git describe" on that
commit ID mentioned in the commit message, but why not be
explicit? :-)

^ permalink raw reply

* Re: [PATCH net-2.6] sfc: Set ip_summed correctly for page buffers passed to GRO
From: David Miller @ 2009-10-28 10:44 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1256655057.2794.4.camel@achroite>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Tue, 27 Oct 2009 14:50:57 +0000

> Page buffers containing packets with an incorrect checksum or using a
> protocol not handled by hardware checksum offload were previously not
> passed to LRO.  The conversion to GRO changed this, but did not set
> the ip_summed value accordingly.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

> This affects 2.6.31 and seems like a candidate for a stable update.

Queued up for -stable, thanks.

^ permalink raw reply

* Re: [PATCH net-2.6] sfc: Really allow RX checksum offload to be disabled
From: Ben Hutchings @ 2009-10-28 10:53 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <20091028.024940.181264224.davem@davemloft.net>

On Wed, 2009-10-28 at 02:49 -0700, David Miller wrote:
> From: Ben Hutchings <bhutchings@solarflare.com>
> Date: Tue, 27 Oct 2009 19:44:33 +0000
> 
> > We have never checked the efx_nic::rx_checksum_enabled flag everywhere
> > we should, and since the switch to GRO we don't check it anywhere.
> > It's simplest to check it in the one place where we initialise the
> > per-packet checksummed flag.
> > 
> > Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
> > Cc: stable@kernel.org
> > ---
> > I'm not sure whether this is serious enough to merit a stable update.
> > It's not a recent regression.
> 
> This patch only applies to net-next-2.6, so I can't see how it could
> be a -stable candidate :-)
> 
> So I've applied it there.

The register name update in net-next-2.6 changed the context for this
patch.  I'll send a new patch that will apply to the earlier versions.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: [PATCH V2]NET/KS8695: add support NAPI for Rx
From: David Miller @ 2009-10-28 10:55 UTC (permalink / raw)
  To: figo1802; +Cc: dsilvers, netdev, ben
In-Reply-To: <1256653422.2148.23.camel@myhost>

From: "Figo.zhang" <figo1802@gmail.com>
Date: Tue, 27 Oct 2009 22:23:42 +0800

> Add support NAPI Rx API for KS8695NET driver.
> 
> v2, change the Rx function to NAPI.
 ...
> Signed-off-by: Figo.zhang <figo1802@gmail.com>

Applied to net-next-2.6, thanks.

^ permalink raw reply

* Re: [PATCH 1/2] tc35815: Fix return value of tc35815_do_interrupt when NAPI enabled
From: David Miller @ 2009-10-28 10:57 UTC (permalink / raw)
  To: anemo; +Cc: netdev, ralf.roesch
In-Reply-To: <1256564782-2781-1-git-send-email-anemo@mba.ocn.ne.jp>

From: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Date: Mon, 26 Oct 2009 22:46:21 +0900

> Return received count correctly even if tx completed at the same time.
> Currently NAPI is disabled for this driver so this patch does not fix
> any real problem.
> 
> Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>

Applied.

^ permalink raw reply

* Re: [PATCH] virtio-net: fix data corruption with OOM
From: Rusty Russell @ 2009-10-28 10:56 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, mst
In-Reply-To: <20091026.182720.81248604.davem@davemloft.net>

On Tue, 27 Oct 2009 11:57:20 am you wrote:
> Anything in a reply to a patch that looks like a signoff or ACK,
> patchwork adds to the commit message in the mbox blob it spits out for
> me.

In case this got lost in the meta-discussion:

Subject: virtio-net: fix data corruption with OOM
Date: Sun, 25 Oct 2009 19:03:40 +0200
From: "Michael S. Tsirkin" <mst@redhat.com>

virtio net used to unlink skbs from send queues on error,
but ever since 48925e372f04f5e35fec6269127c62b2c71ab794
we do not do this. This causes guest data corruption and crashes
with vhost since net core can requeue the skb or free it without
it being taken off the list.

This patch fixes this by queueing the skb after successful
transmit.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (+ comment)
---
 drivers/net/virtio_net.c |   14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -516,8 +516,7 @@ again:
 	/* Free up any pending old buffers before queueing new ones. */
 	free_old_xmit_skbs(vi);
 
-	/* Put new one in send queue and do transmit */
-	__skb_queue_head(&vi->send, skb);
+	/* Try to transmit */
 	capacity = xmit_skb(vi, skb);
 
 	/* This can happen with OOM and indirect buffers. */
@@ -531,8 +530,17 @@ again:
 		}
 		return NETDEV_TX_BUSY;
 	}
+	vi->svq->vq_ops->kick(vi->svq);
 
-	vi->svq->vq_ops->kick(vi->svq);
+	/*
+	 * Put new one in send queue.  You'd expect we'd need this before
+	 * xmit_skb calls add_buf(), since the callback can be triggered
+	 * immediately after that.  But since the callback just triggers
+	 * another call back here, normal network xmit locking prevents the
+	 * race.
+	 */
+	__skb_queue_head(&vi->send, skb);
+
 	/* Don't wait up for transmitted skbs to be freed. */
 	skb_orphan(skb);
 	nf_reset(skb);


^ permalink raw reply

* Re: [PATCH 2/2] tc35815: Enable NAPI
From: David Miller @ 2009-10-28 10:57 UTC (permalink / raw)
  To: anemo; +Cc: netdev, ralf.roesch
In-Reply-To: <1256564782-2781-2-git-send-email-anemo@mba.ocn.ne.jp>

From: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Date: Mon, 26 Oct 2009 22:46:22 +0900

> This driver has NAPI code but it has been disabled.  Enable it now.
> The non-napi code will be removed lator.
> 
> Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>

Applied.

Please remove the NAPI enabling macro and the tests for it.
NAPI support should be unconditional.

If people want to test the pre-NAPI behavior, they can check
out an older copy of the driver quite easily.

Thanks.

^ permalink raw reply

* Re: [PATCH V2]NET/KS8695: add support NAPI for Rx
From: Ben Dooks @ 2009-10-28 10:57 UTC (permalink / raw)
  To: Figo.zhang; +Cc: David S. Miller, netdev
In-Reply-To: <1256653422.2148.23.camel@myhost>

Figo.zhang wrote:
> Add support NAPI Rx API for KS8695NET driver.
> 
> v2, change the Rx function to NAPI.
> 
> in <KS8695X Integrated Multi-port Gateway Solution Register Description
>  v1.0>:
> 
> Interrupt Enable Register (offset 0xE204)
> Bit29 : WAN MAC Receive Interrupt Enable
> Bit16 : LAN MAC Receive Interrupt Enable
> 
> Interrupt Status Register (Offset 0xF208)
> Bit29: WAN MAC Receive Status
> Bit16: LAN MAC Receive Status
> 
> see arch/arm/mach-ks8695/devices.c:
> ks8695_wan_resources[] and ks8695_lan_resources[]
> have IORESOURCE_IRQ , it have define the RX irq,
> for wan, irq = 29; for lan ,irq = 16.
> so we can do this read the interrupt status:
> 
> unsigned long mask_bit = 1 << ksp->rx_irq;
> status = readl(KS8695_IRQ_VA + KS8695_INTST);

It would be nice to see some form of API addition to
the interrupt system to ack interrupts that have been
handled like this, since the irq layer is already
tracking the necessary IRQ->handler mappings.

-- 
Ben Dooks, Software Engineer, Simtec Electronics

http://www.simtec.co.uk/

^ permalink raw reply

* Re: [PATCH] via-velocity: Remove private device list
From: David Miller @ 2009-10-28 11:02 UTC (permalink / raw)
  To: ben; +Cc: romieu, netdev
In-Reply-To: <1256501329.3136.109.camel@localhost>

From: Ben Hutchings <ben@decadent.org.uk>
Date: Sun, 25 Oct 2009 20:08:49 +0000

> via-velocity maintains a list of its devices in order to determine
> whether a netdev notification applies to one of them.  That can be
> determined simply by checking the netdev_ops pointer, so the list can
> be removed.
> 
> Compile-tested only.
> 
> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

Looks good to me, applied to net-next-2.6

^ permalink raw reply

* Re: [PATCH net-next-2.6] net: sysfs: ethtool_ops can be NULL
From: David Miller @ 2009-10-28 11:02 UTC (permalink / raw)
  To: andy; +Cc: eric.dumazet, netdev
In-Reply-To: <20091026134033.GD1639@gospo.rdu.redhat.com>

From: Andy Gospodarek <andy@greyhouse.net>
Date: Mon, 26 Oct 2009 09:40:33 -0400

> On Mon, Oct 26, 2009 at 12:23:33PM +0100, Eric Dumazet wrote:
>> commit d519e17e2d01a0ee9abe083019532061b4438065
>> (net: export device speed and duplex via sysfs)
>> made the wrong assumption that netdev->ethtool_ops was always set.
>> 
>> This makes possible to crash kernel and let rtnl in locked state.
>> 
>> modprobe dummy
>> ip link set dummy0 up
>> (udev runs and crash)
>> 
>> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
 ...
> Nice catch, Eric.
> 
> Acked-by: Andy Gospodarek <andy@greyhouse.net>

Applied.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox