netdev.vger.kernel.org archive mirror
* [net-next-2.6 PATCH 1/3] e1000: allow ethtool coalesce to adjust interrupts per second
@ 2009-07-06 20:44 Jeff Kirsher
  2009-07-06 20:44 ` [net-next-2.6 PATCH 2/3] e1000: implement jumbo receive with partial descriptors Jeff Kirsher
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Jeff Kirsher @ 2009-07-06 20:44 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, Jesse Brandeburg, Jeff Kirsher

From: Jesse Brandeburg <jesse.brandeburg@intel.com>

This patch allows on-the-fly adjustment of the interrupts per second
generated by e1000 devices 82545/82546 (hardware support for the ITR
register is a requirement).

Adjust using this command:
ethtool -C eth0 rx-usecs 10

where 10 is the interrupt interval in microseconds, so 10 = 100,000
interrupts per second and 125 = 8,000 interrupts per second.

Changes take effect immediately.

1 and 3 are special values that select the driver's automatic tuning
mode: 1 means 4,000-90,000 interrupts per second and 3 means
4,000-20,000 interrupts per second, which is the driver default.
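
For reference, here is a minimal standalone sketch of the conversion
(the function name and user-space form are illustrative only; the
driver does the equivalent inside e1000_set_coalesce() below):

#include <stdio.h>

/* rx-usecs values of 1 and 3 select automatic tuning modes rather
 * than a fixed interval; any other accepted value is the interval in
 * microseconds, so the interrupt rate is 1000000 / rx_usecs. */
static unsigned int rx_usecs_to_irqs_per_sec(unsigned int rx_usecs)
{
	if (rx_usecs <= 3)
		return 0; /* automatic tuning mode, no fixed rate */
	return 1000000 / rx_usecs;
}

int main(void)
{
	printf("%u irq/s\n", rx_usecs_to_irqs_per_sec(10));  /* 100000 */
	printf("%u irq/s\n", rx_usecs_to_irqs_per_sec(125)); /* 8000 */
	return 0;
}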

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/e1000/e1000.h         |    3 ++
 drivers/net/e1000/e1000_ethtool.c |   51 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 53 insertions(+), 1 deletions(-)

diff --git a/drivers/net/e1000/e1000.h b/drivers/net/e1000/e1000.h
index e9a416f..c87f2cb 100644
--- a/drivers/net/e1000/e1000.h
+++ b/drivers/net/e1000/e1000.h
@@ -111,6 +111,9 @@ do {									\
 #define E1000_MIN_RXD                       80
 #define E1000_MAX_82544_RXD               4096
 
+#define E1000_MIN_ITR_USECS		10 /* 100000 irq/sec */
+#define E1000_MAX_ITR_USECS		10000 /* 100    irq/sec */
+
 /* this is the size past which hardware will drop packets when setting LPE=0 */
 #define MAXIMUM_ETHERNET_VLAN_SIZE 1522
 
diff --git a/drivers/net/e1000/e1000_ethtool.c b/drivers/net/e1000/e1000_ethtool.c
index c854c96..27f996a 100644
--- a/drivers/net/e1000/e1000_ethtool.c
+++ b/drivers/net/e1000/e1000_ethtool.c
@@ -1904,6 +1904,53 @@ static int e1000_phys_id(struct net_device *netdev, u32 data)
 	return 0;
 }
 
+static int e1000_get_coalesce(struct net_device *netdev,
+			      struct ethtool_coalesce *ec)
+{
+	struct e1000_adapter *adapter = netdev_priv(netdev);
+
+	if (adapter->hw.mac_type < e1000_82545)
+		return -EOPNOTSUPP;
+
+	if (adapter->itr_setting <= 3)
+		ec->rx_coalesce_usecs = adapter->itr_setting;
+	else
+		ec->rx_coalesce_usecs = 1000000 / adapter->itr_setting;
+
+	return 0;
+}
+
+static int e1000_set_coalesce(struct net_device *netdev,
+			      struct ethtool_coalesce *ec)
+{
+	struct e1000_adapter *adapter = netdev_priv(netdev);
+	struct e1000_hw *hw = &adapter->hw;
+
+	if (hw->mac_type < e1000_82545)
+		return -EOPNOTSUPP;
+
+	if ((ec->rx_coalesce_usecs > E1000_MAX_ITR_USECS) ||
+	    ((ec->rx_coalesce_usecs > 3) &&
+	     (ec->rx_coalesce_usecs < E1000_MIN_ITR_USECS)) ||
+	    (ec->rx_coalesce_usecs == 2))
+		return -EINVAL;
+
+	if (ec->rx_coalesce_usecs <= 3) {
+		adapter->itr = 20000;
+		adapter->itr_setting = ec->rx_coalesce_usecs;
+	} else {
+		adapter->itr = (1000000 / ec->rx_coalesce_usecs);
+		adapter->itr_setting = adapter->itr & ~3;
+	}
+
+	if (adapter->itr_setting != 0)
+		ew32(ITR, 1000000000 / (adapter->itr * 256));
+	else
+		ew32(ITR, 0);
+
+	return 0;
+}
+
 static int e1000_nway_reset(struct net_device *netdev)
 {
 	struct e1000_adapter *adapter = netdev_priv(netdev);
@@ -1978,7 +2025,9 @@ static const struct ethtool_ops e1000_ethtool_ops = {
 	.get_strings            = e1000_get_strings,
 	.phys_id                = e1000_phys_id,
 	.get_ethtool_stats      = e1000_get_ethtool_stats,
-	.get_sset_count		= e1000_get_sset_count,
+	.get_sset_count         = e1000_get_sset_count,
+	.get_coalesce           = e1000_get_coalesce,
+	.set_coalesce           = e1000_set_coalesce,
 };
 
 void e1000_set_ethtool_ops(struct net_device *netdev)



* [net-next-2.6 PATCH 2/3] e1000: implement jumbo receive with partial descriptors
  2009-07-06 20:44 [net-next-2.6 PATCH 1/3] e1000: allow ethtool coalesce to adjust interrupts per second Jeff Kirsher
@ 2009-07-06 20:44 ` Jeff Kirsher
  2009-07-06 20:45 ` [net-next-2.6 PATCH 3/3] e1000: fix flow control thresholds Jeff Kirsher
  2009-07-07  1:09 ` [net-next-2.6 PATCH 1/3] e1000: allow ethtool coalesece to adjust interrupts per second David Miller
  2 siblings, 0 replies; 4+ messages in thread
From: Jeff Kirsher @ 2009-07-06 20:44 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, Jesse Brandeburg, Andrew Morton, Jeff Kirsher

From: Jesse Brandeburg <jesse.brandeburg@intel.com>

This code is extremely similar to what is already committed in
e1000e.

e1000 will no longer request 32kB slab buffers to support jumbo
frames on PCI/PCI-X adapters.  This significantly reduces the
likelihood of order:3 allocation failures.

This new code adds support for using pages as receive buffers,
and the driver will chain multiple pages together to build a
jumbo frame for OS consumption.

The hardware takes a power-of-two buffer size and will dump as much
data as it can receive into one or more buffers (a rough sketch of
the reassembly follows the list below).

The benefits of applying this are:
1) stop akpm's dissing :-) of this lame e1000 behavior [1]
2) more efficient memory allocation (roughly half) when using jumbo
   frames, which also allows much better socket utilization with
   jumbos, since the socket is charged for the full allocation of
   each receive buffer regardless of how much is used
3) this was a feature request by a customer
4) copybreak for small packets (< 256 bytes) still applies

[1] http://lkml.org/lkml/2008/7/10/68
    http://article.gmane.org/gmane.linux.network/130986
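
As a rough model of the buffer chaining (a user-space sketch with
invented names and types, not driver code; the real work happens in
e1000_clean_jumbo_rx_irq() below via skb page fragments):

#include <stdio.h>
#include <string.h>

#define BUF_SIZE 4096 /* power-of-two receive buffer */

struct rx_desc {
	const unsigned char *data;
	unsigned int len;
	int eop; /* end-of-packet flag, set by hardware on the last buffer */
};

/* Hardware spills a frame across consecutive buffers; the receive
 * path appends each buffer to the in-progress frame until it sees
 * the EOP descriptor. */
static unsigned int assemble_frame(const struct rx_desc *d, unsigned int n,
                                   unsigned char *frame)
{
	unsigned int total = 0;
	unsigned int i;

	for (i = 0; i < n; i++) {
		memcpy(frame + total, d[i].data, d[i].len);
		total += d[i].len;
		if (d[i].eop)
			break;
	}
	return total;
}

int main(void)
{
	static unsigned char a[BUF_SIZE], b[826], frame[BUF_SIZE + 826];
	struct rx_desc d[2] = { { a, sizeof(a), 0 }, { b, sizeof(b), 1 } };

	/* two chained buffers -> one 4922-byte frame */
	printf("%u bytes\n", assemble_frame(d, 2, frame));
	return 0;
}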

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
CC: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/e1000/e1000.h      |    4 
 drivers/net/e1000/e1000_main.c |  425 +++++++++++++++++++++++++++++++++++++---
 2 files changed, 391 insertions(+), 38 deletions(-)

diff --git a/drivers/net/e1000/e1000.h b/drivers/net/e1000/e1000.h
index c87f2cb..1a4f89c 100644
--- a/drivers/net/e1000/e1000.h
+++ b/drivers/net/e1000/e1000.h
@@ -140,7 +140,7 @@ do {									\
 #define E1000_FC_HIGH_DIFF 0x1638  /* High: 5688 bytes below Rx FIFO size */
 #define E1000_FC_LOW_DIFF 0x1640   /* Low:  5696 bytes below Rx FIFO size */
 
-#define E1000_FC_PAUSE_TIME 0x0680 /* 858 usec */
+#define E1000_FC_PAUSE_TIME 0xFFFF /* pause for the max or until send xon */
 
 /* How many Tx Descriptors do we need to call netif_wake_queue ? */
 #define E1000_TX_QUEUE_WAKE	16
@@ -164,6 +164,7 @@ do {									\
 struct e1000_buffer {
 	struct sk_buff *skb;
 	dma_addr_t dma;
+	struct page *page;
 	unsigned long time_stamp;
 	u16 length;
 	u16 next_to_watch;
@@ -205,6 +206,7 @@ struct e1000_rx_ring {
 	unsigned int next_to_clean;
 	/* array of buffer information structs */
 	struct e1000_buffer *buffer_info;
+	struct sk_buff *rx_skb_top;
 
 	/* cpu for rx queue */
 	int cpu;
diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 5b8cbdb..f2db9e2 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -137,9 +137,15 @@ static int e1000_clean(struct napi_struct *napi, int budget);
 static bool e1000_clean_rx_irq(struct e1000_adapter *adapter,
 			       struct e1000_rx_ring *rx_ring,
 			       int *work_done, int work_to_do);
+static bool e1000_clean_jumbo_rx_irq(struct e1000_adapter *adapter,
+				     struct e1000_rx_ring *rx_ring,
+				     int *work_done, int work_to_do);
 static void e1000_alloc_rx_buffers(struct e1000_adapter *adapter,
-                                   struct e1000_rx_ring *rx_ring,
+				   struct e1000_rx_ring *rx_ring,
 				   int cleaned_count);
+static void e1000_alloc_jumbo_rx_buffers(struct e1000_adapter *adapter,
+					 struct e1000_rx_ring *rx_ring,
+					 int cleaned_count);
 static int e1000_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd);
 static int e1000_mii_ioctl(struct net_device *netdev, struct ifreq *ifr,
 			   int cmd);
@@ -761,10 +767,7 @@ void e1000_reset(struct e1000_adapter *adapter)
 
 	hw->fc_high_water = fc_high_water_mark;
 	hw->fc_low_water = fc_high_water_mark - 8;
-	if (hw->mac_type == e1000_80003es2lan)
-		hw->fc_pause_time = 0xFFFF;
-	else
-		hw->fc_pause_time = E1000_FC_PAUSE_TIME;
+	hw->fc_pause_time = E1000_FC_PAUSE_TIME;
 	hw->fc_send_xon = 1;
 	hw->fc = hw->original_fc;
 
@@ -1862,6 +1865,7 @@ setup_rx_desc_die:
 
 	rxdr->next_to_clean = 0;
 	rxdr->next_to_use = 0;
+	rxdr->rx_skb_top = NULL;
 
 	return 0;
 }
@@ -1968,10 +1972,17 @@ static void e1000_configure_rx(struct e1000_adapter *adapter)
 	struct e1000_hw *hw = &adapter->hw;
 	u32 rdlen, rctl, rxcsum, ctrl_ext;
 
-	rdlen = adapter->rx_ring[0].count *
-		sizeof(struct e1000_rx_desc);
-	adapter->clean_rx = e1000_clean_rx_irq;
-	adapter->alloc_rx_buf = e1000_alloc_rx_buffers;
+	if (adapter->netdev->mtu > ETH_DATA_LEN) {
+		rdlen = adapter->rx_ring[0].count *
+		        sizeof(struct e1000_rx_desc);
+		adapter->clean_rx = e1000_clean_jumbo_rx_irq;
+		adapter->alloc_rx_buf = e1000_alloc_jumbo_rx_buffers;
+	} else {
+		rdlen = adapter->rx_ring[0].count *
+		        sizeof(struct e1000_rx_desc);
+		adapter->clean_rx = e1000_clean_rx_irq;
+		adapter->alloc_rx_buf = e1000_alloc_rx_buffers;
+	}
 
 	/* disable receives while setting up the descriptors */
 	rctl = er32(RCTL);
@@ -2185,26 +2196,39 @@ static void e1000_clean_rx_ring(struct e1000_adapter *adapter,
 	/* Free all the Rx ring sk_buffs */
 	for (i = 0; i < rx_ring->count; i++) {
 		buffer_info = &rx_ring->buffer_info[i];
-		if (buffer_info->dma) {
-			pci_unmap_single(pdev,
-					 buffer_info->dma,
-					 buffer_info->length,
-					 PCI_DMA_FROMDEVICE);
+		if (buffer_info->dma &&
+		    adapter->clean_rx == e1000_clean_rx_irq) {
+			pci_unmap_single(pdev, buffer_info->dma,
+			                 buffer_info->length,
+			                 PCI_DMA_FROMDEVICE);
+		} else if (buffer_info->dma &&
+		           adapter->clean_rx == e1000_clean_jumbo_rx_irq) {
+			pci_unmap_page(pdev, buffer_info->dma,
+			               buffer_info->length,
+			               PCI_DMA_FROMDEVICE);
 		}
 
 		buffer_info->dma = 0;
-
+		if (buffer_info->page) {
+			put_page(buffer_info->page);
+			buffer_info->page = NULL;
+		}
 		if (buffer_info->skb) {
 			dev_kfree_skb(buffer_info->skb);
 			buffer_info->skb = NULL;
 		}
 	}
 
+	/* there also may be some cached data from a chained receive */
+	if (rx_ring->rx_skb_top) {
+		dev_kfree_skb(rx_ring->rx_skb_top);
+		rx_ring->rx_skb_top = NULL;
+	}
+
 	size = sizeof(struct e1000_buffer) * rx_ring->count;
 	memset(rx_ring->buffer_info, 0, size);
 
 	/* Zero out the descriptor ring */
-
 	memset(rx_ring->desc, 0, rx_ring->size);
 
 	rx_ring->next_to_clean = 0;
@@ -3489,8 +3513,10 @@ static int e1000_change_mtu(struct net_device *netdev, int new_mtu)
 
 	/* NOTE: netdev_alloc_skb reserves 16 bytes, and typically NET_IP_ALIGN
 	 * means we reserve 2 more, this pushes us to allocate from the next
-	 * larger slab size
-	 * i.e. RXBUFFER_2048 --> size-4096 slab */
+	 * larger slab size.
+	 * i.e. RXBUFFER_2048 --> size-4096 slab
+	 *  however with the new *_jumbo_rx* routines, jumbo receives will use
+	 *  fragmented skbs */
 
 	if (max_frame <= E1000_RXBUFFER_256)
 		adapter->rx_buffer_len = E1000_RXBUFFER_256;
@@ -3500,12 +3526,12 @@ static int e1000_change_mtu(struct net_device *netdev, int new_mtu)
 		adapter->rx_buffer_len = E1000_RXBUFFER_1024;
 	else if (max_frame <= E1000_RXBUFFER_2048)
 		adapter->rx_buffer_len = E1000_RXBUFFER_2048;
-	else if (max_frame <= E1000_RXBUFFER_4096)
-		adapter->rx_buffer_len = E1000_RXBUFFER_4096;
-	else if (max_frame <= E1000_RXBUFFER_8192)
-		adapter->rx_buffer_len = E1000_RXBUFFER_8192;
-	else if (max_frame <= E1000_RXBUFFER_16384)
+	else
+#if (PAGE_SIZE >= E1000_RXBUFFER_16384)
 		adapter->rx_buffer_len = E1000_RXBUFFER_16384;
+#elif (PAGE_SIZE >= E1000_RXBUFFER_4096)
+		adapter->rx_buffer_len = PAGE_SIZE;
+#endif
 
 	/* adjust allocation if LPE protects us, and we aren't using SBP */
 	if (!hw->tbi_compatibility_on &&
@@ -3987,9 +4013,227 @@ static void e1000_rx_checksum(struct e1000_adapter *adapter, u32 status_err,
 }
 
 /**
+ * e1000_consume_page - helper function
+ **/
+static void e1000_consume_page(struct e1000_buffer *bi, struct sk_buff *skb,
+                               u16 length)
+{
+	bi->page = NULL;
+	skb->len += length;
+	skb->data_len += length;
+	skb->truesize += length;
+}
+
+/**
+ * e1000_receive_skb - helper function to handle rx indications
+ * @adapter: board private structure
+ * @status: descriptor status field as written by hardware
+ * @vlan: descriptor vlan field as written by hardware (no le/be conversion)
+ * @skb: pointer to sk_buff to be indicated to stack
+ */
+static void e1000_receive_skb(struct e1000_adapter *adapter, u8 status,
+			      __le16 vlan, struct sk_buff *skb)
+{
+	if (unlikely(adapter->vlgrp && (status & E1000_RXD_STAT_VP))) {
+		vlan_hwaccel_receive_skb(skb, adapter->vlgrp,
+		                         le16_to_cpu(vlan) &
+		                         E1000_RXD_SPC_VLAN_MASK);
+	} else {
+		netif_receive_skb(skb);
+	}
+}
+
+/**
+ * e1000_clean_jumbo_rx_irq - Send received data up the network stack; legacy
+ * @adapter: board private structure
+ * @rx_ring: ring to clean
+ * @work_done: amount of napi work completed this call
+ * @work_to_do: max amount of work allowed for this call to do
+ *
+ * the return value indicates whether actual cleaning was done, there
+ * is no guarantee that everything was cleaned
+ */
+static bool e1000_clean_jumbo_rx_irq(struct e1000_adapter *adapter,
+				     struct e1000_rx_ring *rx_ring,
+				     int *work_done, int work_to_do)
+{
+	struct e1000_hw *hw = &adapter->hw;
+	struct net_device *netdev = adapter->netdev;
+	struct pci_dev *pdev = adapter->pdev;
+	struct e1000_rx_desc *rx_desc, *next_rxd;
+	struct e1000_buffer *buffer_info, *next_buffer;
+	unsigned long irq_flags;
+	u32 length;
+	unsigned int i;
+	int cleaned_count = 0;
+	bool cleaned = false;
+	unsigned int total_rx_bytes=0, total_rx_packets=0;
+
+	i = rx_ring->next_to_clean;
+	rx_desc = E1000_RX_DESC(*rx_ring, i);
+	buffer_info = &rx_ring->buffer_info[i];
+
+	while (rx_desc->status & E1000_RXD_STAT_DD) {
+		struct sk_buff *skb;
+		u8 status;
+
+		if (*work_done >= work_to_do)
+			break;
+		(*work_done)++;
+
+		status = rx_desc->status;
+		skb = buffer_info->skb;
+		buffer_info->skb = NULL;
+
+		if (++i == rx_ring->count) i = 0;
+		next_rxd = E1000_RX_DESC(*rx_ring, i);
+		prefetch(next_rxd);
+
+		next_buffer = &rx_ring->buffer_info[i];
+
+		cleaned = true;
+		cleaned_count++;
+		pci_unmap_page(pdev, buffer_info->dma, buffer_info->length,
+		               PCI_DMA_FROMDEVICE);
+		buffer_info->dma = 0;
+
+		length = le16_to_cpu(rx_desc->length);
+
+		/* errors is only valid for DD + EOP descriptors */
+		if (unlikely((status & E1000_RXD_STAT_EOP) &&
+		    (rx_desc->errors & E1000_RXD_ERR_FRAME_ERR_MASK))) {
+			u8 last_byte = *(skb->data + length - 1);
+			if (TBI_ACCEPT(hw, status, rx_desc->errors, length,
+				       last_byte)) {
+				spin_lock_irqsave(&adapter->stats_lock,
+				                  irq_flags);
+				e1000_tbi_adjust_stats(hw, &adapter->stats,
+				                       length, skb->data);
+				spin_unlock_irqrestore(&adapter->stats_lock,
+				                       irq_flags);
+				length--;
+			} else {
+				/* recycle both page and skb */
+				buffer_info->skb = skb;
+				/* an error means any chain goes out the window
+				 * too */
+				if (rx_ring->rx_skb_top)
+					dev_kfree_skb(rx_ring->rx_skb_top);
+				rx_ring->rx_skb_top = NULL;
+				goto next_desc;
+			}
+		}
+
+#define rxtop rx_ring->rx_skb_top
+		if (!(status & E1000_RXD_STAT_EOP)) {
+			/* this descriptor is only the beginning (or middle) */
+			if (!rxtop) {
+				/* this is the beginning of a chain */
+				rxtop = skb;
+				skb_fill_page_desc(rxtop, 0, buffer_info->page,
+				                   0, length);
+			} else {
+				/* this is the middle of a chain */
+				skb_fill_page_desc(rxtop,
+				    skb_shinfo(rxtop)->nr_frags,
+				    buffer_info->page, 0, length);
+				/* re-use the skb, only consumed the page */
+				buffer_info->skb = skb;
+			}
+			e1000_consume_page(buffer_info, rxtop, length);
+			goto next_desc;
+		} else {
+			if (rxtop) {
+				/* end of the chain */
+				skb_fill_page_desc(rxtop,
+				    skb_shinfo(rxtop)->nr_frags,
+				    buffer_info->page, 0, length);
+				/* re-use the current skb, we only consumed the
+				 * page */
+				buffer_info->skb = skb;
+				skb = rxtop;
+				rxtop = NULL;
+				e1000_consume_page(buffer_info, skb, length);
+			} else {
+				/* no chain, got EOP, this buf is the packet
+				 * copybreak to save the put_page/alloc_page */
+				if (length <= copybreak &&
+				    skb_tailroom(skb) >= length) {
+					u8 *vaddr;
+					vaddr = kmap_atomic(buffer_info->page,
+					                    KM_SKB_DATA_SOFTIRQ);
+					memcpy(skb_tail_pointer(skb), vaddr, length);
+					kunmap_atomic(vaddr,
+					              KM_SKB_DATA_SOFTIRQ);
+					/* re-use the page, so don't erase
+					 * buffer_info->page */
+					skb_put(skb, length);
+				} else {
+					skb_fill_page_desc(skb, 0,
+					                   buffer_info->page, 0,
+				                           length);
+					e1000_consume_page(buffer_info, skb,
+					                   length);
+				}
+			}
+		}
+
+		/* Receive Checksum Offload XXX recompute due to CRC strip? */
+		e1000_rx_checksum(adapter,
+		                  (u32)(status) |
+		                  ((u32)(rx_desc->errors) << 24),
+		                  le16_to_cpu(rx_desc->csum), skb);
+
+		pskb_trim(skb, skb->len - 4);
+
+		/* probably a little skewed due to removing CRC */
+		total_rx_bytes += skb->len;
+		total_rx_packets++;
+
+		/* eth type trans needs skb->data to point to something */
+		if (!pskb_may_pull(skb, ETH_HLEN)) {
+			DPRINTK(DRV, ERR, "pskb_may_pull failed.\n");
+			dev_kfree_skb(skb);
+			goto next_desc;
+		}
+
+		skb->protocol = eth_type_trans(skb, netdev);
+
+		e1000_receive_skb(adapter, status, rx_desc->special, skb);
+
+next_desc:
+		rx_desc->status = 0;
+
+		/* return some buffers to hardware, one at a time is too slow */
+		if (unlikely(cleaned_count >= E1000_RX_BUFFER_WRITE)) {
+			adapter->alloc_rx_buf(adapter, rx_ring, cleaned_count);
+			cleaned_count = 0;
+		}
+
+		/* use prefetched values */
+		rx_desc = next_rxd;
+		buffer_info = next_buffer;
+	}
+	rx_ring->next_to_clean = i;
+
+	cleaned_count = E1000_DESC_UNUSED(rx_ring);
+	if (cleaned_count)
+		adapter->alloc_rx_buf(adapter, rx_ring, cleaned_count);
+
+	adapter->total_rx_packets += total_rx_packets;
+	adapter->total_rx_bytes += total_rx_bytes;
+	adapter->net_stats.rx_bytes += total_rx_bytes;
+	adapter->net_stats.rx_packets += total_rx_packets;
+	return cleaned;
+}
+
+/**
  * e1000_clean_rx_irq - Send received data up the network stack; legacy
  * @adapter: board private structure
- **/
+ * @rx_ring: ring to clean
+ * @work_done: amount of napi work completed this call
+ * @work_to_do: max amount of work allowed for this call to do
+ */
 static bool e1000_clean_rx_irq(struct e1000_adapter *adapter,
 			       struct e1000_rx_ring *rx_ring,
 			       int *work_done, int work_to_do)
@@ -4001,7 +4245,6 @@ static bool e1000_clean_rx_irq(struct e1000_adapter *adapter,
 	struct e1000_buffer *buffer_info, *next_buffer;
 	unsigned long flags;
 	u32 length;
-	u8 last_byte;
 	unsigned int i;
 	int cleaned_count = 0;
 	bool cleaned = false;
@@ -4033,9 +4276,7 @@ static bool e1000_clean_rx_irq(struct e1000_adapter *adapter,
 
 		cleaned = true;
 		cleaned_count++;
-		pci_unmap_single(pdev,
-		                 buffer_info->dma,
-		                 buffer_info->length,
+		pci_unmap_single(pdev, buffer_info->dma, buffer_info->length,
 		                 PCI_DMA_FROMDEVICE);
 		buffer_info->dma = 0;
 
@@ -4052,7 +4293,7 @@ static bool e1000_clean_rx_irq(struct e1000_adapter *adapter,
 		}
 
 		if (unlikely(rx_desc->errors & E1000_RXD_ERR_FRAME_ERR_MASK)) {
-			last_byte = *(skb->data + length - 1);
+			u8 last_byte = *(skb->data + length - 1);
 			if (TBI_ACCEPT(hw, status, rx_desc->errors, length,
 				       last_byte)) {
 				spin_lock_irqsave(&adapter->stats_lock, flags);
@@ -4107,13 +4348,7 @@ static bool e1000_clean_rx_irq(struct e1000_adapter *adapter,
 
 		skb->protocol = eth_type_trans(skb, netdev);
 
-		if (unlikely(adapter->vlgrp &&
-			    (status & E1000_RXD_STAT_VP))) {
-			vlan_hwaccel_receive_skb(skb, adapter->vlgrp,
-						 le16_to_cpu(rx_desc->special));
-		} else {
-			netif_receive_skb(skb);
-		}
+		e1000_receive_skb(adapter, status, rx_desc->special, skb);
 
 next_desc:
 		rx_desc->status = 0;
@@ -4142,6 +4377,114 @@ next_desc:
 }
 
 /**
+ * e1000_alloc_jumbo_rx_buffers - Replace used jumbo receive buffers
+ * @adapter: address of board private structure
+ * @rx_ring: pointer to receive ring structure
+ * @cleaned_count: number of buffers to allocate this pass
+ **/
+
+static void
+e1000_alloc_jumbo_rx_buffers(struct e1000_adapter *adapter,
+                             struct e1000_rx_ring *rx_ring, int cleaned_count)
+{
+	struct net_device *netdev = adapter->netdev;
+	struct pci_dev *pdev = adapter->pdev;
+	struct e1000_rx_desc *rx_desc;
+	struct e1000_buffer *buffer_info;
+	struct sk_buff *skb;
+	unsigned int i;
+	unsigned int bufsz = 256 -
+	                     16 /*for skb_reserve */ -
+	                     NET_IP_ALIGN;
+
+	i = rx_ring->next_to_use;
+	buffer_info = &rx_ring->buffer_info[i];
+
+	while (cleaned_count--) {
+		skb = buffer_info->skb;
+		if (skb) {
+			skb_trim(skb, 0);
+			goto check_page;
+		}
+
+		skb = netdev_alloc_skb(netdev, bufsz);
+		if (unlikely(!skb)) {
+			/* Better luck next round */
+			adapter->alloc_rx_buff_failed++;
+			break;
+		}
+
+		/* Fix for errata 23, can't cross 64kB boundary */
+		if (!e1000_check_64k_bound(adapter, skb->data, bufsz)) {
+			struct sk_buff *oldskb = skb;
+			DPRINTK(PROBE, ERR, "skb align check failed: %u bytes "
+					     "at %p\n", bufsz, skb->data);
+			/* Try again, without freeing the previous */
+			skb = netdev_alloc_skb(netdev, bufsz);
+			/* Failed allocation, critical failure */
+			if (!skb) {
+				dev_kfree_skb(oldskb);
+				adapter->alloc_rx_buff_failed++;
+				break;
+			}
+
+			if (!e1000_check_64k_bound(adapter, skb->data, bufsz)) {
+				/* give up */
+				dev_kfree_skb(skb);
+				dev_kfree_skb(oldskb);
+				break; /* while (cleaned_count--) */
+			}
+
+			/* Use new allocation */
+			dev_kfree_skb(oldskb);
+		}
+		/* Make buffer alignment 2 beyond a 16 byte boundary
+		 * this will result in a 16 byte aligned IP header after
+		 * the 14 byte MAC header is removed
+		 */
+		skb_reserve(skb, NET_IP_ALIGN);
+
+		buffer_info->skb = skb;
+		buffer_info->length = adapter->rx_buffer_len;
+check_page:
+		/* allocate a new page if necessary */
+		if (!buffer_info->page) {
+			buffer_info->page = alloc_page(GFP_ATOMIC);
+			if (unlikely(!buffer_info->page)) {
+				adapter->alloc_rx_buff_failed++;
+				break;
+			}
+		}
+
+		if (!buffer_info->dma)
+			buffer_info->dma = pci_map_page(pdev,
+			                                buffer_info->page, 0,
+			                                buffer_info->length,
+			                                PCI_DMA_FROMDEVICE);
+
+		rx_desc = E1000_RX_DESC(*rx_ring, i);
+		rx_desc->buffer_addr = cpu_to_le64(buffer_info->dma);
+
+		if (unlikely(++i == rx_ring->count))
+			i = 0;
+		buffer_info = &rx_ring->buffer_info[i];
+	}
+
+	if (likely(rx_ring->next_to_use != i)) {
+		rx_ring->next_to_use = i;
+		if (unlikely(i-- == 0))
+			i = (rx_ring->count - 1);
+
+		/* Force memory writes to complete before letting h/w
+		 * know there are new descriptors to fetch.  (Only
+		 * applicable for weak-ordered memory model archs,
+		 * such as IA-64). */
+		wmb();
+		writel(i, adapter->hw.hw_addr + rx_ring->rdt);
+	}
+}
+
+/**
  * e1000_alloc_rx_buffers - Replace used receive buffers; legacy & extended
  * @adapter: address of board private structure
  **/
@@ -4186,6 +4529,7 @@ static void e1000_alloc_rx_buffers(struct e1000_adapter *adapter,
 			/* Failed allocation, critical failure */
 			if (!skb) {
 				dev_kfree_skb(oldskb);
+				adapter->alloc_rx_buff_failed++;
 				break;
 			}
 
@@ -4193,6 +4537,7 @@ static void e1000_alloc_rx_buffers(struct e1000_adapter *adapter,
 				/* give up */
 				dev_kfree_skb(skb);
 				dev_kfree_skb(oldskb);
+				adapter->alloc_rx_buff_failed++;
 				break; /* while !buffer_info->skb */
 			}
 
@@ -4210,9 +4555,14 @@ static void e1000_alloc_rx_buffers(struct e1000_adapter *adapter,
 map_skb:
 		buffer_info->dma = pci_map_single(pdev,
 						  skb->data,
-						  adapter->rx_buffer_len,
+						  buffer_info->length,
 						  PCI_DMA_FROMDEVICE);
 
+		/*
+		 * XXX if it was allocated cleanly it will never map to a
+		 * boundary crossing
+		 */
+
 		/* Fix for errata 23, can't cross 64kB boundary */
 		if (!e1000_check_64k_bound(adapter,
 					(void *)(unsigned long)buffer_info->dma,
@@ -4229,6 +4579,7 @@ map_skb:
 					 PCI_DMA_FROMDEVICE);
 			buffer_info->dma = 0;
 
+			adapter->alloc_rx_buff_failed++;
 			break; /* while !buffer_info->skb */
 		}
 		rx_desc = E1000_RX_DESC(*rx_ring, i);



* [net-next-2.6 PATCH 3/3] e1000: fix flow control thresholds
  2009-07-06 20:44 [net-next-2.6 PATCH 1/3] e1000: allow ethtool coalesce to adjust interrupts per second Jeff Kirsher
  2009-07-06 20:44 ` [net-next-2.6 PATCH 2/3] e1000: implement jumbo receive with partial descriptors Jeff Kirsher
@ 2009-07-06 20:45 ` Jeff Kirsher
  2009-07-07  1:09 ` [net-next-2.6 PATCH 1/3] e1000: allow ethtool coalesece to adjust interrupts per second David Miller
  2 siblings, 0 replies; 4+ messages in thread
From: Jeff Kirsher @ 2009-07-06 20:45 UTC (permalink / raw)
  To: davem; +Cc: netdev, gospo, Jesse Brandeburg, Jeff Kirsher

From: Jesse Brandeburg <jesse.brandeburg@intel.com>

When testing the jumbo-frames-with-pages patch, the stats would
show rx_missed errors (dropped packets) even when connected to a
link partner with flow control enabled.

This indicates that for this MTU (9000) the flow control
thresholds are not adjusting correctly.

In fact, before this change, FCRTH (the XOFF threshold) was 36864
while the Rx FIFO size was only 40000 bytes with a 9000-byte MTU.

Fix it so that we leave room for at least one full frame after we
send the XOFF.
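
To make the arithmetic concrete, here is a standalone sketch of the
new threshold rule (the helper function is illustrative; the driver
computes the same thing inline in e1000_reset(), and 40000/9018 are
just the example numbers from above, 9018 being a 9000-byte MTU plus
the 14-byte header and 4-byte FCS):

#include <stdio.h>

/* New rule: high water mark is the lower of 90% of the Rx FIFO and
 * the FIFO size minus one full frame, masked to FCRTH's 8-byte
 * granularity.  The old code always used 90% of the FIFO
 * (40 * 9216 / 10 = 36864 for a 40KB FIFO), leaving only ~3KB of
 * headroom above the threshold -- less than one jumbo frame. */
static unsigned int fc_high_water(unsigned int fifo_bytes,
                                  unsigned int max_frame)
{
	unsigned int hwm = fifo_bytes * 9 / 10;

	if (fifo_bytes - max_frame < hwm)
		hwm = fifo_bytes - max_frame;
	return hwm & ~7u; /* 8-byte granularity */
}

int main(void)
{
	printf("%u\n", fc_high_water(40000, 9018)); /* 30976 */
	return 0;
}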

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---

 drivers/net/e1000/e1000_hw.c   |    2 +
 drivers/net/e1000/e1000_hw.h   |    3 --
 drivers/net/e1000/e1000_main.c |   58 ++++++++++++++++++++++------------------
 3 files changed, 33 insertions(+), 30 deletions(-)

diff --git a/drivers/net/e1000/e1000_hw.c b/drivers/net/e1000/e1000_hw.c
index e1a3fc1..1e5ae11 100644
--- a/drivers/net/e1000/e1000_hw.c
+++ b/drivers/net/e1000/e1000_hw.c
@@ -1955,7 +1955,7 @@ static s32 e1000_setup_copper_link(struct e1000_hw *hw)
     s32 ret_val;
     u16 i;
     u16 phy_data;
-    u16 reg_data;
+    u16 reg_data = 0;
 
     DEBUGFUNC("e1000_setup_copper_link");
 
diff --git a/drivers/net/e1000/e1000_hw.h b/drivers/net/e1000/e1000_hw.h
index 99fce2c..a8866bd 100644
--- a/drivers/net/e1000/e1000_hw.h
+++ b/drivers/net/e1000/e1000_hw.h
@@ -523,11 +523,8 @@ s32 e1000_check_phy_reset_block(struct e1000_hw *hw);
 
 /* The sizes (in bytes) of a ethernet packet */
 #define ENET_HEADER_SIZE             14
-#define MAXIMUM_ETHERNET_FRAME_SIZE  1518 /* With FCS */
 #define MINIMUM_ETHERNET_FRAME_SIZE  64   /* With FCS */
 #define ETHERNET_FCS_SIZE            4
-#define MAXIMUM_ETHERNET_PACKET_SIZE \
-    (MAXIMUM_ETHERNET_FRAME_SIZE - ETHERNET_FCS_SIZE)
 #define MINIMUM_ETHERNET_PACKET_SIZE \
     (MINIMUM_ETHERNET_FRAME_SIZE - ETHERNET_FCS_SIZE)
 #define CRC_LENGTH                   ETHERNET_FCS_SIZE
diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index f2db9e2..d7df00c 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -641,8 +641,8 @@ void e1000_reset(struct e1000_adapter *adapter)
 {
 	struct e1000_hw *hw = &adapter->hw;
 	u32 pba = 0, tx_space, min_tx_space, min_rx_space;
-	u16 fc_high_water_mark = E1000_FC_HIGH_DIFF;
 	bool legacy_pba_adjust = false;
+	u16 hwm;
 
 	/* Repartition Pba for greater than 9k mtu
 	 * To take effect CTRL.RST is required.
@@ -686,7 +686,7 @@ void e1000_reset(struct e1000_adapter *adapter)
 	}
 
 	if (legacy_pba_adjust) {
-		if (adapter->netdev->mtu > E1000_RXBUFFER_8192)
+		if (hw->max_frame_size > E1000_RXBUFFER_8192)
 			pba -= 8; /* allocate more FIFO for Tx */
 
 		if (hw->mac_type == e1000_82547) {
@@ -696,14 +696,14 @@ void e1000_reset(struct e1000_adapter *adapter)
 				(E1000_PBA_40K - pba) << E1000_PBA_BYTES_SHIFT;
 			atomic_set(&adapter->tx_fifo_stall, 0);
 		}
-	} else if (hw->max_frame_size > MAXIMUM_ETHERNET_FRAME_SIZE) {
+	} else if (hw->max_frame_size >  ETH_FRAME_LEN + ETH_FCS_LEN) {
 		/* adjust PBA for jumbo frames */
 		ew32(PBA, pba);
 
 		/* To maintain wire speed transmits, the Tx FIFO should be
-		 * large enough to accomodate two full transmit packets,
+		 * large enough to accommodate two full transmit packets,
 		 * rounded up to the next 1KB and expressed in KB.  Likewise,
-		 * the Rx FIFO should be large enough to accomodate at least
+		 * the Rx FIFO should be large enough to accommodate at least
 		 * one full receive packet and is similarly rounded up and
 		 * expressed in KB. */
 		pba = er32(PBA);
@@ -711,13 +711,17 @@ void e1000_reset(struct e1000_adapter *adapter)
 		tx_space = pba >> 16;
 		/* lower 16 bits has Rx packet buffer allocation size in KB */
 		pba &= 0xffff;
-		/* don't include ethernet FCS because hardware appends/strips */
-		min_rx_space = adapter->netdev->mtu + ENET_HEADER_SIZE +
-		               VLAN_TAG_SIZE;
-		min_tx_space = min_rx_space;
-		min_tx_space *= 2;
+		/*
+		 * the tx fifo also stores 16 bytes of information about the tx
+		 * but don't include ethernet FCS because hardware appends it
+		 */
+		min_tx_space = (hw->max_frame_size +
+		                sizeof(struct e1000_tx_desc) -
+		                ETH_FCS_LEN) * 2;
 		min_tx_space = ALIGN(min_tx_space, 1024);
 		min_tx_space >>= 10;
+		/* software strips receive CRC, so leave room for it */
+		min_rx_space = hw->max_frame_size;
 		min_rx_space = ALIGN(min_rx_space, 1024);
 		min_rx_space >>= 10;
 
@@ -754,19 +758,21 @@ void e1000_reset(struct e1000_adapter *adapter)
 
 	ew32(PBA, pba);
 
-	/* flow control settings */
-	/* Set the FC high water mark to 90% of the FIFO size.
-	 * Required to clear last 3 LSB */
-	fc_high_water_mark = ((pba * 9216)/10) & 0xFFF8;
-	/* We can't use 90% on small FIFOs because the remainder
-	 * would be less than 1 full frame.  In this case, we size
-	 * it to allow at least a full frame above the high water
-	 *  mark. */
-	if (pba < E1000_PBA_16K)
-		fc_high_water_mark = (pba * 1024) - 1600;
-
-	hw->fc_high_water = fc_high_water_mark;
-	hw->fc_low_water = fc_high_water_mark - 8;
+	/*
+	 * flow control settings:
+	 * The high water mark must be low enough to fit one full frame
+	 * (or the size used for early receive) above it in the Rx FIFO.
+	 * Set it to the lower of:
+	 * - 90% of the Rx FIFO size, and
+	 * - the full Rx FIFO size minus the early receive size (for parts
+	 *   with ERT support assuming ERT set to E1000_ERT_2048), or
+	 * - the full Rx FIFO size minus one full frame
+	 */
+	hwm = min(((pba << 10) * 9 / 10),
+		  ((pba << 10) - hw->max_frame_size));
+
+	hw->fc_high_water = hwm & 0xFFF8;	/* 8-byte granularity */
+	hw->fc_low_water = hw->fc_high_water - 8;
 	hw->fc_pause_time = E1000_FC_PAUSE_TIME;
 	hw->fc_send_xon = 1;
 	hw->fc = hw->original_fc;
@@ -3474,7 +3480,7 @@ static int e1000_change_mtu(struct net_device *netdev, int new_mtu)
 	switch (hw->mac_type) {
 	case e1000_undefined ... e1000_82542_rev2_1:
 	case e1000_ich8lan:
-		if (max_frame > MAXIMUM_ETHERNET_FRAME_SIZE) {
+		if (max_frame > (ETH_FRAME_LEN + ETH_FCS_LEN)) {
 			DPRINTK(PROBE, ERR, "Jumbo Frames not supported.\n");
 			return -EINVAL;
 		}
@@ -3487,7 +3493,7 @@ static int e1000_change_mtu(struct net_device *netdev, int new_mtu)
 		                  &eeprom_data);
 		if ((hw->device_id != E1000_DEV_ID_82573L) ||
 		    (eeprom_data & EEPROM_WORD1A_ASPM_MASK)) {
-			if (max_frame > MAXIMUM_ETHERNET_FRAME_SIZE) {
+			if (max_frame > (ETH_FRAME_LEN + ETH_FCS_LEN)) {
 				DPRINTK(PROBE, ERR,
 			            	"Jumbo Frames not supported.\n");
 				return -EINVAL;
@@ -3535,7 +3541,7 @@ static int e1000_change_mtu(struct net_device *netdev, int new_mtu)
 
 	/* adjust allocation if LPE protects us, and we aren't using SBP */
 	if (!hw->tbi_compatibility_on &&
-	    ((max_frame == MAXIMUM_ETHERNET_FRAME_SIZE) ||
+	    ((max_frame == (ETH_FRAME_LEN + ETH_FCS_LEN)) ||
 	     (max_frame == MAXIMUM_ETHERNET_VLAN_SIZE)))
 		adapter->rx_buffer_len = MAXIMUM_ETHERNET_VLAN_SIZE;
 



* Re: [net-next-2.6 PATCH 1/3] e1000: allow ethtool coalesce to adjust interrupts per second
  2009-07-06 20:44 [net-next-2.6 PATCH 1/3] e1000: allow ethtool coalesce to adjust interrupts per second Jeff Kirsher
  2009-07-06 20:44 ` [net-next-2.6 PATCH 2/3] e1000: implement jumbo receive with partial descriptors Jeff Kirsher
  2009-07-06 20:45 ` [net-next-2.6 PATCH 3/3] e1000: fix flow control thresholds Jeff Kirsher
@ 2009-07-07  1:09 ` David Miller
  2 siblings, 0 replies; 4+ messages in thread
From: David Miller @ 2009-07-07  1:09 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, jesse.brandeburg


All 3 patches applied, thanks.

