Netdev List
 help / color / mirror / Atom feed
* Question about LRO/GRO and TCP acknowledgements
From: Joris van Rantwijk @ 2011-06-11 19:59 UTC (permalink / raw)
  To: netdev

Hi,

I'm trying to understand how Linux produces TCP acknowledgements
for segments received via LRO/GRO.

As far as I can see, the network driver uses GRO to collect several
received packets into one big super skb, which is then handled
during just one call to tcp_v4_rcv(). This will eventually result
in the sending of at most one ACK packet for the entire GRO packet.

Conventional wisdom (RFC 5681) says that a receiver should send at
least one ACK for every two data segments received. The sending TCP
needs these ACKs to update its congestion window (e.g. slow start).

It seems to me that the current implementation in Linux may send
just one ACK for a large number of received segments. This would
be a deviation from the standard. As a result the congestion
window of the sender would grow much slower than intended.

Maybe I misunderstand something in the network code (likely).
Could someone please explain me how this ACK issue is handled?

Thanks,
Joris van Rantwijk.

^ permalink raw reply

* Re: [PATCH] drivers/ssb/driver_chipcommon_pmu.c: uninitilized warning
From: Michael Büsch @ 2011-06-11 18:50 UTC (permalink / raw)
  To: Connor Hansen; +Cc: mb, netdev, linux-kernel
In-Reply-To: <1307816085-20062-1-git-send-email-cmdkhh@gmail.com>

On Sat, 11 Jun 2011 11:14:45 -0700
Connor Hansen <cmdkhh@gmail.com> wrote:

> warning message
> drivers/ssb/driver_chipcommon_pmu.c: In function ssb_pmu_resources_init
> drivers/ssb/driver_chipcommon_pmu.c:420:15: warning: updown_tab_size may
> be used uninitilized in this function.
> 
> updown_tab_size and depend_tab_size may not be set in the bus->chip_id
> switch statement, so set to 0 by default to avoid using uninitialized
> stack space.

We wouldn't be using uninitialized stack space or uninitialized variables,
without this patch. However, for the sake of shutting up the compiler, ACK.

> Signed-off-by: Connor Hansen <cmdkhh@gmail.com>
> ---
>  drivers/ssb/driver_chipcommon_pmu.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/ssb/driver_chipcommon_pmu.c b/drivers/ssb/driver_chipcommon_pmu.c
> index 305ade7..a7aef47 100644
> --- a/drivers/ssb/driver_chipcommon_pmu.c
> +++ b/drivers/ssb/driver_chipcommon_pmu.c
> @@ -417,9 +417,9 @@ static void ssb_pmu_resources_init(struct ssb_chipcommon *cc)
>  	u32 min_msk = 0, max_msk = 0;
>  	unsigned int i;
>  	const struct pmu_res_updown_tab_entry *updown_tab = NULL;
> -	unsigned int updown_tab_size;
> +	unsigned int updown_tab_size = 0;
>  	const struct pmu_res_depend_tab_entry *depend_tab = NULL;
> -	unsigned int depend_tab_size;
> +	unsigned int depend_tab_size = 0;
>  
>  	switch (bus->chip_id) {
>  	case 0x4312:

^ permalink raw reply

* [PATCH v3 5/5] net: ep93xx_eth: fix DMA API violations
From: Mika Westerberg @ 2011-06-11 18:39 UTC (permalink / raw)
  To: netdev; +Cc: linux-arm-kernel, hsweeten, rmallon, davem, ynezz,
	Mika Westerberg
In-Reply-To: <a908e30a30247a07b2352002742f291d1df0eb18.1307816881.git.mika.westerberg@iki.fi>

Russell King said:
>
> So, to summarize what its doing:
>
> 1. It allocates buffers for rx and tx.
> 2. It maps them with dma_map_single().
>       This transfers ownership of the buffer to the DMA device.
> 3. In ep93xx_xmit,
> 3a. It copies the data into the buffer with skb_copy_and_csum_dev()
>       This violates the DMA buffer ownership rules - the CPU should
>       not be writing to this buffer while it is (in principle) owned
>       by the DMA device.
> 3b. It then calls dma_sync_single_for_cpu() for the buffer.
>       This transfers ownership of the buffer to the CPU, which surely
>       is the wrong direction.
> 4. In ep93xx_rx,
> 4a. It calls dma_sync_single_for_cpu() for the buffer.
>       This at least transfers the DMA buffer ownership to the CPU
>       before the CPU reads the buffer
> 4b. It then uses skb_copy_to_linear_data() to copy the data out.
>       At no point does it transfer ownership back to the DMA device.
> 5. When the driver is removed, it dma_unmap_single()'s the buffer.
>       This transfers ownership of the buffer to the CPU.
> 6. It frees the buffer.
>
> While it may work on ep93xx, it's not respecting the DMA API rules,
> and with DMA debugging enabled it will probably encounter quite a few
> warnings.

This patch fixes these violations.

Signed-off-by: Mika Westerberg <mika.westerberg@iki.fi>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Tested-by: Petr Stetiar <ynezz@true.cz>
---
 drivers/net/arm/ep93xx_eth.c |   18 +++++++++++++-----
 1 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/net/arm/ep93xx_eth.c b/drivers/net/arm/ep93xx_eth.c
index bef3811..0b46b8e 100644
--- a/drivers/net/arm/ep93xx_eth.c
+++ b/drivers/net/arm/ep93xx_eth.c
@@ -283,10 +283,14 @@ static int ep93xx_rx(struct net_device *dev, int processed, int budget)
 
 		skb = dev_alloc_skb(length + 2);
 		if (likely(skb != NULL)) {
+			struct ep93xx_rdesc *rxd = &ep->descs->rdesc[entry];
 			skb_reserve(skb, 2);
-			dma_sync_single_for_cpu(dev->dev.parent, ep->descs->rdesc[entry].buf_addr,
+			dma_sync_single_for_cpu(dev->dev.parent, rxd->buf_addr,
 						length, DMA_FROM_DEVICE);
 			skb_copy_to_linear_data(skb, ep->rx_buf[entry], length);
+			dma_sync_single_for_device(dev->dev.parent,
+						   rxd->buf_addr, length,
+						   DMA_FROM_DEVICE);
 			skb_put(skb, length);
 			skb->protocol = eth_type_trans(skb, dev);
 
@@ -348,6 +352,7 @@ poll_some_more:
 static int ep93xx_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct ep93xx_priv *ep = netdev_priv(dev);
+	struct ep93xx_tdesc *txd;
 	int entry;
 
 	if (unlikely(skb->len > MAX_PKT_SIZE)) {
@@ -359,11 +364,14 @@ static int ep93xx_xmit(struct sk_buff *skb, struct net_device *dev)
 	entry = ep->tx_pointer;
 	ep->tx_pointer = (ep->tx_pointer + 1) & (TX_QUEUE_ENTRIES - 1);
 
-	ep->descs->tdesc[entry].tdesc1 =
-		TDESC1_EOF | (entry << 16) | (skb->len & 0xfff);
+	txd = &ep->descs->tdesc[entry];
+
+	txd->tdesc1 = TDESC1_EOF | (entry << 16) | (skb->len & 0xfff);
+	dma_sync_single_for_cpu(dev->dev.parent, txd->buf_addr, skb->len,
+				DMA_TO_DEVICE);
 	skb_copy_and_csum_dev(skb, ep->tx_buf[entry]);
-	dma_sync_single_for_cpu(dev->dev.parent, ep->descs->tdesc[entry].buf_addr,
-				skb->len, DMA_TO_DEVICE);
+	dma_sync_single_for_device(dev->dev.parent, txd->buf_addr, skb->len,
+				   DMA_TO_DEVICE);
 	dev_kfree_skb(skb);
 
 	spin_lock_irq(&ep->tx_pending_lock);
-- 
1.7.4.4


^ permalink raw reply related

* [PATCH v3 4/5] net: ep93xx_eth: drop GFP_DMA from call to dma_alloc_coherent()
From: Mika Westerberg @ 2011-06-11 18:39 UTC (permalink / raw)
  To: netdev; +Cc: linux-arm-kernel, hsweeten, rmallon, davem, ynezz,
	Mika Westerberg
In-Reply-To: <a908e30a30247a07b2352002742f291d1df0eb18.1307816881.git.mika.westerberg@iki.fi>

Commit a197b59ae6e8 (mm: fail GFP_DMA allocations when ZONE_DMA is not
configured) made page allocator to return NULL if GFP_DMA is set but
CONFIG_ZONE_DMA is disabled.

This causes ep93xx_eth to fail:

 WARNING: at mm/page_alloc.c:2251 __alloc_pages_nodemask+0x11c/0x638()
 Modules linked in:
 [<c0035498>] (unwind_backtrace+0x0/0xf4) from [<c0043da4>] (warn_slowpath_common+0x48/0x60)
 [<c0043da4>] (warn_slowpath_common+0x48/0x60) from [<c0043dd8>] (warn_slowpath_null+0x1c/0x24)
 [<c0043dd8>] (warn_slowpath_null+0x1c/0x24) from [<c0083b6c>] (__alloc_pages_nodemask+0x11c/0x638)
 [<c0083b6c>] (__alloc_pages_nodemask+0x11c/0x638) from [<c00366fc>] (__dma_alloc+0x8c/0x3ec)
 [<c00366fc>] (__dma_alloc+0x8c/0x3ec) from [<c0036adc>] (dma_alloc_coherent+0x54/0x60)
 [<c0036adc>] (dma_alloc_coherent+0x54/0x60) from [<c0227808>] (ep93xx_open+0x20/0x864)
 [<c0227808>] (ep93xx_open+0x20/0x864) from [<c0283144>] (__dev_open+0xb8/0x108)
 [<c0283144>] (__dev_open+0xb8/0x108) from [<c0280528>] (__dev_change_flags+0x70/0x128)
 [<c0280528>] (__dev_change_flags+0x70/0x128) from [<c0283054>] (dev_change_flags+0x10/0x48)
 [<c0283054>] (dev_change_flags+0x10/0x48) from [<c001a720>] (ip_auto_config+0x190/0xf68)
 [<c001a720>] (ip_auto_config+0x190/0xf68) from [<c00233b0>] (do_one_initcall+0x34/0x18c)
 [<c00233b0>] (do_one_initcall+0x34/0x18c) from [<c0008400>] (kernel_init+0x94/0x134)
 [<c0008400>] (kernel_init+0x94/0x134) from [<c0030858>] (kernel_thread_exit+0x0/0x8)

Since there is no restrictions for DMA on ep93xx, we can fix this by just
removing the GFP_DMA flag from the call.

Signed-off-by: Mika Westerberg <mika.westerberg@iki.fi>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Tested-by: Petr Stetiar <ynezz@true.cz>
---
 drivers/net/arm/ep93xx_eth.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/arm/ep93xx_eth.c b/drivers/net/arm/ep93xx_eth.c
index 97bf6b1..bef3811 100644
--- a/drivers/net/arm/ep93xx_eth.c
+++ b/drivers/net/arm/ep93xx_eth.c
@@ -492,7 +492,7 @@ static int ep93xx_alloc_buffers(struct ep93xx_priv *ep)
 	int i;
 
 	ep->descs = dma_alloc_coherent(dev, sizeof(struct ep93xx_descs),
-				&ep->descs_dma_addr, GFP_KERNEL | GFP_DMA);
+				&ep->descs_dma_addr, GFP_KERNEL);
 	if (ep->descs == NULL)
 		return 1;
 
-- 
1.7.4.4


^ permalink raw reply related

* [PATCH v3 3/5] net: ep93xx_eth: allocate buffers using kmalloc()
From: Mika Westerberg @ 2011-06-11 18:39 UTC (permalink / raw)
  To: netdev; +Cc: linux-arm-kernel, hsweeten, rmallon, davem, ynezz,
	Mika Westerberg
In-Reply-To: <a908e30a30247a07b2352002742f291d1df0eb18.1307816881.git.mika.westerberg@iki.fi>

We can use simply kmalloc() to allocate the buffers. This also simplifies the
code and allows us to perform DMA sync operations more easily.

Memory is allocated with only GFP_KERNEL since there are no DMA allocation
restrictions on this platform.

Signed-off-by: Mika Westerberg <mika.westerberg@iki.fi>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Tested-by: Petr Stetiar <ynezz@true.cz>
---
 drivers/net/arm/ep93xx_eth.c |   51 ++++++++++++++++-------------------------
 1 files changed, 20 insertions(+), 31 deletions(-)

diff --git a/drivers/net/arm/ep93xx_eth.c b/drivers/net/arm/ep93xx_eth.c
index f65dfb6..97bf6b1 100644
--- a/drivers/net/arm/ep93xx_eth.c
+++ b/drivers/net/arm/ep93xx_eth.c
@@ -460,36 +460,32 @@ static void ep93xx_free_buffers(struct ep93xx_priv *ep)
 	struct device *dev = ep->dev->dev.parent;
 	int i;
 
-	for (i = 0; i < RX_QUEUE_ENTRIES; i += 2) {
+	for (i = 0; i < RX_QUEUE_ENTRIES; i++) {
 		dma_addr_t d;
 
 		d = ep->descs->rdesc[i].buf_addr;
 		if (d)
-			dma_unmap_single(dev, d, PAGE_SIZE, DMA_FROM_DEVICE);
+			dma_unmap_single(dev, d, PKT_BUF_SIZE, DMA_FROM_DEVICE);
 
 		if (ep->rx_buf[i] != NULL)
-			free_page((unsigned long)ep->rx_buf[i]);
+			kfree(ep->rx_buf[i]);
 	}
 
-	for (i = 0; i < TX_QUEUE_ENTRIES; i += 2) {
+	for (i = 0; i < TX_QUEUE_ENTRIES; i++) {
 		dma_addr_t d;
 
 		d = ep->descs->tdesc[i].buf_addr;
 		if (d)
-			dma_unmap_single(dev, d, PAGE_SIZE, DMA_TO_DEVICE);
+			dma_unmap_single(dev, d, PKT_BUF_SIZE, DMA_TO_DEVICE);
 
 		if (ep->tx_buf[i] != NULL)
-			free_page((unsigned long)ep->tx_buf[i]);
+			kfree(ep->tx_buf[i]);
 	}
 
 	dma_free_coherent(dev, sizeof(struct ep93xx_descs), ep->descs,
 							ep->descs_dma_addr);
 }
 
-/*
- * The hardware enforces a sub-2K maximum packet size, so we put
- * two buffers on every hardware page.
- */
 static int ep93xx_alloc_buffers(struct ep93xx_priv *ep)
 {
 	struct device *dev = ep->dev->dev.parent;
@@ -500,48 +496,41 @@ static int ep93xx_alloc_buffers(struct ep93xx_priv *ep)
 	if (ep->descs == NULL)
 		return 1;
 
-	for (i = 0; i < RX_QUEUE_ENTRIES; i += 2) {
-		void *page;
+	for (i = 0; i < RX_QUEUE_ENTRIES; i++) {
+		void *buf;
 		dma_addr_t d;
 
-		page = (void *)__get_free_page(GFP_KERNEL | GFP_DMA);
-		if (page == NULL)
+		buf = kmalloc(PKT_BUF_SIZE, GFP_KERNEL);
+		if (buf == NULL)
 			goto err;
 
-		d = dma_map_single(dev, page, PAGE_SIZE, DMA_FROM_DEVICE);
+		d = dma_map_single(dev, buf, PKT_BUF_SIZE, DMA_FROM_DEVICE);
 		if (dma_mapping_error(dev, d)) {
-			free_page((unsigned long)page);
+			kfree(buf);
 			goto err;
 		}
 
-		ep->rx_buf[i] = page;
+		ep->rx_buf[i] = buf;
 		ep->descs->rdesc[i].buf_addr = d;
 		ep->descs->rdesc[i].rdesc1 = (i << 16) | PKT_BUF_SIZE;
-
-		ep->rx_buf[i + 1] = page + PKT_BUF_SIZE;
-		ep->descs->rdesc[i + 1].buf_addr = d + PKT_BUF_SIZE;
-		ep->descs->rdesc[i + 1].rdesc1 = ((i + 1) << 16) | PKT_BUF_SIZE;
 	}
 
-	for (i = 0; i < TX_QUEUE_ENTRIES; i += 2) {
-		void *page;
+	for (i = 0; i < TX_QUEUE_ENTRIES; i++) {
+		void *buf;
 		dma_addr_t d;
 
-		page = (void *)__get_free_page(GFP_KERNEL | GFP_DMA);
-		if (page == NULL)
+		buf = kmalloc(PKT_BUF_SIZE, GFP_KERNEL);
+		if (buf == NULL)
 			goto err;
 
-		d = dma_map_single(dev, page, PAGE_SIZE, DMA_TO_DEVICE);
+		d = dma_map_single(dev, buf, PKT_BUF_SIZE, DMA_TO_DEVICE);
 		if (dma_mapping_error(dev, d)) {
-			free_page((unsigned long)page);
+			kfree(buf);
 			goto err;
 		}
 
-		ep->tx_buf[i] = page;
+		ep->tx_buf[i] = buf;
 		ep->descs->tdesc[i].buf_addr = d;
-
-		ep->tx_buf[i + 1] = page + PKT_BUF_SIZE;
-		ep->descs->tdesc[i + 1].buf_addr = d + PKT_BUF_SIZE;
 	}
 
 	return 0;
-- 
1.7.4.4


^ permalink raw reply related

* [PATCH v3 2/5] net: ep93xx_eth: pass struct device to DMA API functions
From: Mika Westerberg @ 2011-06-11 18:39 UTC (permalink / raw)
  To: netdev; +Cc: linux-arm-kernel, hsweeten, rmallon, davem, ynezz,
	Mika Westerberg
In-Reply-To: <a908e30a30247a07b2352002742f291d1df0eb18.1307816881.git.mika.westerberg@iki.fi>

We shouldn't use NULL for any DMA API functions, unless we are dealing with
ISA or EISA device. So pass correct struct dev pointer to these functions.

Signed-off-by: Mika Westerberg <mika.westerberg@iki.fi>
---
I modified this patch according to Hartley's comments; There is no separate
dma_dev pointer in ep93xx_priv structure but we use SET_NETDEV_DEV() to store
pointer to the platform device which we use as the device for DMA API.

Because this patch is different than the original one, I dropped all the acks
and tested-by's.

 drivers/net/arm/ep93xx_eth.c |   23 +++++++++++++----------
 1 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/drivers/net/arm/ep93xx_eth.c b/drivers/net/arm/ep93xx_eth.c
index 5a77001..f65dfb6 100644
--- a/drivers/net/arm/ep93xx_eth.c
+++ b/drivers/net/arm/ep93xx_eth.c
@@ -284,7 +284,7 @@ static int ep93xx_rx(struct net_device *dev, int processed, int budget)
 		skb = dev_alloc_skb(length + 2);
 		if (likely(skb != NULL)) {
 			skb_reserve(skb, 2);
-			dma_sync_single_for_cpu(NULL, ep->descs->rdesc[entry].buf_addr,
+			dma_sync_single_for_cpu(dev->dev.parent, ep->descs->rdesc[entry].buf_addr,
 						length, DMA_FROM_DEVICE);
 			skb_copy_to_linear_data(skb, ep->rx_buf[entry], length);
 			skb_put(skb, length);
@@ -362,7 +362,7 @@ static int ep93xx_xmit(struct sk_buff *skb, struct net_device *dev)
 	ep->descs->tdesc[entry].tdesc1 =
 		TDESC1_EOF | (entry << 16) | (skb->len & 0xfff);
 	skb_copy_and_csum_dev(skb, ep->tx_buf[entry]);
-	dma_sync_single_for_cpu(NULL, ep->descs->tdesc[entry].buf_addr,
+	dma_sync_single_for_cpu(dev->dev.parent, ep->descs->tdesc[entry].buf_addr,
 				skb->len, DMA_TO_DEVICE);
 	dev_kfree_skb(skb);
 
@@ -457,6 +457,7 @@ static irqreturn_t ep93xx_irq(int irq, void *dev_id)
 
 static void ep93xx_free_buffers(struct ep93xx_priv *ep)
 {
+	struct device *dev = ep->dev->dev.parent;
 	int i;
 
 	for (i = 0; i < RX_QUEUE_ENTRIES; i += 2) {
@@ -464,7 +465,7 @@ static void ep93xx_free_buffers(struct ep93xx_priv *ep)
 
 		d = ep->descs->rdesc[i].buf_addr;
 		if (d)
-			dma_unmap_single(NULL, d, PAGE_SIZE, DMA_FROM_DEVICE);
+			dma_unmap_single(dev, d, PAGE_SIZE, DMA_FROM_DEVICE);
 
 		if (ep->rx_buf[i] != NULL)
 			free_page((unsigned long)ep->rx_buf[i]);
@@ -475,13 +476,13 @@ static void ep93xx_free_buffers(struct ep93xx_priv *ep)
 
 		d = ep->descs->tdesc[i].buf_addr;
 		if (d)
-			dma_unmap_single(NULL, d, PAGE_SIZE, DMA_TO_DEVICE);
+			dma_unmap_single(dev, d, PAGE_SIZE, DMA_TO_DEVICE);
 
 		if (ep->tx_buf[i] != NULL)
 			free_page((unsigned long)ep->tx_buf[i]);
 	}
 
-	dma_free_coherent(NULL, sizeof(struct ep93xx_descs), ep->descs,
+	dma_free_coherent(dev, sizeof(struct ep93xx_descs), ep->descs,
 							ep->descs_dma_addr);
 }
 
@@ -491,9 +492,10 @@ static void ep93xx_free_buffers(struct ep93xx_priv *ep)
  */
 static int ep93xx_alloc_buffers(struct ep93xx_priv *ep)
 {
+	struct device *dev = ep->dev->dev.parent;
 	int i;
 
-	ep->descs = dma_alloc_coherent(NULL, sizeof(struct ep93xx_descs),
+	ep->descs = dma_alloc_coherent(dev, sizeof(struct ep93xx_descs),
 				&ep->descs_dma_addr, GFP_KERNEL | GFP_DMA);
 	if (ep->descs == NULL)
 		return 1;
@@ -506,8 +508,8 @@ static int ep93xx_alloc_buffers(struct ep93xx_priv *ep)
 		if (page == NULL)
 			goto err;
 
-		d = dma_map_single(NULL, page, PAGE_SIZE, DMA_FROM_DEVICE);
-		if (dma_mapping_error(NULL, d)) {
+		d = dma_map_single(dev, page, PAGE_SIZE, DMA_FROM_DEVICE);
+		if (dma_mapping_error(dev, d)) {
 			free_page((unsigned long)page);
 			goto err;
 		}
@@ -529,8 +531,8 @@ static int ep93xx_alloc_buffers(struct ep93xx_priv *ep)
 		if (page == NULL)
 			goto err;
 
-		d = dma_map_single(NULL, page, PAGE_SIZE, DMA_TO_DEVICE);
-		if (dma_mapping_error(NULL, d)) {
+		d = dma_map_single(dev, page, PAGE_SIZE, DMA_TO_DEVICE);
+		if (dma_mapping_error(dev, d)) {
 			free_page((unsigned long)page);
 			goto err;
 		}
@@ -829,6 +831,7 @@ static int ep93xx_eth_probe(struct platform_device *pdev)
 	}
 	ep = netdev_priv(dev);
 	ep->dev = dev;
+	SET_NETDEV_DEV(dev, &pdev->dev);
 	netif_napi_add(dev, &ep->napi, ep93xx_poll, 64);
 
 	platform_set_drvdata(pdev, dev);
-- 
1.7.4.4


^ permalink raw reply related

* [PATCH v3 1/5] ep93xx: set DMA masks for the ep93xx_eth
From: Mika Westerberg @ 2011-06-11 18:39 UTC (permalink / raw)
  To: netdev; +Cc: linux-arm-kernel, hsweeten, rmallon, davem, ynezz,
	Mika Westerberg

Since the driver uses the DMA API, we should pass it valid DMA masks.

Signed-off-by: Mika Westerberg <mika.westerberg@iki.fi>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Tested-by: Petr Stetiar <ynezz@true.cz>
---
 arch/arm/mach-ep93xx/core.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/arch/arm/mach-ep93xx/core.c b/arch/arm/mach-ep93xx/core.c
index 8207954..1d4b65f 100644
--- a/arch/arm/mach-ep93xx/core.c
+++ b/arch/arm/mach-ep93xx/core.c
@@ -402,11 +402,15 @@ static struct resource ep93xx_eth_resource[] = {
 	}
 };
 
+static u64 ep93xx_eth_dma_mask = DMA_BIT_MASK(32);
+
 static struct platform_device ep93xx_eth_device = {
 	.name		= "ep93xx-eth",
 	.id		= -1,
 	.dev		= {
-		.platform_data	= &ep93xx_eth_data,
+		.platform_data		= &ep93xx_eth_data,
+		.coherent_dma_mask	= DMA_BIT_MASK(32),
+		.dma_mask		= &ep93xx_eth_dma_mask,
 	},
 	.num_resources	= ARRAY_SIZE(ep93xx_eth_resource),
 	.resource	= ep93xx_eth_resource,
-- 
1.7.4.4


^ permalink raw reply related

* [PATCH] drivers/ssb/driver_chipcommon_pmu.c: uninitilized warning
From: Connor Hansen @ 2011-06-11 18:14 UTC (permalink / raw)
  To: mb; +Cc: netdev, linux-kernel, Connor Hansen

warning message
drivers/ssb/driver_chipcommon_pmu.c: In function ssb_pmu_resources_init
drivers/ssb/driver_chipcommon_pmu.c:420:15: warning: updown_tab_size may
be used uninitilized in this function.

updown_tab_size and depend_tab_size may not be set in the bus->chip_id
switch statement, so set to 0 by default to avoid using uninitialized
stack space.

Signed-off-by: Connor Hansen <cmdkhh@gmail.com>
---
 drivers/ssb/driver_chipcommon_pmu.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/ssb/driver_chipcommon_pmu.c b/drivers/ssb/driver_chipcommon_pmu.c
index 305ade7..a7aef47 100644
--- a/drivers/ssb/driver_chipcommon_pmu.c
+++ b/drivers/ssb/driver_chipcommon_pmu.c
@@ -417,9 +417,9 @@ static void ssb_pmu_resources_init(struct ssb_chipcommon *cc)
 	u32 min_msk = 0, max_msk = 0;
 	unsigned int i;
 	const struct pmu_res_updown_tab_entry *updown_tab = NULL;
-	unsigned int updown_tab_size;
+	unsigned int updown_tab_size = 0;
 	const struct pmu_res_depend_tab_entry *depend_tab = NULL;
-	unsigned int depend_tab_size;
+	unsigned int depend_tab_size = 0;
 
 	switch (bus->chip_id) {
 	case 0x4312:
-- 
1.7.4.4

^ permalink raw reply related

* [PATCH] ISDN, hfcsusb: Don't leak in hfcsusb_ph_info()
From: Jesper Juhl @ 2011-06-11 16:36 UTC (permalink / raw)
  To: Karsten Keil
  Cc: David S. Miller, netdev, linux-kernel, Peter Sprenger,
	Martin Bachem

We leak the memory allocated to 'phi' when the variable goes out of scope 
in hfcsusb_ph_info().

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
---
 hfcsusb.c |    1 +
 1 file changed, 1 insertion(+)

 I have no way to really test this patch, so it is only compile tested.

diff --git a/drivers/isdn/hardware/mISDN/hfcsusb.c b/drivers/isdn/hardware/mISDN/hfcsusb.c
index 3ccbff1..71a8eb6 100644
--- a/drivers/isdn/hardware/mISDN/hfcsusb.c
+++ b/drivers/isdn/hardware/mISDN/hfcsusb.c
@@ -283,6 +283,7 @@ hfcsusb_ph_info(struct hfcsusb *hw)
 	_queue_data(&dch->dev.D, MPH_INFORMATION_IND, MISDN_ID_ANY,
 		sizeof(struct ph_info_dch) + dch->dev.nrbchan *
 		sizeof(struct ph_info_ch), phi, GFP_ATOMIC);
+	kfree(phi);
 }
 
 /*


-- 
Jesper Juhl <jj@chaosbits.net>       http://www.chaosbits.net/
Don't top-post http://www.catb.org/jargon/html/T/top-post.html
Plain text mails only, please.

^ permalink raw reply related

* [patch] netpoll: call dev_put() on error in netpoll_setup()
From: Dan Carpenter @ 2011-06-11 15:50 UTC (permalink / raw)
  To: WANG Cong
  Cc: Herbert Xu, Neil Horman, David S. Miller, Eric Dumazet,
	open list:NETWORKING [GENERAL], kernel-janitors

There is a dev_put(ndev) missing on an error path.  This was
introduced in 0c1ad04aecb "netpoll: prevent netpoll setup on slave
devices".

Signed-off-by: Dan Carpenter <error27@gmail.com>
---
This is a static checker bug, and it's possible I've misunderstood
something.

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 42ea4b0..18d9cbd 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -795,7 +795,8 @@ int netpoll_setup(struct netpoll *np)
 	if (ndev->master) {
 		printk(KERN_ERR "%s: %s is a slave device, aborting.\n",
 		       np->name, np->dev_name);
-		return -EBUSY;
+		err = -EBUSY;
+		goto put;
 	}
 
 	if (!netif_running(ndev)) {

^ permalink raw reply related

* bnx2 udp multicast packet loss and jtter
From: Kieran Kunhya @ 2011-06-11 14:32 UTC (permalink / raw)
  To: netdev

Hello,

I get some bad packet loss and high jitter on a BCM5709. This is on
kernel 2.6.38-8 - unfortunately using a newer kernel isn't an option
right now.
The bnx2 driver is version v2.0.23b.

I've tried a lot of kenel types such as -lowlatency, -ck and vanilla.
The thread doing the udp output is at realtime priority and the same
thing happens with different applications.

Example of jitter
http://dl.dropbox.com/u/2701213/jitter.png

Is there anything I can do to further diagnose this problem?

Regards,

Kieran Kunhya

^ permalink raw reply

* Re: linux-next: build failure after merge of the net tree
From: Geert Uytterhoeven @ 2011-06-11 11:39 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: David Miller, netdev, linux-next, linux-kernel, Greg KH,
	Alexey Dobriyan
In-Reply-To: <20110611142648.bf71b026.sfr@canb.auug.org.au>

On Sat, Jun 11, 2011 at 06:26, Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> On Fri, 10 Jun 2011 22:28:03 +0200 Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>> My first bisection between two successive versions of linux-next failed, but
>
> Bisecting between successive linux-next versions is bound not to work.
> Bisection really only works between a commit and one of ist ancestors.

Yeah, I should have known that...

>> the second one between linus and linux-next just finished and points
>> to that commit.
>
> That is a much better bet.

Patch has been sent.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* [PATCH -next] net/m68k: Include <linux/interrupt.h> where needed
From: Geert Uytterhoeven @ 2011-06-11 11:38 UTC (permalink / raw)
  To: Alexey Dobriyan, David Miller
  Cc: netdev, linux-next, linux-kernel, Geert Uytterhoeven

arch/m68k/emu/nfeth.c: In function ‘nfeth_init’:
arch/m68k/emu/nfeth.c:243: error: implicit declaration of function ‘request_irq’
arch/m68k/emu/nfeth.c:243: error: ‘IRQF_SHARED’ undeclared (first use in this function)
arch/m68k/emu/nfeth.c:243: error: (Each undeclared identifier is reported only once
arch/m68k/emu/nfeth.c:243: error: for each function it appears in.)
arch/m68k/emu/nfeth.c: In function ‘nfeth_cleanup’:
arch/m68k/emu/nfeth.c:266: error: implicit declaration of function ‘free_irq’
drivers/net/apne.c: In function ‘apne_probe’:
drivers/net/apne.c:189: error: implicit declaration of function ‘free_irq’
drivers/net/apne.c: In function ‘apne_probe1’:
drivers/net/apne.c:317: error: implicit declaration of function ‘request_irq’
drivers/net/apne.c:317: error: ‘IRQF_SHARED’ undeclared (first use in this function)
drivers/net/apne.c:317: error: (Each undeclared identifier is reported only once
drivers/net/apne.c:317: error: for each function it appears in.)

Introduced by commit a6b7a407865a ("net: remove interrupt.h inclusion from
netdevice.h").

Include <linux/interrupt.h> in the individual drivers to fix the build.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
---
 arch/m68k/emu/nfeth.c |    1 +
 drivers/net/apne.c    |    1 +
 2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/m68k/emu/nfeth.c b/arch/m68k/emu/nfeth.c
index 8b6e201..3c55312 100644
--- a/arch/m68k/emu/nfeth.c
+++ b/arch/m68k/emu/nfeth.c
@@ -16,6 +16,7 @@
 
 #include <linux/netdevice.h>
 #include <linux/etherdevice.h>
+#include <linux/interrupt.h>
 #include <linux/module.h>
 #include <asm/natfeat.h>
 #include <asm/virtconvert.h>
diff --git a/drivers/net/apne.c b/drivers/net/apne.c
index 2fe60f1..5477373 100644
--- a/drivers/net/apne.c
+++ b/drivers/net/apne.c
@@ -36,6 +36,7 @@
 #include <linux/delay.h>
 #include <linux/netdevice.h>
 #include <linux/etherdevice.h>
+#include <linux/interrupt.h>
 #include <linux/jiffies.h>
 
 #include <asm/system.h>
-- 
1.7.0.4

^ permalink raw reply related

* Dear Friend
From: Liu Yan @ 2011-06-11 10:53 UTC (permalink / raw)


Dear Friend,

I am Mr. Liu Yan a transfer supervisor in Bank of China Ltd. I have a business plan for you.
On June 5, 2002, a client made a numbered time (Fixed) Deposit in my branch and upon maturity, I sent a routine notification to his forwarding address but got no reply and later discovered from his contract employers that he died from an auto-mobile accident. 
On further investigation, I found out that he died without making a WILL and he did not declare any next of kin or relations in all his official documents with us.

The money is still sitting in my Bank and the interest is being rolled over with the principal sum at the end of each year. No one will ever come forward to claim it.
Consequently, my proposal is that I will like you as a foreigner to stand in as the next of kin to this client so we can share the money before it falls into the hands of some corrupt government officials.
There is no risk at all as my position as the credit officer guarantees the successful execution of this transaction. 
If you are interested, please reply immediately to my private email box, ly@ostatmig.com

Regards,

Liu Yan

^ permalink raw reply

* [smatch] re: bonding:delete lacp_fast from ad_bond_info
From: Dan Carpenter @ 2011-06-11  8:12 UTC (permalink / raw)
  To: panweiping3; +Cc: open list:BONDING DRIVER

There was a dereference before a check added in 56d00c677de0a
"bonding:delete lacp_fast from ad_bond_info"

drivers/net/bonding/bond_3ad.c +1907 bond_3ad_bind_slave(7)
	warn: variable dereferenced before check 'bond'

  1900  int bond_3ad_bind_slave(struct slave *slave)
  1901  {
  1902          struct bonding *bond = bond_get_bond_by_slave(slave);
  1903          int lacp_fast = bond->params.lacp_fast;
                                ^^^^^^^^^^^^
	dereference.

  1904          struct port *port;
  1905          struct aggregator *aggregator;
  1906  
  1907          if (bond == NULL) {
                    ^^^^^^^^^^^^
	check.

  1908                  pr_err("%s: The slave %s is not attached to its bond\n",
  1909                         slave->dev->master->name, slave->dev->name);
  1910                  return -1;
  1911          }

regards,
dan carpenter

^ permalink raw reply

* Re: [PATCH net-next-2.6] inetpeer: lower false sharing effect
From: Andi Kleen @ 2011-06-11  7:09 UTC (permalink / raw)
  To: Changli Gao; +Cc: Eric Dumazet, Tim Chen, David Miller, netdev, andi
In-Reply-To: <BANLkTim4ikjYJjdW8AWjmPi8hx-a0nh=Ug@mail.gmail.com>

On Sat, Jun 11, 2011 at 02:17:24PM +0800, Changli Gao wrote:
> On Sat, Jun 11, 2011 at 12:54 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > Le samedi 11 juin 2011 à 08:54 +0800, Changli Gao a écrit :
> >
> >
> > Whole point of the exercice is to prepare ground for routing cache
> > removal :)
> >
> > If you want a server being hit by millions of clients around the world,
> > routing cache is a real pain because of memory needs.
> >
> 
> Yes. I know the routing cache removal is our goal. But for his
> scenario, there aren't so many routing cache entries, so routing cache
> may be a better option currently. However, if he just wants to

The routing cache has terrible cache line bouncing in the reference
count too.

-andi

^ permalink raw reply

* Re: [PATCH net-next-2.6] inetpeer: lower false sharing effect
From: Changli Gao @ 2011-06-11  6:17 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Tim Chen, David Miller, netdev, andi
In-Reply-To: <1307768052.2872.50.camel@edumazet-laptop>

On Sat, Jun 11, 2011 at 12:54 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le samedi 11 juin 2011 à 08:54 +0800, Changli Gao a écrit :
>
>
> Whole point of the exercice is to prepare ground for routing cache
> removal :)
>
> If you want a server being hit by millions of clients around the world,
> routing cache is a real pain because of memory needs.
>

Yes. I know the routing cache removal is our goal. But for his
scenario, there aren't so many routing cache entries, so routing cache
may be a better option currently. However, if he just wants to
evaluate the effect of the routing cache removal, it is another thing.
:)

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* [PATCH net-next-2.6] snmp: reduce percpu needs by 50%
From: Eric Dumazet @ 2011-06-11  5:45 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Andi Kleen, linux-kernel, Benjamin Herrenschmidt,
	Ingo Molnar, Tejun Heo, Christoph Lameter, linux-arch
In-Reply-To: <20110608.235113.2177191884416780789.davem@davemloft.net>

SNMP mibs use two percpu arrays, one used in BH context, another in USER
context. With increasing number of cpus in machines, and fact that ipv6
uses per network device ipstats_mib, this is consuming a lot of memory
if many network devices are registered.

commit be281e554e2a (ipv6: reduce per device ICMP mib sizes) shrinked
percpu needs for ipv6, but we can reduce memory use a bit more.

With recent percpu infrastructure (irqsafe_cpu_inc() ...), we no longer
need this BH/USER separation since we can update counters in a single
x86 instruction, regardless of the BH/USER context.

Other arches than x86 might need to disable irq in their
irqsafe_cpu_inc() implementation : If this happens to be a problem, we
can make SNMP_ARRAY_SZ arch dependent, but a previous poll
( https://lkml.org/lkml/2011/3/17/174 ) to arch maintainers did not
raise strong opposition.

Only on 32bit arches, we need to disable BH for 64bit counters updates
done from USER context (currently used for IP MIB)

This also reduces vmlinux size :

1) x86_64 build
$ size vmlinux.before vmlinux.after
   text	   data	    bss	    dec	    hex	filename
7853650	1293772	1896448	11043870	 a8841e	vmlinux.before
7850578	1293772	1896448	11040798	 a8781e	vmlinux.after

2) i386  build
$ size vmlinux.before vmlinux.afterpatch 
   text	   data	    bss	    dec	    hex	filename
6039335	 635076	3670016	10344427	 9dd7eb	vmlinux.before
6037342	 635076	3670016	10342434	 9dd022	vmlinux.afterpatch

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Andi Kleen <andi@firstfloor.org>
CC: Ingo Molnar <mingo@elte.hu>
CC: Tejun Heo <tj@kernel.org>
CC: Christoph Lameter <cl@linux-foundation.org>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org
CC: linux-arch@vger.kernel.org
---
 include/net/snmp.h |   85 ++++++++++++++++---------------------------
 net/ipv4/af_inet.c |   52 +++++++++++---------------
 2 files changed, 55 insertions(+), 82 deletions(-)

diff --git a/include/net/snmp.h b/include/net/snmp.h
index 479083a..8f0f9ac 100644
--- a/include/net/snmp.h
+++ b/include/net/snmp.h
@@ -116,57 +116,51 @@ struct linux_xfrm_mib {
 	unsigned long	mibs[LINUX_MIB_XFRMMAX];
 };
 
-/* 
- * FIXME: On x86 and some other CPUs the split into user and softirq parts
- * is not needed because addl $1,memory is atomic against interrupts (but 
- * atomic_inc would be overkill because of the lock cycles). Wants new 
- * nonlocked_atomic_inc() primitives -AK
- */ 
+#define SNMP_ARRAY_SZ 1
+
 #define DEFINE_SNMP_STAT(type, name)	\
-	__typeof__(type) __percpu *name[2]
+	__typeof__(type) __percpu *name[SNMP_ARRAY_SZ]
 #define DEFINE_SNMP_STAT_ATOMIC(type, name)	\
 	__typeof__(type) *name
 #define DECLARE_SNMP_STAT(type, name)	\
-	extern __typeof__(type) __percpu *name[2]
-
-#define SNMP_STAT_BHPTR(name)	(name[0])
-#define SNMP_STAT_USRPTR(name)	(name[1])
+	extern __typeof__(type) __percpu *name[SNMP_ARRAY_SZ]
 
 #define SNMP_INC_STATS_BH(mib, field)	\
 			__this_cpu_inc(mib[0]->mibs[field])
+
 #define SNMP_INC_STATS_USER(mib, field)	\
-			this_cpu_inc(mib[1]->mibs[field])
+			irqsafe_cpu_inc(mib[0]->mibs[field])
+
 #define SNMP_INC_STATS_ATOMIC_LONG(mib, field)	\
 			atomic_long_inc(&mib->mibs[field])
+
 #define SNMP_INC_STATS(mib, field)	\
-			this_cpu_inc(mib[!in_softirq()]->mibs[field])
+			irqsafe_cpu_inc(mib[0]->mibs[field])
+
 #define SNMP_DEC_STATS(mib, field)	\
-			this_cpu_dec(mib[!in_softirq()]->mibs[field])
+			irqsafe_cpu_dec(mib[0]->mibs[field])
+
 #define SNMP_ADD_STATS_BH(mib, field, addend)	\
 			__this_cpu_add(mib[0]->mibs[field], addend)
+
 #define SNMP_ADD_STATS_USER(mib, field, addend)	\
-			this_cpu_add(mib[1]->mibs[field], addend)
+			irqsafe_cpu_add(mib[0]->mibs[field], addend)
+
 #define SNMP_ADD_STATS(mib, field, addend)	\
-			this_cpu_add(mib[!in_softirq()]->mibs[field], addend)
+			irqsafe_cpu_add(mib[0]->mibs[field], addend)
 /*
  * Use "__typeof__(*mib[0]) *ptr" instead of "__typeof__(mib[0]) ptr"
  * to make @ptr a non-percpu pointer.
  */
 #define SNMP_UPD_PO_STATS(mib, basefield, addend)	\
 	do { \
-		__typeof__(*mib[0]) *ptr; \
-		preempt_disable(); \
-		ptr = this_cpu_ptr((mib)[!in_softirq()]); \
-		ptr->mibs[basefield##PKTS]++; \
-		ptr->mibs[basefield##OCTETS] += addend;\
-		preempt_enable(); \
+		irqsafe_cpu_inc(mib[0]->mibs[basefield##PKTS]);		\
+		irqsafe_cpu_add(mib[0]->mibs[basefield##OCTETS], addend);	\
 	} while (0)
 #define SNMP_UPD_PO_STATS_BH(mib, basefield, addend)	\
 	do { \
-		__typeof__(*mib[0]) *ptr = \
-			__this_cpu_ptr((mib)[0]); \
-		ptr->mibs[basefield##PKTS]++; \
-		ptr->mibs[basefield##OCTETS] += addend;\
+		__this_cpu_inc(mib[0]->mibs[basefield##PKTS]);		\
+		__this_cpu_add(mib[0]->mibs[basefield##OCTETS], addend);	\
 	} while (0)
 
 
@@ -179,40 +173,20 @@ struct linux_xfrm_mib {
 		ptr->mibs[field] += addend;				\
 		u64_stats_update_end(&ptr->syncp);			\
 	} while (0)
+
 #define SNMP_ADD_STATS64_USER(mib, field, addend) 			\
 	do {								\
-		__typeof__(*mib[0]) *ptr;				\
-		preempt_disable();					\
-		ptr = __this_cpu_ptr((mib)[1]);				\
-		u64_stats_update_begin(&ptr->syncp);			\
-		ptr->mibs[field] += addend;				\
-		u64_stats_update_end(&ptr->syncp);			\
-		preempt_enable();					\
+		local_bh_disable();					\
+		SNMP_ADD_STATS64_BH(mib, field, addend);		\
+		local_bh_enable();					\
 	} while (0)
+
 #define SNMP_ADD_STATS64(mib, field, addend)				\
-	do {								\
-		__typeof__(*mib[0]) *ptr;				\
-		preempt_disable();					\
-		ptr = __this_cpu_ptr((mib)[!in_softirq()]);		\
-		u64_stats_update_begin(&ptr->syncp);			\
-		ptr->mibs[field] += addend;				\
-		u64_stats_update_end(&ptr->syncp);			\
-		preempt_enable();					\
-	} while (0)
+		SNMP_ADD_STATS64_USER(mib, field, addend)
+
 #define SNMP_INC_STATS64_BH(mib, field) SNMP_ADD_STATS64_BH(mib, field, 1)
 #define SNMP_INC_STATS64_USER(mib, field) SNMP_ADD_STATS64_USER(mib, field, 1)
 #define SNMP_INC_STATS64(mib, field) SNMP_ADD_STATS64(mib, field, 1)
-#define SNMP_UPD_PO_STATS64(mib, basefield, addend)			\
-	do {								\
-		__typeof__(*mib[0]) *ptr;				\
-		preempt_disable();					\
-		ptr = __this_cpu_ptr((mib)[!in_softirq()]);		\
-		u64_stats_update_begin(&ptr->syncp);			\
-		ptr->mibs[basefield##PKTS]++;				\
-		ptr->mibs[basefield##OCTETS] += addend;			\
-		u64_stats_update_end(&ptr->syncp);			\
-		preempt_enable();					\
-	} while (0)
 #define SNMP_UPD_PO_STATS64_BH(mib, basefield, addend)			\
 	do {								\
 		__typeof__(*mib[0]) *ptr;				\
@@ -222,6 +196,12 @@ struct linux_xfrm_mib {
 		ptr->mibs[basefield##OCTETS] += addend;			\
 		u64_stats_update_end(&ptr->syncp);			\
 	} while (0)
+#define SNMP_UPD_PO_STATS64(mib, basefield, addend)			\
+	do {								\
+		local_bh_disable();					\
+		SNMP_UPD_PO_STATS64_BH(mib, basefield, addend);		\
+		local_bh_enable();					\
+	} while (0)
 #else
 #define SNMP_INC_STATS64_BH(mib, field)		SNMP_INC_STATS_BH(mib, field)
 #define SNMP_INC_STATS64_USER(mib, field)	SNMP_INC_STATS_USER(mib, field)
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 9c19260..260ccf0 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1437,11 +1437,11 @@ EXPORT_SYMBOL_GPL(inet_ctl_sock_create);
 unsigned long snmp_fold_field(void __percpu *mib[], int offt)
 {
 	unsigned long res = 0;
-	int i;
+	int i, j;
 
 	for_each_possible_cpu(i) {
-		res += *(((unsigned long *) per_cpu_ptr(mib[0], i)) + offt);
-		res += *(((unsigned long *) per_cpu_ptr(mib[1], i)) + offt);
+		for (j = 0; j < SNMP_ARRAY_SZ; j++)
+			res += *(((unsigned long *) per_cpu_ptr(mib[j], i)) + offt);
 	}
 	return res;
 }
@@ -1455,28 +1455,19 @@ u64 snmp_fold_field64(void __percpu *mib[], int offt, size_t syncp_offset)
 	int cpu;
 
 	for_each_possible_cpu(cpu) {
-		void *bhptr, *userptr;
+		void *bhptr;
 		struct u64_stats_sync *syncp;
-		u64 v_bh, v_user;
+		u64 v;
 		unsigned int start;
 
-		/* first mib used by softirq context, we must use _bh() accessors */
-		bhptr = per_cpu_ptr(SNMP_STAT_BHPTR(mib), cpu);
+		bhptr = per_cpu_ptr(mib[0], cpu);
 		syncp = (struct u64_stats_sync *)(bhptr + syncp_offset);
 		do {
 			start = u64_stats_fetch_begin_bh(syncp);
-			v_bh = *(((u64 *) bhptr) + offt);
+			v = *(((u64 *) bhptr) + offt);
 		} while (u64_stats_fetch_retry_bh(syncp, start));
 
-		/* second mib used in USER context */
-		userptr = per_cpu_ptr(SNMP_STAT_USRPTR(mib), cpu);
-		syncp = (struct u64_stats_sync *)(userptr + syncp_offset);
-		do {
-			start = u64_stats_fetch_begin(syncp);
-			v_user = *(((u64 *) userptr) + offt);
-		} while (u64_stats_fetch_retry(syncp, start));
-
-		res += v_bh + v_user;
+		res += v;
 	}
 	return res;
 }
@@ -1488,25 +1479,28 @@ int snmp_mib_init(void __percpu *ptr[2], size_t mibsize, size_t align)
 	BUG_ON(ptr == NULL);
 	ptr[0] = __alloc_percpu(mibsize, align);
 	if (!ptr[0])
-		goto err0;
+		return -ENOMEM;
+#if SNMP_ARRAY_SZ == 2 
 	ptr[1] = __alloc_percpu(mibsize, align);
-	if (!ptr[1])
-		goto err1;
+	if (!ptr[1]) {
+		free_percpu(ptr[0]);
+		ptr[0] = NULL;
+		return -ENOMEM;
+	}
+#endif
 	return 0;
-err1:
-	free_percpu(ptr[0]);
-	ptr[0] = NULL;
-err0:
-	return -ENOMEM;
 }
 EXPORT_SYMBOL_GPL(snmp_mib_init);
 
-void snmp_mib_free(void __percpu *ptr[2])
+void snmp_mib_free(void __percpu *ptr[SNMP_ARRAY_SZ])
 {
+	int i;
+
 	BUG_ON(ptr == NULL);
-	free_percpu(ptr[0]);
-	free_percpu(ptr[1]);
-	ptr[0] = ptr[1] = NULL;
+	for (i = 0; i < SNMP_ARRAY_SZ; i++) {
+		free_percpu(ptr[i]);
+		ptr[i] = NULL;
+	}
 }
 EXPORT_SYMBOL_GPL(snmp_mib_free);
 

^ permalink raw reply related

* Re: [net-next 13/13] ixgbe: use per NUMA node lock for FCoE DDP
From: Eric Dumazet @ 2011-06-11  5:42 UTC (permalink / raw)
  To: Jeff Kirsher; +Cc: davem, Vasu Dev, netdev, gospo
In-Reply-To: <1307769482.2872.62.camel@edumazet-laptop>

Le samedi 11 juin 2011 à 07:18 +0200, Eric Dumazet a écrit :
> >  /**
> > diff --git a/drivers/net/ixgbe/ixgbe_fcoe.h b/drivers/net/ixgbe/ixgbe_fcoe.h
> > index d876e7a..8618892 100644
> > --- a/drivers/net/ixgbe/ixgbe_fcoe.h
> > +++ b/drivers/net/ixgbe/ixgbe_fcoe.h
> > @@ -69,6 +69,7 @@ struct ixgbe_fcoe {
> >  	struct pci_pool **pool;
> >  	atomic_t refcnt;
> >  	spinlock_t lock;
> > +	struct spinlock **node_lock;
> 
> Wont this read_mostly pointer sits in often modified cache line ?
> 
> >  	struct ixgbe_fcoe_ddp ddp[IXGBE_FCOE_DDP_MAX];
> >  	unsigned char *extra_ddp_buffer;
> >  	dma_addr_t extra_ddp_buffer_dma;
> 

This patch seems overkill to me, have you tried the more simple way I
did in commit 79640a4ca6955e3ebdb7038508fa7a0cd7fa5527
(net: add additional lock to qdisc to increase throughput )

(remember you must place ->busylock in a separate cache line, to not
slow down the two cpus that have access to ->lock)

struct ixgbe_fcoe could probably be more carefuly reordered to lower
false sharing

I kindly ask you guys provide actual perf numbers between

1) before any patch
2) After your multilevel per numanode locks
3) A more simple way (my suggestion of adding a single 'busylock')

Thanks



^ permalink raw reply

* Re: [net-next 13/13] ixgbe: use per NUMA node lock for FCoE DDP
From: Eric Dumazet @ 2011-06-11  5:18 UTC (permalink / raw)
  To: Jeff Kirsher; +Cc: davem, Vasu Dev, netdev, gospo
In-Reply-To: <1307761341-5267-14-git-send-email-jeffrey.t.kirsher@intel.com>

Le vendredi 10 juin 2011 à 20:02 -0700, Jeff Kirsher a écrit :
> From: Vasu Dev <vasu.dev@intel.com>
> 
> Adds per NUMA node lock to have CPU pass thru its NUMA lock
> first before contending for global DDP fcoe->lock to setup
> DDP lock, this is to reduce contentions across NUMA nodes.
> 
> Allocates and initialize per NUMA node lock in added
> ixgbe_fcoe_lock_init and then have current CPU's numa_node_id
> based NUMA node lock acquired before taking global fcoe->lock.
> 
> The node lock is allocated from its NUMA node using kzalloc_node.
> 
> Signed-off-by: Vasu Dev <vasu.dev@intel.com>
> Tested-by: Ross Brattain <ross.b.brattain@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> ---
>  drivers/net/ixgbe/ixgbe_fcoe.c |   50 ++++++++++++++++++++++++++++++++++++++-
>  drivers/net/ixgbe/ixgbe_fcoe.h |    1 +
>  2 files changed, 49 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ixgbe/ixgbe_fcoe.c b/drivers/net/ixgbe/ixgbe_fcoe.c
> index f5f39ed..aadff4f 100644
> --- a/drivers/net/ixgbe/ixgbe_fcoe.c
> +++ b/drivers/net/ixgbe/ixgbe_fcoe.c
> @@ -109,6 +109,7 @@ int ixgbe_fcoe_ddp_put(struct net_device *netdev, u16 xid)
>  	len = ddp->len;
>  	/* if there an error, force to invalidate ddp context */
>  	if (ddp->err) {
> +		spin_lock(fcoe->node_lock[numa_node_id()]);
>  		spin_lock_bh(&fcoe->lock);
>  		IXGBE_WRITE_REG(&adapter->hw, IXGBE_FCFLT, 0);
>  		IXGBE_WRITE_REG(&adapter->hw, IXGBE_FCFLTRW,
> @@ -122,6 +123,7 @@ int ixgbe_fcoe_ddp_put(struct net_device *netdev, u16 xid)
>  				(xid | IXGBE_FCDMARW_RE));
>  		fcbuff = IXGBE_READ_REG(&adapter->hw, IXGBE_FCBUFF);
>  		spin_unlock_bh(&fcoe->lock);
> +		spin_unlock(fcoe->node_lock[numa_node_id()]);
>  		if (fcbuff & IXGBE_FCBUFF_VALID)
>  			udelay(100);
>  	}
> @@ -294,6 +296,7 @@ static int ixgbe_fcoe_ddp_setup(struct net_device *netdev, u16 xid,
>  
>  	/* program DMA context */
>  	hw = &adapter->hw;
> +	spin_lock(fcoe->node_lock[numa_node_id()]);
>  	spin_lock_bh(&fcoe->lock);
>  
>  	/* turn on last frame indication for target mode as FCP_RSPtarget is
> @@ -315,6 +318,7 @@ static int ixgbe_fcoe_ddp_setup(struct net_device *netdev, u16 xid,
>  	IXGBE_WRITE_REG(hw, IXGBE_FCFLTRW, fcfltrw);
>  
>  	spin_unlock_bh(&fcoe->lock);
> +	spin_unlock(fcoe->node_lock[numa_node_id()]);
>  
>  	return 1;
>  
> @@ -634,6 +638,42 @@ static void ixgbe_fcoe_ddp_pools_alloc(struct ixgbe_adapter *adapter)
>  	}
>  }
>  
> +static void ixgbe_fcoe_locks_free(struct ixgbe_fcoe *fcoe)
> +{
> +	int node;
> +
> +	if (!fcoe->node_lock)
> +		return;
> +
> +	for_each_node_with_cpus(node)
> +			kfree(fcoe->node_lock[node]);
> +
> +	kfree(fcoe->node_lock);
> +	fcoe->node_lock = NULL;
> +}
> +
> +static void ixgbe_fcoe_lock_init(struct ixgbe_fcoe *fcoe)
> +{
> +	int node;
> +	spinlock_t *node_lock;
> +
> +	fcoe->node_lock = kzalloc(sizeof(node_lock) * num_possible_nodes(),
> +				  GFP_KERNEL);

Hmm...

1) Think of what happens if some machine has 3 possible nodes : 0, 2, 3

	-> You should use nr_node_ids instead of num_possible_nodes() 

2) Make sure this block cant have false sharing : Allocate at least a
full cache line : On a typical 2 node machine, you currently allocate
16bytes of memory, and this small block could share a contended cache
line.


> +	if (!fcoe->node_lock)
> +		return;
> +
> +	for_each_node_with_cpus(node) {
> +		node_lock = kzalloc_node(sizeof(*node_lock) , GFP_KERNEL, node);
> +		if (!node_lock) {
> +			ixgbe_fcoe_locks_free(fcoe);
> +			return;
> +		}
> +		spin_lock_init(node_lock);
> +		fcoe->node_lock[node] = node_lock;
> +	}
> +	spin_lock_init(&fcoe->lock);
> +}
> +

...

>  
>  /**
> diff --git a/drivers/net/ixgbe/ixgbe_fcoe.h b/drivers/net/ixgbe/ixgbe_fcoe.h
> index d876e7a..8618892 100644
> --- a/drivers/net/ixgbe/ixgbe_fcoe.h
> +++ b/drivers/net/ixgbe/ixgbe_fcoe.h
> @@ -69,6 +69,7 @@ struct ixgbe_fcoe {
>  	struct pci_pool **pool;
>  	atomic_t refcnt;
>  	spinlock_t lock;
> +	struct spinlock **node_lock;

Wont this read_mostly pointer sits in often modified cache line ?

>  	struct ixgbe_fcoe_ddp ddp[IXGBE_FCOE_DDP_MAX];
>  	unsigned char *extra_ddp_buffer;
>  	dma_addr_t extra_ddp_buffer_dma;



^ permalink raw reply

* Re: [PATCH net-next-2.6] inetpeer: lower false sharing effect
From: Eric Dumazet @ 2011-06-11  4:54 UTC (permalink / raw)
  To: Changli Gao; +Cc: Tim Chen, David Miller, netdev, andi
In-Reply-To: <BANLkTikU_ug=zUdfOZAX6eJ+BD9ZZvDrJA@mail.gmail.com>

Le samedi 11 juin 2011 à 08:54 +0800, Changli Gao a écrit :

> Did you disable routing cache when profiling? If so, enable it and try again.
> 

Whole point of the exercice is to prepare ground for routing cache
removal :)

If you want a server being hit by millions of clients around the world,
routing cache is a real pain because of memory needs.



^ permalink raw reply

* Re: linux-next: build failure after merge of the net tree
From: Stephen Rothwell @ 2011-06-11  4:26 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: David Miller, netdev, linux-next, linux-kernel, Greg KH,
	Alexey Dobriyan
In-Reply-To: <BANLkTi=fn1qxcJGs7Ngag6ph2uMECiyWLg@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 945 bytes --]

Hi Geert,

On Fri, 10 Jun 2011 22:28:03 +0200 Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> On Thu, Jun 9, 2011 at 06:56, Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> > I assume that this is fallout from commit a6b7a407865a ("net: remove
> > interrupt.h inclusion from netdevice.h").
> 
> Ah, that's were the failures on m68k come from:
> http://kisskb.ellerman.id.au/kisskb/buildresult/4222666/
> http://kisskb.ellerman.id.au/kisskb/buildresult/4223000/

Yep.

> My first bisection between two successive versions of linux-next failed, but

Bisecting between successive linux-next versions is bound not to work.
Bisection really only works between a commit and one of ist ancestors.

> the second one between linus and linux-next just finished and points
> to that commit.

That is a much better bet.

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

[-- Attachment #2: Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply

* [net-next 13/13] ixgbe: use per NUMA node lock for FCoE DDP
From: Jeff Kirsher @ 2011-06-11  3:02 UTC (permalink / raw)
  To: davem; +Cc: Vasu Dev, netdev, gospo, Jeff Kirsher
In-Reply-To: <1307761341-5267-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Vasu Dev <vasu.dev@intel.com>

Adds per NUMA node lock to have CPU pass thru its NUMA lock
first before contending for global DDP fcoe->lock to setup
DDP lock, this is to reduce contentions across NUMA nodes.

Allocates and initialize per NUMA node lock in added
ixgbe_fcoe_lock_init and then have current CPU's numa_node_id
based NUMA node lock acquired before taking global fcoe->lock.

The node lock is allocated from its NUMA node using kzalloc_node.

Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ixgbe/ixgbe_fcoe.c |   50 ++++++++++++++++++++++++++++++++++++++-
 drivers/net/ixgbe/ixgbe_fcoe.h |    1 +
 2 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_fcoe.c b/drivers/net/ixgbe/ixgbe_fcoe.c
index f5f39ed..aadff4f 100644
--- a/drivers/net/ixgbe/ixgbe_fcoe.c
+++ b/drivers/net/ixgbe/ixgbe_fcoe.c
@@ -109,6 +109,7 @@ int ixgbe_fcoe_ddp_put(struct net_device *netdev, u16 xid)
 	len = ddp->len;
 	/* if there an error, force to invalidate ddp context */
 	if (ddp->err) {
+		spin_lock(fcoe->node_lock[numa_node_id()]);
 		spin_lock_bh(&fcoe->lock);
 		IXGBE_WRITE_REG(&adapter->hw, IXGBE_FCFLT, 0);
 		IXGBE_WRITE_REG(&adapter->hw, IXGBE_FCFLTRW,
@@ -122,6 +123,7 @@ int ixgbe_fcoe_ddp_put(struct net_device *netdev, u16 xid)
 				(xid | IXGBE_FCDMARW_RE));
 		fcbuff = IXGBE_READ_REG(&adapter->hw, IXGBE_FCBUFF);
 		spin_unlock_bh(&fcoe->lock);
+		spin_unlock(fcoe->node_lock[numa_node_id()]);
 		if (fcbuff & IXGBE_FCBUFF_VALID)
 			udelay(100);
 	}
@@ -294,6 +296,7 @@ static int ixgbe_fcoe_ddp_setup(struct net_device *netdev, u16 xid,
 
 	/* program DMA context */
 	hw = &adapter->hw;
+	spin_lock(fcoe->node_lock[numa_node_id()]);
 	spin_lock_bh(&fcoe->lock);
 
 	/* turn on last frame indication for target mode as FCP_RSPtarget is
@@ -315,6 +318,7 @@ static int ixgbe_fcoe_ddp_setup(struct net_device *netdev, u16 xid,
 	IXGBE_WRITE_REG(hw, IXGBE_FCFLTRW, fcfltrw);
 
 	spin_unlock_bh(&fcoe->lock);
+	spin_unlock(fcoe->node_lock[numa_node_id()]);
 
 	return 1;
 
@@ -634,6 +638,42 @@ static void ixgbe_fcoe_ddp_pools_alloc(struct ixgbe_adapter *adapter)
 	}
 }
 
+static void ixgbe_fcoe_locks_free(struct ixgbe_fcoe *fcoe)
+{
+	int node;
+
+	if (!fcoe->node_lock)
+		return;
+
+	for_each_node_with_cpus(node)
+			kfree(fcoe->node_lock[node]);
+
+	kfree(fcoe->node_lock);
+	fcoe->node_lock = NULL;
+}
+
+static void ixgbe_fcoe_lock_init(struct ixgbe_fcoe *fcoe)
+{
+	int node;
+	spinlock_t *node_lock;
+
+	fcoe->node_lock = kzalloc(sizeof(node_lock) * num_possible_nodes(),
+				  GFP_KERNEL);
+	if (!fcoe->node_lock)
+		return;
+
+	for_each_node_with_cpus(node) {
+		node_lock = kzalloc_node(sizeof(*node_lock) , GFP_KERNEL, node);
+		if (!node_lock) {
+			ixgbe_fcoe_locks_free(fcoe);
+			return;
+		}
+		spin_lock_init(node_lock);
+		fcoe->node_lock[node] = node_lock;
+	}
+	spin_lock_init(&fcoe->lock);
+}
+
 /**
  * ixgbe_configure_fcoe - configures registers for fcoe at start
  * @adapter: ptr to ixgbe adapter
@@ -654,12 +694,16 @@ void ixgbe_configure_fcoe(struct ixgbe_adapter *adapter)
 #endif
 
 	if (!fcoe->pool) {
-		spin_lock_init(&fcoe->lock);
+		ixgbe_fcoe_lock_init(fcoe);
+		if (!fcoe->node_lock) {
+			e_err(drv, "failed to initialize fcoe locks\n");
+			return;
+		}
 
 		ixgbe_fcoe_ddp_pools_alloc(adapter);
 		if (!fcoe->pool) {
 			e_err(drv, "failed to alloc percpu fcoe DDP pools\n");
-			return;
+			goto out_locks_free;
 		}
 
 		/* Extra buffer to be shared by all DDPs for HW work around */
@@ -735,6 +779,8 @@ out_extra_ddp_buffer:
 	kfree(fcoe->extra_ddp_buffer);
 out_ddp_pools:
 	ixgbe_fcoe_ddp_pools_free(fcoe);
+out_locks_free:
+	ixgbe_fcoe_locks_free(fcoe);
 }
 
 /**
diff --git a/drivers/net/ixgbe/ixgbe_fcoe.h b/drivers/net/ixgbe/ixgbe_fcoe.h
index d876e7a..8618892 100644
--- a/drivers/net/ixgbe/ixgbe_fcoe.h
+++ b/drivers/net/ixgbe/ixgbe_fcoe.h
@@ -69,6 +69,7 @@ struct ixgbe_fcoe {
 	struct pci_pool **pool;
 	atomic_t refcnt;
 	spinlock_t lock;
+	struct spinlock **node_lock;
 	struct ixgbe_fcoe_ddp ddp[IXGBE_FCOE_DDP_MAX];
 	unsigned char *extra_ddp_buffer;
 	dma_addr_t extra_ddp_buffer_dma;
-- 
1.7.5.2


^ permalink raw reply related

* [net-next 11/13] ixgbe: add support for Dell CEM
From: Jeff Kirsher @ 2011-06-11  3:02 UTC (permalink / raw)
  To: davem; +Cc: Emil Tantilov, netdev, gospo, Jeff Kirsher
In-Reply-To: <1307761341-5267-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Emil Tantilov <emil.s.tantilov@intel.com>

This patch adds support for Dell CEM (Comprehensive Embedded Management)).
This consists of informing the management firmware of the driver version
during probe on 82599 and X540 HW.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Evan Swanson <evan.swanson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ixgbe/ixgbe_82598.c  |    1 +
 drivers/net/ixgbe/ixgbe_82599.c  |    1 +
 drivers/net/ixgbe/ixgbe_common.c |  174 ++++++++++++++++++++++++++++++++++++++
 drivers/net/ixgbe/ixgbe_common.h |    2 +
 drivers/net/ixgbe/ixgbe_main.c   |    4 +
 drivers/net/ixgbe/ixgbe_type.h   |   46 ++++++++++
 drivers/net/ixgbe/ixgbe_x540.c   |    1 +
 7 files changed, 229 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_82598.c b/drivers/net/ixgbe/ixgbe_82598.c
index bb417d7..0d4e382 100644
--- a/drivers/net/ixgbe/ixgbe_82598.c
+++ b/drivers/net/ixgbe/ixgbe_82598.c
@@ -1316,6 +1316,7 @@ static struct ixgbe_mac_operations mac_ops_82598 = {
 	.clear_vfta		= &ixgbe_clear_vfta_82598,
 	.set_vfta		= &ixgbe_set_vfta_82598,
 	.fc_enable		= &ixgbe_fc_enable_82598,
+	.set_fw_drv_ver         = NULL,
 	.acquire_swfw_sync      = &ixgbe_acquire_swfw_sync,
 	.release_swfw_sync      = &ixgbe_release_swfw_sync,
 };
diff --git a/drivers/net/ixgbe/ixgbe_82599.c b/drivers/net/ixgbe/ixgbe_82599.c
index 324a505..4a6826b 100644
--- a/drivers/net/ixgbe/ixgbe_82599.c
+++ b/drivers/net/ixgbe/ixgbe_82599.c
@@ -2126,6 +2126,7 @@ static struct ixgbe_mac_operations mac_ops_82599 = {
 	.clear_vfta             = &ixgbe_clear_vfta_generic,
 	.set_vfta               = &ixgbe_set_vfta_generic,
 	.fc_enable              = &ixgbe_fc_enable_generic,
+	.set_fw_drv_ver         = &ixgbe_set_fw_drv_ver_generic,
 	.init_uta_tables        = &ixgbe_init_uta_tables_generic,
 	.setup_sfp              = &ixgbe_setup_sfp_modules_82599,
 	.set_mac_anti_spoofing  = &ixgbe_set_mac_anti_spoofing,
diff --git a/drivers/net/ixgbe/ixgbe_common.c b/drivers/net/ixgbe/ixgbe_common.c
index cc2a4a1..777051f 100644
--- a/drivers/net/ixgbe/ixgbe_common.c
+++ b/drivers/net/ixgbe/ixgbe_common.c
@@ -3333,3 +3333,177 @@ void ixgbe_set_rxpba_generic(struct ixgbe_hw *hw,
 		IXGBE_WRITE_REG(hw, IXGBE_TXPBTHRESH(i), 0);
 	}
 }
+
+/**
+ *  ixgbe_calculate_checksum - Calculate checksum for buffer
+ *  @buffer: pointer to EEPROM
+ *  @length: size of EEPROM to calculate a checksum for
+ *  Calculates the checksum for some buffer on a specified length.  The
+ *  checksum calculated is returned.
+ **/
+static u8 ixgbe_calculate_checksum(u8 *buffer, u32 length)
+{
+	u32 i;
+	u8 sum = 0;
+
+	if (!buffer)
+		return 0;
+
+	for (i = 0; i < length; i++)
+		sum += buffer[i];
+
+	return (u8) (0 - sum);
+}
+
+/**
+ *  ixgbe_host_interface_command - Issue command to manageability block
+ *  @hw: pointer to the HW structure
+ *  @buffer: contains the command to write and where the return status will
+ *           be placed
+ *  @lenght: lenght of buffer, must be multiple of 4 bytes
+ *
+ *  Communicates with the manageability block.  On success return 0
+ *  else return IXGBE_ERR_HOST_INTERFACE_COMMAND.
+ **/
+static s32 ixgbe_host_interface_command(struct ixgbe_hw *hw, u8 *buffer,
+					u32 length)
+{
+	u32 hicr, i;
+	u32 hdr_size = sizeof(struct ixgbe_hic_hdr);
+	u8 buf_len, dword_len;
+
+	s32 ret_val = 0;
+
+	if (length == 0 || length & 0x3 ||
+	    length > IXGBE_HI_MAX_BLOCK_BYTE_LENGTH) {
+		hw_dbg(hw, "Buffer length failure.\n");
+		ret_val = IXGBE_ERR_HOST_INTERFACE_COMMAND;
+		goto out;
+	}
+
+	/* Check that the host interface is enabled. */
+	hicr = IXGBE_READ_REG(hw, IXGBE_HICR);
+	if ((hicr & IXGBE_HICR_EN) == 0) {
+		hw_dbg(hw, "IXGBE_HOST_EN bit disabled.\n");
+		ret_val = IXGBE_ERR_HOST_INTERFACE_COMMAND;
+		goto out;
+	}
+
+	/* Calculate length in DWORDs */
+	dword_len = length >> 2;
+
+	/*
+	 * The device driver writes the relevant command block
+	 * into the ram area.
+	 */
+	for (i = 0; i < dword_len; i++)
+		IXGBE_WRITE_REG_ARRAY(hw, IXGBE_FLEX_MNG,
+				      i, *((u32 *)buffer + i));
+
+	/* Setting this bit tells the ARC that a new command is pending. */
+	IXGBE_WRITE_REG(hw, IXGBE_HICR, hicr | IXGBE_HICR_C);
+
+	for (i = 0; i < IXGBE_HI_COMMAND_TIMEOUT; i++) {
+		hicr = IXGBE_READ_REG(hw, IXGBE_HICR);
+		if (!(hicr & IXGBE_HICR_C))
+			break;
+		usleep_range(1000, 2000);
+	}
+
+	/* Check command successful completion. */
+	if (i == IXGBE_HI_COMMAND_TIMEOUT ||
+	    (!(IXGBE_READ_REG(hw, IXGBE_HICR) & IXGBE_HICR_SV))) {
+		hw_dbg(hw, "Command has failed with no status valid.\n");
+		ret_val = IXGBE_ERR_HOST_INTERFACE_COMMAND;
+		goto out;
+	}
+
+	/* Calculate length in DWORDs */
+	dword_len = hdr_size >> 2;
+
+	/* first pull in the header so we know the buffer length */
+	for (i = 0; i < dword_len; i++)
+		*((u32 *)buffer + i) =
+			IXGBE_READ_REG_ARRAY(hw, IXGBE_FLEX_MNG, i);
+
+	/* If there is any thing in data position pull it in */
+	buf_len = ((struct ixgbe_hic_hdr *)buffer)->buf_len;
+	if (buf_len == 0)
+		goto out;
+
+	if (length < (buf_len + hdr_size)) {
+		hw_dbg(hw, "Buffer not large enough for reply message.\n");
+		ret_val = IXGBE_ERR_HOST_INTERFACE_COMMAND;
+		goto out;
+	}
+
+	/* Calculate length in DWORDs, add one for odd lengths */
+	dword_len = (buf_len + 1) >> 2;
+
+	/* Pull in the rest of the buffer (i is where we left off)*/
+	for (; i < buf_len; i++)
+		*((u32 *)buffer + i) =
+			IXGBE_READ_REG_ARRAY(hw, IXGBE_FLEX_MNG, i);
+
+out:
+	return ret_val;
+}
+
+/**
+ *  ixgbe_set_fw_drv_ver_generic - Sends driver version to firmware
+ *  @hw: pointer to the HW structure
+ *  @maj: driver version major number
+ *  @min: driver version minor number
+ *  @build: driver version build number
+ *  @sub: driver version sub build number
+ *
+ *  Sends driver version number to firmware through the manageability
+ *  block.  On success return 0
+ *  else returns IXGBE_ERR_SWFW_SYNC when encountering an error acquiring
+ *  semaphore or IXGBE_ERR_HOST_INTERFACE_COMMAND when command fails.
+ **/
+s32 ixgbe_set_fw_drv_ver_generic(struct ixgbe_hw *hw, u8 maj, u8 min,
+				 u8 build, u8 sub)
+{
+	struct ixgbe_hic_drv_info fw_cmd;
+	int i;
+	s32 ret_val = 0;
+
+	if (hw->mac.ops.acquire_swfw_sync(hw, IXGBE_GSSR_SW_MNG_SM) != 0) {
+		ret_val = IXGBE_ERR_SWFW_SYNC;
+		goto out;
+	}
+
+	fw_cmd.hdr.cmd = FW_CEM_CMD_DRIVER_INFO;
+	fw_cmd.hdr.buf_len = FW_CEM_CMD_DRIVER_INFO_LEN;
+	fw_cmd.hdr.cmd_or_resp.cmd_resv = FW_CEM_CMD_RESERVED;
+	fw_cmd.port_num = (u8)hw->bus.func;
+	fw_cmd.ver_maj = maj;
+	fw_cmd.ver_min = min;
+	fw_cmd.ver_build = build;
+	fw_cmd.ver_sub = sub;
+	fw_cmd.hdr.checksum = 0;
+	fw_cmd.hdr.checksum = ixgbe_calculate_checksum((u8 *)&fw_cmd,
+				(FW_CEM_HDR_LEN + fw_cmd.hdr.buf_len));
+	fw_cmd.pad = 0;
+	fw_cmd.pad2 = 0;
+
+	for (i = 0; i <= FW_CEM_MAX_RETRIES; i++) {
+		ret_val = ixgbe_host_interface_command(hw, (u8 *)&fw_cmd,
+						       sizeof(fw_cmd));
+		if (ret_val != 0)
+			continue;
+
+		if (fw_cmd.hdr.cmd_or_resp.ret_status ==
+		    FW_CEM_RESP_STATUS_SUCCESS)
+			ret_val = 0;
+		else
+			ret_val = IXGBE_ERR_HOST_INTERFACE_COMMAND;
+
+		break;
+	}
+
+	hw->mac.ops.release_swfw_sync(hw, IXGBE_GSSR_SW_MNG_SM);
+out:
+	return ret_val;
+}
diff --git a/drivers/net/ixgbe/ixgbe_common.h b/drivers/net/ixgbe/ixgbe_common.h
index 32a454f..f24fd64 100644
--- a/drivers/net/ixgbe/ixgbe_common.h
+++ b/drivers/net/ixgbe/ixgbe_common.h
@@ -99,6 +99,8 @@ s32 ixgbe_blink_led_stop_generic(struct ixgbe_hw *hw, u32 index);
 void ixgbe_set_mac_anti_spoofing(struct ixgbe_hw *hw, bool enable, int pf);
 void ixgbe_set_vlan_anti_spoofing(struct ixgbe_hw *hw, bool enable, int vf);
 s32 ixgbe_get_device_caps_generic(struct ixgbe_hw *hw, u16 *device_caps);
+s32 ixgbe_set_fw_drv_ver_generic(struct ixgbe_hw *hw, u8 maj, u8 min,
+				 u8 build, u8 ver);
 
 void ixgbe_set_rxpba_generic(struct ixgbe_hw *hw, int num_pb,
 			     u32 headroom, int strategy);
diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index 295ab6d..4cd66ae 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -7674,6 +7674,10 @@ static int __devinit ixgbe_probe(struct pci_dev *pdev,
 			ixgbe_vf_configuration(pdev, (i | 0x10000000));
 	}
 
+	/* Inform firmware of driver version */
+	if (hw->mac.ops.set_fw_drv_ver)
+		hw->mac.ops.set_fw_drv_ver(hw, MAJ, MIN, BUILD, KFIX);
+
 	/* add san mac addr to netdev */
 	ixgbe_add_sanmac_netdev(netdev);
 
diff --git a/drivers/net/ixgbe/ixgbe_type.h b/drivers/net/ixgbe/ixgbe_type.h
index 5455064..9a499a6 100644
--- a/drivers/net/ixgbe/ixgbe_type.h
+++ b/drivers/net/ixgbe/ixgbe_type.h
@@ -707,6 +707,13 @@
 #define IXGBE_HFDR      0x15FE8
 #define IXGBE_FLEX_MNG  0x15800 /* 0x15800 - 0x15EFC */
 
+#define IXGBE_HICR_EN              0x01  /* Enable bit - RO */
+/* Driver sets this bit when done to put command in RAM */
+#define IXGBE_HICR_C               0x02
+#define IXGBE_HICR_SV              0x04  /* Status Validity */
+#define IXGBE_HICR_FW_RESET_ENABLE 0x40
+#define IXGBE_HICR_FW_RESET        0x80
+
 /* PCI-E registers */
 #define IXGBE_GCR       0x11000
 #define IXGBE_GTV       0x11004
@@ -2124,6 +2131,41 @@ enum ixgbe_fdir_pballoc_type {
 #define IXGBE_FDIR_INIT_DONE_POLL               10
 #define IXGBE_FDIRCMD_CMD_POLL                  10
 
+/* Manageablility Host Interface defines */
+#define IXGBE_HI_MAX_BLOCK_BYTE_LENGTH       1792 /* Num of bytes in range */
+#define IXGBE_HI_MAX_BLOCK_DWORD_LENGTH      448 /* Num of dwords in range */
+#define IXGBE_HI_COMMAND_TIMEOUT             500 /* Process HI command limit */
+
+/* CEM Support */
+#define FW_CEM_HDR_LEN                0x4
+#define FW_CEM_CMD_DRIVER_INFO        0xDD
+#define FW_CEM_CMD_DRIVER_INFO_LEN    0x5
+#define FW_CEM_CMD_RESERVED           0X0
+#define FW_CEM_MAX_RETRIES            3
+#define FW_CEM_RESP_STATUS_SUCCESS    0x1
+
+/* Host Interface Command Structures */
+struct ixgbe_hic_hdr {
+	u8 cmd;
+	u8 buf_len;
+	union {
+		u8 cmd_resv;
+		u8 ret_status;
+	} cmd_or_resp;
+	u8 checksum;
+};
+
+struct ixgbe_hic_drv_info {
+	struct ixgbe_hic_hdr hdr;
+	u8 port_num;
+	u8 ver_sub;
+	u8 ver_build;
+	u8 ver_min;
+	u8 ver_maj;
+	u8 pad; /* end spacing to ensure length is mult. of dword */
+	u16 pad2; /* end spacing to ensure length is mult. of dword2 */
+};
+
 /* Transmit Descriptor - Advanced */
 union ixgbe_adv_tx_desc {
 	struct {
@@ -2663,6 +2705,9 @@ struct ixgbe_mac_operations {
 
 	/* Flow Control */
 	s32 (*fc_enable)(struct ixgbe_hw *, s32);
+
+	/* Manageability interface */
+	s32 (*set_fw_drv_ver)(struct ixgbe_hw *, u8, u8, u8, u8);
 };
 
 struct ixgbe_phy_operations {
@@ -2832,6 +2877,7 @@ struct ixgbe_info {
 #define IXGBE_ERR_SFP_SETUP_NOT_COMPLETE        -30
 #define IXGBE_ERR_PBA_SECTION                   -31
 #define IXGBE_ERR_INVALID_ARGUMENT              -32
+#define IXGBE_ERR_HOST_INTERFACE_COMMAND        -33
 #define IXGBE_NOT_IMPLEMENTED                   0x7FFFFFFF
 
 #endif /* _IXGBE_TYPE_H_ */
diff --git a/drivers/net/ixgbe/ixgbe_x540.c b/drivers/net/ixgbe/ixgbe_x540.c
index fa566ed..bec30ed 100644
--- a/drivers/net/ixgbe/ixgbe_x540.c
+++ b/drivers/net/ixgbe/ixgbe_x540.c
@@ -894,6 +894,7 @@ static struct ixgbe_mac_operations mac_ops_X540 = {
 	.clear_vfta             = &ixgbe_clear_vfta_generic,
 	.set_vfta               = &ixgbe_set_vfta_generic,
 	.fc_enable              = &ixgbe_fc_enable_generic,
+	.set_fw_drv_ver         = &ixgbe_set_fw_drv_ver_generic,
 	.init_uta_tables        = &ixgbe_init_uta_tables_generic,
 	.setup_sfp              = NULL,
 	.set_mac_anti_spoofing  = &ixgbe_set_mac_anti_spoofing,
-- 
1.7.5.2


^ permalink raw reply related

* [net-next 12/13] ixgbe: setup per CPU PCI pool for FCoE DDP
From: Jeff Kirsher @ 2011-06-11  3:02 UTC (permalink / raw)
  To: davem; +Cc: Vasu Dev, netdev, gospo, Jeff Kirsher
In-Reply-To: <1307761341-5267-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Vasu Dev <vasu.dev@intel.com>

Currently single PCI pool used across all CPUs and that
doesn't scales up as number of CPU increases, so this
patch adds per CPU PCI pool to setup udl and that aligns
well from FCoE stack as that already has per CPU exch locking.

Adds per CPU PCI alloc setup and free in
ixgbe_fcoe_ddp_pools_alloc and ixgbe_fcoe_ddp_pools_free,
use CPU specific pool during DDP setup.

Re-arranged ixgbe_fcoe struct to have fewer holes
along with adding pools ptr using pahole.

Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ixgbe/ixgbe_fcoe.c |  105 ++++++++++++++++++++++++++++-----------
 drivers/net/ixgbe/ixgbe_fcoe.h |   13 +++--
 2 files changed, 82 insertions(+), 36 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_fcoe.c b/drivers/net/ixgbe/ixgbe_fcoe.c
index 0592072..f5f39ed 100644
--- a/drivers/net/ixgbe/ixgbe_fcoe.c
+++ b/drivers/net/ixgbe/ixgbe_fcoe.c
@@ -128,7 +128,11 @@ int ixgbe_fcoe_ddp_put(struct net_device *netdev, u16 xid)
 	if (ddp->sgl)
 		pci_unmap_sg(adapter->pdev, ddp->sgl, ddp->sgc,
 			     DMA_FROM_DEVICE);
-	pci_pool_free(fcoe->pool, ddp->udl, ddp->udp);
+	if (ddp->pool) {
+		pci_pool_free(ddp->pool, ddp->udl, ddp->udp);
+		ddp->pool = NULL;
+	}
+
 	ixgbe_fcoe_clear_ddp(ddp);
 
 out_ddp_put:
@@ -163,6 +167,7 @@ static int ixgbe_fcoe_ddp_setup(struct net_device *netdev, u16 xid,
 	unsigned int thislen = 0;
 	u32 fcbuff, fcdmarw, fcfltrw, fcrxctl;
 	dma_addr_t addr = 0;
+	struct pci_pool *pool;
 
 	if (!netdev || !sgl)
 		return 0;
@@ -199,12 +204,14 @@ static int ixgbe_fcoe_ddp_setup(struct net_device *netdev, u16 xid,
 		return 0;
 	}
 
-	/* alloc the udl from our ddp pool */
-	ddp->udl = pci_pool_alloc(fcoe->pool, GFP_ATOMIC, &ddp->udp);
+	/* alloc the udl from per cpu ddp pool */
+	pool = *per_cpu_ptr(fcoe->pool, get_cpu());
+	ddp->udl = pci_pool_alloc(pool, GFP_ATOMIC, &ddp->udp);
 	if (!ddp->udl) {
 		e_err(drv, "failed allocated ddp context\n");
 		goto out_noddp_unmap;
 	}
+	ddp->pool = pool;
 	ddp->sgl = sgl;
 	ddp->sgc = sgc;
 
@@ -268,6 +275,7 @@ static int ixgbe_fcoe_ddp_setup(struct net_device *netdev, u16 xid,
 		j++;
 		lastsize = 1;
 	}
+	put_cpu();
 
 	fcbuff = (IXGBE_FCBUFF_4KB << IXGBE_FCBUFF_BUFFSIZE_SHIFT);
 	fcbuff |= ((j & 0xff) << IXGBE_FCBUFF_BUFFCNT_SHIFT);
@@ -311,11 +319,12 @@ static int ixgbe_fcoe_ddp_setup(struct net_device *netdev, u16 xid,
 	return 1;
 
 out_noddp_free:
-	pci_pool_free(fcoe->pool, ddp->udl, ddp->udp);
+	pci_pool_free(pool, ddp->udl, ddp->udp);
 	ixgbe_fcoe_clear_ddp(ddp);
 
 out_noddp_unmap:
 	pci_unmap_sg(adapter->pdev, sgl, sgc, DMA_FROM_DEVICE);
+	put_cpu();
 	return 0;
 }
 
@@ -585,6 +594,46 @@ int ixgbe_fso(struct ixgbe_adapter *adapter,
 	return skb_is_gso(skb);
 }
 
+static void ixgbe_fcoe_ddp_pools_free(struct ixgbe_fcoe *fcoe)
+{
+	unsigned int cpu;
+	struct pci_pool **pool;
+
+	for_each_possible_cpu(cpu) {
+		pool = per_cpu_ptr(fcoe->pool, cpu);
+		if (*pool)
+			pci_pool_destroy(*pool);
+	}
+	free_percpu(fcoe->pool);
+	fcoe->pool = NULL;
+}
+
+static void ixgbe_fcoe_ddp_pools_alloc(struct ixgbe_adapter *adapter)
+{
+	struct ixgbe_fcoe *fcoe = &adapter->fcoe;
+	unsigned int cpu;
+	struct pci_pool **pool;
+	char pool_name[32];
+
+	fcoe->pool = alloc_percpu(struct pci_pool *);
+	if (!fcoe->pool)
+		return;
+
+	/* allocate pci pool for each cpu */
+	for_each_possible_cpu(cpu) {
+		snprintf(pool_name, 32, "ixgbe_fcoe_ddp_%d", cpu);
+		pool = per_cpu_ptr(fcoe->pool, cpu);
+		*pool = pci_pool_create(pool_name,
+					adapter->pdev, IXGBE_FCPTR_MAX,
+					IXGBE_FCPTR_ALIGN, PAGE_SIZE);
+		if (!*pool) {
+			e_err(drv, "failed to alloc DDP pool on cpu:%d\n", cpu);
+			ixgbe_fcoe_ddp_pools_free(fcoe);
+			return;
+		}
+	}
+}
+
 /**
  * ixgbe_configure_fcoe - configures registers for fcoe at start
  * @adapter: ptr to ixgbe adapter
@@ -604,22 +653,20 @@ void ixgbe_configure_fcoe(struct ixgbe_adapter *adapter)
 	u32 up2tc;
 #endif
 
-	/* create the pool for ddp if not created yet */
 	if (!fcoe->pool) {
-		/* allocate ddp pool */
-		fcoe->pool = pci_pool_create("ixgbe_fcoe_ddp",
-					     adapter->pdev, IXGBE_FCPTR_MAX,
-					     IXGBE_FCPTR_ALIGN, PAGE_SIZE);
-		if (!fcoe->pool)
-			e_err(drv, "failed to allocated FCoE DDP pool\n");
-
 		spin_lock_init(&fcoe->lock);
 
+		ixgbe_fcoe_ddp_pools_alloc(adapter);
+		if (!fcoe->pool) {
+			e_err(drv, "failed to alloc percpu fcoe DDP pools\n");
+			return;
+		}
+
 		/* Extra buffer to be shared by all DDPs for HW work around */
 		fcoe->extra_ddp_buffer = kmalloc(IXGBE_FCBUFF_MIN, GFP_ATOMIC);
 		if (fcoe->extra_ddp_buffer == NULL) {
 			e_err(drv, "failed to allocated extra DDP buffer\n");
-			goto out_extra_ddp_buffer_alloc;
+			goto out_ddp_pools;
 		}
 
 		fcoe->extra_ddp_buffer_dma =
@@ -630,7 +677,7 @@ void ixgbe_configure_fcoe(struct ixgbe_adapter *adapter)
 		if (dma_mapping_error(&adapter->pdev->dev,
 				      fcoe->extra_ddp_buffer_dma)) {
 			e_err(drv, "failed to map extra DDP buffer\n");
-			goto out_extra_ddp_buffer_dma;
+			goto out_extra_ddp_buffer;
 		}
 	}
 
@@ -684,11 +731,10 @@ void ixgbe_configure_fcoe(struct ixgbe_adapter *adapter)
 
 	return;
 
-out_extra_ddp_buffer_dma:
+out_extra_ddp_buffer:
 	kfree(fcoe->extra_ddp_buffer);
-out_extra_ddp_buffer_alloc:
-	pci_pool_destroy(fcoe->pool);
-	fcoe->pool = NULL;
+out_ddp_pools:
+	ixgbe_fcoe_ddp_pools_free(fcoe);
 }
 
 /**
@@ -704,18 +750,17 @@ void ixgbe_cleanup_fcoe(struct ixgbe_adapter *adapter)
 	int i;
 	struct ixgbe_fcoe *fcoe = &adapter->fcoe;
 
-	/* release ddp resource */
-	if (fcoe->pool) {
-		for (i = 0; i < IXGBE_FCOE_DDP_MAX; i++)
-			ixgbe_fcoe_ddp_put(adapter->netdev, i);
-		dma_unmap_single(&adapter->pdev->dev,
-				 fcoe->extra_ddp_buffer_dma,
-				 IXGBE_FCBUFF_MIN,
-				 DMA_FROM_DEVICE);
-		kfree(fcoe->extra_ddp_buffer);
-		pci_pool_destroy(fcoe->pool);
-		fcoe->pool = NULL;
-	}
+	if (!fcoe->pool)
+		return;
+
+	for (i = 0; i < IXGBE_FCOE_DDP_MAX; i++)
+		ixgbe_fcoe_ddp_put(adapter->netdev, i);
+	dma_unmap_single(&adapter->pdev->dev,
+			 fcoe->extra_ddp_buffer_dma,
+			 IXGBE_FCBUFF_MIN,
+			 DMA_FROM_DEVICE);
+	kfree(fcoe->extra_ddp_buffer);
+	ixgbe_fcoe_ddp_pools_free(fcoe);
 }
 
 /**
diff --git a/drivers/net/ixgbe/ixgbe_fcoe.h b/drivers/net/ixgbe/ixgbe_fcoe.h
index 5a650a4..d876e7a 100644
--- a/drivers/net/ixgbe/ixgbe_fcoe.h
+++ b/drivers/net/ixgbe/ixgbe_fcoe.h
@@ -62,20 +62,21 @@ struct ixgbe_fcoe_ddp {
 	struct scatterlist *sgl;
 	dma_addr_t udp;
 	u64 *udl;
+	struct pci_pool *pool;
 };
 
 struct ixgbe_fcoe {
-#ifdef CONFIG_IXGBE_DCB
-	u8 tc;
-	u8 up;
-#endif
-	unsigned long mode;
+	struct pci_pool **pool;
 	atomic_t refcnt;
 	spinlock_t lock;
-	struct pci_pool *pool;
 	struct ixgbe_fcoe_ddp ddp[IXGBE_FCOE_DDP_MAX];
 	unsigned char *extra_ddp_buffer;
 	dma_addr_t extra_ddp_buffer_dma;
+	unsigned long mode;
+#ifdef CONFIG_IXGBE_DCB
+	u8 tc;
+	u8 up;
+#endif
 };
 
 #endif /* _IXGBE_FCOE_H */
-- 
1.7.5.2


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox