Netdev List


* Re: [Devel] Re: [PATCH v5 00/10] per-cgroup tcp memory pressure
From: James Bottomley @ 2011-11-15 18:27 UTC (permalink / raw)
  To: davem@davemloft.net, eric.dumazet@gmail.com
  Cc: Glauber Costa, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, paul@paulmenage.org, lizf@cn.fujitsu.com,
	linux-mm@kvack.org, devel@openvz.org, kirill@shutemov.name,
	gthelen@google.com, kamezawa.hiroyu@jp.fujitsu.com
In-Reply-To: <4EBAC04F.1010901@parallels.com>

On Wed, 2011-11-09 at 16:02 -0200, Glauber Costa wrote:
> On 11/07/2011 01:26 PM, Glauber Costa wrote:
> > Hi all,
> >
> > This is my new attempt at implementing per-cgroup tcp memory pressure.
> > I am particularly interested in what the network folks have to comment on
> > it: my main goal is to achieve the least impact possible in the network code.
> >
> > Here's a brief description of my approach:
> >
> > When only the root cgroup is present, the code should behave the same way as
> > before - with the exception of the inclusion of an extra field in struct sock,
> > and one in struct proto. All tests are patched out with static branch, and we
> > still access addresses directly - the same as we did before.
> >
> > When a cgroup other than root is created, we patch in the branches, and account
> > resources for that cgroup. The variables in the root cgroup are still updated.
> > If we were to try to be 100 % coherent with the memcg code, that should depend
> > on use_hierarchy. However, I feel that this is a good compromise in terms of
> > leaving the network code untouched, and still having a global vision of its
> > resources. I also do not compute max_usage for the root cgroup, for a similar
> > reason.
> >
> > Please let me know what you think of it.
> 
> Dave, Eric,
> 
> Can you let me know what you think of the general approach I've followed 
> in this series? The impact on the common case should be minimal: at 
> most the cost of a static branch (0 on most arches, I believe).
> 
> I am mostly interested in knowing whether this is a valid path to pursue.
> I'll be happy to address any specific concerns you have once you're ok
> with the general approach.

Ping on this, please.  We're blocked on this patch set until we can get
an ack that the approach is acceptable to network people.

Thanks,

James

^ permalink raw reply

* [RFT] bridge: checksum not updated after pull
From: Stephen Hemminger @ 2011-11-15 18:09 UTC (permalink / raw)
  To: Yan, Zheng
  Cc: Martin Volf, bridge@lists.linux-foundation.org,
	netdev@vger.kernel.org, davem@davemloft.net, wcang@sfc.wide.ad.jp
In-Reply-To: <4EC23EE7.2010606@intel.com>

I think this is what is necessary, please test.

Subject: bridge: correct IPv6 checksum after pull

Bridge multicast snooping of ICMPv6 would incorrectly report a checksum problem
when used with Ethernet devices like sky2 that use CHECKSUM_COMPLETE.
When bytes are removed from skb, the computed checksum needs to be adjusted.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>


--- a/net/bridge/br_multicast.c	2011-11-09 13:55:00.028012483 -0800
+++ b/net/bridge/br_multicast.c	2011-11-15 10:05:06.171314194 -0800
@@ -1501,7 +1501,9 @@ static int br_multicast_ipv6_rcv(struct
 
 	__skb_pull(skb2, offset);
 	skb_reset_transport_header(skb2);
-
+	skb_postpull_rcsum(skb2, skb_network_header(skb2),
+			   skb_network_header_len(skb2));
+
 	icmp6_type = icmp6_hdr(skb2)->icmp6_type;
 
 	switch (icmp6_type) {

^ permalink raw reply

* Re: bnx2 cards intermittantly going offline
From: Ken @ 2011-11-15 17:41 UTC (permalink / raw)
  To: netdev
In-Reply-To: <6DD3782C33561D44B47071B09946026405F63853AB@exchange1>

+1 with identical L2 components and symptoms.

^ permalink raw reply

* Re: sky2 hw csum failure
From: Stephen Hemminger @ 2011-11-15 17:45 UTC (permalink / raw)
  To: Yan, Zheng
  Cc: Martin Volf, shemminger@linux-foundation.org,
	bridge@lists.linux-foundation.org, netdev@vger.kernel.org,
	davem@davemloft.net, wcang@sfc.wide.ad.jp
In-Reply-To: <4EC23EE7.2010606@intel.com>

On Tue, 15 Nov 2011 18:28:55 +0800
"Yan, Zheng" <zheng.z.yan@intel.com> wrote:

> I re-tested the checksum code, both CHECKSUM_NONE and CHECKSUM_COMPLETE
> cases are OK. Maybe the bug is related to sky2.
> 
> Regards
> Yan, Zheng

There are three types of receive checksumming:
  1. Hardware does not do checksumming (CHECKSUM_NONE)
  2. Hardware validates checksum (CHECKSUM_UNNECESSARY)
  3. Hardware computes sum of bytes in skb (CHECKSUM_COMPLETE)

Most hardware does #2, but sky2 uses #3.
In that third case, the hardware does not look at headers; it only
reports the one's complement sum, which the driver stores in skb->csum
(with ip_summed set to CHECKSUM_COMPLETE). It is up to the
protocol layers to adjust accordingly.  This means that if data is removed
or added, the checksum needs to be adjusted.
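
The adjustment described above can be sketched in userspace C. This is
only an illustration of the one's complement arithmetic, not the kernel's
implementation: the helper names (csum_fold16, csum_bytes, csum_sub16) are
made up for this sketch, and it assumes the removed prefix has an even
length so that 16-bit word boundaries do not shift.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's complement sum. */
static uint16_t csum_fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* One's complement sum of len bytes, read as big-endian 16-bit words. */
static uint16_t csum_bytes(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)	/* pad a trailing odd byte with zero */
		sum += (uint32_t)buf[len - 1] << 8;
	return csum_fold16(sum);
}

/*
 * Remove the contribution of pulled bytes from a whole-packet sum.
 * In one's complement arithmetic, a - b is computed as a + ~b.
 */
static uint16_t csum_sub16(uint16_t total, uint16_t pulled)
{
	return csum_fold16((uint32_t)total + (uint16_t)~pulled);
}
```

With these helpers, subtracting the sum of an even-length prefix from the
sum of the whole buffer gives the sum of the remaining bytes, which is the
property the bridge fix relies on after pulling headers.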

^ permalink raw reply

* Re: [PATCH V2] vlan:return error when real dev is enslaved
From: Ben Hutchings @ 2011-11-15 17:34 UTC (permalink / raw)
  To: Weiping Pan
  Cc: Patrick McHardy, David S. Miller, open list:VLAN (802.1Q),
	open list
In-Reply-To: <d7ea491a500c99c0b4839ddcedab027a3c865c59.1321360959.git.wpan@redhat.com>

On Tue, 2011-11-15 at 20:44 +0800, Weiping Pan wrote:
> Qinhuibin reported a kernel panic when performing some VLAN operations.
> https://lkml.org/lkml/2011/11/6/218
> 
> The operation is as below:
> ifconfig eth2 up
> modprobe bonding
> modprobe 8021q
> ifconfig bond0 up
> ifenslave bond0 eth2
> vconfig add eth2 3300
> vconfig add bond0 33
> vconfig rem eth2.3300
>
> the panic stack is as below:
> [<ffffffffa002f1c9>] panic_event+0x49/0x70 [ipmi_msghandler]
> [<ffffffff80378917>] notifier_call_chain+0x37/0x70
> [<ffffffff80372122>] panic+0xa2/0x195
> [<ffffffff80376ed8>] oops_end+0xd8/0x140
> [<ffffffff8001bea7>] no_context+0xf7/0x280
> [<ffffffff8001c1a5>] __bad_area_nosemaphore+0x175/0x250
> [<ffffffff80376318>] page_fault+0x28/0x30
> [<ffffffffa039dabd>] igb_vlan_rx_kill_vid+0x4d/0x100 [igb]
> [<ffffffffa044045f>] bond_vlan_rx_kill_vid+0x9f/0x290 [bonding]
> [<ffffffffa047e636>] unregister_vlan_dev+0x136/0x180 [8021q]
> [<ffffffffa047ed20>] vlan_ioctl_handler+0x170/0x3f0 [8021q]
> [<ffffffff802c1d3f>] sock_ioctl+0x21f/0x280
> [<ffffffff800e6d7f>] vfs_ioctl+0x2f/0xb0
> [<ffffffff800e726b>] do_vfs_ioctl+0x3cb/0x5a0
> [<ffffffff800e74e1>] sys_ioctl+0xa1/0xb0
> [<ffffffff80007388>] system_call_fastpath+0x16/0x1b
> [<00007f108a2b8bd7>] 0x7f108a2b8bd7
> And the nic is as below:
> [root@localhost ~]# ethtool -i eth2
> driver: igb
> version: 3.0.6-k2
> firmware-version: 1.2-1
> bus-info: 0000:04:00.0
> kernel version:
> 2.6.32.12-0.7; it also happens in 2.6.32-131
> 
> For kernel 2.6.32, the cause of this bug is that when we do "vconfig add bond0 33",
> adapter->vlgrp is overwritten in igb_vlan_rx_register. So when we do "vconfig rem
> eth2.3300", it can't find the correct vlgrp.
> 
> This bug is avoided by the vlan cleanup patchset from Jiri Pirko
> <jpirko@redhat.com>, especially commit b2cb09b1a772 (igb: do vlan cleanup).

Since this won't be applied to mainline first, you should send it
directly to stable@vger.kernel.org as well as to netdev.

> But it is not a correct operation to create a vlan interface on eth2
> when it has already been enslaved by bond0, so this patch returns an error
> when the real dev is already enslaved.
> 
> Changelog:
> V2: use pr_err instead of pr_info
> 
> Signed-off-by: Weiping Pan <wpan@redhat.com>
> ---
>  net/8021q/vlan.c |    5 +++++
>  1 files changed, 5 insertions(+), 0 deletions(-)
> 
> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
> index 5471628..7ce50ba 100644
> --- a/net/8021q/vlan.c
> +++ b/net/8021q/vlan.c
> @@ -148,6 +148,11 @@ int vlan_check_real_dev(struct net_device *real_dev, u16 vlan_id)
>  	const char *name = real_dev->name;
>  	const struct net_device_ops *ops = real_dev->netdev_ops;
>  
> +	if (real_dev->flags & IFF_SLAVE) {
> +		pr_err("Error, %s was already enslaved\n", name);
> +		return -EOPNOTSUPP;

I think the appropriate error code is EBUSY.  The operation is supported
(probably - we haven't checked for VLAN_CHALLENGED yet) but the device
is otherwise occupied.

Ben.

> +	}
> +
>  	if (real_dev->features & NETIF_F_VLAN_CHALLENGED) {
>  		pr_info("VLANs not supported on %s\n", name);
>  		return -EOPNOTSUPP;

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
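
The distinction between the two error codes can be sketched in plain C.
Everything here is hypothetical scaffolding (struct fake_dev, the
DEV_* constants, check_real_dev); it only illustrates one possible
ordering in which the capability check runs first, so that -EOPNOTSUPP
always means "never supported" and -EBUSY means "supported, but the
device is currently enslaved". The patch under review orders the checks
the other way round.

```c
#include <errno.h>

/* Stand-ins for IFF_SLAVE and NETIF_F_VLAN_CHALLENGED (values are arbitrary). */
#define DEV_FLAG_SLAVE			0x1u
#define DEV_FEAT_VLAN_CHALLENGED	0x2u

/* Minimal stand-in for the fields of struct net_device used here. */
struct fake_dev {
	unsigned int flags;
	unsigned int features;
};

/* Return 0 if a VLAN may be created on dev, or a negative errno. */
static int check_real_dev(const struct fake_dev *dev)
{
	/* The device can never carry VLANs: the operation is unsupported. */
	if (dev->features & DEV_FEAT_VLAN_CHALLENGED)
		return -EOPNOTSUPP;

	/* The device could carry VLANs, but is occupied by a master. */
	if (dev->flags & DEV_FLAG_SLAVE)
		return -EBUSY;

	return 0;
}
```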

^ permalink raw reply

* [PATCH net-next] bnx2: switch to build_skb() infrastructure
From: Eric Dumazet @ 2011-11-15 17:30 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Michael Chan, Eilon Greenstein

This is very similar to the bnx2x conversion, but bnx2 only requires 16-byte
alignment at the start of the received frame to store its l2_fhdr, so the goal
was not to reduce skb truesize (in fact it should not change after this
patch).

Using build_skb() reduces cache line misses in the driver, since we
use cache-hot skbs instead of cold ones. The number of in-flight sk_buff
structures is lower, so they are more likely to be recycled in SLUB caches
while still hot.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Michael Chan <mchan@broadcom.com>
CC: Eilon Greenstein <eilong@broadcom.com>
---
Tested with SLUB/SLAB/SLOB on my dev machine
 drivers/net/ethernet/broadcom/bnx2.c |  137 ++++++++++++-------------
 drivers/net/ethernet/broadcom/bnx2.h |   17 ++-
 2 files changed, 85 insertions(+), 69 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2.c b/drivers/net/ethernet/broadcom/bnx2.c
index 32d1f92..8556077 100644
--- a/drivers/net/ethernet/broadcom/bnx2.c
+++ b/drivers/net/ethernet/broadcom/bnx2.c
@@ -2734,31 +2734,27 @@ bnx2_free_rx_page(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, u16 index)
 }
 
 static inline int
-bnx2_alloc_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, u16 index, gfp_t gfp)
+bnx2_alloc_rx_data(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, u16 index, gfp_t gfp)
 {
-	struct sk_buff *skb;
+	u8 *data;
 	struct sw_bd *rx_buf = &rxr->rx_buf_ring[index];
 	dma_addr_t mapping;
 	struct rx_bd *rxbd = &rxr->rx_desc_ring[RX_RING(index)][RX_IDX(index)];
-	unsigned long align;
 
-	skb = __netdev_alloc_skb(bp->dev, bp->rx_buf_size, gfp);
-	if (skb == NULL) {
+	data = kmalloc(bp->rx_buf_size, gfp);
+	if (!data)
 		return -ENOMEM;
-	}
 
-	if (unlikely((align = (unsigned long) skb->data & (BNX2_RX_ALIGN - 1))))
-		skb_reserve(skb, BNX2_RX_ALIGN - align);
-
-	mapping = dma_map_single(&bp->pdev->dev, skb->data, bp->rx_buf_use_size,
+	mapping = dma_map_single(&bp->pdev->dev,
+				 get_l2_fhdr(data),
+				 bp->rx_buf_use_size,
 				 PCI_DMA_FROMDEVICE);
 	if (dma_mapping_error(&bp->pdev->dev, mapping)) {
-		dev_kfree_skb(skb);
+		kfree(data);
 		return -EIO;
 	}
 
-	rx_buf->skb = skb;
-	rx_buf->desc = (struct l2_fhdr *) skb->data;
+	rx_buf->data = data;
 	dma_unmap_addr_set(rx_buf, mapping, mapping);
 
 	rxbd->rx_bd_haddr_hi = (u64) mapping >> 32;
@@ -2965,8 +2961,8 @@ bnx2_reuse_rx_skb_pages(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr,
 }
 
 static inline void
-bnx2_reuse_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr,
-		  struct sk_buff *skb, u16 cons, u16 prod)
+bnx2_reuse_rx_data(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr,
+		   u8 *data, u16 cons, u16 prod)
 {
 	struct sw_bd *cons_rx_buf, *prod_rx_buf;
 	struct rx_bd *cons_bd, *prod_bd;
@@ -2980,8 +2976,7 @@ bnx2_reuse_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr,
 
 	rxr->rx_prod_bseq += bp->rx_buf_use_size;
 
-	prod_rx_buf->skb = skb;
-	prod_rx_buf->desc = (struct l2_fhdr *) skb->data;
+	prod_rx_buf->data = data;
 
 	if (cons == prod)
 		return;
@@ -2995,33 +2990,39 @@ bnx2_reuse_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr,
 	prod_bd->rx_bd_haddr_lo = cons_bd->rx_bd_haddr_lo;
 }
 
-static int
-bnx2_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, struct sk_buff *skb,
+static struct sk_buff *
+bnx2_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, u8 *data,
 	    unsigned int len, unsigned int hdr_len, dma_addr_t dma_addr,
 	    u32 ring_idx)
 {
 	int err;
 	u16 prod = ring_idx & 0xffff;
+	struct sk_buff *skb;
 
-	err = bnx2_alloc_rx_skb(bp, rxr, prod, GFP_ATOMIC);
+	err = bnx2_alloc_rx_data(bp, rxr, prod, GFP_ATOMIC);
 	if (unlikely(err)) {
-		bnx2_reuse_rx_skb(bp, rxr, skb, (u16) (ring_idx >> 16), prod);
+		bnx2_reuse_rx_data(bp, rxr, data, (u16) (ring_idx >> 16), prod);
+error:
 		if (hdr_len) {
 			unsigned int raw_len = len + 4;
 			int pages = PAGE_ALIGN(raw_len - hdr_len) >> PAGE_SHIFT;
 
 			bnx2_reuse_rx_skb_pages(bp, rxr, NULL, pages);
 		}
-		return err;
+		return NULL;
 	}
 
-	skb_reserve(skb, BNX2_RX_OFFSET);
 	dma_unmap_single(&bp->pdev->dev, dma_addr, bp->rx_buf_use_size,
 			 PCI_DMA_FROMDEVICE);
-
+	skb = build_skb(data);
+	if (!skb) {
+		kfree(data);
+		goto error;
+	}
+	skb_reserve(skb, ((u8 *)get_l2_fhdr(data) - data) + BNX2_RX_OFFSET);
 	if (hdr_len == 0) {
 		skb_put(skb, len);
-		return 0;
+		return skb;
 	} else {
 		unsigned int i, frag_len, frag_size, pages;
 		struct sw_pg *rx_pg;
@@ -3052,7 +3053,7 @@ bnx2_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, struct sk_buff *skb,
 					skb_frag_size_sub(frag, tail);
 					skb->data_len -= tail;
 				}
-				return 0;
+				return skb;
 			}
 			rx_pg = &rxr->rx_pg_ring[pg_cons];
 
@@ -3074,7 +3075,7 @@ bnx2_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, struct sk_buff *skb,
 				rxr->rx_pg_prod = pg_prod;
 				bnx2_reuse_rx_skb_pages(bp, rxr, skb,
 							pages - i);
-				return err;
+				return NULL;
 			}
 
 			dma_unmap_page(&bp->pdev->dev, mapping_old,
@@ -3091,7 +3092,7 @@ bnx2_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, struct sk_buff *skb,
 		rxr->rx_pg_prod = pg_prod;
 		rxr->rx_pg_cons = pg_cons;
 	}
-	return 0;
+	return skb;
 }
 
 static inline u16
@@ -3130,19 +3131,17 @@ bnx2_rx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
 		struct sw_bd *rx_buf, *next_rx_buf;
 		struct sk_buff *skb;
 		dma_addr_t dma_addr;
+		u8 *data;
 
 		sw_ring_cons = RX_RING_IDX(sw_cons);
 		sw_ring_prod = RX_RING_IDX(sw_prod);
 
 		rx_buf = &rxr->rx_buf_ring[sw_ring_cons];
-		skb = rx_buf->skb;
-		prefetchw(skb);
+		data = rx_buf->data;
+		rx_buf->data = NULL;
 
-		next_rx_buf =
-			&rxr->rx_buf_ring[RX_RING_IDX(NEXT_RX_BD(sw_cons))];
-		prefetch(next_rx_buf->desc);
-
-		rx_buf->skb = NULL;
+		rx_hdr = get_l2_fhdr(data);
+		prefetch(rx_hdr);
 
 		dma_addr = dma_unmap_addr(rx_buf, mapping);
 
@@ -3150,7 +3149,10 @@ bnx2_rx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
 			BNX2_RX_OFFSET + BNX2_RX_COPY_THRESH,
 			PCI_DMA_FROMDEVICE);
 
-		rx_hdr = rx_buf->desc;
+		next_rx_buf =
+			&rxr->rx_buf_ring[RX_RING_IDX(NEXT_RX_BD(sw_cons))];
+		prefetch(get_l2_fhdr(next_rx_buf->data));
+
 		len = rx_hdr->l2_fhdr_pkt_len;
 		status = rx_hdr->l2_fhdr_status;
 
@@ -3169,7 +3171,7 @@ bnx2_rx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
 				       L2_FHDR_ERRORS_TOO_SHORT |
 				       L2_FHDR_ERRORS_GIANT_FRAME))) {
 
-			bnx2_reuse_rx_skb(bp, rxr, skb, sw_ring_cons,
+			bnx2_reuse_rx_data(bp, rxr, data, sw_ring_cons,
 					  sw_ring_prod);
 			if (pg_ring_used) {
 				int pages;
@@ -3184,30 +3186,29 @@ bnx2_rx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
 		len -= 4;
 
 		if (len <= bp->rx_copy_thresh) {
-			struct sk_buff *new_skb;
-
-			new_skb = netdev_alloc_skb(bp->dev, len + 6);
-			if (new_skb == NULL) {
-				bnx2_reuse_rx_skb(bp, rxr, skb, sw_ring_cons,
+			skb = netdev_alloc_skb(bp->dev, len + 6);
+			if (skb == NULL) {
+				bnx2_reuse_rx_data(bp, rxr, data, sw_ring_cons,
 						  sw_ring_prod);
 				goto next_rx;
 			}
 
 			/* aligned copy */
-			skb_copy_from_linear_data_offset(skb,
-							 BNX2_RX_OFFSET - 6,
-				      new_skb->data, len + 6);
-			skb_reserve(new_skb, 6);
-			skb_put(new_skb, len);
+			memcpy(skb->data,
+			       (u8 *)rx_hdr + BNX2_RX_OFFSET - 6,
+			       len + 6);
+			skb_reserve(skb, 6);
+			skb_put(skb, len);
 
-			bnx2_reuse_rx_skb(bp, rxr, skb,
+			bnx2_reuse_rx_data(bp, rxr, data,
 				sw_ring_cons, sw_ring_prod);
 
-			skb = new_skb;
-		} else if (unlikely(bnx2_rx_skb(bp, rxr, skb, len, hdr_len,
-			   dma_addr, (sw_ring_cons << 16) | sw_ring_prod)))
-			goto next_rx;
-
+		} else {
+			skb = bnx2_rx_skb(bp, rxr, data, len, hdr_len, dma_addr,
+					  (sw_ring_cons << 16) | sw_ring_prod);
+			if (!skb)
+				goto next_rx;
+		}
 		if ((status & L2_FHDR_STATUS_L2_VLAN_TAG) &&
 		    !(bp->rx_mode & BNX2_EMAC_RX_MODE_KEEP_VLAN_TAG))
 			__vlan_hwaccel_put_tag(skb, rx_hdr->l2_fhdr_vlan_tag);
@@ -5234,7 +5235,7 @@ bnx2_init_rx_ring(struct bnx2 *bp, int ring_num)
 
 	ring_prod = prod = rxr->rx_prod;
 	for (i = 0; i < bp->rx_ring_size; i++) {
-		if (bnx2_alloc_rx_skb(bp, rxr, ring_prod, GFP_KERNEL) < 0) {
+		if (bnx2_alloc_rx_data(bp, rxr, ring_prod, GFP_KERNEL) < 0) {
 			netdev_warn(bp->dev, "init'ed rx ring %d with %d/%d skbs only\n",
 				    ring_num, i, bp->rx_ring_size);
 			break;
@@ -5329,7 +5330,7 @@ bnx2_set_rx_ring_size(struct bnx2 *bp, u32 size)
 	rx_size = bp->dev->mtu + ETH_HLEN + BNX2_RX_OFFSET + 8;
 
 	rx_space = SKB_DATA_ALIGN(rx_size + BNX2_RX_ALIGN) + NET_SKB_PAD +
-		sizeof(struct skb_shared_info);
+		SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 
 	bp->rx_copy_thresh = BNX2_RX_COPY_THRESH;
 	bp->rx_pg_ring_size = 0;
@@ -5351,8 +5352,9 @@ bnx2_set_rx_ring_size(struct bnx2 *bp, u32 size)
 	}
 
 	bp->rx_buf_use_size = rx_size;
-	/* hw alignment */
-	bp->rx_buf_size = bp->rx_buf_use_size + BNX2_RX_ALIGN;
+	/* hw alignment + build_skb() overhead*/
+	bp->rx_buf_size = SKB_DATA_ALIGN(bp->rx_buf_use_size + BNX2_RX_ALIGN) +
+		NET_SKB_PAD + SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 	bp->rx_jumbo_thresh = rx_size - BNX2_RX_OFFSET;
 	bp->rx_ring_size = size;
 	bp->rx_max_ring = bnx2_find_max_ring(size, MAX_RX_RINGS);
@@ -5418,9 +5420,9 @@ bnx2_free_rx_skbs(struct bnx2 *bp)
 
 		for (j = 0; j < bp->rx_max_ring_idx; j++) {
 			struct sw_bd *rx_buf = &rxr->rx_buf_ring[j];
-			struct sk_buff *skb = rx_buf->skb;
+			u8 *data = rx_buf->data;
 
-			if (skb == NULL)
+			if (data == NULL)
 				continue;
 
 			dma_unmap_single(&bp->pdev->dev,
@@ -5428,9 +5430,9 @@ bnx2_free_rx_skbs(struct bnx2 *bp)
 					 bp->rx_buf_use_size,
 					 PCI_DMA_FROMDEVICE);
 
-			rx_buf->skb = NULL;
+			rx_buf->data = NULL;
 
-			dev_kfree_skb(skb);
+			kfree(data);
 		}
 		for (j = 0; j < bp->rx_max_pg_ring_idx; j++)
 			bnx2_free_rx_page(bp, rxr, j);
@@ -5736,7 +5738,8 @@ static int
 bnx2_run_loopback(struct bnx2 *bp, int loopback_mode)
 {
 	unsigned int pkt_size, num_pkts, i;
-	struct sk_buff *skb, *rx_skb;
+	struct sk_buff *skb;
+	u8 *data;
 	unsigned char *packet;
 	u16 rx_start_idx, rx_idx;
 	dma_addr_t map;
@@ -5828,14 +5831,14 @@ bnx2_run_loopback(struct bnx2 *bp, int loopback_mode)
 	}
 
 	rx_buf = &rxr->rx_buf_ring[rx_start_idx];
-	rx_skb = rx_buf->skb;
+	data = rx_buf->data;
 
-	rx_hdr = rx_buf->desc;
-	skb_reserve(rx_skb, BNX2_RX_OFFSET);
+	rx_hdr = get_l2_fhdr(data);
+	data = (u8 *)rx_hdr + BNX2_RX_OFFSET;
 
 	dma_sync_single_for_cpu(&bp->pdev->dev,
 		dma_unmap_addr(rx_buf, mapping),
-		bp->rx_buf_size, PCI_DMA_FROMDEVICE);
+		bp->rx_buf_use_size, PCI_DMA_FROMDEVICE);
 
 	if (rx_hdr->l2_fhdr_status &
 		(L2_FHDR_ERRORS_BAD_CRC |
@@ -5852,7 +5855,7 @@ bnx2_run_loopback(struct bnx2 *bp, int loopback_mode)
 	}
 
 	for (i = 14; i < pkt_size; i++) {
-		if (*(rx_skb->data + i) != (unsigned char) (i & 0xff)) {
+		if (*(data + i) != (unsigned char) (i & 0xff)) {
 			goto loopback_test_done;
 		}
 	}
diff --git a/drivers/net/ethernet/broadcom/bnx2.h b/drivers/net/ethernet/broadcom/bnx2.h
index 99d31a7..1db2d51 100644
--- a/drivers/net/ethernet/broadcom/bnx2.h
+++ b/drivers/net/ethernet/broadcom/bnx2.h
@@ -6563,12 +6563,25 @@ struct l2_fhdr {
 #define MB_TX_CID_ADDR	MB_GET_CID_ADDR(TX_CID)
 #define MB_RX_CID_ADDR	MB_GET_CID_ADDR(RX_CID)
 
+/*
+ * This driver uses new build_skb() API :
+ * RX ring buffer contains pointer to kmalloc() data only,
+ * skb are built only after Hardware filled the frame.
+ */
 struct sw_bd {
-	struct sk_buff		*skb;
-	struct l2_fhdr		*desc;
+	u8			*data;
 	DEFINE_DMA_UNMAP_ADDR(mapping);
 };
 
+/* Its faster to compute this from data than storing it in sw_bd
+ * (less cache misses)
+ */
+static inline struct l2_fhdr *get_l2_fhdr(u8 *data)
+{
+	return (struct l2_fhdr *)(PTR_ALIGN(data, BNX2_RX_ALIGN) + NET_SKB_PAD);
+}
+
+
 struct sw_pg {
 	struct page		*page;
 	DEFINE_DMA_UNMAP_ADDR(mapping);
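
The pointer arithmetic in get_l2_fhdr() can be tried out in userspace with
a small sketch. The constants below are assumptions for illustration:
RX_ALIGN follows the 16-byte alignment mentioned in the commit message, and
SKB_PAD is an arbitrary stand-in for NET_SKB_PAD, whose real value depends
on the kernel configuration. The names align_up and l2_fhdr_addr are made
up for this sketch.

```c
#include <stdint.h>

/* Assumed values for illustration only; the real ones come from kernel headers. */
#define RX_ALIGN	16u	/* stands in for BNX2_RX_ALIGN */
#define SKB_PAD		64u	/* stands in for NET_SKB_PAD */

/* Round addr up to the next multiple of align (align must be a power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
	return (addr + align - 1) & ~(align - 1);
}

/* Address where the hardware header would start for a buffer at addr. */
static uintptr_t l2_fhdr_addr(uintptr_t addr)
{
	return align_up(addr, RX_ALIGN) + SKB_PAD;
}
```

Because the header address is a pure function of the buffer address, the
ring entry only needs to store the buffer pointer. In the worst case the
header starts RX_ALIGN - 1 + SKB_PAD bytes into the buffer, which is why
the buffer size computed in bnx2_set_rx_ring_size() includes both terms.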

^ permalink raw reply related

* Re: [patch net-next V8] net: introduce ethernet teaming device
From: Rick Jones @ 2011-11-15 17:22 UTC (permalink / raw)
  To: Andy Gospodarek
  Cc: Jiri Pirko, netdev, davem, eric.dumazet, bhutchings, shemminger,
	fubar, tgraf, ebiederm, mirqus, kaber, greearb, jesse, fbl,
	benjamin.poirier, jzupka, ivecera
In-Reply-To: <20111115015616.GA25132@gospo.rdu.redhat.com>

> On most modern systems I suspect there will be little to no difference
> between bonding RX performance and team performance.
>
> If there is any now, I suspect team and bond performance to be similar
> by the time team has to account for the corner-cases bonding has already
> resolved.  :-)
>
> Benchmarks may prove otherwise, but I've yet to see Jiri produce
> anything.  My initial testing doesn't demonstrate any measurable
> differences with 1Gbps interfaces on a multi-core, multi-socket system.

I wouldn't expect much difference in terms of bandwidth; I was thinking 
the demonstration would be made in the area of service demand (CPU 
consumed per unit work) and perhaps aggregate packets per second.

happy benchmarking,

rick jones

^ permalink raw reply

* [PATCH 2/5] net-next:asix:poll in asix_get_phyid in case phy not ready
From: Grant Grundler @ 2011-11-15 17:12 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, Allan Chou, Freddy Xin, Grant Grundler,
	Grant Grundler
In-Reply-To: <1321377163-26308-1-git-send-email-grundler@chromium.org>

From: Grant Grundler <grundler@google.com>

Sometimes the phy isn't ready after reset...poll and pray it will be soon.

Signed-off-by: Freddy Xin <freddy@asix.com.tw>
Signed-off-by: Grant Grundler <grundler@chromium.org>
---
 drivers/net/usb/asix.c |   12 ++++++++++--
 1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/net/usb/asix.c b/drivers/net/usb/asix.c
index 873860d..b4675e8 100644
--- a/drivers/net/usb/asix.c
+++ b/drivers/net/usb/asix.c
@@ -652,9 +652,17 @@ static u32 asix_get_phyid(struct usbnet *dev)
 {
 	int phy_reg;
 	u32 phy_id;
+	int i;
 
-	phy_reg = asix_mdio_read(dev->net, dev->mii.phy_id, MII_PHYSID1);
-	if (phy_reg < 0)
+	/* Poll for the rare case the FW or phy isn't ready yet.  */
+	for (i = 0; i < 100; i++) {
+		phy_reg = asix_mdio_read(dev->net, dev->mii.phy_id, MII_PHYSID1);
+		if (phy_reg != 0 && phy_reg != 0xFFFF)
+			break;
+		mdelay(1);
+	}
+
+	if (phy_reg <= 0 || phy_reg == 0xFFFF)
 		return 0;
 
 	phy_id = (phy_reg & 0xffff) << 16;
-- 
1.7.3.1
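
The polling pattern in this patch can be exercised outside the driver with
a small sketch. The reader callback, poll_phy_reg, phy_reg_valid and
fake_phy_read are all hypothetical names; the real code calls
asix_mdio_read() and sleeps with mdelay(1) between attempts, which the
sketch leaves out.

```c
/* Hypothetical register reader: returns a 16-bit value, or a negative errno. */
typedef int (*phy_read_fn)(void *ctx);

/*
 * Poll until the reader returns something other than 0 or 0xFFFF,
 * giving up after max_tries attempts. Returns the last value read.
 * As in the patch, a negative (error) value also ends the loop.
 */
static int poll_phy_reg(phy_read_fn read_reg, void *ctx, int max_tries)
{
	int val = 0;
	int i;

	for (i = 0; i < max_tries; i++) {
		val = read_reg(ctx);
		if (val != 0 && val != 0xFFFF)
			break;
		/* the driver waits 1 ms here before retrying */
	}
	return val;
}

/* A register value is usable only if it is positive and not all-ones. */
static int phy_reg_valid(int val)
{
	return val > 0 && val != 0xFFFF;
}

/* Fake PHY for demonstration: reads 0xFFFF until *countdown reaches zero. */
static int fake_phy_read(void *ctx)
{
	int *countdown = ctx;

	if (*countdown > 0) {
		(*countdown)--;
		return 0xFFFF;
	}
	return 0x001C;
}
```

Separating the loop from the final validity check mirrors the patch: the
loop only decides when to stop retrying, and the check afterwards decides
whether the last value may be used to build the PHY ID.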

^ permalink raw reply related

* [PATCH 5/5] net-next:asix: V2 Update VERSION
From: Grant Grundler @ 2011-11-15 17:12 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, Allan Chou, Freddy Xin, Grant Grundler,
	Grant Grundler
In-Reply-To: <1321377163-26308-1-git-send-email-grundler@chromium.org>

From: Grant Grundler <grundler@google.com>

Only update VERSION to reflect previous changes.

Signed-off-by: Grant Grundler <grundler@chromium.org>
---
 drivers/net/usb/asix.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/usb/asix.c b/drivers/net/usb/asix.c
index f870ab9..e6fed4d 100644
--- a/drivers/net/usb/asix.c
+++ b/drivers/net/usb/asix.c
@@ -36,7 +36,7 @@
 #include <linux/usb/usbnet.h>
 #include <linux/slab.h>
 
-#define DRIVER_VERSION "26-Sep-2011"
+#define DRIVER_VERSION "08-Nov-2011"
 #define DRIVER_NAME "asix"
 
 /* ASIX AX8817X based USB 2.0 Ethernet Devices */
-- 
1.7.3.1

^ permalink raw reply related

* [PATCH 4/5] net-next:asix: V2 more fixes for ax88178 phy init sequence
From: Grant Grundler @ 2011-11-15 17:12 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, Allan Chou, Freddy Xin, Grant Grundler,
	Grant Grundler
In-Reply-To: <1321377163-26308-1-git-send-email-grundler@chromium.org>

From: Grant Grundler <grundler@google.com>

Now works on Samsung Series 5 (chromebook)

Two fixes here:
o use 0x7F mask for phymode
o read phyid *AFTER* phy is powered up (via GPIOs)

Signed-off-by: Allan Chou <allan@asix.com.tw>
Signed-off-by: Grant Grundler <grundler@chromium.org>
---
Dave,
Apologies again for botching this patch (not compiling).
I had failed to s/ax88178_reset/asix_sw_reset/ and gave a blend of the two.
I've reviewed and compile tested all 5 patches.

 drivers/net/usb/asix.c |   22 +++++++++++++++-------
 1 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/drivers/net/usb/asix.c b/drivers/net/usb/asix.c
index 8462be5..f870ab9 100644
--- a/drivers/net/usb/asix.c
+++ b/drivers/net/usb/asix.c
@@ -1248,6 +1248,7 @@ static int ax88178_reset(struct usbnet *dev)
 	__le16 eeprom;
 	u8 status;
 	int gpio0 = 0;
+	u32 phyid;
 
 	asix_read_cmd(dev, AX_CMD_READ_GPIOS, 0, 0, 1, &status);
 	dbg("GPIO Status: 0x%04x", status);
@@ -1263,12 +1264,13 @@ static int ax88178_reset(struct usbnet *dev)
 		data->ledmode = 0;
 		gpio0 = 1;
 	} else {
-		data->phymode = le16_to_cpu(eeprom) & 7;
+		data->phymode = le16_to_cpu(eeprom) & 0x7F;
 		data->ledmode = le16_to_cpu(eeprom) >> 8;
 		gpio0 = (le16_to_cpu(eeprom) & 0x80) ? 0 : 1;
 	}
 	dbg("GPIO0: %d, PhyMode: %d", gpio0, data->phymode);
 
+	/* Power up external GigaPHY through AX88178 GPIO pin */
 	asix_write_gpio(dev, AX_GPIO_RSE | AX_GPIO_GPO_1 | AX_GPIO_GPO1EN, 40);
 	if ((le16_to_cpu(eeprom) >> 8) != 1) {
 		asix_write_gpio(dev, 0x003c, 30);
@@ -1280,6 +1282,13 @@ static int ax88178_reset(struct usbnet *dev)
 		asix_write_gpio(dev, AX_GPIO_GPO1EN | AX_GPIO_GPO_1, 30);
 	}
 
+	/* Read PHYID register *AFTER* powering up PHY */
+	phyid = asix_get_phyid(dev);
+	dbg("PHYID=0x%08x", phyid);
+
+	/* Set AX88178 to enable MII/GMII/RGMII interface for external PHY */
+	asix_write_cmd(dev, AX_CMD_SW_PHY_SELECT, 0, 0, 0, NULL);
+
 	asix_sw_reset(dev, 0);
 	msleep(150);
 
@@ -1424,7 +1433,6 @@ static int ax88178_bind(struct usbnet *dev, struct usb_interface *intf)
 {
 	int ret;
 	u8 buf[ETH_ALEN];
-	u32 phyid;
 	struct asix_data *data = (struct asix_data *)&dev->data;
 
 	data->eeprom_len = AX88772_EEPROM_LEN;
@@ -1451,12 +1459,12 @@ static int ax88178_bind(struct usbnet *dev, struct usb_interface *intf)
 	dev->net->netdev_ops = &ax88178_netdev_ops;
 	dev->net->ethtool_ops = &ax88178_ethtool_ops;
 
-	phyid = asix_get_phyid(dev);
-	dbg("PHYID=0x%08x", phyid);
+	/* Blink LEDS so users know driver saw dongle */
+	asix_sw_reset(dev, 0);
+	msleep(150);
 
-	ret = ax88178_reset(dev);
-	if (ret < 0)
-		return ret;
+	asix_sw_reset(dev, AX_SWRESET_PRL | AX_SWRESET_IPPD);
+	msleep(150);
 
 	/* Asix framing packs multiple eth frames into a 2K usb bulk transfer */
 	if (dev->driver_info->flags & FLAG_FRAMING_AX) {
-- 
1.7.3.1

^ permalink raw reply related

* [PATCH 3/5] net-next:asix: reduce AX88772 init time by about 2 seconds
From: Grant Grundler @ 2011-11-15 17:12 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, Allan Chou, Freddy Xin, Grant Grundler,
	Grant Grundler
In-Reply-To: <1321377163-26308-1-git-send-email-grundler@chromium.org>

From: Grant Grundler <grundler@google.com>

ax88772_reset takes about 2 seconds and is called twice.
Once from ax88772_bind() directly and again indirectly from usbnet_open().
Reset the USB FW/Phy enough to blink the LEDs when inserted.

Signed-off-by: Allan Chou <allan@asix.com.tw>
Signed-off-by: Grant Grundler <grundler@chromium.org>
---
 drivers/net/usb/asix.c |   30 +++++++++++++++++++++++++-----
 1 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/drivers/net/usb/asix.c b/drivers/net/usb/asix.c
index b4675e8..8462be5 100644
--- a/drivers/net/usb/asix.c
+++ b/drivers/net/usb/asix.c
@@ -1083,7 +1083,7 @@ static const struct net_device_ops ax88772_netdev_ops = {
 
 static int ax88772_bind(struct usbnet *dev, struct usb_interface *intf)
 {
-	int ret;
+	int ret, embd_phy;
 	struct asix_data *data = (struct asix_data *)&dev->data;
 	u8 buf[ETH_ALEN];
 	u32 phyid;
@@ -1108,16 +1108,36 @@ static int ax88772_bind(struct usbnet *dev, struct usb_interface *intf)
 	dev->mii.reg_num_mask = 0x1f;
 	dev->mii.phy_id = asix_get_phy_addr(dev);
 
-	phyid = asix_get_phyid(dev);
-	dbg("PHYID=0x%08x", phyid);
-
 	dev->net->netdev_ops = &ax88772_netdev_ops;
 	dev->net->ethtool_ops = &ax88772_ethtool_ops;
 
-	ret = ax88772_reset(dev);
+	embd_phy = ((dev->mii.phy_id & 0x1f) == 0x10 ? 1 : 0);
+
+	/* Reset the PHY to normal operation mode */
+	ret = asix_write_cmd(dev, AX_CMD_SW_PHY_SELECT, embd_phy, 0, 0, NULL);
+	if (ret < 0) {
+		dbg("Select PHY #1 failed: %d", ret);
+		return ret;
+	}
+
+	ret = asix_sw_reset(dev, AX_SWRESET_IPPD | AX_SWRESET_PRL);
 	if (ret < 0)
 		return ret;
 
+	msleep(150);
+
+	ret = asix_sw_reset(dev, AX_SWRESET_CLEAR);
+	if (ret < 0)
+		return ret;
+
+	msleep(150);
+
+	ret = asix_sw_reset(dev, embd_phy ? AX_SWRESET_IPRL : AX_SWRESET_PRTE);
+
+	/* Read PHYID register *AFTER* the PHY was reset properly */
+	phyid = asix_get_phyid(dev);
+	dbg("PHYID=0x%08x", phyid);
+
 	/* Asix framing packs multiple eth frames into a 2K usb bulk transfer */
 	if (dev->driver_info->flags & FLAG_FRAMING_AX) {
 		/* hard_mtu  is still the default - the device does not support
-- 
1.7.3.1

^ permalink raw reply related

* [PATCH 1/5] net-next:asix:PHY_MODE_RTL8211CL should be 0xC
From: Grant Grundler @ 2011-11-15 17:12 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-kernel, Allan Chou, Freddy Xin, Grant Grundler

From: Grant Grundler <grundler@google.com>

Use correct value for rtl phy support.
(rtl phy are in AX88178 devices like NWU220G and USB2-ET1000).

Signed-off-by: Allan Chou <allan@asix.com.tw>
Tested-by: Grant Grundler <grundler@chromium.org>
---
 drivers/net/usb/asix.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/usb/asix.c b/drivers/net/usb/asix.c
index e81e22e..873860d 100644
--- a/drivers/net/usb/asix.c
+++ b/drivers/net/usb/asix.c
@@ -163,7 +163,7 @@
 #define MARVELL_CTRL_TXDELAY	0x0002
 #define MARVELL_CTRL_RXDELAY	0x0080
 
-#define	PHY_MODE_RTL8211CL	0x0004
+#define	PHY_MODE_RTL8211CL	0x000C
 
 /* This structure cannot exceed sizeof(unsigned long [5]) AKA 20 bytes */
 struct asix_data {
-- 
1.7.3.1

^ permalink raw reply related

* Re: [PATCH] bonding: Don't allow mode change via sysfs with slaves present
From: Andy Gospodarek @ 2011-11-15 17:00 UTC (permalink / raw)
  To: Veaceslav Falico; +Cc: netdev, Andy Gospodarek, Jay Vosburgh
In-Reply-To: <1321375482-8637-1-git-send-email-vfalico@redhat.com>

On Tue, Nov 15, 2011 at 05:44:42PM +0100, Veaceslav Falico wrote:
> When changing mode via bonding's sysfs, the slaves are not initialized
> correctly. Forbid changing modes with slaves present to ensure that every
> slave is initialized correctly via bond_enslave().
> 
> Signed-off-by: Veaceslav Falico <vfalico@redhat.com>

Looks good.  This behavior forces someone who wants to change the mode to
go through steps that are almost as destructive as when module options
are used to configure the mode.  I do not see a problem with this.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>

^ permalink raw reply

* [PATCH] bonding: Don't allow mode change via sysfs with slaves present
From: Veaceslav Falico @ 2011-11-15 16:44 UTC (permalink / raw)
  To: netdev; +Cc: Andy Gospodarek, Jay Vosburgh

When changing mode via bonding's sysfs, the slaves are not initialized
correctly. Forbid changing modes with slaves present to ensure that every
slave is initialized correctly via bond_enslave().

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
---
 drivers/net/bonding/bond_sysfs.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 5a20804..4ef7e2f 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -319,6 +319,13 @@ static ssize_t bonding_store_mode(struct device *d,
 		goto out;
 	}
 
+	if (bond->slave_cnt > 0) {
+		pr_err("unable to update mode of %s because it has slaves.\n",
+			bond->dev->name);
+		ret = -EPERM;
+		goto out;
+	}
+
 	new_value = bond_parse_parm(buf, bond_mode_tbl);
 	if (new_value < 0)  {
 		pr_err("%s: Ignoring invalid mode value %.*s.\n",
-- 
1.7.6.4

^ permalink raw reply related

* Re: [PATCH] r8169: add module param for control of ASPM disable
From: Matthew Garrett @ 2011-11-15 16:32 UTC (permalink / raw)
  To: Todd Broch
  Cc: Francois Romieu, Realtek linux nic maintainers, netdev,
	Hayes Wang
In-Reply-To: <CA+iF6Rog3ptpmQZzhcRODmZUKN18_uw5t9xfpQjbJ86qKUA0eQ@mail.gmail.com>

On Tue, Nov 15, 2011 at 08:27:41AM -0800, Todd Broch wrote:
> On Sat, Nov 12, 2011 at 2:46 AM, Francois Romieu <romieu@fr.zoreil.com> wrote:
> >
> > Re-visiting the original change that disabled ASPM,
> 
> http://www.google.com/url?q=http%3A//git.kernel.org/%3Fp%3Dlinux/kernel/git/torvalds/linux-2.6.git%3Ba%3Dcommit%3Bh%3Dba04c7c93bbcb48ce880cf75b6e9dffcd79d4c7b&usg=AFQjCNFfPARrhwg-nBtW09W_n4qr1hgvdA
> 
> Led me to,
>   https://bugzilla.redhat.com/show_bug.cgi?id=642861#c4
> 
> This comment by tomi.leppikangas@ is later recanted as a h/w issue in,
>  https://bugzilla.redhat.com/show_bug.cgi?id=642861#c9
>    'I am now pretty sure that my problems were caused by faulty hardware.
>     Cpu or motherboard seems to be broken, so pcie_aspm=off  didnt help for
>     me. Sorry about misleading info.'

Mike Khusid's issue was fixed by disabling ASPM.

> My assessment from above is that ASPM was disabled prematurely and, given
> the power savings, should be re-enabled.

Power savings are great. I'm all in favour of power savings. But not 
when they break otherwise working setups.

> I'd certainly be agreeable to switching the patch so that the default is
> disabled. Unfortunately I fear that means most will never benefit from the
> power savings.

I'd recommend working with your hardware partners to figure out which 
parts are expected to work and which aren't. There's no problem with 
making this code conditional on product ID or version.

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply

* Re: [PATCH 5/5] net-next:asix: update VERSION and white space changes
From: Grant Grundler @ 2011-11-15 15:58 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-kernel, allan, freddy, kernel
In-Reply-To: <20111114.214542.1423779515286773837.davem@davemloft.net>

On Mon, Nov 14, 2011 at 6:45 PM, David Miller <davem@davemloft.net> wrote:
> From: David Miller <davem@davemloft.net>
> Date: Mon, 14 Nov 2011 21:41:51 -0500 (EST)
>
>> Come on man... are you kidding me?

Dave,
my apologies. That's obviously my fail.

The problem is I can't test your git tree on my systems...but I should
have at least compile tested it. Or just submitted the changes
straight from chromium.org tree. *sigh*

> Want to know what really pisses me off about this?
>
> All of Mark Lord's hard work to bring the entire vendor driver over
> was thrown out.

Not entirely correct as Mark pointed out. I was able to convince ASIX
they should be working with Mark and they committed to doing so.

> And it was thrown out in favor of this!  Code that doesn't even
> compile.

*sigh* sorry...I'll resubmit the entire mess and compile test first. /o\

thanks for your patience,
grant

^ permalink raw reply

* Re: [PATCH] net: fsl_pq_mdio: fix non tbi phy access
From: Baruch Siach @ 2011-11-15 15:44 UTC (permalink / raw)
  To: Andy Fleming; +Cc: netdev@vger.kernel.org, linuxppc-dev
In-Reply-To: <74631EEB-F6F8-4969-AD05-81DEAFB0EAB4@freescale.com>

Hi Andy,

On Tue, Nov 15, 2011 at 09:06:03AM -0600, Andy Fleming wrote:
> On Nov 14, 2011, at 11:17 PM, Baruch Siach wrote:
> > On Mon, Nov 14, 2011 at 09:04:47PM +0000, Fleming Andy-AFLEMING wrote:

[snip]

> >> And looking at the p1010si.dtsi, I see that it's automatically there for 
> >> you.
> >> 
> >> How were you breaking?
> > 
> > Adding linuxppc to Cc.
> > 
> > My board is P1011 based, the single core version of P1020, not P1010. In 
> > p1020si.dtsi I see no tbi node. In p1020rdb.dts I see a tbi node but only for 
> > mdio@25000, not mdio@24000, which is what I'm using.
> > 
> > Am I missing something?
> 
> Well, that's a bug. In truth, the silicon dtsi trees should not have tbi 
> nodes, as that's highly machine-specific. The p1020rdb is apparently relying 
> on the old behavior, which is broken, and due to the fact that the first 
> ethernet interface doesn't *use* the TBI PHY.
> 
> You should add this to your board tree:
> 
>                 mdio@24000 {
> 
>                         tbi0: tbi-phy@11 {
>                                 reg = <0x11>;
>                                 device_type = "tbi-phy";
>                         };
>                 };
> 
> And add the PHYs you use, as well as set reg (and the value after the "@") 
> to something that makes sense for your board.

Thanks for your detailed explanation and prompt response. I've added a tbi 
node, dropped my patch, and now my board works as expected.

> I am going to go right now, and add tbi nodes for all of the Freescale 
> platforms. I will also modify the fsl_pq_mdio code to be more explicit about 
> its reason for failure.

Please Cc me on these.

Thanks,
baruch

-- 
                                                     ~. .~   Tk Open Systems
=}------------------------------------------------ooO--U--Ooo------------{=
   - baruch@tkos.co.il - tel: +972.2.679.5364, http://www.tkos.co.il -

^ permalink raw reply

* Re: [RFC] kvm tools: Implement multiple VQ for virtio-net
From: Sasha Levin @ 2011-11-15 15:30 UTC (permalink / raw)
  To: Krishna Kumar2
  Cc: Asias He, gorcunov, kvm, mingo, Michael S. Tsirkin, netdev,
	penberg, Rusty Russell, virtualization
In-Reply-To: <OFDA747DDD.8D1C8FD8-ON65257949.001837DA-65257949.0019D25F@in.ibm.com>

On Tue, 2011-11-15 at 10:14 +0530, Krishna Kumar2 wrote:
> Sasha Levin <levinsasha928@gmail.com> wrote on 11/14/2011 03:45:40 PM:
> 
> > > Why both the bandwidth and latency performance are dropping so
> > > dramatically with multiple VQ?
> >
> > > It looks like there's no hash sync between host and guest, which makes
> > the RX VQ change for every packet. This is my guess.
> 
> Yes, I confirmed this happens for macvtap. I am
> using ixgbe - it calls skb_record_rx_queue when
> a skb is allocated, but sets rxhash when a packet
> arrives. Macvtap is relying on record_rx_queue
> first ahead of rxhash (as part of my patch making
> macvtap multiqueue), hence different skbs result
> in macvtap selecting different vq's.

I'm seeing this behavior in non-macvtap setups as well (simple
tap <-> virtio-net).
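
A minimal sketch of the queue-selection effect described above, as a
userspace model rather than the macvtap or tap code. The struct and both
helper names are hypothetical, and the only assumption is that a vq is
picked from either the recorded rx queue or the flow hash, modulo the
number of vqs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-packet metadata, loosely modelled on what an skb carries. */
struct pkt_meta {
	uint32_t rxhash;	/* flow hash, identical for packets of one flow */
	uint16_t rx_queue;	/* queue recorded when the buffer was allocated */
	bool has_rx_queue;	/* whether a queue was recorded at all */
};

/* Prefer the recorded rx queue, fall back to the flow hash. */
static unsigned int pick_vq_recorded_first(const struct pkt_meta *p,
					   unsigned int numvq)
{
	assert(numvq > 0);
	if (p->has_rx_queue)
		return p->rx_queue % numvq;
	return p->rxhash % numvq;
}

/* Prefer the flow hash, fall back to the recorded rx queue. */
static unsigned int pick_vq_hash_first(const struct pkt_meta *p,
				       unsigned int numvq)
{
	assert(numvq > 0);
	if (p->rxhash)
		return p->rxhash % numvq;
	if (p->has_rx_queue)
		return p->rx_queue % numvq;
	return 0;
}
```

With two packets of the same flow whose buffers happened to be allocated
on different queues, the first helper can return different vqs while the
second keeps the flow on one vq.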

-- 

Sasha.


^ permalink raw reply

* [PATCH] iproute2: Display closed UDP sockets on 'ss -ul'
From: Petr Šabata @ 2011-11-15 15:19 UTC (permalink / raw)
  To: netdev; +Cc: Petr Šabata

This patch emulates 'netstat -ul' behavior, showing 'closed'
(state 07) UDP sockets when ss is called with '-ul' options.
Although dirty, this seems like the least invasive way to fix
it and shouldn't really break anything.

Signed-off-by: Petr Šabata <contyk@redhat.com>
---
 misc/ss.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/misc/ss.c b/misc/ss.c
index 1353620..af774d1 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -2568,7 +2568,7 @@ int main(int argc, char *argv[])
 			current_filter.states = SS_ALL;
 			break;
 		case 'l':
-			current_filter.states = (1<<SS_LISTEN);
+			current_filter.states = (1<<SS_LISTEN) | (1<<SS_CLOSE);
 			break;
 		case '4':
 			preferred_family = AF_INET;
-- 
1.7.7.1
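
For readers unfamiliar with the filter, here is a small sketch of how a
state bitmask like the one in the hunk above selects sockets. The enum
order is assumed to match the one used in ss.c (so that SS_CLOSE is 7,
the "state 07" mentioned in the description). The helper names are
hypothetical and not part of iproute2.

```c
#include <assert.h>
#include <stdbool.h>

/* Socket states, assumed to be numbered as in ss.c. */
enum {
	SS_UNKNOWN,
	SS_ESTABLISHED,
	SS_SYN_SENT,
	SS_SYN_RECV,
	SS_FIN_WAIT1,
	SS_FIN_WAIT2,
	SS_TIME_WAIT,
	SS_CLOSE,
	SS_CLOSE_WAIT,
	SS_LAST_ACK,
	SS_LISTEN,
	SS_CLOSING,
	SS_MAX
};

/* Hypothetical helper: is this state selected by the filter mask? */
static bool state_selected(unsigned int states_mask, int state)
{
	assert(state >= 0 && state < SS_MAX);
	return (states_mask & (1u << state)) != 0;
}

/* Mask set by '-l' before the patch: listening sockets only. */
static unsigned int listen_mask_old(void)
{
	return 1u << SS_LISTEN;
}

/* Mask set by '-l' after the patch: listening or closed sockets. */
static unsigned int listen_mask_new(void)
{
	return (1u << SS_LISTEN) | (1u << SS_CLOSE);
}
```

An unconnected UDP socket sits in the closed state, so it only passes
the filter once the SS_CLOSE bit is part of the mask.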

^ permalink raw reply related

* Re: [PATCH 5/5] net-next:asix: update VERSION and white space changes
From: Mark Lord @ 2011-11-15 15:19 UTC (permalink / raw)
  To: David Miller; +Cc: grundler, netdev, linux-kernel, allan, freddy
In-Reply-To: <20111114.214542.1423779515286773837.davem@davemloft.net>

On 11-11-14 09:45 PM, David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Mon, 14 Nov 2011 21:41:51 -0500 (EST)
> 
>> Come on man... are you kidding me?
> 
> Want to know what really pisses me off about this?
> 
> All of Mark Lord's hard work to bring the entire vendor driver over
> was thrown out.

Well, ASIX and I appear to be back on track again.

So once the dust settles in net-dev with Grant's patches,
I will take over development of the asix driver,
and start sending you (Dave) patches to merge the
rest of the vendor's driver code.

With luck, it might all make it in there in time for the next (3.3) merge.

Cheers

^ permalink raw reply

* Re: [PATCH] net: fsl_pq_mdio: fix non tbi phy access
From: Andy Fleming @ 2011-11-15 15:06 UTC (permalink / raw)
  To: Baruch Siach; +Cc: netdev@vger.kernel.org, linuxppc-dev
In-Reply-To: <20111115051713.GA4052@sapphire.tkos.co.il>

On Nov 14, 2011, at 11:17 PM, Baruch Siach wrote:

> Hi Andy,
> 
> On Mon, Nov 14, 2011 at 09:04:47PM +0000, Fleming Andy-AFLEMING wrote:
>> Well, this got applied quickly, so I guess I can't NAK, but this requires discussion.
>> 
>> On Nov 14, 2011, at 0:22, "Baruch Siach" <baruch@tkos.co.il> wrote:
>> 
>>> Since 952c5ca1 (fsl_pq_mdio: Clean up tbi address configuration) .probe returns
>>> -EBUSY when the "tbi-phy" node is missing. Fix this.
>> 
>> It returns an error because it finds no tbi node. Because without the tbi 
>> node, there is no way for the driver to determine which address to set.
>> 
>> Your solution is to ignore the error, and hope. That's a broken approach.  
>> The real solution for a p1010 should be to have a tbi node in the dts.
> 
> Can you elaborate a bit on why this approach is broken? The PHY used to work 
> for me until 952c5ca1, and with this applied.

Yes, well, just because a problem goes away when a patch is applied does not mean that the patch is correct, or that it made things work.

An explanation:

In order to support certain types of serial data interfaces with external PHYs (like SGMII), it is necessary to translate the MAC's data signaling into the serialized signaling. On Freescale parts, this is done via a SerDes block, but the SerDes link needs a small amount of management. To perform this management, we have an onboard "TBI" PHY. This PHY is highly integrated with the MAC and MDIO devices. Each MAC has two relevant components:

1) a TBIPA register, which declares the address of the TBI PHY
2) an associated MDIO controller.

In order to configure the SerDes link, it is necessary to communicate via the "local" MDIO controller with the TBI PHY. For most of the MACs, this is simple: Choose an address for TBIPA, and then use that address to communicate with the TBI PHY. However, the *first* MDIO controller is also used to communicate with external PHYs. On this controller, we have to be fairly particular about which address we put in TBIPA, because all transactions to that address will go to the TBI PHY. On older parts, this value defaulted to "0", but it now defaults to "31", I believe.

Ok, so now we're at this code. The of_mdiobus_register() function will parse the device tree, and find all of the PHYs on the MDIO bus, and register them as devices. In order to ensure that all of those PHYs are accessible, we *MUST* set TBIPA to something that won't conflict with any existing addresses. The mechanism we have chosen for this is to assign the address in the device tree, via a tbi-phy node.

My recent patch changed the behavior, because we used to try to find a free address via scanning, but this was somewhat ugly, and failed (as you noticed) due to uninitialized mutexes.

The reason your latest patch is wrong is that it doesn't set the TBIPA register at all if there is no tbi-phy node. Instead, it just relies on luck, hoping that the TBIPA register was already set to something that doesn't conflict. It will work if 0x1f or 0 aren't necessary PHY addresses for your board, or if the firmware set it to something sensible.
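
The conflict described above can be sketched in plain C. This is an illustration only, not the fsl_pq_mdio code: both helpers are hypothetical, and the only assumption is that MDIO PHY addresses are 5 bits wide (0..31).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* MDIO PHY addresses are 5 bits wide, so valid values are 0..31. */
#define MDIO_ADDR_COUNT 32

/* Hypothetical helper: bitmask of addresses claimed by external PHYs. */
static uint32_t used_addr_mask(const uint8_t *phy_addrs, size_t n)
{
	uint32_t mask = 0;

	assert(phy_addrs != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (phy_addrs[i] < MDIO_ADDR_COUNT)
			mask |= UINT32_C(1) << phy_addrs[i];
	return mask;
}

/* Hypothetical check: would this TBIPA value shadow an external PHY? */
static bool tbipa_conflicts(uint8_t tbipa, const uint8_t *phy_addrs, size_t n)
{
	if (tbipa >= MDIO_ADDR_COUNT)
		return true;	/* not a usable MDIO address at all */
	return ((used_addr_mask(phy_addrs, n) >> tbipa) & 1u) != 0;
}
```

For a board with external PHYs at 0, 1 and 31, both historical TBIPA defaults (0 and 0x1f) would collide, while an address such as 0x11 would not.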

> 
>> And looking at the p1010si.dtsi, I see that it's automatically there for 
>> you.
>> 
>> How were you breaking?
> 
> Adding linuxppc to Cc.
> 
> My board is P1011 based, the single core version of P1020, not P1010. In 
> p1020si.dtsi I see no tbi node. In p1020rdb.dts I see a tbi node but only for 
> mdio@25000, not mdio@24000, which is what I'm using.
> 
> Am I missing something?

Well, that's a bug. In truth, the silicon dtsi trees should not have tbi nodes, as that's highly machine-specific. The p1020rdb is apparently relying on the old behavior, which is broken, and due to the fact that the first ethernet interface doesn't *use* the TBI PHY.

You should add this to your board tree:

                mdio@24000 {

                        tbi0: tbi-phy@11 {
                                reg = <0x11>;
                                device_type = "tbi-phy";
                        };
                };

And add the PHYs you use, as well as set reg (and the value after the "@") to something that makes sense for your board.

I am going to go right now, and add tbi nodes for all of the Freescale platforms. I will also modify the fsl_pq_mdio code to be more explicit about its reason for failure.

Andy

^ permalink raw reply

* [PATCH net-next] net: use jump_label for netstamp_needed
From: Eric Dumazet @ 2011-11-15 14:12 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

netstamp_needed seems a good candidate for jump_label conversion.

This avoids 3 conditional branches per incoming packet in fast path.

No measurable difference, given that these conditional branches are
predicted on modern cpus. Only a small icache reduction, thanks to the
unlikely() stuff.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/core/dev.c |   32 ++++++++++++++------------------
 1 file changed, 14 insertions(+), 18 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 6ba50a1..51f89cd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -137,6 +137,7 @@
 #include <linux/if_pppox.h>
 #include <linux/ppp_defs.h>
 #include <linux/net_tstamp.h>
+#include <linux/jump_label.h>
 
 #include "net-sysfs.h"
 
@@ -1449,34 +1450,32 @@ int call_netdevice_notifiers(unsigned long val, struct net_device *dev)
 }
 EXPORT_SYMBOL(call_netdevice_notifiers);
 
-/* When > 0 there are consumers of rx skb time stamps */
-static atomic_t netstamp_needed = ATOMIC_INIT(0);
+static struct jump_label_key netstamp_needed __read_mostly;
 
 void net_enable_timestamp(void)
 {
-	atomic_inc(&netstamp_needed);
+	jump_label_inc(&netstamp_needed);
 }
 EXPORT_SYMBOL(net_enable_timestamp);
 
 void net_disable_timestamp(void)
 {
-	atomic_dec(&netstamp_needed);
+	jump_label_dec(&netstamp_needed);
 }
 EXPORT_SYMBOL(net_disable_timestamp);
 
 static inline void net_timestamp_set(struct sk_buff *skb)
 {
-	if (atomic_read(&netstamp_needed))
+	skb->tstamp.tv64 = 0;
+	if (static_branch(&netstamp_needed))
 		__net_timestamp(skb);
-	else
-		skb->tstamp.tv64 = 0;
 }
 
-static inline void net_timestamp_check(struct sk_buff *skb)
-{
-	if (!skb->tstamp.tv64 && atomic_read(&netstamp_needed))
-		__net_timestamp(skb);
-}
+#define net_timestamp_check(COND, SKB)			\
+	if (static_branch(&netstamp_needed)) {		\
+		if ((COND) && !(SKB)->tstamp.tv64)	\
+			__net_timestamp(SKB);		\
+	}						\
 
 static int net_hwtstamp_validate(struct ifreq *ifr)
 {
@@ -2997,8 +2996,7 @@ int netif_rx(struct sk_buff *skb)
 	if (netpoll_rx(skb))
 		return NET_RX_DROP;
 
-	if (netdev_tstamp_prequeue)
-		net_timestamp_check(skb);
+	net_timestamp_check(netdev_tstamp_prequeue, skb);
 
 	trace_netif_rx(skb);
 #ifdef CONFIG_RPS
@@ -3230,8 +3228,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
 	int ret = NET_RX_DROP;
 	__be16 type;
 
-	if (!netdev_tstamp_prequeue)
-		net_timestamp_check(skb);
+	net_timestamp_check(!netdev_tstamp_prequeue, skb);
 
 	trace_netif_receive_skb(skb);
 
@@ -3362,8 +3359,7 @@ out:
  */
 int netif_receive_skb(struct sk_buff *skb)
 {
-	if (netdev_tstamp_prequeue)
-		net_timestamp_check(skb);
+	net_timestamp_check(netdev_tstamp_prequeue, skb);
 
 	if (skb_defer_rx_timestamp(skb))
 		return NET_RX_SUCCESS;
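
The control flow of the reworked net_timestamp_check() can be modelled
in userspace as below. This is a sketch under stated assumptions: a
plain counter stands in for the jump label key (a real jump label
patches the branch in the instruction stream), and fake_skb, fake_clock
and the function names are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the jump label key: a plain user count. */
static int netstamp_users;

/* Hypothetical stand-ins for struct sk_buff and the clock source. */
struct fake_skb {
	int64_t tstamp;		/* 0 means "not stamped yet" */
};
static int64_t fake_clock = 1000;

static void enable_timestamp(void)
{
	netstamp_users++;
}

static void disable_timestamp(void)
{
	assert(netstamp_users > 0);
	netstamp_users--;
}

/* The cheap "anyone interested?" test comes first, as in the patch. */
static void timestamp_check(bool cond, struct fake_skb *skb)
{
	if (netstamp_users > 0) {
		if (cond && skb->tstamp == 0)
			skb->tstamp = fake_clock++;
	}
}
```

When no consumer has enabled timestamps, neither the condition nor the
skb field is examined, which is the case the patch optimizes.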

^ permalink raw reply related

* Re: [PATCH 3/3] MIPS: Octeon: Rearrange CVMX files in preperation for device tree
From: Ralf Baechle @ 2011-11-15 14:08 UTC (permalink / raw)
  To: ddaney.cavm; +Cc: linux-mips, netdev, gregkh, devel, David Daney
In-Reply-To: <1320971387-29343-4-git-send-email-ddaney.cavm@gmail.com>

Queued for 3.3.  Thanks,

  Ralf

^ permalink raw reply

* Re: [PATCH 1/3] MIPS: Octeon: Move some Ethernet support files out of staging.
From: Ralf Baechle @ 2011-11-15 14:08 UTC (permalink / raw)
  To: ddaney.cavm; +Cc: linux-mips, netdev, gregkh, devel, David Daney
In-Reply-To: <1320971387-29343-2-git-send-email-ddaney.cavm@gmail.com>

Queued for 3.3.  Thanks,

  Ralf

^ permalink raw reply

* Re: [PATCH 2/3] MIPS: Octeon: Update bootloader board type constants.
From: Ralf Baechle @ 2011-11-15 14:08 UTC (permalink / raw)
  To: ddaney.cavm; +Cc: linux-mips, netdev, gregkh, devel, David Daney
In-Reply-To: <1320971387-29343-3-git-send-email-ddaney.cavm@gmail.com>

Queued for 3.3.  Thanks,

  Ralf

^ permalink raw reply
