Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH] cxgb3: Set vlan_feature on net_device
From: brenohl @ 2012-07-18 19:29 UTC (permalink / raw)
  To: divy; +Cc: netdev, Breno Leitao

cxgb3 interface has a bad performance when VLAN is set. On my current
setup, a PowerLinux 7R2, I am able to get around 7 Gbps on a TCP_STREAM
(8 instances, 4k message).
With this patch, I am able to reach 9.5 Gbps.

Signed-off-by: Breno Leitao <brenohl@br.ibm.com>

diff --git a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c
index abb6ce7..fcf4b31 100644
--- a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c
@@ -3173,6 +3173,9 @@ static void __devinit cxgb3_init_iscsi_mac(struct net_device *dev)
 	pi->iscsic.mac_addr[3] |= 0x80;
 }
 
+#define TSO_FLAGS (NETIF_F_TSO | NETIF_F_TSO6 | NETIF_F_TSO_ECN)
+#define VLAN_FEAT (NETIF_F_SG | NETIF_F_IP_CSUM | TSO_FLAGS | \
+			NETIF_F_IPV6_CSUM | NETIF_F_HIGHDMA)
 static int __devinit init_one(struct pci_dev *pdev,
 			      const struct pci_device_id *ent)
 {
@@ -3293,6 +3296,7 @@ static int __devinit init_one(struct pci_dev *pdev,
 		netdev->hw_features = NETIF_F_SG | NETIF_F_IP_CSUM |
 			NETIF_F_TSO | NETIF_F_RXCSUM | NETIF_F_HW_VLAN_RX;
 		netdev->features |= netdev->hw_features | NETIF_F_HW_VLAN_TX;
+		netdev->vlan_features |= netdev->features & VLAN_FEAT;
 		if (pci_using_dac)
 			netdev->features |= NETIF_F_HIGHDMA;
 
-- 
1.7.1

^ permalink raw reply related

* Re: [PATCH 08/15] ipv4: Kill routes during PMTU/redirect updates.
From: David Miller @ 2012-07-18 19:30 UTC (permalink / raw)
  To: joe; +Cc: netdev
In-Reply-To: <1342638944.2013.10.camel@joe2Laptop>

From: Joe Perches <joe@perches.com>
Date: Wed, 18 Jul 2012 12:15:44 -0700

> On Wed, 2012-07-18 at 11:23 -0700, David Miller wrote:
>> Mark them obsolete so there will be a re-lookup to fetch the
>> FIB nexthop exception info.
> []
>> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> []
>> @@ -716,8 +717,8 @@ static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flow
>>  					fnhe->fnhe_gw = new_gw;
>>  				spin_unlock_bh(&fnhe_lock);
>>  			}
>> -			rt->rt_gateway = new_gw;
>> -			rt->rt_flags |= RTCF_REDIRECTED;
>> +			if (kill_route)
>> +				rt->dst.obsolete = -2;
> 
> Perhaps -2 should be a #define?
> 
> Perhaps struct dst_entry.obsolete could be a char instead of
> a short and a pad byte could added for some future use.

First thing, char is not signed by default on all systems :-)

But yes, this should be cleaned up.  Also with a big fat comment above
the struct member detailing it's usage.

I'll put this on my TODO list, thanks a lot Joe.

^ permalink raw reply

* Re: [PATCH 10/15] ipv4: Cache input routes in fib_info nexthops.
From: David Miller @ 2012-07-18 19:30 UTC (permalink / raw)
  To: joe; +Cc: netdev
In-Reply-To: <1342639675.2013.18.camel@joe2Laptop>

From: Joe Perches <joe@perches.com>
Date: Wed, 18 Jul 2012 12:27:55 -0700

> Maybe a helper like:
> 
> 	if (some_do_cache_name(rth, res, itag, flags, &do_cache))
> 		goto foo;

Yep that would make a lot of sense.

^ permalink raw reply

* Re: r8169: link up, link down
From: Francois Romieu @ 2012-07-18 19:23 UTC (permalink / raw)
  To: J. Christopher Pereira; +Cc: netdev
In-Reply-To: <038001cd6512$613125c0$23937140$@cl>

J. Christopher Pereira <kripper@imatronix.cl> :
[...]
> dmesg says "eth0: RTL8110s at 0xffffc2000067ec00, 00:4f:4a:10:1e:cf, XID
> 04000000 IRQ 16".

It's an old chipset (RTL_GIGA_MAC_VER_03). The PCI remove / rescan trick
may or may not work.

> > Building a modern kernel is strongly suggested if the hardware includes a
> recent 816x chipset.
> 
> Is there any particular patch I could apply and just recompile the driver?

Your kernel is more than three years old. It is not _that_ long but there
are ~182 r8169 patches between v2.6.30 and v3.4. There will still be a lot
even after the trivial ones are factored out.

See git log -p v2.6.30..v3.4 --follow -- drivers/net/ethernet/realtek/r8169.c
for the whole gore.

> My hope was to first receive feedback and identify some probably related
> known bug, in order to avoid searching for a solution by trial and error or
> by updating the whole environment.

You should really try 3.4 and revert 036dafa28da1e2565a8529de2ae663c37b7a0060.
It's the best I can suggest.

-- 
Ueimor

^ permalink raw reply

* Re: [PATCH 08/15] ipv4: Kill routes during PMTU/redirect updates.
From: Joe Perches @ 2012-07-18 19:51 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20120718.123015.476222169838022819.davem@davemloft.net>

On Wed, 2012-07-18 at 12:30 -0700, David Miller wrote:
> From: Joe Perches <joe@perches.com>
> > Perhaps struct dst_entry.obsolete could be a char instead of
> > a short and a pad byte could added for some future use.
> 
> First thing, char is not signed by default on all systems :-)

yeah, yeah. I'm sure you'll dtrt :)

^ permalink raw reply

* Re: [PATCH 1/7] net-tcp: Fast Open base
From: Eric Dumazet @ 2012-07-18 19:55 UTC (permalink / raw)
  To: David Miller; +Cc: ycheng, hkchu, edumazet, ncardwell, sivasankar, netdev
In-Reply-To: <20120716.231644.1189536600250332545.davem@davemloft.net>

On Mon, 2012-07-16 at 23:16 -0700, David Miller wrote:
> From: Yuchung Cheng <ycheng@google.com>
> Date: Mon, 16 Jul 2012 14:16:44 -0700
> 
> > +#define TCPOPT_EXP		254	/* Experimental */
> > +/* Magic number to be after the option value for sharing TCP
> > + * experimental options. See draft-ietf-tcpm-experimental-options-00.txt
> > + */
> > +#define TCPOPT_FASTOPEN_MAGIC	0xF989
> 
> If I apply this, we're stuck supporting this experimental number
> forever.
> 
> Because somewhere, someone will have a kernel running using this
> number, so we have to support this option value as well as whatever
> the official one is.
> 
> Therefore I think the only logical thing we can do is only deploy
> this once an official option number is choosen.

Hi David

This is a chicken and egg problem.

IANA wont grant an official number like that in 2012+. Maybe if billions
of Android/linux devices use TFO in 2015 IANA will grant an official
number.

So we chose to follow Joe touch proposal
(http://tools.ietf.org/html/draft-ietf-tcpm-experimental-options-01) and
the magic 0xF989 was generated according to section 3) to avoid possible
clashes with other experimental options using code option 254

(Code options 253 & 254 are reserved for experimental use.
Linux Cookie extension uses 253 without a magic cookie so 253 cannot be
shared. By the way I wonder if anybody uses it... oh well...) 

Only servers will need to cope with this experimental option plus the
official one (_if_ IANA accepts to unblock one of the many reserved
options, in two or three years)

Yuchung only posted the Client side in this patch series. But we already
run the server side, and supporting the official TFO option plus the
experimental one is adding less than 10 lines of code.

So the plan would be :

1) Use the experimental 254 + magic on TFO Clients/Servers in 2012

2) When/If IANA grants an official number, add its support to servers
   (keeping support for experimental option as well)

3) One/two years later, switch client side to use this official number

4) Ten years later, remove experimental from server side.

Thanks !

PS :

TFO is not mandatory : If the initial SYN TFO option is not understood
by a server, it will reply with a SYN/ACK without the option and cookie,
and client will proceed as today.

^ permalink raw reply

* Re: [PATCH] SUNRPC: Prevent kernel stack corruption on long values of flush
From: Jim Rees @ 2012-07-18 20:00 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Sasha Levin, Trond.Myklebust, davem, davej, linux-nfs, netdev,
	linux-kernel
In-Reply-To: <20120718173913.GA1298@fieldses.org>

J. Bruce Fields wrote:

  On Tue, Jul 17, 2012 at 12:01:26AM +0200, Sasha Levin wrote:
  > The buffer size in read_flush() is too small for the longest possible values
  > for it. This can lead to a kernel stack corruption:
  
  Thanks!
  
  > 
  > diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
  > index 2afd2a8..f86d95e 100644
  > --- a/net/sunrpc/cache.c
  > +++ b/net/sunrpc/cache.c
  > @@ -1409,11 +1409,11 @@ static ssize_t read_flush(struct file *file, char __user *buf,
  >  			  size_t count, loff_t *ppos,
  >  			  struct cache_detail *cd)
  >  {
  > -	char tbuf[20];
  > +	char tbuf[22];
  
  I wonder how common this sort of calculation is in the kernel?  It might
  provide some peace of mind to be able to write this something like
  
  	char tbuf[MAXLEN_BASE10_UL + 2]  /* + 2 for final "\n\0" */

You could use something like:

    char tbuf[sizeof (unsigned long) * 24 / 10 + 1 + 2]; /* + 2 for final "\n\0" */

since there are roughly 10 bits for every 3 decimal digits.

But I'm obviously confused, because I don't understand why tbuf needs to be
any more than 10 + 2.

^ permalink raw reply

* [PATCH v3] ipv4: use seqlock for nh_exceptions
From: Julian Anastasov @ 2012-07-18 20:15 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

	Use global seqlock for the nh_exceptions. Call
fnhe_oldest with the right hash chain. Correct the diff
value for dst_set_expires.

v2: after suggestions from Eric Dumazet:
* get rid of spin lock fnhe_lock, rearrange update_or_create_fnhe
* continue daddr search in rt_bind_exception

v3:
* remove the daddr check before seqlock in rt_bind_exception
* restart lookup in rt_bind_exception on detected seqlock change,
as suggested by David Miller

Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 include/net/ip_fib.h |    2 +-
 net/ipv4/route.c     |  118 +++++++++++++++++++++++++++++---------------------
 2 files changed, 69 insertions(+), 51 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index e9ee1ca..2daf096 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -51,7 +51,7 @@ struct fib_nh_exception {
 	struct fib_nh_exception __rcu	*fnhe_next;
 	__be32				fnhe_daddr;
 	u32				fnhe_pmtu;
-	u32				fnhe_gw;
+	__be32				fnhe_gw;
 	unsigned long			fnhe_expires;
 	unsigned long			fnhe_stamp;
 };
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index f67e702..e9802d8 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1333,9 +1333,9 @@ static void ip_rt_build_flow_key(struct flowi4 *fl4, const struct sock *sk,
 		build_sk_flow_key(fl4, sk);
 }
 
-static DEFINE_SPINLOCK(fnhe_lock);
+static DEFINE_SEQLOCK(fnhe_seqlock);
 
-static struct fib_nh_exception *fnhe_oldest(struct fnhe_hash_bucket *hash, __be32 daddr)
+static struct fib_nh_exception *fnhe_oldest(struct fnhe_hash_bucket *hash)
 {
 	struct fib_nh_exception *fnhe, *oldest;
 
@@ -1358,47 +1358,63 @@ static inline u32 fnhe_hashfun(__be32 daddr)
 	return hval & (FNHE_HASH_SIZE - 1);
 }
 
-static struct fib_nh_exception *find_or_create_fnhe(struct fib_nh *nh, __be32 daddr)
+static void update_or_create_fnhe(struct fib_nh *nh, __be32 daddr, __be32 gw,
+				  u32 pmtu, unsigned long expires)
 {
-	struct fnhe_hash_bucket *hash = nh->nh_exceptions;
+	struct fnhe_hash_bucket *hash;
 	struct fib_nh_exception *fnhe;
 	int depth;
-	u32 hval;
+	u32 hval = fnhe_hashfun(daddr);
+
+	write_seqlock_bh(&fnhe_seqlock);
 
+	hash = nh->nh_exceptions;
 	if (!hash) {
-		hash = nh->nh_exceptions = kzalloc(FNHE_HASH_SIZE * sizeof(*hash),
-						   GFP_ATOMIC);
+		hash = kzalloc(FNHE_HASH_SIZE * sizeof(*hash), GFP_ATOMIC);
 		if (!hash)
-			return NULL;
+			goto out_unlock;
+		nh->nh_exceptions = hash;
 	}
 
-	hval = fnhe_hashfun(daddr);
 	hash += hval;
 
 	depth = 0;
 	for (fnhe = rcu_dereference(hash->chain); fnhe;
 	     fnhe = rcu_dereference(fnhe->fnhe_next)) {
 		if (fnhe->fnhe_daddr == daddr)
-			goto out;
+			break;
 		depth++;
 	}
 
-	if (depth > FNHE_RECLAIM_DEPTH) {
-		fnhe = fnhe_oldest(hash + hval, daddr);
-		goto out_daddr;
+	if (fnhe) {
+		if (gw)
+			fnhe->fnhe_gw = gw;
+		if (pmtu) {
+			fnhe->fnhe_pmtu = pmtu;
+			fnhe->fnhe_expires = expires;
+		}
+	} else {
+		if (depth > FNHE_RECLAIM_DEPTH)
+			fnhe = fnhe_oldest(hash);
+		else {
+			fnhe = kzalloc(sizeof(*fnhe), GFP_ATOMIC);
+			if (!fnhe)
+				goto out_unlock;
+
+			fnhe->fnhe_next = hash->chain;
+			rcu_assign_pointer(hash->chain, fnhe);
+		}
+		fnhe->fnhe_daddr = daddr;
+		fnhe->fnhe_gw = gw;
+		fnhe->fnhe_pmtu = pmtu;
+		fnhe->fnhe_expires = expires;
 	}
-	fnhe = kzalloc(sizeof(*fnhe), GFP_ATOMIC);
-	if (!fnhe)
-		return NULL;
-
-	fnhe->fnhe_next = hash->chain;
-	rcu_assign_pointer(hash->chain, fnhe);
 
-out_daddr:
-	fnhe->fnhe_daddr = daddr;
-out:
 	fnhe->fnhe_stamp = jiffies;
-	return fnhe;
+
+out_unlock:
+	write_sequnlock_bh(&fnhe_seqlock);
+	return;
 }
 
 static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flowi4 *fl4)
@@ -1452,13 +1468,9 @@ static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flow
 		} else {
 			if (fib_lookup(net, fl4, &res) == 0) {
 				struct fib_nh *nh = &FIB_RES_NH(res);
-				struct fib_nh_exception *fnhe;
 
-				spin_lock_bh(&fnhe_lock);
-				fnhe = find_or_create_fnhe(nh, fl4->daddr);
-				if (fnhe)
-					fnhe->fnhe_gw = new_gw;
-				spin_unlock_bh(&fnhe_lock);
+				update_or_create_fnhe(nh, fl4->daddr, new_gw,
+						      0, 0);
 			}
 			rt->rt_gateway = new_gw;
 			rt->rt_flags |= RTCF_REDIRECTED;
@@ -1663,15 +1675,9 @@ static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu)
 
 	if (fib_lookup(dev_net(rt->dst.dev), fl4, &res) == 0) {
 		struct fib_nh *nh = &FIB_RES_NH(res);
-		struct fib_nh_exception *fnhe;
 
-		spin_lock_bh(&fnhe_lock);
-		fnhe = find_or_create_fnhe(nh, fl4->daddr);
-		if (fnhe) {
-			fnhe->fnhe_pmtu = mtu;
-			fnhe->fnhe_expires = jiffies + ip_rt_mtu_expires;
-		}
-		spin_unlock_bh(&fnhe_lock);
+		update_or_create_fnhe(nh, fl4->daddr, 0, mtu,
+				      jiffies + ip_rt_mtu_expires);
 	}
 	rt->rt_pmtu = mtu;
 	dst_set_expires(&rt->dst, ip_rt_mtu_expires);
@@ -1902,23 +1908,35 @@ static void rt_bind_exception(struct rtable *rt, struct fib_nh *nh, __be32 daddr
 
 	hval = fnhe_hashfun(daddr);
 
+restart:
 	for (fnhe = rcu_dereference(hash[hval].chain); fnhe;
 	     fnhe = rcu_dereference(fnhe->fnhe_next)) {
-		if (fnhe->fnhe_daddr == daddr) {
-			if (fnhe->fnhe_pmtu) {
-				unsigned long expires = fnhe->fnhe_expires;
-				unsigned long diff = jiffies - expires;
-
-				if (time_before(jiffies, expires)) {
-					rt->rt_pmtu = fnhe->fnhe_pmtu;
-					dst_set_expires(&rt->dst, diff);
-				}
+		__be32 fnhe_daddr, gw;
+		u32 pmtu;
+		unsigned long expires;
+		unsigned int seq;
+
+		seq = read_seqbegin(&fnhe_seqlock);
+		fnhe_daddr = fnhe->fnhe_daddr;
+		gw = fnhe->fnhe_gw;
+		pmtu = fnhe->fnhe_pmtu;
+		expires = fnhe->fnhe_expires;
+		if (read_seqretry(&fnhe_seqlock, seq))
+			goto restart;
+		if (daddr != fnhe_daddr)
+			continue;
+		if (pmtu) {
+			unsigned long diff = expires - jiffies;
+
+			if (time_before(jiffies, expires)) {
+				rt->rt_pmtu = pmtu;
+				dst_set_expires(&rt->dst, diff);
 			}
-			if (fnhe->fnhe_gw)
-				rt->rt_gateway = fnhe->fnhe_gw;
-			fnhe->fnhe_stamp = jiffies;
-			break;
 		}
+		if (gw)
+			rt->rt_gateway = gw;
+		fnhe->fnhe_stamp = jiffies;
+		break;
 	}
 }
 
-- 
1.7.3.4

^ permalink raw reply related

* Re: [PATCH v2] net: cgroup: null ptr dereference in netprio cgroup during init
From: Neil Horman @ 2012-07-18 20:10 UTC (permalink / raw)
  To: John Fastabend; +Cc: davem, gaofeng, mark.d.rustad, netdev, eric.dumazet
In-Reply-To: <20120718183408.27037.16130.stgit@jf-dev1-dcblab>

On Wed, Jul 18, 2012 at 11:34:09AM -0700, John Fastabend wrote:
> When the netprio cgroup is built in the kernel cgroup_init will call
> cgrp_create which eventually calls update_netdev_tables. This is
> being called before do_initcalls() so a null ptr dereference occurs
> on init_net.
> 
> This patch adds a check on init_net.count to verify the structure
> has been initialized. The failure was introduced here,
> 
> commit ef209f15980360f6945873df3cd710c5f62f2a3e
> Author: Gao feng <gaofeng@cn.fujitsu.com>
> Date:   Wed Jul 11 21:50:15 2012 +0000
> 
>     net: cgroup: fix access the unallocated memory in netprio cgroup
> 
> Tested with ping with netprio_cgroup as a module and built in.
> 
> [    0.256451] Initializing cgroup subsys net_prio
> [    0.269948] BUG: unable to handle kernel NULL pointer dereference at
> 0000000000000698
> [    0.293303] IP: [<ffffffff81512e37>] cgrp_create+0x107/0x1c0
> [    0.310175] PGD 0
> [    0.316157] Oops: 0000 [#1] SMP
> [    0.325775] CPU 0
> [    0.331227] Modules linked in:
> [    0.340846]
> [    0.345264] Pid: 0, comm: swapper/0 Not tainted 3.5.0-rc7+ #1 AMD Dinar/Dinar
> [    0.366555] RIP: 0010:[<ffffffff81512e37>]  [<ffffffff81512e37>]
> cgrp_create+0x107/0x1c0
> [    0.390681] RSP: 0000:ffffffff81c01ea8  EFLAGS: 00010213
> [    0.406501] RAX: 0000000000000000 RBX: ffffffffffffff10 RCX: 0000000000000000
> [    0.427764] RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffffffff81c9d840
> [    0.449026] RBP: ffffffff81c01ed8 R08: 00000000000164e0 R09: 0000000000000000
> [    0.470289] R10: ffff8804278303c0 R11: 0000000000000000 R12: 0000000000000001
> [    0.491553] R13: ffff8804278303c0 R14: ffff881036fd0700 R15: 0000000000000000
> [    0.512819] FS:  0000000000000000(0000) GS:ffff880427c00000(0000)
> knlGS:0000000000000000
> [    0.536932] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [    0.554049] CR2: 0000000000000698 CR3: 0000000001c0b000 CR4: 00000000000406b0
> [    0.575311] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    0.596574] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [    0.617838] Process swapper/0 (pid: 0, threadinfo ffffffff81c00000, task
> ffffffff81c13420)
> [    0.642471] Stack:
> [    0.648442]  ffffffff81c01eb8 ffffffff81c9f320 ffffffff81c9f320
> ffffffff81c9f320
> [    0.670522]  ffffffff81c9f320 ffffffff81d482c0 ffffffff81c01ef8
> ffffffff81d10397
> [    0.692604]  ffffffff81e99790 0000000000000048 ffffffff81c01f18
> ffffffff81d1062e
> [    0.714687] Call Trace:
> [    0.721960]  [<ffffffff81d10397>] cgroup_init_subsys+0x51/0xdf
> [    0.739337]  [<ffffffff81d1062e>] cgroup_init+0x36/0x119
> [    0.755160]  [<ffffffff81cf5c02>] start_kernel+0x38f/0x3c4
> [    0.771501]  [<ffffffff81cf5672>] ? repair_env_string+0x5e/0x5e
> [    0.789138]  [<ffffffff81cf5356>] x86_64_start_reservations+0x131/0x135
> [    0.808849]  [<ffffffff81cf545a>] x86_64_start_kernel+0x100/0x10f
> 
> 
> Reported-by: Mark Rustad <mark.d.rustad@intel.com>
> Cc: Neil Horman <nhorman@tuxdriver.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Gao feng <gaofeng@cn.fujitsu.com>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
> 
>  net/core/net_namespace.c  |    4 +++-
>  net/core/netprio_cgroup.c |    3 +++
>  2 files changed, 6 insertions(+), 1 deletions(-)
> 
> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
> index dddbacb..faa33bb 100644
> --- a/net/core/net_namespace.c
> +++ b/net/core/net_namespace.c
> @@ -27,7 +27,9 @@ static DEFINE_MUTEX(net_mutex);
>  LIST_HEAD(net_namespace_list);
>  EXPORT_SYMBOL_GPL(net_namespace_list);
>  
> -struct net init_net;
> +struct net init_net = {
> +	.count = ATOMIC_INIT(0),
> +};
>  EXPORT_SYMBOL(init_net);
>  
>  #define INITIAL_NET_GEN_PTRS	13 /* +1 for len +2 for rcu_head */
> diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
> index b2e9caa..e9fd7fd 100644
> --- a/net/core/netprio_cgroup.c
> +++ b/net/core/netprio_cgroup.c
> @@ -116,6 +116,9 @@ static int update_netdev_tables(void)
>  	u32 max_len;
>  	struct netprio_map *map;
>  
> +	if (!atomic_read(&init_net.count))
> +		return ret;
> +
>  	rtnl_lock();
>  	max_len = atomic_read(&max_prioidx) + 1;
>  	for_each_netdev(&init_net, dev) {
> 
> 
Acked-by: Neil Horman <nhorman@tuxdriver.com>

^ permalink raw reply

* Re: [PATCH] net: Statically initialize init_net.dev_base_head
From: Neil Horman @ 2012-07-18 20:11 UTC (permalink / raw)
  To: Mark Rustad; +Cc: netdev, davem, gaofeng, eric.dumazet
In-Reply-To: <20120718190607.22923.77935.stgit@host1-mdrustad.localdomain>

On Wed, Jul 18, 2012 at 12:06:07PM -0700, Mark Rustad wrote:
> This change eliminates an initialization-order hazard most
> recently seen when netprio_cgroup is built into the kernel.
> 
> With thanks to Eric Dumazet for catching a bug.
> 
> Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
> ---
> 
>  net/core/dev.c           |    3 ++-
>  net/core/net_namespace.c |    4 +++-
>  2 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 0f28a9e..1cb0d8a 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -6283,7 +6283,8 @@ static struct hlist_head *netdev_create_hash(void)
>  /* Initialize per network namespace state */
>  static int __net_init netdev_init(struct net *net)
>  {
> -	INIT_LIST_HEAD(&net->dev_base_head);
> +	if (net != &init_net)
> +		INIT_LIST_HEAD(&net->dev_base_head);
>  
>  	net->dev_name_head = netdev_create_hash();
>  	if (net->dev_name_head == NULL)
> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
> index dddbacb..42f1e1c 100644
> --- a/net/core/net_namespace.c
> +++ b/net/core/net_namespace.c
> @@ -27,7 +27,9 @@ static DEFINE_MUTEX(net_mutex);
>  LIST_HEAD(net_namespace_list);
>  EXPORT_SYMBOL_GPL(net_namespace_list);
>  
> -struct net init_net;
> +struct net init_net = {
> +	.dev_base_head = LIST_HEAD_INIT(init_net.dev_base_head),
> +};
>  EXPORT_SYMBOL(init_net);
>  
>  #define INITIAL_NET_GEN_PTRS	13 /* +1 for len +2 for rcu_head */
> 
> 

I think dave was going to take John Fastabends patch from earlier today, but
this works just as well.  Long term I'm going to look into delaying
initzlization for cgroups, as it creates a strange initialization state when you
have a module_init routine registered.
Neil

^ permalink raw reply

* Re: [PATCH] cxgb3: Set vlan_feature on net_device
From: Rick Jones @ 2012-07-18 20:12 UTC (permalink / raw)
  To: brenohl@br.ibm.com; +Cc: divy@chelsio.com, netdev@vger.kernel.org
In-Reply-To: <1342639748-16276-1-git-send-email-brenohl@br.ibm.com>

On 07/18/2012 12:29 PM, brenohl@br.ibm.com wrote:
> cxgb3 interface has a bad performance when VLAN is set. On my current
> setup, a PowerLinux 7R2, I am able to get around 7 Gbps on a TCP_STREAM
> (8 instances, 4k message).
> With this patch, I am able to reach 9.5 Gbps.
Getting service demand out of an aggregate netperf test is a chore, but 
reporting the change in CPU utilization should be pretty 
straightforward.   Since you ended-up being constrained by link-rate, 
showing the CPU utilization change (and calculating service demand 
manually if you feel up to it) may help show the change has an even 
greater effect then (9.5-7)/7 or 35%.

What does the change do for latency and/or maximum,  min-sized packets 
per second.

rick jones
there is more to the network than just bits/s :)

>
> Signed-off-by: Breno Leitao <brenohl@br.ibm.com>
>
> diff --git a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c
> index abb6ce7..fcf4b31 100644
> --- a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c
> +++ b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c
> @@ -3173,6 +3173,9 @@ static void __devinit cxgb3_init_iscsi_mac(struct net_device *dev)
>   	pi->iscsic.mac_addr[3] |= 0x80;
>   }
>   
> +#define TSO_FLAGS (NETIF_F_TSO | NETIF_F_TSO6 | NETIF_F_TSO_ECN)
> +#define VLAN_FEAT (NETIF_F_SG | NETIF_F_IP_CSUM | TSO_FLAGS | \
> +			NETIF_F_IPV6_CSUM | NETIF_F_HIGHDMA)
>   static int __devinit init_one(struct pci_dev *pdev,
>   			      const struct pci_device_id *ent)
>   {
> @@ -3293,6 +3296,7 @@ static int __devinit init_one(struct pci_dev *pdev,
>   		netdev->hw_features = NETIF_F_SG | NETIF_F_IP_CSUM |
>   			NETIF_F_TSO | NETIF_F_RXCSUM | NETIF_F_HW_VLAN_RX;
>   		netdev->features |= netdev->hw_features | NETIF_F_HW_VLAN_TX;
> +		netdev->vlan_features |= netdev->features & VLAN_FEAT;
>   		if (pci_using_dac)
>   			netdev->features |= NETIF_F_HIGHDMA;
>   

^ permalink raw reply

* Re: [PATCH 1/7] net-tcp: Fast Open base
From: David Miller @ 2012-07-18 20:18 UTC (permalink / raw)
  To: eric.dumazet; +Cc: ycheng, hkchu, edumazet, ncardwell, sivasankar, netdev
In-Reply-To: <1342641349.2626.3555.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 18 Jul 2012 21:55:49 +0200

> So the plan would be :
> 
> 1) Use the experimental 254 + magic on TFO Clients/Servers in 2012
> 
> 2) When/If IANA grants an official number, add its support to servers
>    (keeping support for experimental option as well)
> 
> 3) One/two years later, switch client side to use this official number
> 
> 4) Ten years later, remove experimental from server side.

Fair enough.

^ permalink raw reply

* Re: [PATCH] net: Statically initialize init_net.dev_base_head
From: David Miller @ 2012-07-18 20:20 UTC (permalink / raw)
  To: nhorman; +Cc: mark.d.rustad, netdev, gaofeng, eric.dumazet
In-Reply-To: <20120718201149.GB22057@hmsreliant.think-freely.org>

From: Neil Horman <nhorman@tuxdriver.com>
Date: Wed, 18 Jul 2012 16:11:49 -0400

> On Wed, Jul 18, 2012 at 12:06:07PM -0700, Mark Rustad wrote:
>> This change eliminates an initialization-order hazard most
>> recently seen when netprio_cgroup is built into the kernel.
>> 
>> With thanks to Eric Dumazet for catching a bug.
>> 
>> Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
 ...
> I think dave was going to take John Fastabends patch from earlier today, but
> this works just as well.  Long term I'm going to look into delaying
> initzlization for cgroups, as it creates a strange initialization state when you
> have a module_init routine registered.

Neil, any particular preference between John's and Mark's version
of the fix?

^ permalink raw reply

* Re: [PATCH] net: Statically initialize init_net.dev_base_head
From: Neil Horman @ 2012-07-18 20:21 UTC (permalink / raw)
  To: David Miller; +Cc: mark.d.rustad, netdev, gaofeng, eric.dumazet
In-Reply-To: <20120718.132010.1765790775051953381.davem@davemloft.net>

On Wed, Jul 18, 2012 at 01:20:10PM -0700, David Miller wrote:
> From: Neil Horman <nhorman@tuxdriver.com>
> Date: Wed, 18 Jul 2012 16:11:49 -0400
> 
> > On Wed, Jul 18, 2012 at 12:06:07PM -0700, Mark Rustad wrote:
> >> This change eliminates an initialization-order hazard most
> >> recently seen when netprio_cgroup is built into the kernel.
> >> 
> >> With thanks to Eric Dumazet for catching a bug.
> >> 
> >> Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
>  ...
> > I think dave was going to take John Fastabends patch from earlier today, but
> > this works just as well.  Long term I'm going to look into delaying
> > initzlization for cgroups, as it creates a strange initialization state when you
> > have a module_init routine registered.
> 
> Neil, any particular preference between John's and Mark's version
> of the fix?
> 
I think they're both perfectly good.  If I had to choose I'd say Marks, just
because its done by initializing data, rather than adding more code to run every
time we create a cgroup.

Neil

^ permalink raw reply

* Re: [RFC] r8169 : why SG / TX checksum are default disabled
From: Francois Romieu @ 2012-07-18 20:12 UTC (permalink / raw)
  To: David Miller; +Cc: hayeswang, eric.dumazet, netdev
In-Reply-To: <20120718.092346.1263036873056516097.davem@davemloft.net>

David Miller <davem@davemloft.net> :
> From: hayeswang <hayeswang@realtek.com>
> > Francois Romieu [mailto:romieu@fr.zoreil.com] 
> > [...]
> > 
> >> Hayes, should we not add into the kernel driver something similar to
> >> the rtl8168_start_xmit::skb_checksum_help stuff in Realtek's 
> >> 8168 driver ?
> >> There seems to be a bug for (skb->len < 60 && RTL_GIGA_MAC_VER_34.
> > 
> > For RTL8168E-VL (RTL_GIGA_MAC_VER_34), the hardware wouldn't send the packet
> > with the length less than 60 bytes. The hardware should pad this kind of packet
> > to 60 bytes, but it wouldn't. Therefore, the software has to pad the packet to
> > 60 bytes. However, the hw checksum would be incorrect for the modified packet,
> > so the software checksum is necessary.
> 
> I wonder how the hardware checksum can be incorrectly calculated if the padding
> is done with zeros?

A part of the apparent problem may stem from the fact that Realtek's 8168
driver claims a modified length but it does not really skb_padto... 

Hayes, would the patch below fix the original problem ?

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index be4e00f..a463697 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -5740,7 +5740,7 @@ err_out:
 	return -EIO;
 }
 
-static inline void rtl8169_tso_csum(struct rtl8169_private *tp,
+static inline bool rtl8169_tso_csum(struct rtl8169_private *tp,
 				    struct sk_buff *skb, u32 *opts)
 {
 	const struct rtl_tx_desc_info *info = tx_desc_info + tp->txd_version;
@@ -5753,6 +5753,12 @@ static inline void rtl8169_tso_csum(struct rtl8169_private *tp,
 	} else if (skb->ip_summed == CHECKSUM_PARTIAL) {
 		const struct iphdr *ip = ip_hdr(skb);
 
+		if (unlikely(skb->len < 60 &&
+		    (tp->mac_version == RTL_GIGA_MAC_VER_34) &&
+		    skb_padto(skb, ETH_ZLEN))) {
+			return false;
+		}
+
 		if (ip->protocol == IPPROTO_TCP)
 			opts[offset] |= info->checksum.tcp;
 		else if (ip->protocol == IPPROTO_UDP)
@@ -5760,6 +5766,7 @@ static inline void rtl8169_tso_csum(struct rtl8169_private *tp,
 		else
 			WARN_ON_ONCE(1);
 	}
+	return true;
 }
 
 static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
@@ -5797,7 +5804,8 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 	opts[1] = cpu_to_le32(rtl8169_tx_vlan_tag(tp, skb));
 	opts[0] = DescOwn;
 
-	rtl8169_tso_csum(tp, skb, opts);
+	if (!rtl8169_tso_csum(tp, skb, opts))
+		goto err_update_stats;
 
 	frags = rtl8169_xmit_frags(tp, skb, opts);
 	if (frags < 0)
@@ -5853,6 +5861,7 @@ err_dma_1:
 	rtl8169_unmap_tx_skb(d, tp->tx_skb + entry, txd);
 err_dma_0:
 	dev_kfree_skb(skb);
+err_update_stats:
 	dev->stats.tx_dropped++;
 	return NETDEV_TX_OK;
 

^ permalink raw reply related

* Re: [RFC] r8169 : why SG / TX checksum are default disabled
From: David Miller @ 2012-07-18 20:28 UTC (permalink / raw)
  To: romieu; +Cc: hayeswang, eric.dumazet, netdev
In-Reply-To: <20120718201201.GC14149@electric-eye.fr.zoreil.com>

From: Francois Romieu <romieu@fr.zoreil.com>
Date: Wed, 18 Jul 2012 22:12:01 +0200

> David Miller <davem@davemloft.net> :
>> From: hayeswang <hayeswang@realtek.com>
>> > Francois Romieu [mailto:romieu@fr.zoreil.com] 
>> > [...]
>> > 
>> >> Hayes, should we not add into the kernel driver something similar to
>> >> the rtl8168_start_xmit::skb_checksum_help stuff in Realtek's 
>> >> 8168 driver ?
>> >> There seems to be a bug for (skb->len < 60 && RTL_GIGA_MAC_VER_34.
>> > 
>> > For RTL8168E-VL (RTL_GIGA_MAC_VER_34), the hardware wouldn't send the packet
>> > with the length less than 60 bytes. The hardware should pad this kind of packet
>> > to 60 bytes, but it wouldn't. Therefore, the software has to pad the packet to
>> > 60 bytes. However, the hw checksum would be incorrect for the modified packet,
>> > so the software checksum is necessary.
>> 
>> I wonder how the hardware checksum can be incorrectly calculated if the padding
>> is done with zeros?
> 
> A part of the apparent problem may stem from the fact that Realtek's 8168
> driver claims a modified length but it does not really skb_padto... 
> 
> Hayes, would the patch below fix the original problem ?

A NETDEV_TX_OK return means we accepted the SKB, it doesn't look like
that's what you are doing in the skb_padto() failure path.

^ permalink raw reply

* Re: [PATCH v2] sctp: Implement quick failover draft from tsvwg
From: Joe Perches @ 2012-07-18 20:30 UTC (permalink / raw)
  To: Neil Horman
  Cc: netdev, Vlad Yasevich, Sridhar Samudrala, David S. Miller,
	linux-sctp
In-Reply-To: <1342634466-17930-1-git-send-email-nhorman@tuxdriver.com>

On Wed, 2012-07-18 at 14:01 -0400, Neil Horman wrote:
> I've seen several attempts recently made to do quick failover of sctp transports
> by reducing various retransmit timers and counters.  While its possible to
> implement a faster failover on multihomed sctp associations, its not
> particularly robust, in that it can lead to unneeded retransmits, as well as
> false connection failures due to intermittent latency on a network.

trivia:

> diff --git a/net/sctp/associola.c b/net/sctp/associola.c

> @@ -871,6 +885,10 @@ void sctp_assoc_control_transport(struct sctp_association *asoc,
>  		spc_state = SCTP_ADDR_UNREACHABLE;
>  		break;
>  
> +	case SCTP_TRANSPORT_PF:
> +		transport->state = SCTP_PF;
> +		ulp_notify = false;
> +		break;

nicer to add a newline here

>  	default:
>  		return;
>  	}
> @@ -878,12 +896,15 @@ void sctp_assoc_control_transport(struct sctp_association *asoc,
[]
> +	if (ulp_notify) {
> +		memset(&addr, 0, sizeof(struct sockaddr_storage));
> +		memcpy(&addr, &transport->ipaddr,
> +		       transport->af_specific->sockaddr_len);

Perhaps it's better to do the memcpy then the memset of the
space left instead.

		memcpy(&addr, &transport->ipaddr, transport->af_specific->sockaddr_len);
		memset((char *)&addr) + transport->af_specific->sockaddr_len, 0,
		       sizeof(struct sockaddr_storage) - transport->af_specific->sockaddr_len);
		       

^ permalink raw reply

* Re: [PATCH] net: Statically initialize init_net.dev_base_head
From: John Fastabend @ 2012-07-18 20:31 UTC (permalink / raw)
  To: Neil Horman, David Miller; +Cc: mark.d.rustad, netdev, gaofeng, eric.dumazet
In-Reply-To: <20120718202159.GA30706@hmsreliant.think-freely.org>

On 7/18/2012 1:21 PM, Neil Horman wrote:
> On Wed, Jul 18, 2012 at 01:20:10PM -0700, David Miller wrote:
>> From: Neil Horman <nhorman@tuxdriver.com>
>> Date: Wed, 18 Jul 2012 16:11:49 -0400
>>
>>> On Wed, Jul 18, 2012 at 12:06:07PM -0700, Mark Rustad wrote:
>>>> This change eliminates an initialization-order hazard most
>>>> recently seen when netprio_cgroup is built into the kernel.
>>>>
>>>> With thanks to Eric Dumazet for catching a bug.
>>>>
>>>> Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
>>   ...
>>> I think dave was going to take John Fastabends patch from earlier today, but
>>> this works just as well.  Long term I'm going to look into delaying
>>> initzlization for cgroups, as it creates a strange initialization state when you
>>> have a module_init routine registered.
>>
>> Neil, any particular preference between John's and Mark's version
>> of the fix?
>>
> I think they're both perfectly good.  If I had to choose I'd say Marks, just
> because its done by initializing data, rather than adding more code to run every
> time we create a cgroup.
>
> Neil
>

Fine by me if we take this version instead.

^ permalink raw reply

* [net-next 0/9][pull request] Intel Wired LAN Driver Updates
From: Jeff Kirsher @ 2012-07-18 20:31 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, gospo, sassmann

This series contains updates to ixgbevf & ixgbe.

The following are changes since commit ddbe503203855939946430e39bae58de11b70b69:
  ipv6: add ipv6_addr_hash() helper
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next master

Alexander Duyck (8):
  ixgbevf: Do not rewind the Rx ring before bumping tail
  ixgbevf: Add netdev to ring structure
  ixgbevf: Consolidate Tx context descriptor creation code
  ixgbevf: Fix multiple issues in ixgbevf_get/set_ringparam
  ixgbe: Update configure virtualization to allow for multiple PF pools
  ixgbe: Add support for SR-IOV w/ DCB or RSS
  ixgbe: Retire RSS enabled and capable flags
  ixgbe: Cleanup holes in flags after removing several of them

Pascal Bouchareine (1):
  ixgbevf: fix VF untagging when 802.1 prio is set

 drivers/net/ethernet/intel/ixgbe/ixgbe.h          |   56 +--
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c  |    4 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c      |  387 ++++++++++++++++++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     |   90 +++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c    |   52 ++-
 drivers/net/ethernet/intel/ixgbevf/defines.h      |    1 +
 drivers/net/ethernet/intel/ixgbevf/ethtool.c      |  159 ++++----
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      |    2 +
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |  405 ++++++++++-----------
 9 files changed, 745 insertions(+), 411 deletions(-)

-- 
1.7.10.4

^ permalink raw reply

* [net-next 1/9] ixgbevf: fix VF untagging when 802.1 prio is set
From: Jeff Kirsher @ 2012-07-18 20:31 UTC (permalink / raw)
  To: davem; +Cc: Pascal Bouchareine, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1342643516-2696-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Pascal Bouchareine <pascal@gandi.net>

We have had an issue when using ixgbe+ixgbevf and 802.1 VLAN tagging.

When attaching a VLAN to a VF, frames with a 802.1q priority appeared
untagged on the VF hence not reaching the VLAN, where frames with
priority 0 where tagged as expected and seen by the VLAN device.

This seems due to the way ixgbevf is looking up the full tag
(prio+cfi+vlan) against the adapter active_vlans, as a condition to mark
the skb tagged.

Signed-off-by: Pascal Bouchareine <pascal@gandi.net>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index c98cdf7..b88218c 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -279,7 +279,7 @@ static void ixgbevf_receive_skb(struct ixgbevf_q_vector *q_vector,
 	bool is_vlan = (status & IXGBE_RXD_STAT_VP);
 	u16 tag = le16_to_cpu(rx_desc->wb.upper.vlan);
 
-	if (is_vlan && test_bit(tag, adapter->active_vlans))
+	if (is_vlan && test_bit(tag & VLAN_VID_MASK, adapter->active_vlans))
 		__vlan_hwaccel_put_tag(skb, tag);
 
 	napi_gro_receive(&q_vector->napi, skb);
-- 
1.7.10.4

^ permalink raw reply related

* [net-next 2/9] ixgbevf: Do not rewind the Rx ring before bumping tail
From: Jeff Kirsher @ 2012-07-18 20:31 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, sassmann, Greg Rose, Jeff Kirsher
In-Reply-To: <1342643516-2696-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

The driver is going back one step from its' previous location before
bumping tail. This is incorrect.  We should just be writing the value of
next_to_use into the tail register.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |    7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index b88218c..c27ce44 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -375,8 +375,6 @@ static void ixgbevf_alloc_rx_buffers(struct ixgbevf_adapter *adapter,
 no_buffers:
 	if (rx_ring->next_to_use != i) {
 		rx_ring->next_to_use = i;
-		if (i-- == 0)
-			i = (rx_ring->count - 1);
 
 		ixgbevf_release_rx_desc(&adapter->hw, rx_ring, i);
 	}
@@ -1240,9 +1238,8 @@ static void ixgbevf_configure(struct ixgbevf_adapter *adapter)
 	ixgbevf_configure_rx(adapter);
 	for (i = 0; i < adapter->num_rx_queues; i++) {
 		struct ixgbevf_ring *ring = &adapter->rx_ring[i];
-		ixgbevf_alloc_rx_buffers(adapter, ring, ring->count);
-		ring->next_to_use = ring->count - 1;
-		writel(ring->next_to_use, adapter->hw.hw_addr + ring->tail);
+		ixgbevf_alloc_rx_buffers(adapter, ring,
+					 IXGBE_DESC_UNUSED(ring));
 	}
 }
 
-- 
1.7.10.4

^ permalink raw reply related

* [net-next 3/9] ixgbevf: Add netdev to ring structure
From: Jeff Kirsher @ 2012-07-18 20:31 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, sassmann, Greg Rose, Jeff Kirsher
In-Reply-To: <1342643516-2696-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This change adds the netdev to the ring structure.  This allows for a
quicker transition from ring to netdev without having to go from ring to
adapter to netdev.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/ethtool.c      |    6 +--
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      |    2 +
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |   54 +++++++++------------
 3 files changed, 28 insertions(+), 34 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ethtool.c b/drivers/net/ethernet/intel/ixgbevf/ethtool.c
index 15947c9..2c3b20ed 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ethtool.c
@@ -359,8 +359,7 @@ static int ixgbevf_set_ringparam(struct net_device *netdev,
 		if (err) {
 			while (i) {
 				i--;
-				ixgbevf_free_tx_resources(adapter,
-							  &tx_ring[i]);
+				ixgbevf_free_tx_resources(adapter, &tx_ring[i]);
 			}
 			goto err_tx_ring_setup;
 		}
@@ -374,8 +373,7 @@ static int ixgbevf_set_ringparam(struct net_device *netdev,
 		if (err) {
 			while (i) {
 				i--;
-				ixgbevf_free_rx_resources(adapter,
-							  &rx_ring[i]);
+				ixgbevf_free_rx_resources(adapter, &rx_ring[i]);
 			}
 				goto err_rx_ring_setup;
 		}
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
index 1f13765..e167d1b 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
@@ -56,6 +56,8 @@ struct ixgbevf_rx_buffer {
 
 struct ixgbevf_ring {
 	struct ixgbevf_ring *next;
+	struct net_device *netdev;
+	struct device *dev;
 	struct ixgbevf_adapter *adapter;  /* backlink */
 	void *desc;			/* descriptor ring memory */
 	dma_addr_t dma;			/* phys. address of descriptor ring */
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index c27ce44..1c53e13 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -187,7 +187,6 @@ static bool ixgbevf_clean_tx_irq(struct ixgbevf_q_vector *q_vector,
 				 struct ixgbevf_ring *tx_ring)
 {
 	struct ixgbevf_adapter *adapter = q_vector->adapter;
-	struct net_device *netdev = adapter->netdev;
 	union ixgbe_adv_tx_desc *tx_desc, *eop_desc;
 	struct ixgbevf_tx_buffer *tx_buffer_info;
 	unsigned int i, eop, count = 0;
@@ -241,15 +240,17 @@ cont_loop:
 	tx_ring->next_to_clean = i;
 
 #define TX_WAKE_THRESHOLD (DESC_NEEDED * 2)
-	if (unlikely(count && netif_carrier_ok(netdev) &&
+	if (unlikely(count && netif_carrier_ok(tx_ring->netdev) &&
 		     (IXGBE_DESC_UNUSED(tx_ring) >= TX_WAKE_THRESHOLD))) {
 		/* Make sure that anybody stopping the queue after this
 		 * sees the new next_to_clean.
 		 */
 		smp_mb();
-		if (__netif_subqueue_stopped(netdev, tx_ring->queue_index) &&
+		if (__netif_subqueue_stopped(tx_ring->netdev,
+					     tx_ring->queue_index) &&
 		    !test_bit(__IXGBEVF_DOWN, &adapter->state)) {
-			netif_wake_subqueue(netdev, tx_ring->queue_index);
+			netif_wake_subqueue(tx_ring->netdev,
+					    tx_ring->queue_index);
 			++adapter->restart_queue;
 		}
 	}
@@ -292,12 +293,13 @@ static void ixgbevf_receive_skb(struct ixgbevf_q_vector *q_vector,
  * @skb: skb currently being received and modified
  **/
 static inline void ixgbevf_rx_checksum(struct ixgbevf_adapter *adapter,
+				       struct ixgbevf_ring *ring,
 				       u32 status_err, struct sk_buff *skb)
 {
 	skb_checksum_none_assert(skb);
 
 	/* Rx csum disabled */
-	if (!(adapter->netdev->features & NETIF_F_RXCSUM))
+	if (!(ring->netdev->features & NETIF_F_RXCSUM))
 		return;
 
 	/* if IP and error */
@@ -332,31 +334,21 @@ static void ixgbevf_alloc_rx_buffers(struct ixgbevf_adapter *adapter,
 	union ixgbe_adv_rx_desc *rx_desc;
 	struct ixgbevf_rx_buffer *bi;
 	struct sk_buff *skb;
-	unsigned int i;
-	unsigned int bufsz = rx_ring->rx_buf_len + NET_IP_ALIGN;
+	unsigned int i = rx_ring->next_to_use;
 
-	i = rx_ring->next_to_use;
 	bi = &rx_ring->rx_buffer_info[i];
 
 	while (cleaned_count--) {
 		rx_desc = IXGBEVF_RX_DESC(rx_ring, i);
 		skb = bi->skb;
 		if (!skb) {
-			skb = netdev_alloc_skb(adapter->netdev,
-							       bufsz);
-
+			skb = netdev_alloc_skb_ip_align(rx_ring->netdev,
+							rx_ring->rx_buf_len);
 			if (!skb) {
 				adapter->alloc_rx_buff_failed++;
 				goto no_buffers;
 			}
 
-			/*
-			 * Make buffer alignment 2 beyond a 16 byte boundary
-			 * this will result in a 16 byte aligned IP header after
-			 * the 14 byte MAC header is removed
-			 */
-			skb_reserve(skb, NET_IP_ALIGN);
-
 			bi->skb = skb;
 		}
 		if (!bi->dma) {
@@ -449,7 +441,7 @@ static bool ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
 			goto next_desc;
 		}
 
-		ixgbevf_rx_checksum(adapter, staterr, skb);
+		ixgbevf_rx_checksum(adapter, rx_ring, staterr, skb);
 
 		/* probably a little skewed due to removing CRC */
 		total_rx_bytes += skb->len;
@@ -464,7 +456,7 @@ static bool ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
 			if (header_fixup_len < 14)
 				skb_push(skb, header_fixup_len);
 		}
-		skb->protocol = eth_type_trans(skb, adapter->netdev);
+		skb->protocol = eth_type_trans(skb, rx_ring->netdev);
 
 		ixgbevf_receive_skb(q_vector, skb, staterr, rx_ring, rx_desc);
 
@@ -1669,12 +1661,16 @@ static int ixgbevf_alloc_queues(struct ixgbevf_adapter *adapter)
 		adapter->tx_ring[i].count = adapter->tx_ring_count;
 		adapter->tx_ring[i].queue_index = i;
 		adapter->tx_ring[i].reg_idx = i;
+		adapter->tx_ring[i].dev = &adapter->pdev->dev;
+		adapter->tx_ring[i].netdev = adapter->netdev;
 	}
 
 	for (i = 0; i < adapter->num_rx_queues; i++) {
 		adapter->rx_ring[i].count = adapter->rx_ring_count;
 		adapter->rx_ring[i].queue_index = i;
 		adapter->rx_ring[i].reg_idx = i;
+		adapter->rx_ring[i].dev = &adapter->pdev->dev;
+		adapter->rx_ring[i].netdev = adapter->netdev;
 	}
 
 	return 0;
@@ -2721,12 +2717,11 @@ static void ixgbevf_tx_queue(struct ixgbevf_adapter *adapter,
 	writel(i, adapter->hw.hw_addr + tx_ring->tail);
 }
 
-static int __ixgbevf_maybe_stop_tx(struct net_device *netdev,
-				   struct ixgbevf_ring *tx_ring, int size)
+static int __ixgbevf_maybe_stop_tx(struct ixgbevf_ring *tx_ring, int size)
 {
-	struct ixgbevf_adapter *adapter = netdev_priv(netdev);
+	struct ixgbevf_adapter *adapter = netdev_priv(tx_ring->netdev);
 
-	netif_stop_subqueue(netdev, tx_ring->queue_index);
+	netif_stop_subqueue(tx_ring->netdev, tx_ring->queue_index);
 	/* Herbert's original patch had:
 	 *  smp_mb__after_netif_stop_queue();
 	 * but since that doesn't exist yet, just open code it. */
@@ -2738,17 +2733,16 @@ static int __ixgbevf_maybe_stop_tx(struct net_device *netdev,
 		return -EBUSY;
 
 	/* A reprieve! - use start_queue because it doesn't call schedule */
-	netif_start_subqueue(netdev, tx_ring->queue_index);
+	netif_start_subqueue(tx_ring->netdev, tx_ring->queue_index);
 	++adapter->restart_queue;
 	return 0;
 }
 
-static int ixgbevf_maybe_stop_tx(struct net_device *netdev,
-				 struct ixgbevf_ring *tx_ring, int size)
+static int ixgbevf_maybe_stop_tx(struct ixgbevf_ring *tx_ring, int size)
 {
 	if (likely(IXGBE_DESC_UNUSED(tx_ring) >= size))
 		return 0;
-	return __ixgbevf_maybe_stop_tx(netdev, tx_ring, size);
+	return __ixgbevf_maybe_stop_tx(tx_ring, size);
 }
 
 static int ixgbevf_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
@@ -2779,7 +2773,7 @@ static int ixgbevf_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
 #else
 	count += skb_shinfo(skb)->nr_frags;
 #endif
-	if (ixgbevf_maybe_stop_tx(netdev, tx_ring, count + 3)) {
+	if (ixgbevf_maybe_stop_tx(tx_ring, count + 3)) {
 		adapter->tx_busy++;
 		return NETDEV_TX_BUSY;
 	}
@@ -2810,7 +2804,7 @@ static int ixgbevf_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
 			 ixgbevf_tx_map(adapter, tx_ring, skb, tx_flags, first),
 			 skb->len, hdr_len);
 
-	ixgbevf_maybe_stop_tx(netdev, tx_ring, DESC_NEEDED);
+	ixgbevf_maybe_stop_tx(tx_ring, DESC_NEEDED);
 
 	return NETDEV_TX_OK;
 }
-- 
1.7.10.4

^ permalink raw reply related

* [net-next 5/9] ixgbevf: Fix multiple issues in ixgbevf_get/set_ringparam
From: Jeff Kirsher @ 2012-07-18 20:31 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1342643516-2696-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

In ixgbevf_get_ringparam we could run into a NULL pointer dereference
if the rings were not allocated when we attempted the call.  To prevent
that we can just access the tx/rx_ring_count values instead of attempting
to access the rings to get the count.

This change corrects a memory leak and memory corruption in
ixgbevf_set_ringparam.

The memory leak was due to us not freeing the resources from the ring
before overwriting them.  This change corrects the memory leak by making
certain to call ixgbe_free_tx/rx_resources on the rings prior to freeing
them.

The memory corruption was because we were replacing the rings but not
updating the q_vectors.  It addresses the memory corruption by leaving the
rings in place and instead just copying the contents of the new rings into
the existing rings.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/ethtool.c |  153 ++++++++++++++------------
 1 file changed, 83 insertions(+), 70 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ethtool.c b/drivers/net/ethernet/intel/ixgbevf/ethtool.c
index 2c3b20ed..8f20704 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ethtool.c
@@ -284,13 +284,11 @@ static void ixgbevf_get_ringparam(struct net_device *netdev,
 				  struct ethtool_ringparam *ring)
 {
 	struct ixgbevf_adapter *adapter = netdev_priv(netdev);
-	struct ixgbevf_ring *tx_ring = adapter->tx_ring;
-	struct ixgbevf_ring *rx_ring = adapter->rx_ring;
 
 	ring->rx_max_pending = IXGBEVF_MAX_RXD;
 	ring->tx_max_pending = IXGBEVF_MAX_TXD;
-	ring->rx_pending = rx_ring->count;
-	ring->tx_pending = tx_ring->count;
+	ring->rx_pending = adapter->rx_ring_count;
+	ring->tx_pending = adapter->tx_ring_count;
 }
 
 static int ixgbevf_set_ringparam(struct net_device *netdev,
@@ -298,33 +296,28 @@ static int ixgbevf_set_ringparam(struct net_device *netdev,
 {
 	struct ixgbevf_adapter *adapter = netdev_priv(netdev);
 	struct ixgbevf_ring *tx_ring = NULL, *rx_ring = NULL;
-	int i, err = 0;
 	u32 new_rx_count, new_tx_count;
+	int i, err = 0;
 
 	if ((ring->rx_mini_pending) || (ring->rx_jumbo_pending))
 		return -EINVAL;
 
-	new_rx_count = max(ring->rx_pending, (u32)IXGBEVF_MIN_RXD);
-	new_rx_count = min(new_rx_count, (u32)IXGBEVF_MAX_RXD);
-	new_rx_count = ALIGN(new_rx_count, IXGBE_REQ_RX_DESCRIPTOR_MULTIPLE);
-
-	new_tx_count = max(ring->tx_pending, (u32)IXGBEVF_MIN_TXD);
-	new_tx_count = min(new_tx_count, (u32)IXGBEVF_MAX_TXD);
+	new_tx_count = max_t(u32, ring->tx_pending, IXGBEVF_MIN_TXD);
+	new_tx_count = min_t(u32, new_tx_count, IXGBEVF_MAX_TXD);
 	new_tx_count = ALIGN(new_tx_count, IXGBE_REQ_TX_DESCRIPTOR_MULTIPLE);
 
-	if ((new_tx_count == adapter->tx_ring->count) &&
-	    (new_rx_count == adapter->rx_ring->count)) {
-		/* nothing to do */
+	new_rx_count = max_t(u32, ring->rx_pending, IXGBEVF_MIN_RXD);
+	new_rx_count = min_t(u32, new_rx_count, IXGBEVF_MAX_RXD);
+	new_rx_count = ALIGN(new_rx_count, IXGBE_REQ_RX_DESCRIPTOR_MULTIPLE);
+
+	/* if nothing to do return success */
+	if ((new_tx_count == adapter->tx_ring_count) &&
+	    (new_rx_count == adapter->rx_ring_count))
 		return 0;
-	}
 
 	while (test_and_set_bit(__IXGBEVF_RESETTING, &adapter->state))
-		msleep(1);
+		usleep_range(1000, 2000);
 
-	/*
-	 * If the adapter isn't up and running then just set the
-	 * new parameters and scurry for the exits.
-	 */
 	if (!netif_running(adapter->netdev)) {
 		for (i = 0; i < adapter->num_tx_queues; i++)
 			adapter->tx_ring[i].count = new_tx_count;
@@ -335,78 +328,98 @@ static int ixgbevf_set_ringparam(struct net_device *netdev,
 		goto clear_reset;
 	}
 
-	tx_ring = kcalloc(adapter->num_tx_queues,
-			  sizeof(struct ixgbevf_ring), GFP_KERNEL);
-	if (!tx_ring) {
-		err = -ENOMEM;
-		goto clear_reset;
-	}
-
-	rx_ring = kcalloc(adapter->num_rx_queues,
-			  sizeof(struct ixgbevf_ring), GFP_KERNEL);
-	if (!rx_ring) {
-		err = -ENOMEM;
-		goto err_rx_setup;
-	}
-
-	ixgbevf_down(adapter);
+	if (new_tx_count != adapter->tx_ring_count) {
+		tx_ring = vmalloc(adapter->num_tx_queues * sizeof(*tx_ring));
+		if (!tx_ring) {
+			err = -ENOMEM;
+			goto clear_reset;
+		}
 
-	memcpy(tx_ring, adapter->tx_ring,
-	       adapter->num_tx_queues * sizeof(struct ixgbevf_ring));
-	for (i = 0; i < adapter->num_tx_queues; i++) {
-		tx_ring[i].count = new_tx_count;
-		err = ixgbevf_setup_tx_resources(adapter, &tx_ring[i]);
-		if (err) {
+		for (i = 0; i < adapter->num_tx_queues; i++) {
+			/* clone ring and setup updated count */
+			tx_ring[i] = adapter->tx_ring[i];
+			tx_ring[i].count = new_tx_count;
+			err = ixgbevf_setup_tx_resources(adapter, &tx_ring[i]);
+			if (!err)
+				continue;
 			while (i) {
 				i--;
 				ixgbevf_free_tx_resources(adapter, &tx_ring[i]);
 			}
-			goto err_tx_ring_setup;
+
+			vfree(tx_ring);
+			tx_ring = NULL;
+
+			goto clear_reset;
 		}
 	}
 
-	memcpy(rx_ring, adapter->rx_ring,
-	       adapter->num_rx_queues * sizeof(struct ixgbevf_ring));
-	for (i = 0; i < adapter->num_rx_queues; i++) {
-		rx_ring[i].count = new_rx_count;
-		err = ixgbevf_setup_rx_resources(adapter, &rx_ring[i]);
-		if (err) {
+	if (new_rx_count != adapter->rx_ring_count) {
+		rx_ring = vmalloc(adapter->num_rx_queues * sizeof(*rx_ring));
+		if (!rx_ring) {
+			err = -ENOMEM;
+			goto clear_reset;
+		}
+
+		for (i = 0; i < adapter->num_rx_queues; i++) {
+			/* clone ring and setup updated count */
+			rx_ring[i] = adapter->rx_ring[i];
+			rx_ring[i].count = new_rx_count;
+			err = ixgbevf_setup_rx_resources(adapter, &rx_ring[i]);
+			if (!err)
+				continue;
 			while (i) {
 				i--;
 				ixgbevf_free_rx_resources(adapter, &rx_ring[i]);
 			}
-				goto err_rx_ring_setup;
+
+			vfree(rx_ring);
+			rx_ring = NULL;
+
+			goto clear_reset;
 		}
 	}
 
-	/*
-	 * Only switch to new rings if all the prior allocations
-	 * and ring setups have succeeded.
-	 */
-	kfree(adapter->tx_ring);
-	adapter->tx_ring = tx_ring;
-	adapter->tx_ring_count = new_tx_count;
-
-	kfree(adapter->rx_ring);
-	adapter->rx_ring = rx_ring;
-	adapter->rx_ring_count = new_rx_count;
+	/* bring interface down to prepare for update */
+	ixgbevf_down(adapter);
 
-	/* success! */
-	ixgbevf_up(adapter);
+	/* Tx */
+	if (tx_ring) {
+		for (i = 0; i < adapter->num_tx_queues; i++) {
+			ixgbevf_free_tx_resources(adapter,
+						  &adapter->tx_ring[i]);
+			adapter->tx_ring[i] = tx_ring[i];
+		}
+		adapter->tx_ring_count = new_tx_count;
 
-	goto clear_reset;
+		vfree(tx_ring);
+		tx_ring = NULL;
+	}
 
-err_rx_ring_setup:
-	for(i = 0; i < adapter->num_tx_queues; i++)
-		ixgbevf_free_tx_resources(adapter, &tx_ring[i]);
+	/* Rx */
+	if (rx_ring) {
+		for (i = 0; i < adapter->num_rx_queues; i++) {
+			ixgbevf_free_rx_resources(adapter,
+						  &adapter->rx_ring[i]);
+			adapter->rx_ring[i] = rx_ring[i];
+		}
+		adapter->rx_ring_count = new_rx_count;
 
-err_tx_ring_setup:
-	kfree(rx_ring);
+		vfree(rx_ring);
+		rx_ring = NULL;
+	}
 
-err_rx_setup:
-	kfree(tx_ring);
+	/* restore interface using new values */
+	ixgbevf_up(adapter);
 
 clear_reset:
+	/* free Tx resources if Rx error is encountered */
+	if (tx_ring) {
+		for (i = 0; i < adapter->num_tx_queues; i++)
+			ixgbevf_free_tx_resources(adapter, &tx_ring[i]);
+		vfree(tx_ring);
+	}
+
 	clear_bit(__IXGBEVF_RESETTING, &adapter->state);
 	return err;
 }
-- 
1.7.10.4

^ permalink raw reply related

* [net-next 4/9] ixgbevf: Consolidate Tx context descriptor creation code
From: Jeff Kirsher @ 2012-07-18 20:31 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, sassmann, Greg Rose, Jeff Kirsher
In-Reply-To: <1342643516-2696-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

There is a good bit of redundancy between the Tx checksum and segmentation
offloads.  In order to reduce some of this I am moving the code for
creating a context descriptor into a separate function.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/defines.h      |    1 +
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |  342 ++++++++++-----------
 2 files changed, 163 insertions(+), 180 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/defines.h b/drivers/net/ethernet/intel/ixgbevf/defines.h
index 10cede5..418af82 100644
--- a/drivers/net/ethernet/intel/ixgbevf/defines.h
+++ b/drivers/net/ethernet/intel/ixgbevf/defines.h
@@ -251,6 +251,7 @@ struct ixgbe_adv_tx_context_desc {
 #define IXGBE_ADVTXD_TUCMD_L4T_TCP   0x00000800  /* L4 Packet TYPE of TCP */
 #define IXGBE_ADVTXD_TUCMD_L4T_SCTP  0x00001000  /* L4 Packet TYPE of SCTP */
 #define IXGBE_ADVTXD_IDX_SHIFT  4 /* Adv desc Index shift */
+#define IXGBE_ADVTXD_CC		0x00000080 /* Check Context */
 #define IXGBE_ADVTXD_POPTS_SHIFT      8  /* Adv desc POPTS shift */
 #define IXGBE_ADVTXD_POPTS_IXSM (IXGBE_TXD_POPTS_IXSM << \
 				 IXGBE_ADVTXD_POPTS_SHIFT)
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 1c53e13..ce81ce0 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -42,6 +42,7 @@
 #include <linux/in.h>
 #include <linux/ip.h>
 #include <linux/tcp.h>
+#include <linux/sctp.h>
 #include <linux/ipv6.h>
 #include <linux/slab.h>
 #include <net/checksum.h>
@@ -144,18 +145,18 @@ static void ixgbevf_set_ivar(struct ixgbevf_adapter *adapter, s8 direction,
 	}
 }
 
-static void ixgbevf_unmap_and_free_tx_resource(struct ixgbevf_adapter *adapter,
+static void ixgbevf_unmap_and_free_tx_resource(struct ixgbevf_ring *tx_ring,
 					       struct ixgbevf_tx_buffer
 					       *tx_buffer_info)
 {
 	if (tx_buffer_info->dma) {
 		if (tx_buffer_info->mapped_as_page)
-			dma_unmap_page(&adapter->pdev->dev,
+			dma_unmap_page(tx_ring->dev,
 				       tx_buffer_info->dma,
 				       tx_buffer_info->length,
 				       DMA_TO_DEVICE);
 		else
-			dma_unmap_single(&adapter->pdev->dev,
+			dma_unmap_single(tx_ring->dev,
 					 tx_buffer_info->dma,
 					 tx_buffer_info->length,
 					 DMA_TO_DEVICE);
@@ -222,7 +223,7 @@ static bool ixgbevf_clean_tx_irq(struct ixgbevf_q_vector *q_vector,
 				total_bytes += bytecount;
 			}
 
-			ixgbevf_unmap_and_free_tx_resource(adapter,
+			ixgbevf_unmap_and_free_tx_resource(tx_ring,
 							   tx_buffer_info);
 
 			tx_desc->wb.status = 0;
@@ -1443,7 +1444,7 @@ static void ixgbevf_clean_tx_ring(struct ixgbevf_adapter *adapter,
 
 	for (i = 0; i < tx_ring->count; i++) {
 		tx_buffer_info = &tx_ring->tx_buffer_info[i];
-		ixgbevf_unmap_and_free_tx_resource(adapter, tx_buffer_info);
+		ixgbevf_unmap_and_free_tx_resource(tx_ring, tx_buffer_info);
 	}
 
 	size = sizeof(struct ixgbevf_tx_buffer) * tx_ring->count;
@@ -2389,172 +2390,153 @@ static int ixgbevf_close(struct net_device *netdev)
 	return 0;
 }
 
-static int ixgbevf_tso(struct ixgbevf_adapter *adapter,
-		       struct ixgbevf_ring *tx_ring,
-		       struct sk_buff *skb, u32 tx_flags, u8 *hdr_len)
+static void ixgbevf_tx_ctxtdesc(struct ixgbevf_ring *tx_ring,
+				u32 vlan_macip_lens, u32 type_tucmd,
+				u32 mss_l4len_idx)
 {
 	struct ixgbe_adv_tx_context_desc *context_desc;
-	unsigned int i;
-	int err;
-	struct ixgbevf_tx_buffer *tx_buffer_info;
-	u32 vlan_macip_lens = 0, type_tucmd_mlhl;
-	u32 mss_l4len_idx, l4len;
+	u16 i = tx_ring->next_to_use;
 
-	if (skb_is_gso(skb)) {
-		if (skb_header_cloned(skb)) {
-			err = pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
-			if (err)
-				return err;
-		}
-		l4len = tcp_hdrlen(skb);
-		*hdr_len += l4len;
-
-		if (skb->protocol == htons(ETH_P_IP)) {
-			struct iphdr *iph = ip_hdr(skb);
-			iph->tot_len = 0;
-			iph->check = 0;
-			tcp_hdr(skb)->check = ~csum_tcpudp_magic(iph->saddr,
-								 iph->daddr, 0,
-								 IPPROTO_TCP,
-								 0);
-			adapter->hw_tso_ctxt++;
-		} else if (skb_is_gso_v6(skb)) {
-			ipv6_hdr(skb)->payload_len = 0;
-			tcp_hdr(skb)->check =
-			    ~csum_ipv6_magic(&ipv6_hdr(skb)->saddr,
-					     &ipv6_hdr(skb)->daddr,
-					     0, IPPROTO_TCP, 0);
-			adapter->hw_tso6_ctxt++;
-		}
+	context_desc = IXGBEVF_TX_CTXTDESC(tx_ring, i);
 
-		i = tx_ring->next_to_use;
+	i++;
+	tx_ring->next_to_use = (i < tx_ring->count) ? i : 0;
 
-		tx_buffer_info = &tx_ring->tx_buffer_info[i];
-		context_desc = IXGBEVF_TX_CTXTDESC(tx_ring, i);
-
-		/* VLAN MACLEN IPLEN */
-		if (tx_flags & IXGBE_TX_FLAGS_VLAN)
-			vlan_macip_lens |=
-				(tx_flags & IXGBE_TX_FLAGS_VLAN_MASK);
-		vlan_macip_lens |= ((skb_network_offset(skb)) <<
-				    IXGBE_ADVTXD_MACLEN_SHIFT);
-		*hdr_len += skb_network_offset(skb);
-		vlan_macip_lens |=
-			(skb_transport_header(skb) - skb_network_header(skb));
-		*hdr_len +=
-			(skb_transport_header(skb) - skb_network_header(skb));
-		context_desc->vlan_macip_lens = cpu_to_le32(vlan_macip_lens);
-		context_desc->seqnum_seed = 0;
-
-		/* ADV DTYP TUCMD MKRLOC/ISCSIHEDLEN */
-		type_tucmd_mlhl = (IXGBE_TXD_CMD_DEXT |
-				    IXGBE_ADVTXD_DTYP_CTXT);
-
-		if (skb->protocol == htons(ETH_P_IP))
-			type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_IPV4;
-		type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_TCP;
-		context_desc->type_tucmd_mlhl = cpu_to_le32(type_tucmd_mlhl);
-
-		/* MSS L4LEN IDX */
-		mss_l4len_idx =
-			(skb_shinfo(skb)->gso_size << IXGBE_ADVTXD_MSS_SHIFT);
-		mss_l4len_idx |= (l4len << IXGBE_ADVTXD_L4LEN_SHIFT);
-		/* use index 1 for TSO */
-		mss_l4len_idx |= (1 << IXGBE_ADVTXD_IDX_SHIFT);
-		context_desc->mss_l4len_idx = cpu_to_le32(mss_l4len_idx);
-
-		tx_buffer_info->time_stamp = jiffies;
-		tx_buffer_info->next_to_watch = i;
+	/* set bits to identify this as an advanced context descriptor */
+	type_tucmd |= IXGBE_TXD_CMD_DEXT | IXGBE_ADVTXD_DTYP_CTXT;
 
-		i++;
-		if (i == tx_ring->count)
-			i = 0;
-		tx_ring->next_to_use = i;
+	context_desc->vlan_macip_lens	= cpu_to_le32(vlan_macip_lens);
+	context_desc->seqnum_seed	= 0;
+	context_desc->type_tucmd_mlhl	= cpu_to_le32(type_tucmd);
+	context_desc->mss_l4len_idx	= cpu_to_le32(mss_l4len_idx);
+}
+
+static int ixgbevf_tso(struct ixgbevf_ring *tx_ring,
+		       struct sk_buff *skb, u32 tx_flags, u8 *hdr_len)
+{
+	u32 vlan_macip_lens, type_tucmd;
+	u32 mss_l4len_idx, l4len;
+
+	if (!skb_is_gso(skb))
+		return 0;
 
-		return true;
+	if (skb_header_cloned(skb)) {
+		int err = pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
+		if (err)
+			return err;
 	}
 
-	return false;
+	/* ADV DTYP TUCMD MKRLOC/ISCSIHEDLEN */
+	type_tucmd = IXGBE_ADVTXD_TUCMD_L4T_TCP;
+
+	if (skb->protocol == htons(ETH_P_IP)) {
+		struct iphdr *iph = ip_hdr(skb);
+		iph->tot_len = 0;
+		iph->check = 0;
+		tcp_hdr(skb)->check = ~csum_tcpudp_magic(iph->saddr,
+							 iph->daddr, 0,
+							 IPPROTO_TCP,
+							 0);
+		type_tucmd |= IXGBE_ADVTXD_TUCMD_IPV4;
+	} else if (skb_is_gso_v6(skb)) {
+		ipv6_hdr(skb)->payload_len = 0;
+		tcp_hdr(skb)->check =
+		    ~csum_ipv6_magic(&ipv6_hdr(skb)->saddr,
+				     &ipv6_hdr(skb)->daddr,
+				     0, IPPROTO_TCP, 0);
+	}
+
+	/* compute header lengths */
+	l4len = tcp_hdrlen(skb);
+	*hdr_len += l4len;
+	*hdr_len = skb_transport_offset(skb) + l4len;
+
+	/* mss_l4len_id: use 1 as index for TSO */
+	mss_l4len_idx = l4len << IXGBE_ADVTXD_L4LEN_SHIFT;
+	mss_l4len_idx |= skb_shinfo(skb)->gso_size << IXGBE_ADVTXD_MSS_SHIFT;
+	mss_l4len_idx |= 1 << IXGBE_ADVTXD_IDX_SHIFT;
+
+	/* vlan_macip_lens: HEADLEN, MACLEN, VLAN tag */
+	vlan_macip_lens = skb_network_header_len(skb);
+	vlan_macip_lens |= skb_network_offset(skb) << IXGBE_ADVTXD_MACLEN_SHIFT;
+	vlan_macip_lens |= tx_flags & IXGBE_TX_FLAGS_VLAN_MASK;
+
+	ixgbevf_tx_ctxtdesc(tx_ring, vlan_macip_lens,
+			    type_tucmd, mss_l4len_idx);
+
+	return 1;
 }
 
-static bool ixgbevf_tx_csum(struct ixgbevf_adapter *adapter,
-			    struct ixgbevf_ring *tx_ring,
+static bool ixgbevf_tx_csum(struct ixgbevf_ring *tx_ring,
 			    struct sk_buff *skb, u32 tx_flags)
 {
-	struct ixgbe_adv_tx_context_desc *context_desc;
-	unsigned int i;
-	struct ixgbevf_tx_buffer *tx_buffer_info;
-	u32 vlan_macip_lens = 0, type_tucmd_mlhl = 0;
 
-	if (skb->ip_summed == CHECKSUM_PARTIAL ||
-	    (tx_flags & IXGBE_TX_FLAGS_VLAN)) {
-		i = tx_ring->next_to_use;
-		tx_buffer_info = &tx_ring->tx_buffer_info[i];
-		context_desc = IXGBEVF_TX_CTXTDESC(tx_ring, i);
-
-		if (tx_flags & IXGBE_TX_FLAGS_VLAN)
-			vlan_macip_lens |= (tx_flags &
-					    IXGBE_TX_FLAGS_VLAN_MASK);
-		vlan_macip_lens |= (skb_network_offset(skb) <<
-				    IXGBE_ADVTXD_MACLEN_SHIFT);
-		if (skb->ip_summed == CHECKSUM_PARTIAL)
-			vlan_macip_lens |= (skb_transport_header(skb) -
-					    skb_network_header(skb));
-
-		context_desc->vlan_macip_lens = cpu_to_le32(vlan_macip_lens);
-		context_desc->seqnum_seed = 0;
-
-		type_tucmd_mlhl |= (IXGBE_TXD_CMD_DEXT |
-				    IXGBE_ADVTXD_DTYP_CTXT);
-
-		if (skb->ip_summed == CHECKSUM_PARTIAL) {
-			switch (skb->protocol) {
-			case __constant_htons(ETH_P_IP):
-				type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_IPV4;
-				if (ip_hdr(skb)->protocol == IPPROTO_TCP)
-					type_tucmd_mlhl |=
-					    IXGBE_ADVTXD_TUCMD_L4T_TCP;
-				break;
-			case __constant_htons(ETH_P_IPV6):
-				/* XXX what about other V6 headers?? */
-				if (ipv6_hdr(skb)->nexthdr == IPPROTO_TCP)
-					type_tucmd_mlhl |=
-						IXGBE_ADVTXD_TUCMD_L4T_TCP;
-				break;
-			default:
-				if (unlikely(net_ratelimit())) {
-					pr_warn("partial checksum but "
-						"proto=%x!\n", skb->protocol);
-				}
-				break;
-			}
-		}
 
-		context_desc->type_tucmd_mlhl = cpu_to_le32(type_tucmd_mlhl);
-		/* use index zero for tx checksum offload */
-		context_desc->mss_l4len_idx = 0;
 
-		tx_buffer_info->time_stamp = jiffies;
-		tx_buffer_info->next_to_watch = i;
+	u32 vlan_macip_lens = 0;
+	u32 mss_l4len_idx = 0;
+	u32 type_tucmd = 0;
 
-		adapter->hw_csum_tx_good++;
-		i++;
-		if (i == tx_ring->count)
-			i = 0;
-		tx_ring->next_to_use = i;
+	if (skb->ip_summed == CHECKSUM_PARTIAL) {
+		u8 l4_hdr = 0;
+		switch (skb->protocol) {
+		case __constant_htons(ETH_P_IP):
+			vlan_macip_lens |= skb_network_header_len(skb);
+			type_tucmd |= IXGBE_ADVTXD_TUCMD_IPV4;
+			l4_hdr = ip_hdr(skb)->protocol;
+			break;
+		case __constant_htons(ETH_P_IPV6):
+			vlan_macip_lens |= skb_network_header_len(skb);
+			l4_hdr = ipv6_hdr(skb)->nexthdr;
+			break;
+		default:
+			if (unlikely(net_ratelimit())) {
+				dev_warn(tx_ring->dev,
+				 "partial checksum but proto=%x!\n",
+				 skb->protocol);
+			}
+			break;
+		}
 
-		return true;
+		switch (l4_hdr) {
+		case IPPROTO_TCP:
+			type_tucmd |= IXGBE_ADVTXD_TUCMD_L4T_TCP;
+			mss_l4len_idx = tcp_hdrlen(skb) <<
+					IXGBE_ADVTXD_L4LEN_SHIFT;
+			break;
+		case IPPROTO_SCTP:
+			type_tucmd |= IXGBE_ADVTXD_TUCMD_L4T_SCTP;
+			mss_l4len_idx = sizeof(struct sctphdr) <<
+					IXGBE_ADVTXD_L4LEN_SHIFT;
+			break;
+		case IPPROTO_UDP:
+			mss_l4len_idx = sizeof(struct udphdr) <<
+					IXGBE_ADVTXD_L4LEN_SHIFT;
+			break;
+		default:
+			if (unlikely(net_ratelimit())) {
+				dev_warn(tx_ring->dev,
+				 "partial checksum but l4 proto=%x!\n",
+				 l4_hdr);
+			}
+			break;
+		}
 	}
 
-	return false;
+	/* vlan_macip_lens: MACLEN, VLAN tag */
+	vlan_macip_lens |= skb_network_offset(skb) << IXGBE_ADVTXD_MACLEN_SHIFT;
+	vlan_macip_lens |= tx_flags & IXGBE_TX_FLAGS_VLAN_MASK;
+
+	ixgbevf_tx_ctxtdesc(tx_ring, vlan_macip_lens,
+			    type_tucmd, mss_l4len_idx);
+
+	return (skb->ip_summed == CHECKSUM_PARTIAL);
 }
 
-static int ixgbevf_tx_map(struct ixgbevf_adapter *adapter,
-			  struct ixgbevf_ring *tx_ring,
+static int ixgbevf_tx_map(struct ixgbevf_ring *tx_ring,
 			  struct sk_buff *skb, u32 tx_flags,
 			  unsigned int first)
 {
-	struct pci_dev *pdev = adapter->pdev;
 	struct ixgbevf_tx_buffer *tx_buffer_info;
 	unsigned int len;
 	unsigned int total = skb->len;
@@ -2573,12 +2555,11 @@ static int ixgbevf_tx_map(struct ixgbevf_adapter *adapter,
 
 		tx_buffer_info->length = size;
 		tx_buffer_info->mapped_as_page = false;
-		tx_buffer_info->dma = dma_map_single(&adapter->pdev->dev,
+		tx_buffer_info->dma = dma_map_single(tx_ring->dev,
 						     skb->data + offset,
 						     size, DMA_TO_DEVICE);
-		if (dma_mapping_error(&pdev->dev, tx_buffer_info->dma))
+		if (dma_mapping_error(tx_ring->dev, tx_buffer_info->dma))
 			goto dma_error;
-		tx_buffer_info->time_stamp = jiffies;
 		tx_buffer_info->next_to_watch = i;
 
 		len -= size;
@@ -2603,12 +2584,12 @@ static int ixgbevf_tx_map(struct ixgbevf_adapter *adapter,
 
 			tx_buffer_info->length = size;
 			tx_buffer_info->dma =
-				skb_frag_dma_map(&adapter->pdev->dev, frag,
+				skb_frag_dma_map(tx_ring->dev, frag,
 						 offset, size, DMA_TO_DEVICE);
 			tx_buffer_info->mapped_as_page = true;
-			if (dma_mapping_error(&pdev->dev, tx_buffer_info->dma))
+			if (dma_mapping_error(tx_ring->dev,
+					      tx_buffer_info->dma))
 				goto dma_error;
-			tx_buffer_info->time_stamp = jiffies;
 			tx_buffer_info->next_to_watch = i;
 
 			len -= size;
@@ -2629,15 +2610,15 @@ static int ixgbevf_tx_map(struct ixgbevf_adapter *adapter,
 		i = i - 1;
 	tx_ring->tx_buffer_info[i].skb = skb;
 	tx_ring->tx_buffer_info[first].next_to_watch = i;
+	tx_ring->tx_buffer_info[first].time_stamp = jiffies;
 
 	return count;
 
 dma_error:
-	dev_err(&pdev->dev, "TX DMA map failed\n");
+	dev_err(tx_ring->dev, "TX DMA map failed\n");
 
 	/* clear timestamp and dma mappings for failed tx_buffer_info map */
 	tx_buffer_info->dma = 0;
-	tx_buffer_info->time_stamp = 0;
 	tx_buffer_info->next_to_watch = 0;
 	count--;
 
@@ -2648,14 +2629,13 @@ dma_error:
 		if (i < 0)
 			i += tx_ring->count;
 		tx_buffer_info = &tx_ring->tx_buffer_info[i];
-		ixgbevf_unmap_and_free_tx_resource(adapter, tx_buffer_info);
+		ixgbevf_unmap_and_free_tx_resource(tx_ring, tx_buffer_info);
 	}
 
 	return count;
 }
 
-static void ixgbevf_tx_queue(struct ixgbevf_adapter *adapter,
-			     struct ixgbevf_ring *tx_ring, int tx_flags,
+static void ixgbevf_tx_queue(struct ixgbevf_ring *tx_ring, int tx_flags,
 			     int count, u32 paylen, u8 hdr_len)
 {
 	union ixgbe_adv_tx_desc *tx_desc = NULL;
@@ -2672,21 +2652,24 @@ static void ixgbevf_tx_queue(struct ixgbevf_adapter *adapter,
 	if (tx_flags & IXGBE_TX_FLAGS_VLAN)
 		cmd_type_len |= IXGBE_ADVTXD_DCMD_VLE;
 
+	if (tx_flags & IXGBE_TX_FLAGS_CSUM)
+		olinfo_status |= IXGBE_ADVTXD_POPTS_TXSM;
+
 	if (tx_flags & IXGBE_TX_FLAGS_TSO) {
 		cmd_type_len |= IXGBE_ADVTXD_DCMD_TSE;
 
-		olinfo_status |= IXGBE_TXD_POPTS_TXSM <<
-			IXGBE_ADVTXD_POPTS_SHIFT;
-
 		/* use index 1 context for tso */
 		olinfo_status |= (1 << IXGBE_ADVTXD_IDX_SHIFT);
 		if (tx_flags & IXGBE_TX_FLAGS_IPV4)
-			olinfo_status |= IXGBE_TXD_POPTS_IXSM <<
-				IXGBE_ADVTXD_POPTS_SHIFT;
+			olinfo_status |= IXGBE_ADVTXD_POPTS_IXSM;
+
+	}
 
-	} else if (tx_flags & IXGBE_TX_FLAGS_CSUM)
-		olinfo_status |= IXGBE_TXD_POPTS_TXSM <<
-			IXGBE_ADVTXD_POPTS_SHIFT;
+	/*
+	 * Check Context must be set if Tx switch is enabled, which it
+	 * always is for case where virtual functions are running
+	 */
+	olinfo_status |= IXGBE_ADVTXD_CC;
 
 	olinfo_status |= ((paylen - hdr_len) << IXGBE_ADVTXD_PAYLEN_SHIFT);
 
@@ -2705,16 +2688,7 @@ static void ixgbevf_tx_queue(struct ixgbevf_adapter *adapter,
 
 	tx_desc->read.cmd_type_len |= cpu_to_le32(txd_cmd);
 
-	/*
-	 * Force memory writes to complete before letting h/w
-	 * know there are new descriptors to fetch.  (Only
-	 * applicable for weak-ordered memory model archs,
-	 * such as IA-64).
-	 */
-	wmb();
-
 	tx_ring->next_to_use = i;
-	writel(i, adapter->hw.hw_addr + tx_ring->tail);
 }
 
 static int __ixgbevf_maybe_stop_tx(struct ixgbevf_ring *tx_ring, int size)
@@ -2788,21 +2762,29 @@ static int ixgbevf_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
 
 	if (skb->protocol == htons(ETH_P_IP))
 		tx_flags |= IXGBE_TX_FLAGS_IPV4;
-	tso = ixgbevf_tso(adapter, tx_ring, skb, tx_flags, &hdr_len);
+	tso = ixgbevf_tso(tx_ring, skb, tx_flags, &hdr_len);
 	if (tso < 0) {
 		dev_kfree_skb_any(skb);
 		return NETDEV_TX_OK;
 	}
 
 	if (tso)
-		tx_flags |= IXGBE_TX_FLAGS_TSO;
-	else if (ixgbevf_tx_csum(adapter, tx_ring, skb, tx_flags) &&
-		 (skb->ip_summed == CHECKSUM_PARTIAL))
+		tx_flags |= IXGBE_TX_FLAGS_TSO | IXGBE_TX_FLAGS_CSUM;
+	else if (ixgbevf_tx_csum(tx_ring, skb, tx_flags))
 		tx_flags |= IXGBE_TX_FLAGS_CSUM;
 
-	ixgbevf_tx_queue(adapter, tx_ring, tx_flags,
-			 ixgbevf_tx_map(adapter, tx_ring, skb, tx_flags, first),
+	ixgbevf_tx_queue(tx_ring, tx_flags,
+			 ixgbevf_tx_map(tx_ring, skb, tx_flags, first),
 			 skb->len, hdr_len);
+	/*
+	 * Force memory writes to complete before letting h/w
+	 * know there are new descriptors to fetch.  (Only
+	 * applicable for weak-ordered memory model archs,
+	 * such as IA-64).
+	 */
+	wmb();
+
+	writel(tx_ring->next_to_use, adapter->hw.hw_addr + tx_ring->tail);
 
 	ixgbevf_maybe_stop_tx(tx_ring, DESC_NEEDED);
 
-- 
1.7.10.4

^ permalink raw reply related

* [net-next 6/9] ixgbe: Update configure virtualization to allow for multiple PF pools
From: Jeff Kirsher @ 2012-07-18 20:31 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1342643516-2696-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This change allows all pools from the default pool forward to be enabled vi
ixgbe_configure_virtualization.  This is needed as we are planning to use
queues belonging to adjacent pools for FCoE when SR-IOV and FCoE are both
enabled.

In addition this patch contains some minor formatting changes as there were
a few spots that seemed to be in need of some cleanup.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 2b4b791..ea94fa2 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -3130,28 +3130,28 @@ static void ixgbe_setup_psrtype(struct ixgbe_adapter *adapter)
 static void ixgbe_configure_virtualization(struct ixgbe_adapter *adapter)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
-	u32 gcr_ext;
-	u32 vt_reg_bits;
 	u32 reg_offset, vf_shift;
-	u32 vmdctl;
+	u32 gcr_ext, vmdctl;
 	int i;
 
 	if (!(adapter->flags & IXGBE_FLAG_SRIOV_ENABLED))
 		return;
 
 	vmdctl = IXGBE_READ_REG(hw, IXGBE_VT_CTL);
-	vt_reg_bits = IXGBE_VMD_CTL_VMDQ_EN | IXGBE_VT_CTL_REPLEN;
-	vt_reg_bits |= (adapter->num_vfs << IXGBE_VT_CTL_POOL_SHIFT);
-	IXGBE_WRITE_REG(hw, IXGBE_VT_CTL, vmdctl | vt_reg_bits);
+	vmdctl |= IXGBE_VMD_CTL_VMDQ_EN;
+	vmdctl &= ~IXGBE_VT_CTL_POOL_MASK;
+	vmdctl |= (adapter->num_vfs << IXGBE_VT_CTL_POOL_SHIFT);
+	vmdctl |= IXGBE_VT_CTL_REPLEN;
+	IXGBE_WRITE_REG(hw, IXGBE_VT_CTL, vmdctl);
 
 	vf_shift = adapter->num_vfs % 32;
 	reg_offset = (adapter->num_vfs >= 32) ? 1 : 0;
 
 	/* Enable only the PF's pool for Tx/Rx */
-	IXGBE_WRITE_REG(hw, IXGBE_VFRE(reg_offset), (1 << vf_shift));
-	IXGBE_WRITE_REG(hw, IXGBE_VFRE(reg_offset ^ 1), 0);
-	IXGBE_WRITE_REG(hw, IXGBE_VFTE(reg_offset), (1 << vf_shift));
-	IXGBE_WRITE_REG(hw, IXGBE_VFTE(reg_offset ^ 1), 0);
+	IXGBE_WRITE_REG(hw, IXGBE_VFRE(reg_offset), (~0) << vf_shift);
+	IXGBE_WRITE_REG(hw, IXGBE_VFRE(reg_offset ^ 1), reg_offset - 1);
+	IXGBE_WRITE_REG(hw, IXGBE_VFTE(reg_offset), (~0) << vf_shift);
+	IXGBE_WRITE_REG(hw, IXGBE_VFTE(reg_offset ^ 1), reg_offset - 1);
 	IXGBE_WRITE_REG(hw, IXGBE_PFDTXGSWC, IXGBE_PFDTXGSWC_VT_LBEN);
 
 	/* Map PF MAC address in RAR Entry 0 to first pool following VFs */
@@ -3168,9 +3168,9 @@ static void ixgbe_configure_virtualization(struct ixgbe_adapter *adapter)
 
 	/* enable Tx loopback for VF/PF communication */
 	IXGBE_WRITE_REG(hw, IXGBE_PFDTXGSWC, IXGBE_PFDTXGSWC_VT_LBEN);
+
 	/* Enable MAC Anti-Spoofing */
-	hw->mac.ops.set_mac_anti_spoofing(hw,
-					   (adapter->num_vfs != 0),
+	hw->mac.ops.set_mac_anti_spoofing(hw, (adapter->num_vfs != 0),
 					  adapter->num_vfs);
 	/* For VFs that have spoof checking turned off */
 	for (i = 0; i < adapter->num_vfs; i++) {
-- 
1.7.10.4

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox