Netdev List


* Re: [PATCH 08/15] ipv4: Kill routes during PMTU/redirect updates.
From: Joe Perches @ 2012-07-18 19:51 UTC
  To: David Miller; +Cc: netdev
In-Reply-To: <20120718.123015.476222169838022819.davem@davemloft.net>

On Wed, 2012-07-18 at 12:30 -0700, David Miller wrote:
> From: Joe Perches <joe@perches.com>
> > Perhaps struct dst_entry.obsolete could be a char instead of
> > a short and a pad byte could be added for some future use.
> 
> First thing, char is not signed by default on all systems :-)

yeah, yeah. I'm sure you'll dtrt :)
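
For illustration, a minimal sketch of the signedness point above. It is not
part of the patch: struct demo_entry and both helpers are hypothetical names,
and the field layout only mirrors the "char plus pad byte" idea. Whether plain
char is signed depends on the compiler and ABI, so a field that must hold
negative values is declared signed char explicitly.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for a small state field; not the real struct dst_entry. */
struct demo_entry {
	signed char obsolete;	/* explicitly signed: can hold -1 on every ABI */
	unsigned char pad;	/* spare byte for some future use */
};

/* True when plain `char` happens to be signed on this compiler/ABI. */
static bool plain_char_is_signed(void)
{
	return CHAR_MIN < 0;
}

/* Store -1 and report whether it still compares as negative when read back. */
static bool keeps_negative(void)
{
	struct demo_entry e = { .obsolete = -1, .pad = 0 };

	return e.obsolete < 0;
}
```

With signed char the comparison holds everywhere; with plain char it would
only hold where plain_char_is_signed() returns true.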

* Re: [PATCH 1/7] net-tcp: Fast Open base
From: Eric Dumazet @ 2012-07-18 19:55 UTC
  To: David Miller; +Cc: ycheng, hkchu, edumazet, ncardwell, sivasankar, netdev
In-Reply-To: <20120716.231644.1189536600250332545.davem@davemloft.net>

On Mon, 2012-07-16 at 23:16 -0700, David Miller wrote:
> From: Yuchung Cheng <ycheng@google.com>
> Date: Mon, 16 Jul 2012 14:16:44 -0700
> 
> > +#define TCPOPT_EXP		254	/* Experimental */
> > +/* Magic number to be after the option value for sharing TCP
> > + * experimental options. See draft-ietf-tcpm-experimental-options-00.txt
> > + */
> > +#define TCPOPT_FASTOPEN_MAGIC	0xF989
> 
> If I apply this, we're stuck supporting this experimental number
> forever.
> 
> Because somewhere, someone will have a kernel running using this
> number, so we have to support this option value as well as whatever
> the official one is.
> 
> Therefore I think the only logical thing we can do is only deploy
> this once an official option number is chosen.

Hi David

This is a chicken and egg problem.

IANA won't grant an official number like that in 2012+. Maybe if billions
of Android/Linux devices use TFO in 2015, IANA will grant an official
number.

So we chose to follow Joe Touch's proposal
(http://tools.ietf.org/html/draft-ietf-tcpm-experimental-options-01), and
the magic 0xF989 was generated according to section 3 to avoid possible
clashes with other experimental options using option code 254.

(Code options 253 & 254 are reserved for experimental use.
Linux Cookie extension uses 253 without a magic cookie so 253 cannot be
shared. By the way I wonder if anybody uses it... oh well...) 

Only servers will need to cope with this experimental option plus the
official one (_if_ IANA agrees to unblock one of the many reserved
options, in two or three years).

Yuchung only posted the client side in this patch series. But we already
run the server side, and supporting the official TFO option plus the
experimental one adds fewer than 10 lines of code.

So the plan would be :

1) Use the experimental 254 + magic on TFO Clients/Servers in 2012

2) When/If IANA grants an official number, add its support to servers
   (keeping support for experimental option as well)

3) One/two years later, switch client side to use this official number

4) Ten years later, remove experimental from server side.

Thanks !

PS :

TFO is not mandatory: if the initial SYN TFO option is not understood
by a server, it will reply with a SYN/ACK without the option and cookie,
and the client will proceed as it does today.
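
As a rough sketch of how a receiver could recognise the shared experimental
option described above: walk the TCP option block, and accept kind 254 only
when the two bytes after the length carry the magic 0xF989. The two constants
come from the quoted patch; find_tfo_exp_option() and its return convention
are assumptions made for this example, not the code from the series.

```c
#include <stddef.h>
#include <stdint.h>

#define TCPOPT_EOL		0	/* end of option list */
#define TCPOPT_NOP		1	/* one-byte padding option */
#define TCPOPT_EXP		254	/* Experimental (from the patch) */
#define TCPOPT_FASTOPEN_MAGIC	0xF989	/* magic after the length (from the patch) */

/*
 * Hypothetical helper: scan a raw TCP option block for experimental option
 * 254 carrying the Fast Open magic. On success, store a pointer to the bytes
 * following the magic in *cookie and return how many there are. Return -1
 * when the option is absent or the block is malformed.
 */
static int find_tfo_exp_option(const uint8_t *opt, size_t len,
			       const uint8_t **cookie)
{
	size_t i = 0;

	while (i < len) {
		uint8_t kind = opt[i];
		uint8_t olen;

		if (kind == TCPOPT_EOL)
			break;
		if (kind == TCPOPT_NOP) {
			i++;		/* NOP has no length byte */
			continue;
		}
		if (i + 1 >= len)
			return -1;	/* length byte missing */
		olen = opt[i + 1];
		if (olen < 2 || i + olen > len)
			return -1;	/* bogus or truncated option */
		if (kind == TCPOPT_EXP && olen >= 4 &&
		    ((opt[i + 2] << 8) | opt[i + 3]) == TCPOPT_FASTOPEN_MAGIC) {
			*cookie = opt + i + 4;
			return olen - 4;
		}
		i += olen;		/* skip options we do not know */
	}
	return -1;
}
```

Another experiment that also uses kind 254 but a different magic is simply
skipped, which is the clash-avoidance property the magic is meant to provide.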

* Re: [PATCH] SUNRPC: Prevent kernel stack corruption on long values of flush
From: Jim Rees @ 2012-07-18 20:00 UTC
  To: J. Bruce Fields
  Cc: Sasha Levin, Trond.Myklebust, davem, davej, linux-nfs, netdev,
	linux-kernel
In-Reply-To: <20120718173913.GA1298@fieldses.org>

J. Bruce Fields wrote:

  On Tue, Jul 17, 2012 at 12:01:26AM +0200, Sasha Levin wrote:
  > The buffer size in read_flush() is too small for the longest possible values
  > for it. This can lead to a kernel stack corruption:
  
  Thanks!
  
  > 
  > diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
  > index 2afd2a8..f86d95e 100644
  > --- a/net/sunrpc/cache.c
  > +++ b/net/sunrpc/cache.c
  > @@ -1409,11 +1409,11 @@ static ssize_t read_flush(struct file *file, char __user *buf,
  >  			  size_t count, loff_t *ppos,
  >  			  struct cache_detail *cd)
  >  {
  > -	char tbuf[20];
  > +	char tbuf[22];
  
  I wonder how common this sort of calculation is in the kernel?  It might
  provide some peace of mind to be able to write this something like
  
  	char tbuf[MAXLEN_BASE10_UL + 2];  /* + 2 for final "\n\0" */

You could use something like:

    char tbuf[sizeof (unsigned long) * 24 / 10 + 1 + 2]; /* + 2 for final "\n\0" */

since there are roughly 10 bits for every 3 decimal digits.

But I'm obviously confused, because I don't understand why tbuf needs to be
any more than 10 + 2.
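
A small sketch of the sizing rule above, assuming the macro name
MAXLEN_BASE10_UL suggested earlier in the thread (it is not an existing kernel
macro) and a hypothetical helper format_max_ulong(). Where unsigned long is
64 bits wide, ULONG_MAX has 20 decimal digits, so the bound works out to
20 + 2 = 22 bytes; where it is 32 bits wide, the same expression gives
10 + 2 = 12.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/*
 * Upper bound on the decimal digits of an unsigned long: 10 bits cover
 * roughly 3 digits, so each 8-bit byte needs at most 24/10 digits, plus 1
 * to round up.
 */
#define MAXLEN_BASE10_UL	(sizeof(unsigned long) * 24 / 10 + 1)

/*
 * Hypothetical helper: format the largest unsigned long followed by "\n".
 * Returns the length snprintf() reports, or 0 on an encoding error.
 */
static size_t format_max_ulong(char *buf, size_t size)
{
	int n = snprintf(buf, size, "%lu\n", ULONG_MAX);

	return n < 0 ? 0 : (size_t)n;
}
```

A buffer declared as char tbuf[MAXLEN_BASE10_UL + 2] is then large enough
for the digits, the newline and the terminating NUL on either word size.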

* [PATCH v3] ipv4: use seqlock for nh_exceptions
From: Julian Anastasov @ 2012-07-18 20:15 UTC
  To: David Miller; +Cc: netdev

	Use global seqlock for the nh_exceptions. Call
fnhe_oldest with the right hash chain. Correct the diff
value for dst_set_expires.

v2: after suggestions from Eric Dumazet:
* get rid of spin lock fnhe_lock, rearrange update_or_create_fnhe
* continue daddr search in rt_bind_exception

v3:
* remove the daddr check before seqlock in rt_bind_exception
* restart lookup in rt_bind_exception on detected seqlock change,
as suggested by David Miller

Signed-off-by: Julian Anastasov <ja@ssi.bg>
---
 include/net/ip_fib.h |    2 +-
 net/ipv4/route.c     |  118 +++++++++++++++++++++++++++++---------------------
 2 files changed, 69 insertions(+), 51 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index e9ee1ca..2daf096 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -51,7 +51,7 @@ struct fib_nh_exception {
 	struct fib_nh_exception __rcu	*fnhe_next;
 	__be32				fnhe_daddr;
 	u32				fnhe_pmtu;
-	u32				fnhe_gw;
+	__be32				fnhe_gw;
 	unsigned long			fnhe_expires;
 	unsigned long			fnhe_stamp;
 };
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index f67e702..e9802d8 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1333,9 +1333,9 @@ static void ip_rt_build_flow_key(struct flowi4 *fl4, const struct sock *sk,
 		build_sk_flow_key(fl4, sk);
 }
 
-static DEFINE_SPINLOCK(fnhe_lock);
+static DEFINE_SEQLOCK(fnhe_seqlock);
 
-static struct fib_nh_exception *fnhe_oldest(struct fnhe_hash_bucket *hash, __be32 daddr)
+static struct fib_nh_exception *fnhe_oldest(struct fnhe_hash_bucket *hash)
 {
 	struct fib_nh_exception *fnhe, *oldest;
 
@@ -1358,47 +1358,63 @@ static inline u32 fnhe_hashfun(__be32 daddr)
 	return hval & (FNHE_HASH_SIZE - 1);
 }
 
-static struct fib_nh_exception *find_or_create_fnhe(struct fib_nh *nh, __be32 daddr)
+static void update_or_create_fnhe(struct fib_nh *nh, __be32 daddr, __be32 gw,
+				  u32 pmtu, unsigned long expires)
 {
-	struct fnhe_hash_bucket *hash = nh->nh_exceptions;
+	struct fnhe_hash_bucket *hash;
 	struct fib_nh_exception *fnhe;
 	int depth;
-	u32 hval;
+	u32 hval = fnhe_hashfun(daddr);
+
+	write_seqlock_bh(&fnhe_seqlock);
 
+	hash = nh->nh_exceptions;
 	if (!hash) {
-		hash = nh->nh_exceptions = kzalloc(FNHE_HASH_SIZE * sizeof(*hash),
-						   GFP_ATOMIC);
+		hash = kzalloc(FNHE_HASH_SIZE * sizeof(*hash), GFP_ATOMIC);
 		if (!hash)
-			return NULL;
+			goto out_unlock;
+		nh->nh_exceptions = hash;
 	}
 
-	hval = fnhe_hashfun(daddr);
 	hash += hval;
 
 	depth = 0;
 	for (fnhe = rcu_dereference(hash->chain); fnhe;
 	     fnhe = rcu_dereference(fnhe->fnhe_next)) {
 		if (fnhe->fnhe_daddr == daddr)
-			goto out;
+			break;
 		depth++;
 	}
 
-	if (depth > FNHE_RECLAIM_DEPTH) {
-		fnhe = fnhe_oldest(hash + hval, daddr);
-		goto out_daddr;
+	if (fnhe) {
+		if (gw)
+			fnhe->fnhe_gw = gw;
+		if (pmtu) {
+			fnhe->fnhe_pmtu = pmtu;
+			fnhe->fnhe_expires = expires;
+		}
+	} else {
+		if (depth > FNHE_RECLAIM_DEPTH)
+			fnhe = fnhe_oldest(hash);
+		else {
+			fnhe = kzalloc(sizeof(*fnhe), GFP_ATOMIC);
+			if (!fnhe)
+				goto out_unlock;
+
+			fnhe->fnhe_next = hash->chain;
+			rcu_assign_pointer(hash->chain, fnhe);
+		}
+		fnhe->fnhe_daddr = daddr;
+		fnhe->fnhe_gw = gw;
+		fnhe->fnhe_pmtu = pmtu;
+		fnhe->fnhe_expires = expires;
 	}
-	fnhe = kzalloc(sizeof(*fnhe), GFP_ATOMIC);
-	if (!fnhe)
-		return NULL;
-
-	fnhe->fnhe_next = hash->chain;
-	rcu_assign_pointer(hash->chain, fnhe);
 
-out_daddr:
-	fnhe->fnhe_daddr = daddr;
-out:
 	fnhe->fnhe_stamp = jiffies;
-	return fnhe;
+
+out_unlock:
+	write_sequnlock_bh(&fnhe_seqlock);
+	return;
 }
 
 static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flowi4 *fl4)
@@ -1452,13 +1468,9 @@ static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flow
 		} else {
 			if (fib_lookup(net, fl4, &res) == 0) {
 				struct fib_nh *nh = &FIB_RES_NH(res);
-				struct fib_nh_exception *fnhe;
 
-				spin_lock_bh(&fnhe_lock);
-				fnhe = find_or_create_fnhe(nh, fl4->daddr);
-				if (fnhe)
-					fnhe->fnhe_gw = new_gw;
-				spin_unlock_bh(&fnhe_lock);
+				update_or_create_fnhe(nh, fl4->daddr, new_gw,
+						      0, 0);
 			}
 			rt->rt_gateway = new_gw;
 			rt->rt_flags |= RTCF_REDIRECTED;
@@ -1663,15 +1675,9 @@ static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu)
 
 	if (fib_lookup(dev_net(rt->dst.dev), fl4, &res) == 0) {
 		struct fib_nh *nh = &FIB_RES_NH(res);
-		struct fib_nh_exception *fnhe;
 
-		spin_lock_bh(&fnhe_lock);
-		fnhe = find_or_create_fnhe(nh, fl4->daddr);
-		if (fnhe) {
-			fnhe->fnhe_pmtu = mtu;
-			fnhe->fnhe_expires = jiffies + ip_rt_mtu_expires;
-		}
-		spin_unlock_bh(&fnhe_lock);
+		update_or_create_fnhe(nh, fl4->daddr, 0, mtu,
+				      jiffies + ip_rt_mtu_expires);
 	}
 	rt->rt_pmtu = mtu;
 	dst_set_expires(&rt->dst, ip_rt_mtu_expires);
@@ -1902,23 +1908,35 @@ static void rt_bind_exception(struct rtable *rt, struct fib_nh *nh, __be32 daddr
 
 	hval = fnhe_hashfun(daddr);
 
+restart:
 	for (fnhe = rcu_dereference(hash[hval].chain); fnhe;
 	     fnhe = rcu_dereference(fnhe->fnhe_next)) {
-		if (fnhe->fnhe_daddr == daddr) {
-			if (fnhe->fnhe_pmtu) {
-				unsigned long expires = fnhe->fnhe_expires;
-				unsigned long diff = jiffies - expires;
-
-				if (time_before(jiffies, expires)) {
-					rt->rt_pmtu = fnhe->fnhe_pmtu;
-					dst_set_expires(&rt->dst, diff);
-				}
+		__be32 fnhe_daddr, gw;
+		u32 pmtu;
+		unsigned long expires;
+		unsigned int seq;
+
+		seq = read_seqbegin(&fnhe_seqlock);
+		fnhe_daddr = fnhe->fnhe_daddr;
+		gw = fnhe->fnhe_gw;
+		pmtu = fnhe->fnhe_pmtu;
+		expires = fnhe->fnhe_expires;
+		if (read_seqretry(&fnhe_seqlock, seq))
+			goto restart;
+		if (daddr != fnhe_daddr)
+			continue;
+		if (pmtu) {
+			unsigned long diff = expires - jiffies;
+
+			if (time_before(jiffies, expires)) {
+				rt->rt_pmtu = pmtu;
+				dst_set_expires(&rt->dst, diff);
 			}
-			if (fnhe->fnhe_gw)
-				rt->rt_gateway = fnhe->fnhe_gw;
-			fnhe->fnhe_stamp = jiffies;
-			break;
 		}
+		if (gw)
+			rt->rt_gateway = gw;
+		fnhe->fnhe_stamp = jiffies;
+		break;
 	}
 }
 
-- 
1.7.3.4
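
For readers unfamiliar with the pattern, the snapshot-and-retry read side used
in rt_bind_exception above can be sketched in userspace with C11 atomics. This
is only an illustration under stated assumptions: every mini_* name and struct
exception_data are hypothetical, a single writer is assumed (the kernel seqlock
also serialises writers with a spinlock), and the data fields are read with
plain loads the way the kernel does, which strict C11 would treat as a race.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical miniature sequence lock; not the kernel seqlock_t. */
struct mini_seqlock {
	atomic_uint seq;	/* odd while a write is in progress */
};

/* Stand-in for the fields the reader snapshots. */
struct exception_data {
	uint32_t daddr, gw, pmtu;
};

static void mini_seqlock_init(struct mini_seqlock *sl)
{
	atomic_init(&sl->seq, 0);
}

/* Writer side: make the counter odd, update the data, make it even again. */
static void mini_write_begin(struct mini_seqlock *sl)
{
	atomic_fetch_add(&sl->seq, 1);
}

static void mini_write_end(struct mini_seqlock *sl)
{
	atomic_fetch_add(&sl->seq, 1);
}

/* Reader side: wait until no write is in progress and remember the count. */
static unsigned int mini_read_begin(struct mini_seqlock *sl)
{
	unsigned int s;

	while ((s = atomic_load(&sl->seq)) & 1)
		;
	return s;
}

/* True when a writer ran since mini_read_begin(), so the copy is suspect. */
static bool mini_read_retry(struct mini_seqlock *sl, unsigned int start)
{
	return atomic_load(&sl->seq) != start;
}

/* Copy all fields, and start over if a writer interfered. */
static struct exception_data snapshot(struct mini_seqlock *sl,
				      const struct exception_data *src)
{
	struct exception_data copy;
	unsigned int seq;

	do {
		seq = mini_read_begin(sl);
		copy = *src;
	} while (mini_read_retry(sl, seq));
	return copy;
}
```

The reader never blocks the writer; it pays only with a repeated copy when an
update lands in the middle of its snapshot.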

* Re: [PATCH v2] net: cgroup: null ptr dereference in netprio cgroup during init
From: Neil Horman @ 2012-07-18 20:10 UTC
  To: John Fastabend; +Cc: davem, gaofeng, mark.d.rustad, netdev, eric.dumazet
In-Reply-To: <20120718183408.27037.16130.stgit@jf-dev1-dcblab>

On Wed, Jul 18, 2012 at 11:34:09AM -0700, John Fastabend wrote:
> When the netprio cgroup is built in the kernel cgroup_init will call
> cgrp_create which eventually calls update_netdev_tables. This is
> being called before do_initcalls() so a null ptr dereference occurs
> on init_net.
> 
> This patch adds a check on init_net.count to verify the structure
> has been initialized. The failure was introduced here,
> 
> commit ef209f15980360f6945873df3cd710c5f62f2a3e
> Author: Gao feng <gaofeng@cn.fujitsu.com>
> Date:   Wed Jul 11 21:50:15 2012 +0000
> 
>     net: cgroup: fix access the unallocated memory in netprio cgroup
> 
> Tested with ping with netprio_cgroup as a module and built in.
> 
> [    0.256451] Initializing cgroup subsys net_prio
> [    0.269948] BUG: unable to handle kernel NULL pointer dereference at
> 0000000000000698
> [    0.293303] IP: [<ffffffff81512e37>] cgrp_create+0x107/0x1c0
> [    0.310175] PGD 0
> [    0.316157] Oops: 0000 [#1] SMP
> [    0.325775] CPU 0
> [    0.331227] Modules linked in:
> [    0.340846]
> [    0.345264] Pid: 0, comm: swapper/0 Not tainted 3.5.0-rc7+ #1 AMD Dinar/Dinar
> [    0.366555] RIP: 0010:[<ffffffff81512e37>]  [<ffffffff81512e37>]
> cgrp_create+0x107/0x1c0
> [    0.390681] RSP: 0000:ffffffff81c01ea8  EFLAGS: 00010213
> [    0.406501] RAX: 0000000000000000 RBX: ffffffffffffff10 RCX: 0000000000000000
> [    0.427764] RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffffffff81c9d840
> [    0.449026] RBP: ffffffff81c01ed8 R08: 00000000000164e0 R09: 0000000000000000
> [    0.470289] R10: ffff8804278303c0 R11: 0000000000000000 R12: 0000000000000001
> [    0.491553] R13: ffff8804278303c0 R14: ffff881036fd0700 R15: 0000000000000000
> [    0.512819] FS:  0000000000000000(0000) GS:ffff880427c00000(0000)
> knlGS:0000000000000000
> [    0.536932] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [    0.554049] CR2: 0000000000000698 CR3: 0000000001c0b000 CR4: 00000000000406b0
> [    0.575311] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    0.596574] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [    0.617838] Process swapper/0 (pid: 0, threadinfo ffffffff81c00000, task
> ffffffff81c13420)
> [    0.642471] Stack:
> [    0.648442]  ffffffff81c01eb8 ffffffff81c9f320 ffffffff81c9f320
> ffffffff81c9f320
> [    0.670522]  ffffffff81c9f320 ffffffff81d482c0 ffffffff81c01ef8
> ffffffff81d10397
> [    0.692604]  ffffffff81e99790 0000000000000048 ffffffff81c01f18
> ffffffff81d1062e
> [    0.714687] Call Trace:
> [    0.721960]  [<ffffffff81d10397>] cgroup_init_subsys+0x51/0xdf
> [    0.739337]  [<ffffffff81d1062e>] cgroup_init+0x36/0x119
> [    0.755160]  [<ffffffff81cf5c02>] start_kernel+0x38f/0x3c4
> [    0.771501]  [<ffffffff81cf5672>] ? repair_env_string+0x5e/0x5e
> [    0.789138]  [<ffffffff81cf5356>] x86_64_start_reservations+0x131/0x135
> [    0.808849]  [<ffffffff81cf545a>] x86_64_start_kernel+0x100/0x10f
> 
> 
> Reported-by: Mark Rustad <mark.d.rustad@intel.com>
> Cc: Neil Horman <nhorman@tuxdriver.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Gao feng <gaofeng@cn.fujitsu.com>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
> 
>  net/core/net_namespace.c  |    4 +++-
>  net/core/netprio_cgroup.c |    3 +++
>  2 files changed, 6 insertions(+), 1 deletions(-)
> 
> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
> index dddbacb..faa33bb 100644
> --- a/net/core/net_namespace.c
> +++ b/net/core/net_namespace.c
> @@ -27,7 +27,9 @@ static DEFINE_MUTEX(net_mutex);
>  LIST_HEAD(net_namespace_list);
>  EXPORT_SYMBOL_GPL(net_namespace_list);
>  
> -struct net init_net;
> +struct net init_net = {
> +	.count = ATOMIC_INIT(0),
> +};
>  EXPORT_SYMBOL(init_net);
>  
>  #define INITIAL_NET_GEN_PTRS	13 /* +1 for len +2 for rcu_head */
> diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
> index b2e9caa..e9fd7fd 100644
> --- a/net/core/netprio_cgroup.c
> +++ b/net/core/netprio_cgroup.c
> @@ -116,6 +116,9 @@ static int update_netdev_tables(void)
>  	u32 max_len;
>  	struct netprio_map *map;
>  
> +	if (!atomic_read(&init_net.count))
> +		return ret;
> +
>  	rtnl_lock();
>  	max_len = atomic_read(&max_prioidx) + 1;
>  	for_each_netdev(&init_net, dev) {
> 
> 
Acked-by: Neil Horman <nhorman@tuxdriver.com>

* Re: [PATCH] net: Statically initialize init_net.dev_base_head
From: Neil Horman @ 2012-07-18 20:11 UTC
  To: Mark Rustad; +Cc: netdev, davem, gaofeng, eric.dumazet
In-Reply-To: <20120718190607.22923.77935.stgit@host1-mdrustad.localdomain>

On Wed, Jul 18, 2012 at 12:06:07PM -0700, Mark Rustad wrote:
> This change eliminates an initialization-order hazard most
> recently seen when netprio_cgroup is built into the kernel.
> 
> With thanks to Eric Dumazet for catching a bug.
> 
> Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
> ---
> 
>  net/core/dev.c           |    3 ++-
>  net/core/net_namespace.c |    4 +++-
>  2 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 0f28a9e..1cb0d8a 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -6283,7 +6283,8 @@ static struct hlist_head *netdev_create_hash(void)
>  /* Initialize per network namespace state */
>  static int __net_init netdev_init(struct net *net)
>  {
> -	INIT_LIST_HEAD(&net->dev_base_head);
> +	if (net != &init_net)
> +		INIT_LIST_HEAD(&net->dev_base_head);
>  
>  	net->dev_name_head = netdev_create_hash();
>  	if (net->dev_name_head == NULL)
> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
> index dddbacb..42f1e1c 100644
> --- a/net/core/net_namespace.c
> +++ b/net/core/net_namespace.c
> @@ -27,7 +27,9 @@ static DEFINE_MUTEX(net_mutex);
>  LIST_HEAD(net_namespace_list);
>  EXPORT_SYMBOL_GPL(net_namespace_list);
>  
> -struct net init_net;
> +struct net init_net = {
> +	.dev_base_head = LIST_HEAD_INIT(init_net.dev_base_head),
> +};
>  EXPORT_SYMBOL(init_net);
>  
>  #define INITIAL_NET_GEN_PTRS	13 /* +1 for len +2 for rcu_head */
> 
> 

I think Dave was going to take John Fastabend's patch from earlier today, but
this works just as well.  Long term I'm going to look into delaying
initialization for cgroups, as it creates a strange initialization state when you
have a module_init routine registered.
Neil

* Re: [PATCH] cxgb3: Set vlan_feature on net_device
From: Rick Jones @ 2012-07-18 20:12 UTC
  To: brenohl@br.ibm.com; +Cc: divy@chelsio.com, netdev@vger.kernel.org
In-Reply-To: <1342639748-16276-1-git-send-email-brenohl@br.ibm.com>

On 07/18/2012 12:29 PM, brenohl@br.ibm.com wrote:
> cxgb3 interface has a bad performance when VLAN is set. On my current
> setup, a PowerLinux 7R2, I am able to get around 7 Gbps on a TCP_STREAM
> (8 instances, 4k message).
> With this patch, I am able to reach 9.5 Gbps.
Getting service demand out of an aggregate netperf test is a chore, but
reporting the change in CPU utilization should be pretty
straightforward.  Since you ended up being constrained by link rate,
showing the CPU utilization change (and calculating service demand
manually if you feel up to it) may help show the change has an even
greater effect than (9.5-7)/7 or about 36%.

What does the change do for latency and/or maximum min-sized packets
per second?

rick jones
there is more to the network than just bits/s :)
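
The arithmetic suggested above can be sketched as follows. Both helpers are
hypothetical, and the service demand definition used here (microseconds of
CPU time per KB transferred, summed over all CPUs) is an assumption for this
example; check it against what netperf itself reports before comparing
numbers.

```c
/* Relative throughput gain, e.g. (9.5 - 7) / 7 for the figures quoted above. */
static double relative_gain(double before, double after)
{
	return (after - before) / before;
}

/*
 * Assumed definition of service demand: CPU microseconds spent per KB moved.
 * cpu_util is the average utilization as a fraction (0.0 to 1.0) across
 * ncpus CPUs over elapsed_s seconds; kbytes is the total data transferred.
 */
static double service_demand_us_per_kb(double cpu_util, int ncpus,
				       double elapsed_s, double kbytes)
{
	return cpu_util * ncpus * elapsed_s * 1e6 / kbytes;
}
```

If throughput is capped by the link, a drop in service demand between the
unpatched and patched runs shows the part of the improvement that the
bits-per-second figure hides.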

>
> Signed-off-by: Breno Leitao <brenohl@br.ibm.com>
>
> diff --git a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c
> index abb6ce7..fcf4b31 100644
> --- a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c
> +++ b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c
> @@ -3173,6 +3173,9 @@ static void __devinit cxgb3_init_iscsi_mac(struct net_device *dev)
>   	pi->iscsic.mac_addr[3] |= 0x80;
>   }
>   
> +#define TSO_FLAGS (NETIF_F_TSO | NETIF_F_TSO6 | NETIF_F_TSO_ECN)
> +#define VLAN_FEAT (NETIF_F_SG | NETIF_F_IP_CSUM | TSO_FLAGS | \
> +			NETIF_F_IPV6_CSUM | NETIF_F_HIGHDMA)
>   static int __devinit init_one(struct pci_dev *pdev,
>   			      const struct pci_device_id *ent)
>   {
> @@ -3293,6 +3296,7 @@ static int __devinit init_one(struct pci_dev *pdev,
>   		netdev->hw_features = NETIF_F_SG | NETIF_F_IP_CSUM |
>   			NETIF_F_TSO | NETIF_F_RXCSUM | NETIF_F_HW_VLAN_RX;
>   		netdev->features |= netdev->hw_features | NETIF_F_HW_VLAN_TX;
> +		netdev->vlan_features |= netdev->features & VLAN_FEAT;
>   		if (pci_using_dac)
>   			netdev->features |= NETIF_F_HIGHDMA;
>   

* Re: [PATCH 1/7] net-tcp: Fast Open base
From: David Miller @ 2012-07-18 20:18 UTC
  To: eric.dumazet; +Cc: ycheng, hkchu, edumazet, ncardwell, sivasankar, netdev
In-Reply-To: <1342641349.2626.3555.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 18 Jul 2012 21:55:49 +0200

> So the plan would be :
> 
> 1) Use the experimental 254 + magic on TFO Clients/Servers in 2012
> 
> 2) When/If IANA grants an official number, add its support to servers
>    (keeping support for experimental option as well)
> 
> 3) One/two years later, switch client side to use this official number
> 
> 4) Ten years later, remove experimental from server side.

Fair enough.

* Re: [PATCH] net: Statically initialize init_net.dev_base_head
From: David Miller @ 2012-07-18 20:20 UTC
  To: nhorman; +Cc: mark.d.rustad, netdev, gaofeng, eric.dumazet
In-Reply-To: <20120718201149.GB22057@hmsreliant.think-freely.org>

From: Neil Horman <nhorman@tuxdriver.com>
Date: Wed, 18 Jul 2012 16:11:49 -0400

> On Wed, Jul 18, 2012 at 12:06:07PM -0700, Mark Rustad wrote:
>> This change eliminates an initialization-order hazard most
>> recently seen when netprio_cgroup is built into the kernel.
>> 
>> With thanks to Eric Dumazet for catching a bug.
>> 
>> Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
 ...
> I think Dave was going to take John Fastabend's patch from earlier today, but
> this works just as well.  Long term I'm going to look into delaying
> initialization for cgroups, as it creates a strange initialization state when you
> have a module_init routine registered.

Neil, any particular preference between John's and Mark's version
of the fix?

* Re: [PATCH] net: Statically initialize init_net.dev_base_head
From: Neil Horman @ 2012-07-18 20:21 UTC
  To: David Miller; +Cc: mark.d.rustad, netdev, gaofeng, eric.dumazet
In-Reply-To: <20120718.132010.1765790775051953381.davem@davemloft.net>

On Wed, Jul 18, 2012 at 01:20:10PM -0700, David Miller wrote:
> From: Neil Horman <nhorman@tuxdriver.com>
> Date: Wed, 18 Jul 2012 16:11:49 -0400
> 
> > On Wed, Jul 18, 2012 at 12:06:07PM -0700, Mark Rustad wrote:
> >> This change eliminates an initialization-order hazard most
> >> recently seen when netprio_cgroup is built into the kernel.
> >> 
> >> With thanks to Eric Dumazet for catching a bug.
> >> 
> >> Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
>  ...
> > I think Dave was going to take John Fastabend's patch from earlier today, but
> > this works just as well.  Long term I'm going to look into delaying
> > initialization for cgroups, as it creates a strange initialization state when you
> > have a module_init routine registered.
> 
> Neil, any particular preference between John's and Mark's version
> of the fix?
> 
I think they're both perfectly good.  If I had to choose I'd say Mark's, just
because it's done by initializing data, rather than adding more code to run every
time we create a cgroup.

Neil
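
The "initializing data" approach can be sketched in userspace like this. The
struct list_head layout and LIST_HEAD_INIT mirror the kernel's definitions,
while struct demo_net, the two demo objects and list_is_empty() are
hypothetical names for illustration. A statically initialized head is a valid
empty list before any init function has run; a zero-filled one is not.

```c
#include <stdbool.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

/* An empty circular list points at itself in both directions. */
#define LIST_HEAD_INIT(name) { &(name), &(name) }

/* Hypothetical cut-down stand-in for struct net. */
struct demo_net {
	struct list_head dev_base_head;
};

/* Usable from the first instruction: the initializer runs at link time. */
static struct demo_net demo_static_net = {
	.dev_base_head = LIST_HEAD_INIT(demo_static_net.dev_base_head),
};

/* Zero-filled until some init routine gets around to it. */
static struct demo_net demo_lazy_net;

static bool list_is_empty(const struct list_head *head)
{
	return head->next == head;
}
```

Walking demo_lazy_net.dev_base_head before its init routine runs would follow
a NULL next pointer, which is the initialization-order hazard being discussed.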

* Re: [RFC] r8169 : why SG / TX checksum are default disabled
From: Francois Romieu @ 2012-07-18 20:12 UTC
  To: David Miller; +Cc: hayeswang, eric.dumazet, netdev
In-Reply-To: <20120718.092346.1263036873056516097.davem@davemloft.net>

David Miller <davem@davemloft.net> :
> From: hayeswang <hayeswang@realtek.com>
> > Francois Romieu [mailto:romieu@fr.zoreil.com] 
> > [...]
> > 
> >> Hayes, should we not add into the kernel driver something similar to
> >> the rtl8168_start_xmit::skb_checksum_help stuff in Realtek's 
> >> 8168 driver ?
> >> There seems to be a bug for (skb->len < 60 && RTL_GIGA_MAC_VER_34.
> > 
> > For RTL8168E-VL (RTL_GIGA_MAC_VER_34), the hardware wouldn't send the packet
> > with the length less than 60 bytes. The hardware should pad this kind of packet
> > to 60 bytes, but it wouldn't. Therefore, the software has to pad the packet to
> > 60 bytes. However, the hw checksum would be incorrect for the modified packet,
> > so the software checksum is necessary.
> 
> I wonder how the hardware checksum can be incorrectly calculated if the padding
> is done with zeros?

A part of the apparent problem may stem from the fact that Realtek's 8168
driver claims a modified length but it does not really skb_padto... 

Hayes, would the patch below fix the original problem ?

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index be4e00f..a463697 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -5740,7 +5740,7 @@ err_out:
 	return -EIO;
 }
 
-static inline void rtl8169_tso_csum(struct rtl8169_private *tp,
+static inline bool rtl8169_tso_csum(struct rtl8169_private *tp,
 				    struct sk_buff *skb, u32 *opts)
 {
 	const struct rtl_tx_desc_info *info = tx_desc_info + tp->txd_version;
@@ -5753,6 +5753,12 @@ static inline void rtl8169_tso_csum(struct rtl8169_private *tp,
 	} else if (skb->ip_summed == CHECKSUM_PARTIAL) {
 		const struct iphdr *ip = ip_hdr(skb);
 
+		if (unlikely(skb->len < 60 &&
+		    (tp->mac_version == RTL_GIGA_MAC_VER_34) &&
+		    skb_padto(skb, ETH_ZLEN))) {
+			return false;
+		}
+
 		if (ip->protocol == IPPROTO_TCP)
 			opts[offset] |= info->checksum.tcp;
 		else if (ip->protocol == IPPROTO_UDP)
@@ -5760,6 +5766,7 @@ static inline void rtl8169_tso_csum(struct rtl8169_private *tp,
 		else
 			WARN_ON_ONCE(1);
 	}
+	return true;
 }
 
 static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
@@ -5797,7 +5804,8 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 	opts[1] = cpu_to_le32(rtl8169_tx_vlan_tag(tp, skb));
 	opts[0] = DescOwn;
 
-	rtl8169_tso_csum(tp, skb, opts);
+	if (!rtl8169_tso_csum(tp, skb, opts))
+		goto err_update_stats;
 
 	frags = rtl8169_xmit_frags(tp, skb, opts);
 	if (frags < 0)
@@ -5853,6 +5861,7 @@ err_dma_1:
 	rtl8169_unmap_tx_skb(d, tp->tx_skb + entry, txd);
 err_dma_0:
 	dev_kfree_skb(skb);
+err_update_stats:
 	dev->stats.tx_dropped++;
 	return NETDEV_TX_OK;
 

* Re: [RFC] r8169 : why SG / TX checksum are default disabled
From: David Miller @ 2012-07-18 20:28 UTC
  To: romieu; +Cc: hayeswang, eric.dumazet, netdev
In-Reply-To: <20120718201201.GC14149@electric-eye.fr.zoreil.com>

From: Francois Romieu <romieu@fr.zoreil.com>
Date: Wed, 18 Jul 2012 22:12:01 +0200

> David Miller <davem@davemloft.net> :
>> From: hayeswang <hayeswang@realtek.com>
>> > Francois Romieu [mailto:romieu@fr.zoreil.com] 
>> > [...]
>> > 
>> >> Hayes, should we not add into the kernel driver something similar to
>> >> the rtl8168_start_xmit::skb_checksum_help stuff in Realtek's 
>> >> 8168 driver ?
>> >> There seems to be a bug for (skb->len < 60 && RTL_GIGA_MAC_VER_34.
>> > 
>> > For RTL8168E-VL (RTL_GIGA_MAC_VER_34), the hardware wouldn't send the packet
>> > with the length less than 60 bytes. The hardware should pad this kind of packet
>> > to 60 bytes, but it wouldn't. Therefore, the software has to pad the packet to
>> > 60 bytes. However, the hw checksum would be incorrect for the modified packet,
>> > so the software checksum is necessary.
>> 
>> I wonder how the hardware checksum can be incorrectly calculated if the padding
>> is done with zeros?
> 
> A part of the apparent problem may stem from the fact that Realtek's 8168
> driver claims a modified length but it does not really skb_padto... 
> 
> Hayes, would the patch below fix the original problem ?

A NETDEV_TX_OK return means we accepted the SKB, it doesn't look like
that's what you are doing in the skb_padto() failure path.
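
On the zero-padding question quoted above, a minimal sketch of why trailing
zero bytes should not matter to the arithmetic itself: the 16-bit one's
complement sum used by the Internet checksum is unchanged by appended zeros.
inet_checksum() is a hypothetical userspace helper in the style of RFC 1071,
not driver code, and it ignores the pseudo-header that real TCP/UDP checksums
also cover. It says nothing about what the hardware actually does.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * 16-bit one's complement checksum over a byte buffer, big-endian words.
 * Intended for short buffers; the 32-bit accumulator is folded at the end.
 */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)(data[i] << 8 | data[i + 1]);
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;	/* odd tail, zero-extended */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);	/* fold the carries */
	return (uint16_t)~sum;
}
```

Summing a short payload and summing the same payload zero-padded to 60 bytes
give the same result, so a wrong hardware checksum would have to come from
something other than the padding bytes' values, such as the length used.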

* Re: [PATCH v2] sctp: Implement quick failover draft from tsvwg
From: Joe Perches @ 2012-07-18 20:30 UTC
  To: Neil Horman
  Cc: netdev, Vlad Yasevich, Sridhar Samudrala, David S. Miller,
	linux-sctp
In-Reply-To: <1342634466-17930-1-git-send-email-nhorman@tuxdriver.com>

On Wed, 2012-07-18 at 14:01 -0400, Neil Horman wrote:
> I've seen several attempts recently made to do quick failover of sctp transports
> by reducing various retransmit timers and counters.  While its possible to
> implement a faster failover on multihomed sctp associations, its not
> particularly robust, in that it can lead to unneeded retransmits, as well as
> false connection failures due to intermittent latency on a network.

trivia:

> diff --git a/net/sctp/associola.c b/net/sctp/associola.c

> @@ -871,6 +885,10 @@ void sctp_assoc_control_transport(struct sctp_association *asoc,
>  		spc_state = SCTP_ADDR_UNREACHABLE;
>  		break;
>  
> +	case SCTP_TRANSPORT_PF:
> +		transport->state = SCTP_PF;
> +		ulp_notify = false;
> +		break;

nicer to add a newline here

>  	default:
>  		return;
>  	}
> @@ -878,12 +896,15 @@ void sctp_assoc_control_transport(struct sctp_association *asoc,
[]
> +	if (ulp_notify) {
> +		memset(&addr, 0, sizeof(struct sockaddr_storage));
> +		memcpy(&addr, &transport->ipaddr,
> +		       transport->af_specific->sockaddr_len);

Perhaps it's better to do the memcpy then the memset of the
space left instead.

		memcpy(&addr, &transport->ipaddr, transport->af_specific->sockaddr_len);
		memset((char *)&addr + transport->af_specific->sockaddr_len, 0,
		       sizeof(struct sockaddr_storage) - transport->af_specific->sockaddr_len);
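
The same copy-then-zero-the-rest idea as a self-contained userspace sketch.
copy_and_zero_tail() is a hypothetical helper, not an existing kernel or libc
function; the bounds check is an addition for safety in the example.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy src_len bytes into dst, then clear only the unused tail of dst, so
 * no byte is written twice. Returns -1 if the source does not fit.
 */
static int copy_and_zero_tail(void *dst, size_t dst_size,
			      const void *src, size_t src_len)
{
	if (src_len > dst_size)
		return -1;
	memcpy(dst, src, src_len);
	memset((char *)dst + src_len, 0, dst_size - src_len);
	return 0;
}
```

Compared with clearing the whole buffer first, this avoids touching the
leading sockaddr_len bytes twice, at the cost of slightly more pointer
arithmetic.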
		       

* Re: [PATCH] net: Statically initialize init_net.dev_base_head
From: John Fastabend @ 2012-07-18 20:31 UTC
  To: Neil Horman, David Miller; +Cc: mark.d.rustad, netdev, gaofeng, eric.dumazet
In-Reply-To: <20120718202159.GA30706@hmsreliant.think-freely.org>

On 7/18/2012 1:21 PM, Neil Horman wrote:
> On Wed, Jul 18, 2012 at 01:20:10PM -0700, David Miller wrote:
>> From: Neil Horman <nhorman@tuxdriver.com>
>> Date: Wed, 18 Jul 2012 16:11:49 -0400
>>
>>> On Wed, Jul 18, 2012 at 12:06:07PM -0700, Mark Rustad wrote:
>>>> This change eliminates an initialization-order hazard most
>>>> recently seen when netprio_cgroup is built into the kernel.
>>>>
>>>> With thanks to Eric Dumazet for catching a bug.
>>>>
>>>> Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
>>   ...
>>> I think Dave was going to take John Fastabend's patch from earlier today, but
>>> this works just as well.  Long term I'm going to look into delaying
>>> initialization for cgroups, as it creates a strange initialization state when you
>>> have a module_init routine registered.
>>
>> Neil, any particular preference between John's and Mark's version
>> of the fix?
>>
> I think they're both perfectly good.  If I had to choose I'd say Mark's, just
> because it's done by initializing data, rather than adding more code to run every
> time we create a cgroup.
>
> Neil
>

Fine by me if we take this version instead.

* [net-next 0/9][pull request] Intel Wired LAN Driver Updates
From: Jeff Kirsher @ 2012-07-18 20:31 UTC
  To: davem; +Cc: Jeff Kirsher, netdev, gospo, sassmann

This series contains updates to ixgbevf & ixgbe.

The following are changes since commit ddbe503203855939946430e39bae58de11b70b69:
  ipv6: add ipv6_addr_hash() helper
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next master

Alexander Duyck (8):
  ixgbevf: Do not rewind the Rx ring before bumping tail
  ixgbevf: Add netdev to ring structure
  ixgbevf: Consolidate Tx context descriptor creation code
  ixgbevf: Fix multiple issues in ixgbevf_get/set_ringparam
  ixgbe: Update configure virtualization to allow for multiple PF pools
  ixgbe: Add support for SR-IOV w/ DCB or RSS
  ixgbe: Retire RSS enabled and capable flags
  ixgbe: Cleanup holes in flags after removing several of them

Pascal Bouchareine (1):
  ixgbevf: fix VF untagging when 802.1 prio is set

 drivers/net/ethernet/intel/ixgbe/ixgbe.h          |   56 +--
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c  |    4 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c      |  387 ++++++++++++++++++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     |   90 +++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c    |   52 ++-
 drivers/net/ethernet/intel/ixgbevf/defines.h      |    1 +
 drivers/net/ethernet/intel/ixgbevf/ethtool.c      |  159 ++++----
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      |    2 +
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |  405 ++++++++++-----------
 9 files changed, 745 insertions(+), 411 deletions(-)

-- 
1.7.10.4

^ permalink raw reply

* [net-next 1/9] ixgbevf: fix VF untagging when 802.1 prio is set
From: Jeff Kirsher @ 2012-07-18 20:31 UTC (permalink / raw)
  To: davem; +Cc: Pascal Bouchareine, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1342643516-2696-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Pascal Bouchareine <pascal@gandi.net>

We have had an issue when using ixgbe+ixgbevf and 802.1 VLAN tagging.

When attaching a VLAN to a VF, frames with a non-zero 802.1q priority
appeared untagged on the VF and so never reached the VLAN, whereas frames
with priority 0 were tagged as expected and seen by the VLAN device.

This seems to be due to the way ixgbevf looks up the full tag
(prio+cfi+vlan) in the adapter's active_vlans bitmap as a condition for
marking the skb tagged.  Mask the tag with VLAN_VID_MASK so that only the
VLAN ID is used for the lookup.
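
The lookup problem described above can be reproduced outside the kernel. The following is a minimal user-space sketch, not driver code: `struct vlan_table` and its helpers are hypothetical stand-ins for `adapter->active_vlans` and `test_bit()`, and the bit layout (12-bit VLAN ID, 1-bit CFI, 3-bit priority) follows the usual 802.1Q tag format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VLAN_VID_MASK   0x0fffu	/* low 12 bits of the tag: VLAN ID */
#define VLAN_PRIO_SHIFT 13	/* top 3 bits of the tag: priority */
#define VLAN_N_VID      4096

/* Hypothetical stand-in for adapter->active_vlans: one bit per VLAN ID. */
struct vlan_table {
	uint64_t bits[VLAN_N_VID / 64];
};

static void vlan_table_add(struct vlan_table *t, uint16_t vid)
{
	assert(vid < VLAN_N_VID);
	t->bits[vid / 64] |= (uint64_t)1 << (vid % 64);
}

static bool vlan_table_test(const struct vlan_table *t, uint32_t idx)
{
	/* Bounds guard added for this sketch only. */
	if (idx >= VLAN_N_VID)
		return false;
	return (t->bits[idx / 64] >> (idx % 64)) & 1;
}

/* Old lookup: the full tag (prio + cfi + vlan) is used as the index. */
static bool vlan_active_unmasked(const struct vlan_table *t, uint16_t tag)
{
	return vlan_table_test(t, tag);
}

/* Fixed lookup: only the VLAN ID bits select the table entry. */
static bool vlan_active_masked(const struct vlan_table *t, uint16_t tag)
{
	return vlan_table_test(t, tag & VLAN_VID_MASK);
}
```

With VLAN 100 registered, a tag carrying priority 5 misses in the unmasked lookup and hits in the masked one, while a priority-0 tag hits in both.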

Signed-off-by: Pascal Bouchareine <pascal@gandi.net>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index c98cdf7..b88218c 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -279,7 +279,7 @@ static void ixgbevf_receive_skb(struct ixgbevf_q_vector *q_vector,
 	bool is_vlan = (status & IXGBE_RXD_STAT_VP);
 	u16 tag = le16_to_cpu(rx_desc->wb.upper.vlan);
 
-	if (is_vlan && test_bit(tag, adapter->active_vlans))
+	if (is_vlan && test_bit(tag & VLAN_VID_MASK, adapter->active_vlans))
 		__vlan_hwaccel_put_tag(skb, tag);
 
 	napi_gro_receive(&q_vector->napi, skb);
-- 
1.7.10.4

^ permalink raw reply related

* [net-next 2/9] ixgbevf: Do not rewind the Rx ring before bumping tail
From: Jeff Kirsher @ 2012-07-18 20:31 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, sassmann, Greg Rose, Jeff Kirsher
In-Reply-To: <1342643516-2696-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

The driver is going back one step from its previous location before
bumping tail. This is incorrect.  We should just be writing the value of
next_to_use into the tail register.
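
For illustration, the index arithmetic can be sketched in plain C. This is a simplified model, not the driver: `struct ring` is a hypothetical reduction of `struct ixgbevf_ring`, `ring_unused()` is written in the style of the `IXGBE_DESC_UNUSED` macro (assumed here to keep one slot empty), and `tail_rewound()` models the behaviour the patch removes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified ring state; field names mirror the patch. */
struct ring {
	uint16_t count;
	uint16_t next_to_use;
	uint16_t next_to_clean;
};

/* Free descriptors, keeping one slot empty so that
 * next_to_use == next_to_clean always means "ring is empty". */
static uint16_t ring_unused(const struct ring *r)
{
	assert(r->count > 0);
	return (uint16_t)(((r->next_to_clean > r->next_to_use) ? 0 : r->count)
			  + r->next_to_clean - r->next_to_use - 1);
}

/* Fill n descriptors and return the value to write to the tail register:
 * simply the new next_to_use, as the commit message asks for. */
static uint16_t ring_fill(struct ring *r, uint16_t n)
{
	assert(n <= ring_unused(r));
	r->next_to_use = (uint16_t)((r->next_to_use + n) % r->count);
	return r->next_to_use;
}

/* Removed behaviour: step back one slot (with wrap) before bumping tail. */
static uint16_t tail_rewound(const struct ring *r)
{
	return r->next_to_use ? (uint16_t)(r->next_to_use - 1)
			      : (uint16_t)(r->count - 1);
}
```

On an empty 8-entry ring, filling all 7 usable descriptors yields a tail of 7; the rewound variant would report 6 and leave the last filled descriptor unannounced.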

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |    7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index b88218c..c27ce44 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -375,8 +375,6 @@ static void ixgbevf_alloc_rx_buffers(struct ixgbevf_adapter *adapter,
 no_buffers:
 	if (rx_ring->next_to_use != i) {
 		rx_ring->next_to_use = i;
-		if (i-- == 0)
-			i = (rx_ring->count - 1);
 
 		ixgbevf_release_rx_desc(&adapter->hw, rx_ring, i);
 	}
@@ -1240,9 +1238,8 @@ static void ixgbevf_configure(struct ixgbevf_adapter *adapter)
 	ixgbevf_configure_rx(adapter);
 	for (i = 0; i < adapter->num_rx_queues; i++) {
 		struct ixgbevf_ring *ring = &adapter->rx_ring[i];
-		ixgbevf_alloc_rx_buffers(adapter, ring, ring->count);
-		ring->next_to_use = ring->count - 1;
-		writel(ring->next_to_use, adapter->hw.hw_addr + ring->tail);
+		ixgbevf_alloc_rx_buffers(adapter, ring,
+					 IXGBE_DESC_UNUSED(ring));
 	}
 }
 
-- 
1.7.10.4

^ permalink raw reply related

* [net-next 3/9] ixgbevf: Add netdev to ring structure
From: Jeff Kirsher @ 2012-07-18 20:31 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, sassmann, Greg Rose, Jeff Kirsher
In-Reply-To: <1342643516-2696-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This change adds the netdev to the ring structure.  This allows for a
quicker transition from ring to netdev without having to go from ring to
adapter to netdev.
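
As a rough sketch of the idea, the ring caches the netdev pointer once at setup so hot-path code needs one dereference instead of two. The types below are hypothetical miniatures of the driver structures, kept only to show the pointer relationships.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniatures of the kernel/driver structures. */
struct net_device { unsigned long features; };
struct adapter { struct net_device *netdev; };

struct ring {
	struct net_device *netdev;	/* cached when the queues are set up */
	struct adapter *adapter;	/* backlink */
};

/* Mirrors what the patch does in ixgbevf_alloc_queues(): copy the
 * adapter's netdev pointer into every ring once. */
static void ring_init(struct ring *r, struct adapter *a)
{
	assert(a != NULL && a->netdev != NULL);
	r->adapter = a;
	r->netdev = a->netdev;
}

/* Before: ring -> adapter -> netdev (two dependent loads). */
static struct net_device *netdev_via_adapter(const struct ring *r)
{
	return r->adapter->netdev;
}

/* After: ring -> netdev (one load). */
static struct net_device *netdev_direct(const struct ring *r)
{
	return r->netdev;
}
```

Both accessors return the same device; the cached pointer only shortens the path, so it has to be set wherever rings are initialised.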

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/ethtool.c      |    6 +--
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      |    2 +
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |   54 +++++++++------------
 3 files changed, 28 insertions(+), 34 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ethtool.c b/drivers/net/ethernet/intel/ixgbevf/ethtool.c
index 15947c9..2c3b20ed 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ethtool.c
@@ -359,8 +359,7 @@ static int ixgbevf_set_ringparam(struct net_device *netdev,
 		if (err) {
 			while (i) {
 				i--;
-				ixgbevf_free_tx_resources(adapter,
-							  &tx_ring[i]);
+				ixgbevf_free_tx_resources(adapter, &tx_ring[i]);
 			}
 			goto err_tx_ring_setup;
 		}
@@ -374,8 +373,7 @@ static int ixgbevf_set_ringparam(struct net_device *netdev,
 		if (err) {
 			while (i) {
 				i--;
-				ixgbevf_free_rx_resources(adapter,
-							  &rx_ring[i]);
+				ixgbevf_free_rx_resources(adapter, &rx_ring[i]);
 			}
 				goto err_rx_ring_setup;
 		}
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
index 1f13765..e167d1b 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
@@ -56,6 +56,8 @@ struct ixgbevf_rx_buffer {
 
 struct ixgbevf_ring {
 	struct ixgbevf_ring *next;
+	struct net_device *netdev;
+	struct device *dev;
 	struct ixgbevf_adapter *adapter;  /* backlink */
 	void *desc;			/* descriptor ring memory */
 	dma_addr_t dma;			/* phys. address of descriptor ring */
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index c27ce44..1c53e13 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -187,7 +187,6 @@ static bool ixgbevf_clean_tx_irq(struct ixgbevf_q_vector *q_vector,
 				 struct ixgbevf_ring *tx_ring)
 {
 	struct ixgbevf_adapter *adapter = q_vector->adapter;
-	struct net_device *netdev = adapter->netdev;
 	union ixgbe_adv_tx_desc *tx_desc, *eop_desc;
 	struct ixgbevf_tx_buffer *tx_buffer_info;
 	unsigned int i, eop, count = 0;
@@ -241,15 +240,17 @@ cont_loop:
 	tx_ring->next_to_clean = i;
 
 #define TX_WAKE_THRESHOLD (DESC_NEEDED * 2)
-	if (unlikely(count && netif_carrier_ok(netdev) &&
+	if (unlikely(count && netif_carrier_ok(tx_ring->netdev) &&
 		     (IXGBE_DESC_UNUSED(tx_ring) >= TX_WAKE_THRESHOLD))) {
 		/* Make sure that anybody stopping the queue after this
 		 * sees the new next_to_clean.
 		 */
 		smp_mb();
-		if (__netif_subqueue_stopped(netdev, tx_ring->queue_index) &&
+		if (__netif_subqueue_stopped(tx_ring->netdev,
+					     tx_ring->queue_index) &&
 		    !test_bit(__IXGBEVF_DOWN, &adapter->state)) {
-			netif_wake_subqueue(netdev, tx_ring->queue_index);
+			netif_wake_subqueue(tx_ring->netdev,
+					    tx_ring->queue_index);
 			++adapter->restart_queue;
 		}
 	}
@@ -292,12 +293,13 @@ static void ixgbevf_receive_skb(struct ixgbevf_q_vector *q_vector,
  * @skb: skb currently being received and modified
  **/
 static inline void ixgbevf_rx_checksum(struct ixgbevf_adapter *adapter,
+				       struct ixgbevf_ring *ring,
 				       u32 status_err, struct sk_buff *skb)
 {
 	skb_checksum_none_assert(skb);
 
 	/* Rx csum disabled */
-	if (!(adapter->netdev->features & NETIF_F_RXCSUM))
+	if (!(ring->netdev->features & NETIF_F_RXCSUM))
 		return;
 
 	/* if IP and error */
@@ -332,31 +334,21 @@ static void ixgbevf_alloc_rx_buffers(struct ixgbevf_adapter *adapter,
 	union ixgbe_adv_rx_desc *rx_desc;
 	struct ixgbevf_rx_buffer *bi;
 	struct sk_buff *skb;
-	unsigned int i;
-	unsigned int bufsz = rx_ring->rx_buf_len + NET_IP_ALIGN;
+	unsigned int i = rx_ring->next_to_use;
 
-	i = rx_ring->next_to_use;
 	bi = &rx_ring->rx_buffer_info[i];
 
 	while (cleaned_count--) {
 		rx_desc = IXGBEVF_RX_DESC(rx_ring, i);
 		skb = bi->skb;
 		if (!skb) {
-			skb = netdev_alloc_skb(adapter->netdev,
-							       bufsz);
-
+			skb = netdev_alloc_skb_ip_align(rx_ring->netdev,
+							rx_ring->rx_buf_len);
 			if (!skb) {
 				adapter->alloc_rx_buff_failed++;
 				goto no_buffers;
 			}
 
-			/*
-			 * Make buffer alignment 2 beyond a 16 byte boundary
-			 * this will result in a 16 byte aligned IP header after
-			 * the 14 byte MAC header is removed
-			 */
-			skb_reserve(skb, NET_IP_ALIGN);
-
 			bi->skb = skb;
 		}
 		if (!bi->dma) {
@@ -449,7 +441,7 @@ static bool ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
 			goto next_desc;
 		}
 
-		ixgbevf_rx_checksum(adapter, staterr, skb);
+		ixgbevf_rx_checksum(adapter, rx_ring, staterr, skb);
 
 		/* probably a little skewed due to removing CRC */
 		total_rx_bytes += skb->len;
@@ -464,7 +456,7 @@ static bool ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
 			if (header_fixup_len < 14)
 				skb_push(skb, header_fixup_len);
 		}
-		skb->protocol = eth_type_trans(skb, adapter->netdev);
+		skb->protocol = eth_type_trans(skb, rx_ring->netdev);
 
 		ixgbevf_receive_skb(q_vector, skb, staterr, rx_ring, rx_desc);
 
@@ -1669,12 +1661,16 @@ static int ixgbevf_alloc_queues(struct ixgbevf_adapter *adapter)
 		adapter->tx_ring[i].count = adapter->tx_ring_count;
 		adapter->tx_ring[i].queue_index = i;
 		adapter->tx_ring[i].reg_idx = i;
+		adapter->tx_ring[i].dev = &adapter->pdev->dev;
+		adapter->tx_ring[i].netdev = adapter->netdev;
 	}
 
 	for (i = 0; i < adapter->num_rx_queues; i++) {
 		adapter->rx_ring[i].count = adapter->rx_ring_count;
 		adapter->rx_ring[i].queue_index = i;
 		adapter->rx_ring[i].reg_idx = i;
+		adapter->rx_ring[i].dev = &adapter->pdev->dev;
+		adapter->rx_ring[i].netdev = adapter->netdev;
 	}
 
 	return 0;
@@ -2721,12 +2717,11 @@ static void ixgbevf_tx_queue(struct ixgbevf_adapter *adapter,
 	writel(i, adapter->hw.hw_addr + tx_ring->tail);
 }
 
-static int __ixgbevf_maybe_stop_tx(struct net_device *netdev,
-				   struct ixgbevf_ring *tx_ring, int size)
+static int __ixgbevf_maybe_stop_tx(struct ixgbevf_ring *tx_ring, int size)
 {
-	struct ixgbevf_adapter *adapter = netdev_priv(netdev);
+	struct ixgbevf_adapter *adapter = netdev_priv(tx_ring->netdev);
 
-	netif_stop_subqueue(netdev, tx_ring->queue_index);
+	netif_stop_subqueue(tx_ring->netdev, tx_ring->queue_index);
 	/* Herbert's original patch had:
 	 *  smp_mb__after_netif_stop_queue();
 	 * but since that doesn't exist yet, just open code it. */
@@ -2738,17 +2733,16 @@ static int __ixgbevf_maybe_stop_tx(struct net_device *netdev,
 		return -EBUSY;
 
 	/* A reprieve! - use start_queue because it doesn't call schedule */
-	netif_start_subqueue(netdev, tx_ring->queue_index);
+	netif_start_subqueue(tx_ring->netdev, tx_ring->queue_index);
 	++adapter->restart_queue;
 	return 0;
 }
 
-static int ixgbevf_maybe_stop_tx(struct net_device *netdev,
-				 struct ixgbevf_ring *tx_ring, int size)
+static int ixgbevf_maybe_stop_tx(struct ixgbevf_ring *tx_ring, int size)
 {
 	if (likely(IXGBE_DESC_UNUSED(tx_ring) >= size))
 		return 0;
-	return __ixgbevf_maybe_stop_tx(netdev, tx_ring, size);
+	return __ixgbevf_maybe_stop_tx(tx_ring, size);
 }
 
 static int ixgbevf_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
@@ -2779,7 +2773,7 @@ static int ixgbevf_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
 #else
 	count += skb_shinfo(skb)->nr_frags;
 #endif
-	if (ixgbevf_maybe_stop_tx(netdev, tx_ring, count + 3)) {
+	if (ixgbevf_maybe_stop_tx(tx_ring, count + 3)) {
 		adapter->tx_busy++;
 		return NETDEV_TX_BUSY;
 	}
@@ -2810,7 +2804,7 @@ static int ixgbevf_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
 			 ixgbevf_tx_map(adapter, tx_ring, skb, tx_flags, first),
 			 skb->len, hdr_len);
 
-	ixgbevf_maybe_stop_tx(netdev, tx_ring, DESC_NEEDED);
+	ixgbevf_maybe_stop_tx(tx_ring, DESC_NEEDED);
 
 	return NETDEV_TX_OK;
 }
-- 
1.7.10.4

^ permalink raw reply related

* [net-next 5/9] ixgbevf: Fix multiple issues in ixgbevf_get/set_ringparam
From: Jeff Kirsher @ 2012-07-18 20:31 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1342643516-2696-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

In ixgbevf_get_ringparam we could run into a NULL pointer dereference
if the rings were not allocated when we attempted the call.  To prevent
that we can just access the tx/rx_ring_count values instead of attempting
to access the rings to get the count.

This change corrects a memory leak and memory corruption in
ixgbevf_set_ringparam.

The memory leak was due to us not freeing the resources from the ring
before overwriting them.  This change corrects the memory leak by making
certain to call ixgbevf_free_tx/rx_resources on the existing rings before
their contents are overwritten.

The memory corruption was because we were replacing the rings but not
updating the q_vectors.  It addresses the memory corruption by leaving the
rings in place and instead just copying the contents of the new rings into
the existing rings.
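
The copy-in-place approach can be sketched in user-space C. Everything here is a hypothetical simplification: `struct q_vector` holds a pointer into the adapter's ring array (which is why swapping the array pointer would leave it dangling), and `ring_setup()`/`ring_free()` stand in for the driver's setup and free helpers.

```c
#include <assert.h>
#include <stdlib.h>

struct ring { unsigned int count; void *desc; };
struct q_vector { struct ring *ring; };	/* points INTO adapter->rings */

struct adapter {
	struct ring *rings;
	unsigned int num_rings;
	unsigned int ring_count;
	struct q_vector *q_vectors;
};

static int ring_setup(struct ring *r)
{
	r->desc = calloc(r->count, 16);	/* 16 bytes per descriptor, assumed */
	return r->desc ? 0 : -1;
}

static void ring_free(struct ring *r)
{
	free(r->desc);
	r->desc = NULL;
}

static int adapter_resize(struct adapter *a, unsigned int new_count)
{
	struct ring *tmp;
	unsigned int i;

	if (new_count == a->ring_count)
		return 0;		/* nothing to do */

	tmp = malloc(a->num_rings * sizeof(*tmp));
	if (!tmp)
		return -1;

	for (i = 0; i < a->num_rings; i++) {
		tmp[i] = a->rings[i];	/* clone ring, then resize the clone */
		tmp[i].count = new_count;
		if (ring_setup(&tmp[i]) == 0)
			continue;
		while (i)		/* unwind the clones set up so far */
			ring_free(&tmp[--i]);
		free(tmp);
		return -1;
	}

	for (i = 0; i < a->num_rings; i++) {
		ring_free(&a->rings[i]);	/* release old resources: no leak */
		a->rings[i] = tmp[i];		/* copy in place: q_vectors stay valid */
	}
	a->ring_count = new_count;
	free(tmp);
	return 0;
}
```

Because only the contents of `a->rings[i]` change, any pointer a q_vector took to a ring before the resize still refers to a live, correctly sized ring afterwards.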

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/ethtool.c |  153 ++++++++++++++------------
 1 file changed, 83 insertions(+), 70 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ethtool.c b/drivers/net/ethernet/intel/ixgbevf/ethtool.c
index 2c3b20ed..8f20704 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ethtool.c
@@ -284,13 +284,11 @@ static void ixgbevf_get_ringparam(struct net_device *netdev,
 				  struct ethtool_ringparam *ring)
 {
 	struct ixgbevf_adapter *adapter = netdev_priv(netdev);
-	struct ixgbevf_ring *tx_ring = adapter->tx_ring;
-	struct ixgbevf_ring *rx_ring = adapter->rx_ring;
 
 	ring->rx_max_pending = IXGBEVF_MAX_RXD;
 	ring->tx_max_pending = IXGBEVF_MAX_TXD;
-	ring->rx_pending = rx_ring->count;
-	ring->tx_pending = tx_ring->count;
+	ring->rx_pending = adapter->rx_ring_count;
+	ring->tx_pending = adapter->tx_ring_count;
 }
 
 static int ixgbevf_set_ringparam(struct net_device *netdev,
@@ -298,33 +296,28 @@ static int ixgbevf_set_ringparam(struct net_device *netdev,
 {
 	struct ixgbevf_adapter *adapter = netdev_priv(netdev);
 	struct ixgbevf_ring *tx_ring = NULL, *rx_ring = NULL;
-	int i, err = 0;
 	u32 new_rx_count, new_tx_count;
+	int i, err = 0;
 
 	if ((ring->rx_mini_pending) || (ring->rx_jumbo_pending))
 		return -EINVAL;
 
-	new_rx_count = max(ring->rx_pending, (u32)IXGBEVF_MIN_RXD);
-	new_rx_count = min(new_rx_count, (u32)IXGBEVF_MAX_RXD);
-	new_rx_count = ALIGN(new_rx_count, IXGBE_REQ_RX_DESCRIPTOR_MULTIPLE);
-
-	new_tx_count = max(ring->tx_pending, (u32)IXGBEVF_MIN_TXD);
-	new_tx_count = min(new_tx_count, (u32)IXGBEVF_MAX_TXD);
+	new_tx_count = max_t(u32, ring->tx_pending, IXGBEVF_MIN_TXD);
+	new_tx_count = min_t(u32, new_tx_count, IXGBEVF_MAX_TXD);
 	new_tx_count = ALIGN(new_tx_count, IXGBE_REQ_TX_DESCRIPTOR_MULTIPLE);
 
-	if ((new_tx_count == adapter->tx_ring->count) &&
-	    (new_rx_count == adapter->rx_ring->count)) {
-		/* nothing to do */
+	new_rx_count = max_t(u32, ring->rx_pending, IXGBEVF_MIN_RXD);
+	new_rx_count = min_t(u32, new_rx_count, IXGBEVF_MAX_RXD);
+	new_rx_count = ALIGN(new_rx_count, IXGBE_REQ_RX_DESCRIPTOR_MULTIPLE);
+
+	/* if nothing to do return success */
+	if ((new_tx_count == adapter->tx_ring_count) &&
+	    (new_rx_count == adapter->rx_ring_count))
 		return 0;
-	}
 
 	while (test_and_set_bit(__IXGBEVF_RESETTING, &adapter->state))
-		msleep(1);
+		usleep_range(1000, 2000);
 
-	/*
-	 * If the adapter isn't up and running then just set the
-	 * new parameters and scurry for the exits.
-	 */
 	if (!netif_running(adapter->netdev)) {
 		for (i = 0; i < adapter->num_tx_queues; i++)
 			adapter->tx_ring[i].count = new_tx_count;
@@ -335,78 +328,98 @@ static int ixgbevf_set_ringparam(struct net_device *netdev,
 		goto clear_reset;
 	}
 
-	tx_ring = kcalloc(adapter->num_tx_queues,
-			  sizeof(struct ixgbevf_ring), GFP_KERNEL);
-	if (!tx_ring) {
-		err = -ENOMEM;
-		goto clear_reset;
-	}
-
-	rx_ring = kcalloc(adapter->num_rx_queues,
-			  sizeof(struct ixgbevf_ring), GFP_KERNEL);
-	if (!rx_ring) {
-		err = -ENOMEM;
-		goto err_rx_setup;
-	}
-
-	ixgbevf_down(adapter);
+	if (new_tx_count != adapter->tx_ring_count) {
+		tx_ring = vmalloc(adapter->num_tx_queues * sizeof(*tx_ring));
+		if (!tx_ring) {
+			err = -ENOMEM;
+			goto clear_reset;
+		}
 
-	memcpy(tx_ring, adapter->tx_ring,
-	       adapter->num_tx_queues * sizeof(struct ixgbevf_ring));
-	for (i = 0; i < adapter->num_tx_queues; i++) {
-		tx_ring[i].count = new_tx_count;
-		err = ixgbevf_setup_tx_resources(adapter, &tx_ring[i]);
-		if (err) {
+		for (i = 0; i < adapter->num_tx_queues; i++) {
+			/* clone ring and setup updated count */
+			tx_ring[i] = adapter->tx_ring[i];
+			tx_ring[i].count = new_tx_count;
+			err = ixgbevf_setup_tx_resources(adapter, &tx_ring[i]);
+			if (!err)
+				continue;
 			while (i) {
 				i--;
 				ixgbevf_free_tx_resources(adapter, &tx_ring[i]);
 			}
-			goto err_tx_ring_setup;
+
+			vfree(tx_ring);
+			tx_ring = NULL;
+
+			goto clear_reset;
 		}
 	}
 
-	memcpy(rx_ring, adapter->rx_ring,
-	       adapter->num_rx_queues * sizeof(struct ixgbevf_ring));
-	for (i = 0; i < adapter->num_rx_queues; i++) {
-		rx_ring[i].count = new_rx_count;
-		err = ixgbevf_setup_rx_resources(adapter, &rx_ring[i]);
-		if (err) {
+	if (new_rx_count != adapter->rx_ring_count) {
+		rx_ring = vmalloc(adapter->num_rx_queues * sizeof(*rx_ring));
+		if (!rx_ring) {
+			err = -ENOMEM;
+			goto clear_reset;
+		}
+
+		for (i = 0; i < adapter->num_rx_queues; i++) {
+			/* clone ring and setup updated count */
+			rx_ring[i] = adapter->rx_ring[i];
+			rx_ring[i].count = new_rx_count;
+			err = ixgbevf_setup_rx_resources(adapter, &rx_ring[i]);
+			if (!err)
+				continue;
 			while (i) {
 				i--;
 				ixgbevf_free_rx_resources(adapter, &rx_ring[i]);
 			}
-				goto err_rx_ring_setup;
+
+			vfree(rx_ring);
+			rx_ring = NULL;
+
+			goto clear_reset;
 		}
 	}
 
-	/*
-	 * Only switch to new rings if all the prior allocations
-	 * and ring setups have succeeded.
-	 */
-	kfree(adapter->tx_ring);
-	adapter->tx_ring = tx_ring;
-	adapter->tx_ring_count = new_tx_count;
-
-	kfree(adapter->rx_ring);
-	adapter->rx_ring = rx_ring;
-	adapter->rx_ring_count = new_rx_count;
+	/* bring interface down to prepare for update */
+	ixgbevf_down(adapter);
 
-	/* success! */
-	ixgbevf_up(adapter);
+	/* Tx */
+	if (tx_ring) {
+		for (i = 0; i < adapter->num_tx_queues; i++) {
+			ixgbevf_free_tx_resources(adapter,
+						  &adapter->tx_ring[i]);
+			adapter->tx_ring[i] = tx_ring[i];
+		}
+		adapter->tx_ring_count = new_tx_count;
 
-	goto clear_reset;
+		vfree(tx_ring);
+		tx_ring = NULL;
+	}
 
-err_rx_ring_setup:
-	for(i = 0; i < adapter->num_tx_queues; i++)
-		ixgbevf_free_tx_resources(adapter, &tx_ring[i]);
+	/* Rx */
+	if (rx_ring) {
+		for (i = 0; i < adapter->num_rx_queues; i++) {
+			ixgbevf_free_rx_resources(adapter,
+						  &adapter->rx_ring[i]);
+			adapter->rx_ring[i] = rx_ring[i];
+		}
+		adapter->rx_ring_count = new_rx_count;
 
-err_tx_ring_setup:
-	kfree(rx_ring);
+		vfree(rx_ring);
+		rx_ring = NULL;
+	}
 
-err_rx_setup:
-	kfree(tx_ring);
+	/* restore interface using new values */
+	ixgbevf_up(adapter);
 
 clear_reset:
+	/* free Tx resources if Rx error is encountered */
+	if (tx_ring) {
+		for (i = 0; i < adapter->num_tx_queues; i++)
+			ixgbevf_free_tx_resources(adapter, &tx_ring[i]);
+		vfree(tx_ring);
+	}
+
 	clear_bit(__IXGBEVF_RESETTING, &adapter->state);
 	return err;
 }
-- 
1.7.10.4

^ permalink raw reply related

* [net-next 4/9] ixgbevf: Consolidate Tx context descriptor creation code
From: Jeff Kirsher @ 2012-07-18 20:31 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, sassmann, Greg Rose, Jeff Kirsher
In-Reply-To: <1342643516-2696-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

There is a good bit of redundancy between the Tx checksum and segmentation
offloads.  In order to reduce some of this I am moving the code for
creating a context descriptor into a separate function.
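
To make the shared part concrete, here is a small sketch of how the two packed context-descriptor words are assembled. The shift and mask values are assumptions chosen for illustration (only the index shift of 4 appears in the patch context); the real definitions live in the driver's defines.h, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field positions, for illustration only. */
#define ADVTXD_MACLEN_SHIFT 9		/* MAC header length */
#define ADVTXD_L4LEN_SHIFT  8		/* L4 header length */
#define ADVTXD_MSS_SHIFT    16		/* TSO segment size */
#define ADVTXD_IDX_SHIFT    4		/* context index */
#define TX_FLAGS_VLAN_MASK  0xffff0000u	/* VLAN tag kept in the upper half */

/* vlan_macip_lens: IP header length | MAC header length | VLAN tag. */
static uint32_t pack_vlan_macip_lens(uint32_t tx_flags, uint32_t mac_len,
				     uint32_t ip_len)
{
	/* Field widths assumed: 9 bits of IP length, 7 bits of MAC length. */
	assert(ip_len < (1u << 9) && mac_len < (1u << 7));
	return ip_len |
	       (mac_len << ADVTXD_MACLEN_SHIFT) |
	       (tx_flags & TX_FLAGS_VLAN_MASK);
}

/* mss_l4len_idx: L4 header length | MSS | context index. */
static uint32_t pack_mss_l4len_idx(uint32_t mss, uint32_t l4len, uint32_t idx)
{
	assert(mss < (1u << 16) && l4len < (1u << 8));
	return (l4len << ADVTXD_L4LEN_SHIFT) |
	       (mss << ADVTXD_MSS_SHIFT) |
	       (idx << ADVTXD_IDX_SHIFT);
}
```

Both the checksum and the TSO paths reduce to computing these words and handing them to a single writer, which is the consolidation the commit message describes.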

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/defines.h      |    1 +
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |  342 ++++++++++-----------
 2 files changed, 163 insertions(+), 180 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/defines.h b/drivers/net/ethernet/intel/ixgbevf/defines.h
index 10cede5..418af82 100644
--- a/drivers/net/ethernet/intel/ixgbevf/defines.h
+++ b/drivers/net/ethernet/intel/ixgbevf/defines.h
@@ -251,6 +251,7 @@ struct ixgbe_adv_tx_context_desc {
 #define IXGBE_ADVTXD_TUCMD_L4T_TCP   0x00000800  /* L4 Packet TYPE of TCP */
 #define IXGBE_ADVTXD_TUCMD_L4T_SCTP  0x00001000  /* L4 Packet TYPE of SCTP */
 #define IXGBE_ADVTXD_IDX_SHIFT  4 /* Adv desc Index shift */
+#define IXGBE_ADVTXD_CC		0x00000080 /* Check Context */
 #define IXGBE_ADVTXD_POPTS_SHIFT      8  /* Adv desc POPTS shift */
 #define IXGBE_ADVTXD_POPTS_IXSM (IXGBE_TXD_POPTS_IXSM << \
 				 IXGBE_ADVTXD_POPTS_SHIFT)
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 1c53e13..ce81ce0 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -42,6 +42,7 @@
 #include <linux/in.h>
 #include <linux/ip.h>
 #include <linux/tcp.h>
+#include <linux/sctp.h>
 #include <linux/ipv6.h>
 #include <linux/slab.h>
 #include <net/checksum.h>
@@ -144,18 +145,18 @@ static void ixgbevf_set_ivar(struct ixgbevf_adapter *adapter, s8 direction,
 	}
 }
 
-static void ixgbevf_unmap_and_free_tx_resource(struct ixgbevf_adapter *adapter,
+static void ixgbevf_unmap_and_free_tx_resource(struct ixgbevf_ring *tx_ring,
 					       struct ixgbevf_tx_buffer
 					       *tx_buffer_info)
 {
 	if (tx_buffer_info->dma) {
 		if (tx_buffer_info->mapped_as_page)
-			dma_unmap_page(&adapter->pdev->dev,
+			dma_unmap_page(tx_ring->dev,
 				       tx_buffer_info->dma,
 				       tx_buffer_info->length,
 				       DMA_TO_DEVICE);
 		else
-			dma_unmap_single(&adapter->pdev->dev,
+			dma_unmap_single(tx_ring->dev,
 					 tx_buffer_info->dma,
 					 tx_buffer_info->length,
 					 DMA_TO_DEVICE);
@@ -222,7 +223,7 @@ static bool ixgbevf_clean_tx_irq(struct ixgbevf_q_vector *q_vector,
 				total_bytes += bytecount;
 			}
 
-			ixgbevf_unmap_and_free_tx_resource(adapter,
+			ixgbevf_unmap_and_free_tx_resource(tx_ring,
 							   tx_buffer_info);
 
 			tx_desc->wb.status = 0;
@@ -1443,7 +1444,7 @@ static void ixgbevf_clean_tx_ring(struct ixgbevf_adapter *adapter,
 
 	for (i = 0; i < tx_ring->count; i++) {
 		tx_buffer_info = &tx_ring->tx_buffer_info[i];
-		ixgbevf_unmap_and_free_tx_resource(adapter, tx_buffer_info);
+		ixgbevf_unmap_and_free_tx_resource(tx_ring, tx_buffer_info);
 	}
 
 	size = sizeof(struct ixgbevf_tx_buffer) * tx_ring->count;
@@ -2389,172 +2390,153 @@ static int ixgbevf_close(struct net_device *netdev)
 	return 0;
 }
 
-static int ixgbevf_tso(struct ixgbevf_adapter *adapter,
-		       struct ixgbevf_ring *tx_ring,
-		       struct sk_buff *skb, u32 tx_flags, u8 *hdr_len)
+static void ixgbevf_tx_ctxtdesc(struct ixgbevf_ring *tx_ring,
+				u32 vlan_macip_lens, u32 type_tucmd,
+				u32 mss_l4len_idx)
 {
 	struct ixgbe_adv_tx_context_desc *context_desc;
-	unsigned int i;
-	int err;
-	struct ixgbevf_tx_buffer *tx_buffer_info;
-	u32 vlan_macip_lens = 0, type_tucmd_mlhl;
-	u32 mss_l4len_idx, l4len;
+	u16 i = tx_ring->next_to_use;
 
-	if (skb_is_gso(skb)) {
-		if (skb_header_cloned(skb)) {
-			err = pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
-			if (err)
-				return err;
-		}
-		l4len = tcp_hdrlen(skb);
-		*hdr_len += l4len;
-
-		if (skb->protocol == htons(ETH_P_IP)) {
-			struct iphdr *iph = ip_hdr(skb);
-			iph->tot_len = 0;
-			iph->check = 0;
-			tcp_hdr(skb)->check = ~csum_tcpudp_magic(iph->saddr,
-								 iph->daddr, 0,
-								 IPPROTO_TCP,
-								 0);
-			adapter->hw_tso_ctxt++;
-		} else if (skb_is_gso_v6(skb)) {
-			ipv6_hdr(skb)->payload_len = 0;
-			tcp_hdr(skb)->check =
-			    ~csum_ipv6_magic(&ipv6_hdr(skb)->saddr,
-					     &ipv6_hdr(skb)->daddr,
-					     0, IPPROTO_TCP, 0);
-			adapter->hw_tso6_ctxt++;
-		}
+	context_desc = IXGBEVF_TX_CTXTDESC(tx_ring, i);
 
-		i = tx_ring->next_to_use;
+	i++;
+	tx_ring->next_to_use = (i < tx_ring->count) ? i : 0;
 
-		tx_buffer_info = &tx_ring->tx_buffer_info[i];
-		context_desc = IXGBEVF_TX_CTXTDESC(tx_ring, i);
-
-		/* VLAN MACLEN IPLEN */
-		if (tx_flags & IXGBE_TX_FLAGS_VLAN)
-			vlan_macip_lens |=
-				(tx_flags & IXGBE_TX_FLAGS_VLAN_MASK);
-		vlan_macip_lens |= ((skb_network_offset(skb)) <<
-				    IXGBE_ADVTXD_MACLEN_SHIFT);
-		*hdr_len += skb_network_offset(skb);
-		vlan_macip_lens |=
-			(skb_transport_header(skb) - skb_network_header(skb));
-		*hdr_len +=
-			(skb_transport_header(skb) - skb_network_header(skb));
-		context_desc->vlan_macip_lens = cpu_to_le32(vlan_macip_lens);
-		context_desc->seqnum_seed = 0;
-
-		/* ADV DTYP TUCMD MKRLOC/ISCSIHEDLEN */
-		type_tucmd_mlhl = (IXGBE_TXD_CMD_DEXT |
-				    IXGBE_ADVTXD_DTYP_CTXT);
-
-		if (skb->protocol == htons(ETH_P_IP))
-			type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_IPV4;
-		type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_TCP;
-		context_desc->type_tucmd_mlhl = cpu_to_le32(type_tucmd_mlhl);
-
-		/* MSS L4LEN IDX */
-		mss_l4len_idx =
-			(skb_shinfo(skb)->gso_size << IXGBE_ADVTXD_MSS_SHIFT);
-		mss_l4len_idx |= (l4len << IXGBE_ADVTXD_L4LEN_SHIFT);
-		/* use index 1 for TSO */
-		mss_l4len_idx |= (1 << IXGBE_ADVTXD_IDX_SHIFT);
-		context_desc->mss_l4len_idx = cpu_to_le32(mss_l4len_idx);
-
-		tx_buffer_info->time_stamp = jiffies;
-		tx_buffer_info->next_to_watch = i;
+	/* set bits to identify this as an advanced context descriptor */
+	type_tucmd |= IXGBE_TXD_CMD_DEXT | IXGBE_ADVTXD_DTYP_CTXT;
 
-		i++;
-		if (i == tx_ring->count)
-			i = 0;
-		tx_ring->next_to_use = i;
+	context_desc->vlan_macip_lens	= cpu_to_le32(vlan_macip_lens);
+	context_desc->seqnum_seed	= 0;
+	context_desc->type_tucmd_mlhl	= cpu_to_le32(type_tucmd);
+	context_desc->mss_l4len_idx	= cpu_to_le32(mss_l4len_idx);
+}
+
+static int ixgbevf_tso(struct ixgbevf_ring *tx_ring,
+		       struct sk_buff *skb, u32 tx_flags, u8 *hdr_len)
+{
+	u32 vlan_macip_lens, type_tucmd;
+	u32 mss_l4len_idx, l4len;
+
+	if (!skb_is_gso(skb))
+		return 0;
 
-		return true;
+	if (skb_header_cloned(skb)) {
+		int err = pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
+		if (err)
+			return err;
 	}
 
-	return false;
+	/* ADV DTYP TUCMD MKRLOC/ISCSIHEDLEN */
+	type_tucmd = IXGBE_ADVTXD_TUCMD_L4T_TCP;
+
+	if (skb->protocol == htons(ETH_P_IP)) {
+		struct iphdr *iph = ip_hdr(skb);
+		iph->tot_len = 0;
+		iph->check = 0;
+		tcp_hdr(skb)->check = ~csum_tcpudp_magic(iph->saddr,
+							 iph->daddr, 0,
+							 IPPROTO_TCP,
+							 0);
+		type_tucmd |= IXGBE_ADVTXD_TUCMD_IPV4;
+	} else if (skb_is_gso_v6(skb)) {
+		ipv6_hdr(skb)->payload_len = 0;
+		tcp_hdr(skb)->check =
+		    ~csum_ipv6_magic(&ipv6_hdr(skb)->saddr,
+				     &ipv6_hdr(skb)->daddr,
+				     0, IPPROTO_TCP, 0);
+	}
+
+	/* compute header lengths */
+	l4len = tcp_hdrlen(skb);
+	*hdr_len += l4len;
+	*hdr_len = skb_transport_offset(skb) + l4len;
+
+	/* mss_l4len_id: use 1 as index for TSO */
+	mss_l4len_idx = l4len << IXGBE_ADVTXD_L4LEN_SHIFT;
+	mss_l4len_idx |= skb_shinfo(skb)->gso_size << IXGBE_ADVTXD_MSS_SHIFT;
+	mss_l4len_idx |= 1 << IXGBE_ADVTXD_IDX_SHIFT;
+
+	/* vlan_macip_lens: HEADLEN, MACLEN, VLAN tag */
+	vlan_macip_lens = skb_network_header_len(skb);
+	vlan_macip_lens |= skb_network_offset(skb) << IXGBE_ADVTXD_MACLEN_SHIFT;
+	vlan_macip_lens |= tx_flags & IXGBE_TX_FLAGS_VLAN_MASK;
+
+	ixgbevf_tx_ctxtdesc(tx_ring, vlan_macip_lens,
+			    type_tucmd, mss_l4len_idx);
+
+	return 1;
 }
 
-static bool ixgbevf_tx_csum(struct ixgbevf_adapter *adapter,
-			    struct ixgbevf_ring *tx_ring,
+static bool ixgbevf_tx_csum(struct ixgbevf_ring *tx_ring,
 			    struct sk_buff *skb, u32 tx_flags)
 {
-	struct ixgbe_adv_tx_context_desc *context_desc;
-	unsigned int i;
-	struct ixgbevf_tx_buffer *tx_buffer_info;
-	u32 vlan_macip_lens = 0, type_tucmd_mlhl = 0;
 
-	if (skb->ip_summed == CHECKSUM_PARTIAL ||
-	    (tx_flags & IXGBE_TX_FLAGS_VLAN)) {
-		i = tx_ring->next_to_use;
-		tx_buffer_info = &tx_ring->tx_buffer_info[i];
-		context_desc = IXGBEVF_TX_CTXTDESC(tx_ring, i);
-
-		if (tx_flags & IXGBE_TX_FLAGS_VLAN)
-			vlan_macip_lens |= (tx_flags &
-					    IXGBE_TX_FLAGS_VLAN_MASK);
-		vlan_macip_lens |= (skb_network_offset(skb) <<
-				    IXGBE_ADVTXD_MACLEN_SHIFT);
-		if (skb->ip_summed == CHECKSUM_PARTIAL)
-			vlan_macip_lens |= (skb_transport_header(skb) -
-					    skb_network_header(skb));
-
-		context_desc->vlan_macip_lens = cpu_to_le32(vlan_macip_lens);
-		context_desc->seqnum_seed = 0;
-
-		type_tucmd_mlhl |= (IXGBE_TXD_CMD_DEXT |
-				    IXGBE_ADVTXD_DTYP_CTXT);
-
-		if (skb->ip_summed == CHECKSUM_PARTIAL) {
-			switch (skb->protocol) {
-			case __constant_htons(ETH_P_IP):
-				type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_IPV4;
-				if (ip_hdr(skb)->protocol == IPPROTO_TCP)
-					type_tucmd_mlhl |=
-					    IXGBE_ADVTXD_TUCMD_L4T_TCP;
-				break;
-			case __constant_htons(ETH_P_IPV6):
-				/* XXX what about other V6 headers?? */
-				if (ipv6_hdr(skb)->nexthdr == IPPROTO_TCP)
-					type_tucmd_mlhl |=
-						IXGBE_ADVTXD_TUCMD_L4T_TCP;
-				break;
-			default:
-				if (unlikely(net_ratelimit())) {
-					pr_warn("partial checksum but "
-						"proto=%x!\n", skb->protocol);
-				}
-				break;
-			}
-		}
 
-		context_desc->type_tucmd_mlhl = cpu_to_le32(type_tucmd_mlhl);
-		/* use index zero for tx checksum offload */
-		context_desc->mss_l4len_idx = 0;
 
-		tx_buffer_info->time_stamp = jiffies;
-		tx_buffer_info->next_to_watch = i;
+	u32 vlan_macip_lens = 0;
+	u32 mss_l4len_idx = 0;
+	u32 type_tucmd = 0;
 
-		adapter->hw_csum_tx_good++;
-		i++;
-		if (i == tx_ring->count)
-			i = 0;
-		tx_ring->next_to_use = i;
+	if (skb->ip_summed == CHECKSUM_PARTIAL) {
+		u8 l4_hdr = 0;
+		switch (skb->protocol) {
+		case __constant_htons(ETH_P_IP):
+			vlan_macip_lens |= skb_network_header_len(skb);
+			type_tucmd |= IXGBE_ADVTXD_TUCMD_IPV4;
+			l4_hdr = ip_hdr(skb)->protocol;
+			break;
+		case __constant_htons(ETH_P_IPV6):
+			vlan_macip_lens |= skb_network_header_len(skb);
+			l4_hdr = ipv6_hdr(skb)->nexthdr;
+			break;
+		default:
+			if (unlikely(net_ratelimit())) {
+				dev_warn(tx_ring->dev,
+				 "partial checksum but proto=%x!\n",
+				 skb->protocol);
+			}
+			break;
+		}
 
-		return true;
+		switch (l4_hdr) {
+		case IPPROTO_TCP:
+			type_tucmd |= IXGBE_ADVTXD_TUCMD_L4T_TCP;
+			mss_l4len_idx = tcp_hdrlen(skb) <<
+					IXGBE_ADVTXD_L4LEN_SHIFT;
+			break;
+		case IPPROTO_SCTP:
+			type_tucmd |= IXGBE_ADVTXD_TUCMD_L4T_SCTP;
+			mss_l4len_idx = sizeof(struct sctphdr) <<
+					IXGBE_ADVTXD_L4LEN_SHIFT;
+			break;
+		case IPPROTO_UDP:
+			mss_l4len_idx = sizeof(struct udphdr) <<
+					IXGBE_ADVTXD_L4LEN_SHIFT;
+			break;
+		default:
+			if (unlikely(net_ratelimit())) {
+				dev_warn(tx_ring->dev,
+				 "partial checksum but l4 proto=%x!\n",
+				 l4_hdr);
+			}
+			break;
+		}
 	}
 
-	return false;
+	/* vlan_macip_lens: MACLEN, VLAN tag */
+	vlan_macip_lens |= skb_network_offset(skb) << IXGBE_ADVTXD_MACLEN_SHIFT;
+	vlan_macip_lens |= tx_flags & IXGBE_TX_FLAGS_VLAN_MASK;
+
+	ixgbevf_tx_ctxtdesc(tx_ring, vlan_macip_lens,
+			    type_tucmd, mss_l4len_idx);
+
+	return (skb->ip_summed == CHECKSUM_PARTIAL);
 }
 
-static int ixgbevf_tx_map(struct ixgbevf_adapter *adapter,
-			  struct ixgbevf_ring *tx_ring,
+static int ixgbevf_tx_map(struct ixgbevf_ring *tx_ring,
 			  struct sk_buff *skb, u32 tx_flags,
 			  unsigned int first)
 {
-	struct pci_dev *pdev = adapter->pdev;
 	struct ixgbevf_tx_buffer *tx_buffer_info;
 	unsigned int len;
 	unsigned int total = skb->len;
@@ -2573,12 +2555,11 @@ static int ixgbevf_tx_map(struct ixgbevf_adapter *adapter,
 
 		tx_buffer_info->length = size;
 		tx_buffer_info->mapped_as_page = false;
-		tx_buffer_info->dma = dma_map_single(&adapter->pdev->dev,
+		tx_buffer_info->dma = dma_map_single(tx_ring->dev,
 						     skb->data + offset,
 						     size, DMA_TO_DEVICE);
-		if (dma_mapping_error(&pdev->dev, tx_buffer_info->dma))
+		if (dma_mapping_error(tx_ring->dev, tx_buffer_info->dma))
 			goto dma_error;
-		tx_buffer_info->time_stamp = jiffies;
 		tx_buffer_info->next_to_watch = i;
 
 		len -= size;
@@ -2603,12 +2584,12 @@ static int ixgbevf_tx_map(struct ixgbevf_adapter *adapter,
 
 			tx_buffer_info->length = size;
 			tx_buffer_info->dma =
-				skb_frag_dma_map(&adapter->pdev->dev, frag,
+				skb_frag_dma_map(tx_ring->dev, frag,
 						 offset, size, DMA_TO_DEVICE);
 			tx_buffer_info->mapped_as_page = true;
-			if (dma_mapping_error(&pdev->dev, tx_buffer_info->dma))
+			if (dma_mapping_error(tx_ring->dev,
+					      tx_buffer_info->dma))
 				goto dma_error;
-			tx_buffer_info->time_stamp = jiffies;
 			tx_buffer_info->next_to_watch = i;
 
 			len -= size;
@@ -2629,15 +2610,15 @@ static int ixgbevf_tx_map(struct ixgbevf_adapter *adapter,
 		i = i - 1;
 	tx_ring->tx_buffer_info[i].skb = skb;
 	tx_ring->tx_buffer_info[first].next_to_watch = i;
+	tx_ring->tx_buffer_info[first].time_stamp = jiffies;
 
 	return count;
 
 dma_error:
-	dev_err(&pdev->dev, "TX DMA map failed\n");
+	dev_err(tx_ring->dev, "TX DMA map failed\n");
 
 	/* clear timestamp and dma mappings for failed tx_buffer_info map */
 	tx_buffer_info->dma = 0;
-	tx_buffer_info->time_stamp = 0;
 	tx_buffer_info->next_to_watch = 0;
 	count--;
 
@@ -2648,14 +2629,13 @@ dma_error:
 		if (i < 0)
 			i += tx_ring->count;
 		tx_buffer_info = &tx_ring->tx_buffer_info[i];
-		ixgbevf_unmap_and_free_tx_resource(adapter, tx_buffer_info);
+		ixgbevf_unmap_and_free_tx_resource(tx_ring, tx_buffer_info);
 	}
 
 	return count;
 }
 
-static void ixgbevf_tx_queue(struct ixgbevf_adapter *adapter,
-			     struct ixgbevf_ring *tx_ring, int tx_flags,
+static void ixgbevf_tx_queue(struct ixgbevf_ring *tx_ring, int tx_flags,
 			     int count, u32 paylen, u8 hdr_len)
 {
 	union ixgbe_adv_tx_desc *tx_desc = NULL;
@@ -2672,21 +2652,24 @@ static void ixgbevf_tx_queue(struct ixgbevf_adapter *adapter,
 	if (tx_flags & IXGBE_TX_FLAGS_VLAN)
 		cmd_type_len |= IXGBE_ADVTXD_DCMD_VLE;
 
+	if (tx_flags & IXGBE_TX_FLAGS_CSUM)
+		olinfo_status |= IXGBE_ADVTXD_POPTS_TXSM;
+
 	if (tx_flags & IXGBE_TX_FLAGS_TSO) {
 		cmd_type_len |= IXGBE_ADVTXD_DCMD_TSE;
 
-		olinfo_status |= IXGBE_TXD_POPTS_TXSM <<
-			IXGBE_ADVTXD_POPTS_SHIFT;
-
 		/* use index 1 context for tso */
 		olinfo_status |= (1 << IXGBE_ADVTXD_IDX_SHIFT);
 		if (tx_flags & IXGBE_TX_FLAGS_IPV4)
-			olinfo_status |= IXGBE_TXD_POPTS_IXSM <<
-				IXGBE_ADVTXD_POPTS_SHIFT;
+			olinfo_status |= IXGBE_ADVTXD_POPTS_IXSM;
+
+	}
 
-	} else if (tx_flags & IXGBE_TX_FLAGS_CSUM)
-		olinfo_status |= IXGBE_TXD_POPTS_TXSM <<
-			IXGBE_ADVTXD_POPTS_SHIFT;
+	/*
+	 * Check Context must be set if Tx switch is enabled, which it
+	 * always is for case where virtual functions are running
+	 */
+	olinfo_status |= IXGBE_ADVTXD_CC;
 
 	olinfo_status |= ((paylen - hdr_len) << IXGBE_ADVTXD_PAYLEN_SHIFT);
 
@@ -2705,16 +2688,7 @@ static void ixgbevf_tx_queue(struct ixgbevf_adapter *adapter,
 
 	tx_desc->read.cmd_type_len |= cpu_to_le32(txd_cmd);
 
-	/*
-	 * Force memory writes to complete before letting h/w
-	 * know there are new descriptors to fetch.  (Only
-	 * applicable for weak-ordered memory model archs,
-	 * such as IA-64).
-	 */
-	wmb();
-
 	tx_ring->next_to_use = i;
-	writel(i, adapter->hw.hw_addr + tx_ring->tail);
 }
 
 static int __ixgbevf_maybe_stop_tx(struct ixgbevf_ring *tx_ring, int size)
@@ -2788,21 +2762,29 @@ static int ixgbevf_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
 
 	if (skb->protocol == htons(ETH_P_IP))
 		tx_flags |= IXGBE_TX_FLAGS_IPV4;
-	tso = ixgbevf_tso(adapter, tx_ring, skb, tx_flags, &hdr_len);
+	tso = ixgbevf_tso(tx_ring, skb, tx_flags, &hdr_len);
 	if (tso < 0) {
 		dev_kfree_skb_any(skb);
 		return NETDEV_TX_OK;
 	}
 
 	if (tso)
-		tx_flags |= IXGBE_TX_FLAGS_TSO;
-	else if (ixgbevf_tx_csum(adapter, tx_ring, skb, tx_flags) &&
-		 (skb->ip_summed == CHECKSUM_PARTIAL))
+		tx_flags |= IXGBE_TX_FLAGS_TSO | IXGBE_TX_FLAGS_CSUM;
+	else if (ixgbevf_tx_csum(tx_ring, skb, tx_flags))
 		tx_flags |= IXGBE_TX_FLAGS_CSUM;
 
-	ixgbevf_tx_queue(adapter, tx_ring, tx_flags,
-			 ixgbevf_tx_map(adapter, tx_ring, skb, tx_flags, first),
+	ixgbevf_tx_queue(tx_ring, tx_flags,
+			 ixgbevf_tx_map(tx_ring, skb, tx_flags, first),
 			 skb->len, hdr_len);
+	/*
+	 * Force memory writes to complete before letting h/w
+	 * know there are new descriptors to fetch.  (Only
+	 * applicable for weak-ordered memory model archs,
+	 * such as IA-64).
+	 */
+	wmb();
+
+	writel(tx_ring->next_to_use, adapter->hw.hw_addr + tx_ring->tail);
 
 	ixgbevf_maybe_stop_tx(tx_ring, DESC_NEEDED);
 
-- 
1.7.10.4

^ permalink raw reply related

* [net-next 6/9] ixgbe: Update configure virtualization to allow for multiple PF pools
From: Jeff Kirsher @ 2012-07-18 20:31 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1342643516-2696-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This change allows all pools from the default pool forward to be enabled via
ixgbe_configure_virtualization.  This is needed as we are planning to use
queues belonging to adjacent pools for FCoE when SR-IOV and FCoE are both
enabled.
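
As a rough illustration of "all pools from the default pool forward", the
sketch below computes the two 32-bit enable words the same way the VFRE/VFTE
writes in the diff do.  The struct and function names are hypothetical and
exist only for this example; the sketch also uses an unsigned ~0u rather than
the (~0) in the patch, and assumes fewer than 64 VFs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two pool-enable words:
 * reg[0] covers pools 0-31, reg[1] covers pools 32-63. */
struct pool_enable {
	uint32_t reg[2];
};

/* Enable every pool from the PF's default pool (index num_vfs) upward. */
static struct pool_enable pools_from_default(uint32_t num_vfs)
{
	struct pool_enable pe;
	uint32_t vf_shift, reg_offset;

	assert(num_vfs < 64);	/* assumption: 64 pools total, PF needs one */

	vf_shift = num_vfs % 32;
	reg_offset = (num_vfs >= 32) ? 1 : 0;

	/* word holding the PF pool: set its bit and every bit above it */
	pe.reg[reg_offset] = ~0u << vf_shift;
	/* other word: all ones if it lies above the PF pool, else zero */
	pe.reg[reg_offset ^ 1] = reg_offset - 1;

	return pe;
}
```

With 7 VFs this gives 0xFFFFFF80 for the low word and 0xFFFFFFFF for the
high word; with 40 VFs the low word is 0 and the high word is 0xFFFFFF00.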

In addition this patch contains some minor formatting changes as there were
a few spots that seemed to be in need of some cleanup.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 2b4b791..ea94fa2 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -3130,28 +3130,28 @@ static void ixgbe_setup_psrtype(struct ixgbe_adapter *adapter)
 static void ixgbe_configure_virtualization(struct ixgbe_adapter *adapter)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
-	u32 gcr_ext;
-	u32 vt_reg_bits;
 	u32 reg_offset, vf_shift;
-	u32 vmdctl;
+	u32 gcr_ext, vmdctl;
 	int i;
 
 	if (!(adapter->flags & IXGBE_FLAG_SRIOV_ENABLED))
 		return;
 
 	vmdctl = IXGBE_READ_REG(hw, IXGBE_VT_CTL);
-	vt_reg_bits = IXGBE_VMD_CTL_VMDQ_EN | IXGBE_VT_CTL_REPLEN;
-	vt_reg_bits |= (adapter->num_vfs << IXGBE_VT_CTL_POOL_SHIFT);
-	IXGBE_WRITE_REG(hw, IXGBE_VT_CTL, vmdctl | vt_reg_bits);
+	vmdctl |= IXGBE_VMD_CTL_VMDQ_EN;
+	vmdctl &= ~IXGBE_VT_CTL_POOL_MASK;
+	vmdctl |= (adapter->num_vfs << IXGBE_VT_CTL_POOL_SHIFT);
+	vmdctl |= IXGBE_VT_CTL_REPLEN;
+	IXGBE_WRITE_REG(hw, IXGBE_VT_CTL, vmdctl);
 
 	vf_shift = adapter->num_vfs % 32;
 	reg_offset = (adapter->num_vfs >= 32) ? 1 : 0;
 
 	/* Enable only the PF's pool for Tx/Rx */
-	IXGBE_WRITE_REG(hw, IXGBE_VFRE(reg_offset), (1 << vf_shift));
-	IXGBE_WRITE_REG(hw, IXGBE_VFRE(reg_offset ^ 1), 0);
-	IXGBE_WRITE_REG(hw, IXGBE_VFTE(reg_offset), (1 << vf_shift));
-	IXGBE_WRITE_REG(hw, IXGBE_VFTE(reg_offset ^ 1), 0);
+	IXGBE_WRITE_REG(hw, IXGBE_VFRE(reg_offset), (~0) << vf_shift);
+	IXGBE_WRITE_REG(hw, IXGBE_VFRE(reg_offset ^ 1), reg_offset - 1);
+	IXGBE_WRITE_REG(hw, IXGBE_VFTE(reg_offset), (~0) << vf_shift);
+	IXGBE_WRITE_REG(hw, IXGBE_VFTE(reg_offset ^ 1), reg_offset - 1);
 	IXGBE_WRITE_REG(hw, IXGBE_PFDTXGSWC, IXGBE_PFDTXGSWC_VT_LBEN);
 
 	/* Map PF MAC address in RAR Entry 0 to first pool following VFs */
@@ -3168,9 +3168,9 @@ static void ixgbe_configure_virtualization(struct ixgbe_adapter *adapter)
 
 	/* enable Tx loopback for VF/PF communication */
 	IXGBE_WRITE_REG(hw, IXGBE_PFDTXGSWC, IXGBE_PFDTXGSWC_VT_LBEN);
+
 	/* Enable MAC Anti-Spoofing */
-	hw->mac.ops.set_mac_anti_spoofing(hw,
-					   (adapter->num_vfs != 0),
+	hw->mac.ops.set_mac_anti_spoofing(hw, (adapter->num_vfs != 0),
 					  adapter->num_vfs);
 	/* For VFs that have spoof checking turned off */
 	for (i = 0; i < adapter->num_vfs; i++) {
-- 
1.7.10.4

^ permalink raw reply related

* [net-next 8/9] ixgbe: Retire RSS enabled and capable flags
From: Jeff Kirsher @ 2012-07-18 20:31 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1342643516-2696-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

All of our hardware supports RSS even if it is only for a single queue.  So
instead of toting around the RSS enable flag I am updating the code so that
all devices are enabled and if we want to disable RSS it is indicated via
the RSS mask.
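
A minimal sketch of the idea, assuming a simplified stand-in for the ring
feature structure (the helper names are made up for this example and are not
driver functions): whether RSS is in use is derived from the mask, and the
PSRTYPE bits follow from the index count alone, as in the diff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the per-feature ring bookkeeping. */
struct ring_feature {
	uint16_t limit;		/* upper bound on queues for the feature */
	uint16_t indices;	/* queues actually in use */
	uint16_t mask;		/* queue mask; 0 means RSS is disabled */
};

/* Replaces a test of a separate "RSS enabled" flag. */
static bool rss_active(const struct ring_feature *rss)
{
	return rss->mask != 0;
}

/* Same thresholds as ixgbe_setup_psrtype() in the diff. */
static uint32_t psrtype_rss_bits(const struct ring_feature *rss)
{
	if (rss->indices > 3)
		return 2u << 29;
	if (rss->indices > 1)
		return 1u << 29;
	return 0;
}
```

A single-queue setup is then just limit = 1 with a zero mask, which is
what the ixgbe_set_interrupt_capability() hunk relies on when it sets
the RSS limit to 1.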

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h         |    2 --
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |    4 ---
 drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c     |   10 +-------
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |   29 ++++++----------------
 4 files changed, 8 insertions(+), 37 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 67743aa..4ca10e6 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -446,8 +446,6 @@ struct ixgbe_adapter {
 #define IXGBE_FLAG_IMIR_ENABLED                 (u32)(1 << 12)
 #define IXGBE_FLAG_MQ_CAPABLE                   (u32)(1 << 13)
 #define IXGBE_FLAG_DCB_ENABLED                  (u32)(1 << 14)
-#define IXGBE_FLAG_RSS_ENABLED                  (u32)(1 << 16)
-#define IXGBE_FLAG_RSS_CAPABLE                  (u32)(1 << 17)
 #define IXGBE_FLAG_VMDQ_CAPABLE                 (u32)(1 << 18)
 #define IXGBE_FLAG_VMDQ_ENABLED                 (u32)(1 << 19)
 #define IXGBE_FLAG_FAN_FAIL_CAPABLE             (u32)(1 << 20)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index 8e1be50..4104ea25 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -2245,10 +2245,6 @@ static int ixgbe_get_rss_hash_opts(struct ixgbe_adapter *adapter,
 {
 	cmd->data = 0;
 
-	/* if RSS is disabled then report no hashing */
-	if (!(adapter->flags & IXGBE_FLAG_RSS_ENABLED))
-		return 0;
-
 	/* Report default options for RSS on ixgbe */
 	switch (cmd->flow_type) {
 	case TCP_V4_FLOW:
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
index 676e93f..38d1b65 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
@@ -265,9 +265,6 @@ static bool ixgbe_cache_ring_rss(struct ixgbe_adapter *adapter)
 {
 	int i;
 
-	if (!(adapter->flags & IXGBE_FLAG_RSS_ENABLED))
-		return false;
-
 	for (i = 0; i < adapter->num_rx_queues; i++)
 		adapter->rx_ring[i]->reg_idx = i;
 	for (i = 0; i < adapter->num_tx_queues; i++)
@@ -602,11 +599,6 @@ static bool ixgbe_set_rss_queues(struct ixgbe_adapter *adapter)
 	struct ixgbe_ring_feature *f;
 	u16 rss_i;
 
-	if (!(adapter->flags & IXGBE_FLAG_RSS_ENABLED)) {
-		adapter->flags &= ~IXGBE_FLAG_FDIR_HASH_CAPABLE;
-		return false;
-	}
-
 	/* set mask for 16 queue limit of RSS */
 	f = &adapter->ring_feature[RING_F_RSS];
 	rss_i = f->limit;
@@ -1062,7 +1054,6 @@ static void ixgbe_set_interrupt_capability(struct ixgbe_adapter *adapter)
 	}
 
 	adapter->flags &= ~IXGBE_FLAG_DCB_ENABLED;
-	adapter->flags &= ~IXGBE_FLAG_RSS_ENABLED;
 	if (adapter->flags & IXGBE_FLAG_FDIR_HASH_CAPABLE) {
 		e_err(probe,
 		      "ATR is not supported while multiple "
@@ -1073,6 +1064,7 @@ static void ixgbe_set_interrupt_capability(struct ixgbe_adapter *adapter)
 	if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED)
 		ixgbe_disable_sriov(adapter);
 
+	adapter->ring_feature[RING_F_RSS].limit = 1;
 	ixgbe_set_num_queues(adapter);
 	adapter->num_q_vectors = 1;
 
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 454e556..a3dc965 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2891,9 +2891,6 @@ static void ixgbe_setup_mrqc(struct ixgbe_adapter *adapter)
 	int i, j;
 	u16 rss_i = adapter->ring_feature[RING_F_RSS].indices;
 
-	if (!(adapter->flags & IXGBE_FLAG_RSS_ENABLED))
-		rss_i = 1;
-
 	/*
 	 * Program table for at least 2 queues w/ SR-IOV so that VFs can
 	 * make full use of any rings they may have.  We will use the
@@ -2923,7 +2920,7 @@ static void ixgbe_setup_mrqc(struct ixgbe_adapter *adapter)
 	IXGBE_WRITE_REG(hw, IXGBE_RXCSUM, rxcsum);
 
 	if (adapter->hw.mac.type == ixgbe_mac_82598EB) {
-		if (adapter->flags & IXGBE_FLAG_RSS_ENABLED)
+		if (adapter->ring_feature[RING_F_RSS].mask)
 			mrqc = IXGBE_MRQC_RSSEN;
 	} else {
 		u8 tcs = netdev_get_num_tc(adapter->netdev);
@@ -3102,6 +3099,7 @@ void ixgbe_configure_rx_ring(struct ixgbe_adapter *adapter,
 static void ixgbe_setup_psrtype(struct ixgbe_adapter *adapter)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
+	int rss_i = adapter->ring_feature[RING_F_RSS].indices;
 	int p;
 
 	/* PSRTYPE must be initialized in non 82598 adapters */
@@ -3114,13 +3112,10 @@ static void ixgbe_setup_psrtype(struct ixgbe_adapter *adapter)
 	if (hw->mac.type == ixgbe_mac_82598EB)
 		return;
 
-	if (adapter->flags & IXGBE_FLAG_RSS_ENABLED) {
-		int rss_i = adapter->ring_feature[RING_F_RSS].indices;
-		if (rss_i > 3)
-			psrtype |= 2 << 29;
-		else if (rss_i > 1)
-			psrtype |= 1 << 29;
-	}
+	if (rss_i > 3)
+		psrtype |= 2 << 29;
+	else if (rss_i > 1)
+		psrtype |= 1 << 29;
 
 	for (p = 0; p < adapter->num_rx_pools; p++)
 		IXGBE_WRITE_REG(hw, IXGBE_PSRTYPE(adapter->num_vfs + p),
@@ -4408,7 +4403,6 @@ static int __devinit ixgbe_sw_init(struct ixgbe_adapter *adapter)
 	/* Set capability flags */
 	rss = min_t(int, IXGBE_MAX_RSS_INDICES, num_online_cpus());
 	adapter->ring_feature[RING_F_RSS].limit = rss;
-	adapter->flags |= IXGBE_FLAG_RSS_ENABLED;
 	switch (hw->mac.type) {
 	case ixgbe_mac_82598EB:
 		if (hw->device_id == IXGBE_DEV_ID_82598AT)
@@ -6756,10 +6750,6 @@ static netdev_features_t ixgbe_fix_features(struct net_device *netdev,
 {
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
 
-	/* return error if RXHASH is being enabled when RSS is not supported */
-	if (!(adapter->flags & IXGBE_FLAG_RSS_ENABLED))
-		features &= ~NETIF_F_RXHASH;
-
 	/* If Rx checksum is disabled, then RSC/LRO should also be disabled */
 	if (!(features & NETIF_F_RXCSUM))
 		features &= ~NETIF_F_LRO;
@@ -6802,7 +6792,7 @@ static int ixgbe_set_features(struct net_device *netdev,
 	if (!(features & NETIF_F_NTUPLE)) {
 		if (adapter->flags & IXGBE_FLAG_FDIR_PERFECT_CAPABLE) {
 			/* turn off Flow Director, set ATR and reset */
-			if ((adapter->flags & IXGBE_FLAG_RSS_ENABLED) &&
+			if (!(adapter->flags & IXGBE_FLAG_SRIOV_ENABLED) &&
 			    !(adapter->flags & IXGBE_FLAG_DCB_ENABLED))
 				adapter->flags |= IXGBE_FLAG_FDIR_HASH_CAPABLE;
 			need_reset = true;
@@ -7294,11 +7284,6 @@ static int __devinit ixgbe_probe(struct pci_dev *pdev,
 	if (err)
 		goto err_sw_init;
 
-	if (!(adapter->flags & IXGBE_FLAG_RSS_ENABLED)) {
-		netdev->hw_features &= ~NETIF_F_RXHASH;
-		netdev->features &= ~NETIF_F_RXHASH;
-	}
-
 	/* WOL not supported for all devices */
 	adapter->wol = 0;
 	hw->eeprom.ops.read(hw, 0x2c, &adapter->eeprom_cap);
-- 
1.7.10.4

^ permalink raw reply related

* [net-next 7/9] ixgbe: Add support for SR-IOV w/ DCB or RSS
From: Jeff Kirsher @ 2012-07-18 20:31 UTC (permalink / raw)
  To: davem
  Cc: Alexander Duyck, netdev, gospo, sassmann, Greg Rose,
	John Fastabend, Jeff Kirsher
In-Reply-To: <1342643516-2696-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This change essentially makes it so that we can enable almost all of the
features all at once.  This patch allows for the combination of SR-IOV,
DCB, and FCoE in the case of the x540.  It also beefs up the SR-IOV by
adding support for RSS to the PF.

The testing matrix gets to be very complex for this patch as there are a
number of different features and subsets for queueing options.  I tried to
narrow these down a bit by restricting the PF to only supporting 4TC DCB
when it is enabled in addition to SR-IOV.
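
The queue layout for the combined modes hinges on the VMDq mask.  The
sketch below is an illustration only: the helper names are invented, the
mask values are the ones added to ixgbe.h in this patch, and ALIGN_MASK
mirrors the kernel's __ALIGN_MASK.  It shows how the mask yields the
queues per pool, the pool count out of 128 queues, and the register
index walk that skips the unused queues at the end of each pool.

```c
#include <assert.h>

#define ALIGN_MASK(x, mask)	(((x) + (mask)) & ~(mask))

#define VMDQ_8Q_MASK	0x78u	/* 16 pools, 8 queues each */
#define VMDQ_4Q_MASK	0x7Cu	/* 32 pools, 4 queues each */
#define VMDQ_2Q_MASK	0x7Eu	/* 64 pools, 2 queues each */

/* Lowest set bit of the mask is the number of queues in one pool. */
static unsigned int queues_per_pool(unsigned int vmdq_mask)
{
	return ALIGN_MASK(1u, ~vmdq_mask);
}

/* Pools available when 128 hardware queues are split evenly. */
static unsigned int pool_count(unsigned int vmdq_mask)
{
	return 128 / queues_per_pool(vmdq_mask);
}

/* Register index of the ring-th software ring when only used_per_pool
 * queues of each pool are populated, starting at pool first_pool. */
static unsigned int ring_reg_idx(unsigned int ring, unsigned int vmdq_mask,
				 unsigned int first_pool,
				 unsigned int used_per_pool)
{
	unsigned int reg_idx = first_pool * queues_per_pool(vmdq_mask);
	unsigned int i;

	assert(used_per_pool >= 1);

	for (i = 0; ; i++, reg_idx++) {
		/* past the populated queues: jump to the next pool */
		if ((reg_idx & ~vmdq_mask) >= used_per_pool)
			reg_idx = ALIGN_MASK(reg_idx, ~vmdq_mask);
		if (i == ring)
			return reg_idx;
	}
}
```

For example, with the 4-queue mask, two RSS queues per pool, and the PF
starting at pool 1, rings 0..3 land on register indices 4, 5, 8, and 9.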

Cc: Greg Rose <gregory.v.rose@intel.com>
Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h       |    4 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c   |  377 ++++++++++++++++++++++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  |   37 ++-
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c |   52 +++-
 4 files changed, 423 insertions(+), 47 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 5a75a9c..67743aa 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -284,6 +284,10 @@ struct ixgbe_ring_feature {
 	u16 offset;	/* offset to start of feature */
 } ____cacheline_internodealigned_in_smp;
 
+#define IXGBE_82599_VMDQ_8Q_MASK 0x78
+#define IXGBE_82599_VMDQ_4Q_MASK 0x7C
+#define IXGBE_82599_VMDQ_2Q_MASK 0x7E
+
 /*
  * FCoE requires that all Rx buffers be over 2200 bytes in length.  Since
  * this is twice the size of a half page we need to double the page order
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
index 4c3822f..676e93f 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
@@ -29,6 +29,83 @@
 #include "ixgbe_sriov.h"
 
 #ifdef CONFIG_IXGBE_DCB
+/**
+ * ixgbe_cache_ring_dcb_sriov - Descriptor ring to register mapping for SR-IOV
+ * @adapter: board private structure to initialize
+ *
+ * Cache the descriptor ring offsets for SR-IOV to the assigned rings.  It
+ * will also try to cache the proper offsets if RSS/FCoE are enabled along
+ * with VMDq.
+ *
+ **/
+static bool ixgbe_cache_ring_dcb_sriov(struct ixgbe_adapter *adapter)
+{
+#ifdef IXGBE_FCOE
+	struct ixgbe_ring_feature *fcoe = &adapter->ring_feature[RING_F_FCOE];
+#endif /* IXGBE_FCOE */
+	struct ixgbe_ring_feature *vmdq = &adapter->ring_feature[RING_F_VMDQ];
+	int i;
+	u16 reg_idx;
+	u8 tcs = netdev_get_num_tc(adapter->netdev);
+
+	/* verify we have DCB queueing enabled before proceeding */
+	if (tcs <= 1)
+		return false;
+
+	/* verify we have VMDq enabled before proceeding */
+	if (!(adapter->flags & IXGBE_FLAG_SRIOV_ENABLED))
+		return false;
+
+	/* start at VMDq register offset for SR-IOV enabled setups */
+	reg_idx = vmdq->offset * __ALIGN_MASK(1, ~vmdq->mask);
+	for (i = 0; i < adapter->num_rx_queues; i++, reg_idx++) {
+		/* If we are greater than indices move to next pool */
+		if ((reg_idx & ~vmdq->mask) >= tcs)
+			reg_idx = __ALIGN_MASK(reg_idx, ~vmdq->mask);
+		adapter->rx_ring[i]->reg_idx = reg_idx;
+	}
+
+	reg_idx = vmdq->offset * __ALIGN_MASK(1, ~vmdq->mask);
+	for (i = 0; i < adapter->num_tx_queues; i++, reg_idx++) {
+		/* If we are greater than indices move to next pool */
+		if ((reg_idx & ~vmdq->mask) >= tcs)
+			reg_idx = __ALIGN_MASK(reg_idx, ~vmdq->mask);
+		adapter->tx_ring[i]->reg_idx = reg_idx;
+	}
+
+#ifdef IXGBE_FCOE
+	/* nothing to do if FCoE is disabled */
+	if (!(adapter->flags & IXGBE_FLAG_FCOE_ENABLED))
+		return true;
+
+	/* The work is already done if the FCoE ring is shared */
+	if (fcoe->offset < tcs)
+		return true;
+
+	/* The FCoE rings exist separately, we need to move their reg_idx */
+	if (fcoe->indices) {
+		u16 queues_per_pool = __ALIGN_MASK(1, ~vmdq->mask);
+		u8 fcoe_tc = ixgbe_fcoe_get_tc(adapter);
+
+		reg_idx = (vmdq->offset + vmdq->indices) * queues_per_pool;
+		for (i = fcoe->offset; i < adapter->num_rx_queues; i++) {
+			reg_idx = __ALIGN_MASK(reg_idx, ~vmdq->mask) + fcoe_tc;
+			adapter->rx_ring[i]->reg_idx = reg_idx;
+			reg_idx++;
+		}
+
+		reg_idx = (vmdq->offset + vmdq->indices) * queues_per_pool;
+		for (i = fcoe->offset; i < adapter->num_tx_queues; i++) {
+			reg_idx = __ALIGN_MASK(reg_idx, ~vmdq->mask) + fcoe_tc;
+			adapter->tx_ring[i]->reg_idx = reg_idx;
+			reg_idx++;
+		}
+	}
+
+#endif /* IXGBE_FCOE */
+	return true;
+}
+
 /* ixgbe_get_first_reg_idx - Return first register index associated with ring */
 static void ixgbe_get_first_reg_idx(struct ixgbe_adapter *adapter, u8 tc,
 				    unsigned int *tx, unsigned int *rx)
@@ -120,14 +197,61 @@ static bool ixgbe_cache_ring_dcb(struct ixgbe_adapter *adapter)
  * no other mapping is used.
  *
  */
-static inline bool ixgbe_cache_ring_sriov(struct ixgbe_adapter *adapter)
+static bool ixgbe_cache_ring_sriov(struct ixgbe_adapter *adapter)
 {
-	adapter->rx_ring[0]->reg_idx = adapter->num_vfs * 2;
-	adapter->tx_ring[0]->reg_idx = adapter->num_vfs * 2;
-	if (adapter->num_vfs)
-		return true;
-	else
+#ifdef IXGBE_FCOE
+	struct ixgbe_ring_feature *fcoe = &adapter->ring_feature[RING_F_FCOE];
+#endif /* IXGBE_FCOE */
+	struct ixgbe_ring_feature *vmdq = &adapter->ring_feature[RING_F_VMDQ];
+	struct ixgbe_ring_feature *rss = &adapter->ring_feature[RING_F_RSS];
+	int i;
+	u16 reg_idx;
+
+	/* only proceed if VMDq is enabled */
+	if (!(adapter->flags & IXGBE_FLAG_VMDQ_ENABLED))
 		return false;
+
+	/* start at VMDq register offset for SR-IOV enabled setups */
+	reg_idx = vmdq->offset * __ALIGN_MASK(1, ~vmdq->mask);
+	for (i = 0; i < adapter->num_rx_queues; i++, reg_idx++) {
+#ifdef IXGBE_FCOE
+		/* Allow first FCoE queue to be mapped as RSS */
+		if (fcoe->offset && (i > fcoe->offset))
+			break;
+#endif
+		/* If we are greater than indices move to next pool */
+		if ((reg_idx & ~vmdq->mask) >= rss->indices)
+			reg_idx = __ALIGN_MASK(reg_idx, ~vmdq->mask);
+		adapter->rx_ring[i]->reg_idx = reg_idx;
+	}
+
+#ifdef IXGBE_FCOE
+	/* FCoE uses a linear block of queues so just assigning 1:1 */
+	for (; i < adapter->num_rx_queues; i++, reg_idx++)
+		adapter->rx_ring[i]->reg_idx = reg_idx;
+
+#endif
+	reg_idx = vmdq->offset * __ALIGN_MASK(1, ~vmdq->mask);
+	for (i = 0; i < adapter->num_tx_queues; i++, reg_idx++) {
+#ifdef IXGBE_FCOE
+		/* Allow first FCoE queue to be mapped as RSS */
+		if (fcoe->offset && (i > fcoe->offset))
+			break;
+#endif
+		/* If we are greater than indices move to next pool */
+		if ((reg_idx & rss->mask) >= rss->indices)
+			reg_idx = __ALIGN_MASK(reg_idx, ~vmdq->mask);
+		adapter->tx_ring[i]->reg_idx = reg_idx;
+	}
+
+#ifdef IXGBE_FCOE
+	/* FCoE uses a linear block of queues so just assigning 1:1 */
+	for (; i < adapter->num_tx_queues; i++, reg_idx++)
+		adapter->tx_ring[i]->reg_idx = reg_idx;
+
+#endif
+
+	return true;
 }
 
 /**
@@ -169,30 +293,20 @@ static void ixgbe_cache_ring_register(struct ixgbe_adapter *adapter)
 	adapter->rx_ring[0]->reg_idx = 0;
 	adapter->tx_ring[0]->reg_idx = 0;
 
-	if (ixgbe_cache_ring_sriov(adapter))
+#ifdef CONFIG_IXGBE_DCB
+	if (ixgbe_cache_ring_dcb_sriov(adapter))
 		return;
 
-#ifdef CONFIG_IXGBE_DCB
 	if (ixgbe_cache_ring_dcb(adapter))
 		return;
+
 #endif
+	if (ixgbe_cache_ring_sriov(adapter))
+		return;
 
 	ixgbe_cache_ring_rss(adapter);
 }
 
-/**
- * ixgbe_set_sriov_queues - Allocate queues for IOV use
- * @adapter: board private structure to initialize
- *
- * IOV doesn't actually use anything, so just NAK the
- * request for now and let the other queue routines
- * figure out what to do.
- */
-static inline bool ixgbe_set_sriov_queues(struct ixgbe_adapter *adapter)
-{
-	return false;
-}
-
 #define IXGBE_RSS_16Q_MASK	0xF
 #define IXGBE_RSS_8Q_MASK	0x7
 #define IXGBE_RSS_4Q_MASK	0x3
@@ -200,6 +314,109 @@ static inline bool ixgbe_set_sriov_queues(struct ixgbe_adapter *adapter)
 #define IXGBE_RSS_DISABLED_MASK	0x0
 
 #ifdef CONFIG_IXGBE_DCB
+/**
+ * ixgbe_set_dcb_sriov_queues: Allocate queues for SR-IOV devices w/ DCB
+ * @adapter: board private structure to initialize
+ *
+ * When SR-IOV (Single Root IO Virtualization) is enabled, allocate queues
+ * and VM pools where appropriate.  Also assign queues based on DCB
+ * priorities and map accordingly.
+ *
+ **/
+static bool ixgbe_set_dcb_sriov_queues(struct ixgbe_adapter *adapter)
+{
+	int i;
+	u16 vmdq_i = adapter->ring_feature[RING_F_VMDQ].limit;
+	u16 vmdq_m = 0;
+#ifdef IXGBE_FCOE
+	u16 fcoe_i = 0;
+#endif
+	u8 tcs = netdev_get_num_tc(adapter->netdev);
+
+	/* verify we have DCB queueing enabled before proceeding */
+	if (tcs <= 1)
+		return false;
+
+	/* verify we have VMDq enabled before proceeding */
+	if (!(adapter->flags & IXGBE_FLAG_SRIOV_ENABLED))
+		return false;
+
+	/* Add starting offset to total pool count */
+	vmdq_i += adapter->ring_feature[RING_F_VMDQ].offset;
+
+	/* 16 pools w/ 8 TC per pool */
+	if (tcs > 4) {
+		vmdq_i = min_t(u16, vmdq_i, 16);
+		vmdq_m = IXGBE_82599_VMDQ_8Q_MASK;
+	/* 32 pools w/ 4 TC per pool */
+	} else {
+		vmdq_i = min_t(u16, vmdq_i, 32);
+		vmdq_m = IXGBE_82599_VMDQ_4Q_MASK;
+	}
+
+#ifdef IXGBE_FCOE
+	/* queues in the remaining pools are available for FCoE */
+	fcoe_i = (128 / __ALIGN_MASK(1, ~vmdq_m)) - vmdq_i;
+
+#endif
+	/* remove the starting offset from the pool count */
+	vmdq_i -= adapter->ring_feature[RING_F_VMDQ].offset;
+
+	/* save features for later use */
+	adapter->ring_feature[RING_F_VMDQ].indices = vmdq_i;
+	adapter->ring_feature[RING_F_VMDQ].mask = vmdq_m;
+
+	/*
+	 * We do not support DCB, VMDq, and RSS all simultaneously
+	 * so we will disable RSS since it is the lowest priority
+	 */
+	adapter->ring_feature[RING_F_RSS].indices = 1;
+	adapter->ring_feature[RING_F_RSS].mask = IXGBE_RSS_DISABLED_MASK;
+
+	adapter->num_rx_pools = vmdq_i;
+	adapter->num_rx_queues_per_pool = tcs;
+
+	adapter->num_tx_queues = vmdq_i * tcs;
+	adapter->num_rx_queues = vmdq_i * tcs;
+
+#ifdef IXGBE_FCOE
+	if (adapter->flags & IXGBE_FLAG_FCOE_ENABLED) {
+		struct ixgbe_ring_feature *fcoe;
+
+		fcoe = &adapter->ring_feature[RING_F_FCOE];
+
+		/* limit ourselves based on feature limits */
+		fcoe_i = min_t(u16, fcoe_i, num_online_cpus());
+		fcoe_i = min_t(u16, fcoe_i, fcoe->limit);
+
+		if (fcoe_i) {
+			/* alloc queues for FCoE separately */
+			fcoe->indices = fcoe_i;
+			fcoe->offset = vmdq_i * tcs;
+
+			/* add queues to adapter */
+			adapter->num_tx_queues += fcoe_i;
+			adapter->num_rx_queues += fcoe_i;
+		} else if (tcs > 1) {
+			/* use queue belonging to FCoE TC */
+			fcoe->indices = 1;
+			fcoe->offset = ixgbe_fcoe_get_tc(adapter);
+		} else {
+			adapter->flags &= ~IXGBE_FLAG_FCOE_ENABLED;
+
+			fcoe->indices = 0;
+			fcoe->offset = 0;
+		}
+	}
+
+#endif /* IXGBE_FCOE */
+	/* configure TC to queue mapping */
+	for (i = 0; i < tcs; i++)
+		netdev_set_tc_queue(adapter->netdev, i, 1, i);
+
+	return true;
+}
+
 static bool ixgbe_set_dcb_queues(struct ixgbe_adapter *adapter)
 {
 	struct net_device *dev = adapter->netdev;
@@ -262,6 +479,117 @@ static bool ixgbe_set_dcb_queues(struct ixgbe_adapter *adapter)
 
 #endif
 /**
+ * ixgbe_set_sriov_queues - Allocate queues for SR-IOV devices
+ * @adapter: board private structure to initialize
+ *
+ * When SR-IOV (Single Root IO Virtualization) is enabled, allocate queues
+ * and VM pools where appropriate.  If RSS is available, then also try and
+ * enable RSS and map accordingly.
+ *
+ **/
+static bool ixgbe_set_sriov_queues(struct ixgbe_adapter *adapter)
+{
+	u16 vmdq_i = adapter->ring_feature[RING_F_VMDQ].limit;
+	u16 vmdq_m = 0;
+	u16 rss_i = adapter->ring_feature[RING_F_RSS].limit;
+	u16 rss_m = IXGBE_RSS_DISABLED_MASK;
+#ifdef IXGBE_FCOE
+	u16 fcoe_i = 0;
+#endif
+
+	/* only proceed if SR-IOV is enabled */
+	if (!(adapter->flags & IXGBE_FLAG_SRIOV_ENABLED))
+		return false;
+
+	/* Add starting offset to total pool count */
+	vmdq_i += adapter->ring_feature[RING_F_VMDQ].offset;
+
+	/* double check we are limited to maximum pools */
+	vmdq_i = min_t(u16, IXGBE_MAX_VMDQ_INDICES, vmdq_i);
+
+	/* 64 pool mode with 2 queues per pool */
+	if ((vmdq_i > 32) || (rss_i < 4)) {
+		vmdq_m = IXGBE_82599_VMDQ_2Q_MASK;
+		rss_m = IXGBE_RSS_2Q_MASK;
+		rss_i = min_t(u16, rss_i, 2);
+	/* 32 pool mode with 4 queues per pool */
+	} else {
+		vmdq_m = IXGBE_82599_VMDQ_4Q_MASK;
+		rss_m = IXGBE_RSS_4Q_MASK;
+		rss_i = 4;
+	}
+
+#ifdef IXGBE_FCOE
+	/* queues in the remaining pools are available for FCoE */
+	fcoe_i = 128 - (vmdq_i * __ALIGN_MASK(1, ~vmdq_m));
+
+#endif
+	/* remove the starting offset from the pool count */
+	vmdq_i -= adapter->ring_feature[RING_F_VMDQ].offset;
+
+	/* save features for later use */
+	adapter->ring_feature[RING_F_VMDQ].indices = vmdq_i;
+	adapter->ring_feature[RING_F_VMDQ].mask = vmdq_m;
+
+	/* limit RSS based on user input and save for later use */
+	adapter->ring_feature[RING_F_RSS].indices = rss_i;
+	adapter->ring_feature[RING_F_RSS].mask = rss_m;
+
+	adapter->num_rx_pools = vmdq_i;
+	adapter->num_rx_queues_per_pool = rss_i;
+
+	adapter->num_rx_queues = vmdq_i * rss_i;
+	adapter->num_tx_queues = vmdq_i * rss_i;
+
+	/* disable ATR as it is not supported when VMDq is enabled */
+	adapter->flags &= ~IXGBE_FLAG_FDIR_HASH_CAPABLE;
+
+#ifdef IXGBE_FCOE
+	/*
+	 * FCoE can use rings from adjacent buffers to allow RSS
+	 * like behavior.  To account for this we need to add the
+	 * FCoE indices to the total ring count.
+	 */
+	if (adapter->flags & IXGBE_FLAG_FCOE_ENABLED) {
+		struct ixgbe_ring_feature *fcoe;
+
+		fcoe = &adapter->ring_feature[RING_F_FCOE];
+
+		/* limit ourselves based on feature limits */
+		fcoe_i = min_t(u16, fcoe_i, fcoe->limit);
+
+		if (vmdq_i > 1 && fcoe_i) {
+			/* reserve no more than number of CPUs */
+			fcoe_i = min_t(u16, fcoe_i, num_online_cpus());
+
+			/* alloc queues for FCoE separately */
+			fcoe->indices = fcoe_i;
+			fcoe->offset = vmdq_i * rss_i;
+		} else {
+			/* merge FCoE queues with RSS queues */
+			fcoe_i = min_t(u16, fcoe_i + rss_i, num_online_cpus());
+
+			/* limit indices to rss_i if MSI-X is disabled */
+			if (!(adapter->flags & IXGBE_FLAG_MSIX_ENABLED))
+				fcoe_i = rss_i;
+
+			/* attempt to reserve some queues for just FCoE */
+			fcoe->indices = min_t(u16, fcoe_i, fcoe->limit);
+			fcoe->offset = fcoe_i - fcoe->indices;
+
+			fcoe_i -= rss_i;
+		}
+
+		/* add queues to adapter */
+		adapter->num_tx_queues += fcoe_i;
+		adapter->num_rx_queues += fcoe_i;
+	}
+
+#endif
+	return true;
+}
+
+/**
  * ixgbe_set_rss_queues - Allocate queues for RSS
  * @adapter: board private structure to initialize
  *
@@ -353,14 +681,17 @@ static void ixgbe_set_num_queues(struct ixgbe_adapter *adapter)
 	adapter->num_rx_pools = adapter->num_rx_queues;
 	adapter->num_rx_queues_per_pool = 1;
 
-	if (ixgbe_set_sriov_queues(adapter))
+#ifdef CONFIG_IXGBE_DCB
+	if (ixgbe_set_dcb_sriov_queues(adapter))
 		return;
 
-#ifdef CONFIG_IXGBE_DCB
 	if (ixgbe_set_dcb_queues(adapter))
 		return;
 
 #endif
+	if (ixgbe_set_sriov_queues(adapter))
+		return;
+
 	ixgbe_set_rss_queues(adapter);
 }
 
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index ea94fa2..454e556 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -3161,9 +3161,18 @@ static void ixgbe_configure_virtualization(struct ixgbe_adapter *adapter)
 	 * Set up VF register offsets for selected VT Mode,
 	 * i.e. 32 or 64 VFs for SR-IOV
 	 */
-	gcr_ext = IXGBE_READ_REG(hw, IXGBE_GCR_EXT);
-	gcr_ext |= IXGBE_GCR_EXT_MSIX_EN;
-	gcr_ext |= IXGBE_GCR_EXT_VT_MODE_64;
+	switch (adapter->ring_feature[RING_F_VMDQ].mask) {
+	case IXGBE_82599_VMDQ_8Q_MASK:
+		gcr_ext = IXGBE_GCR_EXT_VT_MODE_16;
+		break;
+	case IXGBE_82599_VMDQ_4Q_MASK:
+		gcr_ext = IXGBE_GCR_EXT_VT_MODE_32;
+		break;
+	default:
+		gcr_ext = IXGBE_GCR_EXT_VT_MODE_64;
+		break;
+	}
+
 	IXGBE_WRITE_REG(hw, IXGBE_GCR_EXT, gcr_ext);
 
 	/* enable Tx loopback for VF/PF communication */
@@ -3947,7 +3956,18 @@ static void ixgbe_setup_gpie(struct ixgbe_adapter *adapter)
 
 	if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED) {
 		gpie &= ~IXGBE_GPIE_VTMODE_MASK;
-		gpie |= IXGBE_GPIE_VTMODE_64;
+
+		switch (adapter->ring_feature[RING_F_VMDQ].mask) {
+		case IXGBE_82599_VMDQ_8Q_MASK:
+			gpie |= IXGBE_GPIE_VTMODE_16;
+			break;
+		case IXGBE_82599_VMDQ_4Q_MASK:
+			gpie |= IXGBE_GPIE_VTMODE_32;
+			break;
+		default:
+			gpie |= IXGBE_GPIE_VTMODE_64;
+			break;
+		}
 	}
 
 	/* Enable Thermal over heat sensor interrupt */
@@ -6674,11 +6694,6 @@ int ixgbe_setup_tc(struct net_device *dev, u8 tc)
 		return -EINVAL;
 	}
 
-	if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED) {
-		e_err(drv, "Enable failed, SR-IOV enabled\n");
-		return -EINVAL;
-	}
-
 	/* Hardware supports up to 8 traffic classes */
 	if (tc > adapter->dcb_cfg.num_tcs.pg_tcs ||
 	    (hw->mac.type == ixgbe_mac_82598EB &&
@@ -7225,10 +7240,6 @@ static int __devinit ixgbe_probe(struct pci_dev *pdev,
 	netdev->priv_flags |= IFF_UNICAST_FLT;
 	netdev->priv_flags |= IFF_SUPP_NOFCS;
 
-	if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED)
-		adapter->flags &= ~(IXGBE_FLAG_RSS_ENABLED |
-				    IXGBE_FLAG_DCB_ENABLED);
-
 #ifdef CONFIG_IXGBE_DCB
 	netdev->dcbnl_ops = &dcbnl_ops;
 #endif
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
index eb3f67c..d285443 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -107,15 +107,21 @@ void ixgbe_enable_sriov(struct ixgbe_adapter *adapter,
 			 "VF drivers to avoid spoofed packet errors\n");
 	} else {
 		err = pci_enable_sriov(adapter->pdev, adapter->num_vfs);
+		if (err) {
+			e_err(probe, "Failed to enable PCI sriov: %d\n", err);
+			goto err_novfs;
+		}
 	}
-	if (err) {
-		e_err(probe, "Failed to enable PCI sriov: %d\n", err);
-		goto err_novfs;
-	}
-	adapter->flags |= IXGBE_FLAG_SRIOV_ENABLED;
 
+	adapter->flags |= IXGBE_FLAG_SRIOV_ENABLED;
 	e_info(probe, "SR-IOV enabled with %d VFs\n", adapter->num_vfs);
 
+	/* Enable VMDq flag so device will be set in VM mode */
+	adapter->flags |= IXGBE_FLAG_VMDQ_ENABLED;
+	if (!adapter->ring_feature[RING_F_VMDQ].limit)
+		adapter->ring_feature[RING_F_VMDQ].limit = 1;
+	adapter->ring_feature[RING_F_VMDQ].offset = adapter->num_vfs;
+
 	num_vf_macvlans = hw->mac.num_rar_entries -
 	(IXGBE_MAX_PF_MACVLANS + 1 + adapter->num_vfs);
 
@@ -146,12 +152,39 @@ void ixgbe_enable_sriov(struct ixgbe_adapter *adapter,
 		 * and memory allocated set up the mailbox parameters
 		 */
 		ixgbe_init_mbx_params_pf(hw);
-		memcpy(&hw->mbx.ops, ii->mbx_ops,
-		       sizeof(hw->mbx.ops));
+		memcpy(&hw->mbx.ops, ii->mbx_ops, sizeof(hw->mbx.ops));
+
+		/* limit traffic classes based on VFs enabled */
+		if ((adapter->hw.mac.type == ixgbe_mac_82599EB) &&
+		    (adapter->num_vfs < 16)) {
+			adapter->dcb_cfg.num_tcs.pg_tcs = MAX_TRAFFIC_CLASS;
+			adapter->dcb_cfg.num_tcs.pfc_tcs = MAX_TRAFFIC_CLASS;
+		} else if (adapter->num_vfs < 32) {
+			adapter->dcb_cfg.num_tcs.pg_tcs = 4;
+			adapter->dcb_cfg.num_tcs.pfc_tcs = 4;
+		} else {
+			adapter->dcb_cfg.num_tcs.pg_tcs = 1;
+			adapter->dcb_cfg.num_tcs.pfc_tcs = 1;
+		}
+
+		/* We do not support RSS w/ SR-IOV */
+		adapter->ring_feature[RING_F_RSS].limit = 1;
 
 		/* Disable RSC when in SR-IOV mode */
 		adapter->flags2 &= ~(IXGBE_FLAG2_RSC_CAPABLE |
 				     IXGBE_FLAG2_RSC_ENABLED);
+
+#ifdef IXGBE_FCOE
+		/*
+		 * When SR-IOV is enabled 82599 cannot support jumbo frames
+		 * so we must disable FCoE because we cannot support FCoE MTU.
+		 */
+		if (adapter->hw.mac.type == ixgbe_mac_82599EB)
+			adapter->flags &= ~(IXGBE_FLAG_FCOE_ENABLED |
+					    IXGBE_FLAG_FCOE_CAPABLE);
+#endif
+
+		/* enable spoof checking for all VFs */
 		for (i = 0; i < adapter->num_vfs; i++)
 			adapter->vfinfo[i].spoofchk_enabled = true;
 		return;
@@ -171,7 +204,6 @@ err_novfs:
 void ixgbe_disable_sriov(struct ixgbe_adapter *adapter)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
-	u32 gcr;
 	u32 gpie;
 	u32 vmdctl;
 	int i;
@@ -182,9 +214,7 @@ void ixgbe_disable_sriov(struct ixgbe_adapter *adapter)
 #endif
 
 	/* turn off device IOV mode */
-	gcr = IXGBE_READ_REG(hw, IXGBE_GCR_EXT);
-	gcr &= ~(IXGBE_GCR_EXT_SRIOV);
-	IXGBE_WRITE_REG(hw, IXGBE_GCR_EXT, gcr);
+	IXGBE_WRITE_REG(hw, IXGBE_GCR_EXT, 0);
 	gpie = IXGBE_READ_REG(hw, IXGBE_GPIE);
 	gpie &= ~IXGBE_GPIE_VTMODE_MASK;
 	IXGBE_WRITE_REG(hw, IXGBE_GPIE, gpie);
-- 
1.7.10.4

^ permalink raw reply related
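
The traffic class limit set in the ixgbe_enable_sriov() hunk of the patch above can be read as a small pure function of the MAC type and the VF count. The sketch below is only an illustration of that branch order, not driver code: the helper name max_tcs_for_vfs() and the is_82599 parameter are hypothetical, and MAX_TRAFFIC_CLASS is assumed to be 8, based on the "Hardware supports up to 8 traffic classes" comment in the same patch.

```c
#include <stdbool.h>

/* Assumed value, taken from the "up to 8 traffic classes" comment. */
#define MAX_TRAFFIC_CLASS 8

/*
 * Hypothetical helper mirroring the branch order in the hunk:
 *   82599 with fewer than 16 VFs -> MAX_TRAFFIC_CLASS
 *   fewer than 32 VFs            -> 4
 *   otherwise                    -> 1
 */
unsigned int max_tcs_for_vfs(bool is_82599, unsigned int num_vfs)
{
	if (is_82599 && num_vfs < 16)
		return MAX_TRAFFIC_CLASS;
	if (num_vfs < 32)
		return 4;
	return 1;
}
```

Note that, as written in the hunk, the first test applies only to the 82599, so any other MAC type with fewer than 32 VFs falls through to the 4 traffic class case.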

* [net-next 9/9] ixgbe: Cleanup holes in flags after removing several of them
From: Jeff Kirsher @ 2012-07-18 20:31 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1342643516-2696-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This change is just meant to defragment the flags, as several holes
have been introduced since several features, or the flags for them,
have been removed.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h |   50 +++++++++++++++---------------
 1 file changed, 25 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 4ca10e6..f7f6fe2 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -433,33 +433,33 @@ struct ixgbe_adapter {
 	 * thus the additional *_CAPABLE flags.
 	 */
 	u32 flags;
-#define IXGBE_FLAG_MSI_CAPABLE                  (u32)(1 << 1)
-#define IXGBE_FLAG_MSI_ENABLED                  (u32)(1 << 2)
-#define IXGBE_FLAG_MSIX_CAPABLE                 (u32)(1 << 3)
-#define IXGBE_FLAG_MSIX_ENABLED                 (u32)(1 << 4)
-#define IXGBE_FLAG_RX_1BUF_CAPABLE              (u32)(1 << 6)
-#define IXGBE_FLAG_RX_PS_CAPABLE                (u32)(1 << 7)
-#define IXGBE_FLAG_RX_PS_ENABLED                (u32)(1 << 8)
-#define IXGBE_FLAG_IN_NETPOLL                   (u32)(1 << 9)
-#define IXGBE_FLAG_DCA_ENABLED                  (u32)(1 << 10)
-#define IXGBE_FLAG_DCA_CAPABLE                  (u32)(1 << 11)
-#define IXGBE_FLAG_IMIR_ENABLED                 (u32)(1 << 12)
-#define IXGBE_FLAG_MQ_CAPABLE                   (u32)(1 << 13)
-#define IXGBE_FLAG_DCB_ENABLED                  (u32)(1 << 14)
-#define IXGBE_FLAG_VMDQ_CAPABLE                 (u32)(1 << 18)
-#define IXGBE_FLAG_VMDQ_ENABLED                 (u32)(1 << 19)
-#define IXGBE_FLAG_FAN_FAIL_CAPABLE             (u32)(1 << 20)
-#define IXGBE_FLAG_NEED_LINK_UPDATE             (u32)(1 << 22)
-#define IXGBE_FLAG_NEED_LINK_CONFIG             (u32)(1 << 23)
-#define IXGBE_FLAG_FDIR_HASH_CAPABLE            (u32)(1 << 24)
-#define IXGBE_FLAG_FDIR_PERFECT_CAPABLE         (u32)(1 << 25)
-#define IXGBE_FLAG_FCOE_CAPABLE                 (u32)(1 << 26)
-#define IXGBE_FLAG_FCOE_ENABLED                 (u32)(1 << 27)
-#define IXGBE_FLAG_SRIOV_CAPABLE                (u32)(1 << 28)
-#define IXGBE_FLAG_SRIOV_ENABLED                (u32)(1 << 29)
+#define IXGBE_FLAG_MSI_CAPABLE                  (u32)(1 << 0)
+#define IXGBE_FLAG_MSI_ENABLED                  (u32)(1 << 1)
+#define IXGBE_FLAG_MSIX_CAPABLE                 (u32)(1 << 2)
+#define IXGBE_FLAG_MSIX_ENABLED                 (u32)(1 << 3)
+#define IXGBE_FLAG_RX_1BUF_CAPABLE              (u32)(1 << 4)
+#define IXGBE_FLAG_RX_PS_CAPABLE                (u32)(1 << 5)
+#define IXGBE_FLAG_RX_PS_ENABLED                (u32)(1 << 6)
+#define IXGBE_FLAG_IN_NETPOLL                   (u32)(1 << 7)
+#define IXGBE_FLAG_DCA_ENABLED                  (u32)(1 << 8)
+#define IXGBE_FLAG_DCA_CAPABLE                  (u32)(1 << 9)
+#define IXGBE_FLAG_IMIR_ENABLED                 (u32)(1 << 10)
+#define IXGBE_FLAG_MQ_CAPABLE                   (u32)(1 << 11)
+#define IXGBE_FLAG_DCB_ENABLED                  (u32)(1 << 12)
+#define IXGBE_FLAG_VMDQ_CAPABLE                 (u32)(1 << 13)
+#define IXGBE_FLAG_VMDQ_ENABLED                 (u32)(1 << 14)
+#define IXGBE_FLAG_FAN_FAIL_CAPABLE             (u32)(1 << 15)
+#define IXGBE_FLAG_NEED_LINK_UPDATE             (u32)(1 << 16)
+#define IXGBE_FLAG_NEED_LINK_CONFIG             (u32)(1 << 17)
+#define IXGBE_FLAG_FDIR_HASH_CAPABLE            (u32)(1 << 18)
+#define IXGBE_FLAG_FDIR_PERFECT_CAPABLE         (u32)(1 << 19)
+#define IXGBE_FLAG_FCOE_CAPABLE                 (u32)(1 << 20)
+#define IXGBE_FLAG_FCOE_ENABLED                 (u32)(1 << 21)
+#define IXGBE_FLAG_SRIOV_CAPABLE                (u32)(1 << 22)
+#define IXGBE_FLAG_SRIOV_ENABLED                (u32)(1 << 23)
 
 	u32 flags2;
-#define IXGBE_FLAG2_RSC_CAPABLE                 (u32)(1)
+#define IXGBE_FLAG2_RSC_CAPABLE                 (u32)(1 << 0)
 #define IXGBE_FLAG2_RSC_ENABLED                 (u32)(1 << 1)
 #define IXGBE_FLAG2_TEMP_SENSOR_CAPABLE         (u32)(1 << 2)
 #define IXGBE_FLAG2_TEMP_SENSOR_EVENT           (u32)(1 << 3)
-- 
1.7.10.4

^ permalink raw reply related
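
One way to check that a renumbering like the one above really leaves no holes is to verify that the flag values are distinct single bits covering bit 0 up to bit count-1. The sketch below is a standalone illustration, not part of the driver; the function name flags_are_packed() is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical check: returns 1 if every entry is a distinct single-bit
 * value and together they occupy exactly bits 0..count-1, else 0.
 */
int flags_are_packed(const uint32_t *flags, size_t count)
{
	uint32_t seen = 0;
	uint32_t want;
	size_t i;

	if (count == 0 || count > 32)
		return 0;

	for (i = 0; i < count; i++) {
		uint32_t f = flags[i];

		/* reject zero and values with more than one bit set */
		if (f == 0 || (f & (f - 1)) != 0)
			return 0;
		/* reject a bit that has already been used */
		if (seen & f)
			return 0;
		seen |= f;
	}

	/* a packed set of count single bits equals 2^count - 1 */
	want = (count == 32) ? UINT32_MAX : ((UINT32_C(1) << count) - 1);
	return seen == want;
}
```

Applied to the two layouts in the diff, the old list (which starts at bit 1 and skips bits such as 5 and 15 through 17) would fail this check, while the new list of 24 flags on bits 0 through 23 would pass.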

* Re: [PATCH] net: Statically initialize init_net.dev_base_head
From: David Miller @ 2012-07-18 20:32 UTC (permalink / raw)
  To: john.r.fastabend; +Cc: nhorman, mark.d.rustad, netdev, gaofeng, eric.dumazet
In-Reply-To: <50071D11.7080207@intel.com>

From: John Fastabend <john.r.fastabend@intel.com>
Date: Wed, 18 Jul 2012 13:31:13 -0700

> On 7/18/2012 1:21 PM, Neil Horman wrote:
>> On Wed, Jul 18, 2012 at 01:20:10PM -0700, David Miller wrote:
>>> From: Neil Horman <nhorman@tuxdriver.com>
>>> Date: Wed, 18 Jul 2012 16:11:49 -0400
>>>
>>>> On Wed, Jul 18, 2012 at 12:06:07PM -0700, Mark Rustad wrote:
>>>>> This change eliminates an initialization-order hazard most
>>>>> recently seen when netprio_cgroup is built into the kernel.
>>>>>
>>>>> With thanks to Eric Dumazet for catching a bug.
>>>>>
>>>>> Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
>>>   ...
>>>> I think Dave was going to take John Fastabend's patch from earlier
>>>> today, but
>>>> this works just as well.  Long term I'm going to look into delaying
>>>> initialization for cgroups, as it creates a strange initialization
>>>> state when you
>>>> have a module_init routine registered.
>>>
>>> Neil, any particular preference between John's and Mark's version
>>> of the fix?
>>>
>> I think they're both perfectly good.  If I had to choose I'd say
>> Mark's, just
>> because it's done by initializing data, rather than adding more code to
>> run every
>> time we create a cgroup.
>>
>> Neil
>>
> 
> Fine by me if we take this version instead.

I think that's what I'll do, sorry for all the trouble John :)

^ permalink raw reply
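
The preference voiced in this thread for "initializing data" over running extra code can be illustrated with a userspace analogue of a statically initialized list head. This is a sketch under assumptions, not the patch itself: struct list_node, LIST_NODE_INIT, struct registry and default_registry are hypothetical stand-ins for the kernel's list_head, LIST_HEAD_INIT and init_net.dev_base_head.

```c
/* Minimal doubly linked circular list node (userspace stand-in). */
struct list_node {
	struct list_node *next;
	struct list_node *prev;
};

/* An empty list head points at itself in both directions. */
#define LIST_NODE_INIT(name) { &(name), &(name) }

struct registry {
	struct list_node devices;
};

/*
 * The head is valid from program start, so code that walks it before
 * any runtime init function has run simply sees an empty list.
 */
struct registry default_registry = {
	.devices = LIST_NODE_INIT(default_registry.devices),
};

int list_is_empty(const struct list_node *head)
{
	return head->next == head;
}

/* Append node just before head, i.e. at the tail of the list. */
void list_add_tail_node(struct list_node *node, struct list_node *head)
{
	node->next = head;
	node->prev = head->prev;
	head->prev->next = node;
	head->prev = node;
}
```

Because the head is set up by the initializer rather than by a function call, there is no window in which an early caller can observe NULL pointers, which is the kind of initialization-order hazard the patch description mentions.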
