Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next 1/3] ipip: fix sparse warnings in ipip_netlink_parms()
From: Eric Dumazet @ 2012-11-15 13:20 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: netdev, davem
In-Reply-To: <50A4B8B0.1090409@6wind.com>

On Thu, 2012-11-15 at 10:41 +0100, Nicolas Dichtel wrote:
> Le 15/11/2012 09:53, Nicolas Dichtel a écrit :
> > Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> I forget "Reported-by: Fengguang Wu <fengguang.wu@intel.com>"
> 
> Should I resend the serie?

Your patches generally lack changelogs and proper attribution.

Having a good changelog can avoid to actually look at the patch content,
its incredibly useful in the long term.

^ permalink raw reply

* Re: [RFC] tcp: use order-3 pages in tcp_sendmsg()
From: Eric Dumazet @ 2012-11-15 13:47 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: netdev
In-Reply-To: <CAAM7YAm+XJKTxJStLMoo4RRV85oogN5wCHfXorkEiUHKqNKHDQ@mail.gmail.com>

On Thu, 2012-11-15 at 15:52 +0800, Yan, Zheng wrote:
> LLC misses happen on the receiver size. It means most pages allocated by the
> senders are cache hot. But when using order-3 pages, 2048 * 32k = 64M, 64M
> is much larger than LLC size.

By the way, this 2048*32k is wrong, as the receiver only uses fragments
in pages, and its not related to the "tcp: use order-3 pages in
tcp_sendmsg()" commit

Many drivers use order-0 pages to hold ethernet frames, regardless of
what was used by the sender ;)

^ permalink raw reply

* [PATCHv2] sctp: fix /proc/net/sctp/ memory leak
From: Tommi Rantala @ 2012-11-15 13:49 UTC (permalink / raw)
  To: netdev
  Cc: Neil Horman, Vlad Yasevich, Sridhar Samudrala, David S. Miller,
	linux-sctp, Dave Jones, Tommi Rantala, Eric W. Biederman

Commit 13d782f ("sctp: Make the proc files per network namespace.")
changed the /proc/net/sctp/ struct file_operations opener functions to
use single_open_net() and seq_open_net().

Avoid leaking memory by using single_release_net() and seq_release_net()
as the release functions.

Discovered with Trinity (the syscall fuzzer).

Signed-off-by: Tommi Rantala <tt.rantala@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
---
 net/sctp/proc.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/sctp/proc.c b/net/sctp/proc.c
index c3bea26..9966e7b 100644
--- a/net/sctp/proc.c
+++ b/net/sctp/proc.c
@@ -102,7 +102,7 @@ static const struct file_operations sctp_snmp_seq_fops = {
 	.open	 = sctp_snmp_seq_open,
 	.read	 = seq_read,
 	.llseek	 = seq_lseek,
-	.release = single_release,
+	.release = single_release_net,
 };
 
 /* Set up the proc fs entry for 'snmp' object. */
@@ -251,7 +251,7 @@ static const struct file_operations sctp_eps_seq_fops = {
 	.open	 = sctp_eps_seq_open,
 	.read	 = seq_read,
 	.llseek	 = seq_lseek,
-	.release = seq_release,
+	.release = seq_release_net,
 };
 
 /* Set up the proc fs entry for 'eps' object. */
@@ -372,7 +372,7 @@ static const struct file_operations sctp_assocs_seq_fops = {
 	.open	 = sctp_assocs_seq_open,
 	.read	 = seq_read,
 	.llseek	 = seq_lseek,
-	.release = seq_release,
+	.release = seq_release_net,
 };
 
 /* Set up the proc fs entry for 'assocs' object. */
@@ -517,7 +517,7 @@ static const struct file_operations sctp_remaddr_seq_fops = {
 	.open = sctp_remaddr_seq_open,
 	.read = seq_read,
 	.llseek = seq_lseek,
-	.release = seq_release,
+	.release = seq_release_net,
 };
 
 int __net_init sctp_remaddr_proc_init(struct net *net)
-- 
1.7.9.5

^ permalink raw reply related

* [PATCHv2 iproute2] add DOVE extensions for iproute2
From: David L Stevens @ 2012-11-15 13:49 UTC (permalink / raw)
  To: David Miller, Stephen Hemminger; +Cc: netdev


	This patch adds a new flag to iproute2 for vxlan devices to enable
DOVE features. It also adds support for L2 and L3 switch lookup miss
netlink messages to "ip monitor".

Changes since v1:
	- split "dove" flag into separate feature flags:
		- "proxy" for ARP reduction
		- "rsc" for route short circuiting
		- "l2miss" for L2 switch miss notifications
		- "l3miss" for L3 switch miss notifications

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 012d95a..a163702 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -283,6 +283,10 @@ enum {
 	IFLA_VXLAN_AGEING,
 	IFLA_VXLAN_LIMIT,
 	IFLA_VXLAN_PORT_RANGE,
+	IFLA_VXLAN_PROXY,
+	IFLA_VXLAN_RSC,
+	IFLA_VXLAN_L2MISS,
+	IFLA_VXLAN_L3MISS,
 	__IFLA_VXLAN_MAX
 };
 #define IFLA_VXLAN_MAX	(__IFLA_VXLAN_MAX - 1)
diff --git a/ip/iplink_vxlan.c b/ip/iplink_vxlan.c
index ba5c4ab..f2e6bef 100644
--- a/ip/iplink_vxlan.c
+++ b/ip/iplink_vxlan.c
@@ -26,6 +26,8 @@ static void explain(void)
 	fprintf(stderr, "Usage: ... vxlan id VNI [ group ADDR ] [ local ADDR ]\n");
 	fprintf(stderr, "                 [ ttl TTL ] [ tos TOS ] [ dev PHYS_DEV ]\n");
 	fprintf(stderr, "                 [ port MIN MAX ] [ [no]learning ]\n");
+	fprintf(stderr, "                 [ [no]proxy ] [ [no]rsc ]\n");
+	fprintf(stderr, "                 [ [no]l2miss ] [ [no]l3miss ]\n");
 	fprintf(stderr, "\n");
 	fprintf(stderr, "Where: VNI := 0-16777215\n");
 	fprintf(stderr, "       ADDR := { IP_ADDRESS | any }\n");
@@ -44,6 +46,10 @@ static int vxlan_parse_opt(struct link_util *lu, int argc, char **argv,
 	__u8 tos = 0;
 	__u8 ttl = 0;
 	__u8 learning = 1;
+	__u8 proxy = 0;
+	__u8 rsc = 0;
+	__u8 l2miss = 0;
+	__u8 l3miss = 0;
 	__u8 noage = 0;
 	__u32 age = 0;
 	__u32 maxaddr = 0;
@@ -123,6 +129,22 @@ static int vxlan_parse_opt(struct link_util *lu, int argc, char **argv,
 			learning = 0;
 		} else if (!matches(*argv, "learning")) {
 			learning = 1;
+		} else if (!matches(*argv, "noproxy")) {
+			proxy = 0;
+		} else if (!matches(*argv, "proxy")) {
+			proxy = 1;
+		} else if (!matches(*argv, "norsc")) {
+			rsc = 0;
+		} else if (!matches(*argv, "rsc")) {
+			rsc = 1;
+		} else if (!matches(*argv, "nol2miss")) {
+			l2miss = 0;
+		} else if (!matches(*argv, "l2miss")) {
+			l2miss = 1;
+		} else if (!matches(*argv, "nol3miss")) {
+			l3miss = 0;
+		} else if (!matches(*argv, "l3miss")) {
+			l3miss = 1;
 		} else if (matches(*argv, "help") == 0) {
 			explain();
 			return -1;
@@ -148,6 +170,10 @@ static int vxlan_parse_opt(struct link_util *lu, int argc, char **argv,
 	addattr8(n, 1024, IFLA_VXLAN_TTL, ttl);
 	addattr8(n, 1024, IFLA_VXLAN_TOS, tos);
 	addattr8(n, 1024, IFLA_VXLAN_LEARNING, learning);
+	addattr8(n, 1024, IFLA_VXLAN_PROXY, proxy);
+	addattr8(n, 1024, IFLA_VXLAN_RSC, rsc);
+	addattr8(n, 1024, IFLA_VXLAN_L2MISS, l2miss);
+	addattr8(n, 1024, IFLA_VXLAN_L3MISS, l3miss);
 	if (noage)
 		addattr32(n, 1024, IFLA_VXLAN_AGEING, 0);
 	else if (age)
@@ -213,6 +239,18 @@ static void vxlan_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
 	if (tb[IFLA_VXLAN_LEARNING] &&
 	    !rta_getattr_u8(tb[IFLA_VXLAN_LEARNING]))
 		fputs("nolearning ", f);
+ 
+	if (tb[IFLA_VXLAN_PROXY] && rta_getattr_u8(tb[IFLA_VXLAN_PROXY]))
+		fputs("proxy ", f);
+ 
+	if (tb[IFLA_VXLAN_RSC] && rta_getattr_u8(tb[IFLA_VXLAN_RSC]))
+		fputs("rsc ", f);
+
+	if (tb[IFLA_VXLAN_L2MISS] && rta_getattr_u8(tb[IFLA_VXLAN_L2MISS]))
+		fputs("l2miss ", f);
+
+	if (tb[IFLA_VXLAN_L3MISS] && rta_getattr_u8(tb[IFLA_VXLAN_L3MISS]))
+		fputs("l3miss ", f);
 	
 	if (tb[IFLA_VXLAN_TOS] &&
 	    (tos = rta_getattr_u8(tb[IFLA_VXLAN_TOS]))) {
diff --git a/ip/ipmonitor.c b/ip/ipmonitor.c
index 4b1d469..7a7cc88 100644
--- a/ip/ipmonitor.c
+++ b/ip/ipmonitor.c
@@ -67,7 +67,8 @@ int accept_msg(const struct sockaddr_nl *who,
 		print_addrlabel(who, n, arg);
 		return 0;
 	}
-	if (n->nlmsg_type == RTM_NEWNEIGH || n->nlmsg_type == RTM_DELNEIGH) {
+	if (n->nlmsg_type == RTM_NEWNEIGH || n->nlmsg_type == RTM_DELNEIGH ||
+	    n->nlmsg_type == RTM_GETNEIGH) {
 		if (prefix_banner)
 			fprintf(fp, "[NEIGH]");
 		print_neigh(who, n, arg);
diff --git a/ip/ipneigh.c b/ip/ipneigh.c
index 56e56b2..1b7600b 100644
--- a/ip/ipneigh.c
+++ b/ip/ipneigh.c
@@ -189,7 +189,8 @@ int print_neigh(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 	struct rtattr * tb[NDA_MAX+1];
 	char abuf[256];
 
-	if (n->nlmsg_type != RTM_NEWNEIGH && n->nlmsg_type != RTM_DELNEIGH) {
+	if (n->nlmsg_type != RTM_NEWNEIGH && n->nlmsg_type != RTM_DELNEIGH &&
+	    n->nlmsg_type != RTM_GETNEIGH) {
 		fprintf(stderr, "Not RTM_NEWNEIGH: %08x %08x %08x\n",
 			n->nlmsg_len, n->nlmsg_type, n->nlmsg_flags);
 
@@ -251,6 +252,8 @@ int print_neigh(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 
 	if (n->nlmsg_type == RTM_DELNEIGH)
 		fprintf(fp, "delete ");
+	else if (n->nlmsg_type == RTM_GETNEIGH)
+		fprintf(fp, "miss ");
 	if (tb[NDA_DST]) {
 		fprintf(fp, "%s ",
 			format_host(r->ndm_family,

^ permalink raw reply related

* [PATCHv2 net-next] add DOVE extensions for VXLAN
From: David L Stevens @ 2012-11-15 13:46 UTC (permalink / raw)
  To: David Miller, Stephen Hemminger; +Cc: netdev


	This patch provides extensions to VXLAN for supporting Distributed
Overlay Virtual Ethernet (DOVE) networks. The patch includes:

	+ a dove flag per VXLAN device to enable DOVE extensions
	+ ARP reduction, whereby a bridge-connected VXLAN tunnel endpoint
		answers ARP requests from the local bridge on behalf of
		remote DOVE clients
	+ route short-circuiting (aka L3 switching). Known destination IP
		addresses use the corresponding destination MAC address for
		switching rather than going to a (possibly remote) router first.
	+ netlink notification messages for forwarding table and L3 switching
		misses

Changes since v1:
	- split "dove" flag into separate feature flags:
		- "proxy" for ARP reduction
		- "rsc" for route short circuiting
		- "l2miss" for L2 switch miss notifications
		- "l3miss" for L3 switch miss notifications

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 8aca888..6503091 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -29,6 +29,8 @@
 #include <linux/etherdevice.h>
 #include <linux/if_ether.h>
 #include <linux/hash.h>
+#include <net/arp.h>
+#include <net/ndisc.h>
 #include <net/ip.h>
 #include <net/icmp.h>
 #include <net/udp.h>
@@ -111,6 +113,10 @@ struct vxlan_dev {
 	__u8		  tos;		/* TOS override */
 	__u8		  ttl;
 	bool		  learn;
+	bool		  proxy;	/* ARP reduction */
+	bool		  rsc;		/* route short-circuit */
+	bool		  l2miss;	/* L2 miss notifications */
+	bool		  l3miss;	/* L3 miss notifications */
 
 	unsigned long	  age_interval;
 	struct timer_list age_timer;
@@ -155,6 +161,7 @@ static int vxlan_fdb_info(struct sk_buff *skb, struct vxlan_dev *vxlan,
 	struct nda_cacheinfo ci;
 	struct nlmsghdr *nlh;
 	struct ndmsg *ndm;
+	bool send_ip, send_eth;
 
 	nlh = nlmsg_put(skb, portid, seq, type, sizeof(*ndm), flags);
 	if (nlh == NULL)
@@ -162,16 +169,28 @@ static int vxlan_fdb_info(struct sk_buff *skb, struct vxlan_dev *vxlan,
 
 	ndm = nlmsg_data(nlh);
 	memset(ndm, 0, sizeof(*ndm));
-	ndm->ndm_family	= AF_BRIDGE;
+
+	send_eth = send_ip = true;
+
+	if (type == RTM_GETNEIGH) {
+		int i;
+
+		ndm->ndm_family	= AF_INET;
+		send_ip = fdb->remote_ip != 0;
+		send_eth = 0;
+		for (i=0; i<ETH_ALEN; ++i)
+			send_eth |= !!fdb->eth_addr[i];
+	} else
+		ndm->ndm_family	= AF_BRIDGE;
 	ndm->ndm_state = fdb->state;
 	ndm->ndm_ifindex = vxlan->dev->ifindex;
 	ndm->ndm_flags = NTF_SELF;
 	ndm->ndm_type = NDA_DST;
 
-	if (nla_put(skb, NDA_LLADDR, ETH_ALEN, &fdb->eth_addr))
+	if (send_eth && nla_put(skb, NDA_LLADDR, ETH_ALEN, &fdb->eth_addr))
 		goto nla_put_failure;
 
-	if (nla_put_be32(skb, NDA_DST, fdb->remote_ip))
+	if (send_ip && nla_put_be32(skb, NDA_DST, fdb->remote_ip))
 		goto nla_put_failure;
 
 	ci.ndm_used	 = jiffies_to_clock_t(now - fdb->used);
@@ -223,6 +242,29 @@ errout:
 		rtnl_set_sk_err(net, RTNLGRP_NEIGH, err);
 }
 
+static void vxlan_ip_miss(struct net_device *dev, __be32 ipa)
+{
+	struct vxlan_dev *vxlan = netdev_priv(dev);
+	struct vxlan_fdb f;
+
+	memset(&f, 0, sizeof f);
+	f.state = NUD_STALE;
+	f.remote_ip = ipa; /* goes to NDA_DST */
+
+	vxlan_fdb_notify(vxlan, &f, RTM_GETNEIGH);
+}
+
+static void vxlan_fdb_miss(struct vxlan_dev *vxlan, const u8 eth_addr[ETH_ALEN])
+{
+	struct vxlan_fdb	f;
+
+	memset(&f, 0, sizeof f);
+	f.state = NUD_STALE;
+	memcpy(f.eth_addr, eth_addr, ETH_ALEN);
+
+	vxlan_fdb_notify(vxlan, &f, RTM_GETNEIGH);
+}
+
 /* Hash Ethernet address */
 static u32 eth_hash(const unsigned char *addr)
 {
@@ -552,6 +594,8 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 		goto drop;
 	}
 
+	skb_reset_mac_header(skb);
+
 	/* Re-examine inner Ethernet packet */
 	oip = ip_hdr(skb);
 	skb->protocol = eth_type_trans(skb, vxlan->dev);
@@ -600,6 +644,117 @@ drop:
 	return 0;
 }
 
+static int arp_reduce(struct net_device *dev, struct sk_buff *skb)
+{
+	struct vxlan_dev *vxlan = netdev_priv(dev);
+	struct arphdr *parp;
+	u8 *arpptr, *sha;
+	__be32 sip, tip;
+	struct neighbour *n;
+
+	if (dev->flags & IFF_NOARP)
+		goto out;
+
+	if (!pskb_may_pull(skb, arp_hdr_len(dev))) {
+		dev->stats.tx_dropped++;
+		goto out;
+	}
+	parp = arp_hdr(skb);
+
+	if ((parp->ar_hrd != htons(ARPHRD_ETHER) &&
+	     parp->ar_hrd != htons(ARPHRD_IEEE802)) ||
+	    parp->ar_pro != htons(ETH_P_IP) ||
+	    parp->ar_op != htons(ARPOP_REQUEST) ||
+	    parp->ar_hln != dev->addr_len ||
+	    parp->ar_pln != 4)
+		goto out;
+	arpptr = (u8 *)parp + sizeof(struct arphdr);
+	sha = arpptr;
+	arpptr += dev->addr_len;	/* sha */
+	memcpy(&sip, arpptr, sizeof(sip));
+	arpptr += sizeof(sip);
+	arpptr += dev->addr_len;	/* tha */
+	memcpy(&tip, arpptr, sizeof(tip));
+
+	if (ipv4_is_loopback(tip) ||
+	    ipv4_is_multicast(tip))
+		goto out;
+
+	n = neigh_lookup(&arp_tbl, &tip, dev);
+
+	if (n) {
+		struct vxlan_dev *vxlan = netdev_priv(dev);
+		struct vxlan_fdb *f;
+		struct sk_buff	*reply;
+
+		if (!(n->nud_state & NUD_CONNECTED)) {
+			neigh_release(n);
+			goto out;
+		}
+
+		f = vxlan_find_mac(vxlan, n->ha);
+		if (f && f->remote_ip == 0) {
+			/* bridge-local neighbor */
+			neigh_release(n);
+			goto out;
+		}
+
+		reply = arp_create(ARPOP_REPLY, ETH_P_ARP, sip, dev, tip, sha,
+				n->ha, sha);
+
+		neigh_release(n);
+
+		skb_reset_mac_header(reply);
+		__skb_pull(reply, skb_network_offset(reply));
+		reply->ip_summed = CHECKSUM_UNNECESSARY;
+		reply->pkt_type = PACKET_HOST;
+
+		if (netif_rx_ni(reply) == NET_RX_DROP)
+			dev->stats.rx_dropped++;
+	} else if (vxlan->l3miss)
+		vxlan_ip_miss(dev, tip);
+out:
+	consume_skb(skb);
+	return NETDEV_TX_OK;
+}
+
+static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb)
+{
+	struct vxlan_dev *vxlan = netdev_priv(dev);
+	struct neighbour *n;
+	struct iphdr *pip;
+
+	if (is_multicast_ether_addr(eth_hdr(skb)->h_dest))
+		return false;
+
+	n = NULL;
+	switch (ntohs(eth_hdr(skb)->h_proto)) {
+	case ETH_P_IP:
+		if (!pskb_may_pull(skb, sizeof(struct iphdr)))
+			return false;
+		pip = ip_hdr(skb);
+		n = neigh_lookup(&arp_tbl, &pip->daddr, dev);
+		break;
+	default:
+		return false;
+	}
+
+	if (n) {
+		bool diff;
+
+		diff = compare_ether_addr(eth_hdr(skb)->h_dest, n->ha) != 0;
+		if (diff) {
+			memcpy(eth_hdr(skb)->h_source, eth_hdr(skb)->h_dest,
+				dev->addr_len);
+			memcpy(eth_hdr(skb)->h_dest, n->ha, dev->addr_len);
+		}
+		neigh_release(n);
+		return diff;
+	} else if (vxlan->l3miss)
+		vxlan_ip_miss(dev, pip->daddr);
+	return false;
+}
+
 /* Extract dsfield from inner protocol */
 static inline u8 vxlan_get_dsfield(const struct iphdr *iph,
 				   const struct sk_buff *skb)
@@ -622,22 +777,6 @@ static inline u8 vxlan_ecn_encap(u8 tos,
 	return INET_ECN_encapsulate(tos, inner);
 }
 
-static __be32 vxlan_find_dst(struct vxlan_dev *vxlan, struct sk_buff *skb)
-{
-	const struct ethhdr *eth = (struct ethhdr *) skb->data;
-	const struct vxlan_fdb *f;
-
-	if (is_multicast_ether_addr(eth->h_dest))
-		return vxlan->gaddr;
-
-	f = vxlan_find_mac(vxlan, eth->h_dest);
-	if (f)
-		return f->remote_ip;
-	else
-		return vxlan->gaddr;
-
-}
-
 static void vxlan_sock_free(struct sk_buff *skb)
 {
 	sock_put(skb->sk);
@@ -684,6 +823,7 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
 	struct vxlan_dev *vxlan = netdev_priv(dev);
 	struct rtable *rt;
 	const struct iphdr *old_iph;
+	struct ethhdr *eth;
 	struct iphdr *iph;
 	struct vxlanhdr *vxh;
 	struct udphdr *uh;
@@ -694,10 +834,50 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
 	__be16 df = 0;
 	__u8 tos, ttl;
 	int err;
+	bool rsc = false;
+	const struct vxlan_fdb *f;
+
+	skb_reset_mac_header(skb);
+	eth = eth_hdr(skb);
+
+	if (vxlan->proxy && ntohs(eth->h_proto) == ETH_P_ARP)
+		return arp_reduce(dev, skb);
+	else if (vxlan->rsc && ntohs(eth->h_proto) == ETH_P_IP)
+		rsc = route_shortcircuit(dev, skb);
 
-	dst = vxlan_find_dst(vxlan, skb);
-	if (!dst)
+	f = vxlan_find_mac(vxlan, eth->h_dest);
+	if (f == NULL) {
+		rsc = false;
+		dst = vxlan->gaddr;
+		if (!dst && vxlan->l2miss &&
+		    !is_multicast_ether_addr(eth->h_dest))
+			vxlan_fdb_miss(vxlan, eth->h_dest);
+	} else
+		dst = f->remote_ip;
+
+	if (!dst) {
+		if (rsc) {
+			__skb_pull(skb, skb_network_offset(skb));
+			skb->ip_summed = CHECKSUM_NONE;
+			skb->pkt_type = PACKET_HOST;
+
+			/* short-circuited back to local bridge */
+			if (netif_rx(skb) == NET_RX_SUCCESS) {
+				struct vxlan_stats *stats =
+						this_cpu_ptr(vxlan->stats);
+		
+				u64_stats_update_begin(&stats->syncp);
+				stats->tx_packets++;
+				stats->tx_bytes += pkt_len;
+				u64_stats_update_end(&stats->syncp);
+			} else {
+				dev->stats.tx_errors++;
+				dev->stats.tx_aborted_errors++;
+			}
+			return NETDEV_TX_OK;
+		}
 		goto drop;
+	}
 
 	/* Need space for new headers (invalidates iph ptr) */
 	if (skb_cow_head(skb, VXLAN_HEADROOM))
@@ -1020,6 +1200,10 @@ static const struct nla_policy vxlan_policy[IFLA_VXLAN_MAX + 1] = {
 	[IFLA_VXLAN_AGEING]	= { .type = NLA_U32 },
 	[IFLA_VXLAN_LIMIT]	= { .type = NLA_U32 },
 	[IFLA_VXLAN_PORT_RANGE] = { .len  = sizeof(struct ifla_vxlan_port_range) },
+	[IFLA_VXLAN_PROXY]	= { .type = NLA_U8 },
+	[IFLA_VXLAN_RSC]	= { .type = NLA_U8 },
+	[IFLA_VXLAN_L2MISS]	= { .type = NLA_U8 },
+	[IFLA_VXLAN_L3MISS]	= { .type = NLA_U8 },
 };
 
 static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[])
@@ -1118,6 +1302,18 @@ static int vxlan_newlink(struct net *net, struct net_device *dev,
 	else
 		vxlan->age_interval = FDB_AGE_DEFAULT;
 
+	if (data[IFLA_VXLAN_PROXY] && nla_get_u8(data[IFLA_VXLAN_PROXY]))
+		vxlan->proxy = true;
+
+	if (data[IFLA_VXLAN_RSC] && nla_get_u8(data[IFLA_VXLAN_RSC]))
+		vxlan->rsc = true;
+
+	if (data[IFLA_VXLAN_L2MISS] && nla_get_u8(data[IFLA_VXLAN_L2MISS]))
+		vxlan->l2miss = true;
+
+	if (data[IFLA_VXLAN_L3MISS] && nla_get_u8(data[IFLA_VXLAN_L3MISS]))
+		vxlan->l3miss = true;
+
 	if (data[IFLA_VXLAN_LIMIT])
 		vxlan->addrmax = nla_get_u32(data[IFLA_VXLAN_LIMIT]);
 
@@ -1154,6 +1350,7 @@ static size_t vxlan_get_size(const struct net_device *dev)
 		nla_total_size(sizeof(__u8)) +	/* IFLA_VXLAN_TTL */
 		nla_total_size(sizeof(__u8)) +	/* IFLA_VXLAN_TOS */
 		nla_total_size(sizeof(__u8)) +	/* IFLA_VXLAN_LEARNING */
+		nla_total_size(sizeof(__u32)) +	/* IFLA_VXLAN_DOVE */
 		nla_total_size(sizeof(__u32)) +	/* IFLA_VXLAN_AGEING */
 		nla_total_size(sizeof(__u32)) +	/* IFLA_VXLAN_LIMIT */
 		nla_total_size(sizeof(struct ifla_vxlan_port_range)) +
@@ -1183,6 +1380,10 @@ static int vxlan_fill_info(struct sk_buff *skb, const struct net_device *dev)
 	if (nla_put_u8(skb, IFLA_VXLAN_TTL, vxlan->ttl) ||
 	    nla_put_u8(skb, IFLA_VXLAN_TOS, vxlan->tos) ||
 	    nla_put_u8(skb, IFLA_VXLAN_LEARNING, vxlan->learn) ||
+	    nla_put_u8(skb, IFLA_VXLAN_PROXY, vxlan->proxy) ||
+	    nla_put_u8(skb, IFLA_VXLAN_RSC, vxlan->rsc) ||
+	    nla_put_u8(skb, IFLA_VXLAN_L2MISS, vxlan->l2miss) ||
+	    nla_put_u8(skb, IFLA_VXLAN_L3MISS, vxlan->l3miss) ||
 	    nla_put_u32(skb, IFLA_VXLAN_AGEING, vxlan->age_interval) ||
 	    nla_put_u32(skb, IFLA_VXLAN_LIMIT, vxlan->addrmax))
 		goto nla_put_failure;
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 5c80cb1..89695ff 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -285,6 +285,10 @@ enum {
 	IFLA_VXLAN_AGEING,
 	IFLA_VXLAN_LIMIT,
 	IFLA_VXLAN_PORT_RANGE,
+	IFLA_VXLAN_PROXY,
+	IFLA_VXLAN_RSC,
+	IFLA_VXLAN_L2MISS,
+	IFLA_VXLAN_L3MISS,
 	__IFLA_VXLAN_MAX
 };
 #define IFLA_VXLAN_MAX	(__IFLA_VXLAN_MAX - 1)

^ permalink raw reply related

* [PATCH] tcp: fix retransmission in repair mode
From: Andrey Vagin @ 2012-11-15 14:03 UTC (permalink / raw)
  To: netdev
  Cc: criu, linux-kernel, Andrew Vagin, Pavel Emelyanov,
	David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy

From: Andrew Vagin <avagin@openvz.org>

Currently if a socket was repaired with a few packet in a write queue,
a kernel bug may be triggered:

kernel BUG at net/ipv4/tcp_output.c:2330!
RIP: 0010:[<ffffffff8155784f>] tcp_retransmit_skb+0x5ff/0x610

According to the initial realization v3.4-rc2-963-gc0e88ff,
all skb-s should look like already posted. This patch fixes code
according with this sentence.

Here are three points, which were not done in the initial patch:
1. A tcp send head should not be changed
2. Initialize TSO state of a skb
3. Reset the retransmission time

This patch moves logic from tcp_sendmsg to tcp_write_xmit. A packet
passes the ussual way, but isn't sent to network. This patch solves
all described problems and handles tcp_sendpages.

Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
 net/ipv4/tcp.c        | 4 ++--
 net/ipv4/tcp_output.c | 4 ++++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 197c000..083092e 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1212,7 +1212,7 @@ new_segment:
 wait_for_sndbuf:
 			set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
 wait_for_memory:
-			if (copied && likely(!tp->repair))
+			if (copied)
 				tcp_push(sk, flags & ~MSG_MORE, mss_now, TCP_NAGLE_PUSH);
 
 			if ((err = sk_stream_wait_memory(sk, &timeo)) != 0)
@@ -1223,7 +1223,7 @@ wait_for_memory:
 	}
 
 out:
-	if (copied && likely(!tp->repair))
+	if (copied)
 		tcp_push(sk, flags, mss_now, tp->nonagle);
 	release_sock(sk);
 	return copied + copied_syn;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index cfe6ffe..2798706 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1986,6 +1986,9 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 		tso_segs = tcp_init_tso_segs(sk, skb, mss_now);
 		BUG_ON(!tso_segs);
 
+		if (unlikely(tp->repair) && tp->repair_queue == TCP_SEND_QUEUE)
+			goto repair; /* Skip network transmission */
+
 		cwnd_quota = tcp_cwnd_test(tp, skb);
 		if (!cwnd_quota)
 			break;
@@ -2026,6 +2029,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 		if (unlikely(tcp_transmit_skb(sk, skb, 1, gfp)))
 			break;
 
+repair:
 		/* Advance the send_head.  This one is sent out.
 		 * This call will increment packets_out.
 		 */
-- 
1.7.11.7

^ permalink raw reply related

* [PATCH net-next v2 3/3] ip6tnl: fix sparse warnings in ip6_tnl_netlink_parms()
From: Nicolas Dichtel @ 2012-11-15 14:06 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, davem, fengguang.wu, Nicolas Dichtel
In-Reply-To: <1352988402-16950-1-git-send-email-nicolas.dichtel@6wind.com>

This change fixes a sparse warning triggered by casting the flowinfo from
netlink messages in an u32 instead of be32. This change corrects that in order
to resolve the sparse warning.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
v2: enhance commit log and add "Reported-by" line

 net/ipv6/ip6_tunnel.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index ab4d056..bf3a549 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1568,7 +1568,7 @@ static void ip6_tnl_netlink_parms(struct nlattr *data[],
 		parms->encap_limit = nla_get_u8(data[IFLA_IPTUN_ENCAP_LIMIT]);
 
 	if (data[IFLA_IPTUN_FLOWINFO])
-		parms->flowinfo = nla_get_u32(data[IFLA_IPTUN_FLOWINFO]);
+		parms->flowinfo = nla_get_be32(data[IFLA_IPTUN_FLOWINFO]);
 
 	if (data[IFLA_IPTUN_FLAGS])
 		parms->flags = nla_get_u32(data[IFLA_IPTUN_FLAGS]);
-- 
1.7.12

^ permalink raw reply related

* [PATCH net-next v2 2/3] sit: fix sparse warnings
From: Nicolas Dichtel @ 2012-11-15 14:06 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, davem, fengguang.wu, Nicolas Dichtel
In-Reply-To: <1352988402-16950-1-git-send-email-nicolas.dichtel@6wind.com>

This change fixes several sparse warnings about endianness problem. The wrong
nla_*() functions were used.
It also fix a sparse warning about a flag test (field i_flags). This field is
used in this file like a local flag only, so it is more an u16 (gre uses it as a
be16). This sparse warning was already there before the patch that add netlink
management, the code has just been moved.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
v2: enhance commit log and add "Reported-by" line

 net/ipv6/sit.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 7bd2a06..ca6c2c8 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -228,7 +228,7 @@ static int ipip6_tunnel_create(struct net_device *dev)
 		goto out;
 	ipip6_tunnel_clone_6rd(dev, sitn);
 
-	if (t->parms.i_flags & SIT_ISATAP)
+	if ((__force u16)t->parms.i_flags & SIT_ISATAP)
 		dev->priv_flags |= IFF_ISATAP;
 
 	err = register_netdevice(dev);
@@ -1240,10 +1240,10 @@ static void ipip6_netlink_parms(struct nlattr *data[],
 		parms->link = nla_get_u32(data[IFLA_IPTUN_LINK]);
 
 	if (data[IFLA_IPTUN_LOCAL])
-		parms->iph.saddr = nla_get_u32(data[IFLA_IPTUN_LOCAL]);
+		parms->iph.saddr = nla_get_be32(data[IFLA_IPTUN_LOCAL]);
 
 	if (data[IFLA_IPTUN_REMOTE])
-		parms->iph.daddr = nla_get_u32(data[IFLA_IPTUN_REMOTE]);
+		parms->iph.daddr = nla_get_be32(data[IFLA_IPTUN_REMOTE]);
 
 	if (data[IFLA_IPTUN_TTL]) {
 		parms->iph.ttl = nla_get_u8(data[IFLA_IPTUN_TTL]);
@@ -1258,7 +1258,7 @@ static void ipip6_netlink_parms(struct nlattr *data[],
 		parms->iph.frag_off = htons(IP_DF);
 
 	if (data[IFLA_IPTUN_FLAGS])
-		parms->i_flags = nla_get_u16(data[IFLA_IPTUN_FLAGS]);
+		parms->i_flags = nla_get_be16(data[IFLA_IPTUN_FLAGS]);
 }
 
 static int ipip6_newlink(struct net *src_net, struct net_device *dev,
@@ -1337,7 +1337,7 @@ static int ipip6_fill_info(struct sk_buff *skb, const struct net_device *dev)
 	    nla_put_u8(skb, IFLA_IPTUN_TOS, parm->iph.tos) ||
 	    nla_put_u8(skb, IFLA_IPTUN_PMTUDISC,
 		       !!(parm->iph.frag_off & htons(IP_DF))) ||
-	    nla_put_u16(skb, IFLA_IPTUN_FLAGS, parm->i_flags))
+	    nla_put_be16(skb, IFLA_IPTUN_FLAGS, parm->i_flags))
 		goto nla_put_failure;
 	return 0;
 
-- 
1.7.12

^ permalink raw reply related

* [PATCH net-next v2 1/3] ipip: fix sparse warnings in ipip_netlink_parms()
From: Nicolas Dichtel @ 2012-11-15 14:06 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, davem, fengguang.wu, Nicolas Dichtel
In-Reply-To: <1352985602.4497.31.camel@edumazet-glaptop>

This change fixes two sparse warnings triggered by casting the ip addresses
from netlink messages in an u32 instead of be32. This change corrects that
in order to resolve the sparse warnings.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
v2: enhance commit log and add "Reported-by" line

 net/ipv4/ipip.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
index 64686e1..c26c171 100644
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -864,10 +864,10 @@ static void ipip_netlink_parms(struct nlattr *data[],
 		parms->link = nla_get_u32(data[IFLA_IPTUN_LINK]);
 
 	if (data[IFLA_IPTUN_LOCAL])
-		parms->iph.saddr = nla_get_u32(data[IFLA_IPTUN_LOCAL]);
+		parms->iph.saddr = nla_get_be32(data[IFLA_IPTUN_LOCAL]);
 
 	if (data[IFLA_IPTUN_REMOTE])
-		parms->iph.daddr = nla_get_u32(data[IFLA_IPTUN_REMOTE]);
+		parms->iph.daddr = nla_get_be32(data[IFLA_IPTUN_REMOTE]);
 
 	if (data[IFLA_IPTUN_TTL]) {
 		parms->iph.ttl = nla_get_u8(data[IFLA_IPTUN_TTL]);
-- 
1.7.12

^ permalink raw reply related

* Re: [PATCH] tcp: fix retransmission in repair mode
From: Pavel Emelyanov @ 2012-11-15 14:10 UTC (permalink / raw)
  To: Andrey Vagin, David S. Miller
  Cc: netdev, criu, linux-kernel, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy
In-Reply-To: <1352988197-14414-1-git-send-email-avagin@openvz.org>

On 11/15/2012 06:03 PM, Andrey Vagin wrote:
> From: Andrew Vagin <avagin@openvz.org>
> 
> Currently if a socket was repaired with a few packet in a write queue,
> a kernel bug may be triggered:
> 
> kernel BUG at net/ipv4/tcp_output.c:2330!
> RIP: 0010:[<ffffffff8155784f>] tcp_retransmit_skb+0x5ff/0x610
> 
> According to the initial realization v3.4-rc2-963-gc0e88ff,
> all skb-s should look like already posted. This patch fixes code
> according with this sentence.
> 
> Here are three points, which were not done in the initial patch:
> 1. A tcp send head should not be changed
> 2. Initialize TSO state of a skb
> 3. Reset the retransmission time
> 
> This patch moves logic from tcp_sendmsg to tcp_write_xmit. A packet
> passes the ussual way, but isn't sent to network. This patch solves
> all described problems and handles tcp_sendpages.
> 
> Cc: Pavel Emelyanov <xemul@parallels.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
> Cc: James Morris <jmorris@namei.org>
> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
> Cc: Patrick McHardy <kaber@trash.net>
> Signed-off-by: Andrey Vagin <avagin@openvz.org>

Acked-by: Pavel Emelyanov <xemul@parallels.com>

^ permalink raw reply

* Re: [PATCH 0/9 v4] use efficient this_cpu_* helper
From: Christoph Lameter @ 2012-11-15 14:19 UTC (permalink / raw)
  To: Tejun Heo; +Cc: David Miller, NetDev, Shan Wei, Kernel-Maillist
In-Reply-To: <50A1A7A9.6050501@gmail.com>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2786 bytes --]

Tejon: Could you pick up this patchset?

On Tue, 13 Nov 2012, Shan Wei wrote:

> this_cpu_ptr/this_cpu_read is faster than per_cpu_ptr(p, smp_processor_id())
> and can reduce  memory accesses.
> The latter helper needs to find the offset for current cpu,
> and needs more assembler instructions which objdump shows in following.
>
> this_cpu_ptr relocates and address. this_cpu_read() relocates the address
> and performs the fetch. If you want to operate on rda(defined as per_cpu)
> then you can only use this_cpu_ptr. this_cpu_read() saves you more instructions
> since it can do the relocation and the fetch in one instruction.
>
> per_cpu_ptr(p, smp_processor_id())：
>   1e:   65 8b 04 25 00 00 00 00         mov    %gs:0x0,%eax
>   26:   48 98                           cltq
>   28:   31 f6                           xor    %esi,%esi
>   2a:   48 c7 c7 00 00 00 00            mov    $0x0,%rdi
>   31:   48 8b 04 c5 00 00 00 00         mov    0x0(,%rax,8),%rax
>   39:   c7 44 10 04 14 00 00 00         movl   $0x14,0x4(%rax,%rdx,1)
>
> this_cpu_ptr(p)
>   1e:   65 48 03 14 25 00 00 00 00      add    %gs:0x0,%rdx
>   27:   31 f6                           xor    %esi,%esi
>   29:   c7 42 04 14 00 00 00            movl   $0x14,0x4(%rdx)
>   30:   48 c7 c7 00 00 00 00            mov    $0x0,%rdi
>
>
>
> Changelog V4:
> 1. [read|write]ing fields of struct rds_ib_cache_head using __this_cpu_* operation for rds subsystem.
>    see patch2
> 2. fix bug in xfrm to read pointer. see patch3.
> 3. avoid type cast in patch7.
>
> Changelog V3:
> 1. use this_cpu_read directly read member of per-cpu variable,
>    so that droping the this_cpu_ptr operation.
> 2. for preemption off and bottom halves off case,
>    use __this_cpu_read instead of this_cpu_read.
>
> Changelog V2:
> 1. Use this_cpu_read directly instead of ref to field of per-cpu variable.
> 2. Patch5 about ftrace is dropped from this series.
> 3. Add new patch9 to replace get_cpu;per_cpu_ptr;put_cpu with this_cpu_add opt.
> 4. For preemption disable case, use __this_cpu_read instead.
>
>
> $ git diff --stat d4185bbf62a5d8d777ee445db1581beb17882a07
>  drivers/clocksource/arm_generic.c |    2 +-
>  kernel/padata.c                   |    5 ++---
>  kernel/rcutree.c                  |    2 +-
>  kernel/trace/blktrace.c           |    2 +-
>  kernel/trace/trace.c              |    5 +----
>  net/batman-adv/main.h             |    4 +---
>  net/core/flow.c                   |    4 +---
>  net/openvswitch/datapath.c        |    4 ++--
>  net/openvswitch/vport.c           |    5 ++---
>  net/rds/ib.h                      |    2 +-
>  net/rds/ib_recv.c                 |   24 +++++++++++++-----------
>  net/xfrm/xfrm_ipcomp.c            |    8 +++-----
>  12 files changed, 29 insertions(+), 38 deletions(-)
>

^ permalink raw reply

* [net-next 0/9][pull request] Intel Wired LAN Driver Updates
From: Jeff Kirsher @ 2012-11-15 14:39 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, gospo, sassmann

This series contains updates to ioat (DCA) and ixgbevf.

The following are changes since commit 702ed3c1a9dfe4dfe112f13542d0c9d689f5008b:
  Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next master

Alexander Duyck (1):
  ioat: Do not enable DCA if tag map is invalid

Greg Rose (8):
  ixgbevf: Streamline the rx buffer allocation
  ixgbevf: Fix unnecessary dereference where local var is available.
  ixgbevf: Remove the ring adapter pointer value
  ixgbevf: Remove checking for mac.ops function pointers
  ixgbevf: Remove mailbox spinlock from the reset function
  ixgbevf: White space and comments clean up
  ixgbevf: Remove unneeded and obsolete comment
  ixgbevf: Add checksum statistics counters to rings

 drivers/dma/ioat/dca.c                            |  23 ++++
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      |   3 +-
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 123 +++++++++-------------
 3 files changed, 72 insertions(+), 77 deletions(-)

-- 
1.7.11.7

^ permalink raw reply

* [net-next 1/9] ioat: Do not enable DCA if tag map is invalid
From: Jeff Kirsher @ 2012-11-15 14:39 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1352990387-3872-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

I have encountered several systems that have an invalid tag map.  This
invalid tag map results in only two tags being generated 0x1F which is
usually applied to the first core in a Hyper-threaded pair and 0x00 which
is applied to the second core in a Hyper-threaded pair.  The net result of
all this is that DCA causes significant cache thrash because the 0x1F tag
will send traffic to the second socket, which the 0x00 tag will not DCA tag
the frame resulting in no benefit.

For now the best solution from the driver's perspective is to just disable
DCA if the tag map is invalid.  The correct solution for this would be to
have the BIOS on affected systems updated so that the correct tags are
generated for a given APIC ID.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/dma/ioat/dca.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/drivers/dma/ioat/dca.c b/drivers/dma/ioat/dca.c
index abd9038..d666807 100644
--- a/drivers/dma/ioat/dca.c
+++ b/drivers/dma/ioat/dca.c
@@ -604,6 +604,23 @@ static int ioat3_dca_count_dca_slots(void *iobase, u16 dca_offset)
 	return slots;
 }
 
+static inline int dca3_tag_map_invalid(u8 *tag_map)
+{
+	/*
+	 * If the tag map is not programmed by the BIOS the default is:
+	 * 0x80 0x80 0x80 0x80 0x80 0x00 0x00 0x00
+	 *
+	 * This an invalid map and will result in only 2 possible tags
+	 * 0x1F and 0x00.  0x00 is an invalid DCA tag so we know that
+	 * this entire definition is invalid.
+	 */
+	return ((tag_map[0] == DCA_TAG_MAP_VALID) &&
+		(tag_map[1] == DCA_TAG_MAP_VALID) &&
+		(tag_map[2] == DCA_TAG_MAP_VALID) &&
+		(tag_map[3] == DCA_TAG_MAP_VALID) &&
+		(tag_map[4] == DCA_TAG_MAP_VALID));
+}
+
 struct dca_provider * __devinit
 ioat3_dca_init(struct pci_dev *pdev, void __iomem *iobase)
 {
@@ -674,6 +691,12 @@ ioat3_dca_init(struct pci_dev *pdev, void __iomem *iobase)
 		ioatdca->tag_map[i] = bit & DCA_TAG_MAP_MASK;
 	}
 
+	if (dca3_tag_map_invalid(ioatdca->tag_map)) {
+		dev_err(&pdev->dev, "APICID_TAG_MAP set incorrectly by BIOS, disabling DCA\n");
+		free_dca_provider(dca);
+		return NULL;
+	}
+
 	err = register_dca_provider(dca, &pdev->dev);
 	if (err) {
 		free_dca_provider(dca);
-- 
1.7.11.7

^ permalink raw reply related

* [net-next 3/9] ixgbevf: Fix unnecessary dereference where local var is available.
From: Jeff Kirsher @ 2012-11-15 14:39 UTC (permalink / raw)
  To: davem; +Cc: Greg Rose, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1352990387-3872-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Greg Rose <gregory.v.rose@intel.com>

Remove dereference of hw pointer from adapter structure since a pointer
to the hw structure has already been allocated off the stack.  Also clean
up useless parenthesis.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index b46dff8..a001684 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -1312,8 +1312,8 @@ static inline void ixgbevf_rx_desc_queue_enable(struct ixgbevf_adapter *adapter,
 		       "not set within the polling period\n", rxr);
 	}

-	ixgbevf_release_rx_desc(&adapter->hw, &adapter->rx_ring[rxr],
-				(adapter->rx_ring[rxr].count - 1));
+	ixgbevf_release_rx_desc(hw, &adapter->rx_ring[rxr],
+				adapter->rx_ring[rxr].count - 1);
 }

 static void ixgbevf_save_reset_stats(struct ixgbevf_adapter *adapter)
-- 
1.7.11.7

^ permalink raw reply related

* [net-next 2/9] ixgbevf: Streamline the rx buffer allocation
From: Jeff Kirsher @ 2012-11-15 14:39 UTC (permalink / raw)
  To: davem; +Cc: Greg Rose, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1352990387-3872-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Greg Rose <gregory.v.rose@intel.com>

Moves allocation of local variable to section where it is needed and
removes unnecessary if statement.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 9d88153..b46dff8 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -341,15 +341,16 @@ static void ixgbevf_alloc_rx_buffers(struct ixgbevf_adapter *adapter,
 	struct pci_dev *pdev = adapter->pdev;
 	union ixgbe_adv_rx_desc *rx_desc;
 	struct ixgbevf_rx_buffer *bi;
-	struct sk_buff *skb;
 	unsigned int i = rx_ring->next_to_use;
 
 	bi = &rx_ring->rx_buffer_info[i];
 
 	while (cleaned_count--) {
 		rx_desc = IXGBEVF_RX_DESC(rx_ring, i);
-		skb = bi->skb;
-		if (!skb) {
+
+		if (!bi->skb) {
+			struct sk_buff *skb;
+
 			skb = netdev_alloc_skb_ip_align(rx_ring->netdev,
 							rx_ring->rx_buf_len);
 			if (!skb) {
@@ -357,8 +358,7 @@ static void ixgbevf_alloc_rx_buffers(struct ixgbevf_adapter *adapter,
 				goto no_buffers;
 			}
 			bi->skb = skb;
-		}
-		if (!bi->dma) {
+
 			bi->dma = dma_map_single(&pdev->dev, skb->data,
 						 rx_ring->rx_buf_len,
 						 DMA_FROM_DEVICE);
-- 
1.7.11.7

^ permalink raw reply related

* [net-next 5/9] ixgbevf: Remove checking for mac.ops function pointers
From: Jeff Kirsher @ 2012-11-15 14:39 UTC (permalink / raw)
  To: davem; +Cc: Greg Rose, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1352990387-3872-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Greg Rose <gregory.v.rose@intel.com>

The function pointers will always be set - there is no good reason to
check them.  Also just remove get_bus_info() call as the VF has no bus
info to report.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 56 ++++++++---------------
 1 file changed, 18 insertions(+), 38 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index a001684..8b8a685 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -1150,9 +1150,6 @@ static int ixgbevf_vlan_rx_add_vid(struct net_device *netdev, u16 vid)
 	struct ixgbe_hw *hw = &adapter->hw;
 	int err;
 
-	if (!hw->mac.ops.set_vfta)
-		return -EOPNOTSUPP;
-
 	spin_lock_bh(&adapter->mbx_lock);
 
 	/* add VID to filter table */
@@ -1181,8 +1178,7 @@ static int ixgbevf_vlan_rx_kill_vid(struct net_device *netdev, u16 vid)
 	spin_lock_bh(&adapter->mbx_lock);
 
 	/* remove VID from filter table */
-	if (hw->mac.ops.set_vfta)
-		err = hw->mac.ops.set_vfta(hw, vid, 0, false);
+	err = hw->mac.ops.set_vfta(hw, vid, 0, false);
 
 	spin_unlock_bh(&adapter->mbx_lock);
 
@@ -1243,8 +1239,7 @@ static void ixgbevf_set_rx_mode(struct net_device *netdev)
 	spin_lock_bh(&adapter->mbx_lock);
 
 	/* reprogram multicast list */
-	if (hw->mac.ops.update_mc_addr_list)
-		hw->mac.ops.update_mc_addr_list(hw, netdev);
+	hw->mac.ops.update_mc_addr_list(hw, netdev);
 
 	ixgbevf_write_uc_addr_list(netdev);
 
@@ -1414,12 +1409,10 @@ static void ixgbevf_up_complete(struct ixgbevf_adapter *adapter)
 
 	spin_lock_bh(&adapter->mbx_lock);
 
-	if (hw->mac.ops.set_rar) {
-		if (is_valid_ether_addr(hw->mac.addr))
-			hw->mac.ops.set_rar(hw, 0, hw->mac.addr, 0);
-		else
-			hw->mac.ops.set_rar(hw, 0, hw->mac.perm_addr, 0);
-	}
+	if (is_valid_ether_addr(hw->mac.addr))
+		hw->mac.ops.set_rar(hw, 0, hw->mac.addr, 0);
+	else
+		hw->mac.ops.set_rar(hw, 0, hw->mac.perm_addr, 0);
 
 	spin_unlock_bh(&adapter->mbx_lock);
 
@@ -2201,6 +2194,7 @@ static void ixgbevf_watchdog_task(struct work_struct *work)
 	struct ixgbe_hw *hw = &adapter->hw;
 	u32 link_speed = adapter->link_speed;
 	bool link_up = adapter->link_up;
+	s32 need_reset;
 
 	adapter->flags |= IXGBE_FLAG_IN_WATCHDOG_TASK;
 
@@ -2208,29 +2202,20 @@ static void ixgbevf_watchdog_task(struct work_struct *work)
 	 * Always check the link on the watchdog because we have
 	 * no LSC interrupt
 	 */
-	if (hw->mac.ops.check_link) {
-		s32 need_reset;
 
-		spin_lock_bh(&adapter->mbx_lock);
+	spin_lock_bh(&adapter->mbx_lock);
 
-		need_reset = hw->mac.ops.check_link(hw, &link_speed,
-						    &link_up, false);
+	need_reset = hw->mac.ops.check_link(hw, &link_speed, &link_up, false);
 
-		spin_unlock_bh(&adapter->mbx_lock);
+	spin_unlock_bh(&adapter->mbx_lock);
 
-		if (need_reset) {
-			adapter->link_up = link_up;
-			adapter->link_speed = link_speed;
-			netif_carrier_off(netdev);
-			netif_tx_stop_all_queues(netdev);
-			schedule_work(&adapter->reset_task);
-			goto pf_has_reset;
-		}
-	} else {
-		/* always assume link is up, if no check link
-		 * function */
-		link_speed = IXGBE_LINK_SPEED_10GB_FULL;
-		link_up = true;
+	if (need_reset) {
+		adapter->link_up = link_up;
+		adapter->link_speed = link_speed;
+		netif_carrier_off(netdev);
+		netif_tx_stop_all_queues(netdev);
+		schedule_work(&adapter->reset_task);
+		goto pf_has_reset;
 	}
 	adapter->link_up = link_up;
 	adapter->link_speed = link_speed;
@@ -3070,8 +3055,7 @@ static int ixgbevf_set_mac(struct net_device *netdev, void *p)
 
 	spin_lock_bh(&adapter->mbx_lock);
 
-	if (hw->mac.ops.set_rar)
-		hw->mac.ops.set_rar(hw, 0, hw->mac.addr, 0);
+	hw->mac.ops.set_rar(hw, 0, hw->mac.addr, 0);
 
 	spin_unlock_bh(&adapter->mbx_lock);
 
@@ -3396,10 +3380,6 @@ static int __devinit ixgbevf_probe(struct pci_dev *pdev,
 	if (err)
 		goto err_sw_init;
 
-	/* pick up the PCI bus settings for reporting later */
-	if (hw->mac.ops.get_bus_info)
-		hw->mac.ops.get_bus_info(hw);
-
 	strcpy(netdev->name, "eth%d");
 
 	err = register_netdev(netdev);
-- 
1.7.11.7

^ permalink raw reply related

* [net-next 4/9] ixgbevf: Remove the ring adapter pointer value
From: Jeff Kirsher @ 2012-11-15 14:39 UTC (permalink / raw)
  To: davem; +Cc: Greg Rose, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1352990387-3872-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Greg Rose <gregory.v.rose@intel.com>

It is unused - remove it.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
index 1211fa0..dbdf39b 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
@@ -58,7 +58,6 @@ struct ixgbevf_ring {
 	struct ixgbevf_ring *next;
 	struct net_device *netdev;
 	struct device *dev;
-	struct ixgbevf_adapter *adapter;  /* backlink */
 	void *desc;			/* descriptor ring memory */
 	dma_addr_t dma;			/* phys. address of descriptor ring */
 	unsigned int size;		/* length in bytes */
-- 
1.7.11.7

^ permalink raw reply related

* [net-next 7/9] ixgbevf: White space and comments clean up
From: Jeff Kirsher @ 2012-11-15 14:39 UTC (permalink / raw)
  To: davem; +Cc: Greg Rose, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1352990387-3872-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Greg Rose <gregory.v.rose@intel.com>

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 19 ++++++-------------
 1 file changed, 6 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 592fe99..57ae5cd 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -121,7 +121,6 @@ static inline void ixgbevf_release_rx_desc(struct ixgbe_hw *hw,
  * @direction: 0 for Rx, 1 for Tx, -1 for other causes
  * @queue: queue to map the corresponding interrupt to
  * @msix_vector: the vector to map to the corresponding queue
- *
  */
 static void ixgbevf_set_ivar(struct ixgbevf_adapter *adapter, s8 direction,
 			     u8 queue, u8 msix_vector)
@@ -380,7 +379,6 @@ static void ixgbevf_alloc_rx_buffers(struct ixgbevf_adapter *adapter,
 no_buffers:
 	if (rx_ring->next_to_use != i) {
 		rx_ring->next_to_use = i;
-
 		ixgbevf_release_rx_desc(&adapter->hw, rx_ring, i);
 	}
 }
@@ -765,7 +763,6 @@ static irqreturn_t ixgbevf_msix_other(int irq, void *data)
 	return IRQ_HANDLED;
 }
 
-
 /**
  * ixgbevf_msix_clean_rings - single unshared vector rx clean (all queues)
  * @irq: unused
@@ -1224,12 +1221,13 @@ static int ixgbevf_write_uc_addr_list(struct net_device *netdev)
 }
 
 /**
- * ixgbevf_set_rx_mode - Multicast set
+ * ixgbevf_set_rx_mode - Multicast and unicast set
  * @netdev: network interface device structure
  *
  * The set_rx_method entry point is called whenever the multicast address
- * list or the network interface flags are updated.  This routine is
- * responsible for configuring the hardware for proper multicast mode.
+ * list, unicast address list or the network interface flags are updated.
+ * This routine is responsible for configuring the hardware for proper
+ * multicast mode and configuring requested unicast filters.
  **/
 static void ixgbevf_set_rx_mode(struct net_device *netdev)
 {
@@ -1588,7 +1586,6 @@ static void ixgbevf_clean_tx_ring(struct ixgbevf_adapter *adapter,
 		return;
 
 	/* Free all the Tx ring sk_buffs */
-
 	for (i = 0; i < tx_ring->count; i++) {
 		tx_buffer_info = &tx_ring->tx_buffer_info[i];
 		ixgbevf_unmap_and_free_tx_resource(tx_ring, tx_buffer_info);
@@ -1757,6 +1754,7 @@ static int ixgbevf_acquire_msix_vectors(struct ixgbevf_adapter *adapter,
 		 */
 		adapter->num_msix_vectors = vectors;
 	}
+
 	return err;
 }
 
@@ -2053,7 +2051,7 @@ static int __devinit ixgbevf_sw_init(struct ixgbevf_adapter *adapter)
 			goto out;
 		}
 		memcpy(adapter->netdev->dev_addr, adapter->hw.mac.addr,
-			adapter->netdev->addr_len);
+		       adapter->netdev->addr_len);
 	}
 
 	/* lock to protect mailbox accesses */
@@ -2198,7 +2196,6 @@ static void ixgbevf_watchdog_task(struct work_struct *work)
 	 * Always check the link on the watchdog because we have
 	 * no LSC interrupt
 	 */
-
 	spin_lock_bh(&adapter->mbx_lock);
 
 	need_reset = hw->mac.ops.check_link(hw, &link_speed, &link_up, false);
@@ -2704,9 +2701,6 @@ static int ixgbevf_tso(struct ixgbevf_ring *tx_ring,
 static bool ixgbevf_tx_csum(struct ixgbevf_ring *tx_ring,
 			    struct sk_buff *skb, u32 tx_flags)
 {
-
-
-
 	u32 vlan_macip_lens = 0;
 	u32 mss_l4len_idx = 0;
 	u32 type_tucmd = 0;
@@ -2896,7 +2890,6 @@ static void ixgbevf_tx_queue(struct ixgbevf_ring *tx_ring, int tx_flags,
 		olinfo_status |= (1 << IXGBE_ADVTXD_IDX_SHIFT);
 		if (tx_flags & IXGBE_TX_FLAGS_IPV4)
 			olinfo_status |= IXGBE_ADVTXD_POPTS_IXSM;
-
 	}
 
 	/*
-- 
1.7.11.7

^ permalink raw reply related

* [net-next 6/9] ixgbevf: Remove mailbox spinlock from the reset function
From: Jeff Kirsher @ 2012-11-15 14:39 UTC (permalink / raw)
  To: davem; +Cc: Greg Rose, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1352990387-3872-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Greg Rose <gregory.v.rose@intel.com>

The spinlocks are not required during reset.  There won't be any
contention for the mailbox resource.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 8b8a685..592fe99 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -1702,15 +1702,11 @@ void ixgbevf_reset(struct ixgbevf_adapter *adapter)
 	struct ixgbe_hw *hw = &adapter->hw;
 	struct net_device *netdev = adapter->netdev;
 
-	spin_lock_bh(&adapter->mbx_lock);
-
 	if (hw->mac.ops.reset_hw(hw))
 		hw_dbg(hw, "PF still resetting\n");
 	else
 		hw->mac.ops.init_hw(hw);
 
-	spin_unlock_bh(&adapter->mbx_lock);
-
 	if (is_valid_ether_addr(adapter->hw.mac.addr)) {
 		memcpy(netdev->dev_addr, adapter->hw.mac.addr,
 		       netdev->addr_len);
-- 
1.7.11.7

^ permalink raw reply related

* [net-next 8/9] ixgbevf: Remove unneeded and obsolete comment
From: Jeff Kirsher @ 2012-11-15 14:39 UTC (permalink / raw)
  To: davem; +Cc: Greg Rose, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1352990387-3872-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Greg Rose <gregory.v.rose@intel.com>

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 57ae5cd..a52b14e 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -1681,13 +1681,6 @@ void ixgbevf_reinit_locked(struct ixgbevf_adapter *adapter)
 	while (test_and_set_bit(__IXGBEVF_RESETTING, &adapter->state))
 		msleep(1);
 
-	/*
-	 * Check if PF is up before re-init.  If not then skip until
-	 * later when the PF is up and ready to service requests from
-	 * the VF via mailbox.  If the VF is up and running then the
-	 * watchdog task will continue to schedule reset tasks until
-	 * the PF is up and running.
-	 */
 	ixgbevf_down(adapter);
 	ixgbevf_up(adapter);
 
-- 
1.7.11.7

^ permalink raw reply related

* [net-next 9/9] ixgbevf: Add checksum statistics counters to rings
From: Jeff Kirsher @ 2012-11-15 14:39 UTC (permalink / raw)
  To: davem; +Cc: Greg Rose, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1352990387-3872-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Greg Rose <gregory.v.rose@intel.com>

Add hardware checksum statistic counters to the ring structures and
then during packet processing update those counters instead of the
global counters in the adapter structure.  Only update the adapter
structure counters when all other statistics are gathered in the
ixgbevf_update_stats() function.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      |  2 ++
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 23 ++++++++++++++++-------
 2 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
index dbdf39b..fc0af9a 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
@@ -74,6 +74,8 @@ struct ixgbevf_ring {
 	u64			total_bytes;
 	u64			total_packets;
 	struct u64_stats_sync	syncp;
+	u64 hw_csum_rx_error;
+	u64 hw_csum_rx_good;
 
 	u16 head;
 	u16 tail;
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index a52b14e..f267c00 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -295,12 +295,11 @@ static void ixgbevf_receive_skb(struct ixgbevf_q_vector *q_vector,
 
 /**
  * ixgbevf_rx_checksum - indicate in skb if hw indicated a good cksum
- * @adapter: address of board private structure
+ * @ring: pointer to Rx descriptor ring structure
  * @status_err: hardware indication of status of receive
  * @skb: skb currently being received and modified
  **/
-static inline void ixgbevf_rx_checksum(struct ixgbevf_adapter *adapter,
-				       struct ixgbevf_ring *ring,
+static inline void ixgbevf_rx_checksum(struct ixgbevf_ring *ring,
 				       u32 status_err, struct sk_buff *skb)
 {
 	skb_checksum_none_assert(skb);
@@ -312,7 +311,7 @@ static inline void ixgbevf_rx_checksum(struct ixgbevf_adapter *adapter,
 	/* if IP and error */
 	if ((status_err & IXGBE_RXD_STAT_IPCS) &&
 	    (status_err & IXGBE_RXDADV_ERR_IPE)) {
-		adapter->hw_csum_rx_error++;
+		ring->hw_csum_rx_error++;
 		return;
 	}
 
@@ -320,13 +319,13 @@ static inline void ixgbevf_rx_checksum(struct ixgbevf_adapter *adapter,
 		return;
 
 	if (status_err & IXGBE_RXDADV_ERR_TCPE) {
-		adapter->hw_csum_rx_error++;
+		ring->hw_csum_rx_error++;
 		return;
 	}
 
 	/* It must be a TCP or UDP packet with a valid checksum */
 	skb->ip_summed = CHECKSUM_UNNECESSARY;
-	adapter->hw_csum_rx_good++;
+	ring->hw_csum_rx_good++;
 }
 
 /**
@@ -462,7 +461,7 @@ static bool ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
 			goto next_desc;
 		}
 
-		ixgbevf_rx_checksum(adapter, rx_ring, staterr, skb);
+		ixgbevf_rx_checksum(rx_ring, staterr, skb);
 
 		/* probably a little skewed due to removing CRC */
 		total_rx_bytes += skb->len;
@@ -2094,6 +2093,7 @@ out:
 void ixgbevf_update_stats(struct ixgbevf_adapter *adapter)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
+	int i;
 
 	UPDATE_VF_COUNTER_32bit(IXGBE_VFGPRC, adapter->stats.last_vfgprc,
 				adapter->stats.vfgprc);
@@ -2107,6 +2107,15 @@ void ixgbevf_update_stats(struct ixgbevf_adapter *adapter)
 				adapter->stats.vfgotc);
 	UPDATE_VF_COUNTER_32bit(IXGBE_VFMPRC, adapter->stats.last_vfmprc,
 				adapter->stats.vfmprc);
+
+	for (i = 0;  i  < adapter->num_rx_queues;  i++) {
+		adapter->hw_csum_rx_error +=
+			adapter->rx_ring[i].hw_csum_rx_error;
+		adapter->hw_csum_rx_good +=
+			adapter->rx_ring[i].hw_csum_rx_good;
+		adapter->rx_ring[i].hw_csum_rx_error = 0;
+		adapter->rx_ring[i].hw_csum_rx_good = 0;
+	}
 }
 
 /**
-- 
1.7.11.7

^ permalink raw reply related

* Re: [PATCH 0/9 v4] use efficient this_cpu_* helper
From: Tejun Heo @ 2012-11-15 14:53 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: David Miller, NetDev, Shan Wei, Kernel-Maillist
In-Reply-To: <0000013b047079ab-01971b64-07e2-4f2e-a204-6baff2180fc3-000000@email.amazonses.com>

On Thu, Nov 15, 2012 at 02:19:38PM +0000, Christoph Lameter wrote:
> Tejon: Could you pick up this patchset?

Sure, but, Shan, when posting patchset, please make the patches
replies to the head message; otherwise, it's pretty difficult to track
what's going on with the patchset as a whole.  I see that some patches
are being picked up by respective subsystems.  If you have patches
left, please let me know.

Thanks.

-- 
tejun

^ permalink raw reply

* [PATCH v2 net-next] sctp: Add support to per-association statistics via a new SCTP_GET_ASSOC_STATS call
From: Michele Baldessari @ 2012-11-15 15:01 UTC (permalink / raw)
  To: linux-sctp
  Cc: michele, Neil Horman, Thomas Graf, Vlad Yasevich, netdev,
	David S. Miller

The current SCTP stack is lacking a mechanism to have per association
statistics. This is an implementation modeled after OpenSolaris'
SCTP_GET_ASSOC_STATS.

Userspace part will follow on lksctp if/when there is a general ACK on
this.

V2)
  - Implement partial retrieval of stat struct to cope for future expansion
  - Kill the rtxpackets counter as it cannot be precise anyway
  - Rename outseqtsns to outofseqtsns to make it clearer that these are out
    of sequence unexpected TSNs
  - Move asoc->ipackets++ under a lock to avoid potential miscounts
  - Fold asoc->opackets++ into the already existing asoc check
  - Kill unneeded (q->asoc) test when increasing rtxchunks
  - Do not count octrlchunks if sending failed (SCTP_XMIT_OK != 0)
  - Don't count SHUTDOWNs as SACKs
  - Move SCTP_GET_ASSOC_STATS to the private space API
  - Adjust the len check in sctp_getsockopt_assoc_stats() to allow for
    future struct growth
  - Move association statistics in their own struct
  - Update idupchunks when we send a SACK with dup TSNs
  - return min_rto in max_rto when RTO has not changed. Also return the
    transport when max_rto last changed.

Signed-off: Michele Baldessari <michele@acksyn.org>
Acked-by: Thomas Graf <tgraf@suug.ch>
---
 include/net/sctp/sctp.h    | 12 ++++++++
 include/net/sctp/structs.h | 36 +++++++++++++++++++++++
 include/net/sctp/user.h    | 27 +++++++++++++++++
 net/sctp/associola.c       |  9 +++++-
 net/sctp/endpointola.c     |  5 +++-
 net/sctp/input.c           |  2 ++
 net/sctp/output.c          | 14 +++++----
 net/sctp/outqueue.c        | 12 ++++++--
 net/sctp/sm_make_chunk.c   |  5 ++--
 net/sctp/sm_sideeffect.c   |  1 +
 net/sctp/sm_statefuns.c    | 10 +++++--
 net/sctp/socket.c          | 72 ++++++++++++++++++++++++++++++++++++++++++++++
 net/sctp/transport.c       |  2 ++
 13 files changed, 194 insertions(+), 13 deletions(-)

diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index 9c6414f..7fdf298 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -272,6 +272,18 @@ struct sctp_mib {
         unsigned long   mibs[SCTP_MIB_MAX];
 };
 
+/* helper function to track stats about max rto and related transport */
+static inline void sctp_max_rto(struct sctp_association *asoc,
+				struct sctp_transport *trans)
+{
+	if (asoc->stats.max_obs_rto < (__u64)trans->rto) {
+		asoc->stats.max_obs_rto = trans->rto;
+		memset(&asoc->stats.obs_rto_ipaddr, 0,
+			sizeof(struct sockaddr_storage));
+		memcpy(&asoc->stats.obs_rto_ipaddr, &trans->ipaddr,
+			trans->af_specific->sockaddr_len);
+	}
+}
 
 /* Print debugging messages.  */
 #if SCTP_DEBUG
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 2b2f61d..bf94ddf 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -1312,6 +1312,40 @@ struct sctp_inithdr_host {
 	__u32 initial_tsn;
 };
 
+/* SCTP_GET_ASSOC_STATS counters */
+struct sctp_priv_assoc_stats {
+	/* Maximum observed rto in the association. Value is set to
+	 * 0 when read and the max rto did not change. The transport where
+	 * the max_rto was observed is returned in obs_rto_ipaddr
+	 */
+	struct sockaddr_storage obs_rto_ipaddr;
+	__u64 max_obs_rto;
+	__u64 max_prev_obs_rto;
+	/* Total In and Out SACKs received and sent */
+	__u64 isacks;
+	__u64 osacks;
+	/* Total In and Out packets received and sent */
+	__u64 opackets;
+	__u64 ipackets;
+	/* Total retransmitted chunks */
+	__u64 rtxchunks;
+	/* TSN received > next expected */
+	__u64 outofseqtsns;
+	/* Duplicate Chunks received */
+	__u64 idupchunks;
+	/* Gap Ack Blocks received */
+	__u64 gapcnt;
+	/* Unordered data chunks sent and received */
+	__u64 ouodchunks;
+	__u64 iuodchunks;
+	/* Ordered data chunks sent and received */
+	__u64 oodchunks;
+	__u64 iodchunks;
+	/* Control chunks sent and received */
+	__u64 octrlchunks;
+	__u64 ictrlchunks;
+};
+
 /* RFC2960
  *
  * 12. Recommended Transmission Control Block (TCB) Parameters
@@ -1830,6 +1864,8 @@ struct sctp_association {
 
 	__u8 need_ecne:1,	/* Need to send an ECNE Chunk? */
 	     temp:1;		/* Is it a temporary association? */
+
+	struct sctp_priv_assoc_stats stats;
 };
 
 
diff --git a/include/net/sctp/user.h b/include/net/sctp/user.h
index 1b02d7a..c851e72 100644
--- a/include/net/sctp/user.h
+++ b/include/net/sctp/user.h
@@ -107,6 +107,7 @@ typedef __s32 sctp_assoc_t;
 #define SCTP_GET_LOCAL_ADDRS	109		/* Get all local address. */
 #define SCTP_SOCKOPT_CONNECTX	110		/* CONNECTX requests. */
 #define SCTP_SOCKOPT_CONNECTX3	111	/* CONNECTX requests (updated) */
+#define SCTP_GET_ASSOC_STATS	112	/* Read only */
 
 /*
  * 5.2.1 SCTP Initiation Structure (SCTP_INIT)
@@ -719,6 +720,32 @@ struct sctp_getaddrs {
 	__u8			addrs[0]; /*output, variable size*/
 };
 
+/* A socket user request obtained via SCTP_GET_ASSOC_STATS that retrieves
+ * association stats. All stats are counts except sas_maxrto and
+ * sas_obs_rto_ipaddr. maxrto is the max observed rto + transport since
+ * the last call. Will return 0 when did not change since last call
+ */
+struct sctp_assoc_stats {
+	sctp_assoc_t	sas_assoc_id;    /* Input */
+					 /* Transport of observed max RTO */
+	struct sockaddr_storage sas_obs_rto_ipaddr;
+	__u64		sas_maxrto;      /* Maximum Observed RTO for period */
+	__u64		sas_isacks;	 /* SACKs received */
+	__u64		sas_osacks;	 /* SACKs sent */
+	__u64		sas_opackets;	 /* Packets sent */
+	__u64		sas_ipackets;	 /* Packets received */
+	__u64		sas_rtxchunks;   /* Retransmitted Chunks */
+	__u64		sas_outofseqtsns;/* TSN received > next expected */
+	__u64		sas_idupchunks;  /* Dups received (ordered+unordered) */
+	__u64		sas_gapcnt;      /* Gap Acknowledgements Received */
+	__u64		sas_ouodchunks;  /* Unordered data chunks sent */
+	__u64		sas_iuodchunks;  /* Unordered data chunks received */
+	__u64		sas_oodchunks;	 /* Ordered data chunks sent */
+	__u64		sas_iodchunks;	 /* Ordered data chunks received */
+	__u64		sas_octrlchunks; /* Control chunks sent */
+	__u64		sas_ictrlchunks; /* Control chunks received */
+};
+
 /* These are bit fields for msghdr->msg_flags.  See section 5.1.  */
 /* On user space Linux, these live in <bits/socket.h> as an enum.  */
 enum sctp_msg_flags {
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index b1ef3bc..afb7522 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -321,6 +321,9 @@ static struct sctp_association *sctp_association_init(struct sctp_association *a
 	asoc->default_timetolive = sp->default_timetolive;
 	asoc->default_rcv_context = sp->default_rcv_context;
 
+	/* SCTP_GET_ASSOC_STATS COUNTERS */
+	memset(&asoc->stats, 0, sizeof(struct sctp_priv_assoc_stats));
+
 	/* AUTH related initializations */
 	INIT_LIST_HEAD(&asoc->endpoint_shared_keys);
 	err = sctp_auth_asoc_copy_shkeys(ep, asoc, gfp);
@@ -760,6 +763,7 @@ struct sctp_transport *sctp_assoc_add_peer(struct sctp_association *asoc,
 
 	/* Set the transport's RTO.initial value */
 	peer->rto = asoc->rto_initial;
+	sctp_max_rto(asoc, peer);
 
 	/* Set the peer's active state. */
 	peer->state = peer_state;
@@ -1152,8 +1156,11 @@ static void sctp_assoc_bh_rcv(struct work_struct *work)
 		 */
 		if (sctp_chunk_is_data(chunk))
 			asoc->peer.last_data_from = chunk->transport;
-		else
+		else {
 			SCTP_INC_STATS(net, SCTP_MIB_INCTRLCHUNKS);
+			if (chunk->chunk_hdr->type == SCTP_CID_SACK)
+				asoc->stats.isacks++;
+		}
 
 		if (chunk->transport)
 			chunk->transport->last_time_heard = jiffies;
diff --git a/net/sctp/endpointola.c b/net/sctp/endpointola.c
index 1859e2b..32ab55b 100644
--- a/net/sctp/endpointola.c
+++ b/net/sctp/endpointola.c
@@ -480,8 +480,11 @@ normal:
 		 */
 		if (asoc && sctp_chunk_is_data(chunk))
 			asoc->peer.last_data_from = chunk->transport;
-		else
+		else {
 			SCTP_INC_STATS(sock_net(ep->base.sk), SCTP_MIB_INCTRLCHUNKS);
+			if (asoc)
+				asoc->stats.ictrlchunks++;
+		}
 
 		if (chunk->transport)
 			chunk->transport->last_time_heard = jiffies;
diff --git a/net/sctp/input.c b/net/sctp/input.c
index 8bd3c27..54c449b 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -281,6 +281,8 @@ int sctp_rcv(struct sk_buff *skb)
 		SCTP_INC_STATS_BH(net, SCTP_MIB_IN_PKT_SOFTIRQ);
 		sctp_inq_push(&chunk->rcvr->inqueue, chunk);
 	}
+	if (asoc)
+		asoc->stats.ipackets++;
 
 	sctp_bh_unlock_sock(sk);
 
diff --git a/net/sctp/output.c b/net/sctp/output.c
index 4e90188bf..f5200a2 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -311,6 +311,8 @@ static sctp_xmit_t __sctp_packet_append_chunk(struct sctp_packet *packet,
 
 	    case SCTP_CID_SACK:
 		packet->has_sack = 1;
+		if (chunk->asoc)
+			chunk->asoc->stats.osacks++;
 		break;
 
 	    case SCTP_CID_AUTH:
@@ -584,11 +586,13 @@ int sctp_packet_transmit(struct sctp_packet *packet)
 	 */
 
 	/* Dump that on IP!  */
-	if (asoc && asoc->peer.last_sent_to != tp) {
-		/* Considering the multiple CPU scenario, this is a
-		 * "correcter" place for last_sent_to.  --xguo
-		 */
-		asoc->peer.last_sent_to = tp;
+	if (asoc) {
+		asoc->stats.opackets++;
+		if (asoc->peer.last_sent_to != tp)
+			/* Considering the multiple CPU scenario, this is a
+			 * "correcter" place for last_sent_to.  --xguo
+			 */
+			asoc->peer.last_sent_to = tp;
 	}
 
 	if (has_data) {
diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index 1b4a7f8..379c81d 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -667,6 +667,7 @@ redo:
 				chunk->fast_retransmit = SCTP_DONT_FRTX;
 
 			q->empty = 0;
+			q->asoc->stats.rtxchunks++;
 			break;
 		}
 
@@ -876,12 +877,14 @@ static int sctp_outq_flush(struct sctp_outq *q, int rtx_timeout)
 			if (status  != SCTP_XMIT_OK) {
 				/* put the chunk back */
 				list_add(&chunk->list, &q->control_chunk_list);
-			} else if (chunk->chunk_hdr->type == SCTP_CID_FWD_TSN) {
+			} else {
+				asoc->stats.octrlchunks++;
 				/* PR-SCTP C5) If a FORWARD TSN is sent, the
 				 * sender MUST assure that at least one T3-rtx
 				 * timer is running.
 				 */
-				sctp_transport_reset_timers(transport);
+				if (chunk->chunk_hdr->type == SCTP_CID_FWD_TSN)
+					sctp_transport_reset_timers(transport);
 			}
 			break;
 
@@ -1055,6 +1058,10 @@ static int sctp_outq_flush(struct sctp_outq *q, int rtx_timeout)
 				 */
 				if (asoc->state == SCTP_STATE_SHUTDOWN_PENDING)
 					chunk->chunk_hdr->flags |= SCTP_DATA_SACK_IMM;
+				if (chunk->chunk_hdr->flags & SCTP_DATA_UNORDERED)
+					asoc->stats.ouodchunks++;
+				else
+					asoc->stats.oodchunks++;
 
 				break;
 
@@ -1162,6 +1169,7 @@ int sctp_outq_sack(struct sctp_outq *q, struct sctp_chunk *chunk)
 
 	sack_ctsn = ntohl(sack->cum_tsn_ack);
 	gap_ack_blocks = ntohs(sack->num_gap_ack_blocks);
+	asoc->stats.gapcnt += gap_ack_blocks;
 	/*
 	 * SFR-CACC algorithm:
 	 * On receipt of a SACK the sender SHOULD execute the
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index fbe1636..eb7633f 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -804,10 +804,11 @@ struct sctp_chunk *sctp_make_sack(const struct sctp_association *asoc)
 				 gabs);
 
 	/* Add the duplicate TSN information.  */
-	if (num_dup_tsns)
+	if (num_dup_tsns) {
+		aptr->stats.idupchunks += num_dup_tsns;
 		sctp_addto_chunk(retval, sizeof(__u32) * num_dup_tsns,
 				 sctp_tsnmap_get_dups(map));
-
+	}
 	/* Once we have a sack generated, check to see what our sack
 	 * generation is, if its 0, reset the transports to 0, and reset
 	 * the association generation to 1
diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
index 6eecf7e..363727e 100644
--- a/net/sctp/sm_sideeffect.c
+++ b/net/sctp/sm_sideeffect.c
@@ -542,6 +542,7 @@ static void sctp_do_8_2_transport_strike(sctp_cmd_seq_t *commands,
 	 */
 	if (!is_hb || transport->hb_sent) {
 		transport->rto = min((transport->rto * 2), transport->asoc->rto_max);
+		sctp_max_rto(asoc, transport);
 	}
 }
 
diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index b6adef8..ecf7a17 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -6127,6 +6127,8 @@ static int sctp_eat_data(const struct sctp_association *asoc,
 		/* The TSN is too high--silently discard the chunk and
 		 * count on it getting retransmitted later.
 		 */
+		if (chunk->asoc)
+			chunk->asoc->stats.outofseqtsns++;
 		return SCTP_IERROR_HIGH_TSN;
 	} else if (tmp > 0) {
 		/* This is a duplicate.  Record it.  */
@@ -6226,10 +6228,14 @@ static int sctp_eat_data(const struct sctp_association *asoc,
 	/* Note: Some chunks may get overcounted (if we drop) or overcounted
 	 * if we renege and the chunk arrives again.
 	 */
-	if (chunk->chunk_hdr->flags & SCTP_DATA_UNORDERED)
+	if (chunk->chunk_hdr->flags & SCTP_DATA_UNORDERED) {
 		SCTP_INC_STATS(net, SCTP_MIB_INUNORDERCHUNKS);
-	else {
+		if (chunk->asoc)
+			chunk->asoc->stats.iuodchunks++;
+	} else {
 		SCTP_INC_STATS(net, SCTP_MIB_INORDERCHUNKS);
+		if (chunk->asoc)
+			chunk->asoc->stats.iodchunks++;
 		ordered = 1;
 	}
 
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 15379ac..8113249 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -609,6 +609,7 @@ static int sctp_send_asconf_add_ip(struct sock		*sk,
 				    2*asoc->pathmtu, 4380));
 				trans->ssthresh = asoc->peer.i.a_rwnd;
 				trans->rto = asoc->rto_initial;
+				sctp_max_rto(asoc, trans);
 				trans->rtt = trans->srtt = trans->rttvar = 0;
 				sctp_transport_route(trans, NULL,
 				    sctp_sk(asoc->base.sk));
@@ -5633,6 +5634,74 @@ static int sctp_getsockopt_paddr_thresholds(struct sock *sk,
 	return 0;
 }
 
+/*
+ * SCTP_GET_ASSOC_STATS
+ *
+ * This option retrieves local per endpoint statistics. It is modeled
+ * after OpenSolaris' implementation
+ */
+static int sctp_getsockopt_assoc_stats(struct sock *sk, int len,
+				       char __user *optval,
+				       int __user *optlen)
+{
+	struct sctp_assoc_stats sas;
+	struct sctp_association *asoc = NULL;
+
+	/* User must provide at least the assoc id */
+	if (len < sizeof(sctp_assoc_t))
+		return -EINVAL;
+
+	if (copy_from_user(&sas, optval, len))
+		return -EFAULT;
+
+	asoc = sctp_id2assoc(sk, sas.sas_assoc_id);
+	if (!asoc)
+		return -EINVAL;
+
+	sas.sas_rtxchunks = asoc->stats.rtxchunks;
+	sas.sas_gapcnt = asoc->stats.gapcnt;
+	sas.sas_outofseqtsns = asoc->stats.outofseqtsns;
+	sas.sas_osacks = asoc->stats.osacks;
+	sas.sas_isacks = asoc->stats.isacks;
+	sas.sas_octrlchunks = asoc->stats.octrlchunks;
+	sas.sas_ictrlchunks = asoc->stats.ictrlchunks;
+	sas.sas_oodchunks = asoc->stats.oodchunks;
+	sas.sas_iodchunks = asoc->stats.iodchunks;
+	sas.sas_ouodchunks = asoc->stats.ouodchunks;
+	sas.sas_iuodchunks = asoc->stats.iuodchunks;
+	sas.sas_idupchunks = asoc->stats.idupchunks;
+	sas.sas_opackets = asoc->stats.opackets;
+	sas.sas_ipackets = asoc->stats.ipackets;
+
+	memcpy(&sas.sas_obs_rto_ipaddr, &asoc->stats.obs_rto_ipaddr,
+		sizeof(struct sockaddr_storage));
+	/* New high max rto observed */
+	if (asoc->stats.max_obs_rto > asoc->stats.max_prev_obs_rto)
+		sas.sas_maxrto = asoc->stats.max_obs_rto;
+	else /* return min_rto since max rto has not changed */
+		sas.sas_maxrto = asoc->rto_min;
+
+	/* Record the value sent to the user this period */
+	asoc->stats.max_prev_obs_rto = sas.sas_maxrto;
+
+	/* Mark beginning of a new observation period */
+	asoc->stats.max_obs_rto = 0;
+
+	/* Allow the struct to grow and fill in as much as possible */
+	len = min_t(size_t, len, sizeof(sas));
+
+	if (put_user(len, optlen))
+		return -EFAULT;
+
+	SCTP_DEBUG_PRINTK("sctp_getsockopt_assoc_stat(%d): %d\n",
+			  len, sas.sas_assoc_id);
+
+	if (copy_to_user(optval, &sas, len))
+		return -EFAULT;
+
+	return 0;
+}
+
 SCTP_STATIC int sctp_getsockopt(struct sock *sk, int level, int optname,
 				char __user *optval, int __user *optlen)
 {
@@ -5774,6 +5843,9 @@ SCTP_STATIC int sctp_getsockopt(struct sock *sk, int level, int optname,
 	case SCTP_PEER_ADDR_THLDS:
 		retval = sctp_getsockopt_paddr_thresholds(sk, optval, len, optlen);
 		break;
+	case SCTP_GET_ASSOC_STATS:
+		retval = sctp_getsockopt_assoc_stats(sk, len, optval, optlen);
+		break;
 	default:
 		retval = -ENOPROTOOPT;
 		break;
diff --git a/net/sctp/transport.c b/net/sctp/transport.c
index 953c21e..8c6920d 100644
--- a/net/sctp/transport.c
+++ b/net/sctp/transport.c
@@ -350,6 +350,7 @@ void sctp_transport_update_rto(struct sctp_transport *tp, __u32 rtt)
 
 	/* 6.3.1 C3) After the computation, update RTO <- SRTT + 4 * RTTVAR. */
 	tp->rto = tp->srtt + (tp->rttvar << 2);
+	sctp_max_rto(tp->asoc, tp);
 
 	/* 6.3.1 C6) Whenever RTO is computed, if it is less than RTO.Min
 	 * seconds then it is rounded up to RTO.Min seconds.
@@ -620,6 +621,7 @@ void sctp_transport_reset(struct sctp_transport *t)
 	t->burst_limited = 0;
 	t->ssthresh = asoc->peer.i.a_rwnd;
 	t->rto = asoc->rto_initial;
+	sctp_max_rto(asoc, t);
 	t->rtt = 0;
 	t->srtt = 0;
 	t->rttvar = 0;
-- 
1.8.0

^ permalink raw reply related

* Re: [PATCH] tcp: fix retransmission in repair mode
From: Eric Dumazet @ 2012-11-15 15:03 UTC (permalink / raw)
  To: Andrey Vagin
  Cc: netdev, criu, linux-kernel, Pavel Emelyanov, David S. Miller,
	Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
	Patrick McHardy
In-Reply-To: <1352988197-14414-1-git-send-email-avagin@openvz.org>

On Thu, 2012-11-15 at 18:03 +0400, Andrey Vagin wrote:
> From: Andrew Vagin <avagin@openvz.org>
> 
> Currently if a socket was repaired with a few packet in a write queue,
> a kernel bug may be triggered:
> 
> kernel BUG at net/ipv4/tcp_output.c:2330!
> RIP: 0010:[<ffffffff8155784f>] tcp_retransmit_skb+0x5ff/0x610
> 
> According to the initial realization v3.4-rc2-963-gc0e88ff,
> all skb-s should look like already posted. This patch fixes code
> according with this sentence.
> 
> Here are three points, which were not done in the initial patch:
> 1. A tcp send head should not be changed
> 2. Initialize TSO state of a skb
> 3. Reset the retransmission time
> 
> This patch moves logic from tcp_sendmsg to tcp_write_xmit. A packet
> passes the ussual way, but isn't sent to network. This patch solves
> all described problems and handles tcp_sendpages.
> 
> Cc: Pavel Emelyanov <xemul@parallels.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
> Cc: James Morris <jmorris@namei.org>
> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
> Cc: Patrick McHardy <kaber@trash.net>
> Signed-off-by: Andrey Vagin <avagin@openvz.org>

Any chance these tcp repair hacks could be done outside of tcp fast
path ?

^ permalink raw reply

* Re: tc adding two filters with different protocols
From: Nieścierowicz Adam @ 2012-11-15 15:04 UTC (permalink / raw)
  To: Netdev

Hi,

anyone can help, or explain what i do wrong?

W dniu 12.11.2012 20:45, Nieścierowicz Adam napisał(a):

> Hello,
>
> writes the qos where i want classify traffic for two protocol: ip and
> pppoe. Example below
>
> ---
> tc qdisc del root dev eth2
> tc qdisc add dev eth2 root handle 1: htb
> tc class add dev eth2 parent 1:0 classid 1:1 htb rate 500Mbit ceil
> 500Mbit
>
> tc filter add dev eth2 parent 1:0 prio 5 protocol ip u32
> tc filter add dev eth2 parent 1:0 prio 5 handle 9: protocol ip u32
> divisor 256
> tc filter add dev eth2 protocol ip parent 1:0 prio 5 u32 ht 800:: 
> match
> ip dst 172.16.0.0/22 hashkey mask 0x000000ff at 16 link 9:
>
> tc filter add dev eth2 parent 1:0 prio 6 protocol ppp_ses u32
> tc filter add dev eth2 parent 1:0 prio 6 handle 4a: protocol ppp_ses
> u32 divisor 256
> tc filter add dev eth2 protocol ppp_ses parent 1:0 prio 6 u32 ht 
> 800::
> match ip dst 31.41.209.0/24 at 24 hashkey mask 0x000000ff at 24 link 
> 4a:
> ---
>
> Unfortunately, when you add lines "tc filter add dev eth2 protocol
> ppp_ses parent 1:0 prio 6 u32 ht 800:: match ip dst 31.41.209.0/24 at 
> 24
> hashkey mask 0x000000ff at 24 link 4a:" protocol change to ip
>
> command "tc -s -d filter show dev eth2" shows
> ---
> filter parent 1: protocol ip pref 5 u32
> filter parent 1: protocol ip pref 5 u32 fh 9: ht divisor 256
> filter parent 1: protocol ip pref 5 u32 fh 800: ht divisor 1
> filter parent 1: protocol ip pref 5 u32 fh 800::800 order 2048 key ht
> 800 bkt 0 link 9: (rule hit 557082 success 0)
> match ac100000/fffffc00 at 16 (success 267341 )
> hash mask 000000ff at 16
> filter parent 1: protocol ip pref 5 u32 fh 800::801 order 2049 key ht
> 800 bkt 0 link 4a: (rule hit 557039 success 0)
> match 1f29d100/ffffff00 at 24 (success 0 )
> hash mask 000000ff at 24
> filter parent 1: protocol ppp_ses pref 6 u32
> filter parent 1: protocol ppp_ses pref 6 u32 fh 4a: ht divisor 256
> filter parent 1: protocol ppp_ses pref 6 u32 fh 801: ht divisor 1
> ---
> if i change order and first protocol is ppp_ses the hashkey for ip
> protocol is changed to ppp_ses.
>
> Is this correct behavior and I do something wrong?
>
> Thanks
> Adam N.
>

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox