Netdev List

Netdev List
 help / color / mirror / Atom feed

* traceroute reporting wrong nexthop addr when xfrm is involved
From: Florian Westphal @ 2013-09-25 15:42 UTC (permalink / raw)
  To: netdev

Hi.

When running traceroute to a host behind an ipsec tunnel, the first hop
appears to be the destination host instead of the "real" address, when
"flag icmp" is set on outbound xfrm policy on the 1st hop gateway.

Given

A ---IPSEC ---- B ---<Internal-Network>--- D

D is an arbitrary machine on internal network 10/8, with address
10.128.8.1

A is a client (address: 192.168.20.8), connected to ipsec gateway B with
address 192.168.20.80.  B has "flag icmp" set on its in/fwd/out xfrm policies.

A is configured to encapsulate traffic to either B or the internal network
10/8 in ESP.

B decapsulates packets, (or vv when packets go to A from 10/8 network or B).

This works fine.  However, when running traceroute from
A to D, the very first hop (B) reports the address of D instead:

A $: traceroute -n 10.128.8.1
traceroute to 10.128.8.1 (10.128.8.1), 30 hops max, 40 byte packets using UDP
 1  10.128.8.1 (10.128.8.1)  0.450 ms   0.391 ms   0.357 ms
 2  192.168.10.1 (192.168.10.1)  0.654 ms   0.585 ms   0.642 ms
 3  10.128.128.1 (10.128.128.1)  0.957 ms   0.986 ms   1.410 ms
 4  10.128.254.6 (10.128.254.6)  1.745 ms   1.656 ms   1.240 ms
 5  10.128.8.1 (10.128.8.1)  1.514 ms   1.728 ms   1.495 ms

I tracked this down to icmp.c:icmp_route_lookup()

 447         rt2 = (struct rtable *) xfrm_lookup(net, &rt2->dst,
 448                                             flowi4_to_flowi(&fl4_dec), NULL,
 449                                             XFRM_LOOKUP_ICMP);
 450         if (!IS_ERR(rt2)) {
 451                 dst_release(&rt->dst);
 452                 memcpy(fl4, &fl4_dec, sizeof(*fl4));
 453                 rt = rt2;

fl4 has the correct information, fl4.saddr is the one chosen in
icmp_send() earlier - 192.168.20.80 in my case.

xfrm_lookup() succeds and icmp_route_lookup() commits to using rt2.

In my case, fl4_dec.saddr is 10.128.8.1, the memcpy then copies the new
flowi that will be used for the icmp packet.

Removing the memcpy 'fixes' the problem, but I'm not sure if thats
even correct.

Does anyone know what the purpose of fl4_dec is, or if the behaviour
is the expected one?

Thanks,
Florian

^ permalink raw reply

* [PATCH net-next 1/2] bonding: modify the old and add new xmit hash functions
From: Nikolay Aleksandrov @ 2013-09-25 15:19 UTC (permalink / raw)
  To: netdev; +Cc: davem, andy, fubar, eric.dumazet
In-Reply-To: <1380122398-7370-1-git-send-email-nikolay@redhat.com>

This patch adds two new hash policy modes which use skb_flow_dissect:
3 - Encapsulated layer 2+3
4 - Encapsulated layer 3+4
There should be a good improvement for tunnel users in those modes.
It also changes the old hash functions to:
hash ^= (__force u32)flow.dst ^ (__force u32)flow.src;
hash ^= (hash >> 16);
hash ^= (hash >> 8);

Where hash will be initialized either to L2 hash, that is
SRCMAC[5] XOR DSTMAC[5], or to flow->ports which should be extracted
from the upper layer. Flow's dst and src are also extracted based on the
xmit policy either directly from the buffer or by using skb_flow_dissect,
but in both cases if the protocol is IPv6 then dst and src are obtained by
ipv6_addr_hash() on the real addresses. In case of a non-dissectable
packet, the algorithms fall back to L2 hashing.
The bond_set_mode_ops() function is now obsolete and thus deleted
because it was used only to set the proper hash policy. Also we trim a
pointer from struct bonding because we no longer need to keep the hash
function, now there's only a single hash function - bond_xmit_hash that
works based on bond->params.xmit_policy.

The hash function and skb_flow_dissect were suggested by Eric Dumazet.
The layer names were suggested by Andy Gospodarek, because I suck at
semantics.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
---
One line is intentionally left at 82 chars since it's the whole function
and IMO looks better that way.

 drivers/net/bonding/bond_3ad.c   |   2 +-
 drivers/net/bonding/bond_main.c  | 211 +++++++++++++++------------------------
 drivers/net/bonding/bond_sysfs.c |   2 -
 drivers/net/bonding/bonding.h    |   3 +-
 include/uapi/linux/if_bonding.h  |   2 +
 5 files changed, 86 insertions(+), 134 deletions(-)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 0d8f427..b3ab703 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -2442,7 +2442,7 @@ int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev)
 		goto out;
 	}
 
-	slave_agg_no = bond->xmit_hash_policy(skb, slaves_in_agg);
+	slave_agg_no = bond_xmit_hash(bond, skb, slaves_in_agg);
 
 	bond_for_each_slave(bond, slave) {
 		struct aggregator *agg = SLAVE_AD_INFO(slave).port.aggregator;
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 55bbb8b..73e416b 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -78,6 +78,7 @@
 #include <net/netns/generic.h>
 #include <net/pkt_sched.h>
 #include <linux/rculist.h>
+#include <net/flow_keys.h>
 #include "bonding.h"
 #include "bond_3ad.h"
 #include "bond_alb.h"
@@ -159,7 +160,8 @@ MODULE_PARM_DESC(min_links, "Minimum number of available links before turning on
 module_param(xmit_hash_policy, charp, 0);
 MODULE_PARM_DESC(xmit_hash_policy, "balance-xor and 802.3ad hashing method; "
 				   "0 for layer 2 (default), 1 for layer 3+4, "
-				   "2 for layer 2+3");
+				   "2 for layer 2+3, 3 for encap layer 2+3, "
+				   "4 for encap layer 3+4");
 module_param(arp_interval, int, 0);
 MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds");
 module_param_array(arp_ip_target, charp, NULL, 0);
@@ -217,6 +219,8 @@ const struct bond_parm_tbl xmit_hashtype_tbl[] = {
 {	"layer2",		BOND_XMIT_POLICY_LAYER2},
 {	"layer3+4",		BOND_XMIT_POLICY_LAYER34},
 {	"layer2+3",		BOND_XMIT_POLICY_LAYER23},
+{	"encap2+3",		BOND_XMIT_POLICY_ENCAP23},
+{	"encap3+4",		BOND_XMIT_POLICY_ENCAP34},
 {	NULL,			-1},
 };
 
@@ -3026,99 +3030,99 @@ static struct notifier_block bond_netdev_notifier = {
 
 /*---------------------------- Hashing Policies -----------------------------*/
 
-/*
- * Hash for the output device based upon layer 2 data
- */
-static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count)
+/* L2 hash helper */
+static inline u32 bond_eth_hash(struct sk_buff *skb)
 {
 	struct ethhdr *data = (struct ethhdr *)skb->data;
 
 	if (skb_headlen(skb) >= offsetof(struct ethhdr, h_proto))
-		return (data->h_dest[5] ^ data->h_source[5]) % count;
+		return data->h_dest[5] ^ data->h_source[5];
 
 	return 0;
 }
 
-/*
- * Hash for the output device based upon layer 2 and layer 3 data. If
- * the packet is not IP, fall back on bond_xmit_hash_policy_l2()
- */
-static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count)
+static void bond_flow_get_ports(struct sk_buff *skb, struct flow_keys *fk,
+				int offset)
+{
+	__be32 *ports;
+
+	ports = skb_header_pointer(skb, offset, sizeof(fk->ports), &fk->ports);
+	if (ports)
+		fk->ports = *ports;
+}
+
+/* Extract the appropriate headers based on bond's xmit policy */
+static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
+			      struct flow_keys *fk)
 {
-	const struct ethhdr *data;
+	const struct ipv6hdr *iph6;
 	const struct iphdr *iph;
-	const struct ipv6hdr *ipv6h;
-	u32 v6hash;
-	const __be32 *s, *d;
+	int noff, proto = -1, poff = -1;
+	bool ret = false;
+
+	if (bond->params.xmit_policy > BOND_XMIT_POLICY_LAYER23)
+		return skb_flow_dissect(skb, fk);
 
-	if (skb->protocol == htons(ETH_P_IP) &&
-	    pskb_network_may_pull(skb, sizeof(*iph))) {
+	fk->ports = 0;
+	noff = skb_network_offset(skb);
+	if (skb->protocol == htons(ETH_P_IP)) {
+		if (!pskb_may_pull(skb, noff + sizeof(*iph)))
+			goto out;
 		iph = ip_hdr(skb);
-		data = (struct ethhdr *)skb->data;
-		return ((ntohl(iph->saddr ^ iph->daddr) & 0xffff) ^
-			(data->h_dest[5] ^ data->h_source[5])) % count;
-	} else if (skb->protocol == htons(ETH_P_IPV6) &&
-		   pskb_network_may_pull(skb, sizeof(*ipv6h))) {
-		ipv6h = ipv6_hdr(skb);
-		data = (struct ethhdr *)skb->data;
-		s = &ipv6h->saddr.s6_addr32[0];
-		d = &ipv6h->daddr.s6_addr32[0];
-		v6hash = (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]);
-		v6hash ^= (v6hash >> 24) ^ (v6hash >> 16) ^ (v6hash >> 8);
-		return (v6hash ^ data->h_dest[5] ^ data->h_source[5]) % count;
-	}
-
-	return bond_xmit_hash_policy_l2(skb, count);
+		fk->src = iph->saddr;
+		fk->dst = iph->daddr;
+		noff += iph->ihl << 2;
+		if (!ip_is_fragment(iph))
+			proto = iph->protocol;
+	} else if (skb->protocol == htons(ETH_P_IPV6)) {
+		if (!pskb_may_pull(skb, noff + sizeof(*iph6)))
+			goto out;
+		iph6 = ipv6_hdr(skb);
+		fk->src = (__force __be32)ipv6_addr_hash(&iph6->saddr);
+		fk->dst = (__force __be32)ipv6_addr_hash(&iph6->daddr);
+		noff += sizeof(*iph6);
+		proto = iph6->nexthdr;
+	}
+	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER34) {
+		poff = proto_ports_offset(proto);
+		if (poff >= 0)
+			bond_flow_get_ports(skb, fk, poff + noff);
+	}
+	ret = true;
+
+out:
+	return ret;
 }
 
-/*
- * Hash for the output device based upon layer 3 and layer 4 data. If
- * the packet is a frag or not TCP or UDP, just use layer 3 data.  If it is
- * altogether not IP, fall back on bond_xmit_hash_policy_l2()
+/**
+ * bond_xmit_hash - generate a hash value based on the xmit policy
+ * @bond: bonding device
+ * @skb: buffer to use for headers
+ * @count: modulo value
+ *
+ * This function will extract the necessary headers from the skb buffer and use
+ * them to generate a hash based on the xmit_policy set in the bonding device
+ * which will be reduced modulo count before returning.
  */
-static int bond_xmit_hash_policy_l34(struct sk_buff *skb, int count)
+int bond_xmit_hash(struct bonding *bond, struct sk_buff *skb, int count)
 {
-	u32 layer4_xor = 0;
-	const struct iphdr *iph;
-	const struct ipv6hdr *ipv6h;
-	const __be32 *s, *d;
-	const __be16 *l4 = NULL;
-	__be16 _l4[2];
-	int noff = skb_network_offset(skb);
-	int poff;
-
-	if (skb->protocol == htons(ETH_P_IP) &&
-	    pskb_may_pull(skb, noff + sizeof(*iph))) {
-		iph = ip_hdr(skb);
-		poff = proto_ports_offset(iph->protocol);
+	struct flow_keys flow;
+	u32 hash;
 
-		if (!ip_is_fragment(iph) && poff >= 0) {
-			l4 = skb_header_pointer(skb, noff + (iph->ihl << 2) + poff,
-						sizeof(_l4), &_l4);
-			if (l4)
-				layer4_xor = ntohs(l4[0] ^ l4[1]);
-		}
-		return (layer4_xor ^
-			((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count;
-	} else if (skb->protocol == htons(ETH_P_IPV6) &&
-		   pskb_may_pull(skb, noff + sizeof(*ipv6h))) {
-		ipv6h = ipv6_hdr(skb);
-		poff = proto_ports_offset(ipv6h->nexthdr);
-		if (poff >= 0) {
-			l4 = skb_header_pointer(skb, noff + sizeof(*ipv6h) + poff,
-						sizeof(_l4), &_l4);
-			if (l4)
-				layer4_xor = ntohs(l4[0] ^ l4[1]);
-		}
-		s = &ipv6h->saddr.s6_addr32[0];
-		d = &ipv6h->daddr.s6_addr32[0];
-		layer4_xor ^= (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]);
-		layer4_xor ^= (layer4_xor >> 24) ^ (layer4_xor >> 16) ^
-			       (layer4_xor >> 8);
-		return layer4_xor % count;
-	}
+	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER2 ||
+	    !bond_flow_dissect(bond, skb, &flow))
+		return bond_eth_hash(skb) % count;
+
+	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER23 ||
+	    bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP23)
+		hash = bond_eth_hash(skb);
+	else
+		hash = (__force u32)flow.ports;
+	hash ^= (__force u32)flow.dst ^ (__force u32)flow.src;
+	hash ^= (hash >> 16);
+	hash ^= (hash >> 8);
 
-	return bond_xmit_hash_policy_l2(skb, count);
+	return hash % count;
 }
 
 /*-------------------------- Device entry points ----------------------------*/
@@ -3700,8 +3704,7 @@ static int bond_xmit_activebackup(struct sk_buff *skb, struct net_device *bond_d
 	return NETDEV_TX_OK;
 }
 
-/*
- * In bond_xmit_xor() , we determine the output device by using a pre-
+/* In bond_xmit_xor() , we determine the output device by using a pre-
  * determined xmit_hash_policy(), If the selected device is not enabled,
  * find the next active slave.
  */
@@ -3709,8 +3712,7 @@ static int bond_xmit_xor(struct sk_buff *skb, struct net_device *bond_dev)
 {
 	struct bonding *bond = netdev_priv(bond_dev);
 
-	bond_xmit_slave_id(bond, skb,
-			   bond->xmit_hash_policy(skb, bond->slave_cnt));
+	bond_xmit_slave_id(bond, skb, bond_xmit_hash(bond, skb, bond->slave_cnt));
 
 	return NETDEV_TX_OK;
 }
@@ -3746,22 +3748,6 @@ static int bond_xmit_broadcast(struct sk_buff *skb, struct net_device *bond_dev)
 
 /*------------------------- Device initialization ---------------------------*/
 
-static void bond_set_xmit_hash_policy(struct bonding *bond)
-{
-	switch (bond->params.xmit_policy) {
-	case BOND_XMIT_POLICY_LAYER23:
-		bond->xmit_hash_policy = bond_xmit_hash_policy_l23;
-		break;
-	case BOND_XMIT_POLICY_LAYER34:
-		bond->xmit_hash_policy = bond_xmit_hash_policy_l34;
-		break;
-	case BOND_XMIT_POLICY_LAYER2:
-	default:
-		bond->xmit_hash_policy = bond_xmit_hash_policy_l2;
-		break;
-	}
-}
-
 /*
  * Lookup the slave that corresponds to a qid
  */
@@ -3871,38 +3857,6 @@ static netdev_tx_t bond_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	return ret;
 }
 
-/*
- * set bond mode specific net device operations
- */
-void bond_set_mode_ops(struct bonding *bond, int mode)
-{
-	struct net_device *bond_dev = bond->dev;
-
-	switch (mode) {
-	case BOND_MODE_ROUNDROBIN:
-		break;
-	case BOND_MODE_ACTIVEBACKUP:
-		break;
-	case BOND_MODE_XOR:
-		bond_set_xmit_hash_policy(bond);
-		break;
-	case BOND_MODE_BROADCAST:
-		break;
-	case BOND_MODE_8023AD:
-		bond_set_xmit_hash_policy(bond);
-		break;
-	case BOND_MODE_ALB:
-		/* FALLTHRU */
-	case BOND_MODE_TLB:
-		break;
-	default:
-		/* Should never happen, mode already checked */
-		pr_err("%s: Error: Unknown bonding mode %d\n",
-		       bond_dev->name, mode);
-		break;
-	}
-}
-
 static int bond_ethtool_get_settings(struct net_device *bond_dev,
 				     struct ethtool_cmd *ecmd)
 {
@@ -4004,7 +3958,6 @@ static void bond_setup(struct net_device *bond_dev)
 	ether_setup(bond_dev);
 	bond_dev->netdev_ops = &bond_netdev_ops;
 	bond_dev->ethtool_ops = &bond_ethtool_ops;
-	bond_set_mode_ops(bond, bond->params.mode);
 
 	bond_dev->destructor = bond_destructor;
 
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index c29b836..dba3b9b 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -352,7 +352,6 @@ static ssize_t bonding_store_mode(struct device *d,
 	/* don't cache arp_validate between modes */
 	bond->params.arp_validate = BOND_ARP_VALIDATE_NONE;
 	bond->params.mode = new_value;
-	bond_set_mode_ops(bond, bond->params.mode);
 	pr_info("%s: setting mode to %s (%d).\n",
 		bond->dev->name, bond_mode_tbl[new_value].modename,
 		new_value);
@@ -392,7 +391,6 @@ static ssize_t bonding_store_xmit_hash(struct device *d,
 		ret = -EINVAL;
 	} else {
 		bond->params.xmit_policy = new_value;
-		bond_set_mode_ops(bond, bond->params.mode);
 		pr_info("%s: setting xmit hash policy to %s (%d).\n",
 			bond->dev->name,
 			xmit_hashtype_tbl[new_value].modename, new_value);
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index 03cf3fd..4db9ec4 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -245,7 +245,6 @@ struct bonding {
 	char     proc_file_name[IFNAMSIZ];
 #endif /* CONFIG_PROC_FS */
 	struct   list_head bond_list;
-	int      (*xmit_hash_policy)(struct sk_buff *, int);
 	u16      rr_tx_counter;
 	struct   ad_bond_info ad_info;
 	struct   alb_bond_info alb_info;
@@ -446,7 +445,7 @@ int bond_release(struct net_device *bond_dev, struct net_device *slave_dev);
 void bond_mii_monitor(struct work_struct *);
 void bond_loadbalance_arp_mon(struct work_struct *);
 void bond_activebackup_arp_mon(struct work_struct *);
-void bond_set_mode_ops(struct bonding *bond, int mode);
+int bond_xmit_hash(struct bonding *bond, struct sk_buff *skb, int count);
 int bond_parse_parm(const char *mode_arg, const struct bond_parm_tbl *tbl);
 void bond_select_active_slave(struct bonding *bond);
 void bond_change_active_slave(struct bonding *bond, struct slave *new_active);
diff --git a/include/uapi/linux/if_bonding.h b/include/uapi/linux/if_bonding.h
index a17edda..9635a62 100644
--- a/include/uapi/linux/if_bonding.h
+++ b/include/uapi/linux/if_bonding.h
@@ -91,6 +91,8 @@
 #define BOND_XMIT_POLICY_LAYER2		0 /* layer 2 (MAC only), default */
 #define BOND_XMIT_POLICY_LAYER34	1 /* layer 3+4 (IP ^ (TCP || UDP)) */
 #define BOND_XMIT_POLICY_LAYER23	2 /* layer 2+3 (IP ^ MAC) */
+#define BOND_XMIT_POLICY_ENCAP23	3 /* encapsulated layer 2+3 */
+#define BOND_XMIT_POLICY_ENCAP34	4 /* encapsulated layer 3+4 */
 
 typedef struct ifbond {
 	__s32 bond_mode;
-- 
1.8.1.4

^ permalink raw reply related

* [PATCH net-next 2/2] bonding: document the new xmit policy modes and update the changed ones
From: Nikolay Aleksandrov @ 2013-09-25 15:19 UTC (permalink / raw)
  To: netdev; +Cc: davem, andy, fubar, eric.dumazet
In-Reply-To: <1380122398-7370-1-git-send-email-nikolay@redhat.com>

Add new documentation for encap2+3 and encap3+4, also update the formula
for the old modes due to the changes.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
---
I'm not good at writing documentation so any suggestions on how to improve
this text are very welcome.

 Documentation/networking/bonding.txt | 66 ++++++++++++++++++++----------------
 1 file changed, 36 insertions(+), 30 deletions(-)

diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index 9b28e71..3856ed2 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -743,21 +743,16 @@ xmit_hash_policy
 		protocol information to generate the hash.
 
 		Uses XOR of hardware MAC addresses and IP addresses to
-		generate the hash.  The IPv4 formula is
+		generate the hash.  The formula is
 
-		(((source IP XOR dest IP) AND 0xffff) XOR
-			( source MAC XOR destination MAC ))
-				modulo slave count
+		hash = source MAC XOR destination MAC
+		hash = hash XOR source IP XOR destination IP
+		hash = hash XOR (hash RSHIFT 16)
+		hash = hash XOR (hash RSHIFT 8)
+		And then hash is reduced modulo slave count.
 
-		The IPv6 formula is
-
-		hash = (source ip quad 2 XOR dest IP quad 2) XOR
-		       (source ip quad 3 XOR dest IP quad 3) XOR
-		       (source ip quad 4 XOR dest IP quad 4)
-
-		(((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash)
-			XOR (source MAC XOR destination MAC))
-				modulo slave count
+		If the protocol is IPv6 then the source and destination
+		addresses are first hashed using ipv6_addr_hash.
 
 		This algorithm will place all traffic to a particular
 		network peer on the same slave.  For non-IP traffic,
@@ -779,21 +774,16 @@ xmit_hash_policy
 		slaves, although a single connection will not span
 		multiple slaves.
 
-		The formula for unfragmented IPv4 TCP and UDP packets is
-
-		((source port XOR dest port) XOR
-			 ((source IP XOR dest IP) AND 0xffff)
-				modulo slave count
+		The formula for unfragmented TCP and UDP packets is
 
-		The formula for unfragmented IPv6 TCP and UDP packets is
+		hash = source port, destination port (as in the header)
+		hash = hash XOR source IP XOR destination IP
+		hash = hash XOR (hash RSHIFT 16)
+		hash = hash XOR (hash RSHIFT 8)
+		And then hash is reduced modulo slave count.
 
-		hash = (source port XOR dest port) XOR
-		       ((source ip quad 2 XOR dest IP quad 2) XOR
-			(source ip quad 3 XOR dest IP quad 3) XOR
-			(source ip quad 4 XOR dest IP quad 4))
-
-		((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash)
-			modulo slave count
+		If the protocol is IPv6 then the source and destination
+		addresses are first hashed using ipv6_addr_hash.
 
 		For fragmented TCP or UDP packets and all other IPv4 and
 		IPv6 protocol traffic, the source and destination port
@@ -801,10 +791,6 @@ xmit_hash_policy
 		formula is the same as for the layer2 transmit hash
 		policy.
 
-		The IPv4 policy is intended to mimic the behavior of
-		certain switches, notably Cisco switches with PFC2 as
-		well as some Foundry and IBM products.
-
 		This algorithm is not fully 802.3ad compliant.  A
 		single TCP or UDP conversation containing both
 		fragmented and unfragmented packets will see packets
@@ -815,6 +801,26 @@ xmit_hash_policy
 		conversations.  Other implementations of 802.3ad may
 		or may not tolerate this noncompliance.
 
+	encap2+3
+
+		This policy uses the same formula as layer2+3 but it
+		relies on skb_flow_dissect to obtain the header fields
+		which might result in the use of inner headers if an
+		encapsulation protocol is used. For example this will
+		improve the performance for tunnel users because the
+		packets will be distributed according to the encapsulated
+		flows.
+
+	encap3+4
+
+		This policy uses the same formula as layer3+4 but it
+		relies on skb_flow_dissect to obtain the header fields
+		which might result in the use of inner headers if an
+		encapsulation protocol is used. For example this will
+		improve the performance for tunnel users because the
+		packets will be distributed according to the encapsulated
+		flows.
+
 	The default value is layer2.  This option was added in bonding
 	version 2.6.3.  In earlier versions of bonding, this parameter
 	does not exist, and the layer2 policy is the only policy.  The
-- 
1.8.1.4

^ permalink raw reply related

* [PATCH net-next 0/2] bonding: modify the current and add new hash functions
From: Nikolay Aleksandrov @ 2013-09-25 15:19 UTC (permalink / raw)
  To: netdev; +Cc: davem, andy, fubar, eric.dumazet

Hi all,
This is a complete remake of my old patch that modified the bonding hash
functions to use skb_flow_dissect which was suggested by Eric Dumazet.
This time around I've left the old modes although using a new hash function
again suggested by Eric, which is the same for all modes. The only
difference is the way the headers are obtained. The old modes obtain them
as before in order to address concerns about speed, but the 2 new ones use
skb_flow_dissect. The unification of the hash function allows to remove a
pointer from struct bonding and also a few extra functions that dealt with
it. Two new functions are added which take care of the hashing based on
bond->params.xmit_policy only:
bond_xmit_hash() - global function, used by XOR and 3ad modes
bond_flow_dissect() - used by bond_xmit_hash() to obtain the necessary
headers and combine them according to bond->params.xmit_policy.

Best regards,
 Nikolay Aleksandrov

Nikolay Aleksandrov (2):
  bonding: modify the old and add new xmit hash functions
  bonding: document the new xmit policy modes and update the changed
    ones

 Documentation/networking/bonding.txt |  66 ++++++-----
 drivers/net/bonding/bond_3ad.c       |   2 +-
 drivers/net/bonding/bond_main.c      | 211 ++++++++++++++---------------------
 drivers/net/bonding/bond_sysfs.c     |   2 -
 drivers/net/bonding/bonding.h        |   3 +-
 include/uapi/linux/if_bonding.h      |   2 +
 6 files changed, 122 insertions(+), 164 deletions(-)

-- 
1.8.1.4

^ permalink raw reply

* [PATCH] net: qmi_wwan: fix Cinterion PLXX product ID
From: Aleksander Morgado @ 2013-09-25 15:02 UTC (permalink / raw)
  To: netdev, linux-usb
  Cc: davem, bjorn, dcbw, Aleksander Morgado, Hans-Christoph Schemmel,
	Christian Schmiedl, Nicolaus Colberg

Cinterion PLXX LTE devices have a 0x0060 product ID, not 0x12d1.

The blacklisting in the serial/option driver does actually use the correct PID,
as per commit 8ff10bdb14a52e3f25d4ce09e0582a8684c1a6db ('USB: Blacklisted
Cinterion's PLxx WWAN Interface').

CC: Hans-Christoph Schemmel <hans-christoph.schemmel@gemalto.com>
CC: Christian Schmiedl <christian.schmiedl@gemalto.com>
CC: Nicolaus Colberg <nicolaus.colberg@gemalto.com>
Signed-off-by: Aleksander Morgado <aleksander@lanedo.com>
---
 drivers/net/usb/qmi_wwan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
index 6312332..3d6aaf7 100644
--- a/drivers/net/usb/qmi_wwan.c
+++ b/drivers/net/usb/qmi_wwan.c
@@ -714,7 +714,7 @@ static const struct usb_device_id products[] = {
 	{QMI_FIXED_INTF(0x2357, 0x0201, 4)},	/* TP-LINK HSUPA Modem MA180 */
 	{QMI_FIXED_INTF(0x2357, 0x9000, 4)},	/* TP-LINK MA260 */
 	{QMI_FIXED_INTF(0x1bc7, 0x1200, 5)},	/* Telit LE920 */
-	{QMI_FIXED_INTF(0x1e2d, 0x12d1, 4)},	/* Cinterion PLxx */
+	{QMI_FIXED_INTF(0x1e2d, 0x0060, 4)},	/* Cinterion PLxx */
 
 	/* 4. Gobi 1000 devices */
 	{QMI_GOBI1K_DEVICE(0x05c6, 0x9212)},	/* Acer Gobi Modem Device */
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH RFC] random: introduce get_random_bytes_busy_wait_initialized
From: Hannes Frederic Sowa @ 2013-09-25 13:35 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Tom Herbert, davem, netdev, jesse.brandeburg, tytso, linux-kernel
In-Reply-To: <1380110810.3165.140.camel@edumazet-glaptop>

On Wed, Sep 25, 2013 at 05:06:50AM -0700, Eric Dumazet wrote:
> On Wed, 2013-09-25 at 11:00 +0200, Hannes Frederic Sowa wrote:
> 
> >  /*
> > + * Busy loop until the nonblocking_pool is intialized and return
> > + * random data in buf of size nbytes.
> > + *
> > + * This is used by the network stack to defer the extraction of
> > + * entropy from the nonblocking_pool until the pool is initialized.
> > + *
> > + * We need to busy loop here, because we could be called from an
> > + * atomic section.
> > + */
> > +void get_random_bytes_busy_wait_initialized(void *buf, int nbytes)
> > +{
> > +	while (!nonblocking_pool.initialized)
> > +		cpu_relax();
> > +	get_random_bytes(buf, nbytes);
> > +}
> 
> No idea if this can work if called from IRQ context.
> 
> How is nonblocking_poll initialized if host has a single cpu ?

We increase the entropy_count and can initialize the random pool from
handle_irq_event.

We can document that it should not get called from irq context and
maybe add a WARN_ON(irqs_disabled())?

^ permalink raw reply

* Re: [PATCH 2/2] net: qmi_wwan: fix checkpatch warnings
From: Bjørn Mork @ 2013-09-25 13:31 UTC (permalink / raw)
  To: Fabio Porcedda
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	David S. Miller, Dan Williams
In-Reply-To: <1380100886-16531-2-git-send-email-fabio.porcedda-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

Sorry, I really don't see the point of this.  Yes, the lines are longer
than 80 columns, but breaking them don't improve the readability at
all.  On the contrary, IMHO.

So NAK from me for this part.


Bjørn

Fabio Porcedda <fabio.porcedda-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> Signed-off-by: Fabio Porcedda <fabio.porcedda-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> ---
>  drivers/net/usb/qmi_wwan.c | 56 +++++++++++++++++++++++++++++-----------------
>  1 file changed, 36 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
> index 5f6b6fa..0e59f9e 100644
> --- a/drivers/net/usb/qmi_wwan.c
> +++ b/drivers/net/usb/qmi_wwan.c
> @@ -143,16 +143,22 @@ static const struct net_device_ops qmi_wwan_netdev_ops = {
>  	.ndo_validate_addr	= eth_validate_addr,
>  };
>  
> -/* using a counter to merge subdriver requests with our own into a combined state */
> +/* using a counter to merge subdriver requests with our own into a
> + * combined state
> + */
>  static int qmi_wwan_manage_power(struct usbnet *dev, int on)
>  {
>  	struct qmi_wwan_state *info = (void *)&dev->data;
>  	int rv = 0;
>  
> -	dev_dbg(&dev->intf->dev, "%s() pmcount=%d, on=%d\n", __func__, atomic_read(&info->pmcount), on);
> +	dev_dbg(&dev->intf->dev, "%s() pmcount=%d, on=%d\n", __func__,
> +		atomic_read(&info->pmcount), on);
>  
> -	if ((on && atomic_add_return(1, &info->pmcount) == 1) || (!on && atomic_dec_and_test(&info->pmcount))) {
> -		/* need autopm_get/put here to ensure the usbcore sees the new value */
> +	if ((on && atomic_add_return(1, &info->pmcount) == 1) ||
> +	    (!on && atomic_dec_and_test(&info->pmcount))) {
> +		/* need autopm_get/put here to ensure the usbcore sees
> +		 * the new value
> +		 */
>  		rv = usb_autopm_get_interface(dev->intf);
>  		if (rv < 0)
>  			goto err;
> @@ -199,7 +205,8 @@ static int qmi_wwan_register_subdriver(struct usbnet *dev)
>  	atomic_set(&info->pmcount, 0);
>  
>  	/* register subdriver */
> -	subdriver = usb_cdc_wdm_register(info->control, &dev->status->desc, 4096, &qmi_wwan_cdc_wdm_manage_power);
> +	subdriver = usb_cdc_wdm_register(info->control, &dev->status->desc,
> +					 4096, &qmi_wwan_cdc_wdm_manage_power);
>  	if (IS_ERR(subdriver)) {
>  		dev_err(&info->control->dev, "subdriver registration failed\n");
>  		rv = PTR_ERR(subdriver);
> @@ -228,7 +235,8 @@ static int qmi_wwan_bind(struct usbnet *dev, struct usb_interface *intf)
>  	struct usb_driver *driver = driver_of(intf);
>  	struct qmi_wwan_state *info = (void *)&dev->data;
>  
> -	BUILD_BUG_ON((sizeof(((struct usbnet *)0)->data) < sizeof(struct qmi_wwan_state)));
> +	BUILD_BUG_ON((sizeof(((struct usbnet *)0)->data) <
> +		      sizeof(struct qmi_wwan_state)));
>  
>  	/* set up initial state */
>  	info->control = intf;
> @@ -250,7 +258,8 @@ static int qmi_wwan_bind(struct usbnet *dev, struct usb_interface *intf)
>  				goto err;
>  			}
>  			if (h->bLength != sizeof(struct usb_cdc_header_desc)) {
> -				dev_dbg(&intf->dev, "CDC header len %u\n", h->bLength);
> +				dev_dbg(&intf->dev, "CDC header len %u\n",
> +					h->bLength);
>  				goto err;
>  			}
>  			break;
> @@ -260,7 +269,8 @@ static int qmi_wwan_bind(struct usbnet *dev, struct usb_interface *intf)
>  				goto err;
>  			}
>  			if (h->bLength != sizeof(struct usb_cdc_union_desc)) {
> -				dev_dbg(&intf->dev, "CDC union len %u\n", h->bLength);
> +				dev_dbg(&intf->dev, "CDC union len %u\n",
> +					h->bLength);
>  				goto err;
>  			}
>  			cdc_union = (struct usb_cdc_union_desc *)buf;
> @@ -271,15 +281,15 @@ static int qmi_wwan_bind(struct usbnet *dev, struct usb_interface *intf)
>  				goto err;
>  			}
>  			if (h->bLength != sizeof(struct usb_cdc_ether_desc)) {
> -				dev_dbg(&intf->dev, "CDC ether len %u\n",  h->bLength);
> +				dev_dbg(&intf->dev, "CDC ether len %u\n",
> +					h->bLength);
>  				goto err;
>  			}
>  			cdc_ether = (struct usb_cdc_ether_desc *)buf;
>  			break;
>  		}
>  
> -		/*
> -		 * Remember which CDC functional descriptors we've seen.  Works
> +		/* Remember which CDC functional descriptors we've seen.  Works
>  		 * for all types we care about, of which USB_CDC_ETHERNET_TYPE
>  		 * (0x0f) is the highest numbered
>  		 */
> @@ -293,10 +303,14 @@ next_desc:
>  
>  	/* Use separate control and data interfaces if we found a CDC Union */
>  	if (cdc_union) {
> -		info->data = usb_ifnum_to_if(dev->udev, cdc_union->bSlaveInterface0);
> -		if (desc->bInterfaceNumber != cdc_union->bMasterInterface0 || !info->data) {
> -			dev_err(&intf->dev, "bogus CDC Union: master=%u, slave=%u\n",
> -				cdc_union->bMasterInterface0, cdc_union->bSlaveInterface0);
> +		info->data = usb_ifnum_to_if(dev->udev,
> +					     cdc_union->bSlaveInterface0);
> +		if (desc->bInterfaceNumber != cdc_union->bMasterInterface0 ||
> +		    !info->data) {
> +			dev_err(&intf->dev,
> +				"bogus CDC Union: master=%u, slave=%u\n",
> +				cdc_union->bMasterInterface0,
> +				cdc_union->bSlaveInterface0);
>  			goto err;
>  		}
>  	}
> @@ -374,8 +388,7 @@ static int qmi_wwan_suspend(struct usb_interface *intf, pm_message_t message)
>  	struct qmi_wwan_state *info = (void *)&dev->data;
>  	int ret;
>  
> -	/*
> -	 * Both usbnet_suspend() and subdriver->suspend() MUST return 0
> +	/* Both usbnet_suspend() and subdriver->suspend() MUST return 0
>  	 * in system sleep context, otherwise, the resume callback has
>  	 * to recover device from previous suspend failure.
>  	 */
> @@ -383,7 +396,8 @@ static int qmi_wwan_suspend(struct usb_interface *intf, pm_message_t message)
>  	if (ret < 0)
>  		goto err;
>  
> -	if (intf == info->control && info->subdriver && info->subdriver->suspend)
> +	if (intf == info->control && info->subdriver &&
> +	    info->subdriver->suspend)
>  		ret = info->subdriver->suspend(intf, message);
>  	if (ret < 0)
>  		usbnet_resume(intf);
> @@ -396,7 +410,8 @@ static int qmi_wwan_resume(struct usb_interface *intf)
>  	struct usbnet *dev = usb_get_intfdata(intf);
>  	struct qmi_wwan_state *info = (void *)&dev->data;
>  	int ret = 0;
> -	bool callsub = (intf == info->control && info->subdriver && info->subdriver->resume);
> +	bool callsub = (intf == info->control && info->subdriver &&
> +			info->subdriver->resume);
>  
>  	if (callsub)
>  		ret = info->subdriver->resume(intf);
> @@ -777,7 +792,8 @@ static const struct usb_device_id products[] = {
>  };
>  MODULE_DEVICE_TABLE(usb, products);
>  
> -static int qmi_wwan_probe(struct usb_interface *intf, const struct usb_device_id *prod)
> +static int qmi_wwan_probe(struct usb_interface *intf,
> +			  const struct usb_device_id *prod)
>  {
>  	struct usb_device_id *id = (struct usb_device_id *)prod;
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net 1/2] ip_tunnel: Fix a memory corruption in ip_tunnel_xmit
From: Eric Dumazet @ 2013-09-25 12:44 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: David Miller, netdev
In-Reply-To: <20130925121337.GZ7660@secunet.com>

On Wed, 2013-09-25 at 14:13 +0200, Steffen Klassert wrote:
> It was commit c544193214 (GRE: Refactor GRE tunneling code.)
> when net/ipv4/ip_tunnel.c was created.

Yeah. This one begins to be very upsetting.

> 
> > 
> > (commit id and title in your changelog would be really nice)
> > 
> 
> I can send a v2 with these informations included if you want
> that.

I think that patches coming from experimented kernel developers
should always include a study of bug origin.

Otherwise, both maintainer and stable teams have to figure it, and
they sometime lack the time or context.

Plus it's good to CC patch author and reviewers so that he can learn of
its mistakes, and Ack your work as well.

Thanks

^ permalink raw reply

* Re: [PATCH net 1/2] ip_tunnel: Fix a memory corruption in ip_tunnel_xmit
From: Steffen Klassert @ 2013-09-25 12:13 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1380110163.3165.138.camel@edumazet-glaptop>

On Wed, Sep 25, 2013 at 04:56:03AM -0700, Eric Dumazet wrote:
> On Wed, 2013-09-25 at 07:54 +0200, Steffen Klassert wrote:
> > We might extend the used aera of a skb beyond the total
> > headroom when we install the ipip header. Fix this by
> > calling skb_cow_head() unconditionally.
> > 
> > Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
> > ---
> >  net/ipv4/ip_tunnel.c |   12 ++++++------
> >  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> Do you know or can we find when was the bug introduced ?

It was commit c544193214 (GRE: Refactor GRE tunneling code.)
when net/ipv4/ip_tunnel.c was created.

> 
> (commit id and title in your changelog would be really nice)
> 

I can send a v2 with these informations included if you want
that.

^ permalink raw reply

* Re: [PATCH RFC] random: introduce get_random_bytes_busy_wait_initialized
From: Eric Dumazet @ 2013-09-25 12:06 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Tom Herbert, davem, netdev, jesse.brandeburg, tytso, linux-kernel
In-Reply-To: <20130925090034.GC4904@order.stressinduktion.org>

On Wed, 2013-09-25 at 11:00 +0200, Hannes Frederic Sowa wrote:

>  /*
> + * Busy loop until the nonblocking_pool is intialized and return
> + * random data in buf of size nbytes.
> + *
> + * This is used by the network stack to defer the extraction of
> + * entropy from the nonblocking_pool until the pool is initialized.
> + *
> + * We need to busy loop here, because we could be called from an
> + * atomic section.
> + */
> +void get_random_bytes_busy_wait_initialized(void *buf, int nbytes)
> +{
> +	while (!nonblocking_pool.initialized)
> +		cpu_relax();
> +	get_random_bytes(buf, nbytes);
> +}

No idea if this can work if called from IRQ context.

How is nonblocking_poll initialized if host has a single cpu ?

^ permalink raw reply

* Re: [PATCH net 1/2] ip_tunnel: Fix a memory corruption in ip_tunnel_xmit
From: Eric Dumazet @ 2013-09-25 11:56 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: David Miller, netdev
In-Reply-To: <20130925055418.GV7660@secunet.com>

On Wed, 2013-09-25 at 07:54 +0200, Steffen Klassert wrote:
> We might extend the used aera of a skb beyond the total
> headroom when we install the ipip header. Fix this by
> calling skb_cow_head() unconditionally.
> 
> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
> ---
>  net/ipv4/ip_tunnel.c |   12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)

Do you know or can we find when was the bug introduced ?

(commit id and title in your changelog would be really nice)

Thanks !

^ permalink raw reply

* Re: [PATCH net-next v5 1/6] bonding: simplify and use RCU protection for 3ad xmit path
From: Veaceslav Falico @ 2013-09-25 10:50 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: Jay Vosburgh, Andy Gospodarek, David S. Miller,
	Nikolay Aleksandrov, Netdev
In-Reply-To: <5242B24F.5000309@huawei.com>

On Wed, Sep 25, 2013 at 05:52:15PM +0800, Ding Tianhong wrote:
>The commit 278b20837511776dc9d5f6ee1c7fabd5479838bb
>(bonding: initial RCU conversion) has convert the roundrobin, active-backup,
>broadcast and xor xmit path to rcu protection, the performance will be better
>for these mode, so this time, convert xmit path for 3ad mode.
>
>Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com>
>Suggested-by: Veaceslav Falico <vfalico@redhat.com>
>Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
>Signed-off-by: Wang Yufen <wangyufen@huawei.com>
>Cc: Nikolay Aleksandrov <nikolay@redhat.com>
>Cc: Veaceslav Falico <vfalico@redhat.com>
>---
> drivers/net/bonding/bond_3ad.c | 32 ++++++++++++++------------------
> drivers/net/bonding/bonding.h  | 30 +++++++++++++++++++++++++++++-
> 2 files changed, 43 insertions(+), 19 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>index 0d8f427..13f1deb 100644
>--- a/drivers/net/bonding/bond_3ad.c
>+++ b/drivers/net/bonding/bond_3ad.c
>@@ -143,7 +143,7 @@ static inline struct bonding *__get_bond_by_port(struct port *port)
>  */
> static inline struct port *__get_first_port(struct bonding *bond)
> {
>-	struct slave *first_slave = bond_first_slave(bond);
>+	struct slave *first_slave = bond_first_slave_rcu(bond);
>
> 	return first_slave ? &(SLAVE_AD_INFO(first_slave).port) : NULL;
> }
>@@ -163,7 +163,7 @@ static inline struct port *__get_next_port(struct port *port)
> 	// If there's no bond for this port, or this is the last slave
> 	if (bond == NULL)
> 		return NULL;
>-	slave_next = bond_next_slave(bond, slave);
>+	slave_next = bond_next_slave_rcu(bond, slave);
> 	if (!slave_next || bond_is_first_slave(bond, slave_next))
> 		return NULL;
>
>@@ -2417,16 +2417,14 @@ int bond_3ad_get_active_agg_info(struct bonding *bond, struct ad_info *ad_info)
>
> int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev)
> {
>-	struct slave *slave, *start_at;
> 	struct bonding *bond = netdev_priv(dev);
>+	struct slave *slave;
> 	int slave_agg_no;
> 	int slaves_in_agg;
> 	int agg_id;
>-	int i;
> 	struct ad_info ad_info;
> 	int res = 1;
>
>-	read_lock(&bond->lock);
> 	if (__bond_3ad_get_active_agg_info(bond, &ad_info)) {
> 		pr_debug("%s: Error: __bond_3ad_get_active_agg_info failed\n",
> 			 dev->name);
>@@ -2444,13 +2442,17 @@ int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev)
>
> 	slave_agg_no = bond->xmit_hash_policy(skb, slaves_in_agg);
>
>-	bond_for_each_slave(bond, slave) {
>+	bond_for_each_slave_rcu(bond, slave) {
> 		struct aggregator *agg = SLAVE_AD_INFO(slave).port.aggregator;
>
> 		if (agg && (agg->aggregator_identifier == agg_id)) {
>-			slave_agg_no--;
>-			if (slave_agg_no < 0)
>-				break;
>+			if (--slave_agg_no < 0) {
>+				if (SLAVE_IS_OK(slave)) {
>+					res = bond_dev_queue_xmit(bond,
>+						skb, slave->dev);
>+					goto out;
>+				}
>+			}
> 		}
> 	}
>
>@@ -2460,23 +2462,17 @@ int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev)
> 		goto out;
> 	}
>
>-	start_at = slave;
>-
>-	bond_for_each_slave_from(bond, slave, i, start_at) {
>-		int slave_agg_id = 0;
>+	bond_for_each_slave_rcu(bond, slave) {
> 		struct aggregator *agg = SLAVE_AD_INFO(slave).port.aggregator;
>
>-		if (agg)
>-			slave_agg_id = agg->aggregator_identifier;
>-
>-		if (SLAVE_IS_OK(slave) && agg && (slave_agg_id == agg_id)) {
>+		if (SLAVE_IS_OK(slave) && agg &&
>+			agg->aggregator_identifier == agg_id) {
> 			res = bond_dev_queue_xmit(bond, skb, slave->dev);
> 			break;
> 		}
> 	}
>
> out:
>-	read_unlock(&bond->lock);
> 	if (res) {
> 		/* no suitable interface, frame not sent */
> 		kfree_skb(skb);
>diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
>index 03cf3fd..eb36f57 100644
>--- a/drivers/net/bonding/bonding.h
>+++ b/drivers/net/bonding/bonding.h
>@@ -74,13 +74,31 @@
> /* slave list primitives */
> #define bond_to_slave(ptr) list_entry(ptr, struct slave, list)
>
>+/* slave list primitives, Caller must hold rcu_read_lock */
>+#define bond_to_slave_rcu(ptr) list_entry_rcu(ptr, struct slave, list)
>+
>+/* bond_is_empty return NULL if slave list is empty*/
>+#define bond_is_empty(bond) \
>+	(list_empty(&(bond)->slave_list))
>+
>+/* bond_is_empty_rcu return NULL if slave list is empty*/
>+#define bond_is_empty_rcu(bond) \
>+	(!list_first_or_null_rcu(&(bond)->slave_list, struct slave, list))

Useless/dangerous function. It's not used in this patch and there can be no
constructs like:

1 if (!bond_is_empty_rcu(bond)) {
2 	slave = bond_first_slave_rcu(bond);
3 	do_something(slave);

because between 1) and 2) the slave can go away, and we'll end up with
slave == NULL.

>+
> /* IMPORTANT: bond_first/last_slave can return NULL in case of an empty list */
> #define bond_first_slave(bond) \
> 	list_first_entry_or_null(&(bond)->slave_list, struct slave, list)
> #define bond_last_slave(bond) \
>-	(list_empty(&(bond)->slave_list) ? NULL : \
>+	(bond_is_empty(bond) ? NULL : \
> 					   bond_to_slave((bond)->slave_list.prev))
>
>+/**
>+ * IMPORTANT: bond_first/last_slave_rcu can return NULL in case of an empty list
>+ * Caller must hold rcu_read_lock
>+ */
>+#define bond_first_slave_rcu(bond) \
>+	list_first_or_null_rcu(&(bond)->slave_list, struct slave, list);

bond_first_slave_or_null_rcu() ?

>+
> #define bond_is_first_slave(bond, pos) ((pos)->list.prev == &(bond)->slave_list)
> #define bond_is_last_slave(bond, pos) ((pos)->list.next == &(bond)->slave_list)
>
>@@ -93,6 +111,16 @@
> 	(bond_is_first_slave(bond, pos) ? bond_last_slave(bond) : \
> 					  bond_to_slave((pos)->list.prev))
>
>+/* Since bond_first/last_slave_rcu can return NULL, these can return NULL too */
>+#define bond_next_slave_rcu(bond, pos) \
>+	({struct list_head *__slave_list = &(bond)->slave_list; \
>+	 struct list_head __rcu *__next = list_next_rcu(__slave_list); \
>+	 struct list_head __rcu *__pos_next = list_next_rcu(&(pos)->list); \
>+	 likely(__pos_next != __slave_list) ? \
>+	 container_of(__pos_next, struct slave, list) : \
>+	 container_of(__next, struct slave, list); \
>+	 })

And if bond has no slaves, whilst pos being the slave that has just
detached? Then bond->slave_list->next == pos->list->next ==
&bond->slave_list, and we'll return the container_of(bond->slave_list),
which is garbage, but not NULL.

>+
> /**
>  * bond_for_each_slave_from - iterate the slaves list from a starting point
>  * @bond:	the bond holding this list.
>-- 
>1.8.0
>
>
>
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net-next v5 2/6] bonding: remove the no effect lock for bond_3ad_lacpdu_recv()
From: Veaceslav Falico @ 2013-09-25 10:33 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: Jay Vosburgh, Andy Gospodarek, David S. Miller,
	Nikolay Aleksandrov, Netdev
In-Reply-To: <5242B253.2070002@huawei.com>

On Wed, Sep 25, 2013 at 05:52:19PM +0800, Ding Tianhong wrote:
>There is no pointer needed read lock protection, remove the unnecessary lock
>and improve performance for the 3ad recv path.

I don't really understand it. Here's the code path:

rx_handler (holding rcu_read_lock()) -> bond_handle_frame() ->
bond->recv_probe -> bond_3ad_lacpdu_recv(). So we're holding only the
rcu_read_lock() there. What stops us from racing with
bond_3ad_unbind_slave(), for example?

As in:

CPU0				CPU1
--------			-----------
...				bond_3ad_unbind_slave()
bond_3ad_rx_indication()	...
if (!port->slave) {		...			//slave is ok
				port->slave = NULL;
ad_marker_info_received()	...
ad_marker_send()		...
slave = port->slave;		...
skb->dev = slave->dev;		...
	   ^^^ NULL pointer dereference.


I'm not saying that this approach is wrong, maybe I'm missing something,
but when removing locks it's usually a good thing to do - to comment it in
depth in the commit message why it's not already needed.

>
>Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
>Cc: Nikolay Aleksandrov <nikolay@redhat.com>
>---
> drivers/net/bonding/bond_3ad.c | 2 --
> 1 file changed, 2 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>index 7a3860f..c134f43 100644
>--- a/drivers/net/bonding/bond_3ad.c
>+++ b/drivers/net/bonding/bond_3ad.c
>@@ -2494,9 +2494,7 @@ int bond_3ad_lacpdu_recv(const struct sk_buff *skb, struct bonding *bond,
> 	if (!lacpdu)
> 		return ret;
>
>-	read_lock(&bond->lock);
> 	ret = bond_3ad_rx_indication(lacpdu, slave, skb->len);
>-	read_unlock(&bond->lock);
> 	return ret;
> }
>
>-- 
>1.8.2.1
>
>
>
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net-next v5 5/6] bonding: remove the counter and simplify bond_for_each_slave_from()
From: Veaceslav Falico @ 2013-09-25 10:10 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: Jay Vosburgh, Andy Gospodarek, David S. Miller,
	Nikolay Aleksandrov, Netdev
In-Reply-To: <5242B25F.7020505@huawei.com>

On Wed, Sep 25, 2013 at 05:52:31PM +0800, Ding Tianhong wrote:
>Restructure the bond_for_each_slave_from(),
>remove the checking for bond->slave_cnt,
>make the new loop be more simple and racy.

The whole bond_for_each_slave_from() should be removed and replaced with a
standard bond_for_each_slave() - which is already in the process:

http://patchwork.ozlabs.org/patch/277701/
http://patchwork.ozlabs.org/patch/277717/
http://patchwork.ozlabs.org/patch/277716/
http://patchwork.ozlabs.org/patch/277702/
http://patchwork.ozlabs.org/patch/277703/

After that we're ready to use the standard and less painful
list_for_each_entry() analogue - bond_for_each_slave() - which can
afterwards be easily RCUified by just replacing bond_for_each_slave() with
bond_for_each_slave_rcu().

>
>Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com>
>Suggested-by: Veaceslav Falico <vfalico@redhat.com>
>Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
>Cc: Nikolay Aleksandrov <nikolay@redhat.com>
>Cc: Veaceslav Falico <vfalico@redhat.com>
>---
> drivers/net/bonding/bond_alb.c  | 3 +--
> drivers/net/bonding/bond_main.c | 6 ++----
> drivers/net/bonding/bonding.h   | 6 +++---
> 3 files changed, 6 insertions(+), 9 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
>index f428ef57..4813dc6 100644
>--- a/drivers/net/bonding/bond_alb.c
>+++ b/drivers/net/bonding/bond_alb.c
>@@ -383,7 +383,6 @@ static struct slave *rlb_next_rx_slave(struct bonding *bond)
> {
> 	struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
> 	struct slave *rx_slave, *slave, *start_at;
>-	int i = 0;
>
> 	if (bond_info->next_rx_slave)
> 		start_at = bond_info->next_rx_slave;
>@@ -392,7 +391,7 @@ static struct slave *rlb_next_rx_slave(struct bonding *bond)
>
> 	rx_slave = NULL;
>
>-	bond_for_each_slave_from(bond, slave, i, start_at) {
>+	bond_for_each_slave_from(bond, slave, start_at) {
> 		if (SLAVE_IS_OK(slave)) {
> 			if (!rx_slave) {
> 				rx_slave = slave;
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 55bbb8b..fbe25bc 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -782,7 +782,6 @@ static struct slave *bond_find_best_slave(struct bonding *bond)
> 	struct slave *new_active, *old_active;
> 	struct slave *bestslave = NULL;
> 	int mintime = bond->params.updelay;
>-	int i;
>
> 	new_active = bond->curr_active_slave;
>
>@@ -801,7 +800,7 @@ static struct slave *bond_find_best_slave(struct bonding *bond)
> 	/* remember where to stop iterating over the slaves */
> 	old_active = new_active;
>
>-	bond_for_each_slave_from(bond, new_active, i, old_active) {
>+	bond_for_each_slave_from(bond, new_active, old_active) {
> 		if (new_active->link == BOND_LINK_UP) {
> 			return new_active;
> 		} else if (new_active->link == BOND_LINK_BACK &&
>@@ -2756,7 +2755,6 @@ do_failover:
> static void bond_ab_arp_probe(struct bonding *bond)
> {
> 	struct slave *slave, *next_slave;
>-	int i;
>
> 	read_lock(&bond->curr_slave_lock);
>
>@@ -2788,7 +2786,7 @@ static void bond_ab_arp_probe(struct bonding *bond)
>
> 	/* search for next candidate */
> 	next_slave = bond_next_slave(bond, bond->current_arp_slave);
>-	bond_for_each_slave_from(bond, slave, i, next_slave) {
>+	bond_for_each_slave_from(bond, slave, next_slave) {
> 		if (IS_UP(slave->dev)) {
> 			slave->link = BOND_LINK_BACK;
> 			bond_set_slave_active_flags(slave);
>diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
>index eb36f57..e5734ad 100644
>--- a/drivers/net/bonding/bonding.h
>+++ b/drivers/net/bonding/bonding.h
>@@ -130,9 +130,9 @@
>  *
>  * Caller must hold bond->lock
>  */
>-#define bond_for_each_slave_from(bond, pos, cnt, start) \
>-	for (cnt = 0, pos = start; pos && cnt < (bond)->slave_cnt; \
>-	     cnt++, pos = bond_next_slave(bond, pos))
>+#define bond_for_each_slave_from(bond, pos, start) \
>+	for (pos = start; pos; (pos = bond_next_slave(bond, pos)) != start ? \
>+	    (pos) : (pos = NULL))
>
> /**
>  * bond_for_each_slave - iterate over all slaves
>-- 
>1.8.0
>
>
>
>
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH net-next v3 1/2] dev: update __dev_notify_flags() to send rtnl msg
From: Nicolas Dichtel @ 2013-09-25 10:02 UTC (permalink / raw)
  To: stephen; +Cc: netdev, davem, Nicolas Dichtel
In-Reply-To: <20130924094707.41641042@nehalam.linuxnetplumber.net>

This patch only prepares the next one, there is no functional change.
Now, __dev_notify_flags() can also be used to notify flags changes via
rtnetlink.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---

v3: add patch #1
    use __dev_notify_flags() to notify rxflags (note that notifier chains are
    not called when flags IFF_PROMISC and IFF_ALLMULTI are changed)

v2: rework the patch to avoid double notification

 include/linux/netdevice.h |  4 +++-
 net/core/dev.c            | 11 ++++++-----
 net/core/rtnetlink.c      |  3 +--
 3 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 3de49aca4519..4466db0cb673 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2368,7 +2368,9 @@ extern int		dev_ethtool(struct net *net, struct ifreq *);
 extern unsigned int	dev_get_flags(const struct net_device *);
 extern int		__dev_change_flags(struct net_device *, unsigned int flags);
 extern int		dev_change_flags(struct net_device *, unsigned int);
-extern void		__dev_notify_flags(struct net_device *, unsigned int old_flags);
+void			__dev_notify_flags(struct net_device *,
+					   unsigned int old_flags,
+					   unsigned int gchanges);
 extern int		dev_change_name(struct net_device *, const char *);
 extern int		dev_set_alias(struct net_device *, const char *, size_t);
 extern int		dev_change_net_namespace(struct net_device *,
diff --git a/net/core/dev.c b/net/core/dev.c
index 5c713f2239cc..1222309feeaa 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5069,10 +5069,14 @@ int __dev_change_flags(struct net_device *dev, unsigned int flags)
 	return ret;
 }
 
-void __dev_notify_flags(struct net_device *dev, unsigned int old_flags)
+void __dev_notify_flags(struct net_device *dev, unsigned int old_flags,
+			unsigned int gchanges)
 {
 	unsigned int changes = dev->flags ^ old_flags;
 
+	if (gchanges)
+		rtmsg_ifinfo(RTM_NEWLINK, dev, gchanges);
+
 	if (changes & IFF_UP) {
 		if (dev->flags & IFF_UP)
 			call_netdevice_notifiers(NETDEV_UP, dev);
@@ -5108,10 +5112,7 @@ int dev_change_flags(struct net_device *dev, unsigned int flags)
 		return ret;
 
 	changes = old_flags ^ dev->flags;
-	if (changes)
-		rtmsg_ifinfo(RTM_NEWLINK, dev, changes);
-
-	__dev_notify_flags(dev, old_flags);
+	__dev_notify_flags(dev, old_flags, changes);
 	return ret;
 }
 EXPORT_SYMBOL(dev_change_flags);
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 2a0e21de3060..4aedf03da052 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1647,9 +1647,8 @@ int rtnl_configure_link(struct net_device *dev, const struct ifinfomsg *ifm)
 	}
 
 	dev->rtnl_link_state = RTNL_LINK_INITIALIZED;
-	rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U);
 
-	__dev_notify_flags(dev, old_flags);
+	__dev_notify_flags(dev, old_flags, ~0U);
 	return 0;
 }
 EXPORT_SYMBOL(rtnl_configure_link);
-- 
1.8.2.1

^ permalink raw reply related

* [PATCH net-next v3 2/2] dev: always advertise rx_flags changes via netlink
From: Nicolas Dichtel @ 2013-09-25 10:02 UTC (permalink / raw)
  To: stephen; +Cc: netdev, davem, Nicolas Dichtel
In-Reply-To: <1380103365-5185-1-git-send-email-nicolas.dichtel@6wind.com>

When flags IFF_PROMISC and IFF_ALLMULTI are changed, netlink messages are not
consistent. For example, if a multicast daemon is running (flag IFF_ALLMULTI
set in dev->flags but not dev->gflags, ie not exported to userspace) and then a
user sets it via netlink (flag IFF_ALLMULTI set in dev->flags and dev->gflags, ie
exported to userspace), no netlink message is sent.
Same for IFF_PROMISC and because dev->promiscuity is exported via
IFLA_PROMISCUITY, we may send a netlink message after each change of this
counter.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---

v3: add patch #1
    use __dev_notify_flags() to notify rxflags (note that notifier chains are
    not called when flags IFF_PROMISC and IFF_ALLMULTI are changed)

v2: rework the patch to avoid double notification

 net/core/dev.c | 60 ++++++++++++++++++++++++++++++++++++----------------------
 1 file changed, 37 insertions(+), 23 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 1222309feeaa..b23aefcb3c42 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4822,7 +4822,7 @@ static void dev_change_rx_flags(struct net_device *dev, int flags)
 		ops->ndo_change_rx_flags(dev, flags);
 }
 
-static int __dev_set_promiscuity(struct net_device *dev, int inc)
+static int __dev_set_promiscuity(struct net_device *dev, int inc, bool notify)
 {
 	unsigned int old_flags = dev->flags;
 	kuid_t uid;
@@ -4865,6 +4865,8 @@ static int __dev_set_promiscuity(struct net_device *dev, int inc)
 
 		dev_change_rx_flags(dev, IFF_PROMISC);
 	}
+	if (notify)
+		__dev_notify_flags(dev, old_flags, IFF_PROMISC);
 	return 0;
 }
 
@@ -4884,7 +4886,7 @@ int dev_set_promiscuity(struct net_device *dev, int inc)
 	unsigned int old_flags = dev->flags;
 	int err;
 
-	err = __dev_set_promiscuity(dev, inc);
+	err = __dev_set_promiscuity(dev, inc, true);
 	if (err < 0)
 		return err;
 	if (dev->flags != old_flags)
@@ -4893,22 +4895,9 @@ int dev_set_promiscuity(struct net_device *dev, int inc)
 }
 EXPORT_SYMBOL(dev_set_promiscuity);
 
-/**
- *	dev_set_allmulti	- update allmulti count on a device
- *	@dev: device
- *	@inc: modifier
- *
- *	Add or remove reception of all multicast frames to a device. While the
- *	count in the device remains above zero the interface remains listening
- *	to all interfaces. Once it hits zero the device reverts back to normal
- *	filtering operation. A negative @inc value is used to drop the counter
- *	when releasing a resource needing all multicasts.
- *	Return 0 if successful or a negative errno code on error.
- */
-
-int dev_set_allmulti(struct net_device *dev, int inc)
+static int __dev_set_allmulti(struct net_device *dev, int inc, bool notify)
 {
-	unsigned int old_flags = dev->flags;
+	unsigned int old_flags = dev->flags, old_gflags = dev->gflags;
 
 	ASSERT_RTNL();
 
@@ -4931,9 +4920,30 @@ int dev_set_allmulti(struct net_device *dev, int inc)
 	if (dev->flags ^ old_flags) {
 		dev_change_rx_flags(dev, IFF_ALLMULTI);
 		dev_set_rx_mode(dev);
+		if (notify)
+			__dev_notify_flags(dev, old_flags,
+					   dev->gflags ^ old_gflags);
 	}
 	return 0;
 }
+
+/**
+ *	dev_set_allmulti	- update allmulti count on a device
+ *	@dev: device
+ *	@inc: modifier
+ *
+ *	Add or remove reception of all multicast frames to a device. While the
+ *	count in the device remains above zero the interface remains listening
+ *	to all interfaces. Once it hits zero the device reverts back to normal
+ *	filtering operation. A negative @inc value is used to drop the counter
+ *	when releasing a resource needing all multicasts.
+ *	Return 0 if successful or a negative errno code on error.
+ */
+
+int dev_set_allmulti(struct net_device *dev, int inc)
+{
+	return __dev_set_allmulti(dev, inc, true);
+}
 EXPORT_SYMBOL(dev_set_allmulti);
 
 /*
@@ -4958,10 +4968,10 @@ void __dev_set_rx_mode(struct net_device *dev)
 		 * therefore calling __dev_set_promiscuity here is safe.
 		 */
 		if (!netdev_uc_empty(dev) && !dev->uc_promisc) {
-			__dev_set_promiscuity(dev, 1);
+			__dev_set_promiscuity(dev, 1, false);
 			dev->uc_promisc = true;
 		} else if (netdev_uc_empty(dev) && dev->uc_promisc) {
-			__dev_set_promiscuity(dev, -1);
+			__dev_set_promiscuity(dev, -1, false);
 			dev->uc_promisc = false;
 		}
 	}
@@ -5050,9 +5060,13 @@ int __dev_change_flags(struct net_device *dev, unsigned int flags)
 
 	if ((flags ^ dev->gflags) & IFF_PROMISC) {
 		int inc = (flags & IFF_PROMISC) ? 1 : -1;
+		unsigned int old_flags = dev->flags;
 
 		dev->gflags ^= IFF_PROMISC;
-		dev_set_promiscuity(dev, inc);
+
+		if (__dev_set_promiscuity(dev, inc, false) >= 0)
+			if (dev->flags != old_flags)
+				dev_set_rx_mode(dev);
 	}
 
 	/* NOTE: order of synchronization of IFF_PROMISC and IFF_ALLMULTI
@@ -5063,7 +5077,7 @@ int __dev_change_flags(struct net_device *dev, unsigned int flags)
 		int inc = (flags & IFF_ALLMULTI) ? 1 : -1;
 
 		dev->gflags ^= IFF_ALLMULTI;
-		dev_set_allmulti(dev, inc);
+		__dev_set_allmulti(dev, inc, false);
 	}
 
 	return ret;
@@ -5105,13 +5119,13 @@ void __dev_notify_flags(struct net_device *dev, unsigned int old_flags,
 int dev_change_flags(struct net_device *dev, unsigned int flags)
 {
 	int ret;
-	unsigned int changes, old_flags = dev->flags;
+	unsigned int changes, old_flags = dev->flags, old_gflags = dev->gflags;
 
 	ret = __dev_change_flags(dev, flags);
 	if (ret < 0)
 		return ret;
 
-	changes = old_flags ^ dev->flags;
+	changes = (old_flags ^ dev->flags) | (old_gflags ^ dev->gflags);
 	__dev_notify_flags(dev, old_flags, changes);
 	return ret;
 }
-- 
1.8.2.1

^ permalink raw reply related

* [PATCH net v5 1/1] xen-netback: Handle backend state transitions in a more robust way
From: Paul Durrant @ 2013-09-25  9:59 UTC (permalink / raw)
  To: xen-devel, netdev; +Cc: Paul Durrant, Ian Campbell, Wei Liu, David Vrabel
In-Reply-To: <1380103168-12067-1-git-send-email-paul.durrant@citrix.com>

When the frontend state changes netback now specifies its desired state to
a new function, set_backend_state(), which transitions through any
necessary intermediate states.
This fixes an issue observed with some old Windows frontend drivers where
they failed to transition through the Closing state and netback would not
behave correctly.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
---
 drivers/net/xen-netback/xenbus.c |  143 ++++++++++++++++++++++++++++++--------
 1 file changed, 113 insertions(+), 30 deletions(-)

diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index a53782e..01329b2 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -24,6 +24,7 @@
 struct backend_info {
 	struct xenbus_device *dev;
 	struct xenvif *vif;
+	enum xenbus_state state;
 	enum xenbus_state frontend_state;
 	struct xenbus_watch hotplug_status_watch;
 	u8 have_hotplug_status_watch:1;
@@ -136,6 +137,8 @@ static int netback_probe(struct xenbus_device *dev,
 	if (err)
 		goto fail;
 
+	be->state = XenbusStateInitWait;
+
 	/* This kicks hotplug scripts, so do it immediately. */
 	backend_create_xenvif(be);
 
@@ -208,24 +211,113 @@ static void backend_create_xenvif(struct backend_info *be)
 	kobject_uevent(&dev->dev.kobj, KOBJ_ONLINE);
 }
 
-
-static void disconnect_backend(struct xenbus_device *dev)
+static void backend_disconnect(struct backend_info *be)
 {
-	struct backend_info *be = dev_get_drvdata(&dev->dev);
-
 	if (be->vif)
 		xenvif_disconnect(be->vif);
 }
 
-static void destroy_backend(struct xenbus_device *dev)
+static void backend_connect(struct backend_info *be)
 {
-	struct backend_info *be = dev_get_drvdata(&dev->dev);
+	if (be->vif)
+		connect(be);
+}
 
-	if (be->vif) {
-		kobject_uevent(&dev->dev.kobj, KOBJ_OFFLINE);
-		xenbus_rm(XBT_NIL, dev->nodename, "hotplug-status");
-		xenvif_free(be->vif);
-		be->vif = NULL;
+static inline void backend_switch_state(struct backend_info *be,
+					enum xenbus_state state)
+{
+	struct xenbus_device *dev = be->dev;
+
+	pr_debug("%s -> %s\n", dev->nodename, xenbus_strstate(state));
+	be->state = state;
+
+	/* If we are waiting for a hotplug script then defer the
+	 * actual xenbus state change.
+	 */
+	if (!be->have_hotplug_status_watch)
+		xenbus_switch_state(dev, state);
+}
+
+/* Handle backend state transitions:
+ *
+ * The backend state starts in InitWait and the following transitions are
+ * allowed.
+ *
+ * InitWait -> Connected
+ *
+ *    ^    \         |
+ *    |     \        |
+ *    |      \       |
+ *    |       \      |
+ *    |        \     |
+ *    |         \    |
+ *    |          V   V
+ *
+ *  Closed  <-> Closing
+ *
+ * The state argument specifies the eventual state of the backend and the
+ * function transitions to that state via the shortest path.
+ */
+static void set_backend_state(struct backend_info *be,
+			      enum xenbus_state state)
+{
+	while (be->state != state) {
+		switch (be->state) {
+		case XenbusStateClosed:
+			switch (state) {
+			case XenbusStateInitWait:
+			case XenbusStateConnected:
+				pr_info("%s: prepare for reconnect\n",
+					be->dev->nodename);
+				backend_switch_state(be, XenbusStateInitWait);
+				break;
+			case XenbusStateClosing:
+				backend_switch_state(be, XenbusStateClosing);
+				break;
+			default:
+				BUG();
+			}
+			break;
+		case XenbusStateInitWait:
+			switch (state) {
+			case XenbusStateConnected:
+				backend_connect(be);
+				backend_switch_state(be, XenbusStateConnected);
+				break;
+			case XenbusStateClosing:
+			case XenbusStateClosed:
+				backend_switch_state(be, XenbusStateClosing);
+				break;
+			default:
+				BUG();
+			}
+			break;
+		case XenbusStateConnected:
+			switch (state) {
+			case XenbusStateInitWait:
+			case XenbusStateClosing:
+			case XenbusStateClosed:
+				backend_disconnect(be);
+				backend_switch_state(be, XenbusStateClosing);
+				break;
+			default:
+				BUG();
+			}
+			break;
+		case XenbusStateClosing:
+			switch (state) {
+			case XenbusStateInitWait:
+			case XenbusStateConnected:
+			case XenbusStateClosed:
+				backend_switch_state(be, XenbusStateClosed);
+				break;
+			default:
+				BUG();
+			}
+			break;
+		default:
+			BUG();
+		}
 	}
 }
 
@@ -237,40 +329,33 @@ static void frontend_changed(struct xenbus_device *dev,
 {
 	struct backend_info *be = dev_get_drvdata(&dev->dev);
 
-	pr_debug("frontend state %s\n", xenbus_strstate(frontend_state));
+	pr_debug("%s -> %s\n", dev->otherend, xenbus_strstate(frontend_state));
 
 	be->frontend_state = frontend_state;
 
 	switch (frontend_state) {
 	case XenbusStateInitialising:
-		if (dev->state == XenbusStateClosed) {
-			pr_info("%s: prepare for reconnect\n", dev->nodename);
-			xenbus_switch_state(dev, XenbusStateInitWait);
-		}
+		set_backend_state(be, XenbusStateInitWait);
 		break;
 
 	case XenbusStateInitialised:
 		break;
 
 	case XenbusStateConnected:
-		if (dev->state == XenbusStateConnected)
-			break;
-		if (be->vif)
-			connect(be);
+		set_backend_state(be, XenbusStateConnected);
 		break;
 
 	case XenbusStateClosing:
-		disconnect_backend(dev);
-		xenbus_switch_state(dev, XenbusStateClosing);
+		set_backend_state(be, XenbusStateClosing);
 		break;
 
 	case XenbusStateClosed:
-		xenbus_switch_state(dev, XenbusStateClosed);
+		set_backend_state(be, XenbusStateClosed);
 		if (xenbus_dev_is_online(dev))
 			break;
-		destroy_backend(dev);
 		/* fall through if not online */
 	case XenbusStateUnknown:
+		set_backend_state(be, XenbusStateClosed);
 		device_unregister(&dev->dev);
 		break;
 
@@ -363,7 +448,9 @@ static void hotplug_status_changed(struct xenbus_watch *watch,
 	if (IS_ERR(str))
 		return;
 	if (len == sizeof("connected")-1 && !memcmp(str, "connected", len)) {
-		xenbus_switch_state(be->dev, XenbusStateConnected);
+		/* Complete any pending state change */
+		xenbus_switch_state(be->dev, be->state);
+
 		/* Not interested in this watch anymore. */
 		unregister_hotplug_status_watch(be);
 	}
@@ -393,12 +480,8 @@ static void connect(struct backend_info *be)
 	err = xenbus_watch_pathfmt(dev, &be->hotplug_status_watch,
 				   hotplug_status_changed,
 				   "%s/%s", dev->nodename, "hotplug-status");
-	if (err) {
-		/* Switch now, since we can't do a watch. */
-		xenbus_switch_state(dev, XenbusStateConnected);
-	} else {
+	if (!err)
 		be->have_hotplug_status_watch = 1;
-	}
 
 	netif_wake_queue(be->vif->dev);
 }
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH net v5 0/1] xen-netback: windows frontend compatibility fixes
From: Paul Durrant @ 2013-09-25  9:59 UTC (permalink / raw)
  To: xen-devel, netdev

The following patches fix a couple more issues found when testing with
Windows frontends.

v5:
- Fix typos pointed out by Wei Liu

v4:
- Fix problem with hotplug failure noticed by Wei Liu

v3:
- Collapse both v2 patches into a single patch that introduces a new
  function to handle backend state transtions. By doing this we ensure that
  we always transition through intermediate states and that we don't attempt
  repeated connects or disconnects.

v2:
- Add comment in 2/2 to note that state transitions from Connected to Closed
  are incorrect.

^ permalink raw reply

* [PATCH net-next v5 5/6] bonding: remove the counter and simplify bond_for_each_slave_from()
From: Ding Tianhong @ 2013-09-25  9:52 UTC (permalink / raw)
  To: Jay Vosburgh, Andy Gospodarek, David S. Miller,
	Nikolay Aleksandrov, Veaceslav Falico, Netdev

Restructure the bond_for_each_slave_from(),
remove the checking for bond->slave_cnt,
make the new loop be more simple and racy.

Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Cc: Nikolay Aleksandrov <nikolay@redhat.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
---
 drivers/net/bonding/bond_alb.c  | 3 +--
 drivers/net/bonding/bond_main.c | 6 ++----
 drivers/net/bonding/bonding.h   | 6 +++---
 3 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index f428ef57..4813dc6 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -383,7 +383,6 @@ static struct slave *rlb_next_rx_slave(struct bonding *bond)
 {
 	struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
 	struct slave *rx_slave, *slave, *start_at;
-	int i = 0;
 
 	if (bond_info->next_rx_slave)
 		start_at = bond_info->next_rx_slave;
@@ -392,7 +391,7 @@ static struct slave *rlb_next_rx_slave(struct bonding *bond)
 
 	rx_slave = NULL;
 
-	bond_for_each_slave_from(bond, slave, i, start_at) {
+	bond_for_each_slave_from(bond, slave, start_at) {
 		if (SLAVE_IS_OK(slave)) {
 			if (!rx_slave) {
 				rx_slave = slave;
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 55bbb8b..fbe25bc 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -782,7 +782,6 @@ static struct slave *bond_find_best_slave(struct bonding *bond)
 	struct slave *new_active, *old_active;
 	struct slave *bestslave = NULL;
 	int mintime = bond->params.updelay;
-	int i;
 
 	new_active = bond->curr_active_slave;
 
@@ -801,7 +800,7 @@ static struct slave *bond_find_best_slave(struct bonding *bond)
 	/* remember where to stop iterating over the slaves */
 	old_active = new_active;
 
-	bond_for_each_slave_from(bond, new_active, i, old_active) {
+	bond_for_each_slave_from(bond, new_active, old_active) {
 		if (new_active->link == BOND_LINK_UP) {
 			return new_active;
 		} else if (new_active->link == BOND_LINK_BACK &&
@@ -2756,7 +2755,6 @@ do_failover:
 static void bond_ab_arp_probe(struct bonding *bond)
 {
 	struct slave *slave, *next_slave;
-	int i;
 
 	read_lock(&bond->curr_slave_lock);
 
@@ -2788,7 +2786,7 @@ static void bond_ab_arp_probe(struct bonding *bond)
 
 	/* search for next candidate */
 	next_slave = bond_next_slave(bond, bond->current_arp_slave);
-	bond_for_each_slave_from(bond, slave, i, next_slave) {
+	bond_for_each_slave_from(bond, slave, next_slave) {
 		if (IS_UP(slave->dev)) {
 			slave->link = BOND_LINK_BACK;
 			bond_set_slave_active_flags(slave);
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index eb36f57..e5734ad 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -130,9 +130,9 @@
  *
  * Caller must hold bond->lock
  */
-#define bond_for_each_slave_from(bond, pos, cnt, start) \
-	for (cnt = 0, pos = start; pos && cnt < (bond)->slave_cnt; \
-	     cnt++, pos = bond_next_slave(bond, pos))
+#define bond_for_each_slave_from(bond, pos, start) \
+	for (pos = start; pos; (pos = bond_next_slave(bond, pos)) != start ? \
+	    (pos) : (pos = NULL))
 
 /**
  * bond_for_each_slave - iterate over all slaves
-- 
1.8.0

^ permalink raw reply related

* [PATCH net-next v5 6/6] bonding: use RCU protection for alb xmit path
From: Ding Tianhong @ 2013-09-25  9:52 UTC (permalink / raw)
  To: Jay Vosburgh, Andy Gospodarek, David S. Miller,
	Nikolay Aleksandrov, Veaceslav Falico, Netdev, Hideaki YOSHIFUJI

The commit 278b20837511776dc9d5f6ee1c7fabd5479838bb
(bonding: initial RCU conversion) has convert the roundrobin, active-backup,
broadcast and xor xmit path to rcu protection, the performance will be better
for these mode, so this time, convert xmit path for alb mode.

Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Cc: Nikolay Aleksandrov <nikolay@redhat.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
---
 drivers/net/bonding/bond_alb.c | 32 ++++++++++++++------------------
 drivers/net/bonding/bonding.h  |  2 +-
 2 files changed, 15 insertions(+), 19 deletions(-)

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index 4813dc6..619a611 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -229,7 +229,7 @@ static struct slave *tlb_get_least_loaded_slave(struct bonding *bond)
 	max_gap = LLONG_MIN;
 
 	/* Find the slave with the largest gap */
-	bond_for_each_slave(bond, slave) {
+	bond_for_each_slave_rcu(bond, slave) {
 		if (SLAVE_IS_OK(slave)) {
 			long long gap = compute_gap(slave);
 
@@ -382,16 +382,14 @@ out:
 static struct slave *rlb_next_rx_slave(struct bonding *bond)
 {
 	struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
-	struct slave *rx_slave, *slave, *start_at;
+	struct slave *rx_slave, *slave;
 
 	if (bond_info->next_rx_slave)
-		start_at = bond_info->next_rx_slave;
+		rx_slave = bond_info->next_rx_slave;
 	else
-		start_at = bond_first_slave(bond);
-
-	rx_slave = NULL;
+		rx_slave = bond_first_slave_rcu(bond);
 
-	bond_for_each_slave_from(bond, slave, start_at) {
+	bond_for_each_slave_rcu(bond, slave) {
 		if (SLAVE_IS_OK(slave)) {
 			if (!rx_slave) {
 				rx_slave = slave;
@@ -402,7 +400,7 @@ static struct slave *rlb_next_rx_slave(struct bonding *bond)
 	}
 
 	if (rx_slave) {
-		slave = bond_next_slave(bond, rx_slave);
+		slave = bond_next_slave_rcu(bond, rx_slave);
 		bond_info->next_rx_slave = slave;
 	}
 
@@ -625,12 +623,14 @@ static struct slave *rlb_choose_channel(struct sk_buff *skb, struct bonding *bon
 {
 	struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
 	struct arp_pkt *arp = arp_pkt(skb);
-	struct slave *assigned_slave;
+	struct slave *assigned_slave, *curr_active_slave;
 	struct rlb_client_info *client_info;
 	u32 hash_index = 0;
 
 	_lock_rx_hashtbl(bond);
 
+	curr_active_slave = rcu_dereference(bond->curr_active_slave);
+
 	hash_index = _simple_hash((u8 *)&arp->ip_dst, sizeof(arp->ip_dst));
 	client_info = &(bond_info->rx_hashtbl[hash_index]);
 
@@ -654,9 +654,9 @@ static struct slave *rlb_choose_channel(struct sk_buff *skb, struct bonding *bon
 			 * move the old client to primary (curr_active_slave) so
 			 * that the new client can be assigned to this entry.
 			 */
-			if (bond->curr_active_slave &&
-			    client_info->slave != bond->curr_active_slave) {
-				client_info->slave = bond->curr_active_slave;
+			if (curr_active_slave &&
+			    client_info->slave != curr_active_slave) {
+				client_info->slave = curr_active_slave;
 				rlb_update_client(client_info);
 			}
 		}
@@ -1338,8 +1338,6 @@ int bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
 
 	/* make sure that the curr_active_slave do not change during tx
 	 */
-	read_lock(&bond->lock);
-	read_lock(&bond->curr_slave_lock);
 
 	switch (ntohs(skb->protocol)) {
 	case ETH_P_IP: {
@@ -1422,12 +1420,12 @@ int bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
 
 	if (!tx_slave) {
 		/* unbalanced or unassigned, send through primary */
-		tx_slave = bond->curr_active_slave;
+		tx_slave = rcu_dereference(bond->curr_active_slave);
 		bond_info->unbalanced_load += skb->len;
 	}
 
 	if (tx_slave && SLAVE_IS_OK(tx_slave)) {
-		if (tx_slave != bond->curr_active_slave) {
+		if (tx_slave != rcu_dereference(bond->curr_active_slave)) {
 			memcpy(eth_data->h_source,
 			       tx_slave->dev->dev_addr,
 			       ETH_ALEN);
@@ -1442,8 +1440,6 @@ int bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
 		}
 	}
 
-	read_unlock(&bond->curr_slave_lock);
-	read_unlock(&bond->lock);
 	if (res) {
 		/* no suitable interface, frame not sent */
 		kfree_skb(skb);
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index e5734ad..87782b4 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -522,7 +522,7 @@ static inline struct slave *bond_slave_has_mac(struct bonding *bond,
 {
 	struct slave *tmp;
 
-	bond_for_each_slave(bond, tmp)
+	bond_for_each_slave_rcu(bond, tmp)
 		if (ether_addr_equal_64bits(mac, tmp->dev->dev_addr))
 			return tmp;
 
-- 
1.8.0

^ permalink raw reply related

* [PATCH net-next v5 1/6] bonding: simplify and use RCU protection for 3ad xmit path
From: Ding Tianhong @ 2013-09-25  9:52 UTC (permalink / raw)
  To: Jay Vosburgh, Andy Gospodarek, David S. Miller,
	Nikolay Aleksandrov, Veaceslav Falico, Netdev

The commit 278b20837511776dc9d5f6ee1c7fabd5479838bb
(bonding: initial RCU conversion) has convert the roundrobin, active-backup,
broadcast and xor xmit path to rcu protection, the performance will be better
for these mode, so this time, convert xmit path for 3ad mode.

Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Nikolay Aleksandrov <nikolay@redhat.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
---
 drivers/net/bonding/bond_3ad.c | 32 ++++++++++++++------------------
 drivers/net/bonding/bonding.h  | 30 +++++++++++++++++++++++++++++-
 2 files changed, 43 insertions(+), 19 deletions(-)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 0d8f427..13f1deb 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -143,7 +143,7 @@ static inline struct bonding *__get_bond_by_port(struct port *port)
  */
 static inline struct port *__get_first_port(struct bonding *bond)
 {
-	struct slave *first_slave = bond_first_slave(bond);
+	struct slave *first_slave = bond_first_slave_rcu(bond);
 
 	return first_slave ? &(SLAVE_AD_INFO(first_slave).port) : NULL;
 }
@@ -163,7 +163,7 @@ static inline struct port *__get_next_port(struct port *port)
 	// If there's no bond for this port, or this is the last slave
 	if (bond == NULL)
 		return NULL;
-	slave_next = bond_next_slave(bond, slave);
+	slave_next = bond_next_slave_rcu(bond, slave);
 	if (!slave_next || bond_is_first_slave(bond, slave_next))
 		return NULL;
 
@@ -2417,16 +2417,14 @@ int bond_3ad_get_active_agg_info(struct bonding *bond, struct ad_info *ad_info)
 
 int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev)
 {
-	struct slave *slave, *start_at;
 	struct bonding *bond = netdev_priv(dev);
+	struct slave *slave;
 	int slave_agg_no;
 	int slaves_in_agg;
 	int agg_id;
-	int i;
 	struct ad_info ad_info;
 	int res = 1;
 
-	read_lock(&bond->lock);
 	if (__bond_3ad_get_active_agg_info(bond, &ad_info)) {
 		pr_debug("%s: Error: __bond_3ad_get_active_agg_info failed\n",
 			 dev->name);
@@ -2444,13 +2442,17 @@ int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev)
 
 	slave_agg_no = bond->xmit_hash_policy(skb, slaves_in_agg);
 
-	bond_for_each_slave(bond, slave) {
+	bond_for_each_slave_rcu(bond, slave) {
 		struct aggregator *agg = SLAVE_AD_INFO(slave).port.aggregator;
 
 		if (agg && (agg->aggregator_identifier == agg_id)) {
-			slave_agg_no--;
-			if (slave_agg_no < 0)
-				break;
+			if (--slave_agg_no < 0) {
+				if (SLAVE_IS_OK(slave)) {
+					res = bond_dev_queue_xmit(bond,
+						skb, slave->dev);
+					goto out;
+				}
+			}
 		}
 	}
 
@@ -2460,23 +2462,17 @@ int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev)
 		goto out;
 	}
 
-	start_at = slave;
-
-	bond_for_each_slave_from(bond, slave, i, start_at) {
-		int slave_agg_id = 0;
+	bond_for_each_slave_rcu(bond, slave) {
 		struct aggregator *agg = SLAVE_AD_INFO(slave).port.aggregator;
 
-		if (agg)
-			slave_agg_id = agg->aggregator_identifier;
-
-		if (SLAVE_IS_OK(slave) && agg && (slave_agg_id == agg_id)) {
+		if (SLAVE_IS_OK(slave) && agg &&
+			agg->aggregator_identifier == agg_id) {
 			res = bond_dev_queue_xmit(bond, skb, slave->dev);
 			break;
 		}
 	}
 
 out:
-	read_unlock(&bond->lock);
 	if (res) {
 		/* no suitable interface, frame not sent */
 		kfree_skb(skb);
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index 03cf3fd..eb36f57 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -74,13 +74,31 @@
 /* slave list primitives */
 #define bond_to_slave(ptr) list_entry(ptr, struct slave, list)
 
+/* slave list primitives, Caller must hold rcu_read_lock */
+#define bond_to_slave_rcu(ptr) list_entry_rcu(ptr, struct slave, list)
+
+/* bond_is_empty return NULL if slave list is empty*/
+#define bond_is_empty(bond) \
+	(list_empty(&(bond)->slave_list))
+
+/* bond_is_empty_rcu return NULL if slave list is empty*/
+#define bond_is_empty_rcu(bond) \
+	(!list_first_or_null_rcu(&(bond)->slave_list, struct slave, list))
+
 /* IMPORTANT: bond_first/last_slave can return NULL in case of an empty list */
 #define bond_first_slave(bond) \
 	list_first_entry_or_null(&(bond)->slave_list, struct slave, list)
 #define bond_last_slave(bond) \
-	(list_empty(&(bond)->slave_list) ? NULL : \
+	(bond_is_empty(bond) ? NULL : \
 					   bond_to_slave((bond)->slave_list.prev))
 
+/**
+ * IMPORTANT: bond_first/last_slave_rcu can return NULL in case of an empty list
+ * Caller must hold rcu_read_lock
+ */
+#define bond_first_slave_rcu(bond) \
+	list_first_or_null_rcu(&(bond)->slave_list, struct slave, list);
+
 #define bond_is_first_slave(bond, pos) ((pos)->list.prev == &(bond)->slave_list)
 #define bond_is_last_slave(bond, pos) ((pos)->list.next == &(bond)->slave_list)
 
@@ -93,6 +111,16 @@
 	(bond_is_first_slave(bond, pos) ? bond_last_slave(bond) : \
 					  bond_to_slave((pos)->list.prev))
 
+/* Since bond_first/last_slave_rcu can return NULL, these can return NULL too */
+#define bond_next_slave_rcu(bond, pos) \
+	({struct list_head *__slave_list = &(bond)->slave_list; \
+	 struct list_head __rcu *__next = list_next_rcu(__slave_list); \
+	 struct list_head __rcu *__pos_next = list_next_rcu(&(pos)->list); \
+	 likely(__pos_next != __slave_list) ? \
+	 container_of(__pos_next, struct slave, list) : \
+	 container_of(__next, struct slave, list); \
+	 })
+
 /**
  * bond_for_each_slave_from - iterate the slaves list from a starting point
  * @bond:	the bond holding this list.
-- 
1.8.0

^ permalink raw reply related

* [PATCH net-next v5 3/6] bonding: replace read_lock to rcu_read_lock for bond_3ad_get_active_agg_info()
From: Ding Tianhong @ 2013-09-25  9:52 UTC (permalink / raw)
  To: Jay Vosburgh, Andy Gospodarek, David S. Miller,
	Nikolay Aleksandrov, Veaceslav Falico, Netdev

the bond slave list will protected by rcu, so the read lock should replace by
rcu read lock.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Cc: Nikolay Aleksandrov <nikolay@redhat.com>
---
 drivers/net/bonding/bond_3ad.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index c134f43..b678502 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -2408,9 +2408,9 @@ int bond_3ad_get_active_agg_info(struct bonding *bond, struct ad_info *ad_info)
 {
 	int ret;
 
-	read_lock(&bond->lock);
+	rcu_read_lock();
 	ret = __bond_3ad_get_active_agg_info(bond, ad_info);
-	read_unlock(&bond->lock);
+	rcu_read_unlock();
 
 	return ret;
 }
-- 
1.8.2.1

^ permalink raw reply related

* [PATCH net-next v5 2/6] bonding: remove the no effect lock for bond_3ad_lacpdu_recv()
From: Ding Tianhong @ 2013-09-25  9:52 UTC (permalink / raw)
  To: Jay Vosburgh, Andy Gospodarek, David S. Miller,
	Nikolay Aleksandrov, Veaceslav Falico, Netdev

There is no pointer needed read lock protection, remove the unnecessary lock
and improve performance for the 3ad recv path.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Cc: Nikolay Aleksandrov <nikolay@redhat.com>
---
 drivers/net/bonding/bond_3ad.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 7a3860f..c134f43 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -2494,9 +2494,7 @@ int bond_3ad_lacpdu_recv(const struct sk_buff *skb, struct bonding *bond,
 	if (!lacpdu)
 		return ret;
 
-	read_lock(&bond->lock);
 	ret = bond_3ad_rx_indication(lacpdu, slave, skb->len);
-	read_unlock(&bond->lock);
 	return ret;
 }
 
-- 
1.8.2.1

^ permalink raw reply related

* [PATCH net-next v5 4/6] bonding: add rtnl lock for bonding_store_xmit_hash
From: Ding Tianhong @ 2013-09-25  9:52 UTC (permalink / raw)
  To: Jay Vosburgh, Andy Gospodarek, David S. Miller,
	Nikolay Aleksandrov, Veaceslav Falico, Netdev

the bonding_store_xmit_hash() could update bond's xmit_policy, and
the xmit_policy is used in xmit path for xor mode, maybe it is hard
to occur any problem, but just follow the logic "don't change anything
slave-related without rtnl", so I think the rntl lock is fit here. :)

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Cc: Nikolay Aleksandrov <nikolay@redhat.com>
---
 drivers/net/bonding/bond_sysfs.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 0f539de..deb1d8e 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -382,6 +382,9 @@ static ssize_t bonding_store_xmit_hash(struct device *d,
 	int new_value, ret = count;
 	struct bonding *bond = to_bond(d);
 
+	if (!rtnl_trylock())
+		return restart_syscall();
+
 	new_value = bond_parse_parm(buf, xmit_hashtype_tbl);
 	if (new_value < 0)  {
 		pr_err("%s: Ignoring invalid xmit hash policy value %.*s.\n",
@@ -396,6 +399,7 @@ static ssize_t bonding_store_xmit_hash(struct device *d,
 			xmit_hashtype_tbl[new_value].modename, new_value);
 	}
 
+	rtnl_unlock();
 	return ret;
 }
 static DEVICE_ATTR(xmit_hash_policy, S_IRUGO | S_IWUSR,
-- 
1.8.2.1

^ permalink raw reply related

* [PATCH net-next v5 0/6] bonding: Patchset for rcu use in bonding
From: Ding Tianhong @ 2013-09-25  9:52 UTC (permalink / raw)
  To: Jay Vosburgh, Andy Gospodarek, David S. Miller,
	Nikolay Aleksandrov, Veaceslav Falico, Netdev

Hi:

The Patch Set convert the xmit of 3ad and alb mode to use rcu lock.
restructure and add more rcu list function.
add rtnl lock to protect bonding_store_xmit_hash().
remove read lock in bond_3ad_lacpdu_recv().

I test the patch well and no problems found till now.

v1: add 1 patch named remove the no effect lock for bond_3ad_lacpdu_recv().
    get the advice from Nikolay Aleksandrov, and modify some mistake in the code.

v2: accept the Nikolay Aleksandrov's advise, modify the patch 05/06.
    the new bond_for_each_slave_from will not use bond->slave_cnt and int cnt any more,
    it test well.
    move the rcu_dereference call of curr_active_slave after the lock avoid 
    a race condition with the slave removal. 

v3: according to the advise of Veaceslav Falico and David S. Miller, modify the patch 01/06 and patch 05/06.
    add rcu protect for read bond->slave_list.
    simplify the bond_for_each_slave_from() and  bond_for_each_slave_from_rcu()

v4: thanks for Veaceslav Falico's patient guidance, fix the problem of patch 01/06 and
patch 05/06.
    restructure the bond_first_slave_rcu().
    restructure the bond_for_each_slave() and bond_for_each_slave_rcu().

v5: modifty the patch 01/06,05/06,06/06
    restructure the bond_first_slave_rcu().
    remove the bond_for_each_slave_from_rcu(), it is no need to use till now, and the restructure is really
    hard work, so miss it, and rebuild the logic for bond_for_each_slave_from in rcu protect.

Ding Tianhong (4):
Wang Yufen (1):
Yang Yingliang (1):
  root (6):
  bonding: simplify and use RCU protection for 3ad xmit path
  bonding: remove the no effect lock for bond_3ad_lacpdu_recv()
  bonding: replace read_lock to rcu_read_lock for
  bond_3ad_get_active_agg_info()
  bonding: add rtnl lock for bonding_store_xmit_hash
  bonding: restructure and add rcu for bond_for_each_slave_next()
  bonding: use RCU protection for alb xmit path

 drivers/net/bonding/bond_3ad.c   | 38 ++++++++++++---------------
 drivers/net/bonding/bond_alb.c   | 23 +++++++----------
 drivers/net/bonding/bond_main.c  |  6 ++---
 drivers/net/bonding/bond_sysfs.c |  4 +++
 drivers/net/bonding/bonding.h    | 56 +++++++++++++++++++++++++++++++++++-----
 5 files changed, 82 insertions(+), 45 deletions(-)

-- 
1.8.2.1

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox