netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/2] bonding: modify the current and add new hash functions
@ 2013-09-25 15:19 Nikolay Aleksandrov
  2013-09-25 15:19 ` [PATCH net-next 1/2] bonding: modify the old and add new xmit " Nikolay Aleksandrov
  2013-09-25 15:19 ` [PATCH net-next 2/2] bonding: document the new xmit policy modes and update the changed ones Nikolay Aleksandrov
  0 siblings, 2 replies; 5+ messages in thread
From: Nikolay Aleksandrov @ 2013-09-25 15:19 UTC (permalink / raw)
  To: netdev; +Cc: davem, andy, fubar, eric.dumazet

Hi all,
This is a complete remake of my old patch that modified the bonding hash
functions to use skb_flow_dissect which was suggested by Eric Dumazet.
This time around I've left the old modes although using a new hash function
again suggested by Eric, which is the same for all modes. The only
difference is the way the headers are obtained. The old modes obtain them
as before in order to address concerns about speed, but the 2 new ones use
skb_flow_dissect. The unification of the hash function allows to remove a
pointer from struct bonding and also a few extra functions that dealt with
it. Two new functions are added which take care of the hashing based on
bond->params.xmit_policy only:
bond_xmit_hash() - global function, used by XOR and 3ad modes
bond_flow_dissect() - used by bond_xmit_hash() to obtain the necessary
headers and combine them according to bond->params.xmit_policy.


Best regards,
 Nikolay Aleksandrov

Nikolay Aleksandrov (2):
  bonding: modify the old and add new xmit hash functions
  bonding: document the new xmit policy modes and update the changed
    ones

 Documentation/networking/bonding.txt |  66 ++++++-----
 drivers/net/bonding/bond_3ad.c       |   2 +-
 drivers/net/bonding/bond_main.c      | 211 ++++++++++++++---------------------
 drivers/net/bonding/bond_sysfs.c     |   2 -
 drivers/net/bonding/bonding.h        |   3 +-
 include/uapi/linux/if_bonding.h      |   2 +
 6 files changed, 122 insertions(+), 164 deletions(-)

-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH net-next 1/2] bonding: modify the old and add new xmit hash functions
  2013-09-25 15:19 [PATCH net-next 0/2] bonding: modify the current and add new hash functions Nikolay Aleksandrov
@ 2013-09-25 15:19 ` Nikolay Aleksandrov
  2013-09-25 16:26   ` Veaceslav Falico
  2013-09-25 15:19 ` [PATCH net-next 2/2] bonding: document the new xmit policy modes and update the changed ones Nikolay Aleksandrov
  1 sibling, 1 reply; 5+ messages in thread
From: Nikolay Aleksandrov @ 2013-09-25 15:19 UTC (permalink / raw)
  To: netdev; +Cc: davem, andy, fubar, eric.dumazet

This patch adds two new hash policy modes which use skb_flow_dissect:
3 - Encapsulated layer 2+3
4 - Encapsulated layer 3+4
There should be a good improvement for tunnel users in those modes.
It also changes the old hash functions to:
hash ^= (__force u32)flow.dst ^ (__force u32)flow.src;
hash ^= (hash >> 16);
hash ^= (hash >> 8);

Where hash will be initialized either to L2 hash, that is
SRCMAC[5] XOR DSTMAC[5], or to flow->ports which should be extracted
from the upper layer. Flow's dst and src are also extracted based on the
xmit policy either directly from the buffer or by using skb_flow_dissect,
but in both cases if the protocol is IPv6 then dst and src are obtained by
ipv6_addr_hash() on the real addresses. In case of a non-dissectable
packet, the algorithms fall back to L2 hashing.
The bond_set_mode_ops() function is now obsolete and thus deleted
because it was used only to set the proper hash policy. Also we trim a
pointer from struct bonding because we no longer need to keep the hash
function, now there's only a single hash function - bond_xmit_hash that
works based on bond->params.xmit_policy.

The hash function and skb_flow_dissect were suggested by Eric Dumazet.
The layer names were suggested by Andy Gospodarek, because I suck at
semantics.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
---
One line is intentionally left at 82 chars since it's the whole function
and IMO looks better that way.

 drivers/net/bonding/bond_3ad.c   |   2 +-
 drivers/net/bonding/bond_main.c  | 211 +++++++++++++++------------------------
 drivers/net/bonding/bond_sysfs.c |   2 -
 drivers/net/bonding/bonding.h    |   3 +-
 include/uapi/linux/if_bonding.h  |   2 +
 5 files changed, 86 insertions(+), 134 deletions(-)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 0d8f427..b3ab703 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -2442,7 +2442,7 @@ int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev)
 		goto out;
 	}
 
-	slave_agg_no = bond->xmit_hash_policy(skb, slaves_in_agg);
+	slave_agg_no = bond_xmit_hash(bond, skb, slaves_in_agg);
 
 	bond_for_each_slave(bond, slave) {
 		struct aggregator *agg = SLAVE_AD_INFO(slave).port.aggregator;
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 55bbb8b..73e416b 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -78,6 +78,7 @@
 #include <net/netns/generic.h>
 #include <net/pkt_sched.h>
 #include <linux/rculist.h>
+#include <net/flow_keys.h>
 #include "bonding.h"
 #include "bond_3ad.h"
 #include "bond_alb.h"
@@ -159,7 +160,8 @@ MODULE_PARM_DESC(min_links, "Minimum number of available links before turning on
 module_param(xmit_hash_policy, charp, 0);
 MODULE_PARM_DESC(xmit_hash_policy, "balance-xor and 802.3ad hashing method; "
 				   "0 for layer 2 (default), 1 for layer 3+4, "
-				   "2 for layer 2+3");
+				   "2 for layer 2+3, 3 for encap layer 2+3, "
+				   "4 for encap layer 3+4");
 module_param(arp_interval, int, 0);
 MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds");
 module_param_array(arp_ip_target, charp, NULL, 0);
@@ -217,6 +219,8 @@ const struct bond_parm_tbl xmit_hashtype_tbl[] = {
 {	"layer2",		BOND_XMIT_POLICY_LAYER2},
 {	"layer3+4",		BOND_XMIT_POLICY_LAYER34},
 {	"layer2+3",		BOND_XMIT_POLICY_LAYER23},
+{	"encap2+3",		BOND_XMIT_POLICY_ENCAP23},
+{	"encap3+4",		BOND_XMIT_POLICY_ENCAP34},
 {	NULL,			-1},
 };
 
@@ -3026,99 +3030,99 @@ static struct notifier_block bond_netdev_notifier = {
 
 /*---------------------------- Hashing Policies -----------------------------*/
 
-/*
- * Hash for the output device based upon layer 2 data
- */
-static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count)
+/* L2 hash helper */
+static inline u32 bond_eth_hash(struct sk_buff *skb)
 {
 	struct ethhdr *data = (struct ethhdr *)skb->data;
 
 	if (skb_headlen(skb) >= offsetof(struct ethhdr, h_proto))
-		return (data->h_dest[5] ^ data->h_source[5]) % count;
+		return data->h_dest[5] ^ data->h_source[5];
 
 	return 0;
 }
 
-/*
- * Hash for the output device based upon layer 2 and layer 3 data. If
- * the packet is not IP, fall back on bond_xmit_hash_policy_l2()
- */
-static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count)
+static void bond_flow_get_ports(struct sk_buff *skb, struct flow_keys *fk,
+				int offset)
+{
+	__be32 *ports;
+
+	ports = skb_header_pointer(skb, offset, sizeof(fk->ports), &fk->ports);
+	if (ports)
+		fk->ports = *ports;
+}
+
+/* Extract the appropriate headers based on bond's xmit policy */
+static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
+			      struct flow_keys *fk)
 {
-	const struct ethhdr *data;
+	const struct ipv6hdr *iph6;
 	const struct iphdr *iph;
-	const struct ipv6hdr *ipv6h;
-	u32 v6hash;
-	const __be32 *s, *d;
+	int noff, proto = -1, poff = -1;
+	bool ret = false;
+
+	if (bond->params.xmit_policy > BOND_XMIT_POLICY_LAYER23)
+		return skb_flow_dissect(skb, fk);
 
-	if (skb->protocol == htons(ETH_P_IP) &&
-	    pskb_network_may_pull(skb, sizeof(*iph))) {
+	fk->ports = 0;
+	noff = skb_network_offset(skb);
+	if (skb->protocol == htons(ETH_P_IP)) {
+		if (!pskb_may_pull(skb, noff + sizeof(*iph)))
+			goto out;
 		iph = ip_hdr(skb);
-		data = (struct ethhdr *)skb->data;
-		return ((ntohl(iph->saddr ^ iph->daddr) & 0xffff) ^
-			(data->h_dest[5] ^ data->h_source[5])) % count;
-	} else if (skb->protocol == htons(ETH_P_IPV6) &&
-		   pskb_network_may_pull(skb, sizeof(*ipv6h))) {
-		ipv6h = ipv6_hdr(skb);
-		data = (struct ethhdr *)skb->data;
-		s = &ipv6h->saddr.s6_addr32[0];
-		d = &ipv6h->daddr.s6_addr32[0];
-		v6hash = (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]);
-		v6hash ^= (v6hash >> 24) ^ (v6hash >> 16) ^ (v6hash >> 8);
-		return (v6hash ^ data->h_dest[5] ^ data->h_source[5]) % count;
-	}
-
-	return bond_xmit_hash_policy_l2(skb, count);
+		fk->src = iph->saddr;
+		fk->dst = iph->daddr;
+		noff += iph->ihl << 2;
+		if (!ip_is_fragment(iph))
+			proto = iph->protocol;
+	} else if (skb->protocol == htons(ETH_P_IPV6)) {
+		if (!pskb_may_pull(skb, noff + sizeof(*iph6)))
+			goto out;
+		iph6 = ipv6_hdr(skb);
+		fk->src = (__force __be32)ipv6_addr_hash(&iph6->saddr);
+		fk->dst = (__force __be32)ipv6_addr_hash(&iph6->daddr);
+		noff += sizeof(*iph6);
+		proto = iph6->nexthdr;
+	}
+	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER34) {
+		poff = proto_ports_offset(proto);
+		if (poff >= 0)
+			bond_flow_get_ports(skb, fk, poff + noff);
+	}
+	ret = true;
+
+out:
+	return ret;
 }
 
-/*
- * Hash for the output device based upon layer 3 and layer 4 data. If
- * the packet is a frag or not TCP or UDP, just use layer 3 data.  If it is
- * altogether not IP, fall back on bond_xmit_hash_policy_l2()
+/**
+ * bond_xmit_hash - generate a hash value based on the xmit policy
+ * @bond: bonding device
+ * @skb: buffer to use for headers
+ * @count: modulo value
+ *
+ * This function will extract the necessary headers from the skb buffer and use
+ * them to generate a hash based on the xmit_policy set in the bonding device
+ * which will be reduced modulo count before returning.
  */
-static int bond_xmit_hash_policy_l34(struct sk_buff *skb, int count)
+int bond_xmit_hash(struct bonding *bond, struct sk_buff *skb, int count)
 {
-	u32 layer4_xor = 0;
-	const struct iphdr *iph;
-	const struct ipv6hdr *ipv6h;
-	const __be32 *s, *d;
-	const __be16 *l4 = NULL;
-	__be16 _l4[2];
-	int noff = skb_network_offset(skb);
-	int poff;
-
-	if (skb->protocol == htons(ETH_P_IP) &&
-	    pskb_may_pull(skb, noff + sizeof(*iph))) {
-		iph = ip_hdr(skb);
-		poff = proto_ports_offset(iph->protocol);
+	struct flow_keys flow;
+	u32 hash;
 
-		if (!ip_is_fragment(iph) && poff >= 0) {
-			l4 = skb_header_pointer(skb, noff + (iph->ihl << 2) + poff,
-						sizeof(_l4), &_l4);
-			if (l4)
-				layer4_xor = ntohs(l4[0] ^ l4[1]);
-		}
-		return (layer4_xor ^
-			((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count;
-	} else if (skb->protocol == htons(ETH_P_IPV6) &&
-		   pskb_may_pull(skb, noff + sizeof(*ipv6h))) {
-		ipv6h = ipv6_hdr(skb);
-		poff = proto_ports_offset(ipv6h->nexthdr);
-		if (poff >= 0) {
-			l4 = skb_header_pointer(skb, noff + sizeof(*ipv6h) + poff,
-						sizeof(_l4), &_l4);
-			if (l4)
-				layer4_xor = ntohs(l4[0] ^ l4[1]);
-		}
-		s = &ipv6h->saddr.s6_addr32[0];
-		d = &ipv6h->daddr.s6_addr32[0];
-		layer4_xor ^= (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]);
-		layer4_xor ^= (layer4_xor >> 24) ^ (layer4_xor >> 16) ^
-			       (layer4_xor >> 8);
-		return layer4_xor % count;
-	}
+	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER2 ||
+	    !bond_flow_dissect(bond, skb, &flow))
+		return bond_eth_hash(skb) % count;
+
+	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER23 ||
+	    bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP23)
+		hash = bond_eth_hash(skb);
+	else
+		hash = (__force u32)flow.ports;
+	hash ^= (__force u32)flow.dst ^ (__force u32)flow.src;
+	hash ^= (hash >> 16);
+	hash ^= (hash >> 8);
 
-	return bond_xmit_hash_policy_l2(skb, count);
+	return hash % count;
 }
 
 /*-------------------------- Device entry points ----------------------------*/
@@ -3700,8 +3704,7 @@ static int bond_xmit_activebackup(struct sk_buff *skb, struct net_device *bond_d
 	return NETDEV_TX_OK;
 }
 
-/*
- * In bond_xmit_xor() , we determine the output device by using a pre-
+/* In bond_xmit_xor() , we determine the output device by using a pre-
  * determined xmit_hash_policy(), If the selected device is not enabled,
  * find the next active slave.
  */
@@ -3709,8 +3712,7 @@ static int bond_xmit_xor(struct sk_buff *skb, struct net_device *bond_dev)
 {
 	struct bonding *bond = netdev_priv(bond_dev);
 
-	bond_xmit_slave_id(bond, skb,
-			   bond->xmit_hash_policy(skb, bond->slave_cnt));
+	bond_xmit_slave_id(bond, skb, bond_xmit_hash(bond, skb, bond->slave_cnt));
 
 	return NETDEV_TX_OK;
 }
@@ -3746,22 +3748,6 @@ static int bond_xmit_broadcast(struct sk_buff *skb, struct net_device *bond_dev)
 
 /*------------------------- Device initialization ---------------------------*/
 
-static void bond_set_xmit_hash_policy(struct bonding *bond)
-{
-	switch (bond->params.xmit_policy) {
-	case BOND_XMIT_POLICY_LAYER23:
-		bond->xmit_hash_policy = bond_xmit_hash_policy_l23;
-		break;
-	case BOND_XMIT_POLICY_LAYER34:
-		bond->xmit_hash_policy = bond_xmit_hash_policy_l34;
-		break;
-	case BOND_XMIT_POLICY_LAYER2:
-	default:
-		bond->xmit_hash_policy = bond_xmit_hash_policy_l2;
-		break;
-	}
-}
-
 /*
  * Lookup the slave that corresponds to a qid
  */
@@ -3871,38 +3857,6 @@ static netdev_tx_t bond_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	return ret;
 }
 
-/*
- * set bond mode specific net device operations
- */
-void bond_set_mode_ops(struct bonding *bond, int mode)
-{
-	struct net_device *bond_dev = bond->dev;
-
-	switch (mode) {
-	case BOND_MODE_ROUNDROBIN:
-		break;
-	case BOND_MODE_ACTIVEBACKUP:
-		break;
-	case BOND_MODE_XOR:
-		bond_set_xmit_hash_policy(bond);
-		break;
-	case BOND_MODE_BROADCAST:
-		break;
-	case BOND_MODE_8023AD:
-		bond_set_xmit_hash_policy(bond);
-		break;
-	case BOND_MODE_ALB:
-		/* FALLTHRU */
-	case BOND_MODE_TLB:
-		break;
-	default:
-		/* Should never happen, mode already checked */
-		pr_err("%s: Error: Unknown bonding mode %d\n",
-		       bond_dev->name, mode);
-		break;
-	}
-}
-
 static int bond_ethtool_get_settings(struct net_device *bond_dev,
 				     struct ethtool_cmd *ecmd)
 {
@@ -4004,7 +3958,6 @@ static void bond_setup(struct net_device *bond_dev)
 	ether_setup(bond_dev);
 	bond_dev->netdev_ops = &bond_netdev_ops;
 	bond_dev->ethtool_ops = &bond_ethtool_ops;
-	bond_set_mode_ops(bond, bond->params.mode);
 
 	bond_dev->destructor = bond_destructor;
 
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index c29b836..dba3b9b 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -352,7 +352,6 @@ static ssize_t bonding_store_mode(struct device *d,
 	/* don't cache arp_validate between modes */
 	bond->params.arp_validate = BOND_ARP_VALIDATE_NONE;
 	bond->params.mode = new_value;
-	bond_set_mode_ops(bond, bond->params.mode);
 	pr_info("%s: setting mode to %s (%d).\n",
 		bond->dev->name, bond_mode_tbl[new_value].modename,
 		new_value);
@@ -392,7 +391,6 @@ static ssize_t bonding_store_xmit_hash(struct device *d,
 		ret = -EINVAL;
 	} else {
 		bond->params.xmit_policy = new_value;
-		bond_set_mode_ops(bond, bond->params.mode);
 		pr_info("%s: setting xmit hash policy to %s (%d).\n",
 			bond->dev->name,
 			xmit_hashtype_tbl[new_value].modename, new_value);
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index 03cf3fd..4db9ec4 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -245,7 +245,6 @@ struct bonding {
 	char     proc_file_name[IFNAMSIZ];
 #endif /* CONFIG_PROC_FS */
 	struct   list_head bond_list;
-	int      (*xmit_hash_policy)(struct sk_buff *, int);
 	u16      rr_tx_counter;
 	struct   ad_bond_info ad_info;
 	struct   alb_bond_info alb_info;
@@ -446,7 +445,7 @@ int bond_release(struct net_device *bond_dev, struct net_device *slave_dev);
 void bond_mii_monitor(struct work_struct *);
 void bond_loadbalance_arp_mon(struct work_struct *);
 void bond_activebackup_arp_mon(struct work_struct *);
-void bond_set_mode_ops(struct bonding *bond, int mode);
+int bond_xmit_hash(struct bonding *bond, struct sk_buff *skb, int count);
 int bond_parse_parm(const char *mode_arg, const struct bond_parm_tbl *tbl);
 void bond_select_active_slave(struct bonding *bond);
 void bond_change_active_slave(struct bonding *bond, struct slave *new_active);
diff --git a/include/uapi/linux/if_bonding.h b/include/uapi/linux/if_bonding.h
index a17edda..9635a62 100644
--- a/include/uapi/linux/if_bonding.h
+++ b/include/uapi/linux/if_bonding.h
@@ -91,6 +91,8 @@
 #define BOND_XMIT_POLICY_LAYER2		0 /* layer 2 (MAC only), default */
 #define BOND_XMIT_POLICY_LAYER34	1 /* layer 3+4 (IP ^ (TCP || UDP)) */
 #define BOND_XMIT_POLICY_LAYER23	2 /* layer 2+3 (IP ^ MAC) */
+#define BOND_XMIT_POLICY_ENCAP23	3 /* encapsulated layer 2+3 */
+#define BOND_XMIT_POLICY_ENCAP34	4 /* encapsulated layer 3+4 */
 
 typedef struct ifbond {
 	__s32 bond_mode;
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH net-next 2/2] bonding: document the new xmit policy modes and update the changed ones
  2013-09-25 15:19 [PATCH net-next 0/2] bonding: modify the current and add new hash functions Nikolay Aleksandrov
  2013-09-25 15:19 ` [PATCH net-next 1/2] bonding: modify the old and add new xmit " Nikolay Aleksandrov
@ 2013-09-25 15:19 ` Nikolay Aleksandrov
  1 sibling, 0 replies; 5+ messages in thread
From: Nikolay Aleksandrov @ 2013-09-25 15:19 UTC (permalink / raw)
  To: netdev; +Cc: davem, andy, fubar, eric.dumazet

Add new documentation for encap2+3 and encap3+4, also update the formula
for the old modes due to the changes.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
---
I'm not good at writing documentation so any suggestions on how to improve
this text are very welcome.

 Documentation/networking/bonding.txt | 66 ++++++++++++++++++++----------------
 1 file changed, 36 insertions(+), 30 deletions(-)

diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index 9b28e71..3856ed2 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -743,21 +743,16 @@ xmit_hash_policy
 		protocol information to generate the hash.
 
 		Uses XOR of hardware MAC addresses and IP addresses to
-		generate the hash.  The IPv4 formula is
+		generate the hash.  The formula is
 
-		(((source IP XOR dest IP) AND 0xffff) XOR
-			( source MAC XOR destination MAC ))
-				modulo slave count
+		hash = source MAC XOR destination MAC
+		hash = hash XOR source IP XOR destination IP
+		hash = hash XOR (hash RSHIFT 16)
+		hash = hash XOR (hash RSHIFT 8)
+		And then hash is reduced modulo slave count.
 
-		The IPv6 formula is
-
-		hash = (source ip quad 2 XOR dest IP quad 2) XOR
-		       (source ip quad 3 XOR dest IP quad 3) XOR
-		       (source ip quad 4 XOR dest IP quad 4)
-
-		(((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash)
-			XOR (source MAC XOR destination MAC))
-				modulo slave count
+		If the protocol is IPv6 then the source and destination
+		addresses are first hashed using ipv6_addr_hash.
 
 		This algorithm will place all traffic to a particular
 		network peer on the same slave.  For non-IP traffic,
@@ -779,21 +774,16 @@ xmit_hash_policy
 		slaves, although a single connection will not span
 		multiple slaves.
 
-		The formula for unfragmented IPv4 TCP and UDP packets is
-
-		((source port XOR dest port) XOR
-			 ((source IP XOR dest IP) AND 0xffff)
-				modulo slave count
+		The formula for unfragmented TCP and UDP packets is
 
-		The formula for unfragmented IPv6 TCP and UDP packets is
+		hash = source port, destination port (as in the header)
+		hash = hash XOR source IP XOR destination IP
+		hash = hash XOR (hash RSHIFT 16)
+		hash = hash XOR (hash RSHIFT 8)
+		And then hash is reduced modulo slave count.
 
-		hash = (source port XOR dest port) XOR
-		       ((source ip quad 2 XOR dest IP quad 2) XOR
-			(source ip quad 3 XOR dest IP quad 3) XOR
-			(source ip quad 4 XOR dest IP quad 4))
-
-		((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash)
-			modulo slave count
+		If the protocol is IPv6 then the source and destination
+		addresses are first hashed using ipv6_addr_hash.
 
 		For fragmented TCP or UDP packets and all other IPv4 and
 		IPv6 protocol traffic, the source and destination port
@@ -801,10 +791,6 @@ xmit_hash_policy
 		formula is the same as for the layer2 transmit hash
 		policy.
 
-		The IPv4 policy is intended to mimic the behavior of
-		certain switches, notably Cisco switches with PFC2 as
-		well as some Foundry and IBM products.
-
 		This algorithm is not fully 802.3ad compliant.  A
 		single TCP or UDP conversation containing both
 		fragmented and unfragmented packets will see packets
@@ -815,6 +801,26 @@ xmit_hash_policy
 		conversations.  Other implementations of 802.3ad may
 		or may not tolerate this noncompliance.
 
+	encap2+3
+
+		This policy uses the same formula as layer2+3 but it
+		relies on skb_flow_dissect to obtain the header fields
+		which might result in the use of inner headers if an
+		encapsulation protocol is used. For example this will
+		improve the performance for tunnel users because the
+		packets will be distributed according to the encapsulated
+		flows.
+
+	encap3+4
+
+		This policy uses the same formula as layer3+4 but it
+		relies on skb_flow_dissect to obtain the header fields
+		which might result in the use of inner headers if an
+		encapsulation protocol is used. For example this will
+		improve the performance for tunnel users because the
+		packets will be distributed according to the encapsulated
+		flows.
+
 	The default value is layer2.  This option was added in bonding
 	version 2.6.3.  In earlier versions of bonding, this parameter
 	does not exist, and the layer2 policy is the only policy.  The
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH net-next 1/2] bonding: modify the old and add new xmit hash functions
  2013-09-25 15:19 ` [PATCH net-next 1/2] bonding: modify the old and add new xmit " Nikolay Aleksandrov
@ 2013-09-25 16:26   ` Veaceslav Falico
  2013-09-25 16:36     ` Nikolay Aleksandrov
  0 siblings, 1 reply; 5+ messages in thread
From: Veaceslav Falico @ 2013-09-25 16:26 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: netdev, davem, andy, fubar, eric.dumazet

On Wed, Sep 25, 2013 at 05:19:57PM +0200, Nikolay Aleksandrov wrote:
>This patch adds two new hash policy modes which use skb_flow_dissect:
>3 - Encapsulated layer 2+3
>4 - Encapsulated layer 3+4
>There should be a good improvement for tunnel users in those modes.
>It also changes the old hash functions to:
>hash ^= (__force u32)flow.dst ^ (__force u32)flow.src;
>hash ^= (hash >> 16);
>hash ^= (hash >> 8);
>
>Where hash will be initialized either to L2 hash, that is
>SRCMAC[5] XOR DSTMAC[5], or to flow->ports which should be extracted
>from the upper layer. Flow's dst and src are also extracted based on the
>xmit policy either directly from the buffer or by using skb_flow_dissect,
>but in both cases if the protocol is IPv6 then dst and src are obtained by
>ipv6_addr_hash() on the real addresses. In case of a non-dissectable
>packet, the algorithms fall back to L2 hashing.
>The bond_set_mode_ops() function is now obsolete and thus deleted
>because it was used only to set the proper hash policy. Also we trim a
>pointer from struct bonding because we no longer need to keep the hash
>function, now there's only a single hash function - bond_xmit_hash that
>works based on bond->params.xmit_policy.
>
>The hash function and skb_flow_dissect were suggested by Eric Dumazet.
>The layer names were suggested by Andy Gospodarek, because I suck at
>semantics.
>
>Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
>---
>One line is intentionally left at 82 chars since it's the whole function
>and IMO looks better that way.
>
> drivers/net/bonding/bond_3ad.c   |   2 +-
> drivers/net/bonding/bond_main.c  | 211 +++++++++++++++------------------------
> drivers/net/bonding/bond_sysfs.c |   2 -
> drivers/net/bonding/bonding.h    |   3 +-
> include/uapi/linux/if_bonding.h  |   2 +
> 5 files changed, 86 insertions(+), 134 deletions(-)

Nice!!! Though some comments below.

>
>diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>index 0d8f427..b3ab703 100644
>--- a/drivers/net/bonding/bond_3ad.c
>+++ b/drivers/net/bonding/bond_3ad.c
>@@ -2442,7 +2442,7 @@ int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev)
> 		goto out;
> 	}
>
>-	slave_agg_no = bond->xmit_hash_policy(skb, slaves_in_agg);
>+	slave_agg_no = bond_xmit_hash(bond, skb, slaves_in_agg);
>
> 	bond_for_each_slave(bond, slave) {
> 		struct aggregator *agg = SLAVE_AD_INFO(slave).port.aggregator;
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 55bbb8b..73e416b 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -78,6 +78,7 @@
> #include <net/netns/generic.h>
> #include <net/pkt_sched.h>
> #include <linux/rculist.h>
>+#include <net/flow_keys.h>
> #include "bonding.h"
> #include "bond_3ad.h"
> #include "bond_alb.h"
>@@ -159,7 +160,8 @@ MODULE_PARM_DESC(min_links, "Minimum number of available links before turning on
> module_param(xmit_hash_policy, charp, 0);
> MODULE_PARM_DESC(xmit_hash_policy, "balance-xor and 802.3ad hashing method; "
> 				   "0 for layer 2 (default), 1 for layer 3+4, "
>-				   "2 for layer 2+3");
>+				   "2 for layer 2+3, 3 for encap layer 2+3, "
>+				   "4 for encap layer 3+4");
> module_param(arp_interval, int, 0);
> MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds");
> module_param_array(arp_ip_target, charp, NULL, 0);
>@@ -217,6 +219,8 @@ const struct bond_parm_tbl xmit_hashtype_tbl[] = {
> {	"layer2",		BOND_XMIT_POLICY_LAYER2},
> {	"layer3+4",		BOND_XMIT_POLICY_LAYER34},
> {	"layer2+3",		BOND_XMIT_POLICY_LAYER23},
>+{	"encap2+3",		BOND_XMIT_POLICY_ENCAP23},
>+{	"encap3+4",		BOND_XMIT_POLICY_ENCAP34},
> {	NULL,			-1},
> };
>
>@@ -3026,99 +3030,99 @@ static struct notifier_block bond_netdev_notifier = {
>
> /*---------------------------- Hashing Policies -----------------------------*/
>
>-/*
>- * Hash for the output device based upon layer 2 data
>- */
>-static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count)
>+/* L2 hash helper */
>+static inline u32 bond_eth_hash(struct sk_buff *skb)
> {
> 	struct ethhdr *data = (struct ethhdr *)skb->data;
>
> 	if (skb_headlen(skb) >= offsetof(struct ethhdr, h_proto))
>-		return (data->h_dest[5] ^ data->h_source[5]) % count;
>+		return data->h_dest[5] ^ data->h_source[5];
>
> 	return 0;
> }
>
>-/*
>- * Hash for the output device based upon layer 2 and layer 3 data. If
>- * the packet is not IP, fall back on bond_xmit_hash_policy_l2()
>- */
>-static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count)
>+static void bond_flow_get_ports(struct sk_buff *skb, struct flow_keys *fk,
>+				int offset)
>+{
>+	__be32 *ports;
>+
>+	ports = skb_header_pointer(skb, offset, sizeof(fk->ports), &fk->ports);
>+	if (ports)
>+		fk->ports = *ports;
>+}
>+
>+/* Extract the appropriate headers based on bond's xmit policy */
>+static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
>+			      struct flow_keys *fk)
> {
>-	const struct ethhdr *data;
>+	const struct ipv6hdr *iph6;
> 	const struct iphdr *iph;
>-	const struct ipv6hdr *ipv6h;
>-	u32 v6hash;
>-	const __be32 *s, *d;
>+	int noff, proto = -1, poff = -1;

Nitpick: Useless initialization of poff.

>+	bool ret = false;
>+
>+	if (bond->params.xmit_policy > BOND_XMIT_POLICY_LAYER23)
>+		return skb_flow_dissect(skb, fk);
>
>-	if (skb->protocol == htons(ETH_P_IP) &&
>-	    pskb_network_may_pull(skb, sizeof(*iph))) {
>+	fk->ports = 0;
>+	noff = skb_network_offset(skb);
>+	if (skb->protocol == htons(ETH_P_IP)) {
>+		if (!pskb_may_pull(skb, noff + sizeof(*iph)))
>+			goto out;
> 		iph = ip_hdr(skb);
>-		data = (struct ethhdr *)skb->data;
>-		return ((ntohl(iph->saddr ^ iph->daddr) & 0xffff) ^
>-			(data->h_dest[5] ^ data->h_source[5])) % count;
>-	} else if (skb->protocol == htons(ETH_P_IPV6) &&
>-		   pskb_network_may_pull(skb, sizeof(*ipv6h))) {
>-		ipv6h = ipv6_hdr(skb);
>-		data = (struct ethhdr *)skb->data;
>-		s = &ipv6h->saddr.s6_addr32[0];
>-		d = &ipv6h->daddr.s6_addr32[0];
>-		v6hash = (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]);
>-		v6hash ^= (v6hash >> 24) ^ (v6hash >> 16) ^ (v6hash >> 8);
>-		return (v6hash ^ data->h_dest[5] ^ data->h_source[5]) % count;
>-	}
>-
>-	return bond_xmit_hash_policy_l2(skb, count);
>+		fk->src = iph->saddr;
>+		fk->dst = iph->daddr;
>+		noff += iph->ihl << 2;
>+		if (!ip_is_fragment(iph))
>+			proto = iph->protocol;
>+	} else if (skb->protocol == htons(ETH_P_IPV6)) {
>+		if (!pskb_may_pull(skb, noff + sizeof(*iph6)))
>+			goto out;
>+		iph6 = ipv6_hdr(skb);
>+		fk->src = (__force __be32)ipv6_addr_hash(&iph6->saddr);
>+		fk->dst = (__force __be32)ipv6_addr_hash(&iph6->daddr);
>+		noff += sizeof(*iph6);
>+		proto = iph6->nexthdr;
>+	}

else
	return false;

?

Otherwise we might report false-positive for vlans, per example, without
populating the flow keys.

>+	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER34) {
>+		poff = proto_ports_offset(proto);
>+		if (poff >= 0)
>+			bond_flow_get_ports(skb, fk, poff + noff);
>+	}

One idea would be to move the same initialization code from
skb_flow_dissect() to an external function and re-use it here?

>+	ret = true;
>+
>+out:
>+	return ret;
> }
>
>-/*
>- * Hash for the output device based upon layer 3 and layer 4 data. If
>- * the packet is a frag or not TCP or UDP, just use layer 3 data.  If it is
>- * altogether not IP, fall back on bond_xmit_hash_policy_l2()
>+/**
>+ * bond_xmit_hash - generate a hash value based on the xmit policy
>+ * @bond: bonding device
>+ * @skb: buffer to use for headers
>+ * @count: modulo value
>+ *
>+ * This function will extract the necessary headers from the skb buffer and use
>+ * them to generate a hash based on the xmit_policy set in the bonding device
>+ * which will be reduced modulo count before returning.
>  */
>-static int bond_xmit_hash_policy_l34(struct sk_buff *skb, int count)
>+int bond_xmit_hash(struct bonding *bond, struct sk_buff *skb, int count)
> {
>-	u32 layer4_xor = 0;
>-	const struct iphdr *iph;
>-	const struct ipv6hdr *ipv6h;
>-	const __be32 *s, *d;
>-	const __be16 *l4 = NULL;
>-	__be16 _l4[2];
>-	int noff = skb_network_offset(skb);
>-	int poff;
>-
>-	if (skb->protocol == htons(ETH_P_IP) &&
>-	    pskb_may_pull(skb, noff + sizeof(*iph))) {
>-		iph = ip_hdr(skb);
>-		poff = proto_ports_offset(iph->protocol);
>+	struct flow_keys flow;
>+	u32 hash;
>
>-		if (!ip_is_fragment(iph) && poff >= 0) {
>-			l4 = skb_header_pointer(skb, noff + (iph->ihl << 2) + poff,
>-						sizeof(_l4), &_l4);
>-			if (l4)
>-				layer4_xor = ntohs(l4[0] ^ l4[1]);
>-		}
>-		return (layer4_xor ^
>-			((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count;
>-	} else if (skb->protocol == htons(ETH_P_IPV6) &&
>-		   pskb_may_pull(skb, noff + sizeof(*ipv6h))) {
>-		ipv6h = ipv6_hdr(skb);
>-		poff = proto_ports_offset(ipv6h->nexthdr);
>-		if (poff >= 0) {
>-			l4 = skb_header_pointer(skb, noff + sizeof(*ipv6h) + poff,
>-						sizeof(_l4), &_l4);
>-			if (l4)
>-				layer4_xor = ntohs(l4[0] ^ l4[1]);
>-		}
>-		s = &ipv6h->saddr.s6_addr32[0];
>-		d = &ipv6h->daddr.s6_addr32[0];
>-		layer4_xor ^= (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]);
>-		layer4_xor ^= (layer4_xor >> 24) ^ (layer4_xor >> 16) ^
>-			       (layer4_xor >> 8);
>-		return layer4_xor % count;
>-	}
>+	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER2 ||
>+	    !bond_flow_dissect(bond, skb, &flow))
>+		return bond_eth_hash(skb) % count;
>+
>+	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER23 ||
>+	    bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP23)
>+		hash = bond_eth_hash(skb);
>+	else
>+		hash = (__force u32)flow.ports;
>+	hash ^= (__force u32)flow.dst ^ (__force u32)flow.src;
>+	hash ^= (hash >> 16);
>+	hash ^= (hash >> 8);
>
>-	return bond_xmit_hash_policy_l2(skb, count);
>+	return hash % count;
> }
>
> /*-------------------------- Device entry points ----------------------------*/
>@@ -3700,8 +3704,7 @@ static int bond_xmit_activebackup(struct sk_buff *skb, struct net_device *bond_d
> 	return NETDEV_TX_OK;
> }
>
>-/*
>- * In bond_xmit_xor() , we determine the output device by using a pre-
>+/* In bond_xmit_xor() , we determine the output device by using a pre-
>  * determined xmit_hash_policy(), If the selected device is not enabled,
>  * find the next active slave.
>  */
>@@ -3709,8 +3712,7 @@ static int bond_xmit_xor(struct sk_buff *skb, struct net_device *bond_dev)
> {
> 	struct bonding *bond = netdev_priv(bond_dev);
>
>-	bond_xmit_slave_id(bond, skb,
>-			   bond->xmit_hash_policy(skb, bond->slave_cnt));
>+	bond_xmit_slave_id(bond, skb, bond_xmit_hash(bond, skb, bond->slave_cnt));
>
> 	return NETDEV_TX_OK;
> }
>@@ -3746,22 +3748,6 @@ static int bond_xmit_broadcast(struct sk_buff *skb, struct net_device *bond_dev)
>
> /*------------------------- Device initialization ---------------------------*/
>
>-static void bond_set_xmit_hash_policy(struct bonding *bond)
>-{
>-	switch (bond->params.xmit_policy) {
>-	case BOND_XMIT_POLICY_LAYER23:
>-		bond->xmit_hash_policy = bond_xmit_hash_policy_l23;
>-		break;
>-	case BOND_XMIT_POLICY_LAYER34:
>-		bond->xmit_hash_policy = bond_xmit_hash_policy_l34;
>-		break;
>-	case BOND_XMIT_POLICY_LAYER2:
>-	default:
>-		bond->xmit_hash_policy = bond_xmit_hash_policy_l2;
>-		break;
>-	}
>-}
>-
> /*
>  * Lookup the slave that corresponds to a qid
>  */
>@@ -3871,38 +3857,6 @@ static netdev_tx_t bond_start_xmit(struct sk_buff *skb, struct net_device *dev)
> 	return ret;
> }
>
>-/*
>- * set bond mode specific net device operations
>- */
>-void bond_set_mode_ops(struct bonding *bond, int mode)
>-{
>-	struct net_device *bond_dev = bond->dev;
>-
>-	switch (mode) {
>-	case BOND_MODE_ROUNDROBIN:
>-		break;
>-	case BOND_MODE_ACTIVEBACKUP:
>-		break;
>-	case BOND_MODE_XOR:
>-		bond_set_xmit_hash_policy(bond);
>-		break;
>-	case BOND_MODE_BROADCAST:
>-		break;
>-	case BOND_MODE_8023AD:
>-		bond_set_xmit_hash_policy(bond);
>-		break;
>-	case BOND_MODE_ALB:
>-		/* FALLTHRU */
>-	case BOND_MODE_TLB:
>-		break;
>-	default:
>-		/* Should never happen, mode already checked */
>-		pr_err("%s: Error: Unknown bonding mode %d\n",
>-		       bond_dev->name, mode);
>-		break;
>-	}
>-}
>-
> static int bond_ethtool_get_settings(struct net_device *bond_dev,
> 				     struct ethtool_cmd *ecmd)
> {
>@@ -4004,7 +3958,6 @@ static void bond_setup(struct net_device *bond_dev)
> 	ether_setup(bond_dev);
> 	bond_dev->netdev_ops = &bond_netdev_ops;
> 	bond_dev->ethtool_ops = &bond_ethtool_ops;
>-	bond_set_mode_ops(bond, bond->params.mode);
>
> 	bond_dev->destructor = bond_destructor;
>
>diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
>index c29b836..dba3b9b 100644
>--- a/drivers/net/bonding/bond_sysfs.c
>+++ b/drivers/net/bonding/bond_sysfs.c
>@@ -352,7 +352,6 @@ static ssize_t bonding_store_mode(struct device *d,
> 	/* don't cache arp_validate between modes */
> 	bond->params.arp_validate = BOND_ARP_VALIDATE_NONE;
> 	bond->params.mode = new_value;
>-	bond_set_mode_ops(bond, bond->params.mode);
> 	pr_info("%s: setting mode to %s (%d).\n",
> 		bond->dev->name, bond_mode_tbl[new_value].modename,
> 		new_value);
>@@ -392,7 +391,6 @@ static ssize_t bonding_store_xmit_hash(struct device *d,
> 		ret = -EINVAL;
> 	} else {
> 		bond->params.xmit_policy = new_value;
>-		bond_set_mode_ops(bond, bond->params.mode);
> 		pr_info("%s: setting xmit hash policy to %s (%d).\n",
> 			bond->dev->name,
> 			xmit_hashtype_tbl[new_value].modename, new_value);
>diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
>index 03cf3fd..4db9ec4 100644
>--- a/drivers/net/bonding/bonding.h
>+++ b/drivers/net/bonding/bonding.h
>@@ -245,7 +245,6 @@ struct bonding {
> 	char     proc_file_name[IFNAMSIZ];
> #endif /* CONFIG_PROC_FS */
> 	struct   list_head bond_list;
>-	int      (*xmit_hash_policy)(struct sk_buff *, int);
> 	u16      rr_tx_counter;
> 	struct   ad_bond_info ad_info;
> 	struct   alb_bond_info alb_info;
>@@ -446,7 +445,7 @@ int bond_release(struct net_device *bond_dev, struct net_device *slave_dev);
> void bond_mii_monitor(struct work_struct *);
> void bond_loadbalance_arp_mon(struct work_struct *);
> void bond_activebackup_arp_mon(struct work_struct *);
>-void bond_set_mode_ops(struct bonding *bond, int mode);
>+int bond_xmit_hash(struct bonding *bond, struct sk_buff *skb, int count);
> int bond_parse_parm(const char *mode_arg, const struct bond_parm_tbl *tbl);
> void bond_select_active_slave(struct bonding *bond);
> void bond_change_active_slave(struct bonding *bond, struct slave *new_active);
>diff --git a/include/uapi/linux/if_bonding.h b/include/uapi/linux/if_bonding.h
>index a17edda..9635a62 100644
>--- a/include/uapi/linux/if_bonding.h
>+++ b/include/uapi/linux/if_bonding.h
>@@ -91,6 +91,8 @@
> #define BOND_XMIT_POLICY_LAYER2		0 /* layer 2 (MAC only), default */
> #define BOND_XMIT_POLICY_LAYER34	1 /* layer 3+4 (IP ^ (TCP || UDP)) */
> #define BOND_XMIT_POLICY_LAYER23	2 /* layer 2+3 (IP ^ MAC) */
>+#define BOND_XMIT_POLICY_ENCAP23	3 /* encapsulated layer 2+3 */
>+#define BOND_XMIT_POLICY_ENCAP34	4 /* encapsulated layer 3+4 */
>
> typedef struct ifbond {
> 	__s32 bond_mode;
>-- 
>1.8.1.4
>
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net-next 1/2] bonding: modify the old and add new xmit hash functions
  2013-09-25 16:26   ` Veaceslav Falico
@ 2013-09-25 16:36     ` Nikolay Aleksandrov
  0 siblings, 0 replies; 5+ messages in thread
From: Nikolay Aleksandrov @ 2013-09-25 16:36 UTC (permalink / raw)
  To: Veaceslav Falico; +Cc: netdev, davem, andy, fubar, eric.dumazet

On 09/25/2013 06:26 PM, Veaceslav Falico wrote:
> On Wed, Sep 25, 2013 at 05:19:57PM +0200, Nikolay Aleksandrov wrote:
>> This patch adds two new hash policy modes which use skb_flow_dissect:
>> 3 - Encapsulated layer 2+3
>> 4 - Encapsulated layer 3+4
>> There should be a good improvement for tunnel users in those modes.
>> It also changes the old hash functions to:
>> hash ^= (__force u32)flow.dst ^ (__force u32)flow.src;
>> hash ^= (hash >> 16);
>> hash ^= (hash >> 8);
>>
>> Where hash will be initialized either to L2 hash, that is
>> SRCMAC[5] XOR DSTMAC[5], or to flow->ports which should be extracted
>> from the upper layer. Flow's dst and src are also extracted based on the
>> xmit policy either directly from the buffer or by using skb_flow_dissect,
>> but in both cases if the protocol is IPv6 then dst and src are obtained by
>> ipv6_addr_hash() on the real addresses. In case of a non-dissectable
>> packet, the algorithms fall back to L2 hashing.
>> The bond_set_mode_ops() function is now obsolete and thus deleted
>> because it was used only to set the proper hash policy. Also we trim a
>> pointer from struct bonding because we no longer need to keep the hash
>> function, now there's only a single hash function - bond_xmit_hash that
>> works based on bond->params.xmit_policy.
>>
>> The hash function and skb_flow_dissect were suggested by Eric Dumazet.
>> The layer names were suggested by Andy Gospodarek, because I suck at
>> semantics.
>>
>> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
>> ---
>> One line is intentionally left at 82 chars since it's the whole function
>> and IMO looks better that way.
>>
>> drivers/net/bonding/bond_3ad.c   |   2 +-
>> drivers/net/bonding/bond_main.c  | 211
>> +++++++++++++++------------------------
>> drivers/net/bonding/bond_sysfs.c |   2 -
>> drivers/net/bonding/bonding.h    |   3 +-
>> include/uapi/linux/if_bonding.h  |   2 +
>> 5 files changed, 86 insertions(+), 134 deletions(-)
> 
> Nice!!! Though some comments below.
> 
>>
>> diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>> index 0d8f427..b3ab703 100644
>> --- a/drivers/net/bonding/bond_3ad.c
>> +++ b/drivers/net/bonding/bond_3ad.c
>> @@ -2442,7 +2442,7 @@ int bond_3ad_xmit_xor(struct sk_buff *skb, struct
>> net_device *dev)
>>         goto out;
>>     }
>>
>> -    slave_agg_no = bond->xmit_hash_policy(skb, slaves_in_agg);
>> +    slave_agg_no = bond_xmit_hash(bond, skb, slaves_in_agg);
>>
>>     bond_for_each_slave(bond, slave) {
>>         struct aggregator *agg = SLAVE_AD_INFO(slave).port.aggregator;
>> diff --git a/drivers/net/bonding/bond_main.c
>> b/drivers/net/bonding/bond_main.c
>> index 55bbb8b..73e416b 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -78,6 +78,7 @@
>> #include <net/netns/generic.h>
>> #include <net/pkt_sched.h>
>> #include <linux/rculist.h>
>> +#include <net/flow_keys.h>
>> #include "bonding.h"
>> #include "bond_3ad.h"
>> #include "bond_alb.h"
>> @@ -159,7 +160,8 @@ MODULE_PARM_DESC(min_links, "Minimum number of
>> available links before turning on
>> module_param(xmit_hash_policy, charp, 0);
>> MODULE_PARM_DESC(xmit_hash_policy, "balance-xor and 802.3ad hashing
>> method; "
>>                    "0 for layer 2 (default), 1 for layer 3+4, "
>> -                   "2 for layer 2+3");
>> +                   "2 for layer 2+3, 3 for encap layer 2+3, "
>> +                   "4 for encap layer 3+4");
>> module_param(arp_interval, int, 0);
>> MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds");
>> module_param_array(arp_ip_target, charp, NULL, 0);
>> @@ -217,6 +219,8 @@ const struct bond_parm_tbl xmit_hashtype_tbl[] = {
>> {    "layer2",        BOND_XMIT_POLICY_LAYER2},
>> {    "layer3+4",        BOND_XMIT_POLICY_LAYER34},
>> {    "layer2+3",        BOND_XMIT_POLICY_LAYER23},
>> +{    "encap2+3",        BOND_XMIT_POLICY_ENCAP23},
>> +{    "encap3+4",        BOND_XMIT_POLICY_ENCAP34},
>> {    NULL,            -1},
>> };
>>
>> @@ -3026,99 +3030,99 @@ static struct notifier_block bond_netdev_notifier
>> = {
>>
>> /*---------------------------- Hashing Policies
>> -----------------------------*/
>>
>> -/*
>> - * Hash for the output device based upon layer 2 data
>> - */
>> -static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count)
>> +/* L2 hash helper */
>> +static inline u32 bond_eth_hash(struct sk_buff *skb)
>> {
>>     struct ethhdr *data = (struct ethhdr *)skb->data;
>>
>>     if (skb_headlen(skb) >= offsetof(struct ethhdr, h_proto))
>> -        return (data->h_dest[5] ^ data->h_source[5]) % count;
>> +        return data->h_dest[5] ^ data->h_source[5];
>>
>>     return 0;
>> }
>>
>> -/*
>> - * Hash for the output device based upon layer 2 and layer 3 data. If
>> - * the packet is not IP, fall back on bond_xmit_hash_policy_l2()
>> - */
>> -static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count)
>> +static void bond_flow_get_ports(struct sk_buff *skb, struct flow_keys *fk,
>> +                int offset)
>> +{
>> +    __be32 *ports;
>> +
>> +    ports = skb_header_pointer(skb, offset, sizeof(fk->ports), &fk->ports);
>> +    if (ports)
>> +        fk->ports = *ports;
>> +}
>> +
>> +/* Extract the appropriate headers based on bond's xmit policy */
>> +static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
>> +                  struct flow_keys *fk)
>> {
>> -    const struct ethhdr *data;
>> +    const struct ipv6hdr *iph6;
>>     const struct iphdr *iph;
>> -    const struct ipv6hdr *ipv6h;
>> -    u32 v6hash;
>> -    const __be32 *s, *d;
>> +    int noff, proto = -1, poff = -1;
> 
> Nitpick: Useless initialization of poff.
> 
Yeah, I saw it after posting and have this in the prepared v2 :-)

>> +    bool ret = false;
>> +
>> +    if (bond->params.xmit_policy > BOND_XMIT_POLICY_LAYER23)
>> +        return skb_flow_dissect(skb, fk);
>>
>> -    if (skb->protocol == htons(ETH_P_IP) &&
>> -        pskb_network_may_pull(skb, sizeof(*iph))) {
>> +    fk->ports = 0;
>> +    noff = skb_network_offset(skb);
>> +    if (skb->protocol == htons(ETH_P_IP)) {
>> +        if (!pskb_may_pull(skb, noff + sizeof(*iph)))
>> +            goto out;
>>         iph = ip_hdr(skb);
>> -        data = (struct ethhdr *)skb->data;
>> -        return ((ntohl(iph->saddr ^ iph->daddr) & 0xffff) ^
>> -            (data->h_dest[5] ^ data->h_source[5])) % count;
>> -    } else if (skb->protocol == htons(ETH_P_IPV6) &&
>> -           pskb_network_may_pull(skb, sizeof(*ipv6h))) {
>> -        ipv6h = ipv6_hdr(skb);
>> -        data = (struct ethhdr *)skb->data;
>> -        s = &ipv6h->saddr.s6_addr32[0];
>> -        d = &ipv6h->daddr.s6_addr32[0];
>> -        v6hash = (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]);
>> -        v6hash ^= (v6hash >> 24) ^ (v6hash >> 16) ^ (v6hash >> 8);
>> -        return (v6hash ^ data->h_dest[5] ^ data->h_source[5]) % count;
>> -    }
>> -
>> -    return bond_xmit_hash_policy_l2(skb, count);
>> +        fk->src = iph->saddr;
>> +        fk->dst = iph->daddr;
>> +        noff += iph->ihl << 2;
>> +        if (!ip_is_fragment(iph))
>> +            proto = iph->protocol;
>> +    } else if (skb->protocol == htons(ETH_P_IPV6)) {
>> +        if (!pskb_may_pull(skb, noff + sizeof(*iph6)))
>> +            goto out;
>> +        iph6 = ipv6_hdr(skb);
>> +        fk->src = (__force __be32)ipv6_addr_hash(&iph6->saddr);
>> +        fk->dst = (__force __be32)ipv6_addr_hash(&iph6->daddr);
>> +        noff += sizeof(*iph6);
>> +        proto = iph6->nexthdr;
>> +    }
> 
> else
>     return false;
> 
> ?
> 
Hehe, I actually had that in v1 few months ago, I was wondering why I did
it that way...

> Otherwise we might report false-positive for vlans, per example, without
> populating the flow keys.
> 
Right.

>> +    if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER34) {
>> +        poff = proto_ports_offset(proto);
>> +        if (poff >= 0)
>> +            bond_flow_get_ports(skb, fk, poff + noff);
>> +    }
> 
> One idea would be to move the same initialization code from
> skb_flow_dissect() to an external function and re-use it here?
> 
Yep, I could do an inline skb_flow_get_ports() and re-use it.

>> +    ret = true;
>> +
>> +out:
>> +    return ret;
>> }
>>
>> -/*
>> - * Hash for the output device based upon layer 3 and layer 4 data. If
>> - * the packet is a frag or not TCP or UDP, just use layer 3 data.  If it is
>> - * altogether not IP, fall back on bond_xmit_hash_policy_l2()
>> +/**
>> + * bond_xmit_hash - generate a hash value based on the xmit policy
>> + * @bond: bonding device
>> + * @skb: buffer to use for headers
>> + * @count: modulo value
>> + *
>> + * This function will extract the necessary headers from the skb buffer
>> and use
>> + * them to generate a hash based on the xmit_policy set in the bonding
>> device
>> + * which will be reduced modulo count before returning.
>>  */
>> -static int bond_xmit_hash_policy_l34(struct sk_buff *skb, int count)
>> +int bond_xmit_hash(struct bonding *bond, struct sk_buff *skb, int count)
>> {
>> -    u32 layer4_xor = 0;
>> -    const struct iphdr *iph;
>> -    const struct ipv6hdr *ipv6h;
>> -    const __be32 *s, *d;
>> -    const __be16 *l4 = NULL;
>> -    __be16 _l4[2];
>> -    int noff = skb_network_offset(skb);
>> -    int poff;
>> -
>> -    if (skb->protocol == htons(ETH_P_IP) &&
>> -        pskb_may_pull(skb, noff + sizeof(*iph))) {
>> -        iph = ip_hdr(skb);
>> -        poff = proto_ports_offset(iph->protocol);
>> +    struct flow_keys flow;
>> +    u32 hash;
>>
>> -        if (!ip_is_fragment(iph) && poff >= 0) {
>> -            l4 = skb_header_pointer(skb, noff + (iph->ihl << 2) + poff,
>> -                        sizeof(_l4), &_l4);
>> -            if (l4)
>> -                layer4_xor = ntohs(l4[0] ^ l4[1]);
>> -        }
>> -        return (layer4_xor ^
>> -            ((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count;
>> -    } else if (skb->protocol == htons(ETH_P_IPV6) &&
>> -           pskb_may_pull(skb, noff + sizeof(*ipv6h))) {
>> -        ipv6h = ipv6_hdr(skb);
>> -        poff = proto_ports_offset(ipv6h->nexthdr);
>> -        if (poff >= 0) {
>> -            l4 = skb_header_pointer(skb, noff + sizeof(*ipv6h) + poff,
>> -                        sizeof(_l4), &_l4);
>> -            if (l4)
>> -                layer4_xor = ntohs(l4[0] ^ l4[1]);
>> -        }
>> -        s = &ipv6h->saddr.s6_addr32[0];
>> -        d = &ipv6h->daddr.s6_addr32[0];
>> -        layer4_xor ^= (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]);
>> -        layer4_xor ^= (layer4_xor >> 24) ^ (layer4_xor >> 16) ^
>> -                   (layer4_xor >> 8);
>> -        return layer4_xor % count;
>> -    }
>> +    if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER2 ||
>> +        !bond_flow_dissect(bond, skb, &flow))
>> +        return bond_eth_hash(skb) % count;
>> +
>> +    if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER23 ||
>> +        bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP23)
>> +        hash = bond_eth_hash(skb);
>> +    else
>> +        hash = (__force u32)flow.ports;
>> +    hash ^= (__force u32)flow.dst ^ (__force u32)flow.src;
>> +    hash ^= (hash >> 16);
>> +    hash ^= (hash >> 8);
>>
>> -    return bond_xmit_hash_policy_l2(skb, count);
>> +    return hash % count;
>> }
>>
>> /*-------------------------- Device entry points
>> ----------------------------*/
>> @@ -3700,8 +3704,7 @@ static int bond_xmit_activebackup(struct sk_buff
>> *skb, struct net_device *bond_d
>>     return NETDEV_TX_OK;
>> }
>>
>> -/*
>> - * In bond_xmit_xor() , we determine the output device by using a pre-
>> +/* In bond_xmit_xor() , we determine the output device by using a pre-
>>  * determined xmit_hash_policy(), If the selected device is not enabled,
>>  * find the next active slave.
>>  */
>> @@ -3709,8 +3712,7 @@ static int bond_xmit_xor(struct sk_buff *skb,
>> struct net_device *bond_dev)
>> {
>>     struct bonding *bond = netdev_priv(bond_dev);
>>
>> -    bond_xmit_slave_id(bond, skb,
>> -               bond->xmit_hash_policy(skb, bond->slave_cnt));
>> +    bond_xmit_slave_id(bond, skb, bond_xmit_hash(bond, skb,
>> bond->slave_cnt));
>>
>>     return NETDEV_TX_OK;
>> }
>> @@ -3746,22 +3748,6 @@ static int bond_xmit_broadcast(struct sk_buff
>> *skb, struct net_device *bond_dev)
>>
>> /*------------------------- Device initialization
>> ---------------------------*/
>>
>> -static void bond_set_xmit_hash_policy(struct bonding *bond)
>> -{
>> -    switch (bond->params.xmit_policy) {
>> -    case BOND_XMIT_POLICY_LAYER23:
>> -        bond->xmit_hash_policy = bond_xmit_hash_policy_l23;
>> -        break;
>> -    case BOND_XMIT_POLICY_LAYER34:
>> -        bond->xmit_hash_policy = bond_xmit_hash_policy_l34;
>> -        break;
>> -    case BOND_XMIT_POLICY_LAYER2:
>> -    default:
>> -        bond->xmit_hash_policy = bond_xmit_hash_policy_l2;
>> -        break;
>> -    }
>> -}
>> -
>> /*
>>  * Lookup the slave that corresponds to a qid
>>  */
>> @@ -3871,38 +3857,6 @@ static netdev_tx_t bond_start_xmit(struct sk_buff
>> *skb, struct net_device *dev)
>>     return ret;
>> }
>>
>> -/*
>> - * set bond mode specific net device operations
>> - */
>> -void bond_set_mode_ops(struct bonding *bond, int mode)
>> -{
>> -    struct net_device *bond_dev = bond->dev;
>> -
>> -    switch (mode) {
>> -    case BOND_MODE_ROUNDROBIN:
>> -        break;
>> -    case BOND_MODE_ACTIVEBACKUP:
>> -        break;
>> -    case BOND_MODE_XOR:
>> -        bond_set_xmit_hash_policy(bond);
>> -        break;
>> -    case BOND_MODE_BROADCAST:
>> -        break;
>> -    case BOND_MODE_8023AD:
>> -        bond_set_xmit_hash_policy(bond);
>> -        break;
>> -    case BOND_MODE_ALB:
>> -        /* FALLTHRU */
>> -    case BOND_MODE_TLB:
>> -        break;
>> -    default:
>> -        /* Should never happen, mode already checked */
>> -        pr_err("%s: Error: Unknown bonding mode %d\n",
>> -               bond_dev->name, mode);
>> -        break;
>> -    }
>> -}
>> -
>> static int bond_ethtool_get_settings(struct net_device *bond_dev,
>>                      struct ethtool_cmd *ecmd)
>> {
>> @@ -4004,7 +3958,6 @@ static void bond_setup(struct net_device *bond_dev)
>>     ether_setup(bond_dev);
>>     bond_dev->netdev_ops = &bond_netdev_ops;
>>     bond_dev->ethtool_ops = &bond_ethtool_ops;
>> -    bond_set_mode_ops(bond, bond->params.mode);
>>
>>     bond_dev->destructor = bond_destructor;
>>
>> diff --git a/drivers/net/bonding/bond_sysfs.c
>> b/drivers/net/bonding/bond_sysfs.c
>> index c29b836..dba3b9b 100644
>> --- a/drivers/net/bonding/bond_sysfs.c
>> +++ b/drivers/net/bonding/bond_sysfs.c
>> @@ -352,7 +352,6 @@ static ssize_t bonding_store_mode(struct device *d,
>>     /* don't cache arp_validate between modes */
>>     bond->params.arp_validate = BOND_ARP_VALIDATE_NONE;
>>     bond->params.mode = new_value;
>> -    bond_set_mode_ops(bond, bond->params.mode);
>>     pr_info("%s: setting mode to %s (%d).\n",
>>         bond->dev->name, bond_mode_tbl[new_value].modename,
>>         new_value);
>> @@ -392,7 +391,6 @@ static ssize_t bonding_store_xmit_hash(struct device *d,
>>         ret = -EINVAL;
>>     } else {
>>         bond->params.xmit_policy = new_value;
>> -        bond_set_mode_ops(bond, bond->params.mode);
>>         pr_info("%s: setting xmit hash policy to %s (%d).\n",
>>             bond->dev->name,
>>             xmit_hashtype_tbl[new_value].modename, new_value);
>> diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
>> index 03cf3fd..4db9ec4 100644
>> --- a/drivers/net/bonding/bonding.h
>> +++ b/drivers/net/bonding/bonding.h
>> @@ -245,7 +245,6 @@ struct bonding {
>>     char     proc_file_name[IFNAMSIZ];
>> #endif /* CONFIG_PROC_FS */
>>     struct   list_head bond_list;
>> -    int      (*xmit_hash_policy)(struct sk_buff *, int);
>>     u16      rr_tx_counter;
>>     struct   ad_bond_info ad_info;
>>     struct   alb_bond_info alb_info;
>> @@ -446,7 +445,7 @@ int bond_release(struct net_device *bond_dev, struct
>> net_device *slave_dev);
>> void bond_mii_monitor(struct work_struct *);
>> void bond_loadbalance_arp_mon(struct work_struct *);
>> void bond_activebackup_arp_mon(struct work_struct *);
>> -void bond_set_mode_ops(struct bonding *bond, int mode);
>> +int bond_xmit_hash(struct bonding *bond, struct sk_buff *skb, int count);
>> int bond_parse_parm(const char *mode_arg, const struct bond_parm_tbl *tbl);
>> void bond_select_active_slave(struct bonding *bond);
>> void bond_change_active_slave(struct bonding *bond, struct slave
>> *new_active);
>> diff --git a/include/uapi/linux/if_bonding.h
>> b/include/uapi/linux/if_bonding.h
>> index a17edda..9635a62 100644
>> --- a/include/uapi/linux/if_bonding.h
>> +++ b/include/uapi/linux/if_bonding.h
>> @@ -91,6 +91,8 @@
>> #define BOND_XMIT_POLICY_LAYER2        0 /* layer 2 (MAC only), default */
>> #define BOND_XMIT_POLICY_LAYER34    1 /* layer 3+4 (IP ^ (TCP || UDP)) */
>> #define BOND_XMIT_POLICY_LAYER23    2 /* layer 2+3 (IP ^ MAC) */
>> +#define BOND_XMIT_POLICY_ENCAP23    3 /* encapsulated layer 2+3 */
>> +#define BOND_XMIT_POLICY_ENCAP34    4 /* encapsulated layer 3+4 */
>>
>> typedef struct ifbond {
>>     __s32 bond_mode;
>> -- 
>> 1.8.1.4
>>
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2013-09-25 16:36 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-09-25 15:19 [PATCH net-next 0/2] bonding: modify the current and add new hash functions Nikolay Aleksandrov
2013-09-25 15:19 ` [PATCH net-next 1/2] bonding: modify the old and add new xmit " Nikolay Aleksandrov
2013-09-25 16:26   ` Veaceslav Falico
2013-09-25 16:36     ` Nikolay Aleksandrov
2013-09-25 15:19 ` [PATCH net-next 2/2] bonding: document the new xmit policy modes and update the changed ones Nikolay Aleksandrov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).