Netdev List
* Re: [PATCH]
From: Eric Dumazet @ 2013-09-26 15:53 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: Daniel Borkmann, netdev, davem, andy, fubar, vfalico
In-Reply-To: <52445553.1030308@redhat.com>

On Thu, 2013-09-26 at 17:40 +0200, Nikolay Aleksandrov wrote:
> > 
> 1 question, I might be missing something but proto_ports_offset() gets the SPI
> with that 4 byte offset as is written in the comments, in every other case
> proto_ports_offset() is 0, so why would we want the SPI in the ->ports field ?
> And even then isn't it supposed to be 16 bits (2 bytes) and not 4, since we need
> to pass over "next header" (8 bits) and length (8 bits) ?

struct ip_auth_hdr {
        __u8  nexthdr;
        __u8  hdrlen;           /* This one is measured in 32 bit units! */
        __be16 reserved;
        __be32 spi;
        __be32 seq_no;          /* Sequence number */
        __u8  auth_data[0];     /* Variable len but >=4. Mind the 64 bit alignment! */
};

offsetof(struct ip_auth_hdr, spi) = 4
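
The offset can be checked in userspace. This is a minimal sketch, assuming fixed-width stdint types as stand-ins for the kernel's __u8/__be16/__be32 and leaving out the variable-length auth_data member; the toy_ names are made up for illustration and are not kernel symbols:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>

/* Userspace mirror of the AH header layout quoted above.
 * Assumption: uint8_t/uint16_t/uint32_t have the same size and
 * alignment as the kernel's __u8/__be16/__be32. */
struct toy_ip_auth_hdr {
	uint8_t  nexthdr;   /* 1 byte  */
	uint8_t  hdrlen;    /* 1 byte  */
	uint16_t reserved;  /* 2 bytes */
	uint32_t spi;       /* starts at byte 4 */
	uint32_t seq_no;    /* starts at byte 8 */
};

/* Fails at compile time if the SPI is not at byte 4. */
static_assert(offsetof(struct toy_ip_auth_hdr, spi) == 4,
	      "SPI follows nexthdr + hdrlen + reserved");

/* Hypothetical helper: where a 4-byte read has to start to fetch the SPI. */
static size_t toy_ah_spi_offset(void)
{
	return offsetof(struct toy_ip_auth_hdr, spi);
}
```

The two one-byte fields plus the 16-bit reserved field add up to the 4 bytes, which is why the offset for AH is 4 and not 2.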

* Re: [PATCH] net: flow_dissector: fix thoff for IPPROTO_AH
From: Nikolay Aleksandrov @ 2013-09-26 15:48 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Daniel Borkmann, netdev, davem, andy, fubar, vfalico
In-Reply-To: <1380210246.3165.184.camel@edumazet-glaptop>

On 09/26/2013 05:44 PM, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> In commit 8ed781668dd49 ("flow_keys: include thoff into flow_keys for
> later usage"), we missed that existing code was using nhoff as a
> temporary variable that could not always contain transport header
> offset.
> 
> This is not a problem for TCP/UDP because port offset (@poff)
> is 0 for these protocols.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Daniel Borkmann <dborkman@redhat.com>
> Cc: Nikolay Aleksandrov <nikolay@redhat.com>
> ---

1 question about my current patchset, can I leave it to get reviewed and if
applied to post a follow-up for net-next that fixes the issue, or would you
prefer a v4 that integrates the fix ?
Anyway for this one,

Acked-by: Nikolay Aleksandrov <nikolay@redhat.com>

* Re: [PATCH]
From: Nikolay Aleksandrov @ 2013-09-26 15:44 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Daniel Borkmann, netdev, davem, andy, fubar, vfalico
In-Reply-To: <52445553.1030308@redhat.com>

On 09/26/2013 05:40 PM, Nikolay Aleksandrov wrote:
> On 09/26/2013 05:27 PM, Eric Dumazet wrote:
>> On Thu, 2013-09-26 at 16:09 +0200, Nikolay Aleksandrov wrote:
>>> Factor out the code that extracts the ports from skb_flow_dissect and
>>> add a new function skb_flow_get_ports which can be re-used.
>>>
>>> Suggested-by: Veaceslav Falico <vfalico@redhat.com>
>>> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
>>> ---
>>> v2: new patch
>>> v3: fix a bug in skb_flow_dissect where thoff didn't have poff added by
>>>     modifying thoff directly in skb_flow_get_ports as it's done anyway.
>>>     Also add the necessary export symbol for skb_flow_get_ports.
>>> This seems like a good idea because there're other users that can re-use
>>> it later as well.
>>
>> Wait a minute.... existing code seems buggy.
>>
>> Daniel, any objection if I submit this fix ?
>>
>> (commit 8ed781668dd49b608f)
>>
> 1 question, I might be missing something but proto_ports_offset() gets the SPI
> with that 4 byte offset as is written in the comments, in every other case
> proto_ports_offset() is 0, so why would we want the SPI in the ->ports field ?
> And even then isn't it supposed to be 16 bits (2 bytes) and not 4, since we need
> to pass over "next header" (8 bits) and length (8 bits) ?
> 
Nevermind the second part, I forgot about the 16-bit 0 region :-)

> Thanks,
>  Nik

* [PATCH] net: flow_dissector: fix thoff for IPPROTO_AH
From: Eric Dumazet @ 2013-09-26 15:44 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: Daniel Borkmann, netdev, davem, andy, fubar, vfalico
In-Reply-To: <1380209227.3165.176.camel@edumazet-glaptop>

From: Eric Dumazet <edumazet@google.com>

In commit 8ed781668dd49 ("flow_keys: include thoff into flow_keys for
later usage"), we missed that existing code was using nhoff as a
temporary variable that could not always contain transport header
offset.

This is not a problem for TCP/UDP because port offset (@poff)
is 0 for these protocols.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Nikolay Aleksandrov <nikolay@redhat.com>
---
 net/core/flow_dissector.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 1929af8..8d7d0dd 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -154,8 +154,8 @@ ipv6:
 	if (poff >= 0) {
 		__be32 *ports, _ports;
 
-		nhoff += poff;
-		ports = skb_header_pointer(skb, nhoff, sizeof(_ports), &_ports);
+		ports = skb_header_pointer(skb, nhoff + poff,
+					   sizeof(_ports), &_ports);
 		if (ports)
 			flow->ports = *ports;
 	}
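
To make the effect of the two-line change concrete, here is a userspace sketch of the offset arithmetic only. It assumes a simplified stand-in for proto_ports_offset() (0 for TCP/UDP, 4 for AH, negative otherwise) and does not touch any skb; every toy_ name is hypothetical:

```c
/* Protocol numbers as assigned by IANA; names are local to this sketch. */
enum { TOY_IPPROTO_TCP = 6, TOY_IPPROTO_UDP = 17, TOY_IPPROTO_AH = 51 };

/* Simplified stand-in for proto_ports_offset() (an assumption). */
static int toy_ports_offset(int ip_proto)
{
	switch (ip_proto) {
	case TOY_IPPROTO_TCP:
	case TOY_IPPROTO_UDP:
		return 0;
	case TOY_IPPROTO_AH:
		return 4;	/* skip nexthdr, hdrlen, reserved */
	default:
		return -1;	/* no ports for this protocol */
	}
}

struct toy_result {
	int read_at;	/* where the 4 "ports" bytes are read, -1 if none */
	int thoff;	/* value that ends up stored as the transport offset */
};

/* Before the fix: nhoff itself is advanced, so thoff inherits poff. */
static struct toy_result toy_dissect_before(int nhoff, int ip_proto)
{
	struct toy_result r = { -1, 0 };
	int poff = toy_ports_offset(ip_proto);

	if (poff >= 0) {
		nhoff += poff;
		r.read_at = nhoff;
	}
	r.thoff = nhoff;
	return r;
}

/* After the fix: poff is only used for the read, nhoff is left alone. */
static struct toy_result toy_dissect_after(int nhoff, int ip_proto)
{
	struct toy_result r = { -1, 0 };
	int poff = toy_ports_offset(ip_proto);

	if (poff >= 0)
		r.read_at = nhoff + poff;
	r.thoff = nhoff;
	return r;
}
```

With a transport header at byte 34, both variants read from the same place, but only the second one keeps thoff at 34 for AH. For TCP and UDP the two agree, which matches the remark in the changelog.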

* Re: [PATCH]
From: Nikolay Aleksandrov @ 2013-09-26 15:40 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Daniel Borkmann, netdev, davem, andy, fubar, vfalico
In-Reply-To: <1380209227.3165.176.camel@edumazet-glaptop>

On 09/26/2013 05:27 PM, Eric Dumazet wrote:
> On Thu, 2013-09-26 at 16:09 +0200, Nikolay Aleksandrov wrote:
>> Factor out the code that extracts the ports from skb_flow_dissect and
>> add a new function skb_flow_get_ports which can be re-used.
>>
>> Suggested-by: Veaceslav Falico <vfalico@redhat.com>
>> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
>> ---
>> v2: new patch
>> v3: fix a bug in skb_flow_dissect where thoff didn't have poff added by
>>     modifying thoff directly in skb_flow_get_ports as it's done anyway.
>>     Also add the necessary export symbol for skb_flow_get_ports.
>> This seems like a good idea because there're other users that can re-use
>> it later as well.
> 
> Wait a minute.... existing code seems buggy.
> 
> Daniel, any objection if I submit this fix ?
> 
> (commit 8ed781668dd49b608f)
> 
1 question, I might be missing something but proto_ports_offset() gets the SPI
with that 4 byte offset as is written in the comments, in every other case
proto_ports_offset() is 0, so why would we want the SPI in the ->ports field ?
And even then isn't it supposed to be 16 bits (2 bytes) and not 4, since we need
to pass over "next header" (8 bits) and length (8 bits) ?

Thanks,
 Nik

* [PATCH]
From: Eric Dumazet @ 2013-09-26 15:27 UTC (permalink / raw)
  To: Nikolay Aleksandrov, Daniel Borkmann; +Cc: netdev, davem, andy, fubar, vfalico
In-Reply-To: <1380204582-27144-2-git-send-email-nikolay@redhat.com>

On Thu, 2013-09-26 at 16:09 +0200, Nikolay Aleksandrov wrote:
> Factor out the code that extracts the ports from skb_flow_dissect and
> add a new function skb_flow_get_ports which can be re-used.
> 
> Suggested-by: Veaceslav Falico <vfalico@redhat.com>
> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
> ---
> v2: new patch
> v3: fix a bug in skb_flow_dissect where thoff didn't have poff added by
>     modifying thoff directly in skb_flow_get_ports as it's done anyway.
>     Also add the necessary export symbol for skb_flow_get_ports.
> This seems like a good idea because there're other users that can re-use
> it later as well.

Wait a minute.... existing code seems buggy.

Daniel, any objection if I submit this fix ?

(commit 8ed781668dd49b608f)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 1929af8..8d7d0dd 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -154,8 +154,8 @@ ipv6:
 	if (poff >= 0) {
 		__be32 *ports, _ports;
 
-		nhoff += poff;
-		ports = skb_header_pointer(skb, nhoff, sizeof(_ports), &_ports);
+		ports = skb_header_pointer(skb, nhoff + poff,
+					   sizeof(_ports), &_ports);
 		if (ports)
 			flow->ports = *ports;
 	}

* [PATCH] IPv6: Allow the MTU of ipip6 tunnel to be set below 1280
From: Oussama Ghorbel @ 2013-09-26 14:51 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy
  Cc: netdev, linux-kernel, Oussama Ghorbel

The (inner) MTU of an ipip6 (IPv4-in-IPv6) tunnel cannot be set below 1280, which is the minimum MTU in IPv6.
However, there should be no IPv6 on the tunnel interface at all, so the IPv6 rules should not apply.
More info at https://bugzilla.kernel.org/show_bug.cgi?id=15530

This patch checks the minimum MTU of an IPv6 tunnel according to these rules:
 - If the tunnel is configured in ipip6 mode, the minimum MTU is 68.
 - If the tunnel is configured in ip6ip6 or any other mode, the minimum MTU is 1280.

Signed-off-by: Oussama Ghorbel <oghorbell@gmail.com>
---
 net/ipv6/ip6_tunnel.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 1e55866..a66ead2 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1423,8 +1423,14 @@ ip6_tnl_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 static int
 ip6_tnl_change_mtu(struct net_device *dev, int new_mtu)
 {
-	if (new_mtu < IPV6_MIN_MTU) {
-		return -EINVAL;
+	struct ip6_tnl *t = netdev_priv(dev);
+
+	if (t->parms.proto == IPPROTO_IPIP) {
+		if (new_mtu < 68)
+			return -EINVAL;
+	} else {
+		if (new_mtu < IPV6_MIN_MTU)
+			return -EINVAL;
 	}
 	dev->mtu = new_mtu;
 	return 0;
-- 
1.7.9.5
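
The rule in the changelog can be written as a small standalone check. This is only a sketch of the decision, assuming the tunnel mode is passed in as a plain protocol number instead of being read from struct ip6_tnl; the TOY_ constants and toy_tnl_check_mtu are illustrative names, not kernel symbols:

```c
#include <errno.h>

#define TOY_IPPROTO_IPIP   4	/* IPv4 payload (ipip6 mode) */
#define TOY_IPPROTO_IPV6  41	/* IPv6 payload (ip6ip6 mode) */
#define TOY_IPV4_MIN_MTU  68	/* minimum used by the patch for ipip6 */
#define TOY_IPV6_MIN_MTU  1280	/* same value as IPV6_MIN_MTU */

/* Returns 0 if new_mtu is acceptable for the given tunnel mode,
 * -EINVAL otherwise. Any mode other than ipip6 gets the IPv6 minimum. */
static int toy_tnl_check_mtu(int proto, int new_mtu)
{
	int min_mtu = (proto == TOY_IPPROTO_IPIP) ? TOY_IPV4_MIN_MTU
						  : TOY_IPV6_MIN_MTU;

	return new_mtu < min_mtu ? -EINVAL : 0;
}
```

An MTU of 576 is then accepted for an ipip6 tunnel and rejected for ip6ip6, which is the behaviour the patch is after.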

* Re: [PATCH net-next v3 1/3] flow_dissector: factor out the ports extraction in skb_flow_get_ports
From: Veaceslav Falico @ 2013-09-26 14:36 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: netdev, davem, andy, fubar, eric.dumazet
In-Reply-To: <1380204582-27144-2-git-send-email-nikolay@redhat.com>

On Thu, Sep 26, 2013 at 04:09:40PM +0200, Nikolay Aleksandrov wrote:
>Factor out the code that extracts the ports from skb_flow_dissect and
>add a new function skb_flow_get_ports which can be re-used.
>
>Suggested-by: Veaceslav Falico <vfalico@redhat.com>
>Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>

Reviewed-by: Veaceslav Falico <vfalico@redhat.com>

>---
>v2: new patch
>v3: fix a bug in skb_flow_dissect where thoff didn't have poff added by
>    modifying thoff directly in skb_flow_get_ports as it's done anyway.
>    Also add the necessary export symbol for skb_flow_get_ports.
>This seems like a good idea because there're other users that can re-use
>it later as well.
>
> include/net/flow_keys.h   |  1 +
> net/core/flow_dissector.c | 41 ++++++++++++++++++++++++++++++-----------
> 2 files changed, 31 insertions(+), 11 deletions(-)
>
>diff --git a/include/net/flow_keys.h b/include/net/flow_keys.h
>index ac2439d..4db84ae 100644
>--- a/include/net/flow_keys.h
>+++ b/include/net/flow_keys.h
>@@ -14,4 +14,5 @@ struct flow_keys {
> };
>
> bool skb_flow_dissect(const struct sk_buff *skb, struct flow_keys *flow);
>+__be32 skb_flow_get_ports(const struct sk_buff *skb, int *thoff, u8 ip_proto);
> #endif
>diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
>index 1929af8..7785398 100644
>--- a/net/core/flow_dissector.c
>+++ b/net/core/flow_dissector.c
>@@ -25,9 +25,37 @@ static void iph_to_flow_copy_addrs(struct flow_keys *flow, const struct iphdr *i
> 	memcpy(&flow->src, &iph->saddr, sizeof(flow->src) + sizeof(flow->dst));
> }
>
>+/**
>+ * skb_flow_get_ports - extract the upper layer ports and return them
>+ * @skb: buffer to extract the ports from
>+ * @thoff: pointer to transport header offset
>+ * @ip_proto: protocol for which to get port offset
>+ *
>+ * The function will try to retrieve the ports at offset thoff + poff where poff
>+ * is the protocol port offset returned from proto_ports_offset, and if poff is
>+ * more than or equal to 0 it'll add it to the value at the thoff address
>+ */
>+__be32 skb_flow_get_ports(const struct sk_buff *skb, int *thoff, u8 ip_proto)
>+{
>+	int poff = proto_ports_offset(ip_proto);
>+
>+	if (poff >= 0) {
>+		__be32 *ports, _ports;
>+
>+		*thoff += poff;
>+		ports = skb_header_pointer(skb, *thoff, sizeof(_ports),
>+					   &_ports);
>+		if (ports)
>+			return *ports;
>+	}
>+
>+	return 0;
>+}
>+EXPORT_SYMBOL(skb_flow_get_ports);
>+
> bool skb_flow_dissect(const struct sk_buff *skb, struct flow_keys *flow)
> {
>-	int poff, nhoff = skb_network_offset(skb);
>+	int nhoff = skb_network_offset(skb);
> 	u8 ip_proto;
> 	__be16 proto = skb->protocol;
>
>@@ -150,16 +178,7 @@ ipv6:
> 	}
>
> 	flow->ip_proto = ip_proto;
>-	poff = proto_ports_offset(ip_proto);
>-	if (poff >= 0) {
>-		__be32 *ports, _ports;
>-
>-		nhoff += poff;
>-		ports = skb_header_pointer(skb, nhoff, sizeof(_ports), &_ports);
>-		if (ports)
>-			flow->ports = *ports;
>-	}
>-
>+	flow->ports = skb_flow_get_ports(skb, &nhoff, ip_proto);
> 	flow->thoff = (u16) nhoff;
>
> 	return true;
>-- 
>1.8.1.4
>
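
The bounded 4-byte read that skb_flow_get_ports() performs in the quoted patch can be mimicked over a plain byte array. This is a rough sketch, assuming a flat buffer in place of an skb and memcpy in place of skb_header_pointer(); toy_get_ports is a hypothetical name, and poff is passed in directly instead of being looked up from the protocol:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies the 4 bytes at *thoff + poff out of pkt, as-is (network byte
 * order is preserved in memory). Returns 0 when poff is negative or the
 * read would run past the end of the buffer.
 * Like the v3 patch, it advances *thoff by poff before reading. */
static uint32_t toy_get_ports(const uint8_t *pkt, size_t len,
			      int *thoff, int poff)
{
	uint32_t ports = 0;

	if (poff < 0)
		return 0;

	*thoff += poff;
	if (*thoff < 0 || (size_t)*thoff + sizeof(ports) > len)
		return 0;

	memcpy(&ports, pkt + *thoff, sizeof(ports));
	return ports;
}
```

Because the offset is passed by pointer and modified, a caller that later stores it as the transport header offset gets poff folded in whenever poff is nonzero.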

* Re: [PATCH net-next v3 2/3] bonding: modify the old and add new xmit hash policies
From: Veaceslav Falico @ 2013-09-26 14:34 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: netdev, davem, andy, fubar, eric.dumazet
In-Reply-To: <1380204582-27144-3-git-send-email-nikolay@redhat.com>

On Thu, Sep 26, 2013 at 04:09:41PM +0200, Nikolay Aleksandrov wrote:
>This patch adds two new hash policy modes which use skb_flow_dissect:
>3 - Encapsulated layer 2+3
>4 - Encapsulated layer 3+4
>There should be a good improvement for tunnel users in those modes.
>It also changes the old hash functions to:
>hash ^= (__force u32)flow.dst ^ (__force u32)flow.src;
>hash ^= (hash >> 16);
>hash ^= (hash >> 8);
>
>Where hash will be initialized either to L2 hash, that is
>SRCMAC[5] XOR DSTMAC[5], or to flow->ports which should be extracted
>from the upper layer. Flow's dst and src are also extracted based on the
>xmit policy either directly from the buffer or by using skb_flow_dissect,
>but in both cases if the protocol is IPv6 then dst and src are obtained by
>ipv6_addr_hash() on the real addresses. In case of a non-dissectable
>packet, the algorithms fall back to L2 hashing.
>The bond_set_mode_ops() function is now obsolete and thus deleted
>because it was used only to set the proper hash policy. Also we trim a
>pointer from struct bonding because we no longer need to keep the hash
>function, now there's only a single hash function - bond_xmit_hash that
>works based on bond->params.xmit_policy.
>
>The hash function and skb_flow_dissect were suggested by Eric Dumazet.
>The layer names were suggested by Andy Gospodarek, because I suck at
>semantics.
>
>Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
>---
>v2: fix a bug in bond_flow_dissect which might've caused the use of
>    uninitalized flow_keys and make use of skb_flow_get_ports
>v3: no change
>One line is intentionally left at 82 chars since it's the whole function
>and IMO looks better that way.
>
> drivers/net/bonding/bond_3ad.c   |   2 +-
> drivers/net/bonding/bond_main.c  | 197 ++++++++++++++-------------------------
> drivers/net/bonding/bond_sysfs.c |   2 -
> drivers/net/bonding/bonding.h    |   3 +-
> include/uapi/linux/if_bonding.h  |   2 +
> 5 files changed, 72 insertions(+), 134 deletions(-)

-62 lines, even better :).

Acked-by: Veaceslav Falico <vfalico@redhat.com>

>
>diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>index 0d8f427..b3ab703 100644
>--- a/drivers/net/bonding/bond_3ad.c
>+++ b/drivers/net/bonding/bond_3ad.c
>@@ -2442,7 +2442,7 @@ int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev)
> 		goto out;
> 	}
>
>-	slave_agg_no = bond->xmit_hash_policy(skb, slaves_in_agg);
>+	slave_agg_no = bond_xmit_hash(bond, skb, slaves_in_agg);
>
> 	bond_for_each_slave(bond, slave) {
> 		struct aggregator *agg = SLAVE_AD_INFO(slave).port.aggregator;
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 55bbb8b..498e9e1 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -78,6 +78,7 @@
> #include <net/netns/generic.h>
> #include <net/pkt_sched.h>
> #include <linux/rculist.h>
>+#include <net/flow_keys.h>
> #include "bonding.h"
> #include "bond_3ad.h"
> #include "bond_alb.h"
>@@ -159,7 +160,8 @@ MODULE_PARM_DESC(min_links, "Minimum number of available links before turning on
> module_param(xmit_hash_policy, charp, 0);
> MODULE_PARM_DESC(xmit_hash_policy, "balance-xor and 802.3ad hashing method; "
> 				   "0 for layer 2 (default), 1 for layer 3+4, "
>-				   "2 for layer 2+3");
>+				   "2 for layer 2+3, 3 for encap layer 2+3, "
>+				   "4 for encap layer 3+4");
> module_param(arp_interval, int, 0);
> MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds");
> module_param_array(arp_ip_target, charp, NULL, 0);
>@@ -217,6 +219,8 @@ const struct bond_parm_tbl xmit_hashtype_tbl[] = {
> {	"layer2",		BOND_XMIT_POLICY_LAYER2},
> {	"layer3+4",		BOND_XMIT_POLICY_LAYER34},
> {	"layer2+3",		BOND_XMIT_POLICY_LAYER23},
>+{	"encap2+3",		BOND_XMIT_POLICY_ENCAP23},
>+{	"encap3+4",		BOND_XMIT_POLICY_ENCAP34},
> {	NULL,			-1},
> };
>
>@@ -3026,99 +3030,85 @@ static struct notifier_block bond_netdev_notifier = {
>
> /*---------------------------- Hashing Policies -----------------------------*/
>
>-/*
>- * Hash for the output device based upon layer 2 data
>- */
>-static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count)
>+/* L2 hash helper */
>+static inline u32 bond_eth_hash(struct sk_buff *skb)
> {
> 	struct ethhdr *data = (struct ethhdr *)skb->data;
>
> 	if (skb_headlen(skb) >= offsetof(struct ethhdr, h_proto))
>-		return (data->h_dest[5] ^ data->h_source[5]) % count;
>+		return data->h_dest[5] ^ data->h_source[5];
>
> 	return 0;
> }
>
>-/*
>- * Hash for the output device based upon layer 2 and layer 3 data. If
>- * the packet is not IP, fall back on bond_xmit_hash_policy_l2()
>- */
>-static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count)
>+/* Extract the appropriate headers based on bond's xmit policy */
>+static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
>+			      struct flow_keys *fk)
> {
>-	const struct ethhdr *data;
>+	const struct ipv6hdr *iph6;
> 	const struct iphdr *iph;
>-	const struct ipv6hdr *ipv6h;
>-	u32 v6hash;
>-	const __be32 *s, *d;
>+	int noff, proto = -1;
>
>-	if (skb->protocol == htons(ETH_P_IP) &&
>-	    pskb_network_may_pull(skb, sizeof(*iph))) {
>+	if (bond->params.xmit_policy > BOND_XMIT_POLICY_LAYER23)
>+		return skb_flow_dissect(skb, fk);
>+
>+	fk->ports = 0;
>+	noff = skb_network_offset(skb);
>+	if (skb->protocol == htons(ETH_P_IP)) {
>+		if (!pskb_may_pull(skb, noff + sizeof(*iph)))
>+			return false;
> 		iph = ip_hdr(skb);
>-		data = (struct ethhdr *)skb->data;
>-		return ((ntohl(iph->saddr ^ iph->daddr) & 0xffff) ^
>-			(data->h_dest[5] ^ data->h_source[5])) % count;
>-	} else if (skb->protocol == htons(ETH_P_IPV6) &&
>-		   pskb_network_may_pull(skb, sizeof(*ipv6h))) {
>-		ipv6h = ipv6_hdr(skb);
>-		data = (struct ethhdr *)skb->data;
>-		s = &ipv6h->saddr.s6_addr32[0];
>-		d = &ipv6h->daddr.s6_addr32[0];
>-		v6hash = (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]);
>-		v6hash ^= (v6hash >> 24) ^ (v6hash >> 16) ^ (v6hash >> 8);
>-		return (v6hash ^ data->h_dest[5] ^ data->h_source[5]) % count;
>-	}
>-
>-	return bond_xmit_hash_policy_l2(skb, count);
>+		fk->src = iph->saddr;
>+		fk->dst = iph->daddr;
>+		noff += iph->ihl << 2;
>+		if (!ip_is_fragment(iph))
>+			proto = iph->protocol;
>+	} else if (skb->protocol == htons(ETH_P_IPV6)) {
>+		if (!pskb_may_pull(skb, noff + sizeof(*iph6)))
>+			return false;
>+		iph6 = ipv6_hdr(skb);
>+		fk->src = (__force __be32)ipv6_addr_hash(&iph6->saddr);
>+		fk->dst = (__force __be32)ipv6_addr_hash(&iph6->daddr);
>+		noff += sizeof(*iph6);
>+		proto = iph6->nexthdr;
>+	} else {
>+		return false;
>+	}
>+	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER34 && proto >= 0)
>+		fk->ports = skb_flow_get_ports(skb, &noff, proto);
>+
>+	return true;
> }
>
>-/*
>- * Hash for the output device based upon layer 3 and layer 4 data. If
>- * the packet is a frag or not TCP or UDP, just use layer 3 data.  If it is
>- * altogether not IP, fall back on bond_xmit_hash_policy_l2()
>+/**
>+ * bond_xmit_hash - generate a hash value based on the xmit policy
>+ * @bond: bonding device
>+ * @skb: buffer to use for headers
>+ * @count: modulo value
>+ *
>+ * This function will extract the necessary headers from the skb buffer and use
>+ * them to generate a hash based on the xmit_policy set in the bonding device
>+ * which will be reduced modulo count before returning.
>  */
>-static int bond_xmit_hash_policy_l34(struct sk_buff *skb, int count)
>+int bond_xmit_hash(struct bonding *bond, struct sk_buff *skb, int count)
> {
>-	u32 layer4_xor = 0;
>-	const struct iphdr *iph;
>-	const struct ipv6hdr *ipv6h;
>-	const __be32 *s, *d;
>-	const __be16 *l4 = NULL;
>-	__be16 _l4[2];
>-	int noff = skb_network_offset(skb);
>-	int poff;
>-
>-	if (skb->protocol == htons(ETH_P_IP) &&
>-	    pskb_may_pull(skb, noff + sizeof(*iph))) {
>-		iph = ip_hdr(skb);
>-		poff = proto_ports_offset(iph->protocol);
>+	struct flow_keys flow;
>+	u32 hash;
>
>-		if (!ip_is_fragment(iph) && poff >= 0) {
>-			l4 = skb_header_pointer(skb, noff + (iph->ihl << 2) + poff,
>-						sizeof(_l4), &_l4);
>-			if (l4)
>-				layer4_xor = ntohs(l4[0] ^ l4[1]);
>-		}
>-		return (layer4_xor ^
>-			((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count;
>-	} else if (skb->protocol == htons(ETH_P_IPV6) &&
>-		   pskb_may_pull(skb, noff + sizeof(*ipv6h))) {
>-		ipv6h = ipv6_hdr(skb);
>-		poff = proto_ports_offset(ipv6h->nexthdr);
>-		if (poff >= 0) {
>-			l4 = skb_header_pointer(skb, noff + sizeof(*ipv6h) + poff,
>-						sizeof(_l4), &_l4);
>-			if (l4)
>-				layer4_xor = ntohs(l4[0] ^ l4[1]);
>-		}
>-		s = &ipv6h->saddr.s6_addr32[0];
>-		d = &ipv6h->daddr.s6_addr32[0];
>-		layer4_xor ^= (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]);
>-		layer4_xor ^= (layer4_xor >> 24) ^ (layer4_xor >> 16) ^
>-			       (layer4_xor >> 8);
>-		return layer4_xor % count;
>-	}
>+	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER2 ||
>+	    !bond_flow_dissect(bond, skb, &flow))
>+		return bond_eth_hash(skb) % count;
>+
>+	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER23 ||
>+	    bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP23)
>+		hash = bond_eth_hash(skb);
>+	else
>+		hash = (__force u32)flow.ports;
>+	hash ^= (__force u32)flow.dst ^ (__force u32)flow.src;
>+	hash ^= (hash >> 16);
>+	hash ^= (hash >> 8);
>
>-	return bond_xmit_hash_policy_l2(skb, count);
>+	return hash % count;
> }
>
> /*-------------------------- Device entry points ----------------------------*/
>@@ -3700,8 +3690,7 @@ static int bond_xmit_activebackup(struct sk_buff *skb, struct net_device *bond_d
> 	return NETDEV_TX_OK;
> }
>
>-/*
>- * In bond_xmit_xor() , we determine the output device by using a pre-
>+/* In bond_xmit_xor() , we determine the output device by using a pre-
>  * determined xmit_hash_policy(), If the selected device is not enabled,
>  * find the next active slave.
>  */
>@@ -3709,8 +3698,7 @@ static int bond_xmit_xor(struct sk_buff *skb, struct net_device *bond_dev)
> {
> 	struct bonding *bond = netdev_priv(bond_dev);
>
>-	bond_xmit_slave_id(bond, skb,
>-			   bond->xmit_hash_policy(skb, bond->slave_cnt));
>+	bond_xmit_slave_id(bond, skb, bond_xmit_hash(bond, skb, bond->slave_cnt));
>
> 	return NETDEV_TX_OK;
> }
>@@ -3746,22 +3734,6 @@ static int bond_xmit_broadcast(struct sk_buff *skb, struct net_device *bond_dev)
>
> /*------------------------- Device initialization ---------------------------*/
>
>-static void bond_set_xmit_hash_policy(struct bonding *bond)
>-{
>-	switch (bond->params.xmit_policy) {
>-	case BOND_XMIT_POLICY_LAYER23:
>-		bond->xmit_hash_policy = bond_xmit_hash_policy_l23;
>-		break;
>-	case BOND_XMIT_POLICY_LAYER34:
>-		bond->xmit_hash_policy = bond_xmit_hash_policy_l34;
>-		break;
>-	case BOND_XMIT_POLICY_LAYER2:
>-	default:
>-		bond->xmit_hash_policy = bond_xmit_hash_policy_l2;
>-		break;
>-	}
>-}
>-
> /*
>  * Lookup the slave that corresponds to a qid
>  */
>@@ -3871,38 +3843,6 @@ static netdev_tx_t bond_start_xmit(struct sk_buff *skb, struct net_device *dev)
> 	return ret;
> }
>
>-/*
>- * set bond mode specific net device operations
>- */
>-void bond_set_mode_ops(struct bonding *bond, int mode)
>-{
>-	struct net_device *bond_dev = bond->dev;
>-
>-	switch (mode) {
>-	case BOND_MODE_ROUNDROBIN:
>-		break;
>-	case BOND_MODE_ACTIVEBACKUP:
>-		break;
>-	case BOND_MODE_XOR:
>-		bond_set_xmit_hash_policy(bond);
>-		break;
>-	case BOND_MODE_BROADCAST:
>-		break;
>-	case BOND_MODE_8023AD:
>-		bond_set_xmit_hash_policy(bond);
>-		break;
>-	case BOND_MODE_ALB:
>-		/* FALLTHRU */
>-	case BOND_MODE_TLB:
>-		break;
>-	default:
>-		/* Should never happen, mode already checked */
>-		pr_err("%s: Error: Unknown bonding mode %d\n",
>-		       bond_dev->name, mode);
>-		break;
>-	}
>-}
>-
> static int bond_ethtool_get_settings(struct net_device *bond_dev,
> 				     struct ethtool_cmd *ecmd)
> {
>@@ -4004,7 +3944,6 @@ static void bond_setup(struct net_device *bond_dev)
> 	ether_setup(bond_dev);
> 	bond_dev->netdev_ops = &bond_netdev_ops;
> 	bond_dev->ethtool_ops = &bond_ethtool_ops;
>-	bond_set_mode_ops(bond, bond->params.mode);
>
> 	bond_dev->destructor = bond_destructor;
>
>diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
>index c29b836..dba3b9b 100644
>--- a/drivers/net/bonding/bond_sysfs.c
>+++ b/drivers/net/bonding/bond_sysfs.c
>@@ -352,7 +352,6 @@ static ssize_t bonding_store_mode(struct device *d,
> 	/* don't cache arp_validate between modes */
> 	bond->params.arp_validate = BOND_ARP_VALIDATE_NONE;
> 	bond->params.mode = new_value;
>-	bond_set_mode_ops(bond, bond->params.mode);
> 	pr_info("%s: setting mode to %s (%d).\n",
> 		bond->dev->name, bond_mode_tbl[new_value].modename,
> 		new_value);
>@@ -392,7 +391,6 @@ static ssize_t bonding_store_xmit_hash(struct device *d,
> 		ret = -EINVAL;
> 	} else {
> 		bond->params.xmit_policy = new_value;
>-		bond_set_mode_ops(bond, bond->params.mode);
> 		pr_info("%s: setting xmit hash policy to %s (%d).\n",
> 			bond->dev->name,
> 			xmit_hashtype_tbl[new_value].modename, new_value);
>diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
>index 03cf3fd..4db9ec4 100644
>--- a/drivers/net/bonding/bonding.h
>+++ b/drivers/net/bonding/bonding.h
>@@ -245,7 +245,6 @@ struct bonding {
> 	char     proc_file_name[IFNAMSIZ];
> #endif /* CONFIG_PROC_FS */
> 	struct   list_head bond_list;
>-	int      (*xmit_hash_policy)(struct sk_buff *, int);
> 	u16      rr_tx_counter;
> 	struct   ad_bond_info ad_info;
> 	struct   alb_bond_info alb_info;
>@@ -446,7 +445,7 @@ int bond_release(struct net_device *bond_dev, struct net_device *slave_dev);
> void bond_mii_monitor(struct work_struct *);
> void bond_loadbalance_arp_mon(struct work_struct *);
> void bond_activebackup_arp_mon(struct work_struct *);
>-void bond_set_mode_ops(struct bonding *bond, int mode);
>+int bond_xmit_hash(struct bonding *bond, struct sk_buff *skb, int count);
> int bond_parse_parm(const char *mode_arg, const struct bond_parm_tbl *tbl);
> void bond_select_active_slave(struct bonding *bond);
> void bond_change_active_slave(struct bonding *bond, struct slave *new_active);
>diff --git a/include/uapi/linux/if_bonding.h b/include/uapi/linux/if_bonding.h
>index a17edda..9635a62 100644
>--- a/include/uapi/linux/if_bonding.h
>+++ b/include/uapi/linux/if_bonding.h
>@@ -91,6 +91,8 @@
> #define BOND_XMIT_POLICY_LAYER2		0 /* layer 2 (MAC only), default */
> #define BOND_XMIT_POLICY_LAYER34	1 /* layer 3+4 (IP ^ (TCP || UDP)) */
> #define BOND_XMIT_POLICY_LAYER23	2 /* layer 2+3 (IP ^ MAC) */
>+#define BOND_XMIT_POLICY_ENCAP23	3 /* encapsulated layer 2+3 */
>+#define BOND_XMIT_POLICY_ENCAP34	4 /* encapsulated layer 3+4 */
>
> typedef struct ifbond {
> 	__s32 bond_mode;
>-- 
>1.8.1.4
>
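
The folding step of bond_xmit_hash() quoted above is easy to try outside the kernel. This is a minimal sketch, assuming plain uint32_t values in place of the __be32 flow keys and a caller-supplied seed (the L2 hash or the ports word, depending on policy); toy_bond_hash and toy_bond_slave are illustrative names:

```c
#include <stdint.h>

/* XOR the addresses into the seed, then fold the upper bits down so
 * that the low bits depend on the whole 32-bit value. */
static uint32_t toy_bond_hash(uint32_t seed, uint32_t src, uint32_t dst)
{
	uint32_t hash = seed;

	hash ^= dst ^ src;
	hash ^= hash >> 16;
	hash ^= hash >> 8;
	return hash;
}

/* Reduce the hash to a slave index; count must be greater than 0. */
static int toy_bond_slave(uint32_t seed, uint32_t src, uint32_t dst,
			  int count)
{
	return (int)(toy_bond_hash(seed, src, dst) % (uint32_t)count);
}
```

Since src and dst are combined with XOR, swapping them gives the same hash, so under these assumptions both directions of a flow with the same seed land on the same index.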

* RE: [PATCH] net: qmi_wwan: fix Cinterion PLXX product ID
From: Schmiedl Christian @ 2013-09-26 14:02 UTC (permalink / raw)
  To: Aleksander Morgado, netdev@vger.kernel.org,
	linux-usb@vger.kernel.org
  Cc: davem@davemloft.net, bjorn@mork.no, dcbw@redhat.com,
	Schemmel Hans-Christoph, Colberg Nicolaus
In-Reply-To: <1380121356-28601-1-git-send-email-aleksander@lanedo.com>

Aleksander Morgado <aleksander@lanedo.com> writes:

> Cinterion PLXX LTE devices have a 0x0060 product ID, not 0x12d1.
>
> The blacklisting in the serial/option driver does actually use the
> correct PID, as per commit 8ff10bdb14a52e3f25d4ce09e0582a8684c1a6db
> ('USB: Blacklisted Cinterion's PLxx WWAN Interface').

Thanks for patching

Acked-by: Christian Schmiedl <christian.schmiedl@gemalto.com>

Chris



* Re: [PATCH net 0/4] bridge: Fix problems around the PVID
From: Vlad Yasevich @ 2013-09-26 14:22 UTC (permalink / raw)
  To: Toshiaki Makita
  Cc: Toshiaki Makita, David Miller, netdev, Fernando Luis Vazquez Cao,
	Patrick McHardy
In-Reply-To: <1380191899.3716.31.camel@ubuntu-vm-makita>

On 09/26/2013 06:38 AM, Toshiaki Makita wrote:
> On Tue, 2013-09-24 at 13:55 -0400, Vlad Yasevich wrote:
>> On 09/24/2013 01:30 PM, Toshiaki Makita wrote:
>>> On Tue, 2013-09-24 at 09:35 -0400, Vlad Yasevich wrote:
>>>> On 09/24/2013 07:45 AM, Toshiaki Makita wrote:
>>>>> On Mon, 2013-09-23 at 10:41 -0400, Vlad Yasevich wrote:
>>>>>> On 09/17/2013 04:12 AM, Toshiaki Makita wrote:
>>>>>>> On Mon, 2013-09-16 at 13:49 -0400, Vlad Yasevich wrote:
>>>>>>>> On 09/13/2013 08:06 AM, Toshiaki Makita wrote:
>>>>>>>>> On Thu, 2013-09-12 at 16:00 -0400, David Miller wrote:
>>>>>>>>>> From: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
>>>>>>>>>> Date: Tue, 10 Sep 2013 19:27:54 +0900
>>>>>>>>>>
>>>>>>>>>>> There seem to be some undesirable behaviors related with PVID.
>>>>>>>>>>> 1. It has no effect assigning PVID to a port. PVID cannot be applied
>>>>>>>>>>> to any frame regardless of whether we set it or not.
>>>>>>>>>>> 2. FDB entries learned via frames applied PVID are registered with
>>>>>>>>>>> VID 0 rather than VID value of PVID.
>>>>>>>>>>> 3. We can set 0 or 4095 as a PVID that are not allowed in IEEE 802.1Q.
>>>>>>>>>>> This leads interoperational problems such as sending frames with VID
>>>>>>>>>>> 4095, which is not allowed in IEEE 802.1Q, and treating frames with VID
>>>>>>>>>>> 0 as they belong to VLAN 0, which is expected to be handled as they have
>>>>>>>>>>> no VID according to IEEE 802.1Q.
>>>>>>>>>>>
>>>>>>>>>>> Note: 2nd and 3rd problems are potential and not exposed unless 1st problem
>>>>>>>>>>> is fixed, because we cannot activate PVID due to it.
>>>>>>>>>>
>>>>>>>>>> Please work out the issues in patch #2 with Vlad and resubmit this
>>>>>>>>>> series.
>>>>>>>>>>
>>>>>>>>>> Thank you.
>>>>>>>>>
>>>>>>>>> I'm hovering between whether we should fix the issue by changing vlan 0
>>>>>>>>> interface behavior in 8021q module or enabling a bridge port to sending
>>>>>>>>> priority-tagged frames, or another better way.
>>>>>>>>>
>>>>>>>>> If you could comment it, I'd appreciate it :)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> BTW, I think what is discussed in patch #2 is another problem about
>>>>>>>>> handling priority-tags, and it exists without this patch set applied.
>>>>>>>>> It looks like that we should prepare another patch set than this to fix
>>>>>>>>> that problem.
>>>>>>>>>
>>>>>>>>> Should I include patches that fix the priority-tags problem in this
>>>>>>>>> patch set and resubmit them all together?
>>>>>>>>>
>>>>>>>>
>>>>>>>> I am thinking that we might need to do it in bridge and it looks like
>>>>>>>> the simplest way to do it is to have default priority regeneration table
>>>>>>>> (table 6-5 from 802.1Q doc).
>>>>>>>>
>>>>>>>> That way I think we would conform to the spec.
>>>>>>>>
>>>>>>>> -vlad
>>>>>>>
>>>>>>> Unfortunately I don't think the default priority regeneration table
>>>>>>> resolves the problem because IEEE 802.1Q says that a VLAN-aware bridge
>>>>>>> can transmit untagged or VLAN-tagged frames only (the end of section 7.5
>>>>>>> and 8.1.7).
>>>>>>>
>>>>>>> As far as I can see, the standard has no mechanism to send
>>>>>>> priority-tagged frames. I think the regenerated priority is used for the outgoing PCP
>>>>>>> field only if egress policy is not untagged (i.e. transmitting as
>>>>>>> VLAN-tagged), and unused if untagged (Section 6.9.2 3rd/4th Paragraph).
>>>>>>>
>>>>>>> If we want to transmit priority-tagged frames from a bridge port, I
>>>>>>> think we need to implement a new (optional) feature that is above the
>>>>>>> standard, as I stated previously.
>>>>>>>
>>>>>>> How do you feel about adding a per-port policy that enables a bridge to
>>>>>>> send priority-tagged frames instead of untagged frames when egress
>>>>>>> policy for the port is untagged?
>>>>>>> With this change, we can transmit frames for a given vlan as either all
>>>>>>> untagged, all priority-tagged or all VLAN-tagged.
>>>>>>
>>>>>> That would work.  What I am thinking is that we do it by special casing
>>>>>> the vid 0 egress policy specification.  Let it be untagged by default
>>>>>> and if it is tagged, then we preserve the priority field and forward
>>>>>> it on.
>>>>>>
>>>>>> This keeps the API stable and doesn't require the user/admin to know
>>>>>> exactly what happens.  Default operation conforms to the spec and allows
>>>>>> a simple change to make it backward-compatible.
>>>>>>
>>>>>> What do you think?  I've done a simple prototype of this and it seems to
>>>>>> work with the VMs I am testing with.
>>>>>
>>>>> Are you saying that
>>>>> - by default, set the 0th bit of untagged_bitmap; and
>>>>> - if we unset the 0th bit and set the "vid"th bit, we transmit frames
>>>>> classified as belonging to VLAN "vid" as priority-tagged?
>>>>>
>>>>> If so, though it's attractive to keep the current API, I'm worried that
>>>>> it could be a bit confusing and unintuitive for kernel/iproute2
>>>>> developers that VID 0 has a special meaning only in the egress policy.
>>>>> Wouldn't it be better to add a new member to struct net_port_vlans
>>>>> instead of using VID 0 of untagged_bitmap?
>>>>>
>>>>> Or are you saying that we use a new flag in struct net_port_vlans but
>>>>> use the BRIDGE_VLAN_INFO_UNTAGGED bit with VID 0 in netlink to set the
>>>>> flag?
>>>>>
>>>>> Even in that case, I'm afraid that it might be confusing for developers
>>>>> for the same reason. We are going to prohibit specifying VID 0 (and
>>>>> 4095) when adding/deleting an FDB entry or a vlan filtering entry, but it
>>>>> would allow us to use VID 0 only when a vlan filtering entry is
>>>>> configured.
>>>>> I think a new nlattr is a more straightforward way to configure
>>>>> it.
>>>>
>>>> By making this an explicit attribute it makes vid 0 a special case for
>>>> any automatic tool that would provision such filtering.  Seeing vid 0
>>>> would mean that these tools would have to know that this would have to
>>>> be translated to a different attribute instead of setting the policy
>>>> values.
>>>
>>> Yes, I agree with you that we can do it by the way you explained.
>>> What I don't understand is the advantage of using vid 0 over another way
>>> such as adding a new nlattr.
>>> I think we can indicate transmitting priority-tags explicitly by such a
>>> nlattr. Using vid 0 seems to be easier to implement than a new nlattr,
>>> but, for me, it looks less intuitive and more difficult to maintain
>>> because we have to care about vid 0 instead of simply ignoring it.
>>>
>>
>> The point I am trying to make is that regardless of the approach someone
>> has to know what to do when enabling priority tagged frames.  Your
>> proposal would require the administrator or config tool to have that
>> knowledge.  Example is:
>> 	Admin does: bridge vlan set priority on dev eth0
>>           Automated app:
>> 		if (vid == 0)
>> 			/* Turn on priority tagged frame support */
>>
>> My proposal would require the bridge filtering implementation to have it.
>> 	user tool: bridge vlan add vid 0 tagged
>> 	Automated app:  No special case.
>>
>> IMO it's better to have 1 piece of code handling the special case than
>> putting it in multiple places.
>
> Thank you for the detailed explanation.
> Now I understand your intention.
>
> I have one question about your proposal.
> I guess the way to enable priority-tagged is something like
> 	bridge vlan add vid 10 dev eth0
> 	bridge vlan add vid 10 dev vnet0 pvid untagged
> 	bridge vlan add vid 0 dev vnet0 tagged
> where vnet0 has sub interface vnet0.0.
>
> Here the admin has to know that the egress policy is applied to a frame
> twice, in a certain order, when it is transmitted from the port vnet0 is
> attached to: first, a frame with vid 10 gets untagged, and then the
> untagged frame gets priority-tagged.
>
> This behavior looks difficult to understand without prior knowledge.
> Any good idea to avoid the need for this additional knowledge on the admin's part?

To me, the fact that there is vnet0.0 (or typically, there is eth0.0 in 
the guest or on the remote host) already tells the admin vlan 0 has to 
be tagged.  The fact that we codify this in the policy makes it explicit.

However, I can see a strong argument to be made for an additional egress
policy attribute that could be, for instance:

	bridge vlan add vid 10 dev eth0 pvid
	bridge vlan add vid 10 dev vnet0 pvid untagged prio_tag

But this has the same connotations with regard to egress policy.  The 2
policies are applied:
  (1) untag the frame.
  (2) add priority_tag.

(2) only happens if the initial frame received on eth0 was priority tagged.

I think I am ok with either approach.  Explicit vid 0 policy is easier
for automatic provisioning.   The flag based one is easier for admin/
manual provisioning.
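
To make the ordering concrete, here is a rough userspace sketch of the
flag-based variant.  It is only an illustration: the enum, the struct and
egress_decide() are names made up for this example, not existing bridge
code, and prio_tag is the proposed flag from the command above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical egress result for one frame on one port. */
enum egress_tag {
	EGRESS_UNTAGGED,	/* no 802.1Q header on the wire */
	EGRESS_PRIO_TAGGED,	/* 802.1Q header with VID 0, PCP preserved */
	EGRESS_VLAN_TAGGED,	/* 802.1Q header carrying the real VID */
};

/* Hypothetical per-port, per-VLAN egress policy. */
struct port_vlan_policy {
	bool untagged;	/* "untagged" flag from the bridge vlan command */
	bool prio_tag;	/* assumed new flag, called prio_tag above */
};

static enum egress_tag egress_decide(const struct port_vlan_policy *p,
				     bool rx_prio_tagged)
{
	assert(p);

	/* Egress policy is tagged: send with the real VID. */
	if (!p->untagged)
		return EGRESS_VLAN_TAGGED;

	/* (1) untag the frame, (2) re-add a VID 0 tag only if the frame
	 * arrived priority-tagged and the port asks for it.
	 */
	if (p->prio_tag && rx_prio_tagged)
		return EGRESS_PRIO_TAGGED;

	return EGRESS_UNTAGGED;
}
```

With untagged and prio_tag both set, a frame that came in priority-tagged
leaves with a VID 0 tag and everything else leaves untagged; without
prio_tag the port behaves as a plain untagged port.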

-vlad.

>
>>
>> Thanks
>> -vlad
>>
>>> Thanks,
>>>
>>> Toshiaki Makita
>>>
>>>>
>>>> How it is implemented internally in the kernel isn't as big of an issue.
>>>> We can do it as a separate flag or as part of existing policy.
>>>>
>>>> -vlad
>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Toshiaki Makita
>>>>>
>>>>>>
>>>>>> -vlad
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Toshiaki Makita
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Toshiaki Makita
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>
>

^ permalink raw reply

* [PATCH net-next v3 3/3] bonding: document the new xmit policy modes and update the changed ones
From: Nikolay Aleksandrov @ 2013-09-26 14:09 UTC (permalink / raw)
  To: netdev; +Cc: davem, andy, fubar, eric.dumazet, vfalico
In-Reply-To: <1380204582-27144-1-git-send-email-nikolay@redhat.com>

Add new documentation for encap2+3 and encap3+4, also update the formula
for the old modes due to the changes.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
---
v2, v3: no change

 Documentation/networking/bonding.txt | 66 ++++++++++++++++++++----------------
 1 file changed, 36 insertions(+), 30 deletions(-)

diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index 9b28e71..3856ed2 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -743,21 +743,16 @@ xmit_hash_policy
 		protocol information to generate the hash.
 
 		Uses XOR of hardware MAC addresses and IP addresses to
-		generate the hash.  The IPv4 formula is
+		generate the hash.  The formula is
 
-		(((source IP XOR dest IP) AND 0xffff) XOR
-			( source MAC XOR destination MAC ))
-				modulo slave count
+		hash = source MAC XOR destination MAC
+		hash = hash XOR source IP XOR destination IP
+		hash = hash XOR (hash RSHIFT 16)
+		hash = hash XOR (hash RSHIFT 8)
+		And then hash is reduced modulo slave count.
 
-		The IPv6 formula is
-
-		hash = (source ip quad 2 XOR dest IP quad 2) XOR
-		       (source ip quad 3 XOR dest IP quad 3) XOR
-		       (source ip quad 4 XOR dest IP quad 4)
-
-		(((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash)
-			XOR (source MAC XOR destination MAC))
-				modulo slave count
+		If the protocol is IPv6 then the source and destination
+		addresses are first hashed using ipv6_addr_hash.
 
 		This algorithm will place all traffic to a particular
 		network peer on the same slave.  For non-IP traffic,
@@ -779,21 +774,16 @@ xmit_hash_policy
 		slaves, although a single connection will not span
 		multiple slaves.
 
-		The formula for unfragmented IPv4 TCP and UDP packets is
-
-		((source port XOR dest port) XOR
-			 ((source IP XOR dest IP) AND 0xffff)
-				modulo slave count
+		The formula for unfragmented TCP and UDP packets is
 
-		The formula for unfragmented IPv6 TCP and UDP packets is
+		hash = source port, destination port (as in the header)
+		hash = hash XOR source IP XOR destination IP
+		hash = hash XOR (hash RSHIFT 16)
+		hash = hash XOR (hash RSHIFT 8)
+		And then hash is reduced modulo slave count.
 
-		hash = (source port XOR dest port) XOR
-		       ((source ip quad 2 XOR dest IP quad 2) XOR
-			(source ip quad 3 XOR dest IP quad 3) XOR
-			(source ip quad 4 XOR dest IP quad 4))
-
-		((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash)
-			modulo slave count
+		If the protocol is IPv6 then the source and destination
+		addresses are first hashed using ipv6_addr_hash.
 
 		For fragmented TCP or UDP packets and all other IPv4 and
 		IPv6 protocol traffic, the source and destination port
@@ -801,10 +791,6 @@ xmit_hash_policy
 		formula is the same as for the layer2 transmit hash
 		policy.
 
-		The IPv4 policy is intended to mimic the behavior of
-		certain switches, notably Cisco switches with PFC2 as
-		well as some Foundry and IBM products.
-
 		This algorithm is not fully 802.3ad compliant.  A
 		single TCP or UDP conversation containing both
 		fragmented and unfragmented packets will see packets
@@ -815,6 +801,26 @@ xmit_hash_policy
 		conversations.  Other implementations of 802.3ad may
 		or may not tolerate this noncompliance.
 
+	encap2+3
+
+		This policy uses the same formula as layer2+3 but it
+		relies on skb_flow_dissect to obtain the header fields
+		which might result in the use of inner headers if an
+		encapsulation protocol is used. For example this will
+		improve the performance for tunnel users because the
+		packets will be distributed according to the encapsulated
+		flows.
+
+	encap3+4
+
+		This policy uses the same formula as layer3+4 but it
+		relies on skb_flow_dissect to obtain the header fields
+		which might result in the use of inner headers if an
+		encapsulation protocol is used. For example this will
+		improve the performance for tunnel users because the
+		packets will be distributed according to the encapsulated
+		flows.
+
 	The default value is layer2.  This option was added in bonding
 	version 2.6.3.  In earlier versions of bonding, this parameter
 	does not exist, and the layer2 policy is the only policy.  The
-- 
1.8.1.4

^ permalink raw reply related

* [PATCH net-next v3 1/3] flow_dissector: factor out the ports extraction in skb_flow_get_ports
From: Nikolay Aleksandrov @ 2013-09-26 14:09 UTC (permalink / raw)
  To: netdev; +Cc: davem, andy, fubar, eric.dumazet, vfalico
In-Reply-To: <1380204582-27144-1-git-send-email-nikolay@redhat.com>

Factor out the code that extracts the ports from skb_flow_dissect and
add a new function skb_flow_get_ports which can be re-used.

Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
---
v2: new patch
v3: fix a bug in skb_flow_dissect where thoff didn't have poff added by
    modifying thoff directly in skb_flow_get_ports as it's done anyway.
    Also add the necessary export symbol for skb_flow_get_ports.
This seems like a good idea because there're other users that can re-use
it later as well.

 include/net/flow_keys.h   |  1 +
 net/core/flow_dissector.c | 41 ++++++++++++++++++++++++++++++-----------
 2 files changed, 31 insertions(+), 11 deletions(-)

diff --git a/include/net/flow_keys.h b/include/net/flow_keys.h
index ac2439d..4db84ae 100644
--- a/include/net/flow_keys.h
+++ b/include/net/flow_keys.h
@@ -14,4 +14,5 @@ struct flow_keys {
 };
 
 bool skb_flow_dissect(const struct sk_buff *skb, struct flow_keys *flow);
+__be32 skb_flow_get_ports(const struct sk_buff *skb, int *thoff, u8 ip_proto);
 #endif
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 1929af8..7785398 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -25,9 +25,37 @@ static void iph_to_flow_copy_addrs(struct flow_keys *flow, const struct iphdr *i
 	memcpy(&flow->src, &iph->saddr, sizeof(flow->src) + sizeof(flow->dst));
 }
 
+/**
+ * skb_flow_get_ports - extract the upper layer ports and return them
+ * @skb: buffer to extract the ports from
+ * @thoff: pointer to transport header offset
+ * @ip_proto: protocol for which to get port offset
+ *
+ * The function will try to retrieve the ports at offset thoff + poff where poff
+ * is the protocol port offset returned from proto_ports_offset, and if poff is
+ * more than or equal to 0 it'll add it to the value at the thoff address
+ */
+__be32 skb_flow_get_ports(const struct sk_buff *skb, int *thoff, u8 ip_proto)
+{
+	int poff = proto_ports_offset(ip_proto);
+
+	if (poff >= 0) {
+		__be32 *ports, _ports;
+
+		*thoff += poff;
+		ports = skb_header_pointer(skb, *thoff, sizeof(_ports),
+					   &_ports);
+		if (ports)
+			return *ports;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(skb_flow_get_ports);
+
 bool skb_flow_dissect(const struct sk_buff *skb, struct flow_keys *flow)
 {
-	int poff, nhoff = skb_network_offset(skb);
+	int nhoff = skb_network_offset(skb);
 	u8 ip_proto;
 	__be16 proto = skb->protocol;
 
@@ -150,16 +178,7 @@ ipv6:
 	}
 
 	flow->ip_proto = ip_proto;
-	poff = proto_ports_offset(ip_proto);
-	if (poff >= 0) {
-		__be32 *ports, _ports;
-
-		nhoff += poff;
-		ports = skb_header_pointer(skb, nhoff, sizeof(_ports), &_ports);
-		if (ports)
-			flow->ports = *ports;
-	}
-
+	flow->ports = skb_flow_get_ports(skb, &nhoff, ip_proto);
 	flow->thoff = (u16) nhoff;
 
 	return true;
-- 
1.8.1.4

^ permalink raw reply related

* [PATCH net-next v3 2/3] bonding: modify the old and add new xmit hash policies
From: Nikolay Aleksandrov @ 2013-09-26 14:09 UTC (permalink / raw)
  To: netdev; +Cc: davem, andy, fubar, eric.dumazet, vfalico
In-Reply-To: <1380204582-27144-1-git-send-email-nikolay@redhat.com>

This patch adds two new hash policy modes which use skb_flow_dissect:
3 - Encapsulated layer 2+3
4 - Encapsulated layer 3+4
There should be a good improvement for tunnel users in those modes.
It also changes the old hash functions to:
hash ^= (__force u32)flow.dst ^ (__force u32)flow.src;
hash ^= (hash >> 16);
hash ^= (hash >> 8);

Where hash will be initialized either to L2 hash, that is
SRCMAC[5] XOR DSTMAC[5], or to flow->ports which should be extracted
from the upper layer. Flow's dst and src are also extracted based on the
xmit policy either directly from the buffer or by using skb_flow_dissect,
but in both cases if the protocol is IPv6 then dst and src are obtained by
ipv6_addr_hash() on the real addresses. In case of a non-dissectable
packet, the algorithms fall back to L2 hashing.
The bond_set_mode_ops() function is now obsolete and thus deleted
because it was used only to set the proper hash policy. We also trim a
pointer from struct bonding because we no longer need to keep the hash
function: there is now a single hash function, bond_xmit_hash, that
works based on bond->params.xmit_policy.
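
As a minimal userspace sketch of the mixing step described above (not the
driver code): struct sketch_flow and sketch_xmit_hash() are made-up names,
the fields stand in for flow.src, flow.dst and flow.ports, and the caller
is assumed to pass either the L2 hash or the ports word as the seed.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the fields of struct flow_keys used by the hash. */
struct sketch_flow {
	uint32_t src;	/* IPv4 address, or folded IPv6 address */
	uint32_t dst;
	uint32_t ports;	/* source and destination port as one 32-bit word */
};

/* Mix the seed with the addresses, fold the upper bits down, then reduce. */
static uint32_t sketch_xmit_hash(uint32_t seed, const struct sketch_flow *fk,
				 uint32_t count)
{
	uint32_t hash = seed;

	assert(fk && count > 0);

	hash ^= fk->dst ^ fk->src;
	hash ^= hash >> 16;
	hash ^= hash >> 8;

	return hash % count;
}
```

Because src and dst are only XORed together, both directions of a flow
get the same value for a given seed, and the two shifts fold the upper
bits of the addresses into the low byte before the modulo.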

The hash function and skb_flow_dissect were suggested by Eric Dumazet.
The layer names were suggested by Andy Gospodarek, because I suck at
semantics.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
---
v2: fix a bug in bond_flow_dissect which might've caused the use of
    uninitialized flow_keys and make use of skb_flow_get_ports
v3: no change
One line is intentionally left at 82 chars since it's the whole function
and IMO looks better that way.

 drivers/net/bonding/bond_3ad.c   |   2 +-
 drivers/net/bonding/bond_main.c  | 197 ++++++++++++++-------------------------
 drivers/net/bonding/bond_sysfs.c |   2 -
 drivers/net/bonding/bonding.h    |   3 +-
 include/uapi/linux/if_bonding.h  |   2 +
 5 files changed, 72 insertions(+), 134 deletions(-)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 0d8f427..b3ab703 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -2442,7 +2442,7 @@ int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev)
 		goto out;
 	}
 
-	slave_agg_no = bond->xmit_hash_policy(skb, slaves_in_agg);
+	slave_agg_no = bond_xmit_hash(bond, skb, slaves_in_agg);
 
 	bond_for_each_slave(bond, slave) {
 		struct aggregator *agg = SLAVE_AD_INFO(slave).port.aggregator;
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 55bbb8b..498e9e1 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -78,6 +78,7 @@
 #include <net/netns/generic.h>
 #include <net/pkt_sched.h>
 #include <linux/rculist.h>
+#include <net/flow_keys.h>
 #include "bonding.h"
 #include "bond_3ad.h"
 #include "bond_alb.h"
@@ -159,7 +160,8 @@ MODULE_PARM_DESC(min_links, "Minimum number of available links before turning on
 module_param(xmit_hash_policy, charp, 0);
 MODULE_PARM_DESC(xmit_hash_policy, "balance-xor and 802.3ad hashing method; "
 				   "0 for layer 2 (default), 1 for layer 3+4, "
-				   "2 for layer 2+3");
+				   "2 for layer 2+3, 3 for encap layer 2+3, "
+				   "4 for encap layer 3+4");
 module_param(arp_interval, int, 0);
 MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds");
 module_param_array(arp_ip_target, charp, NULL, 0);
@@ -217,6 +219,8 @@ const struct bond_parm_tbl xmit_hashtype_tbl[] = {
 {	"layer2",		BOND_XMIT_POLICY_LAYER2},
 {	"layer3+4",		BOND_XMIT_POLICY_LAYER34},
 {	"layer2+3",		BOND_XMIT_POLICY_LAYER23},
+{	"encap2+3",		BOND_XMIT_POLICY_ENCAP23},
+{	"encap3+4",		BOND_XMIT_POLICY_ENCAP34},
 {	NULL,			-1},
 };
 
@@ -3026,99 +3030,85 @@ static struct notifier_block bond_netdev_notifier = {
 
 /*---------------------------- Hashing Policies -----------------------------*/
 
-/*
- * Hash for the output device based upon layer 2 data
- */
-static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count)
+/* L2 hash helper */
+static inline u32 bond_eth_hash(struct sk_buff *skb)
 {
 	struct ethhdr *data = (struct ethhdr *)skb->data;
 
 	if (skb_headlen(skb) >= offsetof(struct ethhdr, h_proto))
-		return (data->h_dest[5] ^ data->h_source[5]) % count;
+		return data->h_dest[5] ^ data->h_source[5];
 
 	return 0;
 }
 
-/*
- * Hash for the output device based upon layer 2 and layer 3 data. If
- * the packet is not IP, fall back on bond_xmit_hash_policy_l2()
- */
-static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count)
+/* Extract the appropriate headers based on bond's xmit policy */
+static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
+			      struct flow_keys *fk)
 {
-	const struct ethhdr *data;
+	const struct ipv6hdr *iph6;
 	const struct iphdr *iph;
-	const struct ipv6hdr *ipv6h;
-	u32 v6hash;
-	const __be32 *s, *d;
+	int noff, proto = -1;
 
-	if (skb->protocol == htons(ETH_P_IP) &&
-	    pskb_network_may_pull(skb, sizeof(*iph))) {
+	if (bond->params.xmit_policy > BOND_XMIT_POLICY_LAYER23)
+		return skb_flow_dissect(skb, fk);
+
+	fk->ports = 0;
+	noff = skb_network_offset(skb);
+	if (skb->protocol == htons(ETH_P_IP)) {
+		if (!pskb_may_pull(skb, noff + sizeof(*iph)))
+			return false;
 		iph = ip_hdr(skb);
-		data = (struct ethhdr *)skb->data;
-		return ((ntohl(iph->saddr ^ iph->daddr) & 0xffff) ^
-			(data->h_dest[5] ^ data->h_source[5])) % count;
-	} else if (skb->protocol == htons(ETH_P_IPV6) &&
-		   pskb_network_may_pull(skb, sizeof(*ipv6h))) {
-		ipv6h = ipv6_hdr(skb);
-		data = (struct ethhdr *)skb->data;
-		s = &ipv6h->saddr.s6_addr32[0];
-		d = &ipv6h->daddr.s6_addr32[0];
-		v6hash = (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]);
-		v6hash ^= (v6hash >> 24) ^ (v6hash >> 16) ^ (v6hash >> 8);
-		return (v6hash ^ data->h_dest[5] ^ data->h_source[5]) % count;
-	}
-
-	return bond_xmit_hash_policy_l2(skb, count);
+		fk->src = iph->saddr;
+		fk->dst = iph->daddr;
+		noff += iph->ihl << 2;
+		if (!ip_is_fragment(iph))
+			proto = iph->protocol;
+	} else if (skb->protocol == htons(ETH_P_IPV6)) {
+		if (!pskb_may_pull(skb, noff + sizeof(*iph6)))
+			return false;
+		iph6 = ipv6_hdr(skb);
+		fk->src = (__force __be32)ipv6_addr_hash(&iph6->saddr);
+		fk->dst = (__force __be32)ipv6_addr_hash(&iph6->daddr);
+		noff += sizeof(*iph6);
+		proto = iph6->nexthdr;
+	} else {
+		return false;
+	}
+	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER34 && proto >= 0)
+		fk->ports = skb_flow_get_ports(skb, &noff, proto);
+
+	return true;
 }
 
-/*
- * Hash for the output device based upon layer 3 and layer 4 data. If
- * the packet is a frag or not TCP or UDP, just use layer 3 data.  If it is
- * altogether not IP, fall back on bond_xmit_hash_policy_l2()
+/**
+ * bond_xmit_hash - generate a hash value based on the xmit policy
+ * @bond: bonding device
+ * @skb: buffer to use for headers
+ * @count: modulo value
+ *
+ * This function will extract the necessary headers from the skb buffer and use
+ * them to generate a hash based on the xmit_policy set in the bonding device
+ * which will be reduced modulo count before returning.
  */
-static int bond_xmit_hash_policy_l34(struct sk_buff *skb, int count)
+int bond_xmit_hash(struct bonding *bond, struct sk_buff *skb, int count)
 {
-	u32 layer4_xor = 0;
-	const struct iphdr *iph;
-	const struct ipv6hdr *ipv6h;
-	const __be32 *s, *d;
-	const __be16 *l4 = NULL;
-	__be16 _l4[2];
-	int noff = skb_network_offset(skb);
-	int poff;
-
-	if (skb->protocol == htons(ETH_P_IP) &&
-	    pskb_may_pull(skb, noff + sizeof(*iph))) {
-		iph = ip_hdr(skb);
-		poff = proto_ports_offset(iph->protocol);
+	struct flow_keys flow;
+	u32 hash;
 
-		if (!ip_is_fragment(iph) && poff >= 0) {
-			l4 = skb_header_pointer(skb, noff + (iph->ihl << 2) + poff,
-						sizeof(_l4), &_l4);
-			if (l4)
-				layer4_xor = ntohs(l4[0] ^ l4[1]);
-		}
-		return (layer4_xor ^
-			((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count;
-	} else if (skb->protocol == htons(ETH_P_IPV6) &&
-		   pskb_may_pull(skb, noff + sizeof(*ipv6h))) {
-		ipv6h = ipv6_hdr(skb);
-		poff = proto_ports_offset(ipv6h->nexthdr);
-		if (poff >= 0) {
-			l4 = skb_header_pointer(skb, noff + sizeof(*ipv6h) + poff,
-						sizeof(_l4), &_l4);
-			if (l4)
-				layer4_xor = ntohs(l4[0] ^ l4[1]);
-		}
-		s = &ipv6h->saddr.s6_addr32[0];
-		d = &ipv6h->daddr.s6_addr32[0];
-		layer4_xor ^= (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]);
-		layer4_xor ^= (layer4_xor >> 24) ^ (layer4_xor >> 16) ^
-			       (layer4_xor >> 8);
-		return layer4_xor % count;
-	}
+	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER2 ||
+	    !bond_flow_dissect(bond, skb, &flow))
+		return bond_eth_hash(skb) % count;
+
+	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER23 ||
+	    bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP23)
+		hash = bond_eth_hash(skb);
+	else
+		hash = (__force u32)flow.ports;
+	hash ^= (__force u32)flow.dst ^ (__force u32)flow.src;
+	hash ^= (hash >> 16);
+	hash ^= (hash >> 8);
 
-	return bond_xmit_hash_policy_l2(skb, count);
+	return hash % count;
 }
 
 /*-------------------------- Device entry points ----------------------------*/
@@ -3700,8 +3690,7 @@ static int bond_xmit_activebackup(struct sk_buff *skb, struct net_device *bond_d
 	return NETDEV_TX_OK;
 }
 
-/*
- * In bond_xmit_xor() , we determine the output device by using a pre-
+/* In bond_xmit_xor() , we determine the output device by using a pre-
  * determined xmit_hash_policy(), If the selected device is not enabled,
  * find the next active slave.
  */
@@ -3709,8 +3698,7 @@ static int bond_xmit_xor(struct sk_buff *skb, struct net_device *bond_dev)
 {
 	struct bonding *bond = netdev_priv(bond_dev);
 
-	bond_xmit_slave_id(bond, skb,
-			   bond->xmit_hash_policy(skb, bond->slave_cnt));
+	bond_xmit_slave_id(bond, skb, bond_xmit_hash(bond, skb, bond->slave_cnt));
 
 	return NETDEV_TX_OK;
 }
@@ -3746,22 +3734,6 @@ static int bond_xmit_broadcast(struct sk_buff *skb, struct net_device *bond_dev)
 
 /*------------------------- Device initialization ---------------------------*/
 
-static void bond_set_xmit_hash_policy(struct bonding *bond)
-{
-	switch (bond->params.xmit_policy) {
-	case BOND_XMIT_POLICY_LAYER23:
-		bond->xmit_hash_policy = bond_xmit_hash_policy_l23;
-		break;
-	case BOND_XMIT_POLICY_LAYER34:
-		bond->xmit_hash_policy = bond_xmit_hash_policy_l34;
-		break;
-	case BOND_XMIT_POLICY_LAYER2:
-	default:
-		bond->xmit_hash_policy = bond_xmit_hash_policy_l2;
-		break;
-	}
-}
-
 /*
  * Lookup the slave that corresponds to a qid
  */
@@ -3871,38 +3843,6 @@ static netdev_tx_t bond_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	return ret;
 }
 
-/*
- * set bond mode specific net device operations
- */
-void bond_set_mode_ops(struct bonding *bond, int mode)
-{
-	struct net_device *bond_dev = bond->dev;
-
-	switch (mode) {
-	case BOND_MODE_ROUNDROBIN:
-		break;
-	case BOND_MODE_ACTIVEBACKUP:
-		break;
-	case BOND_MODE_XOR:
-		bond_set_xmit_hash_policy(bond);
-		break;
-	case BOND_MODE_BROADCAST:
-		break;
-	case BOND_MODE_8023AD:
-		bond_set_xmit_hash_policy(bond);
-		break;
-	case BOND_MODE_ALB:
-		/* FALLTHRU */
-	case BOND_MODE_TLB:
-		break;
-	default:
-		/* Should never happen, mode already checked */
-		pr_err("%s: Error: Unknown bonding mode %d\n",
-		       bond_dev->name, mode);
-		break;
-	}
-}
-
 static int bond_ethtool_get_settings(struct net_device *bond_dev,
 				     struct ethtool_cmd *ecmd)
 {
@@ -4004,7 +3944,6 @@ static void bond_setup(struct net_device *bond_dev)
 	ether_setup(bond_dev);
 	bond_dev->netdev_ops = &bond_netdev_ops;
 	bond_dev->ethtool_ops = &bond_ethtool_ops;
-	bond_set_mode_ops(bond, bond->params.mode);
 
 	bond_dev->destructor = bond_destructor;
 
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index c29b836..dba3b9b 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -352,7 +352,6 @@ static ssize_t bonding_store_mode(struct device *d,
 	/* don't cache arp_validate between modes */
 	bond->params.arp_validate = BOND_ARP_VALIDATE_NONE;
 	bond->params.mode = new_value;
-	bond_set_mode_ops(bond, bond->params.mode);
 	pr_info("%s: setting mode to %s (%d).\n",
 		bond->dev->name, bond_mode_tbl[new_value].modename,
 		new_value);
@@ -392,7 +391,6 @@ static ssize_t bonding_store_xmit_hash(struct device *d,
 		ret = -EINVAL;
 	} else {
 		bond->params.xmit_policy = new_value;
-		bond_set_mode_ops(bond, bond->params.mode);
 		pr_info("%s: setting xmit hash policy to %s (%d).\n",
 			bond->dev->name,
 			xmit_hashtype_tbl[new_value].modename, new_value);
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index 03cf3fd..4db9ec4 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -245,7 +245,6 @@ struct bonding {
 	char     proc_file_name[IFNAMSIZ];
 #endif /* CONFIG_PROC_FS */
 	struct   list_head bond_list;
-	int      (*xmit_hash_policy)(struct sk_buff *, int);
 	u16      rr_tx_counter;
 	struct   ad_bond_info ad_info;
 	struct   alb_bond_info alb_info;
@@ -446,7 +445,7 @@ int bond_release(struct net_device *bond_dev, struct net_device *slave_dev);
 void bond_mii_monitor(struct work_struct *);
 void bond_loadbalance_arp_mon(struct work_struct *);
 void bond_activebackup_arp_mon(struct work_struct *);
-void bond_set_mode_ops(struct bonding *bond, int mode);
+int bond_xmit_hash(struct bonding *bond, struct sk_buff *skb, int count);
 int bond_parse_parm(const char *mode_arg, const struct bond_parm_tbl *tbl);
 void bond_select_active_slave(struct bonding *bond);
 void bond_change_active_slave(struct bonding *bond, struct slave *new_active);
diff --git a/include/uapi/linux/if_bonding.h b/include/uapi/linux/if_bonding.h
index a17edda..9635a62 100644
--- a/include/uapi/linux/if_bonding.h
+++ b/include/uapi/linux/if_bonding.h
@@ -91,6 +91,8 @@
 #define BOND_XMIT_POLICY_LAYER2		0 /* layer 2 (MAC only), default */
 #define BOND_XMIT_POLICY_LAYER34	1 /* layer 3+4 (IP ^ (TCP || UDP)) */
 #define BOND_XMIT_POLICY_LAYER23	2 /* layer 2+3 (IP ^ MAC) */
+#define BOND_XMIT_POLICY_ENCAP23	3 /* encapsulated layer 2+3 */
+#define BOND_XMIT_POLICY_ENCAP34	4 /* encapsulated layer 3+4 */
 
 typedef struct ifbond {
 	__s32 bond_mode;
-- 
1.8.1.4

^ permalink raw reply related

* [PATCH net-next v3 0/3] bonding: modify the current and add new hash functions
From: Nikolay Aleksandrov @ 2013-09-26 14:09 UTC (permalink / raw)
  To: netdev; +Cc: davem, andy, fubar, eric.dumazet, vfalico

Hi all,
This is a complete remake of my old patch that modified the bonding hash
functions to use skb_flow_dissect which was suggested by Eric Dumazet.
This time around I've left the old modes although using a new hash function
again suggested by Eric, which is the same for all modes. The only
difference is the way the headers are obtained. The old modes obtain them
as before in order to address concerns about speed, but the 2 new ones use
skb_flow_dissect. The unification of the hash function allows us to remove a
pointer from struct bonding and also a few extra functions that dealt with
it. Two new functions are added which take care of the hashing based on
bond->params.xmit_policy only:
bond_xmit_hash() - global function, used by XOR and 3ad modes
bond_flow_dissect() - used by bond_xmit_hash() to obtain the necessary
headers and combine them according to bond->params.xmit_policy.
Also factor out the ports extraction from skb_flow_dissect and add a new
function - skb_flow_get_ports() which can be re-used.
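
For readers without the tree at hand, the ports extraction can be sketched
in userspace roughly as follows.  This is an illustration under
assumptions, not the kernel function: sketch_ports_offset() is a reduced
stand-in for proto_ports_offset() that knows only TCP, UDP and AH, and a
plain byte buffer with a length replaces the skb and skb_header_pointer().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for proto_ports_offset(): byte offset of the
 * 32-bit "ports" word inside the transport header, or -1 if none.
 * Only a few protocols are listed here.
 */
static int sketch_ports_offset(uint8_t ip_proto)
{
	switch (ip_proto) {
	case 6:		/* TCP */
	case 17:	/* UDP */
		return 0;
	case 51:	/* AH: SPI sits after nexthdr, hdrlen and reserved */
		return 4;
	default:
		return -1;
	}
}

/* Return the 4 bytes at *thoff + poff as they appear in the packet, or 0
 * if the protocol has no ports or the packet is too short.  *thoff is
 * advanced by poff, as in v3 of the patch.
 */
static uint32_t sketch_get_ports(const uint8_t *pkt, size_t len, int *thoff,
				 uint8_t ip_proto)
{
	int poff = sketch_ports_offset(ip_proto);
	uint32_t ports = 0;

	assert(pkt && thoff && *thoff >= 0);

	if (poff >= 0) {
		*thoff += poff;
		/* Bounds check stands in for skb_header_pointer(). */
		if ((size_t)*thoff + sizeof(ports) <= len)
			memcpy(&ports, pkt + *thoff, sizeof(ports));
	}

	return ports;
}
```

The value is returned in packet byte order, so a caller that only XORs it
into a hash does not need to convert it.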

v2: add the flow_dissector patch and use skb_flow_get_ports in patch 02
v3: fix a bug in the flow_dissector patch that caused a different thoff
    by modifying the thoff argument in skb_flow_get_ports directly, most
    of the users already do it anyway.
    Also add the necessary export symbol for skb_flow_get_ports.

Best regards,
 Nikolay Aleksandrov


Nikolay Aleksandrov (3):
  flow_dissector: factor out the ports extraction in skb_flow_get_ports
  bonding: modify the old and add new xmit hash policies
  bonding: document the new xmit policy modes and update the changed
    ones

 Documentation/networking/bonding.txt |  66 ++++++------
 drivers/net/bonding/bond_3ad.c       |   2 +-
 drivers/net/bonding/bond_main.c      | 197 ++++++++++++-----------------------
 drivers/net/bonding/bond_sysfs.c     |   2 -
 drivers/net/bonding/bonding.h        |   3 +-
 include/net/flow_keys.h              |   1 +
 include/uapi/linux/if_bonding.h      |   2 +
 net/core/flow_dissector.c            |  41 ++++++--
 8 files changed, 139 insertions(+), 175 deletions(-)

-- 
1.8.1.4

^ permalink raw reply

* Re: [PATCH] moxa: drop free_irq of devm_request_irq allocated irq
From: Ben Hutchings @ 2013-09-26 14:02 UTC (permalink / raw)
  To: Wei Yongjun
  Cc: jg1.han, jonas.jensen, davem, grant.likely, rob.herring,
	yongjun_wei, netdev, sachin.kamat
In-Reply-To: <CAPgLHd9VXaPx18ujDKbUik6XjOsnARcK-C-orCOynyJaWFREjw@mail.gmail.com>

On Thu, 2013-09-26 at 10:12 +0800, Wei Yongjun wrote:
> On 09/26/2013 08:47 AM, Jingoo Han wrote:
> > On Wednesday, September 25, 2013 4:33 PM, Wei Yongjun wrote:
> >> From: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
> >>
> >> irq allocated with devm_request_irq should not be freed using
> >> free_irq, because doing so causes a dangling pointer, and a
> >> subsequent double free.
> >>
> >> Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
> >> ---
> >>  drivers/net/ethernet/moxa/moxart_ether.c | 1 -
> >>  1 file changed, 1 deletion(-)
> >>
> >> diff --git a/drivers/net/ethernet/moxa/moxart_ether.c b/drivers/net/ethernet/moxa/moxart_ether.c
> >> index 83c2091..9a7fcb5 100644
> >> --- a/drivers/net/ethernet/moxa/moxart_ether.c
> >> +++ b/drivers/net/ethernet/moxa/moxart_ether.c
> >> @@ -531,7 +531,6 @@ static int moxart_remove(struct platform_device *pdev)
> >>  	struct net_device *ndev = platform_get_drvdata(pdev);
> >>
> >>  	unregister_netdev(ndev);
> >> -	free_irq(ndev->irq, ndev);
> >>  	moxart_mac_free_memory(ndev);
> >>  	free_netdev(ndev);
> > CC'ed Sachin Kamat,
> >
> > In this case, the free_irq() will be called, after calling
> > free_netdev(). 'ndev' is freed by free_netdev(). Then, 'ndev->irq'
> > is used by free_irq(). Is it right?
> >
> > In my humble opinion, it seems to make the problem.
> >
> 
> devm_request_irq() has recorded the irq and dev_id, so free_irq() by devm_*
> will not touch 'ndev' which has been freed by free_netdev().
> So, if we not need to call free_irq() before free_netdev(), there will be
> no problem.

What if this is a shared IRQ?  Then if free_irq() is not called here:

- The IRQ handler might still be called after free_netdev()
- The memory containing ndev could also be reallocated to another device
sharing the IRQ, so that it uses the same dev_id for its IRQ handler

Maybe you can be sure that this device will never share an IRQ.  But it
still doesn't look like good practice to rely on this.  Perhaps there
should be devm functions for netdevs too, so it wouldn't be necessary to
free either IRQ or netdev explicitly.
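
The ordering concern can be modelled in plain userspace C. This is only a
sketch under stated assumptions: model_line, model_request_irq(),
model_free_irq() and model_remove() are hypothetical stand-ins, not kernel
APIs. In a real driver the explicit early release of a devm-requested IRQ
would presumably be devm_free_irq(), called before free_netdev().

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_MAX_ACTIONS 4

typedef void (*model_handler_t)(void *dev_id);

/* One (handler, dev_id) pair registered on a shared interrupt line. */
struct model_action {
	model_handler_t handler;
	void *dev_id;
};

/* Hypothetical shared IRQ line: several devices may register here. */
static struct model_action model_line[MODEL_MAX_ACTIONS];

static int model_request_irq(model_handler_t handler, void *dev_id)
{
	for (int i = 0; i < MODEL_MAX_ACTIONS; i++) {
		if (!model_line[i].handler) {
			model_line[i].handler = handler;
			model_line[i].dev_id = dev_id;
			return 0;
		}
	}
	return -1;	/* no free slot on the line */
}

static void model_free_irq(void *dev_id)
{
	for (int i = 0; i < MODEL_MAX_ACTIONS; i++) {
		if (model_line[i].handler && model_line[i].dev_id == dev_id) {
			model_line[i].handler = NULL;
			model_line[i].dev_id = NULL;
		}
	}
}

/* Count the entries on the line that still point at dev_id. */
static int model_line_has(const void *dev_id)
{
	int n = 0;

	for (int i = 0; i < MODEL_MAX_ACTIONS; i++)
		if (model_line[i].handler && model_line[i].dev_id == dev_id)
			n++;
	return n;
}

struct model_netdev {
	unsigned long irq_count;
};

/* Handler that dereferences dev_id, as a real ISR would. */
static void model_isr(void *dev_id)
{
	struct model_netdev *ndev = dev_id;

	ndev->irq_count++;
}

/*
 * Tear down the device. Returns how many entries on the shared line
 * would still reference ndev at the moment its memory is freed.
 * A non-zero result models the window in which another device's
 * interrupt could run our handler on freed memory.
 */
static int model_remove(struct model_netdev *ndev, int free_irq_first)
{
	int stale;

	if (free_irq_first)
		model_free_irq(ndev);
	stale = model_line_has(ndev);	/* checked before the free */
	free(ndev);
	return stale;
}
```

With free_irq_first set, nothing on the line refers to the device once it is
freed. Without it, the stale entry stays registered until something else
(in the kernel, the devm release that runs after remove returns) cleans up.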

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: Established sockets remain open after iface down or address lost
From: Eric Dumazet @ 2013-09-26 13:49 UTC (permalink / raw)
  To: Chris Verges; +Cc: davem, kuznet, jmorris, yoshfuji, kaber, netdev
In-Reply-To: <20130926060433.GA9170@cverges-dev-lnx.sentient-energy.com>

On Wed, 2013-09-25 at 23:04 -0700, Chris Verges wrote:
> Hello all,
> 
> I've encountered a behavior that appears to be known, but am seeking
> some clarity on its rationale.  The scenario is as follows:
> 
>   (0) A TCP server socket listens on :: (v4/v6).
>   (1) Connect a USB/Ethernet adapter to a Linux system.
>   (2) Adapter is brought up as 'eth0' with an IP address.
>   (3) A remote TCP client connects to the server socket.
>   (4) 'netstat -anp' shows the socket as ESTABLISHED
>   (5) The TCP server starts a blocking read waiting for data.
>   (6) Physically disconnect the USB/Ethernet adapter from the USB bus.
>   (7) Linux removes the 'eth0' interface and associated IP address.
> 
> At this point, the socket _still_ shows as ESTABLISHED under netstat.
> 
> This is the paradox.  Why is the blocking read not interrupted with a
> socket error to indicate that the socket is no longer viable?

Because the TCP layer is not sensitive to such temporary events.

You can plug your iface in again, and the IP is valid again.

Why should we give a permanent error in such a case?

If network communication is cut somewhere, TCP is not supposed to
react immediately. Normal timeouts and retransmits take place.
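
An application that wants a bound on how long an idle blocking read can sit
on a dead path can ask TCP to probe the peer. A minimal sketch, assuming the
Linux keepalive socket options; enable_keepalive() and the values passed to
it are hypothetical and not taken from this thread.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Enable TCP keepalive probing on an existing TCP socket.
 *
 * Assumption: the caller wants an idle connection to be declared dead
 * after roughly idle_secs + intvl_secs * probes seconds without any
 * answer from the peer.
 *
 * Returns 0 on success, -1 on failure (errno is set by setsockopt).
 */
static int enable_keepalive(int fd, int idle_secs, int intvl_secs, int probes)
{
	int on = 1;

	/* Turn keepalive on for this socket. */
	if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
		return -1;
	/* Idle time before the first probe is sent. */
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
		       &idle_secs, sizeof(idle_secs)) < 0)
		return -1;
	/* Interval between unanswered probes. */
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL,
		       &intvl_secs, sizeof(intvl_secs)) < 0)
		return -1;
	/* Number of unanswered probes before the connection is dropped. */
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,
		       &probes, sizeof(probes)) < 0)
		return -1;
	return 0;
}
```

Calling enable_keepalive(fd, 30, 5, 3) on the accepted socket would, under
these assumptions, let a blocked read fail with an error once the probes go
unanswered, instead of waiting indefinitely on an idle connection.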

^ permalink raw reply

* Re: [Xen-devel] [PATCH net v6 1/1] xen-netback: Handle backend state transitions in a more robust way
From: Ian Campbell @ 2013-09-26 13:16 UTC (permalink / raw)
  To: Paul Durrant; +Cc: netdev, Wei Liu, David Vrabel, xen-devel
In-Reply-To: <1380200840.29483.82.camel@kazak.uk.xensource.com>

On Thu, 2013-09-26 at 14:07 +0100, Ian Campbell wrote:
> On Thu, 2013-09-26 at 12:09 +0100, Paul Durrant wrote:
> > When the frontend state changes netback now specifies its desired state to
> > a new function, set_backend_state(), which transitions through any
> > necessary intermediate states.
> > This fixes an issue observed with some old Windows frontend drivers where
> > they failed to transition through the Closing state and netback would not
> > behave correctly.
> > 
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>

Sorry, trimmed my quotes too aggressively and nuked my Ack ;-)

Ian.

^ permalink raw reply

* Re: [PATCH net-next] MAINTAINERS: add myself as maintainer of xen-netback
From: Ian Campbell @ 2013-09-26 13:08 UTC (permalink / raw)
  To: Wei Liu; +Cc: netdev, xen-devel
In-Reply-To: <1380195713-4557-1-git-send-email-wei.liu2@citrix.com>

On Thu, 2013-09-26 at 12:41 +0100, Wei Liu wrote:
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

Thanks!

> ---
>  MAINTAINERS |    1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index e61c2e8..3c441ce 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -9366,6 +9366,7 @@ F:	arch/arm64/include/asm/xen/
>  
>  XEN NETWORK BACKEND DRIVER
>  M:	Ian Campbell <ian.campbell@citrix.com>
> +M:	Wei Liu <wei.liu2@citrix.com>
>  L:	xen-devel@lists.xenproject.org (moderated for non-subscribers)
>  L:	netdev@vger.kernel.org
>  S:	Supported

^ permalink raw reply

* Re: [PATCH net v6 1/1] xen-netback: Handle backend state transitions in a more robust way
From: Ian Campbell @ 2013-09-26 13:07 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, netdev, Wei Liu, David Vrabel
In-Reply-To: <1380193792-9751-2-git-send-email-paul.durrant@citrix.com>

On Thu, 2013-09-26 at 12:09 +0100, Paul Durrant wrote:
> When the frontend state changes netback now specifies its desired state to
> a new function, set_backend_state(), which transitions through any
> necessary intermediate states.
> This fixes an issue observed with some old Windows frontend drivers where
> they failed to transition through the Closing state and netback would not
> behave correctly.
> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

^ permalink raw reply

* Re: [PATCH net v6 1/1] xen-netback: Handle backend state transitions in a more robust way
From: Ian Campbell @ 2013-09-26 13:07 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, netdev, Wei Liu, David Vrabel
In-Reply-To: <1380193792-9751-2-git-send-email-paul.durrant@citrix.com>

On Thu, 2013-09-26 at 12:09 +0100, Paul Durrant wrote:
> When the frontend state changes netback now specifies its desired state to
> a new function, set_backend_state(), which transitions through any
> necessary intermediate states.
> This fixes an issue observed with some old Windows frontend drivers where
> they failed to transition through the Closing state and netback would not
> behave correctly.
> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>

^ permalink raw reply

* [PATCH net-next] MAINTAINERS: add myself as maintainer of xen-netback
From: Wei Liu @ 2013-09-26 11:41 UTC (permalink / raw)
  To: netdev; +Cc: xen-devel, ian.campbell, Wei Liu

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 MAINTAINERS |    1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index e61c2e8..3c441ce 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9366,6 +9366,7 @@ F:	arch/arm64/include/asm/xen/
 
 XEN NETWORK BACKEND DRIVER
 M:	Ian Campbell <ian.campbell@citrix.com>
+M:	Wei Liu <wei.liu2@citrix.com>
 L:	xen-devel@lists.xenproject.org (moderated for non-subscribers)
 L:	netdev@vger.kernel.org
 S:	Supported
-- 
1.7.10.4

^ permalink raw reply related

* RE: [PATCH 2/2] net: qmi_wwan: fix checkpatch warnings
From: David Laight @ 2013-09-26 11:29 UTC (permalink / raw)
  To: Bjørn Mork, Fabio Porcedda
  Cc: netdev, linux-usb, David S. Miller, Dan Williams
In-Reply-To: <87y56jn9ub.fsf@nemi.mork.no>

> Anything that breaks a previously unbroken argument list will reduce the
> readability in my opinion.  The lines can of course not be unlimited,
> but there is no need to set the limit as low as 80 columns.  Feedback
> I've got from developers using e.g. 80 column braille devices is that
> longer lines isn't really a problem for them either.

The main reason for limiting the line length is so that things look
'sensible' when you have a lot of screen windows displaying different
files. You don't want wrapped code, and you definitely don't want
the RHS of long lines hidden.
With a 1600x1200 monitor I'll display six 80x40 windows (and probably
have some more partially visible ones).

Personally I indent continuation lines by 4 chars if using 8 char
'normal' indentation and 8 chars if using 4. This gives a lot more
room on the continuation lines than the Linux 'line up with the
previous line'.

> This is the only one of your code changes which I can be convinced to
> agreeing may improve readability:
> 
> -     if ((on && atomic_add_return(1, &info->pmcount) == 1) || (!on && atomic_dec_and_test(&info->pmcount))) {
> +     if ((on && atomic_add_return(1, &info->pmcount) == 1) ||
> +         (!on && atomic_dec_and_test(&info->pmcount))) {

That can be written succinctly as:
	if (on ? atomic_add_return(1, &info->pmcount) == 1
	       : atomic_dec_and_test(&info->pmcount)) {
although that construct is somewhat frowned upon!
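
That the two spellings behave the same can be checked in userspace. The
sketch below uses C11 atomics as stand-ins for the kernel helpers;
model_add_return(), model_dec_and_test() and the two pm_changed_*()
wrappers are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for atomic_add_return(): returns the value after the add. */
static int model_add_return(int i, atomic_int *v)
{
	return atomic_fetch_add(v, i) + i;
}

/* Stand-in for atomic_dec_and_test(): true when the result is zero. */
static bool model_dec_and_test(atomic_int *v)
{
	return atomic_fetch_sub(v, 1) - 1 == 0;
}

/* Form used in the patch: two guarded branches joined by ||. */
static bool pm_changed_or(bool on, atomic_int *pmcount)
{
	return (on && model_add_return(1, pmcount) == 1) ||
	       (!on && model_dec_and_test(pmcount));
}

/* Conditional-operator form: exactly one branch is ever evaluated. */
static bool pm_changed_ternary(bool on, atomic_int *pmcount)
{
	return on ? model_add_return(1, pmcount) == 1
		  : model_dec_and_test(pmcount);
}
```

In both forms only one of the two atomic operations runs per call, because
&& short-circuits on the value of 'on', so the counter ends up identical.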

	David


^ permalink raw reply

* [PATCH net v6 1/1] xen-netback: Handle backend state transitions in a more robust way
From: Paul Durrant @ 2013-09-26 11:09 UTC (permalink / raw)
  To: xen-devel, netdev; +Cc: Paul Durrant, Ian Campbell, Wei Liu, David Vrabel
In-Reply-To: <1380193792-9751-1-git-send-email-paul.durrant@citrix.com>

When the frontend state changes netback now specifies its desired state to
a new function, set_backend_state(), which transitions through any
necessary intermediate states.
This fixes an issue observed with some old Windows frontend drivers where
they failed to transition through the Closing state and netback would not
behave correctly.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
---
 drivers/net/xen-netback/xenbus.c |  148 ++++++++++++++++++++++++++++++--------
 1 file changed, 118 insertions(+), 30 deletions(-)

diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index a53782e..b45bce2 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -24,6 +24,12 @@
 struct backend_info {
 	struct xenbus_device *dev;
 	struct xenvif *vif;
+
+	/* This is the state that will be reflected in xenstore when any
+	 * active hotplug script completes.
+	 */
+	enum xenbus_state state;
+
 	enum xenbus_state frontend_state;
 	struct xenbus_watch hotplug_status_watch;
 	u8 have_hotplug_status_watch:1;
@@ -136,6 +142,8 @@ static int netback_probe(struct xenbus_device *dev,
 	if (err)
 		goto fail;
 
+	be->state = XenbusStateInitWait;
+
 	/* This kicks hotplug scripts, so do it immediately. */
 	backend_create_xenvif(be);
 
@@ -208,24 +216,113 @@ static void backend_create_xenvif(struct backend_info *be)
 	kobject_uevent(&dev->dev.kobj, KOBJ_ONLINE);
 }
 
-
-static void disconnect_backend(struct xenbus_device *dev)
+static void backend_disconnect(struct backend_info *be)
 {
-	struct backend_info *be = dev_get_drvdata(&dev->dev);
-
 	if (be->vif)
 		xenvif_disconnect(be->vif);
 }
 
-static void destroy_backend(struct xenbus_device *dev)
+static void backend_connect(struct backend_info *be)
 {
-	struct backend_info *be = dev_get_drvdata(&dev->dev);
+	if (be->vif)
+		connect(be);
+}
 
-	if (be->vif) {
-		kobject_uevent(&dev->dev.kobj, KOBJ_OFFLINE);
-		xenbus_rm(XBT_NIL, dev->nodename, "hotplug-status");
-		xenvif_free(be->vif);
-		be->vif = NULL;
+static inline void backend_switch_state(struct backend_info *be,
+					enum xenbus_state state)
+{
+	struct xenbus_device *dev = be->dev;
+
+	pr_debug("%s -> %s\n", dev->nodename, xenbus_strstate(state));
+	be->state = state;
+
+	/* If we are waiting for a hotplug script then defer the
+	 * actual xenbus state change.
+	 */
+	if (!be->have_hotplug_status_watch)
+		xenbus_switch_state(dev, state);
+}
+
+/* Handle backend state transitions:
+ *
+ * The backend state starts in InitWait and the following transitions are
+ * allowed.
+ *
+ * InitWait -> Connected
+ *
+ *    ^    \         |
+ *    |     \        |
+ *    |      \       |
+ *    |       \      |
+ *    |        \     |
+ *    |         \    |
+ *    |          V   V
+ *
+ *  Closed  <-> Closing
+ *
+ * The state argument specifies the eventual state of the backend and the
+ * function transitions to that state via the shortest path.
+ */
+static void set_backend_state(struct backend_info *be,
+			      enum xenbus_state state)
+{
+	while (be->state != state) {
+		switch (be->state) {
+		case XenbusStateClosed:
+			switch (state) {
+			case XenbusStateInitWait:
+			case XenbusStateConnected:
+				pr_info("%s: prepare for reconnect\n",
+					be->dev->nodename);
+				backend_switch_state(be, XenbusStateInitWait);
+				break;
+			case XenbusStateClosing:
+				backend_switch_state(be, XenbusStateClosing);
+				break;
+			default:
+				BUG();
+			}
+			break;
+		case XenbusStateInitWait:
+			switch (state) {
+			case XenbusStateConnected:
+				backend_connect(be);
+				backend_switch_state(be, XenbusStateConnected);
+				break;
+			case XenbusStateClosing:
+			case XenbusStateClosed:
+				backend_switch_state(be, XenbusStateClosing);
+				break;
+			default:
+				BUG();
+			}
+			break;
+		case XenbusStateConnected:
+			switch (state) {
+			case XenbusStateInitWait:
+			case XenbusStateClosing:
+			case XenbusStateClosed:
+				backend_disconnect(be);
+				backend_switch_state(be, XenbusStateClosing);
+				break;
+			default:
+				BUG();
+			}
+			break;
+		case XenbusStateClosing:
+			switch (state) {
+			case XenbusStateInitWait:
+			case XenbusStateConnected:
+			case XenbusStateClosed:
+				backend_switch_state(be, XenbusStateClosed);
+				break;
+			default:
+				BUG();
+			}
+			break;
+		default:
+			BUG();
+		}
 	}
 }
 
@@ -237,40 +334,33 @@ static void frontend_changed(struct xenbus_device *dev,
 {
 	struct backend_info *be = dev_get_drvdata(&dev->dev);
 
-	pr_debug("frontend state %s\n", xenbus_strstate(frontend_state));
+	pr_debug("%s -> %s\n", dev->otherend, xenbus_strstate(frontend_state));
 
 	be->frontend_state = frontend_state;
 
 	switch (frontend_state) {
 	case XenbusStateInitialising:
-		if (dev->state == XenbusStateClosed) {
-			pr_info("%s: prepare for reconnect\n", dev->nodename);
-			xenbus_switch_state(dev, XenbusStateInitWait);
-		}
+		set_backend_state(be, XenbusStateInitWait);
 		break;
 
 	case XenbusStateInitialised:
 		break;
 
 	case XenbusStateConnected:
-		if (dev->state == XenbusStateConnected)
-			break;
-		if (be->vif)
-			connect(be);
+		set_backend_state(be, XenbusStateConnected);
 		break;
 
 	case XenbusStateClosing:
-		disconnect_backend(dev);
-		xenbus_switch_state(dev, XenbusStateClosing);
+		set_backend_state(be, XenbusStateClosing);
 		break;
 
 	case XenbusStateClosed:
-		xenbus_switch_state(dev, XenbusStateClosed);
+		set_backend_state(be, XenbusStateClosed);
 		if (xenbus_dev_is_online(dev))
 			break;
-		destroy_backend(dev);
 		/* fall through if not online */
 	case XenbusStateUnknown:
+		set_backend_state(be, XenbusStateClosed);
 		device_unregister(&dev->dev);
 		break;
 
@@ -363,7 +453,9 @@ static void hotplug_status_changed(struct xenbus_watch *watch,
 	if (IS_ERR(str))
 		return;
 	if (len == sizeof("connected")-1 && !memcmp(str, "connected", len)) {
-		xenbus_switch_state(be->dev, XenbusStateConnected);
+		/* Complete any pending state change */
+		xenbus_switch_state(be->dev, be->state);
+
 		/* Not interested in this watch anymore. */
 		unregister_hotplug_status_watch(be);
 	}
@@ -393,12 +485,8 @@ static void connect(struct backend_info *be)
 	err = xenbus_watch_pathfmt(dev, &be->hotplug_status_watch,
 				   hotplug_status_changed,
 				   "%s/%s", dev->nodename, "hotplug-status");
-	if (err) {
-		/* Switch now, since we can't do a watch. */
-		xenbus_switch_state(dev, XenbusStateConnected);
-	} else {
+	if (!err)
 		be->have_hotplug_status_watch = 1;
-	}
 
 	netif_wake_queue(be->vif->dev);
 }
-- 
1.7.10.4
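
For readers who want to trace the transitions without a Xen setup, the table
implemented by set_backend_state() can be modelled in userspace. This is a
sketch only: enum model_state, model_next() and model_set_state() are
hypothetical names, and the connect/disconnect side effects and the deferred
xenstore write are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

enum model_state { ST_INITWAIT, ST_CONNECTED, ST_CLOSING, ST_CLOSED };

/*
 * One step from cur toward target, following the same table as the
 * nested switch in the patch. Returns cur if no move is needed.
 */
static enum model_state model_next(enum model_state cur,
				   enum model_state target)
{
	if (cur == target)
		return cur;

	switch (cur) {
	case ST_CLOSED:
		/* Closed goes to Closing on request, else back to InitWait. */
		return target == ST_CLOSING ? ST_CLOSING : ST_INITWAIT;
	case ST_INITWAIT:
		/* InitWait either connects or starts closing. */
		return target == ST_CONNECTED ? ST_CONNECTED : ST_CLOSING;
	case ST_CONNECTED:
		/* Every way out of Connected passes through Closing. */
		return ST_CLOSING;
	case ST_CLOSING:
		/* Every way out of Closing passes through Closed. */
		return ST_CLOSED;
	}
	return cur;
}

/*
 * Walk *state to target one transition at a time. Up to max visited
 * states are recorded in path. Returns the number of steps taken.
 */
static size_t model_set_state(enum model_state *state,
			      enum model_state target,
			      enum model_state *path, size_t max)
{
	size_t n = 0;

	while (*state != target) {
		*state = model_next(*state, target);
		if (n < max)
			path[n] = *state;
		n++;
	}
	return n;
}
```

A frontend that jumps straight to Closed while the backend is Connected is
the case described in the commit message: in this model the backend still
visits Closing before it reaches Closed.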

^ permalink raw reply related

