Netdev List
* Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO
From: Jiri Pirko @ 2013-10-02 11:20 UTC (permalink / raw)
  To: netdev
  Cc: yoshfuji, davem, kuznet, jmorris, kaber, herbert, eric.dumazet,
	hannes
In-Reply-To: <20131001232534.GM10771@order.stressinduktion.org>

Wed, Oct 02, 2013 at 01:25:34AM CEST, hannes@stressinduktion.org wrote:
>On Tue, Oct 01, 2013 at 11:47:21PM +0200, Hannes Frederic Sowa wrote:
>> The strange thing is that if I don't do the IPV6_MTU setsockopt I don't
>> get an oops.
>
>This is incorrect, it just depends on the size of the writes and on the
>interface mtu.
>
>> IPv4 seems to work without problems, too.
>
>I also get kernel oopses from IPv4 now, too.

This patch should fix this on ipv4 as well:

Subject: ip_output: do skb ufo init for peeked non ufo skb as well

Currently, if a user application does:
  sendto len<mtu flag MSG_MORE
  sendto len>mtu flag 0
the skb is not treated as a fragmented one because it was never initialized
that way. Move the initialization so it also covers a peeked non-UFO skb.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 net/ipv4/ip_output.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index a04d872..bd21c5d 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -772,15 +772,19 @@ static inline int ip_ufo_append_data(struct sock *sk,
 		/* initialize protocol header pointer */
 		skb->transport_header = skb->network_header + fragheaderlen;
 
-		skb->ip_summed = CHECKSUM_PARTIAL;
 		skb->csum = 0;
 
-		/* specify the length of each IP datagram fragment */
-		skb_shinfo(skb)->gso_size = maxfraglen - fragheaderlen;
-		skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
+
 		__skb_queue_tail(queue, skb);
-	}
+	} else if (skb_is_gso(skb))
+		goto append;
+
+	skb->ip_summed = CHECKSUM_PARTIAL;
+	/* specify the length of each IP datagram fragment */
+	skb_shinfo(skb)->gso_size = maxfraglen - fragheaderlen;
+	skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
 
+append:
 	return skb_append_datato_frags(sk, skb, getfrag, from,
 				       (length - transhdrlen));
 }
-- 
1.8.3.1


* [PATCH net-next] inet: consolidate INET_TW_MATCH
From: Eric Dumazet @ 2013-10-02 11:29 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

From: Eric Dumazet <edumazet@google.com>

TCP listener refactoring, part 2:

We can use a generic lookup, whatever state the sockets are in, if
we are sure all relevant fields are at the same place in all socket
types (ESTABLISHED, TIME_WAIT, SYN_RECV).

This patch removes these macros:

 inet_addrpair, inet_portpair, tw_addrpair, tw_portpair

And adds:

 sk_portpair, sk_addrpair, sk_daddr, sk_rcv_saddr

INET_TW_MATCH() then becomes the same as INET_MATCH().
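
The idea can be illustrated outside the kernel. The following is a rough
user-space sketch, not the kernel code: all type and function names are
made up for illustration, and it mirrors only the 32-bit style comparison
(ports, remote address, local address). Two different socket types share a
common leading block, so a single match routine serves both.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields every socket flavour shares. */
struct common_sketch {
	uint32_t daddr;     /* remote address */
	uint32_t rcv_saddr; /* local address */
	uint32_t portpair;  /* both ports packed into one word */
};

/* Two illustrative socket types; both begin with the common block. */
struct full_sock_sketch {
	struct common_sketch common;
	int send_buffer_size;
};

struct tw_sock_sketch {
	struct common_sketch common;
	int timeout;
};

/* One match routine works for both types because it reads only the
 * shared prefix. As in INET_MATCH(), the packet's source address is
 * compared with the socket's remote address and vice versa.
 */
bool match_sketch(const struct common_sketch *c, uint32_t saddr,
		  uint32_t daddr, uint32_t ports)
{
	assert(c != NULL);
	return c->portpair == ports &&
	       c->daddr == saddr &&
	       c->rcv_saddr == daddr;
}
```

Because the common block is the first member, a pointer to either socket
type can be passed as a pointer to the common block, which is what makes a
state-independent lookup possible.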

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/ipv6.h             |    4 ++--
 include/net/inet_hashtables.h    |   26 ++++++++------------------
 include/net/inet_sock.h          |    2 --
 include/net/inet_timewait_sock.h |    8 --------
 include/net/sock.h               |    4 ++++
 net/ipv4/inet_connection_sock.c  |   11 +++++------
 net/ipv6/udp.c                   |    6 ++----
 7 files changed, 21 insertions(+), 40 deletions(-)

diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 28ea384..b7f1f3b 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -370,7 +370,7 @@ static inline struct raw6_sock *raw6_sk(const struct sock *sk)
 #endif /* IS_ENABLED(CONFIG_IPV6) */
 
 #define INET6_MATCH(__sk, __net, __saddr, __daddr, __ports, __dif)	\
-	((inet_sk(__sk)->inet_portpair == (__ports))		&&	\
+	(((__sk)->sk_portpair == (__ports))			&&	\
 	 ((__sk)->sk_family == AF_INET6)			&&	\
 	 ipv6_addr_equal(&inet6_sk(__sk)->daddr, (__saddr))	&&	\
 	 ipv6_addr_equal(&inet6_sk(__sk)->rcv_saddr, (__daddr))	&&	\
@@ -379,7 +379,7 @@ static inline struct raw6_sock *raw6_sk(const struct sock *sk)
 	 net_eq(sock_net(__sk), (__net)))
 
 #define INET6_TW_MATCH(__sk, __net, __saddr, __daddr, __ports, __dif)	   \
-	((inet_twsk(__sk)->tw_portpair == (__ports))			&& \
+	(((__sk)->sk_portpair == (__ports))				&& \
 	 ((__sk)->sk_family == AF_INET6)				&& \
 	 ipv6_addr_equal(&inet6_twsk(__sk)->tw_v6_daddr, (__saddr))	&& \
 	 ipv6_addr_equal(&inet6_twsk(__sk)->tw_v6_rcv_saddr, (__daddr)) && \
diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index 594dfee..10d6838 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -302,35 +302,25 @@ static inline struct sock *inet_lookup_listener(struct net *net,
 				   ((__force __u64)(__be32)(__saddr)));
 #endif /* __BIG_ENDIAN */
 #define INET_MATCH(__sk, __net, __cookie, __saddr, __daddr, __ports, __dif)	\
-	((inet_sk(__sk)->inet_portpair == (__ports))		&&	\
-	 (inet_sk(__sk)->inet_addrpair == (__cookie))		&&	\
+	(((__sk)->sk_portpair == (__ports))			&&	\
+	 ((__sk)->sk_addrpair == (__cookie))			&&	\
 	 (!(__sk)->sk_bound_dev_if	||				\
 	   ((__sk)->sk_bound_dev_if == (__dif))) 		&& 	\
 	 net_eq(sock_net(__sk), (__net)))
-#define INET_TW_MATCH(__sk, __net, __cookie, __saddr, __daddr, __ports, __dif)\
-	((inet_twsk(__sk)->tw_portpair == (__ports))	&&		\
-	 (inet_twsk(__sk)->tw_addrpair == (__cookie))	&&		\
-	 (!(__sk)->sk_bound_dev_if	||				\
-	   ((__sk)->sk_bound_dev_if == (__dif)))	&&		\
-	 net_eq(sock_net(__sk), (__net)))
 #else /* 32-bit arch */
 #define INET_ADDR_COOKIE(__name, __saddr, __daddr)
 #define INET_MATCH(__sk, __net, __cookie, __saddr, __daddr, __ports, __dif) \
-	((inet_sk(__sk)->inet_portpair == (__ports))	&&		\
-	 (inet_sk(__sk)->inet_daddr	== (__saddr))	&&		\
-	 (inet_sk(__sk)->inet_rcv_saddr	== (__daddr))	&&		\
-	 (!(__sk)->sk_bound_dev_if	||				\
-	   ((__sk)->sk_bound_dev_if == (__dif))) 	&&		\
-	 net_eq(sock_net(__sk), (__net)))
-#define INET_TW_MATCH(__sk, __net, __cookie, __saddr, __daddr, __ports, __dif) \
-	((inet_twsk(__sk)->tw_portpair == (__ports))	&&		\
-	 (inet_twsk(__sk)->tw_daddr	== (__saddr))	&&		\
-	 (inet_twsk(__sk)->tw_rcv_saddr	== (__daddr))	&&		\
+	(((__sk)->sk_portpair == (__ports))		&&		\
+	 ((__sk)->sk_daddr	== (__saddr))		&&		\
+	 ((__sk)->sk_rcv_saddr	== (__daddr))		&&		\
 	 (!(__sk)->sk_bound_dev_if	||				\
 	   ((__sk)->sk_bound_dev_if == (__dif))) 	&&		\
 	 net_eq(sock_net(__sk), (__net)))
 #endif /* 64-bit arch */
 
+#define INET_TW_MATCH(__sk, __net, __cookie, __saddr, __daddr, __ports, __dif)\
+	INET_MATCH(__sk, __net, __cookie, __saddr, __daddr, __ports, __dif)
+
 /*
  * Sockets in TCP_CLOSE state are _always_ taken out of the hash, so we need
  * not check it for lookups anymore, thanks Alexey. -DaveM
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index f314177..6d9a7e6 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -146,10 +146,8 @@ struct inet_sock {
 	/* Socket demultiplex comparisons on incoming packets. */
 #define inet_daddr		sk.__sk_common.skc_daddr
 #define inet_rcv_saddr		sk.__sk_common.skc_rcv_saddr
-#define inet_addrpair		sk.__sk_common.skc_addrpair
 #define inet_dport		sk.__sk_common.skc_dport
 #define inet_num		sk.__sk_common.skc_num
-#define inet_portpair		sk.__sk_common.skc_portpair
 
 	__be32			inet_saddr;
 	__s16			uc_ttl;
diff --git a/include/net/inet_timewait_sock.h b/include/net/inet_timewait_sock.h
index 828200a..48fd356 100644
--- a/include/net/inet_timewait_sock.h
+++ b/include/net/inet_timewait_sock.h
@@ -112,10 +112,8 @@ struct inet_timewait_sock {
 #define tw_net			__tw_common.skc_net
 #define tw_daddr        	__tw_common.skc_daddr
 #define tw_rcv_saddr    	__tw_common.skc_rcv_saddr
-#define tw_addrpair		__tw_common.skc_addrpair
 #define tw_dport		__tw_common.skc_dport
 #define tw_num			__tw_common.skc_num
-#define tw_portpair		__tw_common.skc_portpair
 
 	int			tw_timeout;
 	volatile unsigned char	tw_substate;
@@ -189,12 +187,6 @@ static inline struct inet_timewait_sock *inet_twsk(const struct sock *sk)
 	return (struct inet_timewait_sock *)sk;
 }
 
-static inline __be32 sk_rcv_saddr(const struct sock *sk)
-{
-/* both inet_sk() and inet_twsk() store rcv_saddr in skc_rcv_saddr */
-	return sk->__sk_common.skc_rcv_saddr;
-}
-
 void inet_twsk_put(struct inet_timewait_sock *tw);
 
 int inet_twsk_unhash(struct inet_timewait_sock *tw);
diff --git a/include/net/sock.h b/include/net/sock.h
index f0a44cc..e3bf213 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -300,6 +300,10 @@ struct sock {
 #define sk_dontcopy_begin	__sk_common.skc_dontcopy_begin
 #define sk_dontcopy_end		__sk_common.skc_dontcopy_end
 #define sk_hash			__sk_common.skc_hash
+#define sk_portpair		__sk_common.skc_portpair
+#define sk_addrpair		__sk_common.skc_addrpair
+#define sk_daddr		__sk_common.skc_daddr
+#define sk_rcv_saddr		__sk_common.skc_rcv_saddr
 #define sk_family		__sk_common.skc_family
 #define sk_state		__sk_common.skc_state
 #define sk_reuse		__sk_common.skc_reuse
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 7ac7aa1..56e82a4 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -71,17 +71,16 @@ int inet_csk_bind_conflict(const struct sock *sk,
 			    (!reuseport || !sk2->sk_reuseport ||
 			    (sk2->sk_state != TCP_TIME_WAIT &&
 			     !uid_eq(uid, sock_i_uid(sk2))))) {
-				const __be32 sk2_rcv_saddr = sk_rcv_saddr(sk2);
-				if (!sk2_rcv_saddr || !sk_rcv_saddr(sk) ||
-				    sk2_rcv_saddr == sk_rcv_saddr(sk))
+
+				if (!sk2->sk_rcv_saddr || !sk->sk_rcv_saddr ||
+				    sk2->sk_rcv_saddr == sk->sk_rcv_saddr)
 					break;
 			}
 			if (!relax && reuse && sk2->sk_reuse &&
 			    sk2->sk_state != TCP_LISTEN) {
-				const __be32 sk2_rcv_saddr = sk_rcv_saddr(sk2);
 
-				if (!sk2_rcv_saddr || !sk_rcv_saddr(sk) ||
-				    sk2_rcv_saddr == sk_rcv_saddr(sk))
+				if (!sk2->sk_rcv_saddr || !sk->sk_rcv_saddr ||
+				    sk2->sk_rcv_saddr == sk->sk_rcv_saddr)
 					break;
 			}
 		}
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 72b7eaa..8119791 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -57,8 +57,6 @@ int ipv6_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2)
 {
 	const struct in6_addr *sk_rcv_saddr6 = &inet6_sk(sk)->rcv_saddr;
 	const struct in6_addr *sk2_rcv_saddr6 = inet6_rcv_saddr(sk2);
-	__be32 sk1_rcv_saddr = sk_rcv_saddr(sk);
-	__be32 sk2_rcv_saddr = sk_rcv_saddr(sk2);
 	int sk_ipv6only = ipv6_only_sock(sk);
 	int sk2_ipv6only = inet_v6_ipv6only(sk2);
 	int addr_type = ipv6_addr_type(sk_rcv_saddr6);
@@ -67,8 +65,8 @@ int ipv6_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2)
 	/* if both are mapped, treat as IPv4 */
 	if (addr_type == IPV6_ADDR_MAPPED && addr_type2 == IPV6_ADDR_MAPPED)
 		return (!sk2_ipv6only &&
-			(!sk1_rcv_saddr || !sk2_rcv_saddr ||
-			  sk1_rcv_saddr == sk2_rcv_saddr));
+			(!sk->sk_rcv_saddr || !sk2->sk_rcv_saddr ||
+			  sk->sk_rcv_saddr == sk2->sk_rcv_saddr));
 
 	if (addr_type2 == IPV6_ADDR_ANY &&
 	    !(sk2_ipv6only && addr_type == IPV6_ADDR_MAPPED))


* [PATCH net-next v5 0/3] bonding: modify the current and add new hash functions
From: Nikolay Aleksandrov @ 2013-10-02 11:39 UTC (permalink / raw)
  To: netdev; +Cc: davem, andy, fubar, eric.dumazet, vfalico

Hi all,
This is a complete remake of my old patch that modified the bonding hash
functions to use skb_flow_dissect, as suggested by Eric Dumazet.
This time around I've kept the old modes, but they now use a new hash
function, again suggested by Eric, which is the same for all modes. The only
difference is the way the headers are obtained. The old modes obtain them
as before in order to address concerns about speed, while the 2 new ones use
skb_flow_dissect. Unifying the hash function allows removing a
pointer from struct bonding and also a few extra functions that dealt with
it. Two new functions are added which take care of the hashing based on
bond->params.xmit_policy only:
bond_xmit_hash() - global function, used by XOR and 3ad modes
bond_flow_dissect() - used by bond_xmit_hash() to obtain the necessary
headers and combine them according to bond->params.xmit_policy.
The ports extraction is also factored out of skb_flow_dissect into a new
function, skb_flow_get_ports(), which can be re-used.
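
To make the policy handling easier to follow, here is a small user-space
sketch of how a single hash function can branch on the policy value
instead of going through a stored function pointer. The enum and function
names are hypothetical. The numeric values follow if_bonding.h as changed
in patch 2/3, and the choice of starting value follows the description in
that patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Policy values as defined in if_bonding.h by this series. */
enum xmit_policy_sketch {
	POLICY_LAYER2  = 0,
	POLICY_LAYER34 = 1,
	POLICY_LAYER23 = 2,
	POLICY_ENCAP23 = 3,
	POLICY_ENCAP34 = 4,
};

/* Only the two new modes obtain their headers via skb_flow_dissect. */
bool policy_uses_flow_dissect(enum xmit_policy_sketch policy)
{
	return policy > POLICY_LAYER23;
}

/* Pick the value the shared hash starts from: the ports for the
 * "3+4" modes, the L2 hash otherwise (layer2 uses the L2 hash alone).
 */
uint32_t policy_seed(enum xmit_policy_sketch policy, uint32_t l2_hash,
		     uint32_t ports)
{
	switch (policy) {
	case POLICY_LAYER34:
	case POLICY_ENCAP34:
		return ports;
	case POLICY_LAYER2:
	case POLICY_LAYER23:
	case POLICY_ENCAP23:
	default:
		return l2_hash;
	}
}
```

With the policy checked at hash time, changing xmit_policy takes effect
without having to update any per-bond callback.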

v2: add the flow_dissector patch and use skb_flow_get_ports in patch 02
v3: fix a bug in the flow_dissector patch that caused a different thoff
    by modifying the thoff argument in skb_flow_get_ports directly, most
    of the users already do it anyway.
    Also add the necessary export symbol for skb_flow_get_ports.
v4: integrate the thoff bug fix in patch 01
v5: take the thoff bug fix back out and re-base on top of Eric's fix

Best regards,
 Nikolay Aleksandrov


Nikolay Aleksandrov (3):
  flow_dissector: factor out the ports extraction in skb_flow_get_ports
  bonding: modify the old and add new xmit hash policies
  bonding: document the new xmit policy modes and update the changed
    ones

 Documentation/networking/bonding.txt |  66 ++++++------
 drivers/net/bonding/bond_3ad.c       |   2 +-
 drivers/net/bonding/bond_main.c      | 197 ++++++++++++-----------------------
 drivers/net/bonding/bond_sysfs.c     |   2 -
 drivers/net/bonding/bonding.h        |   3 +-
 include/net/flow_keys.h              |   1 +
 include/uapi/linux/if_bonding.h      |   2 +
 net/core/flow_dissector.c            |  39 +++++--
 8 files changed, 137 insertions(+), 175 deletions(-)

-- 
1.8.1.4


* [PATCH net-next v5 1/3] flow_dissector: factor out the ports extraction in skb_flow_get_ports
From: Nikolay Aleksandrov @ 2013-10-02 11:39 UTC (permalink / raw)
  To: netdev; +Cc: davem, andy, fubar, eric.dumazet, vfalico
In-Reply-To: <1380713966-3891-1-git-send-email-nikolay@redhat.com>

Factor out the code that extracts the ports from skb_flow_dissect and
add a new function skb_flow_get_ports which can be re-used.

Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
---
v2: new patch
v3: fix a bug in skb_flow_dissect where thoff didn't have poff added by
    modifying thoff directly in skb_flow_get_ports as it's done anyway.
    Also add the necessary export symbol for skb_flow_get_ports.
v4: integrate the thoff fix in skb_flow_get_ports
v5: take the thoff fix back out, and re-base on Eric's fix
This seems like a good idea because there are other users that can re-use
it later as well.
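
As an illustration of what the factored-out helper does, here is a
user-space sketch over a flat byte buffer. It is an approximation under
stated assumptions: the function names are hypothetical, only TCP and UDP
are recognised (the kernel's proto_ports_offset covers more protocols),
and a plain bounds check stands in for skb_header_pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for proto_ports_offset(): only TCP (6) and
 * UDP (17) are handled, both of which carry the ports at offset 0.
 */
int ports_offset_sketch(uint8_t ip_proto)
{
	return (ip_proto == 6 || ip_proto == 17) ? 0 : -1;
}

/* Return the 4 bytes holding source and destination port, still in
 * network byte order, or 0 if the protocol has no known port offset
 * or the buffer is too short.
 */
uint32_t get_ports_sketch(const uint8_t *pkt, size_t len, size_t thoff,
			  uint8_t ip_proto)
{
	int poff = ports_offset_sketch(ip_proto);
	uint32_t ports = 0;

	assert(pkt != NULL);
	if (poff < 0)
		return 0;
	/* Make sure thoff + poff + 4 bytes fit inside the buffer. */
	if (thoff > len || len - thoff < (size_t)poff + sizeof(ports))
		return 0;
	memcpy(&ports, pkt + thoff + poff, sizeof(ports));
	return ports;
}
```

Returning 0 on every failure path keeps callers simple: they can assign
the result directly without a separate error check.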

 include/net/flow_keys.h   |  1 +
 net/core/flow_dissector.c | 39 ++++++++++++++++++++++++++++-----------
 2 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/include/net/flow_keys.h b/include/net/flow_keys.h
index ac2439d..7e64bd8 100644
--- a/include/net/flow_keys.h
+++ b/include/net/flow_keys.h
@@ -14,4 +14,5 @@ struct flow_keys {
 };
 
 bool skb_flow_dissect(const struct sk_buff *skb, struct flow_keys *flow);
+__be32 skb_flow_get_ports(const struct sk_buff *skb, int thoff, u8 ip_proto);
 #endif
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 8d7d0dd..f8e25ac 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -25,9 +25,35 @@ static void iph_to_flow_copy_addrs(struct flow_keys *flow, const struct iphdr *i
 	memcpy(&flow->src, &iph->saddr, sizeof(flow->src) + sizeof(flow->dst));
 }
 
+/**
+ * skb_flow_get_ports - extract the upper layer ports and return them
+ * @skb: buffer to extract the ports from
+ * @thoff: transport header offset
+ * @ip_proto: protocol for which to get port offset
+ *
+ * The function will try to retrieve the ports at offset thoff + poff where poff
+ * is the protocol port offset returned from proto_ports_offset
+ */
+__be32 skb_flow_get_ports(const struct sk_buff *skb, int thoff, u8 ip_proto)
+{
+	int poff = proto_ports_offset(ip_proto);
+
+	if (poff >= 0) {
+		__be32 *ports, _ports;
+
+		ports = skb_header_pointer(skb, thoff + poff,
+					   sizeof(_ports), &_ports);
+		if (ports)
+			return *ports;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(skb_flow_get_ports);
+
 bool skb_flow_dissect(const struct sk_buff *skb, struct flow_keys *flow)
 {
-	int poff, nhoff = skb_network_offset(skb);
+	int nhoff = skb_network_offset(skb);
 	u8 ip_proto;
 	__be16 proto = skb->protocol;
 
@@ -150,16 +176,7 @@ ipv6:
 	}
 
 	flow->ip_proto = ip_proto;
-	poff = proto_ports_offset(ip_proto);
-	if (poff >= 0) {
-		__be32 *ports, _ports;
-
-		ports = skb_header_pointer(skb, nhoff + poff,
-					   sizeof(_ports), &_ports);
-		if (ports)
-			flow->ports = *ports;
-	}
-
+	flow->ports = skb_flow_get_ports(skb, nhoff, ip_proto);
 	flow->thoff = (u16) nhoff;
 
 	return true;
-- 
1.8.1.4


* [PATCH net-next v5 2/3] bonding: modify the old and add new xmit hash policies
From: Nikolay Aleksandrov @ 2013-10-02 11:39 UTC (permalink / raw)
  To: netdev; +Cc: davem, andy, fubar, eric.dumazet, vfalico
In-Reply-To: <1380713966-3891-1-git-send-email-nikolay@redhat.com>

This patch adds two new hash policy modes which use skb_flow_dissect:
3 - Encapsulated layer 2+3
4 - Encapsulated layer 3+4
There should be a good improvement for tunnel users in those modes.
It also changes the old hash functions to:
hash ^= (__force u32)flow.dst ^ (__force u32)flow.src;
hash ^= (hash >> 16);
hash ^= (hash >> 8);

Where hash will be initialized either to L2 hash, that is
SRCMAC[5] XOR DSTMAC[5], or to flow->ports which should be extracted
from the upper layer. Flow's dst and src are also extracted based on the
xmit policy either directly from the buffer or by using skb_flow_dissect,
but in both cases if the protocol is IPv6 then dst and src are obtained by
ipv6_addr_hash() on the real addresses. In case of a non-dissectable
packet, the algorithms fall back to L2 hashing.
The bond_set_mode_ops() function is now obsolete and thus deleted
because it was used only to set the proper hash policy. We also trim a
pointer from struct bonding because we no longer need to keep the hash
function: there is now only a single hash function, bond_xmit_hash, which
works based on bond->params.xmit_policy.
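
The mixing step quoted above can be tried in user space. The sketch below
copies the three XOR lines from this description; the function name and
the plain uint32_t parameters are illustrative assumptions, with seed
standing for either the L2 hash or the ports value.

```c
#include <assert.h>
#include <stdint.h>

/* Mix the addresses into the seed, fold the upper bytes down, and
 * reduce modulo count. After the two shifts the lowest byte is the
 * XOR of all four bytes of the mixed value, so even a small count
 * depends on every byte of the input.
 */
uint32_t xmit_hash_sketch(uint32_t seed, uint32_t src, uint32_t dst,
			  uint32_t count)
{
	uint32_t hash = seed;

	assert(count > 0);
	hash ^= dst ^ src;
	hash ^= hash >> 16;
	hash ^= hash >> 8;
	return hash % count;
}
```

Since src and dst enter only through an XOR, swapping them gives the same
result, so both directions of a flow hash to the same value when the seed
is the same.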

The hash function and skb_flow_dissect were suggested by Eric Dumazet.
The layer names were suggested by Andy Gospodarek, because I suck at
semantics.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
---
v2: fix a bug in bond_flow_dissect which might've caused the use of
    uninitialized flow_keys and make use of skb_flow_get_ports
v3, v4: no change
v5: re-base
One line is intentionally left at 82 chars since it's the whole function
and IMO looks better that way.

 drivers/net/bonding/bond_3ad.c   |   2 +-
 drivers/net/bonding/bond_main.c  | 197 ++++++++++++++-------------------------
 drivers/net/bonding/bond_sysfs.c |   2 -
 drivers/net/bonding/bonding.h    |   3 +-
 include/uapi/linux/if_bonding.h  |   2 +
 5 files changed, 72 insertions(+), 134 deletions(-)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index c62606a..ea3e64e 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -2403,7 +2403,7 @@ int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev)
 		goto out;
 	}
 
-	slave_agg_no = bond->xmit_hash_policy(skb, slaves_in_agg);
+	slave_agg_no = bond_xmit_hash(bond, skb, slaves_in_agg);
 	first_ok_slave = NULL;
 
 	bond_for_each_slave(bond, slave, iter) {
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index fe8a94f..dfb4f6d 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -78,6 +78,7 @@
 #include <net/netns/generic.h>
 #include <net/pkt_sched.h>
 #include <linux/rculist.h>
+#include <net/flow_keys.h>
 #include "bonding.h"
 #include "bond_3ad.h"
 #include "bond_alb.h"
@@ -159,7 +160,8 @@ MODULE_PARM_DESC(min_links, "Minimum number of available links before turning on
 module_param(xmit_hash_policy, charp, 0);
 MODULE_PARM_DESC(xmit_hash_policy, "balance-xor and 802.3ad hashing method; "
 				   "0 for layer 2 (default), 1 for layer 3+4, "
-				   "2 for layer 2+3");
+				   "2 for layer 2+3, 3 for encap layer 2+3, "
+				   "4 for encap layer 3+4");
 module_param(arp_interval, int, 0);
 MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds");
 module_param_array(arp_ip_target, charp, NULL, 0);
@@ -217,6 +219,8 @@ const struct bond_parm_tbl xmit_hashtype_tbl[] = {
 {	"layer2",		BOND_XMIT_POLICY_LAYER2},
 {	"layer3+4",		BOND_XMIT_POLICY_LAYER34},
 {	"layer2+3",		BOND_XMIT_POLICY_LAYER23},
+{	"encap2+3",		BOND_XMIT_POLICY_ENCAP23},
+{	"encap3+4",		BOND_XMIT_POLICY_ENCAP34},
 {	NULL,			-1},
 };
 
@@ -3035,99 +3039,85 @@ static struct notifier_block bond_netdev_notifier = {
 
 /*---------------------------- Hashing Policies -----------------------------*/
 
-/*
- * Hash for the output device based upon layer 2 data
- */
-static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count)
+/* L2 hash helper */
+static inline u32 bond_eth_hash(struct sk_buff *skb)
 {
 	struct ethhdr *data = (struct ethhdr *)skb->data;
 
 	if (skb_headlen(skb) >= offsetof(struct ethhdr, h_proto))
-		return (data->h_dest[5] ^ data->h_source[5]) % count;
+		return data->h_dest[5] ^ data->h_source[5];
 
 	return 0;
 }
 
-/*
- * Hash for the output device based upon layer 2 and layer 3 data. If
- * the packet is not IP, fall back on bond_xmit_hash_policy_l2()
- */
-static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count)
+/* Extract the appropriate headers based on bond's xmit policy */
+static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
+			      struct flow_keys *fk)
 {
-	const struct ethhdr *data;
+	const struct ipv6hdr *iph6;
 	const struct iphdr *iph;
-	const struct ipv6hdr *ipv6h;
-	u32 v6hash;
-	const __be32 *s, *d;
+	int noff, proto = -1;
 
-	if (skb->protocol == htons(ETH_P_IP) &&
-	    pskb_network_may_pull(skb, sizeof(*iph))) {
+	if (bond->params.xmit_policy > BOND_XMIT_POLICY_LAYER23)
+		return skb_flow_dissect(skb, fk);
+
+	fk->ports = 0;
+	noff = skb_network_offset(skb);
+	if (skb->protocol == htons(ETH_P_IP)) {
+		if (!pskb_may_pull(skb, noff + sizeof(*iph)))
+			return false;
 		iph = ip_hdr(skb);
-		data = (struct ethhdr *)skb->data;
-		return ((ntohl(iph->saddr ^ iph->daddr) & 0xffff) ^
-			(data->h_dest[5] ^ data->h_source[5])) % count;
-	} else if (skb->protocol == htons(ETH_P_IPV6) &&
-		   pskb_network_may_pull(skb, sizeof(*ipv6h))) {
-		ipv6h = ipv6_hdr(skb);
-		data = (struct ethhdr *)skb->data;
-		s = &ipv6h->saddr.s6_addr32[0];
-		d = &ipv6h->daddr.s6_addr32[0];
-		v6hash = (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]);
-		v6hash ^= (v6hash >> 24) ^ (v6hash >> 16) ^ (v6hash >> 8);
-		return (v6hash ^ data->h_dest[5] ^ data->h_source[5]) % count;
-	}
-
-	return bond_xmit_hash_policy_l2(skb, count);
+		fk->src = iph->saddr;
+		fk->dst = iph->daddr;
+		noff += iph->ihl << 2;
+		if (!ip_is_fragment(iph))
+			proto = iph->protocol;
+	} else if (skb->protocol == htons(ETH_P_IPV6)) {
+		if (!pskb_may_pull(skb, noff + sizeof(*iph6)))
+			return false;
+		iph6 = ipv6_hdr(skb);
+		fk->src = (__force __be32)ipv6_addr_hash(&iph6->saddr);
+		fk->dst = (__force __be32)ipv6_addr_hash(&iph6->daddr);
+		noff += sizeof(*iph6);
+		proto = iph6->nexthdr;
+	} else {
+		return false;
+	}
+	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER34 && proto >= 0)
+		fk->ports = skb_flow_get_ports(skb, noff, proto);
+
+	return true;
 }
 
-/*
- * Hash for the output device based upon layer 3 and layer 4 data. If
- * the packet is a frag or not TCP or UDP, just use layer 3 data.  If it is
- * altogether not IP, fall back on bond_xmit_hash_policy_l2()
+/**
+ * bond_xmit_hash - generate a hash value based on the xmit policy
+ * @bond: bonding device
+ * @skb: buffer to use for headers
+ * @count: modulo value
+ *
+ * This function will extract the necessary headers from the skb buffer and use
+ * them to generate a hash based on the xmit_policy set in the bonding device
+ * which will be reduced modulo count before returning.
  */
-static int bond_xmit_hash_policy_l34(struct sk_buff *skb, int count)
+int bond_xmit_hash(struct bonding *bond, struct sk_buff *skb, int count)
 {
-	u32 layer4_xor = 0;
-	const struct iphdr *iph;
-	const struct ipv6hdr *ipv6h;
-	const __be32 *s, *d;
-	const __be16 *l4 = NULL;
-	__be16 _l4[2];
-	int noff = skb_network_offset(skb);
-	int poff;
-
-	if (skb->protocol == htons(ETH_P_IP) &&
-	    pskb_may_pull(skb, noff + sizeof(*iph))) {
-		iph = ip_hdr(skb);
-		poff = proto_ports_offset(iph->protocol);
+	struct flow_keys flow;
+	u32 hash;
 
-		if (!ip_is_fragment(iph) && poff >= 0) {
-			l4 = skb_header_pointer(skb, noff + (iph->ihl << 2) + poff,
-						sizeof(_l4), &_l4);
-			if (l4)
-				layer4_xor = ntohs(l4[0] ^ l4[1]);
-		}
-		return (layer4_xor ^
-			((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count;
-	} else if (skb->protocol == htons(ETH_P_IPV6) &&
-		   pskb_may_pull(skb, noff + sizeof(*ipv6h))) {
-		ipv6h = ipv6_hdr(skb);
-		poff = proto_ports_offset(ipv6h->nexthdr);
-		if (poff >= 0) {
-			l4 = skb_header_pointer(skb, noff + sizeof(*ipv6h) + poff,
-						sizeof(_l4), &_l4);
-			if (l4)
-				layer4_xor = ntohs(l4[0] ^ l4[1]);
-		}
-		s = &ipv6h->saddr.s6_addr32[0];
-		d = &ipv6h->daddr.s6_addr32[0];
-		layer4_xor ^= (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]);
-		layer4_xor ^= (layer4_xor >> 24) ^ (layer4_xor >> 16) ^
-			       (layer4_xor >> 8);
-		return layer4_xor % count;
-	}
+	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER2 ||
+	    !bond_flow_dissect(bond, skb, &flow))
+		return bond_eth_hash(skb) % count;
+
+	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER23 ||
+	    bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP23)
+		hash = bond_eth_hash(skb);
+	else
+		hash = (__force u32)flow.ports;
+	hash ^= (__force u32)flow.dst ^ (__force u32)flow.src;
+	hash ^= (hash >> 16);
+	hash ^= (hash >> 8);
 
-	return bond_xmit_hash_policy_l2(skb, count);
+	return hash % count;
 }
 
 /*-------------------------- Device entry points ----------------------------*/
@@ -3721,8 +3711,7 @@ static int bond_xmit_activebackup(struct sk_buff *skb, struct net_device *bond_d
 	return NETDEV_TX_OK;
 }
 
-/*
- * In bond_xmit_xor() , we determine the output device by using a pre-
+/* In bond_xmit_xor() , we determine the output device by using a pre-
  * determined xmit_hash_policy(), If the selected device is not enabled,
  * find the next active slave.
  */
@@ -3730,8 +3719,7 @@ static int bond_xmit_xor(struct sk_buff *skb, struct net_device *bond_dev)
 {
 	struct bonding *bond = netdev_priv(bond_dev);
 
-	bond_xmit_slave_id(bond, skb,
-			   bond->xmit_hash_policy(skb, bond->slave_cnt));
+	bond_xmit_slave_id(bond, skb, bond_xmit_hash(bond, skb, bond->slave_cnt));
 
 	return NETDEV_TX_OK;
 }
@@ -3768,22 +3756,6 @@ static int bond_xmit_broadcast(struct sk_buff *skb, struct net_device *bond_dev)
 
 /*------------------------- Device initialization ---------------------------*/
 
-static void bond_set_xmit_hash_policy(struct bonding *bond)
-{
-	switch (bond->params.xmit_policy) {
-	case BOND_XMIT_POLICY_LAYER23:
-		bond->xmit_hash_policy = bond_xmit_hash_policy_l23;
-		break;
-	case BOND_XMIT_POLICY_LAYER34:
-		bond->xmit_hash_policy = bond_xmit_hash_policy_l34;
-		break;
-	case BOND_XMIT_POLICY_LAYER2:
-	default:
-		bond->xmit_hash_policy = bond_xmit_hash_policy_l2;
-		break;
-	}
-}
-
 /*
  * Lookup the slave that corresponds to a qid
  */
@@ -3894,38 +3866,6 @@ static netdev_tx_t bond_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	return ret;
 }
 
-/*
- * set bond mode specific net device operations
- */
-void bond_set_mode_ops(struct bonding *bond, int mode)
-{
-	struct net_device *bond_dev = bond->dev;
-
-	switch (mode) {
-	case BOND_MODE_ROUNDROBIN:
-		break;
-	case BOND_MODE_ACTIVEBACKUP:
-		break;
-	case BOND_MODE_XOR:
-		bond_set_xmit_hash_policy(bond);
-		break;
-	case BOND_MODE_BROADCAST:
-		break;
-	case BOND_MODE_8023AD:
-		bond_set_xmit_hash_policy(bond);
-		break;
-	case BOND_MODE_ALB:
-		/* FALLTHRU */
-	case BOND_MODE_TLB:
-		break;
-	default:
-		/* Should never happen, mode already checked */
-		pr_err("%s: Error: Unknown bonding mode %d\n",
-		       bond_dev->name, mode);
-		break;
-	}
-}
-
 static int bond_ethtool_get_settings(struct net_device *bond_dev,
 				     struct ethtool_cmd *ecmd)
 {
@@ -4027,7 +3967,6 @@ static void bond_setup(struct net_device *bond_dev)
 	ether_setup(bond_dev);
 	bond_dev->netdev_ops = &bond_netdev_ops;
 	bond_dev->ethtool_ops = &bond_ethtool_ops;
-	bond_set_mode_ops(bond, bond->params.mode);
 
 	bond_dev->destructor = bond_destructor;
 
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index e06c644..e924952 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -318,7 +318,6 @@ static ssize_t bonding_store_mode(struct device *d,
 	/* don't cache arp_validate between modes */
 	bond->params.arp_validate = BOND_ARP_VALIDATE_NONE;
 	bond->params.mode = new_value;
-	bond_set_mode_ops(bond, bond->params.mode);
 	pr_info("%s: setting mode to %s (%d).\n",
 		bond->dev->name, bond_mode_tbl[new_value].modename,
 		new_value);
@@ -358,7 +357,6 @@ static ssize_t bonding_store_xmit_hash(struct device *d,
 		ret = -EINVAL;
 	} else {
 		bond->params.xmit_policy = new_value;
-		bond_set_mode_ops(bond, bond->params.mode);
 		pr_info("%s: setting xmit hash policy to %s (%d).\n",
 			bond->dev->name,
 			xmit_hashtype_tbl[new_value].modename, new_value);
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index 9a26fbd..0bd04fb 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -217,7 +217,6 @@ struct bonding {
 	char     proc_file_name[IFNAMSIZ];
 #endif /* CONFIG_PROC_FS */
 	struct   list_head bond_list;
-	int      (*xmit_hash_policy)(struct sk_buff *, int);
 	u16      rr_tx_counter;
 	struct   ad_bond_info ad_info;
 	struct   alb_bond_info alb_info;
@@ -409,7 +408,7 @@ int bond_release(struct net_device *bond_dev, struct net_device *slave_dev);
 void bond_mii_monitor(struct work_struct *);
 void bond_loadbalance_arp_mon(struct work_struct *);
 void bond_activebackup_arp_mon(struct work_struct *);
-void bond_set_mode_ops(struct bonding *bond, int mode);
+int bond_xmit_hash(struct bonding *bond, struct sk_buff *skb, int count);
 int bond_parse_parm(const char *mode_arg, const struct bond_parm_tbl *tbl);
 void bond_select_active_slave(struct bonding *bond);
 void bond_change_active_slave(struct bonding *bond, struct slave *new_active);
diff --git a/include/uapi/linux/if_bonding.h b/include/uapi/linux/if_bonding.h
index a17edda..9635a62 100644
--- a/include/uapi/linux/if_bonding.h
+++ b/include/uapi/linux/if_bonding.h
@@ -91,6 +91,8 @@
 #define BOND_XMIT_POLICY_LAYER2		0 /* layer 2 (MAC only), default */
 #define BOND_XMIT_POLICY_LAYER34	1 /* layer 3+4 (IP ^ (TCP || UDP)) */
 #define BOND_XMIT_POLICY_LAYER23	2 /* layer 2+3 (IP ^ MAC) */
+#define BOND_XMIT_POLICY_ENCAP23	3 /* encapsulated layer 2+3 */
+#define BOND_XMIT_POLICY_ENCAP34	4 /* encapsulated layer 3+4 */
 
 typedef struct ifbond {
 	__s32 bond_mode;
-- 
1.8.1.4


* [PATCH net-next v5 3/3] bonding: document the new xmit policy modes and update the changed ones
From: Nikolay Aleksandrov @ 2013-10-02 11:39 UTC (permalink / raw)
  To: netdev; +Cc: davem, andy, fubar, eric.dumazet, vfalico
In-Reply-To: <1380713966-3891-1-git-send-email-nikolay@redhat.com>

Add documentation for the new encap2+3 and encap3+4 modes, and update the
formula for the old modes to match the new hash function.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
---
v1-5: no change

 Documentation/networking/bonding.txt | 66 ++++++++++++++++++++----------------
 1 file changed, 36 insertions(+), 30 deletions(-)

diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index 9b28e71..3856ed2 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -743,21 +743,16 @@ xmit_hash_policy
 		protocol information to generate the hash.
 
 		Uses XOR of hardware MAC addresses and IP addresses to
-		generate the hash.  The IPv4 formula is
+		generate the hash.  The formula is
 
-		(((source IP XOR dest IP) AND 0xffff) XOR
-			( source MAC XOR destination MAC ))
-				modulo slave count
+		hash = source MAC XOR destination MAC
+		hash = hash XOR source IP XOR destination IP
+		hash = hash XOR (hash RSHIFT 16)
+		hash = hash XOR (hash RSHIFT 8)
+		And then hash is reduced modulo slave count.
 
-		The IPv6 formula is
-
-		hash = (source ip quad 2 XOR dest IP quad 2) XOR
-		       (source ip quad 3 XOR dest IP quad 3) XOR
-		       (source ip quad 4 XOR dest IP quad 4)
-
-		(((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash)
-			XOR (source MAC XOR destination MAC))
-				modulo slave count
+		If the protocol is IPv6 then the source and destination
+		addresses are first hashed using ipv6_addr_hash.
 
 		This algorithm will place all traffic to a particular
 		network peer on the same slave.  For non-IP traffic,
@@ -779,21 +774,16 @@ xmit_hash_policy
 		slaves, although a single connection will not span
 		multiple slaves.
 
-		The formula for unfragmented IPv4 TCP and UDP packets is
-
-		((source port XOR dest port) XOR
-			 ((source IP XOR dest IP) AND 0xffff)
-				modulo slave count
+		The formula for unfragmented TCP and UDP packets is
 
-		The formula for unfragmented IPv6 TCP and UDP packets is
+		hash = source port, destination port (as in the header)
+		hash = hash XOR source IP XOR destination IP
+		hash = hash XOR (hash RSHIFT 16)
+		hash = hash XOR (hash RSHIFT 8)
+		And then hash is reduced modulo slave count.
 
-		hash = (source port XOR dest port) XOR
-		       ((source ip quad 2 XOR dest IP quad 2) XOR
-			(source ip quad 3 XOR dest IP quad 3) XOR
-			(source ip quad 4 XOR dest IP quad 4))
-
-		((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash)
-			modulo slave count
+		If the protocol is IPv6 then the source and destination
+		addresses are first hashed using ipv6_addr_hash.
 
 		For fragmented TCP or UDP packets and all other IPv4 and
 		IPv6 protocol traffic, the source and destination port
@@ -801,10 +791,6 @@ xmit_hash_policy
 		formula is the same as for the layer2 transmit hash
 		policy.
 
-		The IPv4 policy is intended to mimic the behavior of
-		certain switches, notably Cisco switches with PFC2 as
-		well as some Foundry and IBM products.
-
 		This algorithm is not fully 802.3ad compliant.  A
 		single TCP or UDP conversation containing both
 		fragmented and unfragmented packets will see packets
@@ -815,6 +801,26 @@ xmit_hash_policy
 		conversations.  Other implementations of 802.3ad may
 		or may not tolerate this noncompliance.
 
+	encap2+3
+
+		This policy uses the same formula as layer2+3 but it
+		relies on skb_flow_dissect to obtain the header fields
+		which might result in the use of inner headers if an
+		encapsulation protocol is used. For example this will
+		improve the performance for tunnel users because the
+		packets will be distributed according to the encapsulated
+		flows.
+
+	encap3+4
+
+		This policy uses the same formula as layer3+4 but it
+		relies on skb_flow_dissect to obtain the header fields
+		which might result in the use of inner headers if an
+		encapsulation protocol is used. For example this will
+		improve the performance for tunnel users because the
+		packets will be distributed according to the encapsulated
+		flows.
+
 	The default value is layer2.  This option was added in bonding
 	version 2.6.3.  In earlier versions of bonding, this parameter
 	does not exist, and the layer2 policy is the only policy.  The
-- 
1.8.1.4

^ permalink raw reply related

* Re: [PATCH net-next v5 1/3] flow_dissector: factor out the ports extraction in skb_flow_get_ports
From: Eric Dumazet @ 2013-10-02 11:50 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: netdev, davem, andy, fubar, vfalico
In-Reply-To: <1380713966-3891-2-git-send-email-nikolay@redhat.com>

On Wed, 2013-10-02 at 13:39 +0200, Nikolay Aleksandrov wrote:
> Factor out the code that extracts the ports from skb_flow_dissect and
> add a new function skb_flow_get_ports which can be re-used.
> 
> Suggested-by: Veaceslav Falico <vfalico@redhat.com>
> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
> ---

Acked-by: Eric Dumazet <edumazet@google.com>
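
(Illustration only, not part of the patch.) The idea behind the factored-out
helper, reading the 32-bit ports word at offset thoff + poff and returning 0
when it cannot be read, can be sketched in userspace C. The name
get_ports_sketch and the flat pkt/len buffer are assumptions made for this
example; the kernel code uses skb_header_pointer() on an skb instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for the ports extraction.
 * pkt/len describe a flat packet buffer, thoff is the transport
 * header offset and poff the per-protocol ports offset (negative
 * meaning "this protocol has no ports").
 * Returns the raw 4 bytes (source and destination port, still in
 * network byte order), or 0 if they cannot be read.
 */
static uint32_t get_ports_sketch(const uint8_t *pkt, size_t len,
				 int thoff, int poff)
{
	uint32_t ports = 0;

	assert(pkt != NULL);
	if (thoff < 0 || poff < 0)
		return 0;
	/* bounds check: the whole 4-byte word must lie inside the buffer */
	if ((size_t)thoff + (size_t)poff + sizeof(ports) > len)
		return 0;
	memcpy(&ports, pkt + thoff + poff, sizeof(ports));
	return ports;
}
```

Returning 0 on failure mirrors the patch, where a caller cannot tell
"no ports" from "ports are zero" and simply hashes with whatever comes back.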

^ permalink raw reply

* Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO
From: Eric Dumazet @ 2013-10-02 11:53 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, yoshfuji, davem, kuznet, jmorris, kaber, herbert, hannes
In-Reply-To: <20131002112052.GC1528@minipsycho.brq.redhat.com>


> This patch should fix this on ipv4 as well:
> 
> Subject: ip_output: do skb ufo init for peeked non ufo skb as well
> 

Is it an official patch ? s/Subject:/[PATCH]/ 

> Now, if user application does:

Any idea when the bug was added (commit id + title) ?

> sendto len<mtu flag MSG_MORE
> sendto len>mtu flag 0
> The skb is not treated as fragmented one because it is not initialized
> that way. So move the initialization to fix this.
> 
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---
>  net/ipv4/ip_output.c | 14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index a04d872..bd21c5d 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -772,15 +772,19 @@ static inline int ip_ufo_append_data(struct sock *sk,
>  		/* initialize protocol header pointer */
>  		skb->transport_header = skb->network_header + fragheaderlen;
>  
> -		skb->ip_summed = CHECKSUM_PARTIAL;
>  		skb->csum = 0;

Any idea why we have skb->csum = 0 here ?

Thanks !

^ permalink raw reply

* Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO
From: Hannes Frederic Sowa @ 2013-10-02 12:01 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, yoshfuji, davem, kuznet, jmorris, kaber, herbert,
	eric.dumazet
In-Reply-To: <20131002103333.GB1528@minipsycho.brq.redhat.com>

On Wed, Oct 02, 2013 at 12:33:33PM +0200, Jiri Pirko wrote:
> Wed, Oct 02, 2013 at 01:25:34AM CEST, hannes@stressinduktion.org wrote:
> >On Tue, Oct 01, 2013 at 11:47:21PM +0200, Hannes Frederic Sowa wrote:
> >> The strange thing is that if I don't do the IPV6_MTU setsockopt I don't
> >> get an oops.
> >
> >This is incorrect, it just depends on the size of the writes and on the
> >interface mtu.
> >
> >> IPv4 seems to work without problems, too.
> >
> >I also get kernel oopses from IPv4 now, too.
> 
> I'm not able to trigger this with ipv4. Can you please send strace
> output for this as well?

I used this snippet on loopback with UFO enabled and lo mtu 1280.

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <linux/udp.h>
#include <stdio.h>
#include <unistd.h>	/* write(), close() */

/* mtu is not used below; the lo mtu of 1280 is configured outside this program */
int test(int mtu)
{
        int fd;
        const int one = 1;
        const int off = 0;
        struct sockaddr_in addr = {.sin_family = AF_INET, .sin_port = htons(53) };
        unsigned char buffer[3701];

        inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

        fd = socket(AF_INET, SOCK_DGRAM, 0);
        connect(fd, (struct sockaddr *) &addr, sizeof(addr));

        /* cork the socket so the three writes are queued as one datagram */
        setsockopt(fd, IPPROTO_UDP, UDP_CORK, &one, sizeof(one));

        /* small write first, then one larger than the mtu, then small again */
        write(fd, " ", 1);
        write(fd, buffer, sizeof(buffer));
        write(fd, " ", 1);

        /* uncork: the queued data is pushed out */
        setsockopt(fd, IPPROTO_UDP, UDP_CORK, &off, sizeof(off));

        close(fd);
        return 0;
}

int main(void) {
        return test(1280);
}

Greetings,

  Hannes

^ permalink raw reply

* Re: [PATCH net-next v5 2/3] bonding: modify the old and add new xmit hash policies
From: Eric Dumazet @ 2013-10-02 12:03 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: netdev, davem, andy, fubar, vfalico
In-Reply-To: <1380713966-3891-3-git-send-email-nikolay@redhat.com>

On Wed, 2013-10-02 at 13:39 +0200, Nikolay Aleksandrov wrote:
> This patch adds two new hash policy modes which use skb_flow_dissect:
> 3 - Encapsulated layer 2+3
> 4 - Encapsulated layer 3+4
> There should be a good improvement for tunnel users in those modes.
> It also changes the old hash functions to:
> hash ^= (__force u32)flow.dst ^ (__force u32)flow.src;
> hash ^= (hash >> 16);
> hash ^= (hash >> 8);
> 
> Where hash will be initialized either to L2 hash, that is
> SRCMAC[5] XOR DSTMAC[5], or to flow->ports which should be extracted
> from the upper layer. Flow's dst and src are also extracted based on the
> xmit policy either directly from the buffer or by using skb_flow_dissect,
> but in both cases if the protocol is IPv6 then dst and src are obtained by
> ipv6_addr_hash() on the real addresses. In case of a non-dissectable
> packet, the algorithms fall back to L2 hashing.
> The bond_set_mode_ops() function is now obsolete and thus deleted
> because it was used only to set the proper hash policy. Also we trim a
> pointer from struct bonding because we no longer need to keep the hash
> function, now there's only a single hash function - bond_xmit_hash that
> works based on bond->params.xmit_policy.
> 
> The hash function and skb_flow_dissect were suggested by Eric Dumazet.
> The layer names were suggested by Andy Gospodarek, because I suck at
> semantics.
> 
> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
> ---

Very nice, thanks !

Acked-by: Eric Dumazet <edumazet@google.com>
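
(Illustration only, not part of the patch.) The mixing steps quoted above can
be tried out in userspace with a few lines of C. This is a sketch under
assumptions: xmit_hash_sketch and l2_seed are hypothetical names, the seed is
supplied by the caller (either the L2 value or the ports word), and src/dst
are plain 32-bit values, which for IPv6 would be the already-folded
ipv6_addr_hash() results.

```c
#include <assert.h>
#include <stdint.h>

/* L2 seed as described in the commit message: SRCMAC[5] XOR DSTMAC[5] */
static uint32_t l2_seed(const uint8_t src_mac[6], const uint8_t dst_mac[6])
{
	return (uint32_t)(src_mac[5] ^ dst_mac[5]);
}

/*
 * Hypothetical sketch of the three mixing steps followed by the
 * reduction modulo the slave count.
 * seed is either l2_seed() or the 32-bit ports word.
 */
static uint32_t xmit_hash_sketch(uint32_t seed, uint32_t src, uint32_t dst,
				 uint32_t slave_count)
{
	uint32_t hash = seed;

	assert(slave_count > 0);	/* modulo by zero is undefined */
	hash ^= dst ^ src;
	hash ^= hash >> 16;		/* fold the upper half into the lower */
	hash ^= hash >> 8;		/* fold once more into the low byte */
	return hash % slave_count;
}
```

Because src and dst are only ever XORed together, swapping them gives the
same result, so both directions of a flow with the same seed land on the
same slave index in this sketch.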

^ permalink raw reply

* RE: [PATCH net-next] tcp: shrink tcp6_timewait_sock by one cache line
From: David Laight @ 2013-10-02 12:08 UTC (permalink / raw)
  To: Eric Dumazet, David Miller; +Cc: netdev
In-Reply-To: <1380711604.19002.78.camel@edumazet-glaptop.roam.corp.google.com>

> -	tmo = tw->tw_ttd - jiffies;
> +	tmo = tw->tw_ttd - (u32)jiffies;

Do you need any of these (u32) casts?
The compiler will almost certainly use 32bit arithmetic (on 32bit systems at least)
because the 'as if' rule lets it use the smaller type.

> +		tw->tw_ttd = (u32)(jiffies + (slot << INET_TWDR_RECYCLE_TICK));

If that (u32) cast is needed in order to avoid 64bit maths, it is in the wrong place.
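
(Illustration only.) A small userspace sketch of the cast placement, assuming
a 64-bit counter standing in for jiffies and a 32-bit deadline field. All
names here are made up for the example and the types are assumptions, not
taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* cast applied to the sum: the addition is written in 64 bits and the
 * result is truncated afterwards */
static uint32_t ttd_cast_after(uint64_t now, uint32_t delta)
{
	return (uint32_t)(now + delta);
}

/* cast applied to the counter first: the addition itself is 32-bit */
static uint32_t ttd_cast_before(uint64_t now, uint32_t delta)
{
	return (uint32_t)now + delta;
}

/* remaining ticks, computed in 32 bits and read as signed so that a
 * deadline across the 32-bit wrap still gives a small number */
static int32_t ticks_left(uint32_t ttd, uint64_t now)
{
	assert(sizeof(now) > sizeof(ttd));
	return (int32_t)(ttd - (uint32_t)now);
}
```

Both variants store the same 32-bit value, since truncation and modular
addition commute; the only difference is the width of the arithmetic the
source code asks for.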

	David


^ permalink raw reply

* Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO
From: Jiri Pirko @ 2013-10-02 12:10 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, yoshfuji, davem, kuznet, jmorris, kaber, herbert, hannes
In-Reply-To: <1380714835.19002.92.camel@edumazet-glaptop.roam.corp.google.com>

Wed, Oct 02, 2013 at 01:53:55PM CEST, eric.dumazet@gmail.com wrote:
>
>> This patch should fix this on ipv4 as well:
>> 
>> Subject: ip_output: do skb ufo init for peeked non ufo skb as well
>> 
>
>Is it an official patch ? s/Subject:/[PATCH]/ 

Yes. I thought that stating "Subject:" would be enough for patchwork to
parse it. Apparently not :/

>
>> Now, if user application does:
>
>Any idea when the bug was added (commit id + title) ?

This is hard to say. This code has been more or less broken since the very
beginning, which is:

commit e89e9cf539a28df7d0eb1d0a545368e9920b34ac "[IPv4/IPv6]: UFO Scatter-gather approach"

>
>> sendto len<mtu flag MSG_MORE
>> sendto len>mtu flag 0
>> The skb is not treated as fragmented one because it is not initialized
>> that way. So move the initialization to fix this.
>> 
>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>> ---
>>  net/ipv4/ip_output.c | 14 +++++++++-----
>>  1 file changed, 9 insertions(+), 5 deletions(-)
>> 
>> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
>> index a04d872..bd21c5d 100644
>> --- a/net/ipv4/ip_output.c
>> +++ b/net/ipv4/ip_output.c
>> @@ -772,15 +772,19 @@ static inline int ip_ufo_append_data(struct sock *sk,
>>  		/* initialize protocol header pointer */
>>  		skb->transport_header = skb->network_header + fragheaderlen;
>>  
>> -		skb->ip_summed = CHECKSUM_PARTIAL;
>>  		skb->csum = 0;
>
>Any idea why we have skb->csum = 0 here ?

If I understand this correctly, that can be removed from here.


>
>Thanks !
>
>

^ permalink raw reply

* Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO
From: Hannes Frederic Sowa @ 2013-10-02 12:12 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jiri Pirko, netdev, yoshfuji, davem, kuznet, jmorris, kaber,
	herbert
In-Reply-To: <1380710488.19002.67.camel@edumazet-glaptop.roam.corp.google.com>

Hi Eric!

On Wed, Oct 02, 2013 at 03:41:28AM -0700, Eric Dumazet wrote:
> On Wed, 2013-10-02 at 10:58 +0200, Jiri Pirko wrote:
> > Wed, Oct 02, 2013 at 01:25:34AM CEST, hannes@stressinduktion.org wrote:
> > >-	if (((length > mtu) || (skb && skb_is_gso(skb))) &&
> > >+	if (((length > mtu) || (skb && skb_has_frags(skb))) &&
> 
> > 
> > This seems correct to me. skb_is_gso would work as well if you apply my
> > patch "[patch net] ip6_output: do skb ufo init for peeked non ufo skb as
> > well" which does the setting of gso_size.
> 
> Well, skb having frags or not should not be a concern:
> That's an allocation choice (let's say to avoid high order allocations).
> 
> Setting gso_size is probably better.

e89e9cf539a28df7d0eb1d0a545368e9920b34ac ("[IPv4/IPv6]: UFO Scatter-gather
approach") states:

"
skb->data will contain MAC/IP/UDP header and skb_shinfo(skb)->frags[]
contains the data payload. The skb->ip_summed will be set to CHECKSUM_HW
indicating that hardware has to do checksum calculation. Hardware should
compute the UDP checksum of complete datagram and also ip header checksum of
each fragmented IP packet.
"

This is the reason why I tried not to update the gso_size. If it is ok, I am
fine with that.

Greetings,

  Hannes

^ permalink raw reply

* Re: [PATCH 1/3] net: mv643xx_eth: update statistics timer from timer context only
From: Jason Cooper @ 2013-10-02 12:17 UTC (permalink / raw)
  To: Sebastian Hesselbarth
  Cc: David Miller, Lennert Buytenhek, netdev, linux-arm-kernel,
	linux-kernel
In-Reply-To: <1380711442-24735-2-git-send-email-sebastian.hesselbarth@gmail.com>

On Wed, Oct 02, 2013 at 12:57:20PM +0200, Sebastian Hesselbarth wrote:
> Each port driver installs a periodic timer to update port statistics
> by calling mib_counters_update. As mib_counters_update is also called
> from non-timer context, we should not reschedule the timer there but
> rather move it to timer-only context.
> 
> Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
> ---
> Cc: David Miller <davem@davemloft.net>
> Cc: Lennert Buytenhek <buytenh@wantstofly.org>
> Cc: Jason Cooper <jason@lakedaemon.net>
> Cc: netdev@vger.kernel.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  drivers/net/ethernet/marvell/mv643xx_eth.c |    4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)

Acked-by: Jason Cooper <jason@lakedaemon.net>

Introduced by:

  4ff3495a mv643xx_eth: enforce frequent hardware statistics polling

which goes all the way back to v2.6.28

thx,

Jason.

^ permalink raw reply

* [PATCH 10/19 v2] net: myri10ge: Change variable type to bool
From: Peter Senna Tschudin @ 2013-10-02 12:19 UTC (permalink / raw)
  To: hykim; +Cc: netdev, linux-kernel, kernel-janitors, Peter Senna Tschudin
In-Reply-To: <1380716391-20214-1-git-send-email-peter.senna@gmail.com>

Both myri10ge_ss_lock_napi and myri10ge_ss_lock_poll declare an rc
variable that is only ever assigned the values true and false. Both
functions already return bool. Change the type of rc to bool.

The simplified semantic patch that finds this problem is as
follows (http://coccinelle.lip6.fr/):

@exists@
type T;
identifier b;
@@
- T
+ bool
  b = ...;
  ... when any
  b = \(true\|false\)

Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
---
Changes from v1:
 - Added subsystem prefix to shortlog

 drivers/net/ethernet/myricom/myri10ge/myri10ge.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
index 149355b..7792264 100644
--- a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
+++ b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
@@ -934,7 +934,7 @@ static inline void myri10ge_ss_init_lock(struct myri10ge_slice_state *ss)
 
 static inline bool myri10ge_ss_lock_napi(struct myri10ge_slice_state *ss)
 {
-	int rc = true;
+	bool rc = true;
 	spin_lock(&ss->lock);
 	if ((ss->state & SLICE_LOCKED)) {
 		WARN_ON((ss->state & SLICE_STATE_NAPI));
@@ -957,7 +957,7 @@ static inline void myri10ge_ss_unlock_napi(struct myri10ge_slice_state *ss)
 
 static inline bool myri10ge_ss_lock_poll(struct myri10ge_slice_state *ss)
 {
-	int rc = true;
+	bool rc = true;
 	spin_lock_bh(&ss->lock);
 	if ((ss->state & SLICE_LOCKED)) {
 		ss->state |= SLICE_STATE_POLL_YIELD;
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH 19/19 v2] net: ipv4: Change variable type to bool
From: Peter Senna Tschudin @ 2013-10-02 12:19 UTC (permalink / raw)
  To: davem
  Cc: kuznet, jmorris, kaber, netdev, linux-kernel, kernel-janitors,
	Peter Senna Tschudin
In-Reply-To: <1380716391-20214-1-git-send-email-peter.senna@gmail.com>

The variable fully_acked is only assigned the values true and false.
Change its type to bool.

The simplified semantic patch that finds this problem is as
follows (http://coccinelle.lip6.fr/):

@exists@
type T;
identifier b;
@@
- T
+ bool
  b = ...;
  ... when any
  b = \(true\|false\)

Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
---
Changes from v1:
 - Added subsystem prefix to shortlog

 net/ipv4/tcp_input.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 25a89ea..fa17dce 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2970,7 +2970,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets,
 	const struct inet_connection_sock *icsk = inet_csk(sk);
 	struct sk_buff *skb;
 	u32 now = tcp_time_stamp;
-	int fully_acked = true;
+	bool fully_acked = true;
 	int flag = 0;
 	u32 pkts_acked = 0;
 	u32 reord = tp->packets_out;
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH 08/19 v2] net: atl1c: Change variable type to bool
From: Peter Senna Tschudin @ 2013-10-02 12:19 UTC (permalink / raw)
  To: jcliburn
  Cc: chris.snook, jkosina, rdunlap, standby24x7, peter.senna, netdev,
	linux-kernel, kernel-janitors

The variable ret is only assigned the values true and false.
The function atl1c_read_eeprom already returns bool. Change
ret type to bool.

The simplified semantic patch that finds this problem is as
follows (http://coccinelle.lip6.fr/):

@exists@
type T;
identifier b;
@@
- T
+ bool
  b = ...;
  ... when any
  b = \(true\|false\)

Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
---
Changes from v1:
 - Added subsystem prefix to shortlog

 drivers/net/ethernet/atheros/atl1c/atl1c_hw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_hw.c b/drivers/net/ethernet/atheros/atl1c/atl1c_hw.c
index 3ef7092..1cda49a 100644
--- a/drivers/net/ethernet/atheros/atl1c/atl1c_hw.c
+++ b/drivers/net/ethernet/atheros/atl1c/atl1c_hw.c
@@ -153,7 +153,7 @@ static int atl1c_get_permanent_address(struct atl1c_hw *hw)
 bool atl1c_read_eeprom(struct atl1c_hw *hw, u32 offset, u32 *p_value)
 {
 	int i;
-	int ret = false;
+	bool ret = false;
 	u32 otp_ctrl_data;
 	u32 control;
 	u32 data;
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH 09/19 v2] net: bnx2x: Change variable type to bool
From: Peter Senna Tschudin @ 2013-10-02 12:19 UTC (permalink / raw)
  To: eilong; +Cc: netdev, linux-kernel, kernel-janitors, Peter Senna Tschudin
In-Reply-To: <1380716391-20214-1-git-send-email-peter.senna@gmail.com>

The variable rc is only assigned the values true and false.
The function bnx2x_prev_is_path_marked already returns bool.
Change rc type to bool.

The simplified semantic patch that finds this problem is as
follows (http://coccinelle.lip6.fr/):

@exists@
type T;
identifier b;
@@
- T
+ bool
  b = ...;
  ... when any
  b = \(true\|false\)

Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
---
Changes from v1:
 - Added subsystem prefix to shortlog

 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index fccfc1d..105cc80 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -9874,7 +9874,7 @@ static int bnx2x_prev_path_mark_eeh(struct bnx2x *bp)
 static bool bnx2x_prev_is_path_marked(struct bnx2x *bp)
 {
 	struct bnx2x_prev_path_list *tmp_list;
-	int rc = false;
+	bool rc = false;
 
 	if (down_trylock(&bnx2x_prev_sem))
 		return false;
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH 2/3] net: mv643xx_eth: fix orphaned statistics timer crash
From: Jason Cooper @ 2013-10-02 12:20 UTC (permalink / raw)
  To: Sebastian Hesselbarth
  Cc: David Miller, Lennert Buytenhek, netdev, linux-arm-kernel,
	linux-kernel
In-Reply-To: <1380711442-24735-3-git-send-email-sebastian.hesselbarth@gmail.com>

On Wed, Oct 02, 2013 at 12:57:21PM +0200, Sebastian Hesselbarth wrote:
> The periodic statistics timer gets started at port _probe() time, but
> is stopped on _stop() only. In a modular environment, this can cause
> the timer to access already deallocated memory, if the module is unloaded
> without starting the eth device. To fix this, we add the timer right
> before the port is started, instead of at _probe() time.
> 
> Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
> ---
> Cc: David Miller <davem@davemloft.net>
> Cc: Lennert Buytenhek <buytenh@wantstofly.org>
> Cc: Jason Cooper <jason@lakedaemon.net>
> Cc: netdev@vger.kernel.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  drivers/net/ethernet/marvell/mv643xx_eth.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Acked-by: Jason Cooper <jason@lakedaemon.net>

Introduced by:

  4ff3495a mv643xx_eth: enforce frequent hardware statistics polling

which also goes all the way back to v2.6.28

thx,

Jason.

^ permalink raw reply

* Re: [PATCH 3/3] net: mv643xx_eth: fix missing device_node for port devices
From: Jason Cooper @ 2013-10-02 12:22 UTC (permalink / raw)
  To: Sebastian Hesselbarth
  Cc: David Miller, Lennert Buytenhek, netdev, linux-arm-kernel,
	linux-kernel
In-Reply-To: <1380711442-24735-4-git-send-email-sebastian.hesselbarth@gmail.com>

On Wed, Oct 02, 2013 at 12:57:22PM +0200, Sebastian Hesselbarth wrote:
> DT-based mv643xx_eth probes and creates platform_devices for the
> port devices on its own. To allow fixups for ports based on the
> device_node, we need to set .of_node of the corresponding device
> with the correct node.
> 
> Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
> ---
> Cc: David Miller <davem@davemloft.net>
> Cc: Lennert Buytenhek <buytenh@wantstofly.org>
> Cc: Jason Cooper <jason@lakedaemon.net>
> Cc: netdev@vger.kernel.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  drivers/net/ethernet/marvell/mv643xx_eth.c |    1 +
>  1 file changed, 1 insertion(+)

Acked-by: Jason Cooper <jason@lakedaemon.net>

thx,

Jason.

^ permalink raw reply

* Re: [RFC PATCH 1/2] net: Add layer 2 hardware acceleration operations for macvlan devices
From: Neil Horman @ 2013-10-02 12:53 UTC (permalink / raw)
  To: John Fastabend; +Cc: netdev, john.fastabend, David S. Miller
In-Reply-To: <524BC65B.4080803@gmail.com>

On Wed, Oct 02, 2013 at 12:08:11AM -0700, John Fastabend wrote:
> On 09/25/2013 01:16 PM, Neil Horman wrote:
> >Add an operations structure that allows a network interface to export the fact
> >that it supports packet forwarding in hardware between physical interfaces and
> >other mac layer devices assigned to it (such as macvlans).  This operations
> >structure can be used by virtual mac devices to bypass software switching so
> >that forwarding can be done in hardware more efficiently.
> 
> Some additional nits below which maybe you have already thought of.
> 
It's OK, you can say it, they're more than nits :), gaping holes is more like it.
This pass was completely untested, I'm still ironing out bits, but I think for
the most part we're in agreement on the items below.  Comments inline.

> >
> >Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> >CC: john.fastabend@intel.com
> >CC: "David S. Miller" <davem@davemloft.net>
> >---
> >  drivers/net/macvlan.c      | 37 +++++++++++++++++++++++++++++++++++++
> >  include/linux/if_macvlan.h |  1 +
> >  include/linux/netdevice.h  | 10 ++++++++++
> >  net/core/dev.c             |  3 +++
> >  4 files changed, 51 insertions(+)
> >
> >diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
> >index 9bf46bd..0c37b30 100644
> >--- a/drivers/net/macvlan.c
> >+++ b/drivers/net/macvlan.c
> >@@ -296,8 +296,16 @@ netdev_tx_t macvlan_start_xmit(struct sk_buff *skb,
> >  	unsigned int len = skb->len;
> >  	int ret;
> >  	const struct macvlan_dev *vlan = netdev_priv(dev);
> >+	const struct l2_forwarding_accel_ops *l2a_ops = vlan->lowerdev->l2a_ops;
> >+
> >+	if (l2a_ops->l2_accel_xmit) {
> >+		ret = l2a_ops->l2_accel_xmit(skb, vlan->l2a_priv);
> >+		if (likely(ret == NETDEV_TX_OK))
> 
> maybe dev_xmit_complete() would be more appropriate?
> 
Yup, I've currently modified this to dev_queue_xmit, but _complete might be a
better option.

> >+			goto update_stats;
> >+	}
> >
> >  	ret = macvlan_queue_xmit(skb, dev);
> >+update_stats:
> >  	if (likely(ret == NET_XMIT_SUCCESS || ret == NET_XMIT_CN)) {
> >  		struct macvlan_pcpu_stats *pcpu_stats;
> >
> >@@ -336,6 +344,7 @@ static int macvlan_open(struct net_device *dev)
> >  {
> >  	struct macvlan_dev *vlan = netdev_priv(dev);
> >  	struct net_device *lowerdev = vlan->lowerdev;
> >+	const struct l2_forwarding_accel_ops *l2a_ops = lowerdev->l2a_ops;
> >  	int err;
> >
> >  	if (vlan->port->passthru) {
> >@@ -347,6 +356,19 @@ static int macvlan_open(struct net_device *dev)
> >  		goto hash_add;
> 
> Looks like this might break in the passthru case? If you don't call
> l2_accel_add_dev here but still use the l2_accel_xmit.
> 
Yeah, I need to clean this up.

> >  	}
> >
> >+	if (l2a_ops->l2_accel_add_dev) {
> 
> In the error cases it might be preferred to fallback to the
> non-offloaded software path. For example hardware may have a limit
> to the number of VSIs that can be created but we wouldn't want to
> push that up the stack.
> 
Hadn't thought of that, but yes, I agree, falling back to software switching is
definitely desirable here.
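
(Illustration only, not the next version of the patch.) The fallback idea
could look roughly like the userspace sketch below: an error from the
offload hook is not pushed up the stack, the device just stays on the
software path. struct accel_ops_sketch, open_with_fallback and the two
example callbacks are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* hypothetical, stripped-down stand-in for the offload ops structure */
struct accel_ops_sketch {
	int (*add_dev)(void *priv);	/* 0 on success, negative on error */
};

/* example callbacks: one that succeeds, one that models "hardware is full" */
static int add_dev_ok(void *priv)   { (void)priv; return 0; }
static int add_dev_full(void *priv) { (void)priv; return -1; }

/*
 * Try to offload; on any failure keep going in software.
 * *offloaded tells the caller which transmit path to use later.
 * The open itself never fails because of the offload attempt.
 */
static int open_with_fallback(const struct accel_ops_sketch *ops,
			      void *priv, int *offloaded)
{
	assert(offloaded != NULL);
	*offloaded = 0;
	if (ops && ops->add_dev && ops->add_dev(priv) == 0)
		*offloaded = 1;
	return 0;
}
```

Keeping the result in a per-device flag also means the transmit path only
calls the accelerated hook when the add actually succeeded, which would
cover the passthru concern raised above as well.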

> >+		/* The lowerdev supports l2 switching
> >+		 * try to add this macvlan to it
> >+		 */
> >+		vlan->l2a_priv = kzalloc(l2a_ops->priv_size, GFP_KERNEL);
> >+		if (!vlan->l2a_priv)
> >+			return -ENOMEM;
> >+		err = l2a_ops->l2_accel_add_dev(vlan->lowerdev,
> >+						dev, vlan->l2a_priv);
> >+		if (err < 0)
> >+			return err;
> >+	}
> >+
> >  	err = -EBUSY;
> >  	if (macvlan_addr_busy(vlan->port, dev->dev_addr))
> >  		goto out;
> >@@ -367,6 +389,13 @@ hash_add:
> >  del_unicast:
> >  	dev_uc_del(lowerdev, dev->dev_addr);
> >  out:
> >+	if (vlan->l2a_priv) {
> 
> Add a feature flag here so it can be disabled.
> 
Makes sense.

I'm still working out kinks, but I hope to have a next version later this
week/early next.  I'll make sure to incorporate these changes. 

Thanks!
Neil

> >+		if (l2a_ops->l2_accel_del_dev)
> >+			l2a_ops->l2_accel_del_dev(vlan->lowerdev,
> >+						  vlan->l2a_priv);
> >+		kfree(vlan->l2a_priv);
> >+		vlan->l2a_priv = NULL;
> >+	}
> >  	return err;
> >  }
> 
> [...]
> 
> Thanks,
> John
> 
> 
> -- 
> John Fastabend         Intel Corporation
> 

^ permalink raw reply

* Re: [PATCH net-next v5 1/3] flow_dissector: factor out the ports extraction in skb_flow_get_ports
From: Veaceslav Falico @ 2013-10-02 12:55 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: netdev, davem, andy, fubar, eric.dumazet
In-Reply-To: <1380713966-3891-2-git-send-email-nikolay@redhat.com>

On Wed, Oct 02, 2013 at 01:39:24PM +0200, Nikolay Aleksandrov wrote:
>Factor out the code that extracts the ports from skb_flow_dissect and
>add a new function skb_flow_get_ports which can be re-used.
>
>Suggested-by: Veaceslav Falico <vfalico@redhat.com>
>Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>

Reviewed-by: Veaceslav Falico <vfalico@redhat.com>

>---
>v2: new patch
>v3: fix a bug in skb_flow_dissect where thoff didn't have poff added by
>    modifying thoff directly in skb_flow_get_ports as it's done anyway.
>    Also add the necessary export symbol for skb_flow_get_ports.
>v4: integrate the thoff fix in skb_flow_get_ports
>v5: disintegrate the thoff fix, and re-base on Eric's fix
>This seems like a good idea because there're other users that can re-use
>it later as well.
>
> include/net/flow_keys.h   |  1 +
> net/core/flow_dissector.c | 39 ++++++++++++++++++++++++++++-----------
> 2 files changed, 29 insertions(+), 11 deletions(-)
>
>diff --git a/include/net/flow_keys.h b/include/net/flow_keys.h
>index ac2439d..7e64bd8 100644
>--- a/include/net/flow_keys.h
>+++ b/include/net/flow_keys.h
>@@ -14,4 +14,5 @@ struct flow_keys {
> };
>
> bool skb_flow_dissect(const struct sk_buff *skb, struct flow_keys *flow);
>+__be32 skb_flow_get_ports(const struct sk_buff *skb, int thoff, u8 ip_proto);
> #endif
>diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
>index 8d7d0dd..f8e25ac 100644
>--- a/net/core/flow_dissector.c
>+++ b/net/core/flow_dissector.c
>@@ -25,9 +25,35 @@ static void iph_to_flow_copy_addrs(struct flow_keys *flow, const struct iphdr *i
> 	memcpy(&flow->src, &iph->saddr, sizeof(flow->src) + sizeof(flow->dst));
> }
>
>+/**
>+ * skb_flow_get_ports - extract the upper layer ports and return them
>+ * @skb: buffer to extract the ports from
>+ * @thoff: transport header offset
>+ * @ip_proto: protocol for which to get port offset
>+ *
>+ * The function will try to retrieve the ports at offset thoff + poff where poff
>+ * is the protocol port offset returned from proto_ports_offset
>+ */
>+__be32 skb_flow_get_ports(const struct sk_buff *skb, int thoff, u8 ip_proto)
>+{
>+	int poff = proto_ports_offset(ip_proto);
>+
>+	if (poff >= 0) {
>+		__be32 *ports, _ports;
>+
>+		ports = skb_header_pointer(skb, thoff + poff,
>+					   sizeof(_ports), &_ports);
>+		if (ports)
>+			return *ports;
>+	}
>+
>+	return 0;
>+}
>+EXPORT_SYMBOL(skb_flow_get_ports);
>+
> bool skb_flow_dissect(const struct sk_buff *skb, struct flow_keys *flow)
> {
>-	int poff, nhoff = skb_network_offset(skb);
>+	int nhoff = skb_network_offset(skb);
> 	u8 ip_proto;
> 	__be16 proto = skb->protocol;
>
>@@ -150,16 +176,7 @@ ipv6:
> 	}
>
> 	flow->ip_proto = ip_proto;
>-	poff = proto_ports_offset(ip_proto);
>-	if (poff >= 0) {
>-		__be32 *ports, _ports;
>-
>-		ports = skb_header_pointer(skb, nhoff + poff,
>-					   sizeof(_ports), &_ports);
>-		if (ports)
>-			flow->ports = *ports;
>-	}
>-
>+	flow->ports = skb_flow_get_ports(skb, nhoff, ip_proto);
> 	flow->thoff = (u16) nhoff;
>
> 	return true;
>-- 
>1.8.1.4
>

^ permalink raw reply

* Re: [PATCH net-next v5 2/3] bonding: modify the old and add new xmit hash policies
From: Veaceslav Falico @ 2013-10-02 12:56 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: netdev, davem, andy, fubar, eric.dumazet
In-Reply-To: <1380713966-3891-3-git-send-email-nikolay@redhat.com>

On Wed, Oct 02, 2013 at 01:39:25PM +0200, Nikolay Aleksandrov wrote:
>This patch adds two new hash policy modes which use skb_flow_dissect:
>3 - Encapsulated layer 2+3
>4 - Encapsulated layer 3+4
>There should be a good improvement for tunnel users in those modes.
>It also changes the old hash functions to:
>hash ^= (__force u32)flow.dst ^ (__force u32)flow.src;
>hash ^= (hash >> 16);
>hash ^= (hash >> 8);
>
>Where hash will be initialized either to L2 hash, that is
>SRCMAC[5] XOR DSTMAC[5], or to flow->ports which should be extracted
>from the upper layer. Flow's dst and src are also extracted based on the
>xmit policy either directly from the buffer or by using skb_flow_dissect,
>but in both cases if the protocol is IPv6 then dst and src are obtained by
>ipv6_addr_hash() on the real addresses. In case of a non-dissectable
>packet, the algorithms fall back to L2 hashing.
>The bond_set_mode_ops() function is now obsolete and thus deleted
>because it was used only to set the proper hash policy. Also we trim a
>pointer from struct bonding because we no longer need to keep the hash
>function, now there's only a single hash function - bond_xmit_hash that
>works based on bond->params.xmit_policy.
>
>The hash function and skb_flow_dissect were suggested by Eric Dumazet.
>The layer names were suggested by Andy Gospodarek, because I suck at
>semantics.
>
>Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>

Acked-by: Veaceslav Falico <vfalico@redhat.com>
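
For readers following the hash description in the commit message, here is a minimal userspace sketch of the fold it describes: seed the hash with either the L2 value or the ports, XOR in dst and src, then fold the upper bits down before taking the modulo. The struct and function names (demo_flow, demo_eth_hash, demo_fold, demo_xmit_hash) are hypothetical. Plain uint32_t values stand in for the kernel's __be32 fields, so byte-order handling is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the kernel's struct flow_keys. */
struct demo_flow {
	uint32_t src;
	uint32_t dst;
	uint32_t ports;
};

/* L2 seed: XOR of the last octet of the destination and source MAC. */
static uint32_t demo_eth_hash(const uint8_t dmac[6], const uint8_t smac[6])
{
	return (uint32_t)(dmac[5] ^ smac[5]);
}

/* Mix the addresses into the seed and fold the upper bits downwards. */
static uint32_t demo_fold(uint32_t seed, const struct demo_flow *fk)
{
	uint32_t hash = seed;

	hash ^= fk->dst ^ fk->src;
	hash ^= (hash >> 16);
	hash ^= (hash >> 8);
	return hash;
}

/* Reduce the folded hash modulo the number of slaves. */
static uint32_t demo_xmit_hash(uint32_t seed, const struct demo_flow *fk,
			       uint32_t count)
{
	assert(count != 0);	/* a modulo by zero would be undefined */
	return demo_fold(seed, fk) % count;
}
```

In this sketch the caller picks the seed: demo_eth_hash() for the layer 2+3 style policies, or the ports field for the layer 3+4 style policies, mirroring the two branches in the quoted bond_xmit_hash().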

>---
>v2: fix a bug in bond_flow_dissect which might've caused the use of
>    uninitialized flow_keys and make use of skb_flow_get_ports
>v3, v4: no change
>v5: re-base
>One line is intentionally left at 82 chars since it's the whole function
>and IMO looks better that way.
>
> drivers/net/bonding/bond_3ad.c   |   2 +-
> drivers/net/bonding/bond_main.c  | 197 ++++++++++++++-------------------------
> drivers/net/bonding/bond_sysfs.c |   2 -
> drivers/net/bonding/bonding.h    |   3 +-
> include/uapi/linux/if_bonding.h  |   2 +
> 5 files changed, 72 insertions(+), 134 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>index c62606a..ea3e64e 100644
>--- a/drivers/net/bonding/bond_3ad.c
>+++ b/drivers/net/bonding/bond_3ad.c
>@@ -2403,7 +2403,7 @@ int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev)
> 		goto out;
> 	}
>
>-	slave_agg_no = bond->xmit_hash_policy(skb, slaves_in_agg);
>+	slave_agg_no = bond_xmit_hash(bond, skb, slaves_in_agg);
> 	first_ok_slave = NULL;
>
> 	bond_for_each_slave(bond, slave, iter) {
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index fe8a94f..dfb4f6d 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -78,6 +78,7 @@
> #include <net/netns/generic.h>
> #include <net/pkt_sched.h>
> #include <linux/rculist.h>
>+#include <net/flow_keys.h>
> #include "bonding.h"
> #include "bond_3ad.h"
> #include "bond_alb.h"
>@@ -159,7 +160,8 @@ MODULE_PARM_DESC(min_links, "Minimum number of available links before turning on
> module_param(xmit_hash_policy, charp, 0);
> MODULE_PARM_DESC(xmit_hash_policy, "balance-xor and 802.3ad hashing method; "
> 				   "0 for layer 2 (default), 1 for layer 3+4, "
>-				   "2 for layer 2+3");
>+				   "2 for layer 2+3, 3 for encap layer 2+3, "
>+				   "4 for encap layer 3+4");
> module_param(arp_interval, int, 0);
> MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds");
> module_param_array(arp_ip_target, charp, NULL, 0);
>@@ -217,6 +219,8 @@ const struct bond_parm_tbl xmit_hashtype_tbl[] = {
> {	"layer2",		BOND_XMIT_POLICY_LAYER2},
> {	"layer3+4",		BOND_XMIT_POLICY_LAYER34},
> {	"layer2+3",		BOND_XMIT_POLICY_LAYER23},
>+{	"encap2+3",		BOND_XMIT_POLICY_ENCAP23},
>+{	"encap3+4",		BOND_XMIT_POLICY_ENCAP34},
> {	NULL,			-1},
> };
>
>@@ -3035,99 +3039,85 @@ static struct notifier_block bond_netdev_notifier = {
>
> /*---------------------------- Hashing Policies -----------------------------*/
>
>-/*
>- * Hash for the output device based upon layer 2 data
>- */
>-static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count)
>+/* L2 hash helper */
>+static inline u32 bond_eth_hash(struct sk_buff *skb)
> {
> 	struct ethhdr *data = (struct ethhdr *)skb->data;
>
> 	if (skb_headlen(skb) >= offsetof(struct ethhdr, h_proto))
>-		return (data->h_dest[5] ^ data->h_source[5]) % count;
>+		return data->h_dest[5] ^ data->h_source[5];
>
> 	return 0;
> }
>
>-/*
>- * Hash for the output device based upon layer 2 and layer 3 data. If
>- * the packet is not IP, fall back on bond_xmit_hash_policy_l2()
>- */
>-static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count)
>+/* Extract the appropriate headers based on bond's xmit policy */
>+static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
>+			      struct flow_keys *fk)
> {
>-	const struct ethhdr *data;
>+	const struct ipv6hdr *iph6;
> 	const struct iphdr *iph;
>-	const struct ipv6hdr *ipv6h;
>-	u32 v6hash;
>-	const __be32 *s, *d;
>+	int noff, proto = -1;
>
>-	if (skb->protocol == htons(ETH_P_IP) &&
>-	    pskb_network_may_pull(skb, sizeof(*iph))) {
>+	if (bond->params.xmit_policy > BOND_XMIT_POLICY_LAYER23)
>+		return skb_flow_dissect(skb, fk);
>+
>+	fk->ports = 0;
>+	noff = skb_network_offset(skb);
>+	if (skb->protocol == htons(ETH_P_IP)) {
>+		if (!pskb_may_pull(skb, noff + sizeof(*iph)))
>+			return false;
> 		iph = ip_hdr(skb);
>-		data = (struct ethhdr *)skb->data;
>-		return ((ntohl(iph->saddr ^ iph->daddr) & 0xffff) ^
>-			(data->h_dest[5] ^ data->h_source[5])) % count;
>-	} else if (skb->protocol == htons(ETH_P_IPV6) &&
>-		   pskb_network_may_pull(skb, sizeof(*ipv6h))) {
>-		ipv6h = ipv6_hdr(skb);
>-		data = (struct ethhdr *)skb->data;
>-		s = &ipv6h->saddr.s6_addr32[0];
>-		d = &ipv6h->daddr.s6_addr32[0];
>-		v6hash = (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]);
>-		v6hash ^= (v6hash >> 24) ^ (v6hash >> 16) ^ (v6hash >> 8);
>-		return (v6hash ^ data->h_dest[5] ^ data->h_source[5]) % count;
>-	}
>-
>-	return bond_xmit_hash_policy_l2(skb, count);
>+		fk->src = iph->saddr;
>+		fk->dst = iph->daddr;
>+		noff += iph->ihl << 2;
>+		if (!ip_is_fragment(iph))
>+			proto = iph->protocol;
>+	} else if (skb->protocol == htons(ETH_P_IPV6)) {
>+		if (!pskb_may_pull(skb, noff + sizeof(*iph6)))
>+			return false;
>+		iph6 = ipv6_hdr(skb);
>+		fk->src = (__force __be32)ipv6_addr_hash(&iph6->saddr);
>+		fk->dst = (__force __be32)ipv6_addr_hash(&iph6->daddr);
>+		noff += sizeof(*iph6);
>+		proto = iph6->nexthdr;
>+	} else {
>+		return false;
>+	}
>+	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER34 && proto >= 0)
>+		fk->ports = skb_flow_get_ports(skb, noff, proto);
>+
>+	return true;
> }
>
>-/*
>- * Hash for the output device based upon layer 3 and layer 4 data. If
>- * the packet is a frag or not TCP or UDP, just use layer 3 data.  If it is
>- * altogether not IP, fall back on bond_xmit_hash_policy_l2()
>+/**
>+ * bond_xmit_hash - generate a hash value based on the xmit policy
>+ * @bond: bonding device
>+ * @skb: buffer to use for headers
>+ * @count: modulo value
>+ *
>+ * This function will extract the necessary headers from the skb buffer and use
>+ * them to generate a hash based on the xmit_policy set in the bonding device
>+ * which will be reduced modulo count before returning.
>  */
>-static int bond_xmit_hash_policy_l34(struct sk_buff *skb, int count)
>+int bond_xmit_hash(struct bonding *bond, struct sk_buff *skb, int count)
> {
>-	u32 layer4_xor = 0;
>-	const struct iphdr *iph;
>-	const struct ipv6hdr *ipv6h;
>-	const __be32 *s, *d;
>-	const __be16 *l4 = NULL;
>-	__be16 _l4[2];
>-	int noff = skb_network_offset(skb);
>-	int poff;
>-
>-	if (skb->protocol == htons(ETH_P_IP) &&
>-	    pskb_may_pull(skb, noff + sizeof(*iph))) {
>-		iph = ip_hdr(skb);
>-		poff = proto_ports_offset(iph->protocol);
>+	struct flow_keys flow;
>+	u32 hash;
>
>-		if (!ip_is_fragment(iph) && poff >= 0) {
>-			l4 = skb_header_pointer(skb, noff + (iph->ihl << 2) + poff,
>-						sizeof(_l4), &_l4);
>-			if (l4)
>-				layer4_xor = ntohs(l4[0] ^ l4[1]);
>-		}
>-		return (layer4_xor ^
>-			((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count;
>-	} else if (skb->protocol == htons(ETH_P_IPV6) &&
>-		   pskb_may_pull(skb, noff + sizeof(*ipv6h))) {
>-		ipv6h = ipv6_hdr(skb);
>-		poff = proto_ports_offset(ipv6h->nexthdr);
>-		if (poff >= 0) {
>-			l4 = skb_header_pointer(skb, noff + sizeof(*ipv6h) + poff,
>-						sizeof(_l4), &_l4);
>-			if (l4)
>-				layer4_xor = ntohs(l4[0] ^ l4[1]);
>-		}
>-		s = &ipv6h->saddr.s6_addr32[0];
>-		d = &ipv6h->daddr.s6_addr32[0];
>-		layer4_xor ^= (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]);
>-		layer4_xor ^= (layer4_xor >> 24) ^ (layer4_xor >> 16) ^
>-			       (layer4_xor >> 8);
>-		return layer4_xor % count;
>-	}
>+	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER2 ||
>+	    !bond_flow_dissect(bond, skb, &flow))
>+		return bond_eth_hash(skb) % count;
>+
>+	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER23 ||
>+	    bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP23)
>+		hash = bond_eth_hash(skb);
>+	else
>+		hash = (__force u32)flow.ports;
>+	hash ^= (__force u32)flow.dst ^ (__force u32)flow.src;
>+	hash ^= (hash >> 16);
>+	hash ^= (hash >> 8);
>
>-	return bond_xmit_hash_policy_l2(skb, count);
>+	return hash % count;
> }
>
> /*-------------------------- Device entry points ----------------------------*/
>@@ -3721,8 +3711,7 @@ static int bond_xmit_activebackup(struct sk_buff *skb, struct net_device *bond_d
> 	return NETDEV_TX_OK;
> }
>
>-/*
>- * In bond_xmit_xor() , we determine the output device by using a pre-
>+/* In bond_xmit_xor() , we determine the output device by using a pre-
>  * determined xmit_hash_policy(), If the selected device is not enabled,
>  * find the next active slave.
>  */
>@@ -3730,8 +3719,7 @@ static int bond_xmit_xor(struct sk_buff *skb, struct net_device *bond_dev)
> {
> 	struct bonding *bond = netdev_priv(bond_dev);
>
>-	bond_xmit_slave_id(bond, skb,
>-			   bond->xmit_hash_policy(skb, bond->slave_cnt));
>+	bond_xmit_slave_id(bond, skb, bond_xmit_hash(bond, skb, bond->slave_cnt));
>
> 	return NETDEV_TX_OK;
> }
>@@ -3768,22 +3756,6 @@ static int bond_xmit_broadcast(struct sk_buff *skb, struct net_device *bond_dev)
>
> /*------------------------- Device initialization ---------------------------*/
>
>-static void bond_set_xmit_hash_policy(struct bonding *bond)
>-{
>-	switch (bond->params.xmit_policy) {
>-	case BOND_XMIT_POLICY_LAYER23:
>-		bond->xmit_hash_policy = bond_xmit_hash_policy_l23;
>-		break;
>-	case BOND_XMIT_POLICY_LAYER34:
>-		bond->xmit_hash_policy = bond_xmit_hash_policy_l34;
>-		break;
>-	case BOND_XMIT_POLICY_LAYER2:
>-	default:
>-		bond->xmit_hash_policy = bond_xmit_hash_policy_l2;
>-		break;
>-	}
>-}
>-
> /*
>  * Lookup the slave that corresponds to a qid
>  */
>@@ -3894,38 +3866,6 @@ static netdev_tx_t bond_start_xmit(struct sk_buff *skb, struct net_device *dev)
> 	return ret;
> }
>
>-/*
>- * set bond mode specific net device operations
>- */
>-void bond_set_mode_ops(struct bonding *bond, int mode)
>-{
>-	struct net_device *bond_dev = bond->dev;
>-
>-	switch (mode) {
>-	case BOND_MODE_ROUNDROBIN:
>-		break;
>-	case BOND_MODE_ACTIVEBACKUP:
>-		break;
>-	case BOND_MODE_XOR:
>-		bond_set_xmit_hash_policy(bond);
>-		break;
>-	case BOND_MODE_BROADCAST:
>-		break;
>-	case BOND_MODE_8023AD:
>-		bond_set_xmit_hash_policy(bond);
>-		break;
>-	case BOND_MODE_ALB:
>-		/* FALLTHRU */
>-	case BOND_MODE_TLB:
>-		break;
>-	default:
>-		/* Should never happen, mode already checked */
>-		pr_err("%s: Error: Unknown bonding mode %d\n",
>-		       bond_dev->name, mode);
>-		break;
>-	}
>-}
>-
> static int bond_ethtool_get_settings(struct net_device *bond_dev,
> 				     struct ethtool_cmd *ecmd)
> {
>@@ -4027,7 +3967,6 @@ static void bond_setup(struct net_device *bond_dev)
> 	ether_setup(bond_dev);
> 	bond_dev->netdev_ops = &bond_netdev_ops;
> 	bond_dev->ethtool_ops = &bond_ethtool_ops;
>-	bond_set_mode_ops(bond, bond->params.mode);
>
> 	bond_dev->destructor = bond_destructor;
>
>diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
>index e06c644..e924952 100644
>--- a/drivers/net/bonding/bond_sysfs.c
>+++ b/drivers/net/bonding/bond_sysfs.c
>@@ -318,7 +318,6 @@ static ssize_t bonding_store_mode(struct device *d,
> 	/* don't cache arp_validate between modes */
> 	bond->params.arp_validate = BOND_ARP_VALIDATE_NONE;
> 	bond->params.mode = new_value;
>-	bond_set_mode_ops(bond, bond->params.mode);
> 	pr_info("%s: setting mode to %s (%d).\n",
> 		bond->dev->name, bond_mode_tbl[new_value].modename,
> 		new_value);
>@@ -358,7 +357,6 @@ static ssize_t bonding_store_xmit_hash(struct device *d,
> 		ret = -EINVAL;
> 	} else {
> 		bond->params.xmit_policy = new_value;
>-		bond_set_mode_ops(bond, bond->params.mode);
> 		pr_info("%s: setting xmit hash policy to %s (%d).\n",
> 			bond->dev->name,
> 			xmit_hashtype_tbl[new_value].modename, new_value);
>diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
>index 9a26fbd..0bd04fb 100644
>--- a/drivers/net/bonding/bonding.h
>+++ b/drivers/net/bonding/bonding.h
>@@ -217,7 +217,6 @@ struct bonding {
> 	char     proc_file_name[IFNAMSIZ];
> #endif /* CONFIG_PROC_FS */
> 	struct   list_head bond_list;
>-	int      (*xmit_hash_policy)(struct sk_buff *, int);
> 	u16      rr_tx_counter;
> 	struct   ad_bond_info ad_info;
> 	struct   alb_bond_info alb_info;
>@@ -409,7 +408,7 @@ int bond_release(struct net_device *bond_dev, struct net_device *slave_dev);
> void bond_mii_monitor(struct work_struct *);
> void bond_loadbalance_arp_mon(struct work_struct *);
> void bond_activebackup_arp_mon(struct work_struct *);
>-void bond_set_mode_ops(struct bonding *bond, int mode);
>+int bond_xmit_hash(struct bonding *bond, struct sk_buff *skb, int count);
> int bond_parse_parm(const char *mode_arg, const struct bond_parm_tbl *tbl);
> void bond_select_active_slave(struct bonding *bond);
> void bond_change_active_slave(struct bonding *bond, struct slave *new_active);
>diff --git a/include/uapi/linux/if_bonding.h b/include/uapi/linux/if_bonding.h
>index a17edda..9635a62 100644
>--- a/include/uapi/linux/if_bonding.h
>+++ b/include/uapi/linux/if_bonding.h
>@@ -91,6 +91,8 @@
> #define BOND_XMIT_POLICY_LAYER2		0 /* layer 2 (MAC only), default */
> #define BOND_XMIT_POLICY_LAYER34	1 /* layer 3+4 (IP ^ (TCP || UDP)) */
> #define BOND_XMIT_POLICY_LAYER23	2 /* layer 2+3 (IP ^ MAC) */
>+#define BOND_XMIT_POLICY_ENCAP23	3 /* encapsulated layer 2+3 */
>+#define BOND_XMIT_POLICY_ENCAP34	4 /* encapsulated layer 3+4 */
>
> typedef struct ifbond {
> 	__s32 bond_mode;
>-- 
>1.8.1.4
>


* Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO
From: Hannes Frederic Sowa @ 2013-10-02 13:03 UTC (permalink / raw)
  To: Eric Dumazet, Jiri Pirko, netdev, yoshfuji, davem, kuznet,
	jmorris, kaber, herbert
In-Reply-To: <20131002121207.GO10771@order.stressinduktion.org>

On Wed, Oct 02, 2013 at 02:12:07PM +0200, Hannes Frederic Sowa wrote:
> Hi Eric!
> 
> On Wed, Oct 02, 2013 at 03:41:28AM -0700, Eric Dumazet wrote:
> > On Wed, 2013-10-02 at 10:58 +0200, Jiri Pirko wrote:
> > > Wed, Oct 02, 2013 at 01:25:34AM CEST, hannes@stressinduktion.org wrote:
> > > >-	if (((length > mtu) || (skb && skb_is_gso(skb))) &&
> > > >+	if (((length > mtu) || (skb && skb_has_frags(skb))) &&
> > 
> > > 
> > > This seems correct to me. skb_is_gso would work as well if you apply my
> > > patch "[patch net] ip6_output: do skb ufo init for peeked non ufo skb as
> > > well" which does the setting of gso_size.
> > 
> > Well, skb having frags or not should not be a concern:
> > That's an allocation choice (let's say to avoid high order allocations).
> > 
> > Setting gso_size is probably better.
> 
> e89e9cf539a28df7d0eb1d0a545368e9920b34ac ("[IPv4/IPv6]: UFO Scatter-gather
> approach") states:
> 
> "
> skb->data will contain MAC/IP/UDP header and skb_shinfo(skb)->frags[]
> contains the data payload. The skb->ip_summed will be set to CHECKSUM_HW
> indicating that hardware has to do checksum calculation. Hardware should
> compute the UDP checksum of complete datagram and also ip header checksum of
> each fragmented IP packet.
> "
> 
> This is the reason why I tried not to update the gso_size. If it is ok, I am
> fine with that.

Especially, drivers/net/ethernet/neterion/s2io.c states that the first dma
mapping (skb->data with skb_headlen, which is fine) is used as the inband
header:

        if (offload_type == SKB_GSO_UDP)
                frg_cnt++; /* as Txd0 was used for inband header */

That is my only other hint that we maybe should not update gso_size and
gso_type. I guess software fallback does not have this problem, but I won't
have time to check until this evening.

I am really not sure if just setting gso_size does not break neterion UFO
offloading. :/

Greetings,

  Hannes

