netdev.vger.kernel.org archive mirror
* [PATCH net-next v1 0/8] xen-netback: support toeplitz hashing
@ 2016-02-12 10:13 Paul Durrant
  2016-02-12 10:13 ` [PATCH net-next v1 1/8] skbuff: store hash type in socket buffer Paul Durrant
                   ` (8 more replies)
  0 siblings, 9 replies; 19+ messages in thread
From: Paul Durrant @ 2016-02-12 10:13 UTC (permalink / raw)
  To: netdev, xen-devel; +Cc: Paul Durrant

This patch series adds support for frontend-configurable toeplitz hashing
in xen-netback (on the guest receive side). This support has been tested
against a Windows frontend and has proven sufficient to pass the
Microsoft HCK NDIS RSS tests.

For convenience, my development branch is available at http://xenbits.xen.org/gitweb/?p=people/pauldu/linux.git;a=shortlog;h=refs/heads/rss18.

Patch #1 contains a small modification to struct sk_buff to use one more
bit's worth of space so that full hash information can be stored (as
opposed to simply whether a calculated hash is L4 or not).

Patch #2 re-imports the canonical netif.h from the Xen master branch at
git://xenbits.xen.org/xen.git. To minimize the diff it was post-processed
as detailed in the commit message.

Patch #3 fixes a shortcoming in netback so that it can actually cope with
multiple extra_info fragments being passed from the frontend on the
guest transmit side.

Patch #4 is a trivial patch to reduce log spam from netback.

Patch #5 adds support for a new shared ring for control messages between
frontend and backend as detailed in the updated netif.h.

Patch #6 builds on patch #5 and adds support for messages passed from the
frontend to configure a toeplitz hash of packets on the guest receive side.

Patch #7 adds code to pass the hash calculated by netback to the frontend.

Patch #8 adds code to take hashes calculated by the guest on its transmit
side and set the appropriate values in the constructed socket buffer.


* [PATCH net-next v1 1/8] skbuff: store hash type in socket buffer...
  2016-02-12 10:13 [PATCH net-next v1 0/8] xen-netback: support toeplitz hashing Paul Durrant
@ 2016-02-12 10:13 ` Paul Durrant
  2016-02-14 22:25   ` Tom Herbert
  2016-02-12 10:13 ` [PATCH net-next v1 2/8] xen-netback: re-import canonical netif header Paul Durrant
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: Paul Durrant @ 2016-02-12 10:13 UTC (permalink / raw)
  To: netdev, xen-devel
  Cc: Paul Durrant, David S. Miller, Jay Vosburgh, Veaceslav Falico,
	Andy Gospodarek

...rather than a boolean merely indicating a canonical L4 hash.

skb_set_hash() takes a hash type (from enum pkt_hash_types) as an
argument but information is lost since only a single bit in the skb
stores whether that hash type is PKT_HASH_TYPE_L4 or not.

By using two bits it's possible to store the complete hash type
information for use by drivers. A subsequent patch to xen-netback
needs this information to forward packet hashes to VM frontend drivers.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <gospo@cumulusnetworks.com>
---
 drivers/net/bonding/bond_main.c |  2 +-
 include/linux/skbuff.h          | 53 ++++++++++++++++++++++++++++-------------
 include/net/flow_dissector.h    |  5 ++++
 include/net/sock.h              |  2 +-
 include/trace/events/net.h      |  2 +-
 net/core/flow_dissector.c       | 27 ++++++++++++++++-----
 6 files changed, 65 insertions(+), 26 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 45bdd87..ad0ee00 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3173,7 +3173,7 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
 	u32 hash;
 
 	if (bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP34 &&
-	    skb->l4_hash)
+	    skb_has_l4_hash(skb))
 		return skb->hash;
 
 	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER2 ||
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 6ec86f1..9021b52 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -595,8 +595,7 @@ static inline bool skb_mstamp_after(const struct skb_mstamp *t1,
  *	@xmit_more: More SKBs are pending for this queue
  *	@ndisc_nodetype: router type (from link layer)
  *	@ooo_okay: allow the mapping of a socket to a queue to be changed
- *	@l4_hash: indicate hash is a canonical 4-tuple hash over transport
- *		ports.
+ *	@hash_type: indicates type of hash (see enum pkt_hash_types below)
  *	@sw_hash: indicates hash was computed in software stack
  *	@wifi_acked_valid: wifi_acked was set
  *	@wifi_acked: whether frame was acked on wifi or not
@@ -701,10 +700,10 @@ struct sk_buff {
 	__u8			nf_trace:1;
 	__u8			ip_summed:2;
 	__u8			ooo_okay:1;
-	__u8			l4_hash:1;
 	__u8			sw_hash:1;
 	__u8			wifi_acked_valid:1;
 	__u8			wifi_acked:1;
+	/* 1 bit hole */
 
 	__u8			no_fcs:1;
 	/* Indicates the inner headers are valid in the skbuff. */
@@ -721,7 +720,8 @@ struct sk_buff {
 	__u8			ipvs_property:1;
 	__u8			inner_protocol_type:1;
 	__u8			remcsum_offload:1;
-	/* 3 or 5 bit hole */
+	__u8			hash_type:2;
+	/* 1 or 3 bit hole */
 
 #ifdef CONFIG_NET_SCHED
 	__u16			tc_index;	/* traffic control index */
@@ -1030,19 +1030,35 @@ static inline void skb_clear_hash(struct sk_buff *skb)
 {
 	skb->hash = 0;
 	skb->sw_hash = 0;
-	skb->l4_hash = 0;
+	skb->hash_type = 0;
+}
+
+static inline enum pkt_hash_types skb_hash_type(struct sk_buff *skb)
+{
+	return skb->hash_type;
+}
+
+static inline bool skb_has_l4_hash(struct sk_buff *skb)
+{
+	return skb_hash_type(skb) == PKT_HASH_TYPE_L4;
+}
+
+static inline bool skb_has_sw_hash(struct sk_buff *skb)
+{
+	return !!skb->sw_hash;
 }
 
 static inline void skb_clear_hash_if_not_l4(struct sk_buff *skb)
 {
-	if (!skb->l4_hash)
+	if (!skb_has_l4_hash(skb))
 		skb_clear_hash(skb);
 }
 
 static inline void
-__skb_set_hash(struct sk_buff *skb, __u32 hash, bool is_sw, bool is_l4)
+__skb_set_hash(struct sk_buff *skb, __u32 hash, bool is_sw,
+	       enum pkt_hash_types type)
 {
-	skb->l4_hash = is_l4;
+	skb->hash_type = type;
 	skb->sw_hash = is_sw;
 	skb->hash = hash;
 }
@@ -1051,13 +1067,13 @@ static inline void
 skb_set_hash(struct sk_buff *skb, __u32 hash, enum pkt_hash_types type)
 {
 	/* Used by drivers to set hash from HW */
-	__skb_set_hash(skb, hash, false, type == PKT_HASH_TYPE_L4);
+	__skb_set_hash(skb, hash, false, type);
 }
 
 static inline void
-__skb_set_sw_hash(struct sk_buff *skb, __u32 hash, bool is_l4)
+__skb_set_sw_hash(struct sk_buff *skb, __u32 hash, enum pkt_hash_types type)
 {
-	__skb_set_hash(skb, hash, true, is_l4);
+	__skb_set_hash(skb, hash, true, type);
 }
 
 void __skb_get_hash(struct sk_buff *skb);
@@ -1110,9 +1126,10 @@ static inline bool skb_flow_dissect_flow_keys_buf(struct flow_keys *flow,
 				  data, proto, nhoff, hlen, flags);
 }
 
+
 static inline __u32 skb_get_hash(struct sk_buff *skb)
 {
-	if (!skb->l4_hash && !skb->sw_hash)
+	if (!skb_has_l4_hash(skb) && !skb_has_sw_hash(skb))
 		__skb_get_hash(skb);
 
 	return skb->hash;
@@ -1122,11 +1139,12 @@ __u32 __skb_get_hash_flowi6(struct sk_buff *skb, const struct flowi6 *fl6);
 
 static inline __u32 skb_get_hash_flowi6(struct sk_buff *skb, const struct flowi6 *fl6)
 {
-	if (!skb->l4_hash && !skb->sw_hash) {
+	if (!skb_has_l4_hash(skb) && !skb_has_sw_hash(skb)) {
 		struct flow_keys keys;
 		__u32 hash = __get_hash_from_flowi6(fl6, &keys);
 
-		__skb_set_sw_hash(skb, hash, flow_keys_have_l4(&keys));
+		__skb_set_sw_hash(skb, hash, flow_keys_have_l4(&keys) ?
+				  PKT_HASH_TYPE_L4 : PKT_HASH_TYPE_L3);
 	}
 
 	return skb->hash;
@@ -1136,11 +1154,12 @@ __u32 __skb_get_hash_flowi4(struct sk_buff *skb, const struct flowi4 *fl);
 
 static inline __u32 skb_get_hash_flowi4(struct sk_buff *skb, const struct flowi4 *fl4)
 {
-	if (!skb->l4_hash && !skb->sw_hash) {
+	if (!skb_has_l4_hash(skb) && !skb_has_sw_hash(skb)) {
 		struct flow_keys keys;
 		__u32 hash = __get_hash_from_flowi4(fl4, &keys);
 
-		__skb_set_sw_hash(skb, hash, flow_keys_have_l4(&keys));
+		__skb_set_sw_hash(skb, hash, flow_keys_have_l4(&keys) ?
+				  PKT_HASH_TYPE_L4 : PKT_HASH_TYPE_L3);
 	}
 
 	return skb->hash;
@@ -1157,7 +1176,7 @@ static inline void skb_copy_hash(struct sk_buff *to, const struct sk_buff *from)
 {
 	to->hash = from->hash;
 	to->sw_hash = from->sw_hash;
-	to->l4_hash = from->l4_hash;
+	to->hash_type = from->hash_type;
 };
 
 static inline void skb_sender_cpu_clear(struct sk_buff *skb)
diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
index 8c8548c..418b8c5 100644
--- a/include/net/flow_dissector.h
+++ b/include/net/flow_dissector.h
@@ -182,6 +182,11 @@ static inline bool flow_keys_have_l4(struct flow_keys *keys)
 	return (keys->ports.ports || keys->tags.flow_label);
 }
 
+static inline bool flow_keys_have_l3(struct flow_keys *keys)
+{
+	return !!keys->control.addr_type;
+}
+
 u32 flow_hash_from_keys(struct flow_keys *keys);
 
 #endif
diff --git a/include/net/sock.h b/include/net/sock.h
index 255d3e0..21b8cdc 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1828,7 +1828,7 @@ static inline void sock_poll_wait(struct file *filp,
 static inline void skb_set_hash_from_sk(struct sk_buff *skb, struct sock *sk)
 {
 	if (sk->sk_txhash) {
-		skb->l4_hash = 1;
+		skb->hash_type = PKT_HASH_TYPE_L4;
 		skb->hash = sk->sk_txhash;
 	}
 }
diff --git a/include/trace/events/net.h b/include/trace/events/net.h
index 49cc7c3..25e7979 100644
--- a/include/trace/events/net.h
+++ b/include/trace/events/net.h
@@ -180,7 +180,7 @@ DECLARE_EVENT_CLASS(net_dev_rx_verbose_template,
 		__entry->protocol = ntohs(skb->protocol);
 		__entry->ip_summed = skb->ip_summed;
 		__entry->hash = skb->hash;
-		__entry->l4_hash = skb->l4_hash;
+		__entry->l4_hash = skb->hash_type == PKT_HASH_TYPE_L4;
 		__entry->len = skb->len;
 		__entry->data_len = skb->data_len;
 		__entry->truesize = skb->truesize;
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index d79699c..956208b 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -658,17 +658,30 @@ EXPORT_SYMBOL(make_flow_keys_digest);
  *
  * This function calculates a flow hash based on src/dst addresses
  * and src/dst port numbers.  Sets hash in skb to non-zero hash value
- * on success, zero indicates no valid hash.  Also, sets l4_hash in skb
- * if hash is a canonical 4-tuple hash over transport ports.
+ * on success, zero indicates no valid hash in which case the hash type
+ * is set to NONE. If the hash is a canonical 4-tuple hash over transport
+ * ports then type is set to L4. If the hash did not include transport
+ * then type is set to L3, otherwise it is assumed to be L2 only.
  */
 void __skb_get_hash(struct sk_buff *skb)
 {
 	struct flow_keys keys;
+	u32 hash;
+	enum pkt_hash_types type;
 
 	__flow_hash_secret_init();
 
-	__skb_set_sw_hash(skb, ___skb_get_hash(skb, &keys, hashrnd),
-			  flow_keys_have_l4(&keys));
+	hash = ___skb_get_hash(skb, &keys, hashrnd);
+	if (hash == 0)
+		type = PKT_HASH_TYPE_NONE;
+	else if (flow_keys_have_l4(&keys))
+		type = PKT_HASH_TYPE_L4;
+	else if (flow_keys_have_l3(&keys))
+		type = PKT_HASH_TYPE_L3;
+	else
+		type = PKT_HASH_TYPE_L2;
+
+	__skb_set_sw_hash(skb, hash, type);
 }
 EXPORT_SYMBOL(__skb_get_hash);
 
@@ -698,7 +711,8 @@ __u32 __skb_get_hash_flowi6(struct sk_buff *skb, const struct flowi6 *fl6)
 	keys.basic.ip_proto = fl6->flowi6_proto;
 
 	__skb_set_sw_hash(skb, flow_hash_from_keys(&keys),
-			  flow_keys_have_l4(&keys));
+			  flow_keys_have_l4(&keys) ?
+			  PKT_HASH_TYPE_L4 : PKT_HASH_TYPE_L3);
 
 	return skb->hash;
 }
@@ -719,7 +733,8 @@ __u32 __skb_get_hash_flowi4(struct sk_buff *skb, const struct flowi4 *fl4)
 	keys.basic.ip_proto = fl4->flowi4_proto;
 
 	__skb_set_sw_hash(skb, flow_hash_from_keys(&keys),
-			  flow_keys_have_l4(&keys));
+			  flow_keys_have_l4(&keys) ?
+			  PKT_HASH_TYPE_L4 : PKT_HASH_TYPE_L3);
 
 	return skb->hash;
 }
-- 
2.1.4
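
For illustration, a minimal sketch of how a driver receive path might use
the reworked API (example_rx_hash() is a hypothetical function, not part
of this patch); the point is that the two-bit hash_type field preserves
the full type rather than collapsing it into an L4/non-L4 boolean:

#include <linux/skbuff.h>

/* Hypothetical driver rx path storing a hardware-computed L3 hash. */
static void example_rx_hash(struct sk_buff *skb, u32 hw_hash)
{
	skb_set_hash(skb, hw_hash, PKT_HASH_TYPE_L3);

	/* Previously only an L4/non-L4 bit survived; now the exact
	 * type can be recovered by consumers such as xen-netback.
	 */
	WARN_ON(skb_has_l4_hash(skb));
	WARN_ON(skb_hash_type(skb) != PKT_HASH_TYPE_L3);
}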


* [PATCH net-next v1 2/8] xen-netback: re-import canonical netif header
  2016-02-12 10:13 [PATCH net-next v1 0/8] xen-netback: support toeplitz hashing Paul Durrant
  2016-02-12 10:13 ` [PATCH net-next v1 1/8] skbuff: store hash type in socket buffer Paul Durrant
@ 2016-02-12 10:13 ` Paul Durrant
  2016-02-12 10:13 ` [PATCH net-next v1 3/8] xen-netback: support multiple extra info fragments passed from frontend Paul Durrant
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Paul Durrant @ 2016-02-12 10:13 UTC (permalink / raw)
  To: netdev, xen-devel
  Cc: Paul Durrant, Konrad Rzeszutek Wilk, Boris Ostrovsky,
	David Vrabel, Wei Liu

The canonical netif header (in the Xen source repo) and the Linux variant
have diverged significantly. Recently, much documentation has been added to
the canonical header, along with new definitions and types to support packet
hash configuration. Subsequent patches in this series add support for packet
hash configuration in xen-netback, so this patch re-imports the canonical
header in readiness.

To maintain compatibility and some style consistency with the old Linux
variant, the header was stripped of its emacs boilerplate, then
post-processed and copied into place with the following commands:

ed -s netif.h << EOF
H
,s/NETTXF_/XEN_NETTXF_/g
,s/NETRXF_/XEN_NETRXF_/g
,s/NETIF_RSP/XEN_NETIF_RSP/g
,s/NETIF_CTRL/XEN_NETIF_CTRL/g
,s/netif_tx/xen_netif_tx/g
,s/netif_rx/xen_netif_rx/g
,s/netif_ctrl/xen_netif_ctrl/g
,s/netif_extra_info/xen_netif_extra_info/g
$
w
EOF

indent --line-length 80 --linux-style netif.h \
-o include/xen/interface/io/netif.h

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 include/xen/interface/io/netif.h | 758 ++++++++++++++++++++++++++++++++++-----
 1 file changed, 667 insertions(+), 91 deletions(-)

diff --git a/include/xen/interface/io/netif.h b/include/xen/interface/io/netif.h
index 252ffd4..78e0815 100644
--- a/include/xen/interface/io/netif.h
+++ b/include/xen/interface/io/netif.h
@@ -3,14 +3,32 @@
  *
  * Unified network-device I/O interface for Xen guest OSes.
  *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
  * Copyright (c) 2003-2004, Keir Fraser
  */
 
 #ifndef __XEN_PUBLIC_IO_NETIF_H__
 #define __XEN_PUBLIC_IO_NETIF_H__
 
-#include <xen/interface/io/ring.h>
-#include <xen/interface/grant_table.h>
+#include "ring.h"
+#include "../grant_table.h"
 
 /*
  * Older implementation of Xen network frontend / backend has an
@@ -38,10 +56,10 @@
  * that it cannot safely queue packets (as it may not be kicked to send them).
  */
 
- /*
+/*
  * "feature-split-event-channels" is introduced to separate guest TX
- * and RX notificaion. Backend either doesn't support this feature or
- * advertise it via xenstore as 0 (disabled) or 1 (enabled).
+ * and RX notification. Backend either doesn't support this feature or
+ * advertises it via xenstore as 0 (disabled) or 1 (enabled).
  *
  * To make use of this feature, frontend should allocate two event
  * channels for TX and RX, advertise them to backend as
@@ -118,151 +136,709 @@
  */
 
 /*
- * This is the 'wire' format for packets:
- *  Request 1: xen_netif_tx_request  -- XEN_NETTXF_* (any flags)
- * [Request 2: xen_netif_extra_info]    (only if request 1 has XEN_NETTXF_extra_info)
- * [Request 3: xen_netif_extra_info]    (only if request 2 has XEN_NETIF_EXTRA_MORE)
- *  Request 4: xen_netif_tx_request  -- XEN_NETTXF_more_data
- *  Request 5: xen_netif_tx_request  -- XEN_NETTXF_more_data
+ * "feature-multicast-control" and "feature-dynamic-multicast-control"
+ * advertise the capability to filter ethernet multicast packets in the
+ * backend. If the frontend wishes to take advantage of this feature then
+ * it may set "request-multicast-control". If the backend only advertises
+ * "feature-multicast-control" then "request-multicast-control" must be set
+ * before the frontend moves into the connected state. The backend will
+ * sample the value on this state transition and any subsequent change in
+ * value will have no effect. However, if the backend also advertises
+ * "feature-dynamic-multicast-control" then "request-multicast-control"
+ * may be set by the frontend at any time. In this case, the backend will
+ * watch the value and re-sample on watch events.
+ *
+ * If the sampled value of "request-multicast-control" is set then the
+ * backend transmit side should no longer flood multicast packets to the
+ * frontend, it should instead drop any multicast packet that does not
+ * match in a filter list.
+ * The list is amended by the frontend by sending dummy transmit requests
+ * containing XEN_NETIF_EXTRA_TYPE_MCAST_{ADD,DEL} extra-info fragments as
+ * specified below.
+ * Note that the filter list may be amended even if the sampled value of
+ * "request-multicast-control" is not set, however the filter should only
+ * be applied if it is set.
+ */
+
+/*
+ * Control ring
+ * ============
+ *
+ * Some features, such as toeplitz hashing (detailed below), require a
+ * significant amount of out-of-band data to be passed from frontend to
+ * backend. Use of xenstore is not suitable for large quantities of data
+ * because of quota limitations and so a dedicated 'control ring' is used.
+ * The ability of the backend to use a control ring is advertised by
+ * setting:
+ *
+ * /local/domain/X/backend/<domid>/<vif>/feature-ctrl-ring = "1"
+ *
+ * The frontend provides a control ring to the backend by setting:
+ *
+ * /local/domain/<domid>/device/vif/<vif>/ctrl-ring-ref = <gref>
+ * /local/domain/<domid>/device/vif/<vif>/event-channel-ctrl = <port>
+ *
+ * where <gref> is the grant reference of the shared page used to
+ * implement the control ring and <port> is an event channel to be used
+ * as a mailbox interrupt. These keys must be set before the frontend
+ * moves into the connected state.
+ *
+ * The control ring uses a fixed request/response message size and is
+ * balanced (i.e. one request to one response), so operationally it is much
+ * the same as a transmit or receive ring.
+ * Note that there is no requirement that responses are issued in the same
+ * order as requests.
+ */
+
+/*
+ * Toeplitz hash types
+ * ===================
+ *
+ * For the purposes of the definitions below, 'Packet[]' is an array of
+ * octets containing an IP packet without options, 'Array[X..Y]' means a
+ * sub-array of 'Array' containing bytes X thru Y inclusive, and '+' is
+ * used to indicate concatenation of arrays.
+ */
+
+/*
+ * A hash calculated over an IP version 4 header as follows:
+ *
+ * Buffer[0..7] = Packet[12..15] (source address) +
+ *                Packet[16..19] (destination address)
+ *
+ * Result = ToeplitzHash(Buffer, 8)
+ */
+#define _XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV4     0
+#define XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV4      (1 << _XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV4)
+
+/*
+ * A hash calculated over an IP version 4 header and TCP header as
+ * follows:
+ *
+ * Buffer[0..11] = Packet[12..15] (source address) +
+ *                 Packet[16..19] (destination address) +
+ *                 Packet[20..21] (source port) +
+ *                 Packet[22..23] (destination port)
+ *
+ * Result = ToeplitzHash(Buffer, 12)
+ */
+#define _XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV4_TCP 1
+#define XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV4_TCP  (1 << _XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV4_TCP)
+
+/*
+ * A hash calculated over an IP version 6 header as follows:
+ *
+ * Buffer[0..31] = Packet[8..23]  (source address) +
+ *                 Packet[24..39] (destination address)
+ *
+ * Result = ToeplitzHash(Buffer, 32)
+ */
+#define _XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV6     2
+#define XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV6      (1 << _XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV6)
+
+/*
+ * A hash calculated over an IP version 6 header and TCP header as
+ * follows:
+ *
+ * Buffer[0..35] = Packet[8..23]  (source address) +
+ *                 Packet[24..39] (destination address) +
+ *                 Packet[40..41] (source port) +
+ *                 Packet[42..43] (destination port)
+ *
+ * Result = ToeplitzHash(Buffer, 36)
+ */
+#define _XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV6_TCP 3
+#define XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV6_TCP  (1 << _XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV6_TCP)
+
+/*
+ * Control requests (xen_netif_ctrl_request_t)
+ * =======================================
+ *
+ * All requests have the following format:
+ *
+ *    0     1     2     3     4     5     6     7  octet
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |    id     |   type    |         data[0]       |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |         data[1]       |         data[2]       |
+ * +-----+-----+-----+-----+-----------------------+
+ *
+ * id: the request identifier, echoed in response.
+ * type: the type of request (see below)
+ * data[]: any data associated with the request (determined by type)
+ */
+
+struct xen_netif_ctrl_request {
+	uint16_t id;
+	uint16_t type;
+
+#define XEN_NETIF_CTRL_TYPE_INVALID                    0
+#define XEN_NETIF_CTRL_TYPE_GET_TOEPLITZ_FLAGS         1
+#define XEN_NETIF_CTRL_TYPE_SET_TOEPLITZ_FLAGS         2
+#define XEN_NETIF_CTRL_TYPE_SET_TOEPLITZ_KEY           3
+#define XEN_NETIF_CTRL_TYPE_GET_TOEPLITZ_MAPPING_ORDER 4
+#define XEN_NETIF_CTRL_TYPE_SET_TOEPLITZ_MAPPING_ORDER 5
+#define XEN_NETIF_CTRL_TYPE_SET_TOEPLITZ_MAPPING       6
+
+	uint32_t data[3];
+};
+typedef struct xen_netif_ctrl_request xen_netif_ctrl_request_t;
+
+/*
+ * Control responses (xen_netif_ctrl_response_t)
+ * =========================================
+ *
+ * All responses have the following format:
+ *
+ *    0     1     2     3     4     5     6     7  octet
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |    id     |   type    |         status        |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |         data          |
+ * +-----+-----+-----+-----+
+ *
+ * id: the corresponding request identifier
+ * type: the type of the corresponding request
+ * status: the status of request processing
+ * data: any data associated with the response (determined by type and
+ *       status)
+ */
+
+struct xen_netif_ctrl_response {
+	uint16_t id;
+	uint16_t type;
+	uint32_t status;
+
+#define XEN_NETIF_CTRL_STATUS_SUCCESS           0
+#define XEN_NETIF_CTRL_STATUS_NOT_SUPPORTED     1
+#define XEN_NETIF_CTRL_STATUS_INVALID_PARAMETER 2
+#define XEN_NETIF_CTRL_STATUS_BUFFER_OVERFLOW   3
+
+	uint32_t data;
+};
+typedef struct xen_netif_ctrl_response xen_netif_ctrl_response_t;
+
+/*
+ * Control messages
+ * ================
+ *
+ * XEN_NETIF_CTRL_TYPE_GET_TOEPLITZ_FLAGS
+ * ----------------------------------
+ *
+ * This is sent by the frontend to query the types of toeplitz
+ * hash supported by the backend.
+ *
+ * Request:
+ *
+ *  type    = XEN_NETIF_CTRL_TYPE_GET_TOEPLITZ_FLAGS
+ *  data[0] = 0
+ *  data[1] = 0
+ *  data[2] = 0
+ *
+ * Response:
+ *
+ *  status = XEN_NETIF_CTRL_STATUS_NOT_SUPPORTED - Operation not supported
+ *           XEN_NETIF_CTRL_STATUS_SUCCESS       - Operation successful
+ *  data   = supported hash types (if operation was successful)
+ *
+ * XEN_NETIF_CTRL_TYPE_SET_TOEPLITZ_FLAGS
+ * ----------------------------------
+ *
+ * This is sent by the frontend to set the types of toeplitz hash that
+ * the backend should calculate. (See above for hash type definitions).
+ * Note that the 'maximal' type of hash should always be chosen. For
+ * example, if the frontend sets both IPV4 and IPV4_TCP hash types then
+ * the latter hash type should be calculated for any TCP packet and the
+ * former only calculated for non-TCP packets.
+ *
+ * Request:
+ *
+ *  type    = XEN_NETIF_CTRL_TYPE_SET_TOEPLITZ_FLAGS
+ *  data[0] = bitwise OR of XEN_NETIF_CTRL_TOEPLITZ_HASH_* values
+ *  data[1] = 0
+ *  data[2] = 0
+ *
+ * Response:
+ *
+ *  status = XEN_NETIF_CTRL_STATUS_NOT_SUPPORTED     - Operation not supported
+ *           XEN_NETIF_CTRL_STATUS_INVALID_PARAMETER - One or more flag values
+ *                                                 are invalid or
+ *                                                 unsupported
+ *           XEN_NETIF_CTRL_STATUS_SUCCESS           - Operation successful
+ *  data   = 0
+ *
+ * NOTE: Setting data[0] to zero disables toeplitz hashing and the backend
+ *       is free to choose how it steers packets to queues (which is the
+ *       default behaviour).
+ *
+ * XEN_NETIF_CTRL_TYPE_SET_TOEPLITZ_KEY
+ * --------------------------------
+ *
+ * This is sent by the frontend to set the key of the toeplitz hash that
+ * the backend should calculate. The toeplitz algorithm is illustrated
+ * by the following pseudo-code:
+ *
+ * (Buffer[] and Key[] are treated as shift-registers where the MSB of
+ * Buffer/Key[0] is considered 'left-most' and the LSB of Buffer/Key[N-1]
+ * is the 'right-most').
+ *
+ * Value = 0
+ * For number of bits in Buffer[]
+ *    If (left-most bit of Buffer[] is 1)
+ *        Value ^= left-most 32 bits of Key[]
+ *    Key[] << 1
+ *    Buffer[] << 1
+ *
+ * Request:
+ *
+ *  type    = XEN_NETIF_CTRL_TYPE_SET_TOEPLITZ_KEY
+ *  data[0] = grant reference of page containing the key (assumed to
+ *            start at beginning of grant)
+ *  data[1] = size of key in octets
+ *  data[2] = 0
+ *
+ * Response:
+ *
+ *  status = XEN_NETIF_CTRL_STATUS_NOT_SUPPORTED     - Operation not supported
+ *           XEN_NETIF_CTRL_STATUS_INVALID_PARAMETER - Key size is invalid
+ *           XEN_NETIF_CTRL_STATUS_BUFFER_OVERFLOW   - Key size is larger than
+ *                                                 the backend supports
+ *           XEN_NETIF_CTRL_STATUS_SUCCESS           - Operation successful
+ *  data   = 0
+ *
+ * NOTE: Any key octets not specified are assumed to be zero (the key
+ *       is assumed to be empty by default) and specifying a new key
+ *       invalidates any previous key, hence specifying a key size of
+ *       zero will clear the key (which ensures that the calculated hash
+ *       will always be zero).
+ *       The maximum size of key is backend specific, but is also limited
+ *       by the single grant reference.
+ *       The grant reference may be read-only and must remain valid until
+ *       the response has been processed.
+ *
+ * XEN_NETIF_CTRL_TYPE_GET_TOEPLITZ_MAPPING_ORDER
+ * ------------------------------------------
+ *
+ * This is sent by the frontend to query the maximum order of mapping
+ * table supported by the backend. The order is specified in terms of
+ * table entries i.e. the table may contain up to 2^order entries.
+ *
+ * Request:
+ *
+ *  type    = XEN_NETIF_CTRL_TYPE_GET_TOEPLITZ_MAPPING_ORDER
+ *  data[0] = 0
+ *  data[1] = 0
+ *  data[2] = 0
+ *
+ * Response:
+ *
+ *  status = XEN_NETIF_CTRL_STATUS_NOT_SUPPORTED - Operation not supported
+ *           XEN_NETIF_CTRL_STATUS_SUCCESS       - Operation successful
+ *  data   = maximum order of mapping table (if operation was successful)
+ *
+ * XEN_NETIF_CTRL_TYPE_SET_TOEPLITZ_MAPPING_ORDER
+ * ------------------------------------------
+ *
+ * This is sent by the frontend to set the actual order of the mapping
+ * table to be used by the backend. The order is specified in terms of
+ * table entries i.e. the table will contain 2^order entries.
+ * Any previous table is invalidated by this message and any new table
+ * is assumed to be zero filled. The default order is zero and hence the
+ * default table contains a single entry mapping all hashes to queue-0.
+ *
+ * Request:
+ *
+ *  type    = XEN_NETIF_CTRL_TYPE_SET_TOEPLITZ_MAPPING_ORDER
+ *  data[0] = order of mapping table
+ *  data[1] = 0
+ *  data[2] = 0
+ *
+ * Response:
+ *
+ *  status = XEN_NETIF_CTRL_STATUS_NOT_SUPPORTED     - Operation not supported
+ *           XEN_NETIF_CTRL_STATUS_INVALID_PARAMETER - Table order is invalid
+ *           XEN_NETIF_CTRL_STATUS_SUCCESS           - Operation successful
+ *  data   = 0
+ *
+ * XEN_NETIF_CTRL_TYPE_SET_TOEPLITZ_MAPPING
+ * ------------------------------------
+ *
+ * This is sent by the frontend to set the content of the table mapping
+ * toeplitz hash value to queue number. The backend should calculate the
+ * hash from the packet header, use it as an index into the table (modulo
+ * the size of the table) and then steer the packet to the queue number
+ * found at that index.
+ *
+ * Request:
+ *
+ *  type    = XEN_NETIF_CTRL_TYPE_SET_TOEPLITZ_MAPPING
+ *  data[0] = grant reference of page containing the mapping (sub-)table
+ *            (assumed to start at beginning of grant)
+ *  data[1] = size of (sub-)table in entries
+ *  data[2] = offset, in entries, of sub-table within overall table
+ *
+ * Response:
+ *
+ *  status = XEN_NETIF_CTRL_STATUS_NOT_SUPPORTED     - Operation not supported
+ *           XEN_NETIF_CTRL_STATUS_INVALID_PARAMETER - Table size or content is
+ *                                                 invalid
+ *           XEN_NETIF_CTRL_STATUS_BUFFER_OVERFLOW   - Table size is larger than
+ *                                                 the backend supports
+ *           XEN_NETIF_CTRL_STATUS_SUCCESS           - Operation successful
+ *  data   = 0
+ *
+ * NOTE: The overall table has the following format:
+ *
+ *          0     1     2     3     4     5     6     7  octet
+ *       +-----+-----+-----+-----+-----+-----+-----+-----+
+ *       |       mapping[0]      |       mapping[1]      |
+ *       +-----+-----+-----+-----+-----+-----+-----+-----+
+ *       |                       .                       |
+ *       |                       .                       |
+ *       |                       .                       |
+ *       +-----+-----+-----+-----+-----+-----+-----+-----+
+ *       |      mapping[N-2]     |      mapping[N-1]     |
+ *       +-----+-----+-----+-----+-----+-----+-----+-----+
+ *
+ *       where N == 2^order as specified by a
+ *       XEN_NETIF_CTRL_TYPE_SET_TOEPLITZ_MAPPING_ORDER message and each
+ *       mapping must specify a queue between 0 and
+ *       "multi-queue-num-queues" (see above).
+ *       The backend may support a mapping table larger than can be
+ *       mapped by a single grant reference. Thus sub-tables within a
+ *       larger table can be individually set by sending multiple messages
+ *       with differing offset values. Specifying a new sub-table does not
+ *       invalidate any table data outside that range.
+ *       The grant reference may be read-only and must remain valid until
+ *       the response has been processed.
+ */
+
+DEFINE_RING_TYPES(xen_netif_ctrl, struct xen_netif_ctrl_request,
+		  struct xen_netif_ctrl_response);
+
+/*
+ * Guest transmit
+ * ==============
+ *
+ * This is the 'wire' format for transmit (frontend -> backend) packets:
+ *
+ *  Fragment 1: xen_netif_tx_request_t  - flags = XEN_NETTXF_*
+ *                                    size = total packet size
+ * [Extra 1: xen_netif_extra_info_t]    - (only if fragment 1 flags include
+ *                                     XEN_NETTXF_extra_info)
+ *  ...
+ * [Extra N: xen_netif_extra_info_t]    - (only if extra N-1 flags include
+ *                                     XEN_NETIF_EXTRA_MORE)
+ *  ...
+ *  Fragment N: xen_netif_tx_request_t  - (only if fragment N-1 flags include
+ *                                     XEN_NETTXF_more_data - flags on preceding
+ *                                     extras are not relevant here)
+ *                                    flags = 0
+ *                                    size = fragment size
+ *
+ * NOTE:
+ *
+ * This format is slightly different from that used for receive
+ * (backend -> frontend) packets. Specifically, in a multi-fragment
+ * packet the actual size of fragment 1 can only be determined by
+ * subtracting the sizes of fragments 2..N from the total packet size.
+ *
+ * Ring slot size is 12 octets, however not all request/response
+ * structs use the full size.
+ *
+ * tx request data (xen_netif_tx_request_t)
+ * ------------------------------------
+ *
+ *    0     1     2     3     4     5     6     7  octet
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * | grant ref             | offset    | flags     |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * | id        | size      |
+ * +-----+-----+-----+-----+
+ *
+ * grant ref: Reference to buffer page.
+ * offset: Offset within buffer page.
+ * flags: XEN_NETTXF_*.
+ * id: request identifier, echoed in response.
+ * size: packet size in bytes.
+ *
+ * tx response (xen_netif_tx_response_t)
+ * ---------------------------------
+ *
+ *    0     1     2     3     4     5     6     7  octet
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * | id        | status    | unused                |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * | unused                |
+ * +-----+-----+-----+-----+
+ *
+ * id: reflects id in transmit request
+ * status: XEN_NETIF_RSP_*
+ *
+ * Guest receive
+ * =============
+ *
+ * This is the 'wire' format for receive (backend -> frontend) packets:
+ *
+ *  Fragment 1: xen_netif_rx_request_t  - flags = XEN_NETRXF_*
+ *                                    size = fragment size
+ * [Extra 1: xen_netif_extra_info_t]    - (only if fragment 1 flags include
+ *                                     XEN_NETRXF_extra_info)
+ *  ...
+ * [Extra N: xen_netif_extra_info_t]    - (only if extra N-1 flags include
+ *                                     XEN_NETIF_EXTRA_MORE)
  *  ...
- *  Request N: xen_netif_tx_request  -- 0
+ *  Fragment N: xen_netif_rx_request_t  - (only if fragment N-1 flags include
+ *                                     XEN_NETRXF_more_data - flags on preceding
+ *                                     extras are not relevant here)
+ *                                    flags = 0
+ *                                    size = fragment size
+ *
+ * NOTE:
+ *
+ * This format is slightly different from that used for transmit
+ * (frontend -> backend) packets. Specifically, in a multi-fragment
+ * packet the size of the packet can only be determined by summing the
+ * sizes of fragments 1..N.
+ *
+ * Ring slot size is 8 octets.
+ *
+ * rx request (xen_netif_rx_request_t)
+ * -------------------------------
+ *
+ *    0     1     2     3     4     5     6     7  octet
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * | id        | pad       | gref                  |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ *
+ * id: request identifier, echoed in response.
+ * gref: reference to incoming granted frame.
+ *
+ * rx response (xen_netif_rx_response_t)
+ * ---------------------------------
+ *
+ *    0     1     2     3     4     5     6     7  octet
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * | id        | offset    | flags     | status    |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ *
+ * id: reflects id in receive request
+ * offset: offset in page of start of received packet
+ * flags: XEN_NETRXF_*
+ * status: -ve: XEN_NETIF_RSP_*; +ve: Rx'ed pkt size.
+ *
+ * NOTE: Historically, to support GSO on the frontend receive side, Linux
+ *       netfront does not make use of the rx response id (because, as
+ *       described below, extra info structures overlay the id field).
+ *       Instead it assumes that responses always appear in the same ring
+ *       slot as their corresponding request. Thus, to maintain
+ *       compatibility, backends must make sure this is the case.
+ *
+ * Extra Info
+ * ==========
+ *
+ * Can be present if initial request or response has NET{T,R}XF_extra_info,
+ * or previous extra request has XEN_NETIF_EXTRA_MORE.
+ *
+ * The struct therefore needs to fit into either a tx or rx slot and
+ * is therefore limited to 8 octets.
+ *
+ * NOTE: Because extra info data overlays the usual request/response
+ *       structures, there is no id information in the opposite direction.
+ *       So, if an extra info overlays an rx response the frontend can
+ *       assume that it is in the same ring slot as the request that was
+ *       consumed to make the slot available, and the backend must ensure
+ *       this assumption is true.
+ *
+ * extra info (xen_netif_extra_info_t)
+ * -------------------------------
+ *
+ * General format:
+ *
+ *    0     1     2     3     4     5     6     7  octet
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |type |flags| type specific data                |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * | padding for tx        |
+ * +-----+-----+-----+-----+
+ *
+ * type: XEN_NETIF_EXTRA_TYPE_*
+ * flags: XEN_NETIF_EXTRA_FLAG_*
+ * padding for tx: present only in the tx case due to 8 octet limit
+ *                 from rx case. Not shown in type specific entries
+ *                 below.
+ *
+ * XEN_NETIF_EXTRA_TYPE_GSO:
+ *
+ *    0     1     2     3     4     5     6     7  octet
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |type |flags| size      |type | pad | features  |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ *
+ * type: Must be XEN_NETIF_EXTRA_TYPE_GSO
+ * flags: XEN_NETIF_EXTRA_FLAG_*
+ * size: Maximum payload size of each segment. For example,
+ *       for TCP this is just the path MSS.
+ * type: XEN_NETIF_GSO_TYPE_*: This determines the protocol of
+ *       the packet and any extra features required to segment the
+ *       packet properly.
+ * features: XEN_NETIF_GSO_FEAT_*: This specifies any extra GSO
+ *           features required to process this packet, such as ECN
+ *           support for TCPv4.
+ *
+ * XEN_NETIF_EXTRA_TYPE_MCAST_{ADD,DEL}:
+ *
+ *    0     1     2     3     4     5     6     7  octet
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |type |flags| addr                              |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ *
+ * type: Must be XEN_NETIF_EXTRA_TYPE_MCAST_{ADD,DEL}
+ * flags: XEN_NETIF_EXTRA_FLAG_*
+ * addr: address to add/remove
+ *
+ * XEN_NETIF_EXTRA_TYPE_TOEPLITZ:
+ *
+ * A backend that supports toeplitz hashing is assumed to accept
+ * this type of extra info in transmit packets.
+ * A frontend that enables toeplitz hashing is assumed to accept
+ * this type of extra info in receive packets.
+ *
+ *    0     1     2     3     4     5     6     7  octet
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |type |flags|htype| pad |LSB ---- value ---- MSB|
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ *
+ * type: Must be XEN_NETIF_EXTRA_TYPE_TOEPLITZ
+ * flags: XEN_NETIF_EXTRA_FLAG_*
+ * htype: Hash type (one of _XEN_NETIF_CTRL_TOEPLITZ_HASH_* - see above)
+ * value: Hash value
  */
 
 /* Protocol checksum field is blank in the packet (hardware offload)? */
-#define _XEN_NETTXF_csum_blank		(0)
-#define  XEN_NETTXF_csum_blank		(1U<<_XEN_NETTXF_csum_blank)
+#define _XEN_NETTXF_csum_blank     (0)
+#define  XEN_NETTXF_csum_blank     (1U<<_XEN_NETTXF_csum_blank)
 
 /* Packet data has been validated against protocol checksum. */
-#define _XEN_NETTXF_data_validated	(1)
-#define  XEN_NETTXF_data_validated	(1U<<_XEN_NETTXF_data_validated)
+#define _XEN_NETTXF_data_validated (1)
+#define  XEN_NETTXF_data_validated (1U<<_XEN_NETTXF_data_validated)
 
 /* Packet continues in the next request descriptor. */
-#define _XEN_NETTXF_more_data		(2)
-#define  XEN_NETTXF_more_data		(1U<<_XEN_NETTXF_more_data)
+#define _XEN_NETTXF_more_data      (2)
+#define  XEN_NETTXF_more_data      (1U<<_XEN_NETTXF_more_data)
 
 /* Packet to be followed by extra descriptor(s). */
-#define _XEN_NETTXF_extra_info		(3)
-#define  XEN_NETTXF_extra_info		(1U<<_XEN_NETTXF_extra_info)
+#define _XEN_NETTXF_extra_info     (3)
+#define  XEN_NETTXF_extra_info     (1U<<_XEN_NETTXF_extra_info)
 
 #define XEN_NETIF_MAX_TX_SIZE 0xFFFF
 struct xen_netif_tx_request {
-    grant_ref_t gref;      /* Reference to buffer page */
-    uint16_t offset;       /* Offset within buffer page */
-    uint16_t flags;        /* XEN_NETTXF_* */
-    uint16_t id;           /* Echoed in response message. */
-    uint16_t size;         /* Packet size in bytes.       */
+	grant_ref_t gref;
+	uint16_t offset;
+	uint16_t flags;
+	uint16_t id;
+	uint16_t size;
 };
+typedef struct xen_netif_tx_request xen_netif_tx_request_t;
 
 /* Types of xen_netif_extra_info descriptors. */
-#define XEN_NETIF_EXTRA_TYPE_NONE	(0)  /* Never used - invalid */
-#define XEN_NETIF_EXTRA_TYPE_GSO	(1)  /* u.gso */
-#define XEN_NETIF_EXTRA_TYPE_MCAST_ADD	(2)  /* u.mcast */
-#define XEN_NETIF_EXTRA_TYPE_MCAST_DEL	(3)  /* u.mcast */
-#define XEN_NETIF_EXTRA_TYPE_MAX	(4)
+#define XEN_NETIF_EXTRA_TYPE_NONE      (0)	/* Never used - invalid */
+#define XEN_NETIF_EXTRA_TYPE_GSO       (1)	/* u.gso */
+#define XEN_NETIF_EXTRA_TYPE_MCAST_ADD (2)	/* u.mcast */
+#define XEN_NETIF_EXTRA_TYPE_MCAST_DEL (3)	/* u.mcast */
+#define XEN_NETIF_EXTRA_TYPE_TOEPLITZ  (4)	/* u.toeplitz */
+#define XEN_NETIF_EXTRA_TYPE_MAX       (5)
 
-/* xen_netif_extra_info flags. */
-#define _XEN_NETIF_EXTRA_FLAG_MORE	(0)
-#define  XEN_NETIF_EXTRA_FLAG_MORE	(1U<<_XEN_NETIF_EXTRA_FLAG_MORE)
+/* xen_netif_extra_info_t flags. */
+#define _XEN_NETIF_EXTRA_FLAG_MORE (0)
+#define XEN_NETIF_EXTRA_FLAG_MORE  (1U<<_XEN_NETIF_EXTRA_FLAG_MORE)
 
 /* GSO types */
-#define XEN_NETIF_GSO_TYPE_NONE		(0)
-#define XEN_NETIF_GSO_TYPE_TCPV4	(1)
-#define XEN_NETIF_GSO_TYPE_TCPV6	(2)
+#define XEN_NETIF_GSO_TYPE_NONE         (0)
+#define XEN_NETIF_GSO_TYPE_TCPV4        (1)
+#define XEN_NETIF_GSO_TYPE_TCPV6        (2)
 
 /*
- * This structure needs to fit within both netif_tx_request and
- * netif_rx_response for compatibility.
+ * This structure needs to fit within both xen_netif_tx_request_t and
+ * xen_netif_rx_response_t for compatibility.
  */
 struct xen_netif_extra_info {
-	uint8_t type;  /* XEN_NETIF_EXTRA_TYPE_* */
-	uint8_t flags; /* XEN_NETIF_EXTRA_FLAG_* */
-
+	uint8_t type;
+	uint8_t flags;
 	union {
 		struct {
-			/*
-			 * Maximum payload size of each segment. For
-			 * example, for TCP this is just the path MSS.
-			 */
 			uint16_t size;
-
-			/*
-			 * GSO type. This determines the protocol of
-			 * the packet and any extra features required
-			 * to segment the packet properly.
-			 */
-			uint8_t type; /* XEN_NETIF_GSO_TYPE_* */
-
-			/* Future expansion. */
+			uint8_t type;
 			uint8_t pad;
-
-			/*
-			 * GSO features. This specifies any extra GSO
-			 * features required to process this packet,
-			 * such as ECN support for TCPv4.
-			 */
-			uint16_t features; /* XEN_NETIF_GSO_FEAT_* */
+			uint16_t features;
 		} gso;
-
 		struct {
-			uint8_t addr[6]; /* Address to add/remove. */
+			uint8_t addr[6];
 		} mcast;
-
+		struct {
+			uint8_t type;
+			uint8_t pad;
+			uint8_t value[4];
+		} toeplitz;
 		uint16_t pad[3];
 	} u;
 };
+typedef struct xen_netif_extra_info xen_netif_extra_info_t;
 
 struct xen_netif_tx_response {
 	uint16_t id;
-	int16_t  status;       /* XEN_NETIF_RSP_* */
+	int16_t status;
 };
+typedef struct xen_netif_tx_response xen_netif_tx_response_t;
 
 struct xen_netif_rx_request {
-	uint16_t    id;        /* Echoed in response message.        */
-	grant_ref_t gref;      /* Reference to incoming granted frame */
+	uint16_t id;		/* Echoed in response message.        */
+	uint16_t pad;
+	grant_ref_t gref;
 };
+typedef struct xen_netif_rx_request xen_netif_rx_request_t;
 
 /* Packet data has been validated against protocol checksum. */
-#define _XEN_NETRXF_data_validated	(0)
-#define  XEN_NETRXF_data_validated	(1U<<_XEN_NETRXF_data_validated)
+#define _XEN_NETRXF_data_validated (0)
+#define  XEN_NETRXF_data_validated (1U<<_XEN_NETRXF_data_validated)
 
 /* Protocol checksum field is blank in the packet (hardware offload)? */
-#define _XEN_NETRXF_csum_blank		(1)
-#define  XEN_NETRXF_csum_blank		(1U<<_XEN_NETRXF_csum_blank)
+#define _XEN_NETRXF_csum_blank     (1)
+#define  XEN_NETRXF_csum_blank     (1U<<_XEN_NETRXF_csum_blank)
 
 /* Packet continues in the next request descriptor. */
-#define _XEN_NETRXF_more_data		(2)
-#define  XEN_NETRXF_more_data		(1U<<_XEN_NETRXF_more_data)
+#define _XEN_NETRXF_more_data      (2)
+#define  XEN_NETRXF_more_data      (1U<<_XEN_NETRXF_more_data)
 
 /* Packet to be followed by extra descriptor(s). */
-#define _XEN_NETRXF_extra_info		(3)
-#define  XEN_NETRXF_extra_info		(1U<<_XEN_NETRXF_extra_info)
+#define _XEN_NETRXF_extra_info     (3)
+#define  XEN_NETRXF_extra_info     (1U<<_XEN_NETRXF_extra_info)
 
-/* GSO Prefix descriptor. */
-#define _XEN_NETRXF_gso_prefix		(4)
-#define  XEN_NETRXF_gso_prefix		(1U<<_XEN_NETRXF_gso_prefix)
+/* Packet has GSO prefix. Deprecated but included for compatibility */
+#define _XEN_NETRXF_gso_prefix     (4)
+#define  XEN_NETRXF_gso_prefix     (1U<<_XEN_NETRXF_gso_prefix)
 
 struct xen_netif_rx_response {
-    uint16_t id;
-    uint16_t offset;       /* Offset in page of start of received packet  */
-    uint16_t flags;        /* XEN_NETRXF_* */
-    int16_t  status;       /* -ve: BLKIF_RSP_* ; +ve: Rx'ed pkt size. */
+	uint16_t id;
+	uint16_t offset;
+	uint16_t flags;
+	int16_t status;
 };
+typedef struct xen_netif_rx_response xen_netif_rx_response_t;
 
 /*
  * Generate netif ring structures and types.
  */
 
-DEFINE_RING_TYPES(xen_netif_tx,
-		  struct xen_netif_tx_request,
+DEFINE_RING_TYPES(xen_netif_tx, struct xen_netif_tx_request,
 		  struct xen_netif_tx_response);
-DEFINE_RING_TYPES(xen_netif_rx,
-		  struct xen_netif_rx_request,
+DEFINE_RING_TYPES(xen_netif_rx, struct xen_netif_rx_request,
 		  struct xen_netif_rx_response);
 
-#define XEN_NETIF_RSP_DROPPED	-2
-#define XEN_NETIF_RSP_ERROR	-1
-#define XEN_NETIF_RSP_OKAY	 0
-/* No response: used for auxiliary requests (e.g., xen_netif_extra_info). */
-#define XEN_NETIF_RSP_NULL	 1
+#define XEN_NETIF_RSP_DROPPED         -2
+#define XEN_NETIF_RSP_ERROR           -1
+#define XEN_NETIF_RSP_OKAY             0
+/* No response: used for auxiliary requests (e.g., xen_netif_extra_info_t). */
+#define XEN_NETIF_RSP_NULL             1
 
 #endif
-- 
2.1.4
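
For illustration, a minimal C sketch of the toeplitz algorithm described
by the pseudo-code in the header above (toeplitz_hash() and key_bit() are
hypothetical helpers, not from this series; key octets beyond the supplied
length are treated as zero, per the XEN_NETIF_CTRL_TYPE_SET_TOEPLITZ_KEY
note):

#include <stdint.h>
#include <stddef.h>

/* Bit 'bit' of the key, MSB of key[0] first; zero beyond keylen. */
static int key_bit(const uint8_t *key, size_t keylen, size_t bit)
{
	size_t octet = bit / 8;

	return octet < keylen ? (key[octet] >> (7 - bit % 8)) & 1 : 0;
}

static uint32_t toeplitz_hash(const uint8_t *key, size_t keylen,
			      const uint8_t *buf, size_t buflen)
{
	uint32_t hash = 0, window = 0;
	size_t i;

	/* Prime the window with the left-most 32 bits of the key. */
	for (i = 0; i < 32; i++)
		window = (window << 1) | key_bit(key, keylen, i);

	/* For each buffer bit, MSB first: XOR in the current 32-bit
	 * key window if the bit is set, then slide the window one
	 * key bit to the right.
	 */
	for (i = 0; i < buflen * 8; i++) {
		if (buf[i / 8] & (0x80 >> (i % 8)))
			hash ^= window;
		window = (window << 1) | key_bit(key, keylen, i + 32);
	}

	return hash;
}

A backend implementing XEN_NETIF_CTRL_TYPE_SET_TOEPLITZ_MAPPING would then
steer using the result as an index modulo the table size, e.g.
mapping[hash & ((1u << order) - 1)] for a table of 2^order entries.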


* [PATCH net-next v1 3/8] xen-netback: support multiple extra info fragments passed from frontend
  2016-02-12 10:13 [PATCH net-next v1 0/8] xen-netback: support toeplitz hashing Paul Durrant
  2016-02-12 10:13 ` [PATCH net-next v1 1/8] skbuff: store hash type in socket buffer Paul Durrant
  2016-02-12 10:13 ` [PATCH net-next v1 2/8] xen-netback: re-import canonical netif header Paul Durrant
@ 2016-02-12 10:13 ` Paul Durrant
  2016-02-12 10:13 ` [PATCH net-next v1 4/8] xen-netback: reduce log spam Paul Durrant
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Paul Durrant @ 2016-02-12 10:13 UTC (permalink / raw)
  To: netdev, xen-devel; +Cc: Paul Durrant, Ian Campbell, Wei Liu

The code does not currently support a frontend passing multiple extra info
fragments to the backend in a tx request. The xenvif_get_extras() function
handles multiple extra_info fragments but make_tx_response() assumes there
is only ever a single extra info fragment.

This patch modifies xenvif_get_extras() to pass back a count of extra
info fragments, which is then passed to make_tx_response() (after
possibly being stashed in pending_tx_info for deferred responses).

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 drivers/net/xen-netback/common.h  |  1 +
 drivers/net/xen-netback/netback.c | 65 +++++++++++++++++++++++++--------------
 2 files changed, 43 insertions(+), 23 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 1128252..f44b388 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -52,6 +52,7 @@ typedef unsigned int pending_ring_idx_t;
 
 struct pending_tx_info {
 	struct xen_netif_tx_request req; /* tx request */
+	unsigned int extra_count;
 	/* Callback data for released SKBs. The callback is always
 	 * xenvif_zerocopy_callback, desc contains the pending_idx, which is
 	 * also an index in pending_tx_info array. It is initialized in
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 61b97c3..b42f260 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -95,6 +95,7 @@ static void xenvif_idx_release(struct xenvif_queue *queue, u16 pending_idx,
 
 static void make_tx_response(struct xenvif_queue *queue,
 			     struct xen_netif_tx_request *txp,
+			     unsigned int extra_count,
 			     s8       st);
 static void push_tx_responses(struct xenvif_queue *queue);
 
@@ -696,14 +697,15 @@ void xenvif_tx_credit_callback(unsigned long data)
 }
 
 static void xenvif_tx_err(struct xenvif_queue *queue,
-			  struct xen_netif_tx_request *txp, RING_IDX end)
+			  struct xen_netif_tx_request *txp,
+			  unsigned int extra_count, RING_IDX end)
 {
 	RING_IDX cons = queue->tx.req_cons;
 	unsigned long flags;
 
 	do {
 		spin_lock_irqsave(&queue->response_lock, flags);
-		make_tx_response(queue, txp, XEN_NETIF_RSP_ERROR);
+		make_tx_response(queue, txp, extra_count, XEN_NETIF_RSP_ERROR);
 		push_tx_responses(queue);
 		spin_unlock_irqrestore(&queue->response_lock, flags);
 		if (cons == end)
@@ -724,6 +726,7 @@ static void xenvif_fatal_tx_err(struct xenvif *vif)
 
 static int xenvif_count_requests(struct xenvif_queue *queue,
 				 struct xen_netif_tx_request *first,
+				 unsigned int extra_count,
 				 struct xen_netif_tx_request *txp,
 				 int work_to_do)
 {
@@ -812,7 +815,7 @@ static int xenvif_count_requests(struct xenvif_queue *queue,
 	} while (more_data);
 
 	if (drop_err) {
-		xenvif_tx_err(queue, first, cons + slots);
+		xenvif_tx_err(queue, first, extra_count, cons + slots);
 		return drop_err;
 	}
 
@@ -827,9 +830,10 @@ struct xenvif_tx_cb {
 #define XENVIF_TX_CB(skb) ((struct xenvif_tx_cb *)(skb)->cb)
 
 static inline void xenvif_tx_create_map_op(struct xenvif_queue *queue,
-					  u16 pending_idx,
-					  struct xen_netif_tx_request *txp,
-					  struct gnttab_map_grant_ref *mop)
+					   u16 pending_idx,
+					   struct xen_netif_tx_request *txp,
+					   unsigned int extra_count,
+					   struct gnttab_map_grant_ref *mop)
 {
 	queue->pages_to_map[mop-queue->tx_map_ops] = queue->mmap_pages[pending_idx];
 	gnttab_set_map_op(mop, idx_to_kaddr(queue, pending_idx),
@@ -838,6 +842,7 @@ static inline void xenvif_tx_create_map_op(struct xenvif_queue *queue,
 
 	memcpy(&queue->pending_tx_info[pending_idx].req, txp,
 	       sizeof(*txp));
+	queue->pending_tx_info[pending_idx].extra_count = extra_count;
 }
 
 static inline struct sk_buff *xenvif_alloc_skb(unsigned int size)
@@ -880,7 +885,7 @@ static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif_queue *que
 	     shinfo->nr_frags++, txp++, gop++) {
 		index = pending_index(queue->pending_cons++);
 		pending_idx = queue->pending_ring[index];
-		xenvif_tx_create_map_op(queue, pending_idx, txp, gop);
+		xenvif_tx_create_map_op(queue, pending_idx, txp, 0, gop);
 		frag_set_pending_idx(&frags[shinfo->nr_frags], pending_idx);
 	}
 
@@ -893,7 +898,8 @@ static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif_queue *que
 		     shinfo->nr_frags++, txp++, gop++) {
 			index = pending_index(queue->pending_cons++);
 			pending_idx = queue->pending_ring[index];
-			xenvif_tx_create_map_op(queue, pending_idx, txp, gop);
+			xenvif_tx_create_map_op(queue, pending_idx, txp, 0,
+						gop);
 			frag_set_pending_idx(&frags[shinfo->nr_frags],
 					     pending_idx);
 		}
@@ -1095,8 +1101,9 @@ static void xenvif_fill_frags(struct xenvif_queue *queue, struct sk_buff *skb)
 }
 
 static int xenvif_get_extras(struct xenvif_queue *queue,
-				struct xen_netif_extra_info *extras,
-				int work_to_do)
+			     struct xen_netif_extra_info *extras,
+			     unsigned int *extra_count,
+			     int work_to_do)
 {
 	struct xen_netif_extra_info extra;
 	RING_IDX cons = queue->tx.req_cons;
@@ -1109,9 +1116,12 @@ static int xenvif_get_extras(struct xenvif_queue *queue,
 		}
 
 		RING_COPY_REQUEST(&queue->tx, cons, &extra);
+
+		queue->tx.req_cons = ++cons;
+		(*extra_count)++;
+
 		if (unlikely(!extra.type ||
 			     extra.type >= XEN_NETIF_EXTRA_TYPE_MAX)) {
-			queue->tx.req_cons = ++cons;
 			netdev_err(queue->vif->dev,
 				   "Invalid extra type: %d\n", extra.type);
 			xenvif_fatal_tx_err(queue->vif);
@@ -1119,7 +1129,6 @@ static int xenvif_get_extras(struct xenvif_queue *queue,
 		}
 
 		memcpy(&extras[extra.type - 1], &extra, sizeof(extra));
-		queue->tx.req_cons = ++cons;
 	} while (extra.flags & XEN_NETIF_EXTRA_FLAG_MORE);
 
 	return work_to_do;
@@ -1294,6 +1303,7 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
 		struct xen_netif_tx_request txreq;
 		struct xen_netif_tx_request txfrags[XEN_NETBK_LEGACY_SLOTS_MAX];
 		struct xen_netif_extra_info extras[XEN_NETIF_EXTRA_TYPE_MAX-1];
+		unsigned int extra_count;
 		u16 pending_idx;
 		RING_IDX idx;
 		int work_to_do;
@@ -1330,8 +1340,10 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
 		queue->tx.req_cons = ++idx;
 
 		memset(extras, 0, sizeof(extras));
+		extra_count = 0;
 		if (txreq.flags & XEN_NETTXF_extra_info) {
 			work_to_do = xenvif_get_extras(queue, extras,
+						       &extra_count,
 						       work_to_do);
 			idx = queue->tx.req_cons;
 			if (unlikely(work_to_do < 0))
@@ -1344,7 +1356,7 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
 			extra = &extras[XEN_NETIF_EXTRA_TYPE_MCAST_ADD - 1];
 			ret = xenvif_mcast_add(queue->vif, extra->u.mcast.addr);
 
-			make_tx_response(queue, &txreq,
+			make_tx_response(queue, &txreq, extra_count,
 					 (ret == 0) ?
 					 XEN_NETIF_RSP_OKAY :
 					 XEN_NETIF_RSP_ERROR);
@@ -1358,12 +1370,14 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
 			extra = &extras[XEN_NETIF_EXTRA_TYPE_MCAST_DEL - 1];
 			xenvif_mcast_del(queue->vif, extra->u.mcast.addr);
 
-			make_tx_response(queue, &txreq, XEN_NETIF_RSP_OKAY);
+			make_tx_response(queue, &txreq, extra_count,
+					 XEN_NETIF_RSP_OKAY);
 			push_tx_responses(queue);
 			continue;
 		}
 
-		ret = xenvif_count_requests(queue, &txreq, txfrags, work_to_do);
+		ret = xenvif_count_requests(queue, &txreq, extra_count,
+					    txfrags, work_to_do);
 		if (unlikely(ret < 0))
 			break;
 
@@ -1372,7 +1386,7 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
 		if (unlikely(txreq.size < ETH_HLEN)) {
 			netdev_dbg(queue->vif->dev,
 				   "Bad packet size: %d\n", txreq.size);
-			xenvif_tx_err(queue, &txreq, idx);
+			xenvif_tx_err(queue, &txreq, extra_count, idx);
 			break;
 		}
 
@@ -1397,7 +1411,7 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
 		if (unlikely(skb == NULL)) {
 			netdev_dbg(queue->vif->dev,
 				   "Can't allocate a skb in start_xmit.\n");
-			xenvif_tx_err(queue, &txreq, idx);
+			xenvif_tx_err(queue, &txreq, extra_count, idx);
 			break;
 		}
 
@@ -1416,7 +1430,7 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
 			nskb = xenvif_alloc_skb(0);
 			if (unlikely(nskb == NULL)) {
 				kfree_skb(skb);
-				xenvif_tx_err(queue, &txreq, idx);
+				xenvif_tx_err(queue, &txreq, extra_count, idx);
 				if (net_ratelimit())
 					netdev_err(queue->vif->dev,
 						   "Can't allocate the frag_list skb.\n");
@@ -1457,13 +1471,16 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
 		if (data_len < txreq.size) {
 			frag_set_pending_idx(&skb_shinfo(skb)->frags[0],
 					     pending_idx);
-			xenvif_tx_create_map_op(queue, pending_idx, &txreq, gop);
+			xenvif_tx_create_map_op(queue, pending_idx, &txreq,
+						extra_count, gop);
 			gop++;
 		} else {
 			frag_set_pending_idx(&skb_shinfo(skb)->frags[0],
 					     INVALID_PENDING_IDX);
-			memcpy(&queue->pending_tx_info[pending_idx].req, &txreq,
-			       sizeof(txreq));
+			memcpy(&queue->pending_tx_info[pending_idx].req,
+			       &txreq, sizeof(txreq));
+			queue->pending_tx_info[pending_idx].extra_count =
+				extra_count;
 		}
 
 		queue->pending_cons++;
@@ -1804,7 +1821,8 @@ static void xenvif_idx_release(struct xenvif_queue *queue, u16 pending_idx,
 
 	spin_lock_irqsave(&queue->response_lock, flags);
 
-	make_tx_response(queue, &pending_tx_info->req, status);
+	make_tx_response(queue, &pending_tx_info->req,
+			 pending_tx_info->extra_count, status);
 
 	/* Release the pending index before pusing the Tx response so
 	 * its available before a new Tx request is pushed by the
@@ -1821,6 +1839,7 @@ static void xenvif_idx_release(struct xenvif_queue *queue, u16 pending_idx,
 
 static void make_tx_response(struct xenvif_queue *queue,
 			     struct xen_netif_tx_request *txp,
+			     unsigned int extra_count,
 			     s8       st)
 {
 	RING_IDX i = queue->tx.rsp_prod_pvt;
@@ -1830,7 +1849,7 @@ static void make_tx_response(struct xenvif_queue *queue,
 	resp->id     = txp->id;
 	resp->status = st;
 
-	if (txp->flags & XEN_NETTXF_extra_info)
+	while (extra_count-- != 0)
 		RING_GET_RESPONSE(&queue->tx, ++i)->status = XEN_NETIF_RSP_NULL;
 
 	queue->tx.rsp_prod_pvt = ++i;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net-next v1 4/8] xen-netback: reduce log spam
  2016-02-12 10:13 [PATCH net-next v1 0/8] xen-netback: support toeplitz hashing Paul Durrant
                   ` (2 preceding siblings ...)
  2016-02-12 10:13 ` [PATCH net-next v1 3/8] xen-netback: support multiple extra info fragments passed from frontend Paul Durrant
@ 2016-02-12 10:13 ` Paul Durrant
  2016-02-12 10:13 ` [PATCH net-next v1 5/8] xen-netback: add support for the control ring Paul Durrant
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Paul Durrant @ 2016-02-12 10:13 UTC (permalink / raw)
  To: netdev, xen-devel; +Cc: Paul Durrant, Ian Campbell, Wei Liu

Remove the "prepare for reconnect" pr_info in xenbus.c. It's largely
uninteresting and the states of the frontend and backend can easily be
observed by watching the (o)xenstored log.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 drivers/net/xen-netback/xenbus.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 39a303d..bd182cd 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -511,8 +511,6 @@ static void set_backend_state(struct backend_info *be,
 			switch (state) {
 			case XenbusStateInitWait:
 			case XenbusStateConnected:
-				pr_info("%s: prepare for reconnect\n",
-					be->dev->nodename);
 				backend_switch_state(be, XenbusStateInitWait);
 				break;
 			case XenbusStateClosing:
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net-next v1 5/8] xen-netback: add support for the control ring
  2016-02-12 10:13 [PATCH net-next v1 0/8] xen-netback: support toeplitz hashing Paul Durrant
                   ` (3 preceding siblings ...)
  2016-02-12 10:13 ` [PATCH net-next v1 4/8] xen-netback: reduce log spam Paul Durrant
@ 2016-02-12 10:13 ` Paul Durrant
  2016-02-12 10:13 ` [PATCH net-next v1 6/8] xen-netback: add an implementation of toeplitz hashing Paul Durrant
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Paul Durrant @ 2016-02-12 10:13 UTC (permalink / raw)
  To: netdev, xen-devel; +Cc: Paul Durrant, Ian Campbell, Wei Liu

My recent patch to include/xen/interface/io/netif.h defines a new shared
ring (in addition to the rx and tx rings) for passing control messages
from a VM frontend driver to a backend driver.

This patch adds the necessary code to xen-netback to map this new shared
ring, should it be created by a frontend, and process messages passed on
the ring. Note that, to avoid it becoming overly large, this patch does
not introduce an implementation of any of the control messages specified in
netif.h (that is deferred to a subsequent patch) and so all messages
elicit a 'not supported' response.
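
For illustration, once both ends are connected the relevant xenbus
nodes end up looking roughly like this (paths abbreviated and values
illustrative; the authoritative layout is whatever netif.h specifies):

  # written by the backend at probe time
  .../backend/vif/<domid>/<handle>/feature-ctrl-ring = "1"

  # written by the frontend if it elects to use the ring
  .../device/vif/<handle>/ctrl-ring-ref      = "<grant ref>"
  .../device/vif/<handle>/event-channel-ctrl = "<event channel port>"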

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 drivers/net/xen-netback/common.h    |  28 +++++++---
 drivers/net/xen-netback/interface.c | 101 +++++++++++++++++++++++++++++++++---
 drivers/net/xen-netback/netback.c   | 100 +++++++++++++++++++++++++++++++++--
 drivers/net/xen-netback/xenbus.c    |  75 ++++++++++++++++++++++----
 4 files changed, 274 insertions(+), 30 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index f44b388..093a12a 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -260,6 +260,11 @@ struct xenvif {
 	struct dentry *xenvif_dbg_root;
 #endif
 
+	struct xen_netif_ctrl_back_ring ctrl;
+	struct task_struct *ctrl_task;
+	wait_queue_head_t ctrl_wq;
+	unsigned int ctrl_irq;
+
 	/* Miscellaneous private stuff. */
 	struct net_device *dev;
 };
@@ -285,10 +290,15 @@ struct xenvif *xenvif_alloc(struct device *parent,
 int xenvif_init_queue(struct xenvif_queue *queue);
 void xenvif_deinit_queue(struct xenvif_queue *queue);
 
-int xenvif_connect(struct xenvif_queue *queue, unsigned long tx_ring_ref,
-		   unsigned long rx_ring_ref, unsigned int tx_evtchn,
-		   unsigned int rx_evtchn);
-void xenvif_disconnect(struct xenvif *vif);
+int xenvif_connect_data(struct xenvif_queue *queue,
+			unsigned long tx_ring_ref,
+			unsigned long rx_ring_ref,
+			unsigned int tx_evtchn,
+			unsigned int rx_evtchn);
+void xenvif_disconnect_data(struct xenvif *vif);
+int xenvif_connect_ctrl(struct xenvif *vif, grant_ref_t ring_ref,
+			unsigned int evtchn);
+void xenvif_disconnect_ctrl(struct xenvif *vif);
 void xenvif_free(struct xenvif *vif);
 
 int xenvif_xenbus_init(void);
@@ -300,10 +310,10 @@ int xenvif_queue_stopped(struct xenvif_queue *queue);
 void xenvif_wake_queue(struct xenvif_queue *queue);
 
 /* (Un)Map communication rings. */
-void xenvif_unmap_frontend_rings(struct xenvif_queue *queue);
-int xenvif_map_frontend_rings(struct xenvif_queue *queue,
-			      grant_ref_t tx_ring_ref,
-			      grant_ref_t rx_ring_ref);
+void xenvif_unmap_frontend_data_rings(struct xenvif_queue *queue);
+int xenvif_map_frontend_data_rings(struct xenvif_queue *queue,
+				   grant_ref_t tx_ring_ref,
+				   grant_ref_t rx_ring_ref);
 
 /* Check for SKBs from frontend and schedule backend processing */
 void xenvif_napi_schedule_or_enable_events(struct xenvif_queue *queue);
@@ -318,6 +328,8 @@ void xenvif_kick_thread(struct xenvif_queue *queue);
 
 int xenvif_dealloc_kthread(void *data);
 
+int xenvif_ctrl_kthread(void *data);
+
 void xenvif_rx_queue_tail(struct xenvif_queue *queue, struct sk_buff *skb);
 
 void xenvif_carrier_on(struct xenvif *vif);
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index f5231a2..1850ebb 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -128,6 +128,15 @@ irqreturn_t xenvif_interrupt(int irq, void *dev_id)
 	return IRQ_HANDLED;
 }
 
+irqreturn_t xenvif_ctrl_interrupt(int irq, void *dev_id)
+{
+	struct xenvif *vif = dev_id;
+
+	wake_up(&vif->ctrl_wq);
+
+	return IRQ_HANDLED;
+}
+
 int xenvif_queue_stopped(struct xenvif_queue *queue)
 {
 	struct net_device *dev = queue->vif->dev;
@@ -527,9 +536,66 @@ void xenvif_carrier_on(struct xenvif *vif)
 	rtnl_unlock();
 }
 
-int xenvif_connect(struct xenvif_queue *queue, unsigned long tx_ring_ref,
-		   unsigned long rx_ring_ref, unsigned int tx_evtchn,
-		   unsigned int rx_evtchn)
+int xenvif_connect_ctrl(struct xenvif *vif, grant_ref_t ring_ref,
+			unsigned int evtchn)
+{
+	struct net_device *dev = vif->dev;
+	void *addr;
+	struct xen_netif_ctrl_sring *shared;
+	struct task_struct *task;
+	int err = -ENOMEM;
+
+	err = xenbus_map_ring_valloc(xenvif_to_xenbus_device(vif),
+				     &ring_ref, 1, &addr);
+	if (err)
+		goto err;
+
+	shared = (struct xen_netif_ctrl_sring *)addr;
+	BACK_RING_INIT(&vif->ctrl, shared, XEN_PAGE_SIZE);
+
+	init_waitqueue_head(&vif->ctrl_wq);
+
+	err = bind_interdomain_evtchn_to_irqhandler(vif->domid, evtchn,
+						    xenvif_ctrl_interrupt,
+						    0, dev->name, vif);
+	if (err < 0)
+		goto err_unmap;
+
+	vif->ctrl_irq = err;
+
+	task = kthread_create(xenvif_ctrl_kthread, (void *)vif,
+			      "%s-control", dev->name);
+	if (IS_ERR(task)) {
+		pr_warn("Could not allocate kthread for %s\n", dev->name);
+		err = PTR_ERR(task);
+		goto err_unbind;
+	}
+
+	get_task_struct(task);
+	vif->ctrl_task = task;
+
+	wake_up_process(vif->ctrl_task);
+
+	return 0;
+
+err_unbind:
+	unbind_from_irqhandler(vif->ctrl_irq, vif);
+	vif->ctrl_irq = 0;
+
+err_unmap:
+	xenbus_unmap_ring_vfree(xenvif_to_xenbus_device(vif),
+				vif->ctrl.sring);
+	vif->ctrl.sring = NULL;
+
+err:
+	return err;
+}
+
+int xenvif_connect_data(struct xenvif_queue *queue,
+			unsigned long tx_ring_ref,
+			unsigned long rx_ring_ref,
+			unsigned int tx_evtchn,
+			unsigned int rx_evtchn)
 {
 	struct task_struct *task;
 	int err = -ENOMEM;
@@ -538,7 +604,8 @@ int xenvif_connect(struct xenvif_queue *queue, unsigned long tx_ring_ref,
 	BUG_ON(queue->task);
 	BUG_ON(queue->dealloc_task);
 
-	err = xenvif_map_frontend_rings(queue, tx_ring_ref, rx_ring_ref);
+	err = xenvif_map_frontend_data_rings(queue, tx_ring_ref,
+					     rx_ring_ref);
 	if (err < 0)
 		goto err;
 
@@ -614,7 +681,7 @@ err_tx_unbind:
 	unbind_from_irqhandler(queue->tx_irq, queue);
 	queue->tx_irq = 0;
 err_unmap:
-	xenvif_unmap_frontend_rings(queue);
+	xenvif_unmap_frontend_data_rings(queue);
 	netif_napi_del(&queue->napi);
 err:
 	module_put(THIS_MODULE);
@@ -634,7 +701,7 @@ void xenvif_carrier_off(struct xenvif *vif)
 	rtnl_unlock();
 }
 
-void xenvif_disconnect(struct xenvif *vif)
+void xenvif_disconnect_data(struct xenvif *vif)
 {
 	struct xenvif_queue *queue = NULL;
 	unsigned int num_queues = vif->num_queues;
@@ -668,12 +735,32 @@ void xenvif_disconnect(struct xenvif *vif)
 			queue->tx_irq = 0;
 		}
 
-		xenvif_unmap_frontend_rings(queue);
+		xenvif_unmap_frontend_data_rings(queue);
 	}
 
 	xenvif_mcast_addr_list_free(vif);
 }
 
+void xenvif_disconnect_ctrl(struct xenvif *vif)
+{
+	if (vif->ctrl_task) {
+		kthread_stop(vif->ctrl_task);
+		put_task_struct(vif->ctrl_task);
+		vif->ctrl_task = NULL;
+	}
+
+	if (vif->ctrl_irq) {
+		unbind_from_irqhandler(vif->ctrl_irq, vif);
+		vif->ctrl_irq = 0;
+	}
+
+	if (vif->ctrl.sring) {
+		xenbus_unmap_ring_vfree(xenvif_to_xenbus_device(vif),
+					vif->ctrl.sring);
+		vif->ctrl.sring = NULL;
+	}
+}
+
 /* Reverse the relevant parts of xenvif_init_queue().
  * Used for queue teardown from xenvif_free(), and on the
  * error handling paths in xenbus.c:connect().
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index b42f260..a1f1a38 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1925,7 +1925,7 @@ static inline bool tx_dealloc_work_todo(struct xenvif_queue *queue)
 	return queue->dealloc_cons != queue->dealloc_prod;
 }
 
-void xenvif_unmap_frontend_rings(struct xenvif_queue *queue)
+void xenvif_unmap_frontend_data_rings(struct xenvif_queue *queue)
 {
 	if (queue->tx.sring)
 		xenbus_unmap_ring_vfree(xenvif_to_xenbus_device(queue->vif),
@@ -1935,9 +1935,9 @@ void xenvif_unmap_frontend_rings(struct xenvif_queue *queue)
 					queue->rx.sring);
 }
 
-int xenvif_map_frontend_rings(struct xenvif_queue *queue,
-			      grant_ref_t tx_ring_ref,
-			      grant_ref_t rx_ring_ref)
+int xenvif_map_frontend_data_rings(struct xenvif_queue *queue,
+				   grant_ref_t tx_ring_ref,
+				   grant_ref_t rx_ring_ref)
 {
 	void *addr;
 	struct xen_netif_tx_sring *txs;
@@ -1964,7 +1964,7 @@ int xenvif_map_frontend_rings(struct xenvif_queue *queue,
 	return 0;
 
 err:
-	xenvif_unmap_frontend_rings(queue);
+	xenvif_unmap_frontend_data_rings(queue);
 	return err;
 }
 
@@ -2163,6 +2163,96 @@ int xenvif_dealloc_kthread(void *data)
 	return 0;
 }
 
+static void make_ctrl_response(struct xenvif *vif,
+			       const struct xen_netif_ctrl_request *req,
+			       u32 status, u32 data)
+{
+	RING_IDX idx = vif->ctrl.rsp_prod_pvt;
+	struct xen_netif_ctrl_response rsp = {
+		.id = req->id,
+		.type = req->type,
+		.status = status,
+		.data = data,
+	};
+
+	*RING_GET_RESPONSE(&vif->ctrl, idx) = rsp;
+	vif->ctrl.rsp_prod_pvt = ++idx;
+}
+
+static void push_ctrl_response(struct xenvif *vif)
+{
+	int notify;
+
+	RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&vif->ctrl, notify);
+	if (notify)
+		notify_remote_via_irq(vif->ctrl_irq);
+}
+
+static void process_ctrl_request(struct xenvif *vif,
+				 const struct xen_netif_ctrl_request *req)
+{
+	/* There is no support for control requests yet. */
+	make_ctrl_response(vif, req,
+			   XEN_NETIF_CTRL_STATUS_NOT_SUPPORTED, 0);
+	push_ctrl_response(vif);
+}
+
+static void xenvif_ctrl_action(struct xenvif *vif)
+{
+	for (;;) {
+		RING_IDX req_prod, req_cons;
+
+		req_prod = vif->ctrl.sring->req_prod;
+		req_cons = vif->ctrl.req_cons;
+
+		/* Make sure we can see requests before we process them. */
+		rmb();
+
+		if (req_cons == req_prod)
+			break;
+
+		while (req_cons != req_prod) {
+			struct xen_netif_ctrl_request req;
+
+			RING_COPY_REQUEST(&vif->ctrl, req_cons, &req);
+			req_cons++;
+
+			process_ctrl_request(vif, &req);
+		}
+
+		vif->ctrl.req_cons = req_cons;
+		vif->ctrl.sring->req_event = req_cons + 1;
+	}
+}
+
+static bool xenvif_ctrl_work_todo(struct xenvif *vif)
+{
+	if (likely(RING_HAS_UNCONSUMED_REQUESTS(&vif->ctrl)))
+		return true;
+
+	return false;
+}
+
+int xenvif_ctrl_kthread(void *data)
+{
+	struct xenvif *vif = data;
+
+	for (;;) {
+		wait_event_interruptible(vif->ctrl_wq,
+					 xenvif_ctrl_work_todo(vif) ||
+					 kthread_should_stop());
+		if (kthread_should_stop())
+			break;
+
+		while (xenvif_ctrl_work_todo(vif))
+			xenvif_ctrl_action(vif);
+
+		cond_resched();
+	}
+
+	return 0;
+}
+
 static int __init netback_init(void)
 {
 	int rc = 0;
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index bd182cd..278f67b 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -38,7 +38,8 @@ struct backend_info {
 	const char *hotplug_script;
 };
 
-static int connect_rings(struct backend_info *be, struct xenvif_queue *queue);
+static int connect_data_rings(struct backend_info *be,
+			      struct xenvif_queue *queue);
 static void connect(struct backend_info *be);
 static int read_xenbus_vif_flags(struct backend_info *be);
 static int backend_create_xenvif(struct backend_info *be);
@@ -367,6 +368,12 @@ static int netback_probe(struct xenbus_device *dev,
 	if (err)
 		pr_debug("Error writing multi-queue-max-queues\n");
 
+	err = xenbus_printf(XBT_NIL, dev->nodename,
+			    "feature-ctrl-ring",
+			    "%u", true);
+	if (err)
+		pr_debug("Error writing feature-ctrl-ring\n");
+
 	script = xenbus_read(XBT_NIL, dev->nodename, "script", NULL);
 	if (IS_ERR(script)) {
 		err = PTR_ERR(script);
@@ -457,7 +464,8 @@ static void backend_disconnect(struct backend_info *be)
 #ifdef CONFIG_DEBUG_FS
 		xenvif_debugfs_delif(be->vif);
 #endif /* CONFIG_DEBUG_FS */
-		xenvif_disconnect(be->vif);
+		xenvif_disconnect_data(be->vif);
+		xenvif_disconnect_ctrl(be->vif);
 	}
 }
 
@@ -825,6 +833,44 @@ static void hotplug_status_changed(struct xenbus_watch *watch,
 	kfree(str);
 }
 
+static int connect_ctrl_ring(struct backend_info *be)
+{
+	struct xenbus_device *dev = be->dev;
+	struct xenvif *vif = be->vif;
+	unsigned int val;
+	grant_ref_t ring_ref;
+	unsigned int evtchn;
+	int err;
+
+	err = xenbus_gather(XBT_NIL, dev->otherend,
+			    "ctrl-ring-ref", "%u", &val, NULL);
+	if (err)
+		return 0; /* The frontend does not have a control ring */
+
+	ring_ref = val;
+
+	err = xenbus_gather(XBT_NIL, dev->otherend,
+			    "event-channel-ctrl", "%u", &val, NULL);
+	if (err) {
+		xenbus_dev_fatal(dev, err,
+				 "reading %s/event-channel-ctrl",
+				 dev->otherend);
+		return err;
+	}
+
+	evtchn = val;
+
+	err = xenvif_connect_ctrl(vif, ring_ref, evtchn);
+	if (err) {
+		xenbus_dev_fatal(dev, err,
+				 "mapping shared-frame %u port %u",
+				 ring_ref, evtchn);
+		return err;
+	}
+
+	return 0;
+}
+
 static void connect(struct backend_info *be)
 {
 	int err;
@@ -861,6 +907,12 @@ static void connect(struct backend_info *be)
 	xen_register_watchers(dev, be->vif);
 	read_xenbus_vif_flags(be);
 
+	err = connect_ctrl_ring(be);
+	if (err) {
+		xenbus_dev_fatal(dev, err, "connecting control ring");
+		return;
+	}
+
 	/* Use the number of queues requested by the frontend */
 	be->vif->queues = vzalloc(requested_num_queues *
 				  sizeof(struct xenvif_queue));
@@ -896,11 +948,12 @@ static void connect(struct backend_info *be)
 		queue->remaining_credit = credit_bytes;
 		queue->credit_usec = credit_usec;
 
-		err = connect_rings(be, queue);
+		err = connect_data_rings(be, queue);
 		if (err) {
-			/* connect_rings() cleans up after itself on failure,
-			 * but we need to clean up after xenvif_init_queue() here,
-			 * and also clean up any previously initialised queues.
+			/* connect_data_rings() cleans up after itself on
+			 * failure, but we need to clean up after
+			 * xenvif_init_queue() here, and also clean up any
+			 * previously initialised queues.
 			 */
 			xenvif_deinit_queue(queue);
 			be->vif->num_queues = queue_index;
@@ -935,15 +988,17 @@ static void connect(struct backend_info *be)
 
 err:
 	if (be->vif->num_queues > 0)
-		xenvif_disconnect(be->vif); /* Clean up existing queues */
+		xenvif_disconnect_data(be->vif); /* Clean up existing queues */
 	vfree(be->vif->queues);
 	be->vif->queues = NULL;
 	be->vif->num_queues = 0;
+	xenvif_disconnect_ctrl(be->vif);
 	return;
 }
 
 
-static int connect_rings(struct backend_info *be, struct xenvif_queue *queue)
+static int connect_data_rings(struct backend_info *be,
+			      struct xenvif_queue *queue)
 {
 	struct xenbus_device *dev = be->dev;
 	unsigned int num_queues = queue->vif->num_queues;
@@ -1007,8 +1062,8 @@ static int connect_rings(struct backend_info *be, struct xenvif_queue *queue)
 	}
 
 	/* Map the shared frame, irq etc. */
-	err = xenvif_connect(queue, tx_ring_ref, rx_ring_ref,
-			     tx_evtchn, rx_evtchn);
+	err = xenvif_connect_data(queue, tx_ring_ref, rx_ring_ref,
+				  tx_evtchn, rx_evtchn);
 	if (err) {
 		xenbus_dev_fatal(dev, err,
 				 "mapping shared-frames %lu/%lu port tx %u rx %u",
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net-next v1 6/8] xen-netback: add an implementation of toeplitz hashing...
  2016-02-12 10:13 [PATCH net-next v1 0/8] xen-netback: support toeplitz hashing Paul Durrant
                   ` (4 preceding siblings ...)
  2016-02-12 10:13 ` [PATCH net-next v1 5/8] xen-netback: add support for the control ring Paul Durrant
@ 2016-02-12 10:13 ` Paul Durrant
  2016-02-14 22:13   ` Tom Herbert
  2016-02-12 10:13 ` [PATCH net-next v1 7/8] xen-netback: pass toeplitz hash value to the frontend Paul Durrant
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: Paul Durrant @ 2016-02-12 10:13 UTC (permalink / raw)
  To: netdev, xen-devel; +Cc: Paul Durrant, Ian Campbell, Wei Liu

...for receive-side packets.

My recent patch to include/xen/interface/io/netif.h defines a set of
control messages that can be used by a VM frontend driver to configure
toeplitz hashing of receive-side packets and consequent steering of those
packets to particular queues.

This patch introduces an implementation of toeplitz hashing into
xen-netback and allows it to be configured using the new control messages.
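
For reference, the hash implementation can be sanity-checked in user
space against the verification suite in Microsoft's RSS documentation.
A standalone harness along these lines will do it; the key below is
the standard one from that documentation, the tuple is one of its
sample flows, and the printed value should be compared against the
published table (0x51ccc178 for the IPv4/TCP case, if memory serves):

#include <stdio.h>
#include <stdint.h>

/* Same algorithm as the toeplitz_hash() added by this patch, lifted
 * into user space.
 */
static uint32_t toeplitz(const uint8_t *k, unsigned int klen,
			 const uint8_t *d, unsigned int dlen)
{
	uint64_t prefix = 0, hash = 0;
	unsigned int di, ki;

	/* Pre-load prefix with the first 8 bytes of the key */
	for (ki = 0; ki < 8; ki++)
		prefix = (prefix << 8) | (ki < klen ? k[ki] : 0);

	for (di = 0; di < dlen; di++) {
		unsigned int bit;

		for (bit = 0x80; bit != 0; bit >>= 1) {
			if (d[di] & bit)
				hash ^= prefix;
			prefix <<= 1;
		}

		/* prefix has been left-shifted by 8; OR in the next
		 * key byte.
		 */
		prefix |= (ki < klen) ? k[ki] : 0;
		ki++;
	}

	/* The valid part of the hash is in the upper 32 bits */
	return hash >> 32;
}

int main(void)
{
	/* Standard key from the RSS verification documentation */
	static const uint8_t key[40] = {
		0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
		0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
		0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
		0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
		0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa
	};
	/* Sample flow 66.9.149.187:2794 -> 161.142.100.80:1766, laid
	 * out as saddr, daddr, sport, dport to match the code above.
	 */
	static const uint8_t data[12] = {
		66, 9, 149, 187,	/* source address */
		161, 142, 100, 80,	/* destination address */
		0x0a, 0xea,		/* source port 2794 */
		0x06, 0xe6		/* destination port 1766 */
	};

	printf("hash = 0x%08x\n",
	       toeplitz(key, sizeof(key), data, sizeof(data)));

	return 0;
}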

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 drivers/net/xen-netback/common.h    |  13 ++++
 drivers/net/xen-netback/interface.c | 149 ++++++++++++++++++++++++++++++++++++
 drivers/net/xen-netback/netback.c   | 128 ++++++++++++++++++++++++++++++-
 3 files changed, 287 insertions(+), 3 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 093a12a..6687702 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -220,6 +220,12 @@ struct xenvif_mcast_addr {
 
 #define XEN_NETBK_MCAST_MAX 64
 
+#define XEN_NETBK_MAX_TOEPLITZ_KEY_SIZE 40
+
+#define XEN_NETBK_MAX_TOEPLITZ_MAPPING_ORDER 7
+#define XEN_NETBK_MAX_TOEPLITZ_MAPPING_SIZE \
+	BIT(XEN_NETBK_MAX_TOEPLITZ_MAPPING_ORDER)
+
 struct xenvif {
 	/* Unique identifier for this interface. */
 	domid_t          domid;
@@ -251,6 +257,13 @@ struct xenvif {
 	unsigned int num_queues; /* active queues, resource allocated */
 	unsigned int stalled_queues;
 
+	struct {
+		u32 flags;
+		u8 key[XEN_NETBK_MAX_TOEPLITZ_KEY_SIZE];
+		u32 mapping[XEN_NETBK_MAX_TOEPLITZ_MAPPING_SIZE];
+		unsigned int order;
+	} toeplitz;
+
 	struct xenbus_watch credit_watch;
 	struct xenbus_watch mcast_ctrl_watch;
 
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 1850ebb..230afde 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -1,3 +1,4 @@
+
 /*
  * Network-device interface management.
  *
@@ -151,6 +152,153 @@ void xenvif_wake_queue(struct xenvif_queue *queue)
 	netif_tx_wake_queue(netdev_get_tx_queue(dev, id));
 }
 
+static u32 toeplitz_hash(const u8 *k, unsigned int klen,
+			 const u8 *d, unsigned int dlen)
+{
+	unsigned int di, ki;
+	u64 prefix = 0;
+	u64 hash = 0;
+
+	/* Pre-load prefix with the first 8 bytes of the key */
+	for (ki = 0; ki < 8; ki++) {
+		prefix <<= 8;
+		prefix |= (ki < klen) ? k[ki] : 0;
+	}
+
+	for (di = 0; di < dlen; di++) {
+		u8 byte = d[di];
+		unsigned int bit;
+
+		for (bit = 0x80; bit != 0; bit >>= 1) {
+			if (byte & bit)
+				hash ^= prefix;
+			prefix <<= 1;
+		}
+
+		/* prefix has now been left-shifted by 8, so OR in
+		 * the next byte.
+		 */
+		prefix |= (ki < klen) ? k[ki] : 0;
+		ki++;
+	}
+
+	/* The valid part of the hash is in the upper 32 bits. */
+	return hash >> 32;
+}
+
+static void xenvif_set_toeplitz_hash(struct xenvif *vif, struct sk_buff *skb)
+{
+	struct flow_keys flow;
+	u32 hash = 0;
+	enum pkt_hash_types type = PKT_HASH_TYPE_NONE;
+	const u8 *key = vif->toeplitz.key;
+	u32 flags = vif->toeplitz.flags;
+	const unsigned int len = XEN_NETBK_MAX_TOEPLITZ_KEY_SIZE;
+	bool has_tcp_hdr;
+
+	/* Quick rejection test: If the network protocol doesn't
+	 * correspond to any enabled hash type then there's no point
+	 * in parsing the packet header.
+	 */
+	switch (skb->protocol) {
+	case htons(ETH_P_IP):
+		if (flags & (XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV4_TCP |
+			     XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV4))
+			break;
+
+		goto done;
+
+	case htons(ETH_P_IPV6):
+		if (flags & (XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV6_TCP |
+			     XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV6))
+			break;
+
+		goto done;
+
+	default:
+		goto done;
+	}
+
+	memset(&flow, 0, sizeof(flow));
+	if (!skb_flow_dissect_flow_keys(skb, &flow, 0))
+		goto done;
+
+	has_tcp_hdr = (flow.basic.ip_proto == IPPROTO_TCP) &&
+		      !(flow.control.flags & FLOW_DIS_IS_FRAGMENT);
+
+	switch (skb->protocol) {
+	case htons(ETH_P_IP):
+		if (has_tcp_hdr &&
+		    (flags & XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV4_TCP)) {
+			u8 data[12];
+
+			memcpy(&data[0], &flow.addrs.v4addrs.src, 4);
+			memcpy(&data[4], &flow.addrs.v4addrs.dst, 4);
+			memcpy(&data[8], &flow.ports.src, 2);
+			memcpy(&data[10], &flow.ports.dst, 2);
+
+			hash = toeplitz_hash(key, len,
+					     data, sizeof(data));
+			type = PKT_HASH_TYPE_L4;
+		} else if (flags & XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV4) {
+			u8 data[8];
+
+			memcpy(&data[0], &flow.addrs.v4addrs.src, 4);
+			memcpy(&data[4], &flow.addrs.v4addrs.dst, 4);
+
+			hash = toeplitz_hash(key, len,
+					     data, sizeof(data));
+			type = PKT_HASH_TYPE_L3;
+		}
+
+		break;
+
+	case htons(ETH_P_IPV6):
+		if (has_tcp_hdr &&
+		    (flags & XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV6_TCP)) {
+			u8 data[36];
+
+			memcpy(&data[0], &flow.addrs.v6addrs.src, 16);
+			memcpy(&data[16], &flow.addrs.v6addrs.dst, 16);
+			memcpy(&data[32], &flow.ports.src, 2);
+			memcpy(&data[34], &flow.ports.dst, 2);
+
+			hash = toeplitz_hash(key, len,
+					     data, sizeof(data));
+			type = PKT_HASH_TYPE_L4;
+		} else if (flags & XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV6) {
+			u8 data[32];
+
+			memcpy(&data[0], &flow.addrs.v6addrs.src, 16);
+			memcpy(&data[16], &flow.addrs.v6addrs.dst, 16);
+
+			hash = toeplitz_hash(key, len,
+					     data, sizeof(data));
+			type = PKT_HASH_TYPE_L3;
+		}
+
+		break;
+	}
+
+done:
+	skb_set_hash(skb, hash, type);
+}
+
+static u16 xenvif_select_queue(struct net_device *dev, struct sk_buff *skb,
+			       void *accel_priv,
+			       select_queue_fallback_t fallback)
+{
+	struct xenvif *vif = netdev_priv(dev);
+	unsigned int mask = (1u << vif->toeplitz.order) - 1;
+
+	if (vif->toeplitz.flags == 0)
+		return fallback(dev, skb) % dev->real_num_tx_queues;
+
+	xenvif_set_toeplitz_hash(vif, skb);
+
+	return vif->toeplitz.mapping[skb_get_hash_raw(skb) & mask];
+}
+
 static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct xenvif *vif = netdev_priv(dev);
@@ -395,6 +543,7 @@ static const struct ethtool_ops xenvif_ethtool_ops = {
 };
 
 static const struct net_device_ops xenvif_netdev_ops = {
+	.ndo_select_queue = xenvif_select_queue,
 	.ndo_start_xmit	= xenvif_start_xmit,
 	.ndo_get_stats	= xenvif_get_stats,
 	.ndo_open	= xenvif_open,
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index a1f1a38..41ec7e9 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -2163,6 +2163,89 @@ int xenvif_dealloc_kthread(void *data)
 	return 0;
 }
 
+static u32 xenvif_set_toeplitz_flags(struct xenvif *vif, u32 flags)
+{
+	if (flags & ~(XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV4 |
+		      XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV4_TCP |
+		      XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV6 |
+		      XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV6_TCP))
+		return XEN_NETIF_CTRL_STATUS_INVALID_PARAMETER;
+
+	vif->toeplitz.flags = flags;
+
+	return XEN_NETIF_CTRL_STATUS_SUCCESS;
+}
+
+static u32 xenvif_set_toeplitz_key(struct xenvif *vif, u32 gref, u32 len)
+{
+	u8 *key = vif->toeplitz.key;
+	struct gnttab_copy copy_op = {
+		.source.u.ref = gref,
+		.source.domid = vif->domid,
+		.dest.u.gmfn = virt_to_gfn(key),
+		.dest.domid = DOMID_SELF,
+		.dest.offset = xen_offset_in_page(key),
+		.len = len,
+		.flags = GNTCOPY_source_gref
+	};
+
+	if (len > XEN_NETBK_MAX_TOEPLITZ_KEY_SIZE)
+		return XEN_NETIF_CTRL_STATUS_INVALID_PARAMETER;
+
+	gnttab_batch_copy(&copy_op, 1);
+
+	if (copy_op.status != GNTST_okay)
+		return XEN_NETIF_CTRL_STATUS_INVALID_PARAMETER;
+
+	/* Clear any remaining key octets */
+	if (len < XEN_NETBK_MAX_TOEPLITZ_KEY_SIZE)
+		memset(key + len, 0, XEN_NETBK_MAX_TOEPLITZ_KEY_SIZE - len);
+
+	return XEN_NETIF_CTRL_STATUS_SUCCESS;
+}
+
+static u32 xenvif_set_toeplitz_mapping_order(struct xenvif *vif,
+					     u32 order)
+{
+	if (order > XEN_NETBK_MAX_TOEPLITZ_MAPPING_ORDER)
+		return XEN_NETIF_CTRL_STATUS_INVALID_PARAMETER;
+
+	vif->toeplitz.order = order;
+	memset(vif->toeplitz.mapping, 0, sizeof(u32) << order);
+
+	return XEN_NETIF_CTRL_STATUS_SUCCESS;
+}
+
+static u32 xenvif_set_toeplitz_mapping(struct xenvif *vif, u32 gref,
+				       u32 len, u32 off)
+{
+	u32 *mapping = &vif->toeplitz.mapping[off];
+	struct gnttab_copy copy_op = {
+		.source.u.ref = gref,
+		.source.domid = vif->domid,
+		.dest.u.gmfn = virt_to_gfn(mapping),
+		.dest.domid = DOMID_SELF,
+		.dest.offset = xen_offset_in_page(mapping),
+		.len = len * sizeof(u32),
+		.flags = GNTCOPY_source_gref
+	};
+
+	if ((off + len > (1u << vif->toeplitz.order)) ||
+	    copy_op.len > XEN_PAGE_SIZE)
+		return XEN_NETIF_CTRL_STATUS_INVALID_PARAMETER;
+
+	gnttab_batch_copy(&copy_op, 1);
+
+	if (copy_op.status != GNTST_okay)
+		return XEN_NETIF_CTRL_STATUS_INVALID_PARAMETER;
+
+	while (len-- != 0)
+		if (*mapping++ >= vif->num_queues)
+			return XEN_NETIF_CTRL_STATUS_INVALID_PARAMETER;
+
+	return XEN_NETIF_CTRL_STATUS_SUCCESS;
+}
+
 static void make_ctrl_response(struct xenvif *vif,
 			       const struct xen_netif_ctrl_request *req,
 			       u32 status, u32 data)
@@ -2191,9 +2274,48 @@ static void push_ctrl_response(struct xenvif *vif)
 static void process_ctrl_request(struct xenvif *vif,
 				 const struct xen_netif_ctrl_request *req)
 {
-	/* There is no support for control requests yet. */
-	make_ctrl_response(vif, req,
-			   XEN_NETIF_CTRL_STATUS_NOT_SUPPORTED, 0);
+	u32 status = XEN_NETIF_CTRL_STATUS_NOT_SUPPORTED;
+	u32 data = 0;
+
+	switch (req->type) {
+	case XEN_NETIF_CTRL_TYPE_GET_TOEPLITZ_FLAGS:
+		status = XEN_NETIF_CTRL_STATUS_SUCCESS;
+		data = XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV4 |
+		       XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV4_TCP |
+		       XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV6 |
+		       XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV6_TCP;
+		break;
+
+	case XEN_NETIF_CTRL_TYPE_SET_TOEPLITZ_FLAGS:
+		status = xenvif_set_toeplitz_flags(vif, req->data[0]);
+		break;
+
+	case XEN_NETIF_CTRL_TYPE_SET_TOEPLITZ_KEY:
+		status = xenvif_set_toeplitz_key(vif, req->data[0],
+						 req->data[1]);
+		break;
+
+	case XEN_NETIF_CTRL_TYPE_GET_TOEPLITZ_MAPPING_ORDER:
+		status = XEN_NETIF_CTRL_STATUS_SUCCESS;
+		data = XEN_NETBK_MAX_TOEPLITZ_MAPPING_ORDER;
+		break;
+
+	case XEN_NETIF_CTRL_TYPE_SET_TOEPLITZ_MAPPING_ORDER:
+		status = xenvif_set_toeplitz_mapping_order(vif,
+							   req->data[0]);
+		break;
+
+	case XEN_NETIF_CTRL_TYPE_SET_TOEPLITZ_MAPPING:
+		status = xenvif_set_toeplitz_mapping(vif, req->data[0],
+						     req->data[1],
+						     req->data[2]);
+		break;
+
+	default:
+		break;
+	}
+
+	make_ctrl_response(vif, req, status, data);
 	push_ctrl_response(vif);
 }
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net-next v1 7/8] xen-netback: pass toeplitz hash value to the frontend
  2016-02-12 10:13 [PATCH net-next v1 0/8] xen-netback: support toeplitz hashing Paul Durrant
                   ` (5 preceding siblings ...)
  2016-02-12 10:13 ` [PATCH net-next v1 6/8] xen-netback: add an implementation of toeplitz hashing Paul Durrant
@ 2016-02-12 10:13 ` Paul Durrant
  2016-02-14 22:17   ` Tom Herbert
  2016-02-12 10:13 ` [PATCH net-next v1 8/8] xen-netback: use toeplitz hash value from " Paul Durrant
  2016-02-12 10:54 ` [PATCH net-next v1 0/8] xen-netback: support toeplitz hashing David Miller
  8 siblings, 1 reply; 19+ messages in thread
From: Paul Durrant @ 2016-02-12 10:13 UTC (permalink / raw)
  To: netdev, xen-devel; +Cc: Paul Durrant, Ian Campbell, Wei Liu

My recent patch to include/xen/interface/io/netif.h defines a new extra
info type that can be used to pass toeplitz hash values between backend
and VM frontend.

This patch adds code to xen-netback to pass hash values calculated for
receive-side packets to the VM frontend.
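
From the frontend's point of view the hash then arrives as one more
extra info segment chained from the rx response. A receiving driver
would do something along these lines (the helpers here are
placeholders for frontend code, not netif.h definitions):

	if (rx->flags & XEN_NETRXF_extra_info) {
		struct xen_netif_extra_info *extra;

		do {
			extra = frontend_next_extra_slot(queue); /* placeholder */

			if (extra->type == XEN_NETIF_EXTRA_TYPE_TOEPLITZ)
				frontend_store_hash(pkt, /* placeholder */
					*(uint32_t *)extra->u.toeplitz.value,
					extra->u.toeplitz.type);
		} while (extra->flags & XEN_NETIF_EXTRA_FLAG_MORE);
	}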

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 drivers/net/xen-netback/common.h  |  1 +
 drivers/net/xen-netback/netback.c | 76 ++++++++++++++++++++++++++++++++-------
 2 files changed, 65 insertions(+), 12 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 6687702..469dcf0 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -243,6 +243,7 @@ struct xenvif {
 	u8 ip_csum:1;
 	u8 ipv6_csum:1;
 	u8 multicast_control:1;
+	u8 hash_extra:1;
 
 	/* Is this interface disabled? True when backend discovers
 	 * frontend is rogue.
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 41ec7e9..57c91fe 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -163,6 +163,8 @@ static bool xenvif_rx_ring_slots_available(struct xenvif_queue *queue)
 	needed = DIV_ROUND_UP(skb->len, XEN_PAGE_SIZE);
 	if (skb_is_gso(skb))
 		needed++;
+	if (queue->vif->hash_extra)
+		needed++;
 
 	do {
 		prod = queue->rx.sring->req_prod;
@@ -280,6 +282,8 @@ struct gop_frag_copy {
 	struct xenvif_rx_meta *meta;
 	int head;
 	int gso_type;
+	int protocol;
+	int hash_type;
 
 	struct page *page;
 };
@@ -326,8 +330,19 @@ static void xenvif_setup_copy_gop(unsigned long gfn,
 	npo->copy_off += *len;
 	info->meta->size += *len;
 
+	if (!info->head)
+		return;
+
 	/* Leave a gap for the GSO descriptor. */
-	if (info->head && ((1 << info->gso_type) & queue->vif->gso_mask))
+	if ((1 << info->gso_type) & queue->vif->gso_mask)
+		queue->rx.req_cons++;
+
+	/* Leave a gap for the hash extra segment. */
+	if (queue->vif->hash_extra &&
+	    (info->hash_type == PKT_HASH_TYPE_L4 ||
+	     info->hash_type == PKT_HASH_TYPE_L3) &&
+	    (info->protocol == htons(ETH_P_IP) ||
+	     info->protocol == htons(ETH_P_IPV6)))
 		queue->rx.req_cons++;
 
 	info->head = 0; /* There must be something in this buffer now */
@@ -362,6 +377,8 @@ static void xenvif_gop_frag_copy(struct xenvif_queue *queue, struct sk_buff *skb
 		.npo = npo,
 		.head = *head,
 		.gso_type = XEN_NETIF_GSO_TYPE_NONE,
+		.protocol = skb->protocol,
+		.hash_type = skb_hash_type(skb),
 	};
 	unsigned long bytes;
 
@@ -550,6 +567,7 @@ void xenvif_kick_thread(struct xenvif_queue *queue)
 
 static void xenvif_rx_action(struct xenvif_queue *queue)
 {
+	struct xenvif *vif = queue->vif;
 	s8 status;
 	u16 flags;
 	struct xen_netif_rx_response *resp;
@@ -567,6 +585,8 @@ static void xenvif_rx_action(struct xenvif_queue *queue)
 
 	skb_queue_head_init(&rxq);
 
+	vif->hash_extra = !!vif->toeplitz.flags;
+
 	while (xenvif_rx_ring_slots_available(queue)
 	       && (skb = xenvif_rx_dequeue(queue)) != NULL) {
 		queue->last_rx_time = jiffies;
@@ -585,9 +605,10 @@ static void xenvif_rx_action(struct xenvif_queue *queue)
 	gnttab_batch_copy(queue->grant_copy_op, npo.copy_prod);
 
 	while ((skb = __skb_dequeue(&rxq)) != NULL) {
+		struct xen_netif_extra_info *extra = NULL;
 
 		if ((1 << queue->meta[npo.meta_cons].gso_type) &
-		    queue->vif->gso_prefix_mask) {
+		    vif->gso_prefix_mask) {
 			resp = RING_GET_RESPONSE(&queue->rx,
 						 queue->rx.rsp_prod_pvt++);
 
@@ -605,7 +626,7 @@ static void xenvif_rx_action(struct xenvif_queue *queue)
 		queue->stats.tx_bytes += skb->len;
 		queue->stats.tx_packets++;
 
-		status = xenvif_check_gop(queue->vif,
+		status = xenvif_check_gop(vif,
 					  XENVIF_RX_CB(skb)->meta_slots_used,
 					  &npo);
 
@@ -627,21 +648,52 @@ static void xenvif_rx_action(struct xenvif_queue *queue)
 					flags);
 
 		if ((1 << queue->meta[npo.meta_cons].gso_type) &
-		    queue->vif->gso_mask) {
-			struct xen_netif_extra_info *gso =
-				(struct xen_netif_extra_info *)
+		    vif->gso_mask) {
+			extra = (struct xen_netif_extra_info *)
 				RING_GET_RESPONSE(&queue->rx,
 						  queue->rx.rsp_prod_pvt++);
 
 			resp->flags |= XEN_NETRXF_extra_info;
 
-			gso->u.gso.type = queue->meta[npo.meta_cons].gso_type;
-			gso->u.gso.size = queue->meta[npo.meta_cons].gso_size;
-			gso->u.gso.pad = 0;
-			gso->u.gso.features = 0;
+			extra->u.gso.type = queue->meta[npo.meta_cons].gso_type;
+			extra->u.gso.size = queue->meta[npo.meta_cons].gso_size;
+			extra->u.gso.pad = 0;
+			extra->u.gso.features = 0;
+
+			extra->type = XEN_NETIF_EXTRA_TYPE_GSO;
+			extra->flags = 0;
+		}
+
+		if (vif->hash_extra &&
+		    (skb_hash_type(skb) == PKT_HASH_TYPE_L4 ||
+		     skb_hash_type(skb) == PKT_HASH_TYPE_L3) &&
+		    (skb->protocol == htons(ETH_P_IP) ||
+		     skb->protocol == htons(ETH_P_IPV6))) {
+			if (resp->flags & XEN_NETRXF_extra_info)
+				extra->flags |= XEN_NETIF_EXTRA_FLAG_MORE;
+			else
+				resp->flags |= XEN_NETRXF_extra_info;
+
+			extra = (struct xen_netif_extra_info *)
+				RING_GET_RESPONSE(&queue->rx,
+						  queue->rx.rsp_prod_pvt++);
 
-			gso->type = XEN_NETIF_EXTRA_TYPE_GSO;
-			gso->flags = 0;
+			if (skb_hash_type(skb) == PKT_HASH_TYPE_L4)
+				extra->u.toeplitz.type =
+					skb->protocol == htons(ETH_P_IP) ?
+					_XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV4_TCP :
+					_XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV6_TCP;
+			else if (skb_hash_type(skb) == PKT_HASH_TYPE_L3)
+				extra->u.toeplitz.type =
+					skb->protocol == htons(ETH_P_IP) ?
+					_XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV4 :
+					_XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV6;
+
+			*(uint32_t *)extra->u.toeplitz.value =
+				skb_get_hash_raw(skb);
+
+			extra->type = XEN_NETIF_EXTRA_TYPE_TOEPLITZ;
+			extra->flags = 0;
 		}
 
 		xenvif_add_frag_responses(queue, status,
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net-next v1 8/8] xen-netback: use toeplitz hash value from the frontend
  2016-02-12 10:13 [PATCH net-next v1 0/8] xen-netback: support toeplitz hashing Paul Durrant
                   ` (6 preceding siblings ...)
  2016-02-12 10:13 ` [PATCH net-next v1 7/8] xen-netback: pass toeplitz hash value to the frontend Paul Durrant
@ 2016-02-12 10:13 ` Paul Durrant
  2016-02-12 10:54 ` [PATCH net-next v1 0/8] xen-netback: support toeplitz hashing David Miller
  8 siblings, 0 replies; 19+ messages in thread
From: Paul Durrant @ 2016-02-12 10:13 UTC (permalink / raw)
  To: netdev, xen-devel; +Cc: Paul Durrant, Ian Campbell, Wei Liu

This patch adds code to xen-netback to use the value in a toeplitz hash
extra info fragment passed from the VM frontend in a transmit-side packet
to set the skb hash accordingly.
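
The corresponding fill-in on the frontend's transmit side would be
roughly as follows (illustrative only; hash_value and the slot
handling are the frontend's own):

	extra->type = XEN_NETIF_EXTRA_TYPE_TOEPLITZ;
	extra->flags = 0;
	extra->u.toeplitz.type = _XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV4_TCP;
	*(uint32_t *)extra->u.toeplitz.value = hash_value;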

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 drivers/net/xen-netback/netback.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 57c91fe..f550b8a 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1341,6 +1341,28 @@ void xenvif_mcast_addr_list_free(struct xenvif *vif)
 	}
 }
 
+static void xenvif_set_skb_hash(struct xenvif *vif,
+				struct sk_buff *skb,
+				struct xen_netif_extra_info *extra)
+{
+	u32 hash = *(u32 *)extra->u.toeplitz.value;
+	enum pkt_hash_types type = PKT_HASH_TYPE_NONE;
+
+	switch (extra->u.toeplitz.type) {
+	case _XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV4:
+	case _XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV6:
+		type = PKT_HASH_TYPE_L3;
+		break;
+
+	case _XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV4_TCP:
+	case _XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV6_TCP:
+		type = PKT_HASH_TYPE_L4;
+		break;
+	}
+
+	skb_set_hash(skb, hash, type);
+}
+
 static void xenvif_tx_build_gops(struct xenvif_queue *queue,
 				     int budget,
 				     unsigned *copy_ops,
@@ -1502,6 +1524,13 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
 			}
 		}
 
+		if (extras[XEN_NETIF_EXTRA_TYPE_TOEPLITZ - 1].type) {
+			struct xen_netif_extra_info *extra;
+
+			extra = &extras[XEN_NETIF_EXTRA_TYPE_TOEPLITZ - 1];
+			xenvif_set_skb_hash(queue->vif, skb, extra);
+		}
+
 		XENVIF_TX_CB(skb)->pending_idx = pending_idx;
 
 		__skb_put(skb, data_len);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v1 0/8] xen-netback: support toeplitz hashing
  2016-02-12 10:13 [PATCH net-next v1 0/8] xen-netback: support toeplitz hashing Paul Durrant
                   ` (7 preceding siblings ...)
  2016-02-12 10:13 ` [PATCH net-next v1 8/8] xen-netback: use toeplitz hash value from " Paul Durrant
@ 2016-02-12 10:54 ` David Miller
  2016-02-12 11:07   ` Paul Durrant
  8 siblings, 1 reply; 19+ messages in thread
From: David Miller @ 2016-02-12 10:54 UTC (permalink / raw)
  To: paul.durrant; +Cc: netdev, xen-devel

From: Paul Durrant <paul.durrant@citrix.com>
Date: Fri, 12 Feb 2016 10:13:17 +0000

> This patch series adds support for frontend-configurable toeplitz hashing
> in xen-netback (on the guest receive side). This support has been tested
> against a Windows frontend and has proven to be sufficient to pass the
> Microsoft HCK NDIS RSS tests.

We want fewer Toeplitz users, not more of them.  It is a far less
suitable hash than jhash2.

I'm not applying these patches, sorry.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH net-next v1 0/8] xen-netback: support toeplitz hashing
  2016-02-12 10:54 ` [PATCH net-next v1 0/8] xen-netback: support toeplitz hashing David Miller
@ 2016-02-12 11:07   ` Paul Durrant
  2016-02-12 16:41     ` David Miller
  0 siblings, 1 reply; 19+ messages in thread
From: Paul Durrant @ 2016-02-12 11:07 UTC (permalink / raw)
  To: David Miller; +Cc: netdev@vger.kernel.org, xen-devel@lists.xenproject.org

> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org] On Behalf Of David Miller
> Sent: 12 February 2016 10:54
> To: Paul Durrant
> Cc: netdev@vger.kernel.org; xen-devel@lists.xenproject.org
> Subject: Re: [PATCH net-next v1 0/8] xen-netback: support toeplitz hashing
> 
> From: Paul Durrant <paul.durrant@citrix.com>
> Date: Fri, 12 Feb 2016 10:13:17 +0000
> 
> > This patch series adds support for frontend-configurable toeplitz hashing
> > in xen-netback (on the guest receive side). This support has been tested
> > against a Windows frontend and has proven to be sufficient to pass the
> > Microsoft HCK NDIS RSS tests.
> 
> We want less Toeplitz users, not more of them.  It is a more unsuitable
> hash than jhash2.
> 
> I'm not applying these patches, sorry.

Windows *requires* use of Toeplitz so your position completely rules out being able to support receive side scaling in Windows PV frontends on Linux kernel backends, not only for Xen but for any other hypervisor, which I think is totally unreasonable.

  Paul

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v1 0/8] xen-netback: support toeplitz hashing
  2016-02-12 11:07   ` Paul Durrant
@ 2016-02-12 16:41     ` David Miller
  2016-02-12 16:59       ` Paul Durrant
  0 siblings, 1 reply; 19+ messages in thread
From: David Miller @ 2016-02-12 16:41 UTC (permalink / raw)
  To: Paul.Durrant; +Cc: netdev, xen-devel

From: Paul Durrant <Paul.Durrant@citrix.com>
Date: Fri, 12 Feb 2016 11:07:50 +0000

> Windows *requires* use of Teoplitz so your position completely rules
> out being able to support receive side scaling in Windows PV
> frontends on Linux kernel backends, not only for Xen but for any
> other hypervisor, which I think is totally unreasonable.

We have strong reason to believe that Toeplitz has properties that
make it very weak if not exploitable, therefore doing software RSS
with Jenkins is superior.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH net-next v1 0/8] xen-netback: support toeplitz hashing
  2016-02-12 16:41     ` David Miller
@ 2016-02-12 16:59       ` Paul Durrant
  2016-02-14 22:01         ` Tom Herbert
  0 siblings, 1 reply; 19+ messages in thread
From: Paul Durrant @ 2016-02-12 16:59 UTC (permalink / raw)
  To: David Miller; +Cc: netdev@vger.kernel.org, xen-devel@lists.xenproject.org

> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: 12 February 2016 16:42
> To: Paul Durrant
> Cc: netdev@vger.kernel.org; xen-devel@lists.xenproject.org
> Subject: Re: [PATCH net-next v1 0/8] xen-netback: support toeplitz hashing
> 
> From: Paul Durrant <Paul.Durrant@citrix.com>
> Date: Fri, 12 Feb 2016 11:07:50 +0000
> 
> > Windows *requires* use of Toeplitz so your position completely rules
> > out being able to support receive side scaling in Windows PV
> > frontends on Linux kernel backends, not only for Xen but for any
> > other hypervisor, which I think is totally unreasonable.
> 
> We have strong reason to believe that Toeplitz has properties that
> make it very weak if not exploitable, therefore doing software RSS
> with Jenkins is superior.

I don't disagree, but there is really no choice of algorithm where a Windows frontend is concerned. My patches only add Toeplitz hashing into xen-netback according to the specification in xen's netif.h; the algorithm is not exposed for use by any other part of the kernel.
If it would help I can add a module parameter to xen-netback to control advertising the option of the hash to frontends, so that it can be globally disabled if it is deployed on a host where there are no Windows guests.
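
Something minimal like this is what I have in mind (naming
illustrative; not part of the posted series):

	static bool toeplitz_advert = true;
	module_param(toeplitz_advert, bool, 0444);
	MODULE_PARM_DESC(toeplitz_advert,
			 "Advertise toeplitz hash support to frontends");

with process_ctrl_request() then leaving all of the toeplitz control
messages at XEN_NETIF_CTRL_STATUS_NOT_SUPPORTED when the parameter is
clear.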

  Paul

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v1 0/8] xen-netback: support toeplitz hashing
  2016-02-12 16:59       ` Paul Durrant
@ 2016-02-14 22:01         ` Tom Herbert
  2016-02-15  8:48           ` Paul Durrant
  0 siblings, 1 reply; 19+ messages in thread
From: Tom Herbert @ 2016-02-14 22:01 UTC (permalink / raw)
  To: Paul Durrant
  Cc: David Miller, netdev@vger.kernel.org,
	xen-devel@lists.xenproject.org

On Fri, Feb 12, 2016 at 5:59 PM, Paul Durrant <Paul.Durrant@citrix.com> wrote:
>> -----Original Message-----
>> From: David Miller [mailto:davem@davemloft.net]
>> Sent: 12 February 2016 16:42
>> To: Paul Durrant
>> Cc: netdev@vger.kernel.org; xen-devel@lists.xenproject.org
>> Subject: Re: [PATCH net-next v1 0/8] xen-netback: support toeplitz hashing
>>
>> From: Paul Durrant <Paul.Durrant@citrix.com>
>> Date: Fri, 12 Feb 2016 11:07:50 +0000
>>
>> > Windows *requires* use of Toeplitz so your position completely rules
>> > out being able to support receive side scaling in Windows PV
>> > frontends on Linux kernel backends, not only for Xen but for any
>> > other hypervisor, which I think is totally unreasonable.
>>
>> We have strong reason to believe that Toeplitz has properties that
>> make it very weak if not exploitable, therefore doing software RSS
>> with Jenkins is superior.
>
> I don't disagree, but there is really no choice of algorithm where a Windows frontend is concerned. My patches only add Toeplitz hashing into xen-netback according to the specification in xen's netif.h; the algorithm is not exposed for use by any other part of the kernel.

You are touching struct sk_buff for this. Changing such a core
networking data structure just to satisfy a requirement of Windows
seems like a long shot to me...

> If it would help I can add a module parameter to xen-netback to control advertising the option of the hash to frontends so that it can be globally disabled if it is deployed on host where there are no Windows guests.

Toeplitz is dog-slow to compute in software. Comparing your
implementation to Jenkins hash (jhash), it's about 12x slower (102
nsecs versus 8 nsecs for IPv4 on my system). Doing this for every
packet seems like a non-starter to me. I suspect the majority use case
for receive is simple uniform load balancing across the virtual
queues, in which case jhash will be perfectly fine. The case where it
probably matters is if the Windows OS is trying to configure some
non-uniform distribution (isolating a single flow, for instance). To
handle this case, you could implement a simple lookup flow table that
maps an skb->hash to a Toeplitz hash, so that the Toeplitz hash only
needs to be calculated once per flow (and recomputed on some periodic
timeout). This would be probabilistic though, so if Windows really
requires the hash to be set correctly 100% of the time it won't work.
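
Roughly, and purely as a sketch (names invented; a real version would
live per-vif and need locking plus an eviction policy):

	#define HASH_CACHE_SIZE		256
	#define HASH_CACHE_TIMEOUT	(10 * HZ)

	struct hash_cache_entry {
		u32 skb_hash;		/* Jenkins hash from the stack */
		u32 toeplitz;		/* cached Toeplitz result */
		unsigned long stamp;	/* jiffies at last computation */
	};

	static struct hash_cache_entry hash_cache[HASH_CACHE_SIZE];

	static u32 toeplitz_cached(struct xenvif *vif, struct sk_buff *skb)
	{
		u32 h = skb_get_hash(skb);
		struct hash_cache_entry *e =
			&hash_cache[h & (HASH_CACHE_SIZE - 1)];

		if (e->skb_hash != h ||
		    time_after(jiffies, e->stamp + HASH_CACHE_TIMEOUT)) {
			e->skb_hash = h;
			/* the slow path, i.e. your full Toeplitz walk */
			e->toeplitz = xenvif_toeplitz_hash(vif, skb);
			e->stamp = jiffies;
		}

		return e->toeplitz;
	}

Two flows that collide on skb->hash would be handed the same cached
Toeplitz value until the entry times out, which is the probabilistic
part.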

Tom

>
>   Paul

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v1 6/8] xen-netback: add an implementation of toeplitz hashing...
  2016-02-12 10:13 ` [PATCH net-next v1 6/8] xen-netback: add an implementation of toeplitz hashing Paul Durrant
@ 2016-02-14 22:13   ` Tom Herbert
  0 siblings, 0 replies; 19+ messages in thread
From: Tom Herbert @ 2016-02-14 22:13 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Linux Kernel Network Developers, xen-devel, Ian Campbell, Wei Liu

On Fri, Feb 12, 2016 at 11:13 AM, Paul Durrant <paul.durrant@citrix.com> wrote:
> ...for receive-side packets.
>
> My recent patch to include/xen/interface/io/netif.h defines a set of
> control messages that can be used by a VM frontend driver to configure
> toeplitz hashing of receive-side packets and consequent steering of those
> packets to particular queues.
>
> This patch introduces an implementation of toeplitz hashing and into
> xen-netback and allows it to be configured using the new control messages.
>
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> ---
>  drivers/net/xen-netback/common.h    |  13 ++++
>  drivers/net/xen-netback/interface.c | 149 ++++++++++++++++++++++++++++++++++++
>  drivers/net/xen-netback/netback.c   | 128 ++++++++++++++++++++++++++++++-
>  3 files changed, 287 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
> index 093a12a..6687702 100644
> --- a/drivers/net/xen-netback/common.h
> +++ b/drivers/net/xen-netback/common.h
> @@ -220,6 +220,12 @@ struct xenvif_mcast_addr {
>
>  #define XEN_NETBK_MCAST_MAX 64
>
> +#define XEN_NETBK_MAX_TOEPLITZ_KEY_SIZE 40
> +
> +#define XEN_NETBK_MAX_TOEPLITZ_MAPPING_ORDER 7
> +#define XEN_NETBK_MAX_TOEPLITZ_MAPPING_SIZE \
> +       BIT(XEN_NETBK_MAX_TOEPLITZ_MAPPING_ORDER)
> +
>  struct xenvif {
>         /* Unique identifier for this interface. */
>         domid_t          domid;
> @@ -251,6 +257,13 @@ struct xenvif {
>         unsigned int num_queues; /* active queues, resource allocated */
>         unsigned int stalled_queues;
>
> +       struct {
> +               u32 flags;
> +               u8 key[XEN_NETBK_MAX_TOEPLITZ_KEY_SIZE];
> +               u32 mapping[XEN_NETBK_MAX_TOEPLITZ_MAPPING_SIZE];
> +               unsigned int order;
> +       } toeplitz;
> +
>         struct xenbus_watch credit_watch;
>         struct xenbus_watch mcast_ctrl_watch;
>
> diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
> index 1850ebb..230afde 100644
> --- a/drivers/net/xen-netback/interface.c
> +++ b/drivers/net/xen-netback/interface.c
> @@ -1,3 +1,4 @@
> +
>  /*
>   * Network-device interface management.
>   *
> @@ -151,6 +152,153 @@ void xenvif_wake_queue(struct xenvif_queue *queue)
>         netif_tx_wake_queue(netdev_get_tx_queue(dev, id));
>  }
>
> +static u32 toeplitz_hash(const u8 *k, unsigned int klen,
> +                        const u8 *d, unsigned int dlen)

This should be a common library function, probably in lib directory.

> +{
> +       unsigned int di, ki;
> +       u64 prefix = 0;
> +       u64 hash = 0;
> +
> +       /* Pre-load prefix with the first 8 bytes of the key */
> +       for (ki = 0; ki < 8; ki++) {
> +               prefix <<= 8;
> +               prefix |= (ki < klen) ? k[ki] : 0;
> +       }
> +
> +       for (di = 0; di < dlen; di++) {
> +               u8 byte = d[di];
> +               unsigned int bit;
> +
> +               for (bit = 0x80; bit != 0; bit >>= 1) {
> +                       if (byte & bit)
> +                               hash ^= prefix;
> +                       prefix <<= 1;
> +               }
> +
> +               /* prefix has now been left-shifted by 8, so OR in
> +                * the next byte.
> +                */
> +               prefix |= (ki < klen) ? k[ki] : 0;
> +               ki++;
> +       }
> +
> +       /* The valid part of the hash is in the upper 32 bits. */
> +       return hash >> 32;
> +}
> +
> +static void xenvif_set_toeplitz_hash(struct xenvif *vif, struct sk_buff *skb)
> +{
> +       struct flow_keys flow;
> +       u32 hash = 0;
> +       enum pkt_hash_types type = PKT_HASH_TYPE_NONE;
> +       const u8 *key = vif->toeplitz.key;
> +       u32 flags = vif->toeplitz.flags;
> +       const unsigned int len = XEN_NETBK_MAX_TOEPLITZ_KEY_SIZE;
> +       bool has_tcp_hdr;
> +
> +       /* Quick rejection test: If the network protocol doesn't
> +        * correspond to any enabled hash type then there's no point
> +        * in parsing the packet header.
> +        */
> +       switch (skb->protocol) {
> +       case htons(ETH_P_IP):
> +               if (flags & (XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV4_TCP |
> +                            XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV4))
> +                       break;
> +
> +               goto done;
> +
> +       case htons(ETH_P_IPV6):
> +               if (flags & (XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV6_TCP |
> +                            XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV6))
> +                       break;
> +
> +               goto done;
> +
> +       default:
> +               goto done;
> +       }
> +
> +       memset(&flow, 0, sizeof(flow));
> +       if (!skb_flow_dissect_flow_keys(skb, &flow, 0))

Flow dissector will parse into encapsulations to find IP addresses.
This may or may not be what you want (we'd have to look at NDIS spec).

> +               goto done;
> +
> +       has_tcp_hdr = (flow.basic.ip_proto == IPPROTO_TCP) &&
> +                     !(flow.control.flags & FLOW_DIS_IS_FRAGMENT);
> +
> +       switch (skb->protocol) {
> +       case htons(ETH_P_IP):
> +               if (has_tcp_hdr &&
> +                   (flags & XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV4_TCP)) {
> +                       u8 data[12];
> +
> +                       memcpy(&data[0], &flow.addrs.v4addrs.src, 4);
> +                       memcpy(&data[4], &flow.addrs.v4addrs.dst, 4);
> +                       memcpy(&data[8], &flow.ports.src, 2);
> +                       memcpy(&data[10], &flow.ports.dst, 2);
> +
> +                       hash = toeplitz_hash(key, len,
> +                                            data, sizeof(data));
> +                       type = PKT_HASH_TYPE_L4;
> +               } else if (flags & XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV4) {
> +                       u8 data[8];
> +
> +                       memcpy(&data[0], &flow.addrs.v4addrs.src, 4);
> +                       memcpy(&data[4], &flow.addrs.v4addrs.dst, 4);
> +
> +                       hash = toeplitz_hash(key, len,
> +                                            data, sizeof(data));
> +                       type = PKT_HASH_TYPE_L3;
> +               }
> +
> +               break;
> +
> +       case htons(ETH_P_IPV6):
> +               if (has_tcp_hdr &&
> +                   (flags & XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV6_TCP)) {
> +                       u8 data[36];
> +
> +                       memcpy(&data[0], &flow.addrs.v6addrs.src, 16);
> +                       memcpy(&data[16], &flow.addrs.v6addrs.dst, 16);
> +                       memcpy(&data[32], &flow.ports.src, 2);
> +                       memcpy(&data[34], &flow.ports.dst, 2);
> +
> +                       hash = toeplitz_hash(key, len,
> +                                            data, sizeof(data));
> +                       type = PKT_HASH_TYPE_L4;
> +               } else if (flags & XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV6) {
> +                       u8 data[32];
> +
> +                       memcpy(&data[0], &flow.addrs.v6addrs.src, 16);
> +                       memcpy(&data[16], &flow.addrs.v6addrs.dst, 16);
> +
> +                       hash = toeplitz_hash(key, len,
> +                                            data, sizeof(data));
> +                       type = PKT_HASH_TYPE_L3;
> +               }
> +
> +               break;
> +       }
> +
> +done:
> +       skb_set_hash(skb, hash, type);

Is this necessary? It potentially overwrites a valid L4 hash that
might be used later in something like RFS. Why not just return the
hash value?

> +}
> +
> +static u16 xenvif_select_queue(struct net_device *dev, struct sk_buff *skb,
> +                              void *accel_priv,
> +                              select_queue_fallback_t fallback)
> +{
> +       struct xenvif *vif = netdev_priv(dev);
> +       unsigned int mask = (1u << vif->toeplitz.order) - 1;
> +
> +       if (vif->toeplitz.flags == 0)
> +               return fallback(dev, skb) % dev->real_num_tx_queues;
> +
> +       xenvif_set_toeplitz_hash(vif, skb);
> +
> +       return vif->toeplitz.mapping[skb_get_hash_raw(skb) & mask];
> +}
> +
>  static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  {
>         struct xenvif *vif = netdev_priv(dev);
> @@ -395,6 +543,7 @@ static const struct ethtool_ops xenvif_ethtool_ops = {
>  };
>
>  static const struct net_device_ops xenvif_netdev_ops = {
> +       .ndo_select_queue = xenvif_select_queue,
>         .ndo_start_xmit = xenvif_start_xmit,
>         .ndo_get_stats  = xenvif_get_stats,
>         .ndo_open       = xenvif_open,
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index a1f1a38..41ec7e9 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -2163,6 +2163,89 @@ int xenvif_dealloc_kthread(void *data)
>         return 0;
>  }
>
> +static u32 xenvif_set_toeplitz_flags(struct xenvif *vif, u32 flags)
> +{
> +       if (flags & ~(XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV4 |
> +                     XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV4_TCP |
> +                     XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV6 |
> +                     XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV6_TCP))
> +               return XEN_NETIF_CTRL_STATUS_INVALID_PARAMETER;
> +
> +       vif->toeplitz.flags = flags;
> +
> +       return XEN_NETIF_CTRL_STATUS_SUCCESS;
> +}
> +
> +static u32 xenvif_set_toeplitz_key(struct xenvif *vif, u32 gref, u32 len)
> +{
> +       u8 *key = vif->toeplitz.key;
> +       struct gnttab_copy copy_op = {
> +               .source.u.ref = gref,
> +               .source.domid = vif->domid,
> +               .dest.u.gmfn = virt_to_gfn(key),
> +               .dest.domid = DOMID_SELF,
> +               .dest.offset = xen_offset_in_page(key),
> +               .len = len,
> +               .flags = GNTCOPY_source_gref
> +       };
> +
> +       if (len > XEN_NETBK_MAX_TOEPLITZ_KEY_SIZE)
> +               return XEN_NETIF_CTRL_STATUS_INVALID_PARAMETER;
> +
> +       gnttab_batch_copy(&copy_op, 1);
> +
> +       if (copy_op.status != GNTST_okay)
> +               return XEN_NETIF_CTRL_STATUS_INVALID_PARAMETER;
> +
> +       /* Clear any remaining key octets */
> +       if (len < XEN_NETBK_MAX_TOEPLITZ_KEY_SIZE)
> +               memset(key + len, 0, XEN_NETBK_MAX_TOEPLITZ_KEY_SIZE - len);
> +
> +       return XEN_NETIF_CTRL_STATUS_SUCCESS;
> +}
> +
> +static u32 xenvif_set_toeplitz_mapping_order(struct xenvif *vif,
> +                                            u32 order)
> +{
> +       if (order > XEN_NETBK_MAX_TOEPLITZ_MAPPING_ORDER)
> +               return XEN_NETIF_CTRL_STATUS_INVALID_PARAMETER;
> +
> +       vif->toeplitz.order = order;
> +       memset(vif->toeplitz.mapping, 0, sizeof(u32) << order);
> +
> +       return XEN_NETIF_CTRL_STATUS_SUCCESS;
> +}
> +
> +static u32 xenvif_set_toeplitz_mapping(struct xenvif *vif, u32 gref,
> +                                      u32 len, u32 off)
> +{
> +       u32 *mapping = &vif->toeplitz.mapping[off];
> +       struct gnttab_copy copy_op = {
> +               .source.u.ref = gref,
> +               .source.domid = vif->domid,
> +               .dest.u.gmfn = virt_to_gfn(mapping),
> +               .dest.domid = DOMID_SELF,
> +               .dest.offset = xen_offset_in_page(mapping),
> +               .len = len * sizeof(u32),
> +               .flags = GNTCOPY_source_gref
> +       };
> +
> +       if ((off + len > (1u << vif->toeplitz.order)) ||
> +           copy_op.len > XEN_PAGE_SIZE)
> +               return XEN_NETIF_CTRL_STATUS_INVALID_PARAMETER;
> +
> +       while (len-- != 0)
> +               if (mapping[off++] >= vif->num_queues)
> +                       return XEN_NETIF_CTRL_STATUS_INVALID_PARAMETER;
> +
> +       gnttab_batch_copy(&copy_op, 1);
> +
> +       if (copy_op.status != GNTST_okay)
> +               return XEN_NETIF_CTRL_STATUS_INVALID_PARAMETER;
> +
> +       return XEN_NETIF_CTRL_STATUS_SUCCESS;
> +}
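Note that the validation loop above runs before gnttab_batch_copy()
has filled the table, and it indexes mapping[off] even though mapping
already points at offset off, applying the offset twice. A hedged
sketch of a safer shape, assuming a hypothetical per-vif scratch
buffer as the grant-copy destination (copy, validate, then publish):

        /* copy_op would be initialized with .dest targeting 'scratch'
         * (a hypothetical staging array) rather than the live table.
         */
        gnttab_batch_copy(&copy_op, 1);
        if (copy_op.status != GNTST_okay)
                return XEN_NETIF_CTRL_STATUS_INVALID_PARAMETER;

        /* Validate the entries actually received from the frontend... */
        for (i = 0; i < len; i++)
                if (scratch[i] >= vif->num_queues)
                        return XEN_NETIF_CTRL_STATUS_INVALID_PARAMETER;

        /* ...and only then publish them to the live mapping. */
        memcpy(&vif->toeplitz.mapping[off], scratch, len * sizeof(u32));

        return XEN_NETIF_CTRL_STATUS_SUCCESS;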
> +
>  static void make_ctrl_response(struct xenvif *vif,
>                                const struct xen_netif_ctrl_request *req,
>                                u32 status, u32 data)
> @@ -2191,9 +2274,48 @@ static void push_ctrl_response(struct xenvif *vif)
>  static void process_ctrl_request(struct xenvif *vif,
>                                  const struct xen_netif_ctrl_request *req)
>  {
> -       /* There is no support for control requests yet. */
> -       make_ctrl_response(vif, req,
> -                          XEN_NETIF_CTRL_STATUS_NOT_SUPPORTED, 0);
> +       u32 status = XEN_NETIF_CTRL_STATUS_NOT_SUPPORTED;
> +       u32 data = 0;
> +
> +       switch (req->type) {
> +       case XEN_NETIF_CTRL_TYPE_GET_TOEPLITZ_FLAGS:
> +               status = XEN_NETIF_CTRL_STATUS_SUCCESS;
> +               data = XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV4 |
> +                      XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV4_TCP |
> +                      XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV6 |
> +                      XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV6_TCP;
> +               break;
> +
> +       case XEN_NETIF_CTRL_TYPE_SET_TOEPLITZ_FLAGS:
> +               status = xenvif_set_toeplitz_flags(vif, req->data[0]);
> +               break;
> +
> +       case XEN_NETIF_CTRL_TYPE_SET_TOEPLITZ_KEY:
> +               status = xenvif_set_toeplitz_key(vif, req->data[0],
> +                                                req->data[1]);
> +               break;
> +
> +       case XEN_NETIF_CTRL_TYPE_GET_TOEPLITZ_MAPPING_ORDER:
> +               status = XEN_NETIF_CTRL_STATUS_SUCCESS;
> +               data = XEN_NETBK_MAX_TOEPLITZ_MAPPING_ORDER;
> +               break;
> +
> +       case XEN_NETIF_CTRL_TYPE_SET_TOEPLITZ_MAPPING_ORDER:
> +               status = xenvif_set_toeplitz_mapping_order(vif,
> +                                                          req->data[0]);
> +               break;
> +
> +       case XEN_NETIF_CTRL_TYPE_SET_TOEPLITZ_MAPPING:
> +               status = xenvif_set_toeplitz_mapping(vif, req->data[0],
> +                                                    req->data[1],
> +                                                    req->data[2]);
> +               break;
> +
> +       default:
> +               break;
> +       }
> +
> +       make_ctrl_response(vif, req, status, data);
>         push_ctrl_response(vif);
>  }
>
> --
> 2.1.4
>

^ permalink raw reply	[flat|nested] 19+ messages in thread
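For reference, the toeplitz_hash() routine invoked in patch 6 is not
shown in the excerpt above; a minimal sketch of the standard RSS
Toeplitz algorithm it implements follows (illustrative only, not the
patch's code). A 32-bit window slides along the key, one bit per input
bit, and is XOR-ed into the result whenever the corresponding input
bit is set.

        #include <stdint.h>

        static uint32_t toeplitz_hash_sketch(const uint8_t *key,
                                             unsigned int keylen,
                                             const uint8_t *data,
                                             unsigned int len)
        {
                uint32_t hash = 0;
                /* Prime the window with the first 32 key bits. */
                uint32_t window = ((uint32_t)key[0] << 24) | (key[1] << 16) |
                                  (key[2] << 8) | key[3];
                unsigned int i, b, keyidx = 4;

                for (i = 0; i < len; i++) {
                        for (b = 0; b < 8; b++) {
                                if (data[i] & (0x80u >> b))
                                        hash ^= window;
                                /* Slide: shift and pull in the next
                                 * key bit, MSB first.
                                 */
                                window <<= 1;
                                if (keyidx < keylen &&
                                    (key[keyidx] & (0x80u >> b)))
                                        window |= 1;
                        }
                        keyidx++;
                }
                return hash;
        }

With a 40-octet key this consumes exactly 32 + 8 * 36 = 320 key bits
for the 36-octet IPv6 TCP tuple above, matching the data layouts the
patch assembles.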

* Re: [PATCH net-next v1 7/8] xen-netback: pass toeplitz hash value to the frontend
  2016-02-12 10:13 ` [PATCH net-next v1 7/8] xen-netback: pass toeplitz hash value to the frontend Paul Durrant
@ 2016-02-14 22:17   ` Tom Herbert
  0 siblings, 0 replies; 19+ messages in thread
From: Tom Herbert @ 2016-02-14 22:17 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Linux Kernel Network Developers, xen-devel, Ian Campbell, Wei Liu

On Fri, Feb 12, 2016 at 11:13 AM, Paul Durrant <paul.durrant@citrix.com> wrote:
> My recent patch to include/xen/interface/io/netif.h defines a new extra
> info type that can be used to pass toeplitz hash values between backend
> and VM frontend.
>
> This patch adds code to xen-netback to pass hash values calculated for
> receive-side packets to the VM frontend.
>
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> ---
>  drivers/net/xen-netback/common.h  |  1 +
>  drivers/net/xen-netback/netback.c | 76 ++++++++++++++++++++++++++++++++-------
>  2 files changed, 65 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
> index 6687702..469dcf0 100644
> --- a/drivers/net/xen-netback/common.h
> +++ b/drivers/net/xen-netback/common.h
> @@ -243,6 +243,7 @@ struct xenvif {
>         u8 ip_csum:1;
>         u8 ipv6_csum:1;
>         u8 multicast_control:1;
> +       u8 hash_extra:1;
>
>         /* Is this interface disabled? True when backend discovers
>          * frontend is rogue.
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index 41ec7e9..57c91fe 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -163,6 +163,8 @@ static bool xenvif_rx_ring_slots_available(struct xenvif_queue *queue)
>         needed = DIV_ROUND_UP(skb->len, XEN_PAGE_SIZE);
>         if (skb_is_gso(skb))
>                 needed++;
> +       if (queue->vif->hash_extra)
> +               needed++;
>
>         do {
>                 prod = queue->rx.sring->req_prod;
> @@ -280,6 +282,8 @@ struct gop_frag_copy {
>         struct xenvif_rx_meta *meta;
>         int head;
>         int gso_type;
> +       int protocol;
> +       int hash_type;
>
>         struct page *page;
>  };
> @@ -326,8 +330,19 @@ static void xenvif_setup_copy_gop(unsigned long gfn,
>         npo->copy_off += *len;
>         info->meta->size += *len;
>
> +       if (!info->head)
> +               return;
> +
>         /* Leave a gap for the GSO descriptor. */
> -       if (info->head && ((1 << info->gso_type) & queue->vif->gso_mask))
> +       if ((1 << info->gso_type) & queue->vif->gso_mask)
> +               queue->rx.req_cons++;
> +
> +       /* Leave a gap for the hash extra segment. */
> +       if (queue->vif->hash_extra &&
> +           (info->hash_type == PKT_HASH_TYPE_L4 ||
> +            info->hash_type == PKT_HASH_TYPE_L3) &&
> +           (info->protocol == htons(ETH_P_IP) ||
> +            info->protocol == htons(ETH_P_IPV6)))
>                 queue->rx.req_cons++;
>
>         info->head = 0; /* There must be something in this buffer now */
> @@ -362,6 +377,8 @@ static void xenvif_gop_frag_copy(struct xenvif_queue *queue, struct sk_buff *skb
>                 .npo = npo,
>                 .head = *head,
>                 .gso_type = XEN_NETIF_GSO_TYPE_NONE,
> +               .protocol = skb->protocol,
> +               .hash_type = skb_hash_type(skb),
>         };
>         unsigned long bytes;
>
> @@ -550,6 +567,7 @@ void xenvif_kick_thread(struct xenvif_queue *queue)
>
>  static void xenvif_rx_action(struct xenvif_queue *queue)
>  {
> +       struct xenvif *vif = queue->vif;
>         s8 status;
>         u16 flags;
>         struct xen_netif_rx_response *resp;
> @@ -567,6 +585,8 @@ static void xenvif_rx_action(struct xenvif_queue *queue)
>
>         skb_queue_head_init(&rxq);
>
> +       vif->hash_extra = !!vif->toeplitz.flags;
> +
>         while (xenvif_rx_ring_slots_available(queue)
>                && (skb = xenvif_rx_dequeue(queue)) != NULL) {
>                 queue->last_rx_time = jiffies;
> @@ -585,9 +605,10 @@ static void xenvif_rx_action(struct xenvif_queue *queue)
>         gnttab_batch_copy(queue->grant_copy_op, npo.copy_prod);
>
>         while ((skb = __skb_dequeue(&rxq)) != NULL) {
> +               struct xen_netif_extra_info *extra = NULL;
>
>                 if ((1 << queue->meta[npo.meta_cons].gso_type) &
> -                   queue->vif->gso_prefix_mask) {
> +                   vif->gso_prefix_mask) {
>                         resp = RING_GET_RESPONSE(&queue->rx,
>                                                  queue->rx.rsp_prod_pvt++);
>
> @@ -605,7 +626,7 @@ static void xenvif_rx_action(struct xenvif_queue *queue)
>                 queue->stats.tx_bytes += skb->len;
>                 queue->stats.tx_packets++;
>
> -               status = xenvif_check_gop(queue->vif,
> +               status = xenvif_check_gop(vif,
>                                           XENVIF_RX_CB(skb)->meta_slots_used,
>                                           &npo);
>
> @@ -627,21 +648,52 @@ static void xenvif_rx_action(struct xenvif_queue *queue)
>                                         flags);
>
>                 if ((1 << queue->meta[npo.meta_cons].gso_type) &
> -                   queue->vif->gso_mask) {
> -                       struct xen_netif_extra_info *gso =
> -                               (struct xen_netif_extra_info *)
> +                   vif->gso_mask) {
> +                       extra = (struct xen_netif_extra_info *)
>                                 RING_GET_RESPONSE(&queue->rx,
>                                                   queue->rx.rsp_prod_pvt++);
>
>                         resp->flags |= XEN_NETRXF_extra_info;
>
> -                       gso->u.gso.type = queue->meta[npo.meta_cons].gso_type;
> -                       gso->u.gso.size = queue->meta[npo.meta_cons].gso_size;
> -                       gso->u.gso.pad = 0;
> -                       gso->u.gso.features = 0;
> +                       extra->u.gso.type = queue->meta[npo.meta_cons].gso_type;
> +                       extra->u.gso.size = queue->meta[npo.meta_cons].gso_size;
> +                       extra->u.gso.pad = 0;
> +                       extra->u.gso.features = 0;
> +
> +                       extra->type = XEN_NETIF_EXTRA_TYPE_GSO;
> +                       extra->flags = 0;
> +               }
> +
> +               if (vif->hash_extra &&
> +                   (skb_hash_type(skb) == PKT_HASH_TYPE_L4 ||
> +                    skb_hash_type(skb) == PKT_HASH_TYPE_L3) &&
> +                   (skb->protocol == htons(ETH_P_IP) ||
> +                    skb->protocol == htons(ETH_P_IPV6))) {
> +                       if (resp->flags & XEN_NETRXF_extra_info)
> +                               extra->flags |= XEN_NETIF_EXTRA_FLAG_MORE;
> +                       else
> +                               resp->flags |= XEN_NETRXF_extra_info;
> +
> +                       extra = (struct xen_netif_extra_info *)
> +                               RING_GET_RESPONSE(&queue->rx,
> +                                                 queue->rx.rsp_prod_pvt++);
>
> -                       gso->type = XEN_NETIF_EXTRA_TYPE_GSO;
> -                       gso->flags = 0;
> +                       if (skb_hash_type(skb) == PKT_HASH_TYPE_L4)

A non-zero skb->hash is not necessarily a Toeplitz hash.

> +                               extra->u.toeplitz.type =
> +                                       skb->protocol == htons(ETH_P_IP) ?
> +                                       _XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV4_TCP :
> +                                       _XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV6_TCP;
> +                       else if (skb_hash_type(skb) == PKT_HASH_TYPE_L3)
> +                               extra->u.toeplitz.type =
> +                                       skb->protocol == htons(ETH_P_IP) ?
> +                                       _XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV4 :
> +                                       _XEN_NETIF_CTRL_TOEPLITZ_HASH_IPV6;
> +
> +                       *(uint32_t *)extra->u.toeplitz.value =
> +                               skb_get_hash_raw(skb);
> +
> +                       extra->type = XEN_NETIF_EXTRA_TYPE_TOEPLITZ;
> +                       extra->flags = 0;
>                 }
>
>                 xenvif_add_frag_responses(queue, status,
> --
> 2.1.4
>

^ permalink raw reply	[flat|nested] 19+ messages in thread
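A hedged sketch of one way to address Tom's point above: record in a
driver-private part of the skb that the value in skb->hash was
produced by netback's own Toeplitz path, and emit the extra segment
only in that case rather than for any L3/L4 hash. The struct, macro
and flag here are hypothetical, and this assumes the cb area is free
for this use at these points.

        struct xenvif_hash_cb {                 /* hypothetical */
                u8 toeplitz_valid:1;
        };
        #define XENVIF_HASH_CB(skb) ((struct xenvif_hash_cb *)(skb)->cb)

        /* In xenvif_set_toeplitz_hash(), after computing the hash: */
                XENVIF_HASH_CB(skb)->toeplitz_valid = 1;

        /* In xenvif_rx_action(), when deciding to emit the extra: */
                if (vif->hash_extra && XENVIF_HASH_CB(skb)->toeplitz_valid) {
                        /* ... add XEN_NETIF_EXTRA_TYPE_TOEPLITZ as above ... */
                }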

* Re: [PATCH net-next v1 1/8] skbuff: store hash type in socket buffer...
  2016-02-12 10:13 ` [PATCH net-next v1 1/8] skbuff: store hash type in socket buffer Paul Durrant
@ 2016-02-14 22:25   ` Tom Herbert
  2016-02-15  8:50     ` Paul Durrant
  0 siblings, 1 reply; 19+ messages in thread
From: Tom Herbert @ 2016-02-14 22:25 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Linux Kernel Network Developers, xen-devel, David S. Miller,
	Jay Vosburgh, Veaceslav Falico, Andy Gospodarek

On Fri, Feb 12, 2016 at 11:13 AM, Paul Durrant <paul.durrant@citrix.com> wrote:
> ...rather than a boolean merely indicating a canonical L4 hash.
>
> skb_set_hash() takes a hash type (from enum pkt_hash_types) as an
> argument but information is lost since only a single bit in the skb
> stores whether that hash type is PKT_HASH_TYPE_L4 or not.
>
> By using two bits it's possible to store the complete hash type
> information for use by drivers. A subsequent patch to xen-netback
> needs this information to forward packet hashes to VM frontend drivers.
>
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Jay Vosburgh <j.vosburgh@gmail.com>
> Cc: Veaceslav Falico <vfalico@gmail.com>
> Cc: Andy Gospodarek <gospo@cumulusnetworks.com>
> ---
>  drivers/net/bonding/bond_main.c |  2 +-
>  include/linux/skbuff.h          | 53 ++++++++++++++++++++++++++++-------------
>  include/net/flow_dissector.h    |  5 ++++
>  include/net/sock.h              |  2 +-
>  include/trace/events/net.h      |  2 +-
>  net/core/flow_dissector.c       | 27 ++++++++++++++++-----
>  6 files changed, 65 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 45bdd87..ad0ee00 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -3173,7 +3173,7 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
>         u32 hash;
>
>         if (bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP34 &&
> -           skb->l4_hash)
> +           skb_has_l4_hash(skb))
>                 return skb->hash;
>
>         if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER2 ||
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 6ec86f1..9021b52 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -595,8 +595,7 @@ static inline bool skb_mstamp_after(const struct skb_mstamp *t1,
>   *     @xmit_more: More SKBs are pending for this queue
>   *     @ndisc_nodetype: router type (from link layer)
>   *     @ooo_okay: allow the mapping of a socket to a queue to be changed
> - *     @l4_hash: indicate hash is a canonical 4-tuple hash over transport
> - *             ports.
> + *     @hash_type: indicates type of hash (see enum pkt_hash_types below)
>   *     @sw_hash: indicates hash was computed in software stack
>   *     @wifi_acked_valid: wifi_acked was set
>   *     @wifi_acked: whether frame was acked on wifi or not
> @@ -701,10 +700,10 @@ struct sk_buff {
>         __u8                    nf_trace:1;
>         __u8                    ip_summed:2;
>         __u8                    ooo_okay:1;
> -       __u8                    l4_hash:1;
>         __u8                    sw_hash:1;
>         __u8                    wifi_acked_valid:1;
>         __u8                    wifi_acked:1;
> +       /* 1 bit hole */
>
>         __u8                    no_fcs:1;
>         /* Indicates the inner headers are valid in the skbuff. */
> @@ -721,7 +720,8 @@ struct sk_buff {
>         __u8                    ipvs_property:1;
>         __u8                    inner_protocol_type:1;
>         __u8                    remcsum_offload:1;
> -       /* 3 or 5 bit hole */
> +       __u8                    hash_type:2;

This isn't needed by the stack -- we don't differentiate between L2 and
L3 hashes anywhere. Also, it doesn't help in the xen case of passing a
hash to Windows without another bit to indicate that the hash is
indeed Toeplitz.

> +       /* 1 or 3 bit hole */
>
>  #ifdef CONFIG_NET_SCHED
>         __u16                   tc_index;       /* traffic control index */
> @@ -1030,19 +1030,35 @@ static inline void skb_clear_hash(struct sk_buff *skb)
>  {
>         skb->hash = 0;
>         skb->sw_hash = 0;
> -       skb->l4_hash = 0;
> +       skb->hash_type = 0;
> +}
> +
> +static inline enum pkt_hash_types skb_hash_type(struct sk_buff *skb)
> +{
> +       return skb->hash_type;
> +}
> +
> +static inline bool skb_has_l4_hash(struct sk_buff *skb)
> +{
> +       return skb_hash_type(skb) == PKT_HASH_TYPE_L4;
> +}
> +
> +static inline bool skb_has_sw_hash(struct sk_buff *skb)
> +{
> +       return !!skb->sw_hash;
>  }
>
>  static inline void skb_clear_hash_if_not_l4(struct sk_buff *skb)
>  {
> -       if (!skb->l4_hash)
> +       if (!skb_has_l4_hash(skb))
>                 skb_clear_hash(skb);
>  }
>
>  static inline void
> -__skb_set_hash(struct sk_buff *skb, __u32 hash, bool is_sw, bool is_l4)
> +__skb_set_hash(struct sk_buff *skb, __u32 hash, bool is_sw,
> +              enum pkt_hash_types type)
>  {
> -       skb->l4_hash = is_l4;
> +       skb->hash_type = type;
>         skb->sw_hash = is_sw;
>         skb->hash = hash;
>  }
> @@ -1051,13 +1067,13 @@ static inline void
>  skb_set_hash(struct sk_buff *skb, __u32 hash, enum pkt_hash_types type)
>  {
>         /* Used by drivers to set hash from HW */
> -       __skb_set_hash(skb, hash, false, type == PKT_HASH_TYPE_L4);
> +       __skb_set_hash(skb, hash, false, type);
>  }
>
>  static inline void
> -__skb_set_sw_hash(struct sk_buff *skb, __u32 hash, bool is_l4)
> +__skb_set_sw_hash(struct sk_buff *skb, __u32 hash, enum pkt_hash_types type)
>  {
> -       __skb_set_hash(skb, hash, true, is_l4);
> +       __skb_set_hash(skb, hash, true, type);
>  }
>
>  void __skb_get_hash(struct sk_buff *skb);
> @@ -1110,9 +1126,10 @@ static inline bool skb_flow_dissect_flow_keys_buf(struct flow_keys *flow,
>                                   data, proto, nhoff, hlen, flags);
>  }
>
> +

Extra line

>  static inline __u32 skb_get_hash(struct sk_buff *skb)
>  {
> -       if (!skb->l4_hash && !skb->sw_hash)
> +       if (!skb_has_l4_hash(skb) && !skb_has_sw_hash(skb))
>                 __skb_get_hash(skb);
>
>         return skb->hash;
> @@ -1122,11 +1139,12 @@ __u32 __skb_get_hash_flowi6(struct sk_buff *skb, const struct flowi6 *fl6);
>
>  static inline __u32 skb_get_hash_flowi6(struct sk_buff *skb, const struct flowi6 *fl6)
>  {
> -       if (!skb->l4_hash && !skb->sw_hash) {
> +       if (!skb_has_l4_hash(skb) && !skb_has_sw_hash(skb)) {
>                 struct flow_keys keys;
>                 __u32 hash = __get_hash_from_flowi6(fl6, &keys);
>
> -               __skb_set_sw_hash(skb, hash, flow_keys_have_l4(&keys));
> +               __skb_set_sw_hash(skb, hash, flow_keys_have_l4(&keys) ?
> +                                 PKT_HASH_TYPE_L4 : PKT_HASH_TYPE_L3);
>         }
>
>         return skb->hash;
> @@ -1136,11 +1154,12 @@ __u32 __skb_get_hash_flowi4(struct sk_buff *skb, const struct flowi4 *fl);
>
>  static inline __u32 skb_get_hash_flowi4(struct sk_buff *skb, const struct flowi4 *fl4)
>  {
> -       if (!skb->l4_hash && !skb->sw_hash) {
> +       if (!skb_has_l4_hash(skb) && !skb_has_sw_hash(skb)) {
>                 struct flow_keys keys;
>                 __u32 hash = __get_hash_from_flowi4(fl4, &keys);
>
> -               __skb_set_sw_hash(skb, hash, flow_keys_have_l4(&keys));
> +               __skb_set_sw_hash(skb, hash, flow_keys_have_l4(&keys) ?
> +                                 PKT_HASH_TYPE_L4 : PKT_HASH_TYPE_L3);
>         }
>
>         return skb->hash;
> @@ -1157,7 +1176,7 @@ static inline void skb_copy_hash(struct sk_buff *to, const struct sk_buff *from)
>  {
>         to->hash = from->hash;
>         to->sw_hash = from->sw_hash;
> -       to->l4_hash = from->l4_hash;
> +       to->hash_type = from->hash_type;
>  };
>
>  static inline void skb_sender_cpu_clear(struct sk_buff *skb)
> diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
> index 8c8548c..418b8c5 100644
> --- a/include/net/flow_dissector.h
> +++ b/include/net/flow_dissector.h
> @@ -182,6 +182,11 @@ static inline bool flow_keys_have_l4(struct flow_keys *keys)
>         return (keys->ports.ports || keys->tags.flow_label);
>  }
>
> +static inline bool flow_keys_have_l3(struct flow_keys *keys)
> +{
> +       return !!keys->control.addr_type;
> +}
> +
>  u32 flow_hash_from_keys(struct flow_keys *keys);
>
>  #endif
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 255d3e0..21b8cdc 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1828,7 +1828,7 @@ static inline void sock_poll_wait(struct file *filp,
>  static inline void skb_set_hash_from_sk(struct sk_buff *skb, struct sock *sk)
>  {
>         if (sk->sk_txhash) {
> -               skb->l4_hash = 1;
> +               skb->hash_type = PKT_HASH_TYPE_L4;
>                 skb->hash = sk->sk_txhash;
>         }
>  }
> diff --git a/include/trace/events/net.h b/include/trace/events/net.h
> index 49cc7c3..25e7979 100644
> --- a/include/trace/events/net.h
> +++ b/include/trace/events/net.h
> @@ -180,7 +180,7 @@ DECLARE_EVENT_CLASS(net_dev_rx_verbose_template,
>                 __entry->protocol = ntohs(skb->protocol);
>                 __entry->ip_summed = skb->ip_summed;
>                 __entry->hash = skb->hash;
> -               __entry->l4_hash = skb->l4_hash;
> +               __entry->l4_hash = skb->hash_type == PKT_HASH_TYPE_L4;
>                 __entry->len = skb->len;
>                 __entry->data_len = skb->data_len;
>                 __entry->truesize = skb->truesize;
> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> index d79699c..956208b 100644
> --- a/net/core/flow_dissector.c
> +++ b/net/core/flow_dissector.c
> @@ -658,17 +658,30 @@ EXPORT_SYMBOL(make_flow_keys_digest);
>   *
>   * This function calculates a flow hash based on src/dst addresses
>   * and src/dst port numbers.  Sets hash in skb to non-zero hash value
> - * on success, zero indicates no valid hash.  Also, sets l4_hash in skb
> - * if hash is a canonical 4-tuple hash over transport ports.
> + * on success, zero indicates no valid hash in which case the hash type
> + * is set to NONE. If the hash is a canonical 4-tuple hash over transport
> + * ports then type is set to L4. If the hash did not include transport
> + * then type is set to L3, otherwise it is assumed to be L2 only.
>   */
>  void __skb_get_hash(struct sk_buff *skb)
>  {
>         struct flow_keys keys;
> +       u32 hash;
> +       enum pkt_hash_types type;
>
>         __flow_hash_secret_init();
>
> -       __skb_set_sw_hash(skb, ___skb_get_hash(skb, &keys, hashrnd),
> -                         flow_keys_have_l4(&keys));
> +       hash = ___skb_get_hash(skb, &keys, hashrnd);
> +       if (hash == 0)
> +               type = PKT_HASH_TYPE_NONE;
> +       else if (flow_keys_have_l4(&keys))
> +               type = PKT_HASH_TYPE_L4;
> +       else if (flow_keys_have_l3(&keys))
> +               type = PKT_HASH_TYPE_L3;
> +       else
> +               type = PKT_HASH_TYPE_L2;
> +
> +       __skb_set_sw_hash(skb, hash, type);
>  }
>  EXPORT_SYMBOL(__skb_get_hash);
>
> @@ -698,7 +711,8 @@ __u32 __skb_get_hash_flowi6(struct sk_buff *skb, const struct flowi6 *fl6)
>         keys.basic.ip_proto = fl6->flowi6_proto;
>
>         __skb_set_sw_hash(skb, flow_hash_from_keys(&keys),
> -                         flow_keys_have_l4(&keys));
> +                         flow_keys_have_l4(&keys) ?
> +                         PKT_HASH_TYPE_L4 : PKT_HASH_TYPE_L3);
>
>         return skb->hash;
>  }
> @@ -719,7 +733,8 @@ __u32 __skb_get_hash_flowi4(struct sk_buff *skb, const struct flowi4 *fl4)
>         keys.basic.ip_proto = fl4->flowi4_proto;
>
>         __skb_set_sw_hash(skb, flow_hash_from_keys(&keys),
> -                         flow_keys_have_l4(&keys));
> +                         flow_keys_have_l4(&keys) ?
> +                         PKT_HASH_TYPE_L4 : PKT_HASH_TYPE_L3);
>
>         return skb->hash;
>  }
> --
> 2.1.4
>

^ permalink raw reply	[flat|nested] 19+ messages in thread
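Assuming the patch above, a short usage sketch of the new accessors.
skb_set_hash() already exists with this signature; skb_hash_type() is
added by the patch; the example functions themselves are hypothetical.

        /* A driver records the full type reported by its hardware... */
        static void example_rx_record_hash(struct sk_buff *skb,
                                           u32 hw_hash, bool l4)
        {
                skb_set_hash(skb, hw_hash,
                             l4 ? PKT_HASH_TYPE_L4 : PKT_HASH_TYPE_L3);
        }

        /* ...and a consumer such as xen-netback can later recover the
         * full type, not merely whether it was a canonical L4 hash.
         */
        static bool example_hash_is_flow_based(struct sk_buff *skb)
        {
                enum pkt_hash_types type = skb_hash_type(skb);

                return type == PKT_HASH_TYPE_L4 || type == PKT_HASH_TYPE_L3;
        }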

* RE: [PATCH net-next v1 0/8] xen-netback: support toeplitz hashing
  2016-02-14 22:01         ` Tom Herbert
@ 2016-02-15  8:48           ` Paul Durrant
  0 siblings, 0 replies; 19+ messages in thread
From: Paul Durrant @ 2016-02-15  8:48 UTC (permalink / raw)
  To: Tom Herbert
  Cc: David Miller, netdev@vger.kernel.org,
	xen-devel@lists.xenproject.org

> -----Original Message-----
> From: Tom Herbert [mailto:tom@herbertland.com]
> Sent: 14 February 2016 22:02
> To: Paul Durrant
> Cc: David Miller; netdev@vger.kernel.org; xen-devel@lists.xenproject.org
> Subject: Re: [PATCH net-next v1 0/8] xen-netback: support toeplitz hashing
> 
> On Fri, Feb 12, 2016 at 5:59 PM, Paul Durrant <Paul.Durrant@citrix.com>
> wrote:
> >> -----Original Message-----
> >> From: David Miller [mailto:davem@davemloft.net]
> >> Sent: 12 February 2016 16:42
> >> To: Paul Durrant
> >> Cc: netdev@vger.kernel.org; xen-devel@lists.xenproject.org
> >> Subject: Re: [PATCH net-next v1 0/8] xen-netback: support toeplitz
> hashing
> >>
> >> From: Paul Durrant <Paul.Durrant@citrix.com>
> >> Date: Fri, 12 Feb 2016 11:07:50 +0000
> >>
> >> > Windows *requires* use of Toeplitz so your position completely rules
> >> > out being able to support receive side scaling in Windows PV
> >> > frontends on Linux kernel backends, not only for Xen but for any
> >> > other hypervisor, which I think is totally unreasonable.
> >>
> >> We have strong reason to believe that Toeplitz has properties that
> >> make it very weak if not exploitable, therefore doing software RSS
> >> with Jenkins is superior.
> >
> > I don't disagree, but there is really no choice of algorithm where a
> > Windows frontend is concerned. My patches only add Toeplitz hashing
> > into xen-netback according to the specification in xen's netif.h; the
> > algorithm is not exposed for use by any other part of the kernel.
> 
> You are touching struct sk_buff for this. Changing such a core
> networking data structure just to satisfy a requirement of Windows
> seems like a long shot to me...

The patch to sk_buff has nothing to do with Toeplitz hashing. It's merely because I noticed that sk_buff is currently unable to store complete hash type information, which seems suboptimal. I've re-worded and re-sent that patch separately.

> 
> > If it would help I can add a module parameter to xen-netback to control
> > advertising the option of the hash to frontends so that it can be
> > globally disabled if it is deployed on a host where there are no
> > Windows guests.
> 
> Toeplitz is dog-slow to compute in software. Comparing your
> implementation to Jenkins hash (jhash), it's about 12x slower (102 nsecs
> versus 8 nsecs for IPv4 on my system). Doing this for every packet
> seems like a non-starter to me. I suspect the majority use case for
> receive is simple uniform load balancing across the virtual queues, in
> which case jhash will be perfectly fine. The case where it probably
> matters is if the Windows OS is trying to configure some non-uniform
> distribution (isolating a single flow, for instance). To handle this
> case, you could implement a simple lookup flow table to map an
> skb->hash to a Toeplitz hash, so that the Toeplitz hash only needs to
> be calculated once for a flow (recomputed with some periodic
> timeout). This would be probabilistic though, so if Windows really
> requires the hash to be set correctly 100% of the time it won't work.
> 

I agree that Toeplitz is dog-slow to calculate in s/w. Keeping a cache of the N most recent flows and flushing it if the key changes (which is the only thing that's going to cause the hash to change) sounds like a good optimization. I'll certainly add that.

  Paul


^ permalink raw reply	[flat|nested] 19+ messages in thread
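In outline, the cache Tom proposes and Paul agrees to add might look
like the following hypothetical sketch: index a small table by the
cheap software hash, reuse the stored Toeplitz value on a hit, and
flush the table whenever the key is rewritten. As Tom notes, this is
probabilistic: distinct flows whose software hashes collide would
alias. All names and sizes here are assumptions.

        #define HASH_CACHE_BITS 7
        #define HASH_CACHE_SIZE (1u << HASH_CACHE_BITS)

        struct toeplitz_cache_entry {
                u32  sw_hash;           /* skb_get_hash() value for the flow */
                u32  toeplitz;          /* cached Toeplitz result */
                bool valid;
        };

        static struct toeplitz_cache_entry cache[HASH_CACHE_SIZE];

        static u32 cached_toeplitz(struct sk_buff *skb,
                                   u32 (*compute)(struct sk_buff *))
        {
                u32 sw = skb_get_hash(skb);     /* fast Jenkins-based hash */
                struct toeplitz_cache_entry *e =
                        &cache[sw & (HASH_CACHE_SIZE - 1)];

                if (!e->valid || e->sw_hash != sw) {
                        e->sw_hash = sw;
                        e->toeplitz = compute(skb);     /* the slow path */
                        e->valid = true;
                }
                return e->toeplitz;
        }

        /* Called when the frontend rewrites the key, the only event
         * that invalidates cached values:
         */
        static void toeplitz_cache_flush(void)
        {
                memset(cache, 0, sizeof(cache));
        }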

* RE: [PATCH net-next v1 1/8] skbuff: store hash type in socket buffer...
  2016-02-14 22:25   ` Tom Herbert
@ 2016-02-15  8:50     ` Paul Durrant
  0 siblings, 0 replies; 19+ messages in thread
From: Paul Durrant @ 2016-02-15  8:50 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Linux Kernel Network Developers, xen-devel@lists.xenproject.org,
	David S. Miller, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek

> -----Original Message-----
> From: Tom Herbert [mailto:tom@herbertland.com]
> Sent: 14 February 2016 22:25
> To: Paul Durrant
> Cc: Linux Kernel Network Developers; xen-devel@lists.xenproject.org; David
> S. Miller; Jay Vosburgh; Veaceslav Falico; Andy Gospodarek
> Subject: Re: [PATCH net-next v1 1/8] skbuff: store hash type in socket
> buffer...
> 
> On Fri, Feb 12, 2016 at 11:13 AM, Paul Durrant <paul.durrant@citrix.com>
> wrote:
> > ...rather than a boolean merely indicating a canonical L4 hash.
> >
> > skb_set_hash() takes a hash type (from enum pkt_hash_types) as an
> > argument but information is lost since only a single bit in the skb
> > stores whether that hash type is PKT_HASH_TYPE_L4 or not.
> >
> > By using two bits it's possible to store the complete hash type
> > information for use by drivers. A subsequent patch to xen-netback
> > needs this information to forward packet hashes to VM frontend drivers.
> >
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> > Cc: "David S. Miller" <davem@davemloft.net>
> > Cc: Jay Vosburgh <j.vosburgh@gmail.com>
> > Cc: Veaceslav Falico <vfalico@gmail.com>
> > Cc: Andy Gospodarek <gospo@cumulusnetworks.com>
> > ---
> >  drivers/net/bonding/bond_main.c |  2 +-
> >  include/linux/skbuff.h          | 53 ++++++++++++++++++++++++++++-------------
> >  include/net/flow_dissector.h    |  5 ++++
> >  include/net/sock.h              |  2 +-
> >  include/trace/events/net.h      |  2 +-
> >  net/core/flow_dissector.c       | 27 ++++++++++++++++-----
> >  6 files changed, 65 insertions(+), 26 deletions(-)
> >
> > diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> > index 45bdd87..ad0ee00 100644
> > --- a/drivers/net/bonding/bond_main.c
> > +++ b/drivers/net/bonding/bond_main.c
> > @@ -3173,7 +3173,7 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
> >         u32 hash;
> >
> >         if (bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP34 &&
> > -           skb->l4_hash)
> > +           skb_has_l4_hash(skb))
> >                 return skb->hash;
> >
> >         if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER2 ||
> > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > index 6ec86f1..9021b52 100644
> > --- a/include/linux/skbuff.h
> > +++ b/include/linux/skbuff.h
> > @@ -595,8 +595,7 @@ static inline bool skb_mstamp_after(const struct skb_mstamp *t1,
> >   *     @xmit_more: More SKBs are pending for this queue
> >   *     @ndisc_nodetype: router type (from link layer)
> >   *     @ooo_okay: allow the mapping of a socket to a queue to be changed
> > - *     @l4_hash: indicate hash is a canonical 4-tuple hash over transport
> > - *             ports.
> > + *     @hash_type: indicates type of hash (see enum pkt_hash_types below)
> >   *     @sw_hash: indicates hash was computed in software stack
> >   *     @wifi_acked_valid: wifi_acked was set
> >   *     @wifi_acked: whether frame was acked on wifi or not
> > @@ -701,10 +700,10 @@ struct sk_buff {
> >         __u8                    nf_trace:1;
> >         __u8                    ip_summed:2;
> >         __u8                    ooo_okay:1;
> > -       __u8                    l4_hash:1;
> >         __u8                    sw_hash:1;
> >         __u8                    wifi_acked_valid:1;
> >         __u8                    wifi_acked:1;
> > +       /* 1 bit hole */
> >
> >         __u8                    no_fcs:1;
> >         /* Indicates the inner headers are valid in the skbuff. */
> > @@ -721,7 +720,8 @@ struct sk_buff {
> >         __u8                    ipvs_property:1;
> >         __u8                    inner_protocol_type:1;
> >         __u8                    remcsum_offload:1;
> > -       /* 3 or 5 bit hole */
> > +       __u8                    hash_type:2;
> 
> This isn't needed by the stack -- we don't differentiate between L2 and
> L3 hashes anywhere. Also, it doesn't help in the xen case of passing a
> hash to Windows without another bit to indicate that the hash is
> indeed Toeplitz.
> 

I hadn't read this before I split the patch and re-posted...

Given that the header defines an enum for hash types that *does* distinguish between L2 and L3, I find this confusing. Shouldn't that enum be removed if no one cares?

  Paul
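For reference, the enum Paul refers to, as defined in
include/linux/skbuff.h at the time:

        enum pkt_hash_types {
                PKT_HASH_TYPE_NONE,     /* Undefined type */
                PKT_HASH_TYPE_L2,       /* Input: src_MAC, dest_MAC */
                PKT_HASH_TYPE_L3,       /* Input: src_IP, dst_IP */
                PKT_HASH_TYPE_L4,       /* Input: src_IP, dst_IP, src_port, dst_port */
        };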

> [remainder of quoted patch trimmed; the full text, including Tom's
> second inline comment, appears in his message above]

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread

Thread overview: 19+ messages
2016-02-12 10:13 [PATCH net-next v1 0/8] xen-netback: support toeplitz hashing Paul Durrant
2016-02-12 10:13 ` [PATCH net-next v1 1/8] skbuff: store hash type in socket buffer Paul Durrant
2016-02-14 22:25   ` Tom Herbert
2016-02-15  8:50     ` Paul Durrant
2016-02-12 10:13 ` [PATCH net-next v1 2/8] xen-netback: re-import canonical netif header Paul Durrant
2016-02-12 10:13 ` [PATCH net-next v1 3/8] xen-netback: support multiple extra info fragments passed from frontend Paul Durrant
2016-02-12 10:13 ` [PATCH net-next v1 4/8] xen-netback: reduce log spam Paul Durrant
2016-02-12 10:13 ` [PATCH net-next v1 5/8] xen-netback: add support for the control ring Paul Durrant
2016-02-12 10:13 ` [PATCH net-next v1 6/8] xen-netback: add an implementation of toeplitz hashing Paul Durrant
2016-02-14 22:13   ` Tom Herbert
2016-02-12 10:13 ` [PATCH net-next v1 7/8] xen-netback: pass toeplitz hash value to the frontend Paul Durrant
2016-02-14 22:17   ` Tom Herbert
2016-02-12 10:13 ` [PATCH net-next v1 8/8] xen-netback: use toeplitz hash value from " Paul Durrant
2016-02-12 10:54 ` [PATCH net-next v1 0/8] xen-netback: support toeplitz hashing David Miller
2016-02-12 11:07   ` Paul Durrant
2016-02-12 16:41     ` David Miller
2016-02-12 16:59       ` Paul Durrant
2016-02-14 22:01         ` Tom Herbert
2016-02-15  8:48           ` Paul Durrant
