netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/8 v8] Open vSwitch upcall optimiziations
@ 2013-11-30 12:21 Thomas Graf
  2013-11-30 12:21 ` [net-next 1/7] genl: Add genlmsg_new_unicast() for unicast message allocation Thomas Graf
                   ` (6 more replies)
  0 siblings, 7 replies; 13+ messages in thread
From: Thomas Graf @ 2013-11-30 12:21 UTC (permalink / raw)
  To: jesse, davem
  Cc: dev, netdev, dborkman, ffusco, fleitner, eric.dumazet, bhutchings

Reposting this before the merge window as it will go through Jesse's
tree.

Given jumbo frames, the capacity of the slow path is improved by 
a factor of 2.5x.

V8: - Dropped patch adding NLM_F_REPLACE support, I'll pursue this in a
      separate patch series. Addresses Jesse's comments.
    - Improved comment describing OVS_DATAPATH_VERSION bump.
V7: - removed unintential kernel-doc comment
    - WARN_ONCE() -> WARN(), message on single line, added \n
V6: - Added memory mapped netlink i/o support
    - Drop user_features if old user space not aware of user features
      reuses an existing datapath
V5: - Removed padding requirement in user space
    - Added OVS_DP_F_UNALIGNED flag let user space signal ability to
      receive unaligned Netlink messages, fall back to linear copy
      otherwise.
V4: - Daniel Borkmann pointed out that the style in skbuff.h has changed
      in net-next, adapted to no longer using extern in headers.
V3: - Removed unneeded alignment of nlmsg_len after padding
V2: - Added skb_zerocopy_headlen() to calculate headroom of destination
      buffer. This also takes care of the from->head_frag issue.
    - Attribute alignment for frag_list case
    - API documentation
    - performance data for CHECKSUM_PARTIAL tx case

Thomas Graf (7):
  genl: Add genlmsg_new_unicast() for unicast message allocation
  netlink: Avoid netlink mmap alloc if msg size exceeds frame size
  openvswitch: Enable memory mapped Netlink i/o
  net: Export skb_zerocopy() to zerocopy from one skb to another
  openvswitch: Allow user space to announce ability to accept unaligned
    Netlink messages
  openvswitch: Drop user features if old user space attempted to create
    datapath
  openvswitch: Use skb_zerocopy() for upcall

 include/linux/skbuff.h               |   3 +
 include/net/genetlink.h              |   4 +
 include/uapi/linux/openvswitch.h     |  14 +++-
 net/core/skbuff.c                    |  85 +++++++++++++++++++++
 net/netfilter/nfnetlink_queue_core.c |  59 +--------------
 net/netlink/af_netlink.c             |   4 +
 net/netlink/genetlink.c              |  21 +++++
 net/openvswitch/datapath.c           | 143 ++++++++++++++++++++++++-----------
 net/openvswitch/datapath.h           |   2 +
 9 files changed, 236 insertions(+), 99 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [net-next 1/7] genl: Add genlmsg_new_unicast() for unicast message allocation
  2013-11-30 12:21 [PATCH net-next 0/8 v8] Open vSwitch upcall optimiziations Thomas Graf
@ 2013-11-30 12:21 ` Thomas Graf
       [not found] ` <cover.1385813891.git.tgraf-G/eBtMaohhA@public.gmane.org>
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Thomas Graf @ 2013-11-30 12:21 UTC (permalink / raw)
  To: jesse, davem
  Cc: dev, netdev, dborkman, ffusco, fleitner, eric.dumazet, bhutchings

Allocates a new sk_buff large enough to cover the specified payload
plus required Netlink headers. Will check receiving socket for
memory mapped i/o capability and use it if enabled. Will fall back
to non-mapped skb if message size exceeds the frame size of the ring.

Signed-of-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
---
 include/net/genetlink.h |  4 ++++
 net/netlink/genetlink.c | 21 +++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/include/net/genetlink.h b/include/net/genetlink.h
index 1b177ed..93695f0 100644
--- a/include/net/genetlink.h
+++ b/include/net/genetlink.h
@@ -73,6 +73,7 @@ struct genl_family {
  * @attrs: netlink attributes
  * @_net: network namespace
  * @user_ptr: user pointers
+ * @dst_sk: destination socket
  */
 struct genl_info {
 	u32			snd_seq;
@@ -85,6 +86,7 @@ struct genl_info {
 	struct net *		_net;
 #endif
 	void *			user_ptr[2];
+	struct sock *		dst_sk;
 };
 
 static inline struct net *genl_info_net(struct genl_info *info)
@@ -177,6 +179,8 @@ void genl_notify(struct genl_family *family,
 		 struct sk_buff *skb, struct net *net, u32 portid,
 		 u32 group, struct nlmsghdr *nlh, gfp_t flags);
 
+struct sk_buff *genlmsg_new_unicast(size_t payload, struct genl_info *info,
+				    gfp_t flags);
 void *genlmsg_put(struct sk_buff *skb, u32 portid, u32 seq,
 		  struct genl_family *family, int flags, u8 cmd);
 
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index 713671a..b1dcdb9 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -461,6 +461,26 @@ int genl_unregister_family(struct genl_family *family)
 EXPORT_SYMBOL(genl_unregister_family);
 
 /**
+ * genlmsg_new_unicast - Allocate generic netlink message for unicast
+ * @payload: size of the message payload
+ * @info: information on destination
+ * @flags: the type of memory to allocate
+ *
+ * Allocates a new sk_buff large enough to cover the specified payload
+ * plus required Netlink headers. Will check receiving socket for
+ * memory mapped i/o capability and use it if enabled. Will fall back
+ * to non-mapped skb if message size exceeds the frame size of the ring.
+ */
+struct sk_buff *genlmsg_new_unicast(size_t payload, struct genl_info *info,
+				    gfp_t flags)
+{
+	size_t len = nlmsg_total_size(genlmsg_total_size(payload));
+
+	return netlink_alloc_skb(info->dst_sk, len, info->snd_portid, flags);
+}
+EXPORT_SYMBOL_GPL(genlmsg_new_unicast);
+
+/**
  * genlmsg_put - Add generic netlink header to netlink message
  * @skb: socket buffer holding the message
  * @portid: netlink portid the message is addressed to
@@ -600,6 +620,7 @@ static int genl_family_rcv_msg(struct genl_family *family,
 	info.genlhdr = nlmsg_data(nlh);
 	info.userhdr = nlmsg_data(nlh) + GENL_HDRLEN;
 	info.attrs = attrbuf;
+	info.dst_sk = skb->sk;
 	genl_info_net_set(&info, net);
 	memset(&info.user_ptr, 0, sizeof(info.user_ptr));
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [net-next 2/7] netlink: Avoid netlink mmap alloc if msg size exceeds frame size
       [not found] ` <cover.1385813891.git.tgraf-G/eBtMaohhA@public.gmane.org>
@ 2013-11-30 12:21   ` Thomas Graf
  0 siblings, 0 replies; 13+ messages in thread
From: Thomas Graf @ 2013-11-30 12:21 UTC (permalink / raw)
  To: jesse-l0M0P4e3n4LQT0dZR+AlfA, davem-fT/PcQaiUtIeIZ0/mPfg9Q
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, fleitner-H+wXaHxf7aLQT0dZR+AlfA,
	eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w,
	netdev-u79uwXL29TY76Z2rM5mHXA, bhutchings-s/n/eUQHGBpZroRs9YW3xA

An insufficent ring frame size configuration can lead to an
unnecessary skb allocation for every Netlink message. Check frame
size before taking the queue lock and allocating the skb and
re-check with lock to be safe.

Signed-off-by: Thomas Graf <tgraf-G/eBtMaohhA@public.gmane.org>
Reviewed-by: Daniel Borkmann <dborkman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 net/netlink/af_netlink.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index bca50b9..6433489 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1769,6 +1769,9 @@ struct sk_buff *netlink_alloc_skb(struct sock *ssk, unsigned int size,
 	if (ring->pg_vec == NULL)
 		goto out_put;
 
+	if (ring->frame_size - NL_MMAP_HDRLEN < size)
+		goto out_put;
+
 	skb = alloc_skb_head(gfp_mask);
 	if (skb == NULL)
 		goto err1;
@@ -1778,6 +1781,7 @@ struct sk_buff *netlink_alloc_skb(struct sock *ssk, unsigned int size,
 	if (ring->pg_vec == NULL)
 		goto out_free;
 
+	/* check again under lock */
 	maxlen = ring->frame_size - NL_MMAP_HDRLEN;
 	if (maxlen < size)
 		goto out_free;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [net-next 3/7] openvswitch: Enable memory mapped Netlink i/o
  2013-11-30 12:21 [PATCH net-next 0/8 v8] Open vSwitch upcall optimiziations Thomas Graf
  2013-11-30 12:21 ` [net-next 1/7] genl: Add genlmsg_new_unicast() for unicast message allocation Thomas Graf
       [not found] ` <cover.1385813891.git.tgraf-G/eBtMaohhA@public.gmane.org>
@ 2013-11-30 12:21 ` Thomas Graf
       [not found]   ` <9f7f05a726935c434d43d92c2908a997c403725b.1385813891.git.tgraf-G/eBtMaohhA@public.gmane.org>
  2013-12-04  5:45   ` Jesse Gross
  2013-11-30 12:21 ` [net-next 4/7] net: Export skb_zerocopy() to zerocopy from one skb to another Thomas Graf
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 13+ messages in thread
From: Thomas Graf @ 2013-11-30 12:21 UTC (permalink / raw)
  To: jesse, davem
  Cc: dev, netdev, dborkman, ffusco, fleitner, eric.dumazet, bhutchings

Use memory mapped Netlink i/o for all unicast openvswitch
communication if a ring has been set up.

Benchmark
  * pktgen -> ovs internal port
  * 5M pkts, 5M flows
  * 4 threads, 8 cores

Before:
Result: OK: 67418743(c67108212+d310530) usec, 5000000 (9000byte,0frags)
  74163pps 5339Mb/sec (5339736000bps) errors: 0
	+   2.98%     ovs-vswitchd  [k] copy_user_generic_string
	+   2.49%     ovs-vswitchd  [k] memcpy
	+   1.84%       kpktgend_2  [k] memcpy
	+   1.81%       kpktgend_1  [k] memcpy
	+   1.81%       kpktgend_3  [k] memcpy
	+   1.78%       kpktgend_0  [k] memcpy

After:
Result: OK: 24229690(c24127165+d102524) usec, 5000000 (9000byte,0frags)
  206358pps 14857Mb/sec (14857776000bps) errors: 0
	+   2.80%     ovs-vswitchd  [k] memcpy
	+   1.31%       kpktgend_2  [k] memcpy
	+   1.23%       kpktgend_0  [k] memcpy
	+   1.09%       kpktgend_1  [k] memcpy
	+   1.04%       kpktgend_3  [k] memcpy
	+   0.96%     ovs-vswitchd  [k] copy_user_generic_string

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
---
 net/openvswitch/datapath.c | 56 ++++++++++++++++++++++++----------------------
 1 file changed, 29 insertions(+), 27 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 6f5e1dd..0ac9cde 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -402,6 +402,11 @@ static int queue_userspace_packet(struct net *net, int dp_ifindex,
 	struct sk_buff *nskb = NULL;
 	struct sk_buff *user_skb; /* to be queued to userspace */
 	struct nlattr *nla;
+	struct genl_info info = {
+		.dst_sk = net->genl_sock,
+		.snd_portid = upcall_info->portid,
+	};
+	size_t len;
 	int err;
 
 	if (vlan_tx_tag_present(skb)) {
@@ -422,7 +427,8 @@ static int queue_userspace_packet(struct net *net, int dp_ifindex,
 		goto out;
 	}
 
-	user_skb = genlmsg_new(upcall_msg_size(skb, upcall_info->userdata), GFP_ATOMIC);
+	len = upcall_msg_size(skb, upcall_info->userdata);
+	user_skb = genlmsg_new_unicast(len, &info, GFP_ATOMIC);
 	if (!user_skb) {
 		err = -ENOMEM;
 		goto out;
@@ -726,27 +732,30 @@ error:
 	return err;
 }
 
-static struct sk_buff *ovs_flow_cmd_alloc_info(struct sw_flow *flow)
+static struct sk_buff *ovs_flow_cmd_alloc_info(struct sw_flow *flow,
+					       struct genl_info *info)
 {
-	const struct sw_flow_actions *sf_acts;
+	size_t len;
 
-	sf_acts = ovsl_dereference(flow->sf_acts);
+	len = ovs_flow_cmd_msg_size(ovsl_dereference(flow->sf_acts));
 
-	return genlmsg_new(ovs_flow_cmd_msg_size(sf_acts), GFP_KERNEL);
+	return genlmsg_new_unicast(len, info, GFP_KERNEL);
 }
 
 static struct sk_buff *ovs_flow_cmd_build_info(struct sw_flow *flow,
 					       struct datapath *dp,
-					       u32 portid, u32 seq, u8 cmd)
+					       struct genl_info *info,
+					       u8 cmd)
 {
 	struct sk_buff *skb;
 	int retval;
 
-	skb = ovs_flow_cmd_alloc_info(flow);
+	skb = ovs_flow_cmd_alloc_info(flow, info);
 	if (!skb)
 		return ERR_PTR(-ENOMEM);
 
-	retval = ovs_flow_cmd_fill_info(flow, dp, skb, portid, seq, 0, cmd);
+	retval = ovs_flow_cmd_fill_info(flow, dp, skb, info->snd_portid,
+					info->snd_seq, 0, cmd);
 	BUG_ON(retval < 0);
 	return skb;
 }
@@ -835,8 +844,7 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info)
 			goto err_flow_free;
 		}
 
-		reply = ovs_flow_cmd_build_info(flow, dp, info->snd_portid,
-						info->snd_seq, OVS_FLOW_CMD_NEW);
+		reply = ovs_flow_cmd_build_info(flow, dp, info, OVS_FLOW_CMD_NEW);
 	} else {
 		/* We found a matching flow. */
 		struct sw_flow_actions *old_acts;
@@ -864,8 +872,7 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info)
 		rcu_assign_pointer(flow->sf_acts, acts);
 		ovs_nla_free_flow_actions(old_acts);
 
-		reply = ovs_flow_cmd_build_info(flow, dp, info->snd_portid,
-					       info->snd_seq, OVS_FLOW_CMD_NEW);
+		reply = ovs_flow_cmd_build_info(flow, dp, info, OVS_FLOW_CMD_NEW);
 
 		/* Clear stats. */
 		if (a[OVS_FLOW_ATTR_CLEAR]) {
@@ -927,8 +934,7 @@ static int ovs_flow_cmd_get(struct sk_buff *skb, struct genl_info *info)
 		goto unlock;
 	}
 
-	reply = ovs_flow_cmd_build_info(flow, dp, info->snd_portid,
-					info->snd_seq, OVS_FLOW_CMD_NEW);
+	reply = ovs_flow_cmd_build_info(flow, dp, info, OVS_FLOW_CMD_NEW);
 	if (IS_ERR(reply)) {
 		err = PTR_ERR(reply);
 		goto unlock;
@@ -975,7 +981,7 @@ static int ovs_flow_cmd_del(struct sk_buff *skb, struct genl_info *info)
 		goto unlock;
 	}
 
-	reply = ovs_flow_cmd_alloc_info(flow);
+	reply = ovs_flow_cmd_alloc_info(flow, info);
 	if (!reply) {
 		err = -ENOMEM;
 		goto unlock;
@@ -1127,17 +1133,17 @@ error:
 	return -EMSGSIZE;
 }
 
-static struct sk_buff *ovs_dp_cmd_build_info(struct datapath *dp, u32 portid,
-					     u32 seq, u8 cmd)
+static struct sk_buff *ovs_dp_cmd_build_info(struct datapath *dp,
+					     struct genl_info *info, u8 cmd)
 {
 	struct sk_buff *skb;
 	int retval;
 
-	skb = genlmsg_new(ovs_dp_cmd_msg_size(), GFP_KERNEL);
+	skb = genlmsg_new_unicast(ovs_dp_cmd_msg_size(), info, GFP_KERNEL);
 	if (!skb)
 		return ERR_PTR(-ENOMEM);
 
-	retval = ovs_dp_cmd_fill_info(dp, skb, portid, seq, 0, cmd);
+	retval = ovs_dp_cmd_fill_info(dp, skb, info->snd_portid, info->snd_seq, 0, cmd);
 	if (retval < 0) {
 		kfree_skb(skb);
 		return ERR_PTR(retval);
@@ -1232,8 +1238,7 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 		goto err_destroy_ports_array;
 	}
 
-	reply = ovs_dp_cmd_build_info(dp, info->snd_portid,
-				      info->snd_seq, OVS_DP_CMD_NEW);
+	reply = ovs_dp_cmd_build_info(dp, info, OVS_DP_CMD_NEW);
 	err = PTR_ERR(reply);
 	if (IS_ERR(reply))
 		goto err_destroy_local_port;
@@ -1299,8 +1304,7 @@ static int ovs_dp_cmd_del(struct sk_buff *skb, struct genl_info *info)
 	if (IS_ERR(dp))
 		goto unlock;
 
-	reply = ovs_dp_cmd_build_info(dp, info->snd_portid,
-				      info->snd_seq, OVS_DP_CMD_DEL);
+	reply = ovs_dp_cmd_build_info(dp, info, OVS_DP_CMD_DEL);
 	err = PTR_ERR(reply);
 	if (IS_ERR(reply))
 		goto unlock;
@@ -1328,8 +1332,7 @@ static int ovs_dp_cmd_set(struct sk_buff *skb, struct genl_info *info)
 	if (IS_ERR(dp))
 		goto unlock;
 
-	reply = ovs_dp_cmd_build_info(dp, info->snd_portid,
-				      info->snd_seq, OVS_DP_CMD_NEW);
+	reply = ovs_dp_cmd_build_info(dp, info, OVS_DP_CMD_NEW);
 	if (IS_ERR(reply)) {
 		err = PTR_ERR(reply);
 		genl_set_err(&dp_datapath_genl_family, sock_net(skb->sk), 0,
@@ -1360,8 +1363,7 @@ static int ovs_dp_cmd_get(struct sk_buff *skb, struct genl_info *info)
 		goto unlock;
 	}
 
-	reply = ovs_dp_cmd_build_info(dp, info->snd_portid,
-				      info->snd_seq, OVS_DP_CMD_NEW);
+	reply = ovs_dp_cmd_build_info(dp, info, OVS_DP_CMD_NEW);
 	if (IS_ERR(reply)) {
 		err = PTR_ERR(reply);
 		goto unlock;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [net-next 4/7] net: Export skb_zerocopy() to zerocopy from one skb to another
  2013-11-30 12:21 [PATCH net-next 0/8 v8] Open vSwitch upcall optimiziations Thomas Graf
                   ` (2 preceding siblings ...)
  2013-11-30 12:21 ` [net-next 3/7] openvswitch: Enable memory mapped Netlink i/o Thomas Graf
@ 2013-11-30 12:21 ` Thomas Graf
  2013-11-30 12:21 ` [net-next 5/7] openvswitch: Allow user space to announce ability to accept unaligned Netlink messages Thomas Graf
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Thomas Graf @ 2013-11-30 12:21 UTC (permalink / raw)
  To: jesse, davem
  Cc: dev, netdev, dborkman, ffusco, fleitner, eric.dumazet, bhutchings

Make the skb zerocopy logic written for nfnetlink queue available for
use by other modules.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
---
 include/linux/skbuff.h               |  3 ++
 net/core/skbuff.c                    | 85 ++++++++++++++++++++++++++++++++++++
 net/netfilter/nfnetlink_queue_core.c | 59 ++-----------------------
 3 files changed, 92 insertions(+), 55 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index bec1cc7..7c48e2d 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2345,6 +2345,9 @@ int skb_splice_bits(struct sk_buff *skb, unsigned int offset,
 		    struct pipe_inode_info *pipe, unsigned int len,
 		    unsigned int flags);
 void skb_copy_and_csum_dev(const struct sk_buff *skb, u8 *to);
+unsigned int skb_zerocopy_headlen(const struct sk_buff *from);
+void skb_zerocopy(struct sk_buff *to, const struct sk_buff *from,
+		  int len, int hlen);
 void skb_split(struct sk_buff *skb, struct sk_buff *skb1, const u32 len);
 int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, int shiftlen);
 void skb_scrub_packet(struct sk_buff *skb, bool xnet);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 2718fed..55859cb 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2122,6 +2122,91 @@ __wsum skb_copy_and_csum_bits(const struct sk_buff *skb, int offset,
 }
 EXPORT_SYMBOL(skb_copy_and_csum_bits);
 
+ /**
+ *	skb_zerocopy_headlen - Calculate headroom needed for skb_zerocopy()
+ *	@from: source buffer
+ *
+ *	Calculates the amount of linear headroom needed in the 'to' skb passed
+ *	into skb_zerocopy().
+ */
+unsigned int
+skb_zerocopy_headlen(const struct sk_buff *from)
+{
+	unsigned int hlen = 0;
+
+	if (!from->head_frag ||
+	    skb_headlen(from) < L1_CACHE_BYTES ||
+	    skb_shinfo(from)->nr_frags >= MAX_SKB_FRAGS)
+		hlen = skb_headlen(from);
+
+	if (skb_has_frag_list(from))
+		hlen = from->len;
+
+	return hlen;
+}
+EXPORT_SYMBOL_GPL(skb_zerocopy_headlen);
+
+/**
+ *	skb_zerocopy - Zero copy skb to skb
+ *	@to: destination buffer
+ *	@source: source buffer
+ *	@len: number of bytes to copy from source buffer
+ *	@hlen: size of linear headroom in destination buffer
+ *
+ *	Copies up to `len` bytes from `from` to `to` by creating references
+ *	to the frags in the source buffer.
+ *
+ *	The `hlen` as calculated by skb_zerocopy_headlen() specifies the
+ *	headroom in the `to` buffer.
+ */
+void
+skb_zerocopy(struct sk_buff *to, const struct sk_buff *from, int len, int hlen)
+{
+	int i, j = 0;
+	int plen = 0; /* length of skb->head fragment */
+	struct page *page;
+	unsigned int offset;
+
+	BUG_ON(!from->head_frag && !hlen);
+
+	/* dont bother with small payloads */
+	if (len <= skb_tailroom(to)) {
+		skb_copy_bits(from, 0, skb_put(to, len), len);
+		return;
+	}
+
+	if (hlen) {
+		skb_copy_bits(from, 0, skb_put(to, hlen), hlen);
+		len -= hlen;
+	} else {
+		plen = min_t(int, skb_headlen(from), len);
+		if (plen) {
+			page = virt_to_head_page(from->head);
+			offset = from->data - (unsigned char *)page_address(page);
+			__skb_fill_page_desc(to, 0, page, offset, plen);
+			get_page(page);
+			j = 1;
+			len -= plen;
+		}
+	}
+
+	to->truesize += len + plen;
+	to->len += len + plen;
+	to->data_len += len + plen;
+
+	for (i = 0; i < skb_shinfo(from)->nr_frags; i++) {
+		if (!len)
+			break;
+		skb_shinfo(to)->frags[j] = skb_shinfo(from)->frags[i];
+		skb_shinfo(to)->frags[j].size = min_t(int, skb_shinfo(to)->frags[j].size, len);
+		len -= skb_shinfo(to)->frags[j].size;
+		skb_frag_ref(to, j);
+		j++;
+	}
+	skb_shinfo(to)->nr_frags = j;
+}
+EXPORT_SYMBOL_GPL(skb_zerocopy);
+
 void skb_copy_and_csum_dev(const struct sk_buff *skb, u8 *to)
 {
 	__wsum csum;
diff --git a/net/netfilter/nfnetlink_queue_core.c b/net/netfilter/nfnetlink_queue_core.c
index 21258cf..615ee12 100644
--- a/net/netfilter/nfnetlink_queue_core.c
+++ b/net/netfilter/nfnetlink_queue_core.c
@@ -235,51 +235,6 @@ nfqnl_flush(struct nfqnl_instance *queue, nfqnl_cmpfn cmpfn, unsigned long data)
 	spin_unlock_bh(&queue->lock);
 }
 
-static void
-nfqnl_zcopy(struct sk_buff *to, const struct sk_buff *from, int len, int hlen)
-{
-	int i, j = 0;
-	int plen = 0; /* length of skb->head fragment */
-	struct page *page;
-	unsigned int offset;
-
-	/* dont bother with small payloads */
-	if (len <= skb_tailroom(to)) {
-		skb_copy_bits(from, 0, skb_put(to, len), len);
-		return;
-	}
-
-	if (hlen) {
-		skb_copy_bits(from, 0, skb_put(to, hlen), hlen);
-		len -= hlen;
-	} else {
-		plen = min_t(int, skb_headlen(from), len);
-		if (plen) {
-			page = virt_to_head_page(from->head);
-			offset = from->data - (unsigned char *)page_address(page);
-			__skb_fill_page_desc(to, 0, page, offset, plen);
-			get_page(page);
-			j = 1;
-			len -= plen;
-		}
-	}
-
-	to->truesize += len + plen;
-	to->len += len + plen;
-	to->data_len += len + plen;
-
-	for (i = 0; i < skb_shinfo(from)->nr_frags; i++) {
-		if (!len)
-			break;
-		skb_shinfo(to)->frags[j] = skb_shinfo(from)->frags[i];
-		skb_shinfo(to)->frags[j].size = min_t(int, skb_shinfo(to)->frags[j].size, len);
-		len -= skb_shinfo(to)->frags[j].size;
-		skb_frag_ref(to, j);
-		j++;
-	}
-	skb_shinfo(to)->nr_frags = j;
-}
-
 static int
 nfqnl_put_packet_info(struct sk_buff *nlskb, struct sk_buff *packet,
 		      bool csum_verify)
@@ -304,7 +259,7 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
 {
 	size_t size;
 	size_t data_len = 0, cap_len = 0;
-	int hlen = 0;
+	unsigned int hlen = 0;
 	struct sk_buff *skb;
 	struct nlattr *nla;
 	struct nfqnl_msg_packet_hdr *pmsg;
@@ -356,14 +311,8 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
 		if (data_len > entskb->len)
 			data_len = entskb->len;
 
-		if (!entskb->head_frag ||
-		    skb_headlen(entskb) < L1_CACHE_BYTES ||
-		    skb_shinfo(entskb)->nr_frags >= MAX_SKB_FRAGS)
-			hlen = skb_headlen(entskb);
-
-		if (skb_has_frag_list(entskb))
-			hlen = entskb->len;
-		hlen = min_t(int, data_len, hlen);
+		hlen = skb_zerocopy_headlen(entskb);
+		hlen = min_t(unsigned int, hlen, data_len);
 		size += sizeof(struct nlattr) + hlen;
 		cap_len = entskb->len;
 		break;
@@ -504,7 +453,7 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
 		nla->nla_type = NFQA_PAYLOAD;
 		nla->nla_len = nla_attr_size(data_len);
 
-		nfqnl_zcopy(skb, entskb, data_len, hlen);
+		skb_zerocopy(skb, entskb, data_len, hlen);
 	}
 
 	nlh->nlmsg_len = skb->len;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [net-next 5/7] openvswitch: Allow user space to announce ability to accept unaligned Netlink messages
  2013-11-30 12:21 [PATCH net-next 0/8 v8] Open vSwitch upcall optimiziations Thomas Graf
                   ` (3 preceding siblings ...)
  2013-11-30 12:21 ` [net-next 4/7] net: Export skb_zerocopy() to zerocopy from one skb to another Thomas Graf
@ 2013-11-30 12:21 ` Thomas Graf
  2013-11-30 12:21 ` [net-next 6/7] openvswitch: Drop user features if old user space attempted to create datapath Thomas Graf
  2013-11-30 12:21 ` [net-next 7/7] openvswitch: Use skb_zerocopy() for upcall Thomas Graf
  6 siblings, 0 replies; 13+ messages in thread
From: Thomas Graf @ 2013-11-30 12:21 UTC (permalink / raw)
  To: jesse, davem
  Cc: dev, netdev, dborkman, ffusco, fleitner, eric.dumazet, bhutchings

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
---
 include/uapi/linux/openvswitch.h |  4 ++++
 net/openvswitch/datapath.c       | 14 ++++++++++++++
 net/openvswitch/datapath.h       |  2 ++
 3 files changed, 20 insertions(+)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index d120f9f..07ef2c3 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -75,6 +75,7 @@ enum ovs_datapath_attr {
 	OVS_DP_ATTR_UPCALL_PID,		/* Netlink PID to receive upcalls */
 	OVS_DP_ATTR_STATS,		/* struct ovs_dp_stats */
 	OVS_DP_ATTR_MEGAFLOW_STATS,	/* struct ovs_dp_megaflow_stats */
+	OVS_DP_ATTR_USER_FEATURES,	/* OVS_DP_F_*  */
 	__OVS_DP_ATTR_MAX
 };
 
@@ -106,6 +107,9 @@ struct ovs_vport_stats {
 	__u64   tx_dropped;		/* no space available in linux  */
 };
 
+/* Allow last Netlink attribute to be unaligned */
+#define OVS_DP_F_UNALIGNED	(1 << 0)
+
 /* Fixed logical ports. */
 #define OVSP_LOCAL      ((__u32)0)
 
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 0ac9cde..95d4424 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -1067,6 +1067,7 @@ static const struct genl_ops dp_flow_genl_ops[] = {
 static const struct nla_policy datapath_policy[OVS_DP_ATTR_MAX + 1] = {
 	[OVS_DP_ATTR_NAME] = { .type = NLA_NUL_STRING, .len = IFNAMSIZ - 1 },
 	[OVS_DP_ATTR_UPCALL_PID] = { .type = NLA_U32 },
+	[OVS_DP_ATTR_USER_FEATURES] = { .type = NLA_U32 },
 };
 
 static struct genl_family dp_datapath_genl_family = {
@@ -1125,6 +1126,9 @@ static int ovs_dp_cmd_fill_info(struct datapath *dp, struct sk_buff *skb,
 			&dp_megaflow_stats))
 		goto nla_put_failure;
 
+	if (nla_put_u32(skb, OVS_DP_ATTR_USER_FEATURES, dp->user_features))
+		goto nla_put_failure;
+
 	return genlmsg_end(skb, ovs_header);
 
 nla_put_failure:
@@ -1171,6 +1175,12 @@ static struct datapath *lookup_datapath(struct net *net,
 	return dp ? dp : ERR_PTR(-ENODEV);
 }
 
+static void ovs_dp_change(struct datapath *dp, struct nlattr **a)
+{
+	if (a[OVS_DP_ATTR_USER_FEATURES])
+		dp->user_features = nla_get_u32(a[OVS_DP_ATTR_USER_FEATURES]);
+}
+
 static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 {
 	struct nlattr **a = info->attrs;
@@ -1229,6 +1239,8 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 	parms.port_no = OVSP_LOCAL;
 	parms.upcall_portid = nla_get_u32(a[OVS_DP_ATTR_UPCALL_PID]);
 
+	ovs_dp_change(dp, a);
+
 	vport = new_vport(&parms);
 	if (IS_ERR(vport)) {
 		err = PTR_ERR(vport);
@@ -1332,6 +1344,8 @@ static int ovs_dp_cmd_set(struct sk_buff *skb, struct genl_info *info)
 	if (IS_ERR(dp))
 		goto unlock;
 
+	ovs_dp_change(dp, info->attrs);
+
 	reply = ovs_dp_cmd_build_info(dp, info, OVS_DP_CMD_NEW);
 	if (IS_ERR(reply)) {
 		err = PTR_ERR(reply);
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index 4067ea4..193e2e0 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -88,6 +88,8 @@ struct datapath {
 	/* Network namespace ref. */
 	struct net *net;
 #endif
+
+	u32 user_features;
 };
 
 /**
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [net-next 6/7] openvswitch: Drop user features if old user space attempted to create datapath
  2013-11-30 12:21 [PATCH net-next 0/8 v8] Open vSwitch upcall optimiziations Thomas Graf
                   ` (4 preceding siblings ...)
  2013-11-30 12:21 ` [net-next 5/7] openvswitch: Allow user space to announce ability to accept unaligned Netlink messages Thomas Graf
@ 2013-11-30 12:21 ` Thomas Graf
  2013-11-30 12:21 ` [net-next 7/7] openvswitch: Use skb_zerocopy() for upcall Thomas Graf
  6 siblings, 0 replies; 13+ messages in thread
From: Thomas Graf @ 2013-11-30 12:21 UTC (permalink / raw)
  To: jesse, davem
  Cc: dev, netdev, dborkman, ffusco, fleitner, eric.dumazet, bhutchings

Drop user features if an outdated user space instance that does not
understand the concept of user_features attempted to create a new
datapath.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
---
 include/uapi/linux/openvswitch.h | 10 +++++++++-
 net/openvswitch/datapath.c       | 21 +++++++++++++++++++++
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 07ef2c3..970553c 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -40,7 +40,15 @@ struct ovs_header {
 
 #define OVS_DATAPATH_FAMILY  "ovs_datapath"
 #define OVS_DATAPATH_MCGROUP "ovs_datapath"
-#define OVS_DATAPATH_VERSION 0x1
+
+/* V2:
+ *   - API users are expected to provide OVS_DP_ATTR_USER_FEATURES
+ *     when creating the datapath.
+ */
+#define OVS_DATAPATH_VERSION 2
+
+/* First OVS datapath version to support features */
+#define OVS_DP_VER_FEATURES 2
 
 enum ovs_datapath_cmd {
 	OVS_DP_CMD_UNSPEC,
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 95d4424..8eaa39a 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -1175,6 +1175,18 @@ static struct datapath *lookup_datapath(struct net *net,
 	return dp ? dp : ERR_PTR(-ENODEV);
 }
 
+static void ovs_dp_reset_user_features(struct sk_buff *skb, struct genl_info *info)
+{
+	struct datapath *dp;
+
+	dp = lookup_datapath(sock_net(skb->sk), info->userhdr, info->attrs);
+	if (!dp)
+		return;
+
+	WARN(dp->user_features, "Dropping previously announced user features\n");
+	dp->user_features = 0;
+}
+
 static void ovs_dp_change(struct datapath *dp, struct nlattr **a)
 {
 	if (a[OVS_DP_ATTR_USER_FEATURES])
@@ -1247,6 +1259,15 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 		if (err == -EBUSY)
 			err = -EEXIST;
 
+		if (err == -EEXIST) {
+			/* An outdated user space instance that does not understand
+			 * the concept of user_features has attempted to create a new
+			 * datapath and is likely to reuse it. Drop all user features.
+			 */
+			if (info->genlhdr->version < OVS_DP_VER_FEATURES)
+				ovs_dp_reset_user_features(skb, info);
+		}
+
 		goto err_destroy_ports_array;
 	}
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [net-next 7/7] openvswitch: Use skb_zerocopy() for upcall
  2013-11-30 12:21 [PATCH net-next 0/8 v8] Open vSwitch upcall optimiziations Thomas Graf
                   ` (5 preceding siblings ...)
  2013-11-30 12:21 ` [net-next 6/7] openvswitch: Drop user features if old user space attempted to create datapath Thomas Graf
@ 2013-11-30 12:21 ` Thomas Graf
  2013-12-04  5:43   ` Jesse Gross
  6 siblings, 1 reply; 13+ messages in thread
From: Thomas Graf @ 2013-11-30 12:21 UTC (permalink / raw)
  To: jesse, davem
  Cc: dev, netdev, dborkman, ffusco, fleitner, eric.dumazet, bhutchings

Use of skb_zerocopy() can avoids the expensive call to memcpy()
when copying the packet data into the Netlink skb. Completes
checksum through skb_checksum_help() if needed.

Zerocopy is only performed if user space supported unaligned
Netlink messages. memory mapped netlink i/o is preferred over
zerocopy if it is set up.

Cost of upcall is significantly reduced from:
+   7.48%       vhost-8471  [k] memcpy
+   5.57%     ovs-vswitchd  [k] memcpy
+   2.81%       vhost-8471  [k] csum_partial_copy_generic

to:
+   5.72%     ovs-vswitchd  [k] memcpy
+   3.32%       vhost-5153  [k] memcpy
+   0.68%       vhost-5153  [k] skb_zerocopy

(megaflows disabled)

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
---
 net/openvswitch/datapath.c | 54 +++++++++++++++++++++++++++++++---------------
 1 file changed, 37 insertions(+), 17 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 8eaa39a..867edf1 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -108,10 +108,10 @@ int lockdep_ovsl_is_held(void)
 #endif
 
 static struct vport *new_vport(const struct vport_parms *);
-static int queue_gso_packets(struct net *, int dp_ifindex, struct sk_buff *,
-			     const struct dp_upcall_info *);
-static int queue_userspace_packet(struct net *, int dp_ifindex,
-				  struct sk_buff *,
+static int queue_gso_packets(struct datapath *, struct net *, int dp_ifindex,
+			     struct sk_buff *, const struct dp_upcall_info *);
+static int queue_userspace_packet(struct datapath *, struct net *,
+				  int dp_ifindex, struct sk_buff *,
 				  const struct dp_upcall_info *);
 
 /* Must be called with rcu_read_lock or ovs_mutex. */
@@ -292,9 +292,9 @@ int ovs_dp_upcall(struct datapath *dp, struct sk_buff *skb,
 	}
 
 	if (!skb_is_gso(skb))
-		err = queue_userspace_packet(ovs_dp_get_net(dp), dp_ifindex, skb, upcall_info);
+		err = queue_userspace_packet(dp, ovs_dp_get_net(dp), dp_ifindex, skb, upcall_info);
 	else
-		err = queue_gso_packets(ovs_dp_get_net(dp), dp_ifindex, skb, upcall_info);
+		err = queue_gso_packets(dp, ovs_dp_get_net(dp), dp_ifindex, skb, upcall_info);
 	if (err)
 		goto err;
 
@@ -310,7 +310,7 @@ err:
 	return err;
 }
 
-static int queue_gso_packets(struct net *net, int dp_ifindex,
+static int queue_gso_packets(struct datapath *dp, struct net *net, int dp_ifindex,
 			     struct sk_buff *skb,
 			     const struct dp_upcall_info *upcall_info)
 {
@@ -327,7 +327,7 @@ static int queue_gso_packets(struct net *net, int dp_ifindex,
 	/* Queue all of the segments. */
 	skb = segs;
 	do {
-		err = queue_userspace_packet(net, dp_ifindex, skb, upcall_info);
+		err = queue_userspace_packet(dp, net, dp_ifindex, skb, upcall_info);
 		if (err)
 			break;
 
@@ -381,10 +381,11 @@ static size_t key_attr_size(void)
 }
 
 static size_t upcall_msg_size(const struct sk_buff *skb,
-			      const struct nlattr *userdata)
+			      const struct nlattr *userdata,
+			      unsigned int hdrlen)
 {
 	size_t size = NLMSG_ALIGN(sizeof(struct ovs_header))
-		+ nla_total_size(skb->len) /* OVS_PACKET_ATTR_PACKET */
+		+ nla_total_size(hdrlen) /* OVS_PACKET_ATTR_PACKET */
 		+ nla_total_size(key_attr_size()); /* OVS_PACKET_ATTR_KEY */
 
 	/* OVS_PACKET_ATTR_USERDATA */
@@ -394,8 +395,8 @@ static size_t upcall_msg_size(const struct sk_buff *skb,
 	return size;
 }
 
-static int queue_userspace_packet(struct net *net, int dp_ifindex,
-				  struct sk_buff *skb,
+static int queue_userspace_packet(struct datapath *dp, struct net *net,
+				  int dp_ifindex, struct sk_buff *skb,
 				  const struct dp_upcall_info *upcall_info)
 {
 	struct ovs_header *upcall;
@@ -407,6 +408,7 @@ static int queue_userspace_packet(struct net *net, int dp_ifindex,
 		.snd_portid = upcall_info->portid,
 	};
 	size_t len;
+	unsigned int hlen;
 	int err;
 
 	if (vlan_tx_tag_present(skb)) {
@@ -427,7 +429,21 @@ static int queue_userspace_packet(struct net *net, int dp_ifindex,
 		goto out;
 	}
 
-	len = upcall_msg_size(skb, upcall_info->userdata);
+	/* Complete checksum if needed */
+	if (skb->ip_summed == CHECKSUM_PARTIAL &&
+	    (err = skb_checksum_help(skb)))
+		goto out;
+
+	/* Older versions of OVS user space enforce alignment of the last
+	 * Netlink attribute to NLA_ALIGNTO which would require extensive
+	 * padding logic. Only perform zerocopy if padding is not required.
+	 */
+	if (dp->user_features & OVS_DP_F_UNALIGNED)
+		hlen = skb_zerocopy_headlen(skb);
+	else
+		hlen = skb->len;
+
+	len = upcall_msg_size(skb, upcall_info->userdata, hlen);
 	user_skb = genlmsg_new_unicast(len, &info, GFP_ATOMIC);
 	if (!user_skb) {
 		err = -ENOMEM;
@@ -447,13 +463,17 @@ static int queue_userspace_packet(struct net *net, int dp_ifindex,
 			  nla_len(upcall_info->userdata),
 			  nla_data(upcall_info->userdata));
 
-	nla = __nla_reserve(user_skb, OVS_PACKET_ATTR_PACKET, skb->len);
+	/* Only reserve room for attribute header, packet data is added
+	 * in skb_zerocopy() */
+	if (!(nla = nla_reserve(user_skb, OVS_PACKET_ATTR_PACKET, 0)))
+		goto out;
+	nla->nla_len = nla_attr_size(skb->len);
+
+	skb_zerocopy(user_skb, skb, skb->len, hlen);
 
-	skb_copy_and_csum_dev(skb, nla_data(nla));
+	((struct nlmsghdr *) user_skb->data)->nlmsg_len = user_skb->len;
 
-	genlmsg_end(user_skb, upcall);
 	err = genlmsg_unicast(net, user_skb, upcall_info->portid);
-
 out:
 	kfree_skb(nskb);
 	return err;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [net-next 3/7] openvswitch: Enable memory mapped Netlink i/o
       [not found]   ` <9f7f05a726935c434d43d92c2908a997c403725b.1385813891.git.tgraf-G/eBtMaohhA@public.gmane.org>
@ 2013-11-30 12:35     ` Florian Westphal
  2013-11-30 14:04       ` Thomas Graf
  0 siblings, 1 reply; 13+ messages in thread
From: Florian Westphal @ 2013-11-30 12:35 UTC (permalink / raw)
  To: Thomas Graf
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, fleitner-H+wXaHxf7aLQT0dZR+AlfA,
	eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w,
	netdev-u79uwXL29TY76Z2rM5mHXA, bhutchings-s/n/eUQHGBpZroRs9YW3xA,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q

Thomas Graf <tgraf-G/eBtMaohhA@public.gmane.org> wrote:
> Benchmark
>   * pktgen -> ovs internal port
>   * 5M pkts, 5M flows
>   * 4 threads, 8 cores
> 
> Before:
> Result: OK: 67418743(c67108212+d310530) usec, 5000000 (9000byte,0frags)
>   74163pps 5339Mb/sec (5339736000bps) errors: 0
[..]
> After:
> Result: OK: 24229690(c24127165+d102524) usec, 5000000 (9000byte,0frags)
>   206358pps 14857Mb/sec (14857776000bps) errors: 0

I'm curious.  Is the 'old' value with skb_zerocopy() or without?
Does ovs-vswitchd 'read-access' the entire packet or just e.g. the
header?

I ask because in netfilter nfqueue tests I could not see any difference
between 'zerocopy' vs. mmap in the receive-path tests I made a while
back.

[ I have no objections to your patch, of course ].

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [net-next 3/7] openvswitch: Enable memory mapped Netlink i/o
  2013-11-30 12:35     ` Florian Westphal
@ 2013-11-30 14:04       ` Thomas Graf
  0 siblings, 0 replies; 13+ messages in thread
From: Thomas Graf @ 2013-11-30 14:04 UTC (permalink / raw)
  To: Florian Westphal
  Cc: jesse, davem, dev, netdev, dborkman, ffusco, fleitner,
	eric.dumazet, bhutchings

On 11/30/13 at 01:35pm, Florian Westphal wrote:
> Thomas Graf <tgraf@suug.ch> wrote:
> > Benchmark
> >   * pktgen -> ovs internal port
> >   * 5M pkts, 5M flows
> >   * 4 threads, 8 cores
> > 
> > Before:
> > Result: OK: 67418743(c67108212+d310530) usec, 5000000 (9000byte,0frags)
> >   74163pps 5339Mb/sec (5339736000bps) errors: 0
> [..]
> > After:
> > Result: OK: 24229690(c24127165+d102524) usec, 5000000 (9000byte,0frags)
> >   206358pps 14857Mb/sec (14857776000bps) errors: 0
> 
> I'm curious.  Is the 'old' value with skb_zerocopy() or without?
> Does ovs-vswitchd 'read-access' the entire packet or just e.g. the
> header?
> 
> I ask because in netfilter nfqueue tests I could not see any difference
> between 'zerocopy' vs. mmap in the receive-path tests I made a while
> back.

The numbers quoted do not involve any zerocopy at all. It's a pure
original vs. mmap comparison. I expect the numbers to grow even more
once we get rid of the immediate ofpbuf copy in user space.

Zerocopy as implemened right now does not provide much gains on
top of mmap. We have no way yet to inject the skb frags into the
ring so we are forced to copy the data. The mapped skb is simply a
buffer with a tremendous tailroom to the zerocopy code.

Eventually we can find a way to make the skb frags shared and link
them to the ring buffer and avoid the copy. This would especially
make sense if we add GSO support to openvswitch user space as nfqueue
provides and avoid the segmentation before the upcall.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [net-next 7/7] openvswitch: Use skb_zerocopy() for upcall
  2013-11-30 12:21 ` [net-next 7/7] openvswitch: Use skb_zerocopy() for upcall Thomas Graf
@ 2013-12-04  5:43   ` Jesse Gross
  2013-12-05 21:29     ` Thomas Graf
  0 siblings, 1 reply; 13+ messages in thread
From: Jesse Gross @ 2013-12-04  5:43 UTC (permalink / raw)
  To: Thomas Graf
  Cc: David Miller, dev@openvswitch.org, netdev, Daniel Borkmann,
	ffusco, fleitner, Eric Dumazet, Ben Hutchings

On Sat, Nov 30, 2013 at 4:21 AM, Thomas Graf <tgraf@suug.ch> wrote:
> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> index 8eaa39a..867edf1 100644
> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
> +static int queue_gso_packets(struct datapath *, struct net *, int dp_ifindex,
> +                            struct sk_buff *, const struct dp_upcall_info *);
> +static int queue_userspace_packet(struct datapath *, struct net *,
> +                                 int dp_ifindex, struct sk_buff *,
>                                   const struct dp_upcall_info *);

Should we drop the dp_ifindex arguments from these functions? It
should be trivially derivable from struct datapath.

>  static size_t upcall_msg_size(const struct sk_buff *skb,
> -                             const struct nlattr *userdata)
> +                             const struct nlattr *userdata,
> +                             unsigned int hdrlen)

I think that 'skb' is now unused.

> @@ -427,7 +429,21 @@ static int queue_userspace_packet(struct net *net, int dp_ifindex,
>                 goto out;
>         }
>
> -       len = upcall_msg_size(skb, upcall_info->userdata);
> +       /* Complete checksum if needed */
> +       if (skb->ip_summed == CHECKSUM_PARTIAL &&
> +           (err = skb_checksum_help(skb)))
> +               goto out;

I think that we can remove the hardware features argument to
__skb_gso_segment() in queue_gso_packet(). It was there to take
advantage of the copy and checksum optimization but that's no longer
present.

> @@ -447,13 +463,17 @@ static int queue_userspace_packet(struct net *net, int dp_ifindex,
>                           nla_len(upcall_info->userdata),
>                           nla_data(upcall_info->userdata));
>
> -       nla = __nla_reserve(user_skb, OVS_PACKET_ATTR_PACKET, skb->len);
> +       /* Only reserve room for attribute header, packet data is added
> +        * in skb_zerocopy() */
> +       if (!(nla = nla_reserve(user_skb, OVS_PACKET_ATTR_PACKET, 0)))
> +               goto out;

Does this initialized 'err' on failure?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [net-next 3/7] openvswitch: Enable memory mapped Netlink i/o
  2013-11-30 12:21 ` [net-next 3/7] openvswitch: Enable memory mapped Netlink i/o Thomas Graf
       [not found]   ` <9f7f05a726935c434d43d92c2908a997c403725b.1385813891.git.tgraf-G/eBtMaohhA@public.gmane.org>
@ 2013-12-04  5:45   ` Jesse Gross
  1 sibling, 0 replies; 13+ messages in thread
From: Jesse Gross @ 2013-12-04  5:45 UTC (permalink / raw)
  To: Thomas Graf
  Cc: David Miller, dev@openvswitch.org, netdev, Daniel Borkmann,
	ffusco, fleitner, Eric Dumazet, Ben Hutchings

On Sat, Nov 30, 2013 at 4:21 AM, Thomas Graf <tgraf@suug.ch> wrote:
> Use memory mapped Netlink i/o for all unicast openvswitch
> communication if a ring has been set up.
>
> Benchmark
>   * pktgen -> ovs internal port
>   * 5M pkts, 5M flows
>   * 4 threads, 8 cores
>
> Before:
> Result: OK: 67418743(c67108212+d310530) usec, 5000000 (9000byte,0frags)
>   74163pps 5339Mb/sec (5339736000bps) errors: 0
>         +   2.98%     ovs-vswitchd  [k] copy_user_generic_string
>         +   2.49%     ovs-vswitchd  [k] memcpy
>         +   1.84%       kpktgend_2  [k] memcpy
>         +   1.81%       kpktgend_1  [k] memcpy
>         +   1.81%       kpktgend_3  [k] memcpy
>         +   1.78%       kpktgend_0  [k] memcpy
>
> After:
> Result: OK: 24229690(c24127165+d102524) usec, 5000000 (9000byte,0frags)
>   206358pps 14857Mb/sec (14857776000bps) errors: 0
>         +   2.80%     ovs-vswitchd  [k] memcpy
>         +   1.31%       kpktgend_2  [k] memcpy
>         +   1.23%       kpktgend_0  [k] memcpy
>         +   1.09%       kpktgend_1  [k] memcpy
>         +   1.04%       kpktgend_3  [k] memcpy
>         +   0.96%     ovs-vswitchd  [k] copy_user_generic_string
>
> Signed-off-by: Thomas Graf <tgraf@suug.ch>
> Reviewed-by: Daniel Borkmann <dborkman@redhat.com>

First three patches applied, thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [net-next 7/7] openvswitch: Use skb_zerocopy() for upcall
  2013-12-04  5:43   ` Jesse Gross
@ 2013-12-05 21:29     ` Thomas Graf
  0 siblings, 0 replies; 13+ messages in thread
From: Thomas Graf @ 2013-12-05 21:29 UTC (permalink / raw)
  To: Jesse Gross
  Cc: David Miller, dev@openvswitch.org, netdev, Daniel Borkmann,
	ffusco, fleitner, Eric Dumazet, Ben Hutchings

On 12/03/13 at 09:43pm, Jesse Gross wrote:

Thanks for merging a first set of patches

> On Sat, Nov 30, 2013 at 4:21 AM, Thomas Graf <tgraf@suug.ch> wrote:
> > diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> > index 8eaa39a..867edf1 100644
> > --- a/net/openvswitch/datapath.c
> > +++ b/net/openvswitch/datapath.c
> > +static int queue_gso_packets(struct datapath *, struct net *, int dp_ifindex,
> > +                            struct sk_buff *, const struct dp_upcall_info *);
> > +static int queue_userspace_packet(struct datapath *, struct net *,
> > +                                 int dp_ifindex, struct sk_buff *,
> >                                   const struct dp_upcall_info *);
> 
> Should we drop the dp_ifindex arguments from these functions? It
> should be trivially derivable from struct datapath.

Sounds good
> 
> >  static size_t upcall_msg_size(const struct sk_buff *skb,
> > -                             const struct nlattr *userdata)
> > +                             const struct nlattr *userdata,
> > +                             unsigned int hdrlen)
> 
> I think that 'skb' is now unused.

Right

> 
> > @@ -427,7 +429,21 @@ static int queue_userspace_packet(struct net *net, int dp_ifindex,
> >                 goto out;
> >         }
> >
> > -       len = upcall_msg_size(skb, upcall_info->userdata);
> > +       /* Complete checksum if needed */
> > +       if (skb->ip_summed == CHECKSUM_PARTIAL &&
> > +           (err = skb_checksum_help(skb)))
> > +               goto out;
> 
> I think that we can remove the hardware features argument to
> __skb_gso_segment() in queue_gso_packet(). It was there to take
> advantage of the copy and checksum optimization but that's no longer
> present.

In the case of a mapped skb we could still make use of it as we copy
the packet into the headroom. It would mean integrating the logic
into skb_zerocopy() though so I'd say we stay with what you propose.
It shouldn't be the fast path anyway.

> > @@ -447,13 +463,17 @@ static int queue_userspace_packet(struct net *net, int dp_ifindex,
> >                           nla_len(upcall_info->userdata),
> >                           nla_data(upcall_info->userdata));
> >
> > -       nla = __nla_reserve(user_skb, OVS_PACKET_ATTR_PACKET, skb->len);
> > +       /* Only reserve room for attribute header, packet data is added
> > +        * in skb_zerocopy() */
> > +       if (!(nla = nla_reserve(user_skb, OVS_PACKET_ATTR_PACKET, 0)))
> > +               goto out;
> 
> Does this initialized 'err' on failure?

Good catch, I'll fix this up as well.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2013-12-05 21:29 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-11-30 12:21 [PATCH net-next 0/8 v8] Open vSwitch upcall optimiziations Thomas Graf
2013-11-30 12:21 ` [net-next 1/7] genl: Add genlmsg_new_unicast() for unicast message allocation Thomas Graf
     [not found] ` <cover.1385813891.git.tgraf-G/eBtMaohhA@public.gmane.org>
2013-11-30 12:21   ` [net-next 2/7] netlink: Avoid netlink mmap alloc if msg size exceeds frame size Thomas Graf
2013-11-30 12:21 ` [net-next 3/7] openvswitch: Enable memory mapped Netlink i/o Thomas Graf
     [not found]   ` <9f7f05a726935c434d43d92c2908a997c403725b.1385813891.git.tgraf-G/eBtMaohhA@public.gmane.org>
2013-11-30 12:35     ` Florian Westphal
2013-11-30 14:04       ` Thomas Graf
2013-12-04  5:45   ` Jesse Gross
2013-11-30 12:21 ` [net-next 4/7] net: Export skb_zerocopy() to zerocopy from one skb to another Thomas Graf
2013-11-30 12:21 ` [net-next 5/7] openvswitch: Allow user space to announce ability to accept unaligned Netlink messages Thomas Graf
2013-11-30 12:21 ` [net-next 6/7] openvswitch: Drop user features if old user space attempted to create datapath Thomas Graf
2013-11-30 12:21 ` [net-next 7/7] openvswitch: Use skb_zerocopy() for upcall Thomas Graf
2013-12-04  5:43   ` Jesse Gross
2013-12-05 21:29     ` Thomas Graf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).