netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH -next v2 0/5] netfilter: nf_queue: avoid expensive gso/checksums
@ 2013-04-19 14:58 Florian Westphal
  2013-04-19 14:58 ` [PATCH 1/5] netfilter: nf_queue: move device refcount bump to extra function Florian Westphal
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Florian Westphal @ 2013-04-19 14:58 UTC (permalink / raw)
  To: netfilter-devel

Hello Pablo,

here is V2 of the gso avoidance patchset for nfnetlink_queue.

With these patches, userspace can now instruct the kernel that it is gso/gro
aware and can handle "invalid" checksums that appear in packet headers.

For old userspace, nothing is changed: the kernel segments gso skbs
and adjusts checksums.

To avoid gso/checksum fixup overhead, userspace applications must set the
new NFQA_CFG_F_GSO config flag via NFQA_CFG_FLAGS attribute.

Then, for every packet received, userspace needs to check for the presence
of the new NFQA_SKB_INFO attribute.  If it exists, userspace needs to test
NFQA_SKB_CSUMNOTREADY bit.  If set, this means that userspace
must NOT very packet checksums, since they will be fixed later on
by the kernel.

The other bit is
NFQA_SKB_GSO, which could be used for statistics, or to determine when
packet size exceeds mtu.

Feedback welcome.

Update for libnetfilter_queue (including example program/documentation)
will follow later.

The following changes since commit d37d696804a83479f240b397670a07ccb53a7417:

  netfilter: xt_rpfilter: depend on raw or mangle table (2013-04-19 00:22:55 +0200)

are available in the git repository at:
  git://git.breakpoint.cc/fw/nf-next.git nfqueue_gso_avoidance_06

Florian Westphal (5):
      netfilter: nf_queue: move device refcount bump to extra function
      netfilter: nfnetlink_queue: avoid peer_portid test
      netfilter: move skb_gso_segment into nfnetlink_queue module
      netfilter: nfnetlink_queue: add skb info attribute
      netfilter: nfqueue: avoid expensive gso segmentation and checksum fixup

Changes since V1:
 - fix OOPS if CONFIG_BRIDGE_NETFILTER=y and old non-gso userspace listener is killed
 - only add NFQA_SKB_INFO when skb is gso or CHECKSUM_PARTIAL

 include/net/netfilter/nf_queue.h               |    6 +
 include/uapi/linux/netfilter/nfnetlink_queue.h |   10 ++-
 net/netfilter/nf_queue.c                       |  143 +++++--------------
 net/netfilter/nfnetlink_queue_core.c           |  180 +++++++++++++++++++++---
 4 files changed, 209 insertions(+), 130 deletions(-)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/5] netfilter: nf_queue: move device refcount bump to extra function
  2013-04-19 14:58 [PATCH -next v2 0/5] netfilter: nf_queue: avoid expensive gso/checksums Florian Westphal
@ 2013-04-19 14:58 ` Florian Westphal
  2013-04-27 17:46   ` Pablo Neira Ayuso
  2013-04-19 14:58 ` [PATCH 2/5] netfilter: nfnetlink_queue: avoid peer_portid test Florian Westphal
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Florian Westphal @ 2013-04-19 14:58 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

required by future patch that will need to duplicate the
nf_queue_entry, bumping refcounts of the copy.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nf_queue.c |   49 ++++++++++++++++++++++++++-------------------
 1 files changed, 28 insertions(+), 21 deletions(-)

diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
index 5ccf01e..1d91e77 100644
--- a/net/netfilter/nf_queue.c
+++ b/net/netfilter/nf_queue.c
@@ -66,6 +66,33 @@ static void nf_queue_entry_release_refs(struct nf_queue_entry *entry)
 	module_put(entry->elem->owner);
 }
 
+/* Bump dev refs so they don't vanish while packet is out */
+static bool nf_queue_entry_get_refs(struct nf_queue_entry *entry)
+{
+	if (!try_module_get(entry->elem->owner))
+		return false;
+
+	if (entry->indev)
+		dev_hold(entry->indev);
+	if (entry->outdev)
+		dev_hold(entry->outdev);
+#ifdef CONFIG_BRIDGE_NETFILTER
+	if (entry->skb->nf_bridge) {
+		struct nf_bridge_info *nf_bridge = entry->skb->nf_bridge;
+		struct net_device *physdev;
+
+		physdev = nf_bridge->physindev;
+		if (physdev)
+			dev_hold(physdev);
+		physdev = nf_bridge->physoutdev;
+		if (physdev)
+			dev_hold(physdev);
+	}
+#endif
+
+	return true;
+}
+
 /*
  * Any packet that leaves via this function must come back
  * through nf_reinject().
@@ -80,10 +107,6 @@ static int __nf_queue(struct sk_buff *skb,
 {
 	int status = -ENOENT;
 	struct nf_queue_entry *entry = NULL;
-#ifdef CONFIG_BRIDGE_NETFILTER
-	struct net_device *physindev;
-	struct net_device *physoutdev;
-#endif
 	const struct nf_afinfo *afinfo;
 	const struct nf_queue_handler *qh;
 
@@ -116,26 +139,10 @@ static int __nf_queue(struct sk_buff *skb,
 		.okfn	= okfn,
 	};
 
-	/* If it's going away, ignore hook. */
-	if (!try_module_get(entry->elem->owner)) {
+	if (!nf_queue_entry_get_refs(entry)) {
 		status = -ECANCELED;
 		goto err_unlock;
 	}
-	/* Bump dev refs so they don't vanish while packet is out */
-	if (indev)
-		dev_hold(indev);
-	if (outdev)
-		dev_hold(outdev);
-#ifdef CONFIG_BRIDGE_NETFILTER
-	if (skb->nf_bridge) {
-		physindev = skb->nf_bridge->physindev;
-		if (physindev)
-			dev_hold(physindev);
-		physoutdev = skb->nf_bridge->physoutdev;
-		if (physoutdev)
-			dev_hold(physoutdev);
-	}
-#endif
 	skb_dst_force(skb);
 	afinfo->saveroute(skb, entry);
 	status = qh->outfn(entry, queuenum);
-- 
1.7.8.6


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/5] netfilter: nfnetlink_queue: avoid peer_portid test
  2013-04-19 14:58 [PATCH -next v2 0/5] netfilter: nf_queue: avoid expensive gso/checksums Florian Westphal
  2013-04-19 14:58 ` [PATCH 1/5] netfilter: nf_queue: move device refcount bump to extra function Florian Westphal
@ 2013-04-19 14:58 ` Florian Westphal
  2013-04-26  1:19   ` Pablo Neira Ayuso
  2013-04-19 14:58 ` [PATCH 3/5] netfilter: move skb_gso_segment into nfnetlink_queue module Florian Westphal
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Florian Westphal @ 2013-04-19 14:58 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

The portid is the netlink port id of the skb that created the queue.

Add test to ensure the portid cannot be 0 at create time, and
the check at enqueue time will always be false.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nfnetlink_queue_core.c |    7 +++----
 1 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/nfnetlink_queue_core.c b/net/netfilter/nfnetlink_queue_core.c
index 5e280b3..94e2e4f 100644
--- a/net/netfilter/nfnetlink_queue_core.c
+++ b/net/netfilter/nfnetlink_queue_core.c
@@ -107,6 +107,9 @@ instance_create(struct nfnl_queue_net *q, u_int16_t queue_num,
 	unsigned int h;
 	int err;
 
+	if (portid == 0)
+		return ERR_PTR(-EINVAL);
+
 	spin_lock(&q->instances_lock);
 	if (instance_lookup(q, queue_num)) {
 		err = -EEXIST;
@@ -506,10 +509,6 @@ nfqnl_enqueue_packet(struct nf_queue_entry *entry, unsigned int queuenum)
 	}
 	spin_lock_bh(&queue->lock);
 
-	if (!queue->peer_portid) {
-		err = -EINVAL;
-		goto err_out_free_nskb;
-	}
 	if (queue->queue_total >= queue->queue_maxlen) {
 		if (queue->flags & NFQA_CFG_F_FAIL_OPEN) {
 			failopen = 1;
-- 
1.7.8.6


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/5] netfilter: move skb_gso_segment into nfnetlink_queue module
  2013-04-19 14:58 [PATCH -next v2 0/5] netfilter: nf_queue: avoid expensive gso/checksums Florian Westphal
  2013-04-19 14:58 ` [PATCH 1/5] netfilter: nf_queue: move device refcount bump to extra function Florian Westphal
  2013-04-19 14:58 ` [PATCH 2/5] netfilter: nfnetlink_queue: avoid peer_portid test Florian Westphal
@ 2013-04-19 14:58 ` Florian Westphal
  2013-04-27 17:46   ` Pablo Neira Ayuso
  2013-04-19 14:58 ` [PATCH 4/5] netfilter: nfnetlink_queue: add skb info attribute Florian Westphal
  2013-04-19 14:58 ` [PATCH 5/5] netfilter: nfqueue: avoid expensive gso segmentation and checksum fixup Florian Westphal
  4 siblings, 1 reply; 11+ messages in thread
From: Florian Westphal @ 2013-04-19 14:58 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

There is nothing wrong with the current code.

However, skb_gso_segment is expensive, so it would be nice
if we could avoid it in the future.

Since userspace must be prepared to receive larger-than-mtu-packets
(which will also have incorrect l3/l4 checksums), we cannot simply
remove it.

The plan is to add a per-queue feature flag that userspace can
set when binding the queue.

The problem is that in nf_queue, we only have a queue number,
not the queue context/configuration settings.

This patch should have no impact other than the skb_gso_segment
call now being in a function that has access to the queue config data.

The size new size attribute in nf_queue_entry is needed so
nfnetlink_queue can duplicate the entry of the gso skb
when segmenting the skb while also copying the route key.

Next step: avoid skb_gso_segment when queue config says so.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
Change since V2:
  - don't mess up entry->skb assignment when duplicating
    nf_queue_entry to avoid OOPS with CONFIG_BRIDGE_NETFILTER=y

 include/net/netfilter/nf_queue.h     |    6 ++
 net/netfilter/nf_queue.c             |   96 ++-------------------
 net/netfilter/nfnetlink_queue_core.c |  154 ++++++++++++++++++++++++++++++----
 3 files changed, 152 insertions(+), 104 deletions(-)

diff --git a/include/net/netfilter/nf_queue.h b/include/net/netfilter/nf_queue.h
index fb1c0be..aaba4bb 100644
--- a/include/net/netfilter/nf_queue.h
+++ b/include/net/netfilter/nf_queue.h
@@ -9,10 +9,13 @@ struct nf_queue_entry {
 
 	struct nf_hook_ops	*elem;
 	u_int8_t		pf;
+	u16			size; /* sizeof(entry) + saved route keys */
 	unsigned int		hook;
 	struct net_device	*indev;
 	struct net_device	*outdev;
 	int			(*okfn)(struct sk_buff *);
+
+	/* extra space to store route keys */
 };
 
 #define nf_queue_entry_reroute(x) ((void *)x + sizeof(struct nf_queue_entry))
@@ -27,4 +30,7 @@ void nf_register_queue_handler(const struct nf_queue_handler *qh);
 void nf_unregister_queue_handler(void);
 extern void nf_reinject(struct nf_queue_entry *entry, unsigned int verdict);
 
+bool nf_queue_entry_get_refs(struct nf_queue_entry *entry);
+void nf_queue_entry_release_refs(struct nf_queue_entry *entry);
+
 #endif /* _NF_QUEUE_H */
diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
index 1d91e77..5d24b1f 100644
--- a/net/netfilter/nf_queue.c
+++ b/net/netfilter/nf_queue.c
@@ -45,7 +45,7 @@ void nf_unregister_queue_handler(void)
 }
 EXPORT_SYMBOL(nf_unregister_queue_handler);
 
-static void nf_queue_entry_release_refs(struct nf_queue_entry *entry)
+void nf_queue_entry_release_refs(struct nf_queue_entry *entry)
 {
 	/* Release those devices we held, or Alexey will kill me. */
 	if (entry->indev)
@@ -65,9 +65,10 @@ static void nf_queue_entry_release_refs(struct nf_queue_entry *entry)
 	/* Drop reference to owner of hook which queued us. */
 	module_put(entry->elem->owner);
 }
+EXPORT_SYMBOL_GPL(nf_queue_entry_release_refs);
 
 /* Bump dev refs so they don't vanish while packet is out */
-static bool nf_queue_entry_get_refs(struct nf_queue_entry *entry)
+bool nf_queue_entry_get_refs(struct nf_queue_entry *entry)
 {
 	if (!try_module_get(entry->elem->owner))
 		return false;
@@ -92,12 +93,13 @@ static bool nf_queue_entry_get_refs(struct nf_queue_entry *entry)
 
 	return true;
 }
+EXPORT_SYMBOL_GPL(nf_queue_entry_get_refs);
 
 /*
  * Any packet that leaves via this function must come back
  * through nf_reinject().
  */
-static int __nf_queue(struct sk_buff *skb,
+int nf_queue(struct sk_buff *skb,
 		      struct nf_hook_ops *elem,
 		      u_int8_t pf, unsigned int hook,
 		      struct net_device *indev,
@@ -137,6 +139,7 @@ static int __nf_queue(struct sk_buff *skb,
 		.indev	= indev,
 		.outdev	= outdev,
 		.okfn	= okfn,
+		.size	= sizeof(*entry) + afinfo->route_key_size,
 	};
 
 	if (!nf_queue_entry_get_refs(entry)) {
@@ -163,87 +166,6 @@ err:
 	return status;
 }
 
-#ifdef CONFIG_BRIDGE_NETFILTER
-/* When called from bridge netfilter, skb->data must point to MAC header
- * before calling skb_gso_segment(). Else, original MAC header is lost
- * and segmented skbs will be sent to wrong destination.
- */
-static void nf_bridge_adjust_skb_data(struct sk_buff *skb)
-{
-	if (skb->nf_bridge)
-		__skb_push(skb, skb->network_header - skb->mac_header);
-}
-
-static void nf_bridge_adjust_segmented_data(struct sk_buff *skb)
-{
-	if (skb->nf_bridge)
-		__skb_pull(skb, skb->network_header - skb->mac_header);
-}
-#else
-#define nf_bridge_adjust_skb_data(s) do {} while (0)
-#define nf_bridge_adjust_segmented_data(s) do {} while (0)
-#endif
-
-int nf_queue(struct sk_buff *skb,
-	     struct nf_hook_ops *elem,
-	     u_int8_t pf, unsigned int hook,
-	     struct net_device *indev,
-	     struct net_device *outdev,
-	     int (*okfn)(struct sk_buff *),
-	     unsigned int queuenum)
-{
-	struct sk_buff *segs;
-	int err = -EINVAL;
-	unsigned int queued;
-
-	if (!skb_is_gso(skb))
-		return __nf_queue(skb, elem, pf, hook, indev, outdev, okfn,
-				  queuenum);
-
-	switch (pf) {
-	case NFPROTO_IPV4:
-		skb->protocol = htons(ETH_P_IP);
-		break;
-	case NFPROTO_IPV6:
-		skb->protocol = htons(ETH_P_IPV6);
-		break;
-	}
-
-	nf_bridge_adjust_skb_data(skb);
-	segs = skb_gso_segment(skb, 0);
-	/* Does not use PTR_ERR to limit the number of error codes that can be
-	 * returned by nf_queue.  For instance, callers rely on -ECANCELED to mean
-	 * 'ignore this hook'.
-	 */
-	if (IS_ERR(segs))
-		goto out_err;
-	queued = 0;
-	err = 0;
-	do {
-		struct sk_buff *nskb = segs->next;
-
-		segs->next = NULL;
-		if (err == 0) {
-			nf_bridge_adjust_segmented_data(segs);
-			err = __nf_queue(segs, elem, pf, hook, indev,
-					   outdev, okfn, queuenum);
-		}
-		if (err == 0)
-			queued++;
-		else
-			kfree_skb(segs);
-		segs = nskb;
-	} while (segs);
-
-	if (queued) {
-		kfree_skb(skb);
-		return 0;
-	}
-  out_err:
-	nf_bridge_adjust_segmented_data(skb);
-	return err;
-}
-
 void nf_reinject(struct nf_queue_entry *entry, unsigned int verdict)
 {
 	struct sk_buff *skb = entry->skb;
@@ -283,9 +205,9 @@ void nf_reinject(struct nf_queue_entry *entry, unsigned int verdict)
 		local_bh_enable();
 		break;
 	case NF_QUEUE:
-		err = __nf_queue(skb, elem, entry->pf, entry->hook,
-				 entry->indev, entry->outdev, entry->okfn,
-				 verdict >> NF_VERDICT_QBITS);
+		err = nf_queue(skb, elem, entry->pf, entry->hook,
+				entry->indev, entry->outdev, entry->okfn,
+				verdict >> NF_VERDICT_QBITS);
 		if (err < 0) {
 			if (err == -ECANCELED)
 				goto next_hook;
diff --git a/net/netfilter/nfnetlink_queue_core.c b/net/netfilter/nfnetlink_queue_core.c
index 94e2e4f..38c8a7a 100644
--- a/net/netfilter/nfnetlink_queue_core.c
+++ b/net/netfilter/nfnetlink_queue_core.c
@@ -479,28 +479,13 @@ nla_put_failure:
 }
 
 static int
-nfqnl_enqueue_packet(struct nf_queue_entry *entry, unsigned int queuenum)
+__nfqnl_enqueue_packet(struct net *net, struct nfqnl_instance *queue,
+			struct nf_queue_entry *entry)
 {
 	struct sk_buff *nskb;
-	struct nfqnl_instance *queue;
 	int err = -ENOBUFS;
 	__be32 *packet_id_ptr;
 	int failopen = 0;
-	struct net *net = dev_net(entry->indev ?
-				  entry->indev : entry->outdev);
-	struct nfnl_queue_net *q = nfnl_queue_pernet(net);
-
-	/* rcu_read_lock()ed by nf_hook_slow() */
-	queue = instance_lookup(q, queuenum);
-	if (!queue) {
-		err = -ESRCH;
-		goto err_out;
-	}
-
-	if (queue->copy_mode == NFQNL_COPY_NONE) {
-		err = -EINVAL;
-		goto err_out;
-	}
 
 	nskb = nfqnl_build_packet_message(queue, entry, &packet_id_ptr);
 	if (nskb == NULL) {
@@ -545,6 +530,141 @@ err_out:
 	return err;
 }
 
+static struct nf_queue_entry *
+nf_queue_entry_dup(struct nf_queue_entry *e)
+{
+	struct nf_queue_entry *entry = kmemdup(e, e->size, GFP_ATOMIC);
+	if (entry) {
+		if (nf_queue_entry_get_refs(entry))
+			return entry;
+		kfree(entry);
+	}
+	return NULL;
+}
+
+#ifdef CONFIG_BRIDGE_NETFILTER
+/* When called from bridge netfilter, skb->data must point to MAC header
+ * before calling skb_gso_segment(). Else, original MAC header is lost
+ * and segmented skbs will be sent to wrong destination.
+ */
+static void nf_bridge_adjust_skb_data(struct sk_buff *skb)
+{
+	if (skb->nf_bridge)
+		__skb_push(skb, skb->network_header - skb->mac_header);
+}
+
+static void nf_bridge_adjust_segmented_data(struct sk_buff *skb)
+{
+	if (skb->nf_bridge)
+		__skb_pull(skb, skb->network_header - skb->mac_header);
+}
+#else
+#define nf_bridge_adjust_skb_data(s) do {} while (0)
+#define nf_bridge_adjust_segmented_data(s) do {} while (0)
+#endif
+
+static void free_entry(struct nf_queue_entry *entry)
+{
+	nf_queue_entry_release_refs(entry);
+	kfree(entry);
+}
+
+static int
+__nfqnl_enqueue_packet_gso(struct net *net, struct nfqnl_instance *queue,
+			   struct sk_buff *skb, struct nf_queue_entry *entry)
+{
+	int ret = -ENOMEM;
+	struct nf_queue_entry *entry_seg;
+
+	nf_bridge_adjust_segmented_data(skb);
+
+	if (skb->next == NULL) { /* last packet, no need to copy entry */
+		struct sk_buff *gso_skb = entry->skb;
+		entry->skb = skb;
+		ret = __nfqnl_enqueue_packet(net, queue, entry);
+		if (ret)
+			entry->skb = gso_skb;
+		return ret;
+	}
+
+	skb->next = NULL;
+
+	entry_seg = nf_queue_entry_dup(entry);
+	if (entry_seg) {
+		entry_seg->skb = skb;
+		ret = __nfqnl_enqueue_packet(net, queue, entry_seg);
+		if (ret)
+			free_entry(entry_seg);
+	}
+	return ret;
+}
+
+static int
+nfqnl_enqueue_packet(struct nf_queue_entry *entry, unsigned int queuenum)
+{
+	unsigned int queued;
+	struct nfqnl_instance *queue;
+	struct sk_buff *skb, *segs;
+	int err = -ENOBUFS;
+	struct net *net = dev_net(entry->indev ?
+				  entry->indev : entry->outdev);
+	struct nfnl_queue_net *q = nfnl_queue_pernet(net);
+
+	/* rcu_read_lock()ed by nf_hook_slow() */
+	queue = instance_lookup(q, queuenum);
+	if (!queue)
+		return -ESRCH;
+
+	if (queue->copy_mode == NFQNL_COPY_NONE)
+		return -EINVAL;
+
+	if (!skb_is_gso(entry->skb))
+		return __nfqnl_enqueue_packet(net, queue, entry);
+
+	skb = entry->skb;
+
+	switch (entry->pf) {
+	case NFPROTO_IPV4:
+		skb->protocol = htons(ETH_P_IP);
+		break;
+	case NFPROTO_IPV6:
+		skb->protocol = htons(ETH_P_IPV6);
+		break;
+	}
+
+	nf_bridge_adjust_skb_data(skb);
+	segs = skb_gso_segment(skb, 0);
+	/* Does not use PTR_ERR to limit the number of error codes that can be
+	 * returned by nf_queue.  For instance, callers rely on -ECANCELED to
+	 * mean 'ignore this hook'.
+	 */
+	if (IS_ERR(segs))
+		goto out_err;
+	queued = 0;
+	err = 0;
+	do {
+		struct sk_buff *nskb = segs->next;
+		if (err == 0)
+			err = __nfqnl_enqueue_packet_gso(net, queue,
+							segs, entry);
+		if (err == 0)
+			queued++;
+		else
+			kfree_skb(segs);
+		segs = nskb;
+	} while (segs);
+
+	if (queued) {
+		if (err) /* some segments are already queued */
+			free_entry(entry);
+		kfree_skb(skb);
+		return 0;
+	}
+ out_err:
+	nf_bridge_adjust_segmented_data(skb);
+	return err;
+}
+
 static int
 nfqnl_mangle(void *data, int data_len, struct nf_queue_entry *e, int diff)
 {
-- 
1.7.8.6


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 4/5] netfilter: nfnetlink_queue: add skb info attribute
  2013-04-19 14:58 [PATCH -next v2 0/5] netfilter: nf_queue: avoid expensive gso/checksums Florian Westphal
                   ` (2 preceding siblings ...)
  2013-04-19 14:58 ` [PATCH 3/5] netfilter: move skb_gso_segment into nfnetlink_queue module Florian Westphal
@ 2013-04-19 14:58 ` Florian Westphal
  2013-04-27 17:46   ` Pablo Neira Ayuso
  2013-04-19 14:58 ` [PATCH 5/5] netfilter: nfqueue: avoid expensive gso segmentation and checksum fixup Florian Westphal
  4 siblings, 1 reply; 11+ messages in thread
From: Florian Westphal @ 2013-04-19 14:58 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

Once we allow userspace to receive gso/gro packets, userspace
needs to be able to determine when checksums appear to be
broken, but are not.

NFQA_SKB_CSUMNOTREADY means 'checksums will be fixed in kernel
later, pretend they are ok'.

NFQA_SKB_GSO could be used for statistics, or to determine when
packet size exceeds mtu.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 Change since v2:
 add NFQA_SKB_INFO only if value is nonzero.

 include/uapi/linux/netfilter/nfnetlink_queue.h |    7 +++++++
 net/netfilter/nfnetlink_queue_core.c           |   16 ++++++++++++++++
 2 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/include/uapi/linux/netfilter/nfnetlink_queue.h b/include/uapi/linux/netfilter/nfnetlink_queue.h
index 70ec8c2..0069da3 100644
--- a/include/uapi/linux/netfilter/nfnetlink_queue.h
+++ b/include/uapi/linux/netfilter/nfnetlink_queue.h
@@ -45,6 +45,7 @@ enum nfqnl_attr_type {
 	NFQA_CT,			/* nf_conntrack_netlink.h */
 	NFQA_CT_INFO,			/* enum ip_conntrack_info */
 	NFQA_CAP_LEN,			/* __u32 length of captured packet */
+	NFQA_SKB_INFO,			/* __u32 skb meta information */
 
 	__NFQA_MAX
 };
@@ -98,4 +99,10 @@ enum nfqnl_attr_config {
 #define NFQA_CFG_F_CONNTRACK			(1 << 1)
 #define NFQA_CFG_F_MAX				(1 << 2)
 
+/* flags for NFQA_SKB_INFO */
+/* packet appears to have wrong checksums, but they are ok */
+#define NFQA_SKB_CSUMNOTREADY (1 << 0)
+/* packet is GSO (i.e., exceeds device mtu) */
+#define NFQA_SKB_GSO (1 << 1)
+
 #endif /* _NFNETLINK_QUEUE_H */
diff --git a/net/netfilter/nfnetlink_queue_core.c b/net/netfilter/nfnetlink_queue_core.c
index 38c8a7a..c927a7b 100644
--- a/net/netfilter/nfnetlink_queue_core.c
+++ b/net/netfilter/nfnetlink_queue_core.c
@@ -275,6 +275,18 @@ nfqnl_zcopy(struct sk_buff *to, const struct sk_buff *from, int len, int hlen)
 	skb_shinfo(to)->nr_frags = j;
 }
 
+static int nfqnl_put_packet_info(struct sk_buff *nlskb, struct sk_buff *packet)
+{
+	__u32 flags = 0;
+
+	if (packet->ip_summed == CHECKSUM_PARTIAL)
+		flags = NFQA_SKB_CSUMNOTREADY;
+	if (skb_is_gso(packet))
+		flags |= NFQA_SKB_GSO;
+
+	return flags ? nla_put_be32(nlskb, NFQA_SKB_INFO, htonl(flags)) : 0;
+}
+
 static struct sk_buff *
 nfqnl_build_packet_message(struct nfqnl_instance *queue,
 			   struct nf_queue_entry *entry,
@@ -304,6 +316,7 @@ nfqnl_build_packet_message(struct nfqnl_instance *queue,
 #endif
 		+ nla_total_size(sizeof(u_int32_t))	/* mark */
 		+ nla_total_size(sizeof(struct nfqnl_msg_packet_hw))
+		+ nla_total_size(sizeof(u_int32_t))	/* skbinfo */
 		+ nla_total_size(sizeof(u_int32_t));	/* cap_len */
 
 	if (entskb->tstamp.tv64)
@@ -456,6 +469,9 @@ nfqnl_build_packet_message(struct nfqnl_instance *queue,
 	if (cap_len > 0 && nla_put_be32(skb, NFQA_CAP_LEN, htonl(cap_len)))
 		goto nla_put_failure;
 
+	if (nfqnl_put_packet_info(skb, entskb))
+		goto nla_put_failure;
+
 	if (data_len) {
 		struct nlattr *nla;
 
-- 
1.7.8.6


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 5/5] netfilter: nfqueue: avoid expensive gso segmentation and checksum fixup
  2013-04-19 14:58 [PATCH -next v2 0/5] netfilter: nf_queue: avoid expensive gso/checksums Florian Westphal
                   ` (3 preceding siblings ...)
  2013-04-19 14:58 ` [PATCH 4/5] netfilter: nfnetlink_queue: add skb info attribute Florian Westphal
@ 2013-04-19 14:58 ` Florian Westphal
  2013-04-27 17:46   ` Pablo Neira Ayuso
  4 siblings, 1 reply; 11+ messages in thread
From: Florian Westphal @ 2013-04-19 14:58 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

Userspace can now indicate that it can cope with larger-than-mtu sized
packets and packets that have invalid ipv4/tcp checksums.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/uapi/linux/netfilter/nfnetlink_queue.h |    3 ++-
 net/netfilter/nfnetlink_queue_core.c           |    5 +++--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/netfilter/nfnetlink_queue.h b/include/uapi/linux/netfilter/nfnetlink_queue.h
index 0069da3..a2308ae 100644
--- a/include/uapi/linux/netfilter/nfnetlink_queue.h
+++ b/include/uapi/linux/netfilter/nfnetlink_queue.h
@@ -97,7 +97,8 @@ enum nfqnl_attr_config {
 /* Flags for NFQA_CFG_FLAGS */
 #define NFQA_CFG_F_FAIL_OPEN			(1 << 0)
 #define NFQA_CFG_F_CONNTRACK			(1 << 1)
-#define NFQA_CFG_F_MAX				(1 << 2)
+#define NFQA_CFG_F_GSO				(1 << 2)
+#define NFQA_CFG_F_MAX				(1 << 3)
 
 /* flags for NFQA_SKB_INFO */
 /* packet appears to have wrong checksums, but they are ok */
diff --git a/net/netfilter/nfnetlink_queue_core.c b/net/netfilter/nfnetlink_queue_core.c
index c927a7b..f1e3c90c 100644
--- a/net/netfilter/nfnetlink_queue_core.c
+++ b/net/netfilter/nfnetlink_queue_core.c
@@ -330,7 +330,8 @@ nfqnl_build_packet_message(struct nfqnl_instance *queue,
 		break;
 
 	case NFQNL_COPY_PACKET:
-		if (entskb->ip_summed == CHECKSUM_PARTIAL &&
+		if (!(queue->flags & NFQA_CFG_F_GSO) &&
+		    entskb->ip_summed == CHECKSUM_PARTIAL &&
 		    skb_checksum_help(entskb))
 			return NULL;
 
@@ -634,7 +635,7 @@ nfqnl_enqueue_packet(struct nf_queue_entry *entry, unsigned int queuenum)
 	if (queue->copy_mode == NFQNL_COPY_NONE)
 		return -EINVAL;
 
-	if (!skb_is_gso(entry->skb))
+	if ((queue->flags & NFQA_CFG_F_GSO) || !skb_is_gso(entry->skb))
 		return __nfqnl_enqueue_packet(net, queue, entry);
 
 	skb = entry->skb;
-- 
1.7.8.6


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/5] netfilter: nfnetlink_queue: avoid peer_portid test
  2013-04-19 14:58 ` [PATCH 2/5] netfilter: nfnetlink_queue: avoid peer_portid test Florian Westphal
@ 2013-04-26  1:19   ` Pablo Neira Ayuso
  0 siblings, 0 replies; 11+ messages in thread
From: Pablo Neira Ayuso @ 2013-04-26  1:19 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

Hi Florian,

On Fri, Apr 19, 2013 at 04:58:24PM +0200, Florian Westphal wrote:
> The portid is the netlink port id of the skb that created the queue.
> 
> Add test to ensure the portid cannot be 0 at create time, and
> the check at enqueue time will always be false.
> 
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  net/netfilter/nfnetlink_queue_core.c |    7 +++----
>  1 files changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/net/netfilter/nfnetlink_queue_core.c b/net/netfilter/nfnetlink_queue_core.c
> index 5e280b3..94e2e4f 100644
> --- a/net/netfilter/nfnetlink_queue_core.c
> +++ b/net/netfilter/nfnetlink_queue_core.c
> @@ -107,6 +107,9 @@ instance_create(struct nfnl_queue_net *q, u_int16_t queue_num,
>  	unsigned int h;
>  	int err;
>  
> +	if (portid == 0)
> +		return ERR_PTR(-EINVAL);

The instance_create function takes NETLINK_CB(skb).portid. IIRC,
netlink always sets that for us to non zero, so I think we would never
hit that error.

> +
>  	spin_lock(&q->instances_lock);
>  	if (instance_lookup(q, queue_num)) {
>  		err = -EEXIST;
> @@ -506,10 +509,6 @@ nfqnl_enqueue_packet(struct nf_queue_entry *entry, unsigned int queuenum)
>  	}
>  	spin_lock_bh(&queue->lock);
>  
> -	if (!queue->peer_portid) {
> -		err = -EINVAL;
> -		goto err_out_free_nskb;
> -	}

I'm trying to remember under what circunstances the queue portid can
be left unset, but I don't find any. Will check again this tomorrow
with fresh mind.

>  	if (queue->queue_total >= queue->queue_maxlen) {
>  		if (queue->flags & NFQA_CFG_F_FAIL_OPEN) {
>  			failopen = 1;
> -- 
> 1.7.8.6
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/5] netfilter: nf_queue: move device refcount bump to extra function
  2013-04-19 14:58 ` [PATCH 1/5] netfilter: nf_queue: move device refcount bump to extra function Florian Westphal
@ 2013-04-27 17:46   ` Pablo Neira Ayuso
  0 siblings, 0 replies; 11+ messages in thread
From: Pablo Neira Ayuso @ 2013-04-27 17:46 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On Fri, Apr 19, 2013 at 04:58:23PM +0200, Florian Westphal wrote:
> required by future patch that will need to duplicate the
> nf_queue_entry, bumping refcounts of the copy.

Applied, thanks Florian.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/5] netfilter: move skb_gso_segment into nfnetlink_queue module
  2013-04-19 14:58 ` [PATCH 3/5] netfilter: move skb_gso_segment into nfnetlink_queue module Florian Westphal
@ 2013-04-27 17:46   ` Pablo Neira Ayuso
  0 siblings, 0 replies; 11+ messages in thread
From: Pablo Neira Ayuso @ 2013-04-27 17:46 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On Fri, Apr 19, 2013 at 04:58:25PM +0200, Florian Westphal wrote:
> There is nothing wrong with the current code.
> 
> However, skb_gso_segment is expensive, so it would be nice
> if we could avoid it in the future.
> 
> Since userspace must be prepared to receive larger-than-mtu-packets
> (which will also have incorrect l3/l4 checksums), we cannot simply
> remove it.
> 
> The plan is to add a per-queue feature flag that userspace can
> set when binding the queue.
> 
> The problem is that in nf_queue, we only have a queue number,
> not the queue context/configuration settings.
> 
> This patch should have no impact other than the skb_gso_segment
> call now being in a function that has access to the queue config data.
> 
> The size new size attribute in nf_queue_entry is needed so
> nfnetlink_queue can duplicate the entry of the gso skb
> when segmenting the skb while also copying the route key.
> 
> Next step: avoid skb_gso_segment when queue config says so.

Applied, thanks.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 4/5] netfilter: nfnetlink_queue: add skb info attribute
  2013-04-19 14:58 ` [PATCH 4/5] netfilter: nfnetlink_queue: add skb info attribute Florian Westphal
@ 2013-04-27 17:46   ` Pablo Neira Ayuso
  0 siblings, 0 replies; 11+ messages in thread
From: Pablo Neira Ayuso @ 2013-04-27 17:46 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On Fri, Apr 19, 2013 at 04:58:26PM +0200, Florian Westphal wrote:
> Once we allow userspace to receive gso/gro packets, userspace
> needs to be able to determine when checksums appear to be
> broken, but are not.
> 
> NFQA_SKB_CSUMNOTREADY means 'checksums will be fixed in kernel
> later, pretend they are ok'.
> 
> NFQA_SKB_GSO could be used for statistics, or to determine when
> packet size exceeds mtu.

Applied, thanks.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 5/5] netfilter: nfqueue: avoid expensive gso segmentation and checksum fixup
  2013-04-19 14:58 ` [PATCH 5/5] netfilter: nfqueue: avoid expensive gso segmentation and checksum fixup Florian Westphal
@ 2013-04-27 17:46   ` Pablo Neira Ayuso
  0 siblings, 0 replies; 11+ messages in thread
From: Pablo Neira Ayuso @ 2013-04-27 17:46 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On Fri, Apr 19, 2013 at 04:58:27PM +0200, Florian Westphal wrote:
> Userspace can now indicate that it can cope with larger-than-mtu sized
> packets and packets that have invalid ipv4/tcp checksums.

And also applied, thanks Florian.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2013-04-27 17:46 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-04-19 14:58 [PATCH -next v2 0/5] netfilter: nf_queue: avoid expensive gso/checksums Florian Westphal
2013-04-19 14:58 ` [PATCH 1/5] netfilter: nf_queue: move device refcount bump to extra function Florian Westphal
2013-04-27 17:46   ` Pablo Neira Ayuso
2013-04-19 14:58 ` [PATCH 2/5] netfilter: nfnetlink_queue: avoid peer_portid test Florian Westphal
2013-04-26  1:19   ` Pablo Neira Ayuso
2013-04-19 14:58 ` [PATCH 3/5] netfilter: move skb_gso_segment into nfnetlink_queue module Florian Westphal
2013-04-27 17:46   ` Pablo Neira Ayuso
2013-04-19 14:58 ` [PATCH 4/5] netfilter: nfnetlink_queue: add skb info attribute Florian Westphal
2013-04-27 17:46   ` Pablo Neira Ayuso
2013-04-19 14:58 ` [PATCH 5/5] netfilter: nfqueue: avoid expensive gso segmentation and checksum fixup Florian Westphal
2013-04-27 17:46   ` Pablo Neira Ayuso

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).